Can AI revolutionize automated game testing?

Artificial Intelligence has progressed to the point where we can automate entire categories of tasks. The idea of an AI playing a game and automatically finding defects is not new. But is it possible to automate game testing with AI today?

What do we mean by Artificial Intelligence?

Artificial Intelligence is a very large field that encompasses a wide variety of techniques and goals. So before anything else, we need to define a few terms.

Machine Learning (ML)

Machine learning is a set of methods and algorithms that improve automatically through training. The training can be supervised, unsupervised, or based on reinforcement. It is the branch of AI that has grown the most over the past decade.

Supervised learning

Supervised machine learning relies on training data. Humans label the data and feed it to the ML algorithm, which can then generalize to data it hasn't seen before. For example, the data could be a set of pictures labelled as containing a cat or not. Once trained on these, the algorithm can tell whether a new picture has a cat in it.
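To make the idea concrete, here is a minimal sketch of supervised learning using a nearest-centroid classifier. It is not how real image recognition works: the two numeric features, the training values, and the function names are all made up for illustration, and stand in for whatever a real system would extract from a picture.

```python
from statistics import mean

# Hypothetical labelled training data: (feature vector, label).
# The features and their values are invented for this illustration.
TRAINING_DATA = [
    ((0.9, 0.8), "cat"),
    ((0.8, 0.9), "cat"),
    ((0.1, 0.2), "no cat"),
    ((0.2, 0.1), "no cat"),
]


def train(examples):
    """Compute one centroid (average feature vector) per label."""
    by_label = {}
    for features, label in examples:
        by_label.setdefault(label, []).append(features)
    return {
        label: tuple(mean(column) for column in zip(*rows))
        for label, rows in by_label.items()
    }


def predict(centroids, features):
    """Return the label whose centroid is closest to the given features."""
    def squared_distance(centroid):
        return sum((a - b) ** 2 for a, b in zip(centroid, features))

    return min(centroids, key=lambda label: squared_distance(centroids[label]))


model = train(TRAINING_DATA)
# A data point that was not part of the training set.
unseen_label = predict(model, (0.7, 0.95))
```

The human contribution is the labels in `TRAINING_DATA`; the algorithm only generalizes from them.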

Reinforcement learning

Reinforcement learning doesn't train on labelled data but through trial and error instead. Successful attempts reinforce the behavior that led to them. It is up to the humans designing the learning method to decide what counts as a success.
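As a rough sketch of trial and error, the snippet below uses tabular Q-learning (one common reinforcement learning method, chosen here only for illustration) on a made-up corridor of five cells. The corridor, the `reward` function, and every parameter value are assumptions. Note that `reward` is where a human decides what counts as a success.

```python
import random

N_CELLS = 5          # hypothetical corridor: cells 0..4
GOAL = N_CELLS - 1   # reaching the last cell ends an attempt
ACTIONS = (-1, +1)   # step left or step right


def reward(state):
    """Human-designed definition of success: reaching the goal cell."""
    return 1.0 if state == GOAL else 0.0


def step(state, action):
    """Move within the corridor, clamped to its two ends."""
    return min(max(state + action, 0), GOAL)


def train(episodes=300, alpha=0.5, gamma=0.9, epsilon=0.2, seed=0):
    rng = random.Random(seed)
    q = {(s, a): 0.0 for s in range(N_CELLS) for a in ACTIONS}
    for _ in range(episodes):
        state = 0
        while state != GOAL:
            if rng.random() < epsilon:
                action = rng.choice(ACTIONS)  # explore
            else:
                # Exploit the best known action, breaking ties randomly.
                best = max(q[(state, a)] for a in ACTIONS)
                action = rng.choice(
                    [a for a in ACTIONS if q[(state, a)] == best]
                )
            next_state = step(state, action)
            target = reward(next_state) + gamma * max(
                q[(next_state, a)] for a in ACTIONS
            )
            # Nudge the estimate toward what this attempt just revealed.
            q[(state, action)] += alpha * (target - q[(state, action)])
            state = next_state
    return q


def greedy_policy(q):
    """Best known action for each non-goal cell."""
    return [max(ACTIONS, key=lambda a: q[(s, a)]) for s in range(GOAL)]


policy = greedy_policy(train())
```

After training, `policy` holds the preferred direction for each cell. Nothing told the agent to walk right; it only ever saw the reward.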

Artificial General Intelligence (AGI)

AGI is the ability of a machine (or set of machines) to learn and reason like humans do. It is the holy grail of AI since it would be hard to distinguish AGI from a human. It is hypothetical and doesn't exist yet.

As you might suspect, ML is very far from being AGI. In the rest of this article, we use AI and ML interchangeably.

What can AI do today?

While we're still far from general intelligence, current algorithms can already achieve interesting results. For example here is a video showing an artificial neural network learning to play Mario:

The video explains the use of Neural Networks and Genetic Algorithms to train an AI against a level of Mario. The AI itself is trained to complete the level, and doesn't explore it. But nothing prevents us from changing the reward mechanism so that it rewards exploration instead.
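To illustrate what changing the reward mechanism could mean, here is a sketch of two scoring functions for the same run through a level. Neither comes from the video: the tile-based trajectories, the level width, and both function names are assumptions made for this example.

```python
def completion_reward(trajectory, goal_x):
    """Reward progress toward the end of the level (furthest x reached)."""
    furthest = max(x for x, _ in trajectory)
    return min(furthest, goal_x) / goal_x


def exploration_reward(trajectory):
    """Reward coverage: count the distinct tiles the agent visited."""
    return len(set(trajectory))


# Two hypothetical runs, recorded as (x, y) tile positions.
# The level is assumed to end at x = 10.
speedrun = [(x, 0) for x in range(11)]                   # straight to the end
wanderer = [(x, y) for x in range(6) for y in range(3)]  # pokes around early on

scores = {
    "speedrun": (completion_reward(speedrun, 10), exploration_reward(speedrun)),
    "wanderer": (completion_reward(wanderer, 10), exploration_reward(wanderer)),
}
```

Under the first function the speedrun wins; under the second the wanderer does. An agent optimized for the second would be pushed toward the corners of a level, which is where a tester would want it to go.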

More recently OpenAI demonstrated how competing agents can explore, learn, and exploit a given playground:

The video demonstrates how agents progressively learn what is possible in a game of hide and seek, and ultimately end up exploiting bugs.

In the context of game testing, AI seems to be well suited for exploring the limits of gameplay. Here are example scenarios in which it can be useful:

  • Checking if uncommon input combinations break the game (monkey testing)
  • Checking if existing bugs can be exploited by players
  • Checking if different factions of a game are balanced
  • Checking if the game can sustain long play sessions
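The first scenario, monkey testing, does not even need learning. Below is a minimal sketch that feeds random inputs to a toy game and records them so a crash can be replayed. `ToyGame`, its buttons, and its planted defect are entirely hypothetical; a real setup would drive an actual game build instead.

```python
import random

INPUTS = ("left", "right", "jump", "crouch")


class ToyGame:
    """Hypothetical stand-in for a game build, with one planted defect."""

    def __init__(self):
        self.x = 0
        self.crouching = False

    def press(self, button):
        if button == "left":
            self.x -= 1
        elif button == "right":
            self.x += 1
        elif button == "jump" and self.crouching:
            # Planted bug: jumping out of a crouch was never handled.
            raise RuntimeError("jump while crouching")
        self.crouching = button == "crouch"


def monkey_test(steps=1000, seed=0):
    """Feed random inputs to a fresh game; return the input log and any crash."""
    rng = random.Random(seed)
    game = ToyGame()
    log = []
    for _ in range(steps):
        button = rng.choice(INPUTS)
        log.append(button)
        try:
            game.press(button)
        except Exception as error:
            return log, error
    return log, None


def replay(log):
    """Replay a recorded input log against a fresh game to reproduce a crash."""
    game = ToyGame()
    for button in log:
        game.press(button)


log, error = monkey_test()
```

Recording the inputs matters as much as finding the crash: without `log`, the failure could not be reproduced. Note that this only catches defects that crash; anything subtler needs a definition of what "broken" means.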

The limits of AI

While the current state of AI could in theory expose bugs in a game, there are still critical elements it can't overcome.

For example, imagine you're building an FPS game with multiple weapons, but one of them is broken. A human player will likely notice that the weapon doesn't do any damage and recognize it as a defect, but an AI cannot. For the algorithm, the broken weapon is simply weak, and thus it will stop using it.
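The sketch below shows this behavior with an epsilon-greedy agent that picks whichever weapon has dealt the most damage on average so far, and occasionally tries another one at random. The weapon names, damage values, and parameters are invented for the example, and this simple strategy stands in for whatever a real agent would use.

```python
import random

# Hypothetical damage per shot. The "railgun" is broken and deals none.
WEAPON_DAMAGE = {"pistol": 10, "shotgun": 25, "railgun": 0}


def play(shots=1000, epsilon=0.1, seed=0):
    """Return how often each weapon was used and its observed average damage."""
    rng = random.Random(seed)
    weapons = list(WEAPON_DAMAGE)
    uses = {w: 0 for w in weapons}
    average = {w: 0.0 for w in weapons}
    for _ in range(shots):
        if rng.random() < epsilon:
            weapon = rng.choice(weapons)  # occasionally try something else
        else:
            weapon = max(weapons, key=lambda w: average[w])  # best so far
        damage = WEAPON_DAMAGE[weapon]
        uses[weapon] += 1
        # Incremental update of the running average damage.
        average[weapon] += (damage - average[weapon]) / uses[weapon]
    return uses, average


uses, average = play()
```

All the agent ends up with is a low number next to the railgun and a habit of not picking it. Nothing in `uses` or `average` distinguishes "broken" from "badly balanced" or "weak by design".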

The notion of a bug is always relative to the game design. For example, while platformers usually don't allow players to go through walls, doing so can be the way into a secret room, or even an entire game mechanic. How, in this context, would an AI be able to distinguish between these cases? It would have to understand the design of the game.

This leads to another point: even if an AI can be used to identify bugs, reporting them is another story. With the above example, let's assume an AI identified the broken weapon as a bug. How does the AI report it? We could imagine the AI recording a brief video showing the bug, but the bug might not be visually obvious. A textual explanation would be better.

One of the valued skills of QA testers is producing good bug reports, and their communication skills are paramount to that goal. This is something an AI would struggle to achieve.

Above all that, what an AI can't do is create test scenarios that validate a design. This is why authored tests are valuable. If we take the weapon example once again, a QA engineer would write a specific scenario to validate that the weapon is working. A manual tester or an automated test case can then ensure this specific test passes. This level of specificity is currently out of reach for AI algorithms, and would require extensive human intervention.
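For comparison, here is what an authored scenario for the weapon example might look like as an automated check. The `Weapon` and `Enemy` classes, the railgun, and the design value of 40 damage are all hypothetical; a real test would run against the game's own code.

```python
from dataclasses import dataclass


@dataclass
class Enemy:
    health: int = 100


@dataclass
class Weapon:
    name: str
    damage: int

    def fire_at(self, enemy: Enemy) -> None:
        # Health never drops below zero.
        enemy.health = max(enemy.health - self.damage, 0)


def check_weapon_damage(weapon: Weapon, expected_damage: int) -> None:
    """Authored scenario: one hit must remove exactly the designed damage."""
    enemy = Enemy(health=100)
    weapon.fire_at(enemy)
    assert enemy.health == 100 - expected_damage, (
        f"{weapon.name} dealt {100 - enemy.health} damage, "
        f"design says {expected_damage}"
    )


# Hypothetical design value: a railgun hit should remove 40 health.
check_weapon_damage(Weapon("railgun", damage=40), expected_damage=40)
```

The expected value comes from the design, not from observing the game. With a broken railgun that deals no damage, the same check fails with a message that already reads like the start of a bug report.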

The cost of AI

Often, AI is envisioned as a black box you can simply turn on to automate a process. The traditional chat bot comes to mind. But in the context of automated game testing, off-the-shelf solutions would likely be inefficient:

  • Every game is different
  • Identifying bugs and communicating about them is very specific
  • AI algorithms require dedicated people and infrastructure

This last point raises a very important issue: AI is not free. Reinforcement learning requires millions of simulated runs (as mentioned in the videos). Moreover, any major change in the game might require retraining the AI.

So with that in mind, it is important to weigh the cost of an AI solution against that of more traditional QA processes.


As we saw, the current state of AI can produce interesting results. However, game testing requires extreme specificity. This creates a gap between the current state of the art and what would be required for AI automation to be useful.

It is nonetheless a space to keep an eye on. Middle-ground solutions rather than full automation could prove beneficial for QA engineers.

This article was updated on May 14, 2021