Post Mortem

AI Playtesting was a student-pitched project with the vision of using machine learning to train an AI agent that could help playtest combat card games. The game we chose to base our AI on was Slay the Spire. The idea was to implement this method for one game so that we could understand the techniques and methods that go into creating an AI-based playtesting agent. Currently, our application supports several stages of the design iteration and playtesting process. Our insights and learnings from the project can be divided into the following parts:

Reinforcement Learning Agent

This was the experimental part of our project since reinforcement learning is still an area of active research. We had to test a number of different techniques to see what worked. The problem we were trying to solve was complex because many different cards and game mechanics interact with each other, and the order in which cards are played has a big impact on how much damage is done in a turn. The agent not only needed to learn the intricacies and strategies of the boss, it also needed to learn which cards to play and the correct sequence in which to play them.

Since OpenAI Gym is a popular library used to test reinforcement learning algorithms, we decided to create an environment with a similar API. This allowed the AI script to treat the entire game as a representation of states, next states, rewards, done signals, and some additional information.
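
As a rough illustration, the interface looked something like the sketch below. The class name, the three toy cards, and all the combat numbers here are hypothetical, simplified far beyond our actual game; the point is the Gym-style reset()/step() contract.

```python
import random

class CardGameEnv:
    """Toy Gym-style environment: one boss fight, three possible cards
    (0 = attack, 1 = block, 2 = heavy attack). Numbers are made up."""

    def reset(self):
        self.player_hp, self.boss_hp = 70, 120
        return self._state()

    def step(self, action):
        damage = {0: 6, 1: 0, 2: 12}[action]
        block = 8 if action == 1 else 0
        self.boss_hp -= damage
        self.player_hp -= max(0, random.randint(5, 12) - block)
        done = self.boss_hp <= 0 or self.player_hp <= 0
        reward = 1.0 if self.boss_hp <= 0 else (-1.0 if done else 0.0)
        return self._state(), reward, done, {}

    def _state(self):
        return (self.player_hp, self.boss_hp)

# The training script only ever sees reset()/step(), exactly like a Gym env.
env = CardGameEnv()
state, done = env.reset(), False
while not done:
    state, reward, done, info = env.step(random.randrange(3))
```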

The state space for the reinforcement learning agent was a collection of all the important values that represent the game state. This included the player's health, the player's block, the boss' health, the boss' block, active buffs on the player, active buffs on the boss, the player's energy, cards in hand, cards in the draw pile, cards in the discard pile, the boss' intent, and the remaining damage until the boss' phase change. Taking in all of this information, the AI learned to predict which card to play.
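
A simplified sketch of how such a state can be flattened into a feature vector is shown below. The field names are illustrative (not our exact code), and `game` is a hypothetical object exposing the values listed above, with cards and buffs represented as per-type counts.

```python
import numpy as np

def encode_state(game):
    """Flatten the game state into a fixed-length vector for the network.
    `game` is a hypothetical object exposing the fields described above."""
    return np.array([
        game.player_hp, game.player_block, game.player_energy,
        game.boss_hp, game.boss_block,
        *game.player_buffs,           # e.g. strength, vulnerable, weak, ...
        *game.boss_buffs,
        *game.hand_counts,            # count of each card type in hand
        *game.draw_pile_counts,       # ... in the draw pile
        *game.discard_pile_counts,    # ... in the discard pile
        game.boss_intent,             # id of the boss' telegraphed move
        game.damage_until_phase_change,
    ], dtype=np.float32)
```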

When we started, our approach was naive. We directly applied the Q-Learning algorithm and hoped everything would work out, but it did not. The agent was not learning the q-values correctly, and we ran into a strange problem where the q-values would spiral out and become very large (on the order of 10^10).
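
For context, the one-step target that Q-Learning regresses towards bootstraps off the network's own maximum estimate, so any overestimation can feed back into the next target and compound. This is a rough sketch of that target, not our exact training code.

```python
def q_target(reward, next_q_values, done, gamma=0.99):
    """One-step Q-Learning target the network is trained to predict.
    Bootstrapping off max(next_q_values) means the network's own
    overestimates feed back into future targets and can compound."""
    if done:
        return reward
    return reward + gamma * max(next_q_values)
```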

The next strategy we tried involved creating three separate models, each with a partial state representation. Each model was supposed to learn different things, and we would take a weighted average of them all at the end. In hindsight, this was a terrible idea because it broke the Markov assumption that serves as the foundation of Q-Learning: because of the partial state representations, not all of the information needed to make the current decision was available to the AI.

Since our main problem was that the AI was not able to learn the q-values correctly, we moved on to trying a customized reward function. Again, this was a bad idea, but it seemed good at the time because nothing else was working. The idea was to calculate how much damage each card was doing and then reward the AI based on that. The clear limitation of this approach is that the AI never went beyond the strategy that we taught it through this custom reward function. This made the AI extremely dependent on the arbitrary rewards we decided to give it. As a result, it could not work for new cards or new game mechanics.
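
Conceptually, the shaped reward looked something like the toy function below. The exact terms and weights are illustrative, which is precisely the problem: whatever numbers we picked, they baked our own strategy into the agent.

```python
def custom_reward(damage_dealt, block_gained, damage_taken):
    """Hand-crafted per-card reward (the approach we later abandoned).
    Weights are illustrative; the agent could never learn strategies
    we had not already encoded here ourselves."""
    return 1.0 * damage_dealt + 0.5 * block_gained - 1.0 * damage_taken
```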

These limitations led us to the generalization phase of the AI, which involved working towards replacing the custom reward function with something more general. We decided to use the Monte Carlo return method to calculate the reward that should be attributed to each card. Here, we waited for the game to end, looked at the cards that were played over the course of the game, and then rewarded each card after discounting it (cards played towards the end of the game were discounted less, and cards played towards the start of the game were discounted more). We configured the neural network to predict these values instead of q-values. Looking back, starting with this approach would have saved a lot of time.
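
A minimal sketch of that computation, assuming a list of per-step rewards collected over one full game (here only a terminal win/loss reward), is shown below; the network is then trained to regress these returns instead of bootstrapped q-values.

```python
def monte_carlo_returns(rewards, gamma=0.99):
    """Discounted Monte Carlo return for every card played in one game.
    rewards[t] is the reward observed after the t-th card; gamma < 1
    means cards played near the end of the game are discounted least."""
    returns, g = [], 0.0
    for r in reversed(rewards):
        g = r + gamma * g
        returns.append(g)
    returns.reverse()
    return returns

# Example: a four-card game that ends with a win (+1 terminal reward).
print(monte_carlo_returns([0.0, 0.0, 0.0, 1.0]))
# -> [0.970299, 0.9801, 0.99, 1.0]
```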

Application Development to Package the AI

Our application had a bunch of different sub-systems that performed the following functions:

  1. GUI for a human player to play the game
  2. An interface for AI to play the game
  3. Automation of the AI training and testing process
  4. Visualization of the data generated by the AI
  5. Easy-to-use interface for designers to modify aspects of the game

Our architecture divided the application into 6 modules:

  • Gameplay Logic: We implemented this in Python for ease of interfacing with TensorFlow (the Python machine learning library used by the AI).
  • AI: Uses Python and TensorFlow.
  • Player GUI: Used Unity for rendering the graphics and collecting user input.
  • Designer GUI: The frontend of our application, built using Electron (Node.js), which acted as the entry point to the whole system.
  • Database: We created a simple database module to act as a manager for all the game data as well as the data generated by the AI during playtesting.
  • Data Visualization: This module was used to visualize the playtesting data in order to deliver large quantities of data to the designer in an easy-to-consume format.

One of our biggest challenges was that we wanted to package the AI in a single application and deliver it to game designers. This was difficult because the different parts of our application were built with several different technologies.

The most important aspect of our product is the underlying game itself. We had to create our own version of Slay the Spire in Python so that the AI could easily interact with it. Apart from the AI, a human player should also be able to play the same game, so our core gameplay code was decoupled from everything else such that the source of its inputs did not matter. When a game is developed in a game engine like Unity or Unreal, all the code (gameplay, graphics, physics, VFX, etc.) lives inside the engine. For us, however, because the game needed to run both with and without graphics, we decided to decouple the game programming logic from the graphics rendering engine (Unity in our case).
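
Conceptually, the decoupling boiled down to the game core asking an abstract input source for the next action, with both the Unity GUI and the AI sitting behind the same interface. The sketch below is hypothetical (not our actual class names) and assumes a headless `game` object exposing the core loop.

```python
import random

class InputSource:
    """Anything that can choose the next card: the Unity GUI, a trained
    AI, or a scripted bot all implement this same interface."""
    def choose_card(self, state, legal_actions):
        raise NotImplementedError

class RandomAgent(InputSource):
    """Stand-in for the trained AI (or a human driving a GUI)."""
    def choose_card(self, state, legal_actions):
        return random.choice(legal_actions)

def run_combat(game, source):
    # `game` is assumed to expose only headless gameplay logic,
    # with no rendering code anywhere in the loop.
    state = game.reset()
    while not game.over():
        state = game.play(source.choose_card(state, game.legal_actions()))
    return game.result()
```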

Conclusion

Looking back, we are really happy with how the project turned out. During the initial weeks of the project we were very concerned about whether the AI would ever work. We were even preparing potential directions we could take if the AI did not end up working. Thankfully, it worked very well by the end.

We also got a chance to speak to MegaCrit, the studio that developed Slay the Spire. The conversation validated our idea and gave us confidence that the foundation of our project was strong. Going forward, we believe AI is a technology of the future with a lot of potential to make a big impact on how games are developed.