Last week, we finalized our design for the crocodile, hippo, and fish relationship. A hippo can either earn an individual reward by eating fish or earn a group reward by moving to a bitten hippo and helping it get away from the crocodile, and we wanted to see how the hippo agents would choose between the two.
During training, we had to adjust the rewards and punishments several times, and for each iteration our programmer and designer noted some interesting results.
At first, we wanted to encourage the shared-reward scenario, so we increased the reward for saving a bitten hippo to see whether the hippos would help each other. The result was surprising: the hippos began to swim around the crocodile to prevent anyone from being bitten. The hippos showed altruism.
Afterwards, we wanted to discourage abuse of the helping system. We therefore decreased the reward for saving another hippo, increased the punishment for being bitten by the crocodile, increased the fish reward, and cut the struggle time and freeze time in half. This time, the hippos tended to run away from the crocodile. They preferred to seek food in safe areas or hide in a corner of the island. They became passive about the fish, and the helping mechanism was rarely triggered. Our hippos had become more timid.
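One way to keep track of an adjustment like this is to store the reward settings in a small config object and derive each iteration from the previous one. The sketch below is illustrative only: the field names, the baseline numbers, and every scaling factor except the halved struggle and freeze times are placeholder assumptions, not the values we actually trained with.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class RewardConfig:
    """Hypothetical reward settings for one training iteration."""
    fish_reward: float      # reward for eating a fish
    rescue_reward: float    # reward for saving a bitten hippo
    bite_penalty: float     # punishment for being bitten (stored as a positive number)
    struggle_time: float    # seconds a bitten hippo struggles
    freeze_time: float      # seconds a hippo stays frozen


# Placeholder baseline; these numbers are assumptions for illustration.
baseline = RewardConfig(
    fish_reward=1.0,
    rescue_reward=2.0,
    bite_penalty=1.0,
    struggle_time=4.0,
    freeze_time=2.0,
)


def discourage_helping(cfg: RewardConfig) -> RewardConfig:
    """Apply the direction of each change described above.

    Only the halving of struggle and freeze time comes from the post;
    the other factors are made up to show the direction of the change.
    """
    return replace(
        cfg,
        rescue_reward=cfg.rescue_reward * 0.5,  # lower rescue reward (assumed factor)
        bite_penalty=cfg.bite_penalty * 2.0,    # higher bite punishment (assumed factor)
        fish_reward=cfg.fish_reward * 1.5,      # higher fish reward (assumed factor)
        struggle_time=cfg.struggle_time / 2,    # cut in half
        freeze_time=cfg.freeze_time / 2,        # cut in half
    )


timid = discourage_helping(baseline)
print(timid)
```

Keeping each iteration as a separate immutable config makes it easier to compare runs side by side and to write down exactly what changed between them.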
We then wanted to see how the hippos would behave when the reward for eating a fish and the reward for saving a bitten hippo were the same. Initially, we thought they would start to show decision-making behavior. However, the hippos tended to eat fish rather than help other hippos, because of the risk of being bitten by the crocodile during a rescue. The agents still clearly favored one choice over the other, and we believe more factors are influencing the results.
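A rough expected-value calculation shows why equal rewards do not lead to an even split. The sketch below uses made-up probabilities and a made-up bite penalty purely for illustration; none of these numbers were measured from our training runs.

```python
def expected_return(reward: float, success_prob: float,
                    penalty: float = 0.0, bite_prob: float = 0.0) -> float:
    """Expected payoff of an action: the chance of earning the reward
    minus the chance of paying the bite penalty."""
    return success_prob * reward - bite_prob * penalty


# Same nominal reward for both actions, as in this experiment.
REWARD = 1.0

# Hypothetical numbers: eating fish is assumed to carry no bite risk,
# while a rescue is assumed to carry a 30% chance of being bitten.
fish_value = expected_return(REWARD, success_prob=0.9)
rescue_value = expected_return(REWARD, success_prob=0.9,
                               penalty=2.0, bite_prob=0.3)

print(f"fish:   {fish_value:.2f}")
print(f"rescue: {rescue_value:.2f}")
```

Under these assumptions the rescue is worth less than the fish even though both pay the same reward, so an agent that maximizes expected return will keep choosing the fish until the rescue reward is high enough to cover the risk.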
Next, we wanted to encourage more food seeking, hiding, and occasional helping. We therefore trained our agents on a new map with more variety and obstacles (rocks). We set a higher reward for saving other hippos and a longer struggle time, and kept the punishment for being bitten and the reward for eating fish the same. The result was quite good. The hippos ran away and hid from the crocodile, and they were good at seeking food. They also helped each other when nearby. The only thing we were not fully satisfied with was that, because of the environment, the hippos might get stuck and keep swimming in a loop.
With the good results from the last training run, we trained the RL agents again using the new models and an adjusted environment to avoid the loop issue. The result was stable enough for playtesting, and we started to think about how the player would interact with this system.
We then decided to make a sandbox ecosystem-simulation game. The player can drag and drop different animal icons and a rock icon into the water and observe what happens. Beyond observing, the elements the player places also influence the agents in the water.
Below is the gameplay using the last trained system with a temporary user interface. We plan to let the player choose between two maps: one without any obstacles at the start, and one with a more complex environment design. The player's goal is to keep the hippos from going extinct until a certain time.
We showed all of the results above to our client this week, and they were excited and happy to see the different results we got during training. These are all important learnings from this project, and we will make sure to record them in our design documentation. Our client recommended that we think about the player's goal: besides saving the hippos, is there a more indirect goal with more emotional direction? Taking our client's feedback into account, we are thinking about the ending of this game. We are also polishing the game to be ready for next week's soft opening!