
Project Journal

Fireboy and Watergirl AI

Watch the video to see where the AI currently stands, or read the build log to follow my progress.

Build Summary

Framework

Gymnasium + PPO

Focus

Multi-character cooperation, reward design, environment instrumentation, and parallel experimentation.

Video Demonstration

Current project demo

Build Log

Day 1

I set out to create the Fireboy & Watergirl AI. Since the original is a Flash game, I was able to find its .swf files and get it running through Ruffle. For the model, I used Stable-Baselines3 with PPO on top of the Gymnasium framework (the successor to OpenAI Gym).
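To give a rough idea of the setup, here is a minimal sketch of a Gymnasium environment wrapped around the game with PPO from Stable-Baselines3 on top. The FireboyWatergirlEnv class, its observation and action spaces, and the omitted Ruffle plumbing are illustrative, not the project's exact code.

import gymnasium as gym
import numpy as np
from gymnasium import spaces
from stable_baselines3 import PPO


class FireboyWatergirlEnv(gym.Env):
    """Hypothetical wrapper around the Ruffle window.

    Observations are a small vector of character/object positions,
    and actions are discrete key combinations for both characters.
    """

    def __init__(self):
        super().__init__()
        # e.g. x/y for Fireboy, Watergirl, the gems, and the doors
        self.observation_space = spaces.Box(low=0.0, high=1.0, shape=(16,), dtype=np.float32)
        # e.g. no-op / left / right / jump for each character, flattened
        self.action_space = spaces.Discrete(16)

    def reset(self, seed=None, options=None):
        super().reset(seed=seed)
        # restart the level in Ruffle and read the first observation (omitted)
        obs = np.zeros(self.observation_space.shape, dtype=np.float32)
        return obs, {}

    def step(self, action):
        # send the keys for `action`, grab the new game state, compute the reward (omitted)
        obs = np.zeros(self.observation_space.shape, dtype=np.float32)
        reward, terminated, truncated = 0.0, False, False
        return obs, reward, terminated, truncated, {}


if __name__ == "__main__":
    model = PPO("MlpPolicy", FireboyWatergirlEnv(), verbose=1)
    model.learn(total_timesteps=50_000)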

I spent the rest of the day setting up the automation pipeline and the model itself. The initial rewards were based on gem proximity, gem collection, door proximity, and level completion, while the penalties were based on time spent and death.
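A minimal sketch of that kind of reward shaping is below; the weights, state fields, and thresholds are placeholders rather than the values I actually used.

def compute_reward(state, prev_state):
    """Illustrative shaping: gem/door proximity, pickups, completion,
    plus time and death penalties. All weights are placeholder values."""
    reward = 0.0

    # proximity shaping: reward getting closer to the nearest uncollected gem
    reward += 0.5 * (prev_state.dist_to_gem - state.dist_to_gem)
    # and to the exit doors once the gems are gone
    if state.gems_remaining == 0:
        reward += 0.5 * (prev_state.dist_to_door - state.dist_to_door)

    # discrete events
    reward += 5.0 * (prev_state.gems_remaining - state.gems_remaining)  # gem collected
    if state.level_complete:
        reward += 50.0
    if state.character_died:
        reward -= 25.0

    # small per-step cost to discourage stalling
    reward -= 0.01

    return reward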

It was able to collect gems in under 50 episodes. However, it only seemed to consistently collect the red gem and not the blue gem.

Day 2

I let the AI run overnight, and there were several cases where it spent more than 20 minutes without going for any gems. That made me think the time penalty was too small and the gem reward was too weak. However, none of the reward tweaks seemed to work.

I decided I needed a way to test multiple models with different variables in parallel, so I built run_parallel_training.py. I also used AutoHotkey so multiple Ruffle windows could stay open and receive their own keyboard and mouse inputs without interfering with one another.
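The idea behind run_parallel_training.py, in sketch form: launch one training run per reward configuration, each bound to its own Ruffle window. The train_single.py entry point and its flags are hypothetical stand-ins for the project's real training script.

import subprocess
import sys

# one configuration per Ruffle window; the values here are just examples
VARIANTS = [
    {"window": "Ruffle 1", "time_penalty": "0.01", "gem_reward": "5.0"},
    {"window": "Ruffle 2", "time_penalty": "0.05", "gem_reward": "5.0"},
    {"window": "Ruffle 3", "time_penalty": "0.01", "gem_reward": "10.0"},
]

procs = []
for cfg in VARIANTS:
    cmd = [sys.executable, "train_single.py",
           "--window", cfg["window"],
           "--time-penalty", cfg["time_penalty"],
           "--gem-reward", cfg["gem_reward"],
           "--episodes", "35"]
    procs.append(subprocess.Popen(cmd))

# wait for every run to finish before comparing results
for p in procs:
    p.wait()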

I also added an exploration feature that rewarded the AI for visiting new x-y coordinates during a run. It still did not work reliably.
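The exploration bonus amounted to rewarding the first visit to each coarse grid cell in an episode, roughly like this (the cell size and bonus value are placeholders, not the settings I used):

CELL_SIZE = 20  # pixels per grid cell


class ExplorationBonus:
    def __init__(self, bonus=0.1):
        self.bonus = bonus
        self.visited = set()

    def reset(self):
        # clear visited cells at the start of each episode
        self.visited.clear()

    def __call__(self, x, y):
        cell = (int(x) // CELL_SIZE, int(y) // CELL_SIZE)
        if cell in self.visited:
            return 0.0
        self.visited.add(cell)
        return self.bonus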

I thought the model might benefit from recognizing more environmental features like buttons, hazards, and barriers, so I added support for those and spent time debugging the OpenCV pipeline and building a visual overlay to make sure the detections were correct.
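The detection side of the pipeline boiled down to HSV color masks plus bounding boxes drawn onto a debug frame. A simplified sketch follows; the color ranges are placeholders, since the real thresholds were tuned against the actual level.

import cv2
import numpy as np

COLOR_RANGES = {
    "red_gem": ((0, 120, 120), (10, 255, 255)),
    "blue_gem": ((100, 120, 120), (130, 255, 255)),
    "green_hazard": ((45, 100, 100), (75, 255, 255)),
}


def detect_and_overlay(frame_bgr):
    """Return detected bounding boxes and a debug frame with them drawn."""
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    overlay = frame_bgr.copy()
    detections = {}
    for name, (lo, hi) in COLOR_RANGES.items():
        mask = cv2.inRange(hsv, np.array(lo), np.array(hi))
        contours, _ = cv2.findContours(mask, cv2.RETR_EXTERNAL, cv2.CHAIN_APPROX_SIMPLE)
        boxes = [cv2.boundingRect(c) for c in contours if cv2.contourArea(c) > 30]
        detections[name] = boxes
        for (x, y, w, h) in boxes:
            cv2.rectangle(overlay, (x, y), (x + w, y + h), (255, 255, 255), 1)
            cv2.putText(overlay, name, (x, y - 4), cv2.FONT_HERSHEY_SIMPLEX, 0.4, (255, 255, 255), 1)
    return detections, overlay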

Day 3

I let the AI run overnight, and it turned out that all of the models only collected the red gem while Watergirl kept falling behind. I decided that I needed to add more targeted rewards and penalties. Here are some of the changes I made (sketched in code after the list):

  • Made the time penalty grow exponentially after one minute
  • Added both a proximity reward and a direct reward for Watergirl entering water
  • Added alternating gem rewards so Watergirl would be rewarded much more strongly for moving toward her gem once Fireboy had already collected his
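In sketch form, with illustrative weights, thresholds, and state fields, those tweaks looked roughly like this:

import math


def targeted_rewards(state, prev_state, elapsed_seconds):
    reward = 0.0

    # time penalty grows exponentially once the episode passes one minute
    if elapsed_seconds > 60:
        reward -= 0.01 * math.exp((elapsed_seconds - 60) / 30.0)

    # Watergirl: proximity shaping toward the water plus a direct bonus for entering it
    reward += 0.3 * (prev_state.watergirl_dist_to_water - state.watergirl_dist_to_water)
    if state.watergirl_in_water and not prev_state.watergirl_in_water:
        reward += 3.0

    # alternating gem rewards: once Fireboy has his gem, weight Watergirl's
    # progress toward the blue gem much more heavily
    weight = 3.0 if state.fireboy_has_gem else 1.0
    reward += weight * 0.5 * (prev_state.watergirl_dist_to_blue_gem - state.watergirl_dist_to_blue_gem)

    return reward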

None of those changes seemed to solve the problem. Fireboy kept getting his gem, but Watergirl was still behind, which led me to create a setup where Fireboy stopped moving and only Watergirl moved. That was when I started noticing strange behaviors, like Watergirl repeatedly jumping in place and moving back and forth.

To better understand this I added a reward log to the visual overlay to debug exactly which rewards and penalties were being triggered in real time. It turned out that Watergirl was not being tracked consistently, and by jumping up and down or moving back and forth she could briefly become null, which caused the system to reset and accidentally reward that behavior.
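One way to guard against that kind of bug is to hold the last known position for a few frames when detection drops out, so a momentary loss of tracking never registers as a reset. A small sketch of the idea, not the project's exact fix:

class TrackedPosition:
    """Keep the last known position through brief detection dropouts."""

    def __init__(self, max_missing_frames=10):
        self.max_missing_frames = max_missing_frames
        self.last_position = None
        self.missing = 0

    def update(self, detection):
        if detection is not None:
            self.last_position = detection
            self.missing = 0
        else:
            self.missing += 1
            if self.missing > self.max_missing_frames:
                self.last_position = None  # genuinely lost, not just a flicker
        return self.last_position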

I fixed that problem and decided to run another parallelized setup overnight to see how it performed the next day.

Day 4

It still did not work, so I tried a number of different approaches. One of them was scattershot tuning, where I would give a model a low and high range for a variable and run ten instances of it for 35 episodes each to see which settings performed best. Eventually, I realized that instead of manually running multiple scattershots, I could build a script called recursive_reward_search.py that automatically tweaked variables across multiple rounds and changed what it focused on each time in order to search for the best setup. However, neither of those approaches solved the core issue.
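The core loop of recursive_reward_search.py, sketched: sample candidate values from a range, score each one over a fixed number of episodes, narrow the range around the best value, and repeat. The evaluate callable is a placeholder for launching a training run and returning its score; the round and candidate counts mirror the scattershot setup described above.

import random


def recursive_search(evaluate, low, high, rounds=3, candidates=10, episodes=35):
    best_value, best_score = None, float("-inf")
    for _ in range(rounds):
        for _ in range(candidates):
            value = random.uniform(low, high)
            score = evaluate(value, episodes=episodes)
            if score > best_score:
                best_value, best_score = value, score
        # shrink the search window around the current best value
        span = (high - low) / 4
        low, high = best_value - span, best_value + span
    return best_value, best_score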

I ended up watching replays through replay_episode.py and realized that the movement itself was off. Whenever Watergirl attempted a diagonal jump, she would begin the jump but then fall straight down or drift the other way mid-air. This was happening because the model was sending new inputs so quickly that it never held a risky jump long enough to learn how to clear the fire hazard.

Because of that, I changed the diagonal jump logic so the input would be held for 1.5 seconds, and that was the breakthrough. Once it finally started working, I left a recursive search script running overnight to try to find the best-performing model.
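The project sends its inputs through the automation pipeline described earlier, but the fix amounts to holding the jump and direction keys down for the full duration instead of releasing them on the next action. Illustrated here with pynput purely for simplicity; the 1.5 s hold matches the change above.

import time
from pynput.keyboard import Controller, Key

keyboard = Controller()


def held_diagonal_jump(direction=Key.right, hold_seconds=1.5):
    # press jump and the direction together, hold, then release both
    keyboard.press(Key.up)
    keyboard.press(direction)
    time.sleep(hold_seconds)
    keyboard.release(direction)
    keyboard.release(Key.up)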

Day 5

The recursive search finally started producing models that could collect both gems without stalling. That was a huge step forward. However, it still could not do so reliably.

At first, I thought the issue was simply that the models had not trained long enough, so I tried training different ones for anywhere from 100 to 1,000 episodes. None of them ever became consistently reliable.

That was the point where I realized I would probably need imitation learning to make the behavior stable. I created review_good_episodes.py so I could review successful episodes and select them to be used for training.

Day 6

There ended up being a large number of episodes, so I made a script that sorted any episode that collected a certain number of gems or progressed to a certain area into an archive for review. Training a model on those episodes worked, but not at the pace I wanted.
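The sorting script was essentially a filter over recorded episode metadata. A sketch of the idea, with the paths, file layout, and metadata keys assumed rather than taken from the project:

import json
import shutil
from pathlib import Path

EPISODES_DIR = Path("episodes")
ARCHIVE_DIR = Path("episodes_archive")


def archive_good_episodes(min_gems=2, target_area="upper_level"):
    ARCHIVE_DIR.mkdir(exist_ok=True)
    for meta_file in EPISODES_DIR.glob("*.json"):
        meta = json.loads(meta_file.read_text())
        # keep any episode that collected enough gems or reached the target area
        if meta.get("gems_collected", 0) >= min_gems or meta.get("area_reached") == target_area:
            shutil.move(str(meta_file), ARCHIVE_DIR / meta_file.name)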

I decided to switch things up. Previously, the model was based only on positional information for things like the characters and level objects, so I decided to make different versions. I created a vision-based version that applied grayscale to everything except important parts that needed color distinctions, like gems and hazards, and a hybrid version that combined both vision and positional information.
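The selective grayscale step can be done by converting the frame to gray and then copying the original pixels back wherever a gem or hazard color mask fires. A sketch with placeholder HSV thresholds:

import cv2
import numpy as np

KEEP_COLOR_RANGES = [
    ((0, 120, 120), (10, 255, 255)),     # red gems / fire
    ((100, 120, 120), (130, 255, 255)),  # blue gems / water
    ((45, 100, 100), (75, 255, 255)),    # green hazard
]


def selective_grayscale(frame_bgr):
    gray = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2GRAY)
    out = cv2.cvtColor(gray, cv2.COLOR_GRAY2BGR)
    hsv = cv2.cvtColor(frame_bgr, cv2.COLOR_BGR2HSV)
    keep = np.zeros(gray.shape, dtype=np.uint8)
    for lo, hi in KEEP_COLOR_RANGES:
        keep |= cv2.inRange(hsv, np.array(lo), np.array(hi))
    # restore colour only where a gem or hazard mask matched
    out[keep > 0] = frame_bgr[keep > 0]
    return out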

Day 7

I left them training overnight, but there didn't seem to be much of a difference. Even though I didn't want to use human data in the project, I thought it might still be useful to see whether the model could actually reach the end. That led me to record human data and train the model on it.

Surprisingly, it did not perform better at all. It became too afraid of dying from the green hazard above it, so instead of going up, it tried to replicate what I had done on the upper levels while staying on the lower level.

This taught me that I might need to split the problem into segments: train a model until it gets two gems, use imitation learning on that progress, train it until it clears the green hazard, and then repeat. That led me to create a way to train one model across parallel instances so it could gather data faster with train_parallel_model.py. In the meantime, I also thought training might benefit from running faster, so I looked through Ruffle, found a way to speed up the game, and applied it.
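train_parallel_model.py drives separate Ruffle windows, but the standard way to feed one PPO model from several environments in Stable-Baselines3 is SubprocVecEnv. A sketch, reusing the hypothetical environment class from the Day 1 example with an assumed window_title parameter:

from stable_baselines3 import PPO
from stable_baselines3.common.vec_env import SubprocVecEnv


def make_env(window_index):
    def _init():
        # each copy of the environment binds to its own game window
        return FireboyWatergirlEnv(window_title=f"Ruffle {window_index}")
    return _init


if __name__ == "__main__":
    vec_env = SubprocVecEnv([make_env(i) for i in range(4)])
    model = PPO("MlpPolicy", vec_env, verbose=1)
    model.learn(total_timesteps=200_000)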

Day 8

Speeding up the game had ripple effects on the OpenCV pipeline, which struggled to keep up. I spent some time trying to fix that before realizing it would be difficult to visually track the characters accurately at that pace. That led me to look for a way into the source code, and I eventually found that the JPEXS decompiler could extract the ActionScript code inside the .swf file.

After using that, I implemented actual position-based tracking. That introduced a couple of bugs in the overlay, which I worked around. Once I realized I had access to the source code, I decided to try an easier level so I could test whether the models were actually capable of completing any level at all.

I picked another level, and the model was able to breeze through it, collect all the gems, and reach the door. The visual and hybrid models seemed to get to that point faster, although the position-based model could still do it as well. However, the characters were having trouble staying still at the door, which was required to actually complete the level.