Introduction
Background:
Project Mimicus is based on the machine learning research presented in the SFV paper (also known as “Learning Acrobatics by Watching YouTube”), which proposes a method to convert 2D videos of human movement into 3D character animations.
Goal:
Our goal is to answer the question “In its current state, can this technology be used to create a game?”
Deliverables:
We have two deliverables.
1) Our main deliverable is documentation of our learning process.
2) Our second deliverable is an experience we create using this learning.
Team Members:
Our team consists of 5 members (4 second-year ETC students and 1 first-year ETC student) as follows:
1. Alexandra Gobeler (Role: Rigging artist and Game designer)
2. Haran Kim (Role: 3D artist and Game designer)
3. Siyu Ren (Role: Gameplay Programmer)
4. Himanshu Telkikar (Role: Producer and Game designer)
5. Yutong Eliza Zhang (Role: Machine Learning Programmer)
Client:
Our client is Google, and our point of contact at Google is Erin Hoffman-John.
What we have built:
1) We put together a custom machine learning pipeline, suited to a game-like experience, using some of the projects created by the paper's authors. We received guidance from the authors and from Google to make sure we had everything we needed to get the desired output from the pipeline.
2) After getting this output from the pipeline, we created a way to retarget it to a character and generate animations in Unreal Engine at runtime (see the export sketch after this list).
3) We have documented our learnings throughout this process, along with the limitations of the technology and possible ways to overcome them.
4) Our final playable is an interactive experience which is divided into two parts.
a. In the first part, players record videos of human movements on their mobile devices and upload them to Google Drive. The machine learning pipeline then runs, and the animation data is eventually brought into Unreal Engine.
b. In the second part, the players get to watch their animations inside the playable demo.
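To make the hand-off between the machine learning pipeline and Unreal Engine concrete, here is a minimal export sketch in Python. It is illustrative rather than our exact implementation: the joint names, the shape of the pose_frames array, and the JSON layout are all assumptions.

```python
import json
import numpy as np

# Illustrative only: assume the pipeline has produced per-frame joint
# rotations as an array of shape (num_frames, num_joints, 3), in degrees.
JOINT_NAMES = ["root", "spine", "head", "l_arm", "r_arm", "l_leg", "r_leg"]

def export_animation(pose_frames: np.ndarray, fps: int, out_path: str) -> None:
    """Write pose data to a JSON file that the Unreal project reads at runtime."""
    clip = {
        "fps": fps,
        "joints": JOINT_NAMES,
        # One entry per frame: an [x, y, z] Euler rotation for each joint.
        "frames": [frame.tolist() for frame in pose_frames],
    }
    with open(out_path, "w") as f:
        json.dump(clip, f)

# Example: a 2-second clip of zeroed poses at 30 fps.
export_animation(np.zeros((60, len(JOINT_NAMES), 3)), fps=30, out_path="clip.json")
```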
What went well
1) We created a custom ML pipeline that fits into a game animation workflow
2) We applied a motion smoothing technique that removed noise and improved the quality of the resulting animations (see the smoothing sketch after this list)
3) We created a method to take the output animation data from the machine learning pipeline, retarget it to our character, and generate animations at runtime
4) We identified technical and design limitations early on and created game design ideas to overcome them
5) We had success solving a big problem: motivating players to upload videos
6) We created a pipeline that uses a Google Cloud virtual machine to drastically speed up computation, cutting processing time from one day for a single video to 20 minutes for four videos
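The exact filter behind point 2 matters less than the idea, so here is a minimal sketch of motion smoothing as a moving average over each joint channel. The array shape is the same assumption as in the export sketch above; a real implementation would also need to handle Euler-angle wrap-around, which this sketch ignores.

```python
import numpy as np

def smooth_motion(pose_frames: np.ndarray, window: int = 5) -> np.ndarray:
    """Apply a moving-average filter independently to every joint channel.

    pose_frames: array of shape (num_frames, num_joints, 3) holding the
    raw, jittery per-frame rotations from the pose-estimation stage.
    """
    kernel = np.ones(window) / window
    smoothed = np.empty_like(pose_frames)
    num_frames, num_joints, num_axes = pose_frames.shape
    for j in range(num_joints):
        for a in range(num_axes):
            # mode="same" keeps the clip length unchanged.
            smoothed[:, j, a] = np.convolve(pose_frames[:, j, a], kernel, mode="same")
    return smoothed
```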
What could have been better
1) A major chunk of the semester went into creating the machine learning pipeline, which left very little time for building an application that uses it. Predicting this early and setting expectations accordingly would have helped
2) The machine learning output is not high-fidelity animation data; it is not good for nimble, fast movements. A completely different way to source input videos, featuring large acrobatic movements (e.g., YouTube clips), could have helped
3) We initially tried retargeting animations to a character with a rig different from the original one, which is an extremely difficult problem to solve. We eventually rigged the character to match the original, but we should have done this from the beginning; it would have saved a lot of time while also yielding better animation quality (see the retargeting sketch after this list)
4) As we had very little time to create the final interactive experience, we had to cut some of the originally proposed features
5) Since we did not start building our final experience until the beginning of week 10, we could not playtest early or iterate enough on it
6) While using a Google Cloud virtual machine cut down on processing time, it meant we could not build an easy and intuitive user experience flow. This is acceptable for our target audience (our client), but not very suitable for other end users who are not programmers
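Point 3 above is easier to see in code. The naive approach, copying local joint rotations across a bone-name map, only behaves well when the two skeletons share the same hierarchy, proportions, and rest pose; with mismatched rigs, every difference shows up as a visible deformation. The sketch below is hypothetical, with an illustrative name map and data format.

```python
# Hypothetical bone-name map from the source (SFV-style) skeleton to a
# target rig. With mismatched rigs, this map alone is not enough.
BONE_MAP = {"root": "pelvis", "spine": "spine_01", "l_arm": "upperarm_l"}

def naive_retarget(source_pose: dict) -> dict:
    """Copy local rotations across the name map; unmapped joints are dropped.

    This only looks right when the target rig matches the source skeleton's
    hierarchy and rest pose, which is why re-rigging our character to match
    the original skeleton beat mapping between two different rigs.
    """
    return {BONE_MAP[joint]: rotation
            for joint, rotation in source_pose.items()
            if joint in BONE_MAP}

# Example frame: per-joint Euler rotations in degrees.
print(naive_retarget({"root": (0, 0, 0), "spine": (5, 0, 0), "head": (0, 10, 0)}))
```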
Lessons learned
1) This project required machine learning programming experience, and having just one machine learning engineer made it difficult. The lesson learned is to either have all your engineers learn the basics of machine learning early on, or free the expert from all other tasks and let them focus on machine learning from day one.
2) Picking up a game engine like Unreal from scratch is difficult. An engineer should be dedicated to this task from day one and should focus all their initial time on learning the engine.
3) We split the team into two sub-teams, programming and design, to increase efficiency on both tracks. This helped a lot, but it required very strong communication from the design team to create a shared mental image and keep the programming team on the same page. When we used them, techniques like storyboarding and one-page design documents drastically improved communication.
4) For a project built on a completely new technology, it is extremely useful to test the technology early. This should be treated as a top priority in projects like this one.
5) The designers on the team had to predict in advance how the technology would work (by studying footage from the existing research). This led to many ideas that turned out to be infeasible. The lesson learned is that designers should be prepared to see many of their features discarded later.
6) We used humor to compensate for the low-fidelity animations. This worked surprisingly well.
Conclusion
1) The final documentation will be handed over to the client as a live Google document (Google Doc) which contains links to videos and other relevant media.
2) The executable of the final interactive experience will be handed over to the client along with instructions on how to complete the required setup. We will not be handing over any source code.
3) The client will use the documentation and interactive experience internally as learning tools if and when they decide to build their own projects using this machine learning technology.
Our goal was to answer the question, “In its current state, can this technology be used to create a game?” This technology cannot produce high-fidelity animations for nimble movements; it only works well with really large, acrobatic movements. It might be a good fit for low-budget games, independent games, or personal projects looking for a novel input mechanism. However, it is not suitable for large-scale AAA games at the moment. One of the best things about the technology, though, is accessibility: anyone with a smartphone or camera can record videos of human movements and generate 3D character animations.