Post-Mortem
November 28, 2011 in Articles
Introduction
DynacTiV is an ETC project, sponsored by Microsoft, dedicated to exploring interactive TV experiences. The high concept is to use Kinect as a bridge that connects the audience and TV broadcasters in real time. Traditional live broadcasts are really just one-way communication. Call-ins, Twitter, and web comments brought some instant feedback, but they are not very involving or intriguing, and they distract the viewer. Now, with Microsoft’s Kinect, a unique input device, there are opportunities to take interactive television where it has never gone before. Advanced body position and motion tracking, facial and voice recognition, and intuitive gesture control are just a few of the features Team DynacTiV explored to revolutionize interactive television.
Goal
The original goal in the project list description was to “explore turning live media (TV) into a more interactive experience”. Just as importantly, the exploration “should be unique and something not tried before”.
Some early directions we got from our client contact, Arnold Blinn, were:
- Push Kinect. This is a Microsoft differentiating feature that competitors can’t replicate.
- When using companion devices, feel free to use them all (iPhone, Android), but don’t ignore Windows Phone or simple laptops running Windows.
- Services should be built in C# and run on Azure, so that we have something we can take over and run with if interesting.
- Prototypes/development of the “TV viewing” experience should be built on a PC with a connected Kinect. If Microsoft likes it and wants it on the console, they will then port it.
- We want users interacting with the content, not simply controlling the pace of the content in a DVR way.
- Consider what “interactive” means for interactive TV.
- Regarding TV genres, LIVE broadcasts that are NOT SPORTS are an area of interest. Given the major U.S. elections in 2012, politics and speeches/debates/town hall meetings are one area with a lot of potential. If you had to work with this genre, what would you do?
The Team
We were a team of five: three second-year students and two hired staff. Peter Kinney and Zach Cummings were the programmers. Wai Kay Kong was the designer and 3D artist. Jue Wang was responsible for UI art. Kan Dong was the producer and also contributed to UI design.
The Client
We were really excited to work in collaboration with Microsoft Corp., a worldwide software leader, with the goal of using Kinect technology to create an enjoyable TV-watching experience. Our client was interested in our idea of building a real-time relationship between broadcasters and TV viewers. Traditional broadcast shows have no connection with the audience they are broadcasting to; the only reactions they can capture come from the people in the venue. DynacTiV, by contrast, tries to connect to a vastly broader audience, potentially millions of people watching TV at home. And Kinect technology opens up the possibility of more organic input, like gesture controls and voice commands.
“Case 1: the ideal vs. the reality”
We came up with tons of ideas during the first week’s brainstorming, and they naturally fell into categories of different interaction levels. Our client wanted us to focus on level 2 or 3 interactions.
In the second week, we consolidated our ideas into four concepts: “Augmented Politics”, “Interactive Kinect Story Teller”, “Ghost Town”, and “Body Posture”. After we proposed them to our client, “Interactive Kinect Story Teller” drew the most applause for its valuable potential to let a large crowd of virtual audience members provide feedback on live performances. Arnold also liked “Augmented Politics” and “Body Posture”, and said it would be great to integrate the two to crowd-source a large amount of data that gives the broadcast speaker valuable input about the success of their presentation. At this stage, we nailed down our high concept and started to flesh it out with more design details the following week.
Our primary focus shifted to the interactions directly between viewers and performers, with less focus on contact between users. By aggregating audiences’ body postures, polled opinions, and background noise levels, Kinect determines likely levels of user interest in the program. This information can be displayed for the performer, allowing them to adjust to whatever gains the most positive response from viewers.
Data to be detected (a rough sketch of how these signals might be combined follows the list):
- Video data
  - Number of people in the room
  - Amount of movement/fidgeting
  - Composite of all data
- Audio data
  - Background noise levels
  - Layering audio together to simulate a “crowd”
- Other
  - Voting
  - Sub-menus
  - Gimmicks (throwing tomatoes)
  - Avatars
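To make the “composite of all data” idea concrete, here is a minimal C# sketch of how these passive signals could be folded into a single interest score. The types, field names, and weights are illustrative assumptions, not our actual implementation.

```csharp
using System;

// Hypothetical sketch: combining the passive signals above into a single
// per-household "interest" score. Field names and weights are illustrative
// assumptions, not the project's actual implementation.
public class ViewerSample
{
    public int PeopleInRoom;      // from skeleton tracking
    public double MovementLevel;  // 0..1, normalized fidgeting
    public double NoiseLevel;     // 0..1, from the microphone array
}

public static class InterestEstimator
{
    // Heuristic: an engaged room is occupied, relatively still
    // (not fidgeting), and audibly reacting.
    public static double Score(ViewerSample s)
    {
        if (s.PeopleInRoom == 0) return 0.0;
        double stillness = 1.0 - Math.Min(s.MovementLevel, 1.0);
        double reaction  = Math.Min(s.NoiseLevel, 1.0);
        return 0.6 * stillness + 0.4 * reaction;
    }
}
```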
What went well
The path we actually travelled stuck to the original goal of delivering an above-level-2 interactive viewing experience that had not been explored before. We used Kinect as our exclusive input device and created the whole experience in the context of political events, even though our system can readily be applied to any other live broadcast genre.
Taking interactions between viewers and performers as the priority, we put most of our effort into finding the most meaningful, efficient, and intuitive ways to communicate. We also took Kinect’s unique features into consideration, aiming for an experience that competitors can’t replicate. So we came up with passive data collection, in which the audience doesn’t have to do anything explicitly; Kinect does the collecting work in the background with its skeleton tracking and microphone array. In addition, by combining skeleton tracking with depth image processing, the audience can actively interact with the broadcasts.
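As an illustration of this passive collection, here is a rough C# sketch of the movement/fidgeting signal: summing frame-to-frame joint displacement from the skeleton stream. The Vector3 type and joint lists stand in for whatever the Kinect wrapper actually exposes; this is a sketch under those assumptions, not our production code.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical stand-in for the wrapper's joint position type.
public struct Vector3 { public float X, Y, Z; }

public static class FidgetMeter
{
    // Returns the total joint displacement between two consecutive
    // skeleton frames; larger values mean more fidgeting.
    public static float Movement(IList<Vector3> prev, IList<Vector3> curr)
    {
        float total = 0f;
        int n = Math.Min(prev.Count, curr.Count);
        for (int i = 0; i < n; i++)
        {
            float dx = curr[i].X - prev[i].X;
            float dy = curr[i].Y - prev[i].Y;
            float dz = curr[i].Z - prev[i].Z;
            total += (float)Math.Sqrt(dx * dx + dy * dy + dz * dz);
        }
        return total;
    }
}
```

Accumulated over a few seconds and normalized, a value like this could feed the “amount of movement/fidgeting” line in the data list above.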
What went wrong
During development, there was too much back-and-forth, which wasted time. For example, we initially offered thumbs up/down gestures plus throwing flowers/tomatoes, but had to drop the throwing feature because it was too distracting and defeated the purpose of giving efficient feedback.
Another part that was in our original design but dropped at an early development stage is virtual audience interaction, which we added back later with the stage theme idea after halves. We originally abandoned it because we were directed to focus exclusively on the interactions between audience and performers, without worrying about interactions among virtual audience members. However, even though this was not in the scope of our project, we wanted to deliver a complete experience with a themed art background and a simulated viewing environment. So in later development, as we placed the viewing environment around a 3D podium, we revived this feature as a highlight that makes the audience feel more involved. In a future vision, this feature could be ported to Xbox avatars, turning the simulated interactions into a virtual viewing experience that you share with your Xbox friends online.
“Case 2: playful high technology”
What went well:
Working with the Kinect was a straightforward and relatively easy process. We had a very effective wrapper that gave us access to all of the depth, video, and skeletal tracking data streams we needed, which allowed us to get a large portion of the program done in a relatively short period of time. Additionally, the server was relatively pain-free: previous networking experience and a simple set of requirements allowed us to create and configure the server code quickly, leaving more time to debug and make additional modifications. Overall the project was paced well, with the coding done early enough to allow time for stress testing and debugging. Constant play tests following halves gave us critical user feedback and revealed bugs as early in development as possible. Feedback also let us remove extraneous features, so more care could be put into developing the core features most closely related to the primary goals of the project.
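To give a sense of how simple the server requirements were, here is a minimal, single-threaded C# sketch of a feedback-aggregation server using the standard .NET TcpListener. The one-byte wire protocol (‘+’ for thumbs up, ‘-’ for thumbs down) and the port number are assumptions for illustration, not our actual protocol.

```csharp
using System;
using System.Net;
using System.Net.Sockets;

// Minimal sketch in the spirit of the project's "simple set of
// requirements": clients connect, send one byte per vote, and the
// server keeps running tallies. One client at a time, for brevity.
class FeedbackServer
{
    static void Main()
    {
        var listener = new TcpListener(IPAddress.Any, 9000); // port assumed
        listener.Start();
        int up = 0, down = 0;

        while (true)
        {
            using (TcpClient client = listener.AcceptTcpClient())
            {
                int b = client.GetStream().ReadByte();
                if (b == '+') up++;        // thumbs up (assumed protocol)
                else if (b == '-') down++; // thumbs down (assumed protocol)
                Console.WriteLine("approval: {0} up / {1} down", up, down);
            }
        }
    }
}
```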
What went wrong:
Underestimating the difficulty of accessing the Kinect’s audio feed caused a substantial amount of time to be sunk into various “dead-end” areas of research and testing. In the end we discovered ways to implement most of the audio features we wanted. Some features, such as voice recognition, couldn’t be implemented due to conflicts between MonoDevelop and MSDN version support. Another difficulty, semi-problematic for an early portion of the project, was version control. The Unity server was not set up until a few weeks into the project; for those first weeks, major asset changes and parallel lines of development resulted in multiple versions of the project that eventually had to be merged together. After everyone had server access, this problem ceased to exist. One problem beyond the team’s control was testing: while we did the best we could, it was virtually impossible to simulate a real-world “at home” experience in our testing environment. Additionally, hardware limitations, such as being able to track only two people at a time, further limited test-group sizes.
“Case 3: what makes a good interface”
What went well:
The good thing about our user interface development is that we eventually had a very smooth and complete workflow. At the beginning, since none of us had experience with user interface design and development, we had a relatively difficult start. For the first version of the UI, we just did what we thought would work and did not put much effort into research; as a result, the first version was not good, and a lot of functions did not work the way we wanted them to. However, we soon figured out a good way to develop UI: research and design, then do art and code, then play-test, then use the play-test feedback to iterate on the design again. This turned out to be a very efficient way to develop a user interface. An interface is usually an ambiguous thing, since different people have different feelings about it. After every play test, we would get feedback about what confused play-testers and which approaches they preferred. Based on that feedback, we filtered out what needed to be changed or added, and then the artists and programmers would make quick changes to the interface and get ready for the next play test. After several iterations, we arrived at a clear user interface.
What went wrong:
Since none of us had prior experience with interface development, we had a very slow start. If we had worked efficiently from the beginning of the semester, we would have had more time to polish.
Another issue was the play-testers themselves. Since we held all of our play tests at the ETC, most play-testers were gamers or people with some knowledge of interactive experiences. However, the audience for our product is much broader: almost everyone watches TV shows, so we have a very wide range of potential users. If we could have recruited more naive play-testers, we would have ended up with a better interface for naive users.
To sum up, the user interface is important. At the beginning, we thought our main task in the project was to make sure the technology worked, with the interface as just a side concern. But in fact the interface is the front end of our project: without a good interface, we could not efficiently test whether our technology worked, and we could not show people how it works.
“Case 4: we didn’t play test at all”
What went well:
The heading is ironic: we found that play testing couldn’t have been more useful. We held seven open play tests, with more than 100 play testers drawn from ETC students, faculty, and visitors.
Oct 18, Alpha Test (passive)
Oct 19, Alpha Test (passive & active)
Nov 2, UI & Usability Test
Nov 6, Naive Test
Nov 9, Server & Audience Interaction Test
Nov 11, Open-loop Test
Nov 18, Beta Test (close-loop)
Dec 2, Live Debate Test
For each of these tests, we documented the sessions with demo captures (Fraps), videos and photos of play testers, surveys, and inquiries. We found that segmenting our product into parts and testing them at different times helped the whole development process. For one thing, testing each feature separately filters out the influence of the other features and lets testers focus their comments on that one feature, so we got really strong and useful feedback on each of them. It also saved us time and pushed and guided our progress, because if we had waited until everything was finished to test, it would have been too late to change anything.
What went wrong:
As we figured out where the problems were after each test, it became really hard to follow the schedule we had made in advance. The schedule changed at times, and unplanned work appeared depending on how strongly testers felt about the drawbacks. We needed time to fix the things people hated; otherwise they would still focus on those same points at the next test instead of giving fresh feedback.
“Case 5: design in limitations”
What went well:
Since we had some prior experience working with the Kinect, we knew exactly what the specifications and limits of the Kinect technology were. Knowing the limitations helped the design immensely. We followed the philosophy of “small but done well”, and it worked out for us. Experience with the Unity engine helped us determine ahead of time what was feasible design-wise. After the initial two-week brainstorming session, the project goals were clear and well within scope. Reaching the constant-testing phase after halves proved useful for a plethora of design changes. We managed to keep feature creep in check by maintaining focus, and we reached the polish step on schedule.
What went wrong:
A good number of originally planned features had to be cut, because of technological difficulties, because of things found in testing, or because similar ideas were merged. Difficulty with Kinect audio led to certain sub-features being cut from the simulated audience feature: several of the simulated audience’s actions relied on voice recognition (booing, cheering, key words). Thankfully, we came up with alternative actions for the simulated audience that do not require voice recognition. The tomato/flower throwing feature was extremely well received, but we found that it distracts the viewer from the broadcast; instead of watching, the viewer ends up spending the entire time trying to land a tomato on someone’s face. In light of this, we had to kill this baby, because the goal of the project is ultimately about television, not about scoring head-shots. Voting and thumbs up/down were extremely similar ideas with two separate methods of input. The original “voting” was more complex and involved, requiring a lot of gesture inputs to relay how the viewers felt. In the end, we felt it was too complicated: the main pieces of information broadcasters need are level of interest and approval.
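For a sense of how lightweight the surviving input became, here is a hypothetical C# sketch of a thumbs up/down detector driven by skeleton joint positions. The Vec3 type and the threshold value are illustrative assumptions, not our actual gesture code.

```csharp
using System;

// Hypothetical stand-in for the skeleton joint position type.
public struct Vec3 { public float X, Y, Z; }

public enum Vote { None, Up, Down }

public static class ThumbGesture
{
    const float Threshold = 0.15f; // meters; tuning value, assumed

    // A hand held well above the elbow reads as approval, well below
    // as disapproval; anything in between is ignored as noise.
    public static Vote Detect(Vec3 hand, Vec3 elbow)
    {
        if (hand.Y > elbow.Y + Threshold) return Vote.Up;
        if (hand.Y < elbow.Y - Threshold) return Vote.Down;
        return Vote.None;
    }
}
```

A simple height comparison like this reduces the original multi-gesture voting scheme to the two signals broadcasters actually need: interest and approval.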
Some extra assets made for the project also ended up unused, either for the reasons above or because of theme inconsistency. The first stage was designed with an old-time feel; however, since we settled on politics, the old-time stage did not fit, and a new, modern stage was created instead.
Conclusion
One of the most important takeaways from this project is the value of research and foresight. The ‘audio problem’ resulted in part from a gross underestimation of how difficult accessing the audio stream would be. In the future, discovering problematic areas ahead of time and finding appropriate ways to address them could help optimize efficiency. Additionally, introducing new features without taking away from the core goals is crucial. Most notably, the “fun” feature of throwing things at the stage/screen was interesting, yet proved to distract from the core goal of watching a broadcast. Added features are meant to enhance an experience, not change its goal. Furthermore, the value of user testing was extremely evident throughout the project: constant tests following halves allowed the UI to continuously evolve to be more intuitive and user-friendly. Since the product is intended to appeal to a large audience, it was imperative to explore all resources to get as many test subjects from as broad a spectrum as possible.
As a next step, our client Microsoft will review our work and, if they are interested, they might take it over. In DynacTiV’s future vision, however, the viewing experience is refreshed dramatically, with a Kinect sitting in everyone’s living room. Watching a live broadcast and anxious to share your opinion? Make a simple thumbs up or down gesture, and Kinect will detect it and send it to the broadcasters right away. Your feedback takes effect immediately. From your Xbox friends list, you can choose companions to watch with you digitally and remotely, and interact with them. You are watching TV at home, but it feels like you are watching it live on the spot!