Technical Setup
Platform Decision
Given the future-facing nature of our goals, we had to find technology that could support them. Initially, we considered the Hololens, the Meta 2, and the Magic Leap. Since we were familiar with developing for the Hololens and had immediate access to it, we chose it as the platform for our initial efforts.
One of our first prototype ideas involved creating 2D geometric shapes in AR using interactive markers. While developing this, we encountered difficulty with marker tracking, as the Hololens was not compatible with Vuforia. In addition, our goal of exploring different Learner-Content interactions was hard to accomplish given the Hololens’ relatively restricted gesture library. While that could be considered a helpful constraint, the Pinch gesture itself is also fairly frustrating to use, a problem that will presumably be solved in future iterations of the hardware.
Overall, the Hololens did not seem like a great fit, neither for our overarching goal of exploring a range of interactions nor for our prototype’s need for marker tracking.
We were aware of these limitations, so we were also researching potential workarounds. A passthrough VR setup presented itself as a powerful alternative: a Virtual Reality (VR) headset combined with a stereo camera to simulate AR inside the VR headset. This immediately addresses a major pain point that is unavoidable with all current-generation AR headsets: a field-of-view limitation that lets you see virtual content in only a small rectangular area of your view through the glasses. In addition, combining this setup with a Leap Motion hand controller could give us access to the much more natural gesture library of the Leap Motion SDK. This setup felt more comfortably in line with our vision of a future AR headset, so we decided to proceed with it for our next prototype, and, once it proved able to accommodate all our needs, for all subsequent prototypes as well.
The second issue, marker / image tracking, was important to solve for a number of reasons. Image recognition can serve as a link between the virtual and the physical world: either for anchoring virtual objects at a fixed point in the real world, or for synchronizing the virtual space between two users (which we would need for our Learner-Learner / Learner-Teacher Interaction prototypes).
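As a rough sketch of the second use case: if both users can see the same marker, the marker’s pose in each user’s coordinate frame is enough to compute the transform that maps one user’s virtual space onto the other’s. The matrices and values below are placeholders for illustration, not our actual implementation (which lived in Unity).

```python
import numpy as np

def pose_to_matrix(R, t):
    """Build a 4x4 homogeneous transform from a 3x3 rotation and a 3-vector translation."""
    T = np.eye(4)
    T[:3, :3] = R
    T[:3, 3] = t
    return T

# Pose of the shared marker as seen in each user's world frame
# (placeholder values; in practice these come from marker detection).
T_marker_in_A = pose_to_matrix(np.eye(3), np.array([1.0, 0.0, 2.0]))
T_marker_in_B = pose_to_matrix(np.eye(3), np.array([-0.5, 0.0, 1.5]))

# Transform that maps coordinates expressed in B's world frame into A's world
# frame: both users agree on where the marker is, so we chain through it.
T_B_to_A = T_marker_in_A @ np.linalg.inv(T_marker_in_B)

# A virtual object placed by user B can now be shown consistently for user A.
point_in_B = np.array([0.0, 0.5, 1.0, 1.0])
point_in_A = T_B_to_A @ point_in_B
```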
Marker Tracking
For our setup, we found that we could use OpenCV for Unity to implement marker and image tracking. We first did a quick gold spike on that, which we then had to optimize. We also ran into a problem with the Zed camera freezing during execution, which we eventually traced to an incompatibility between the Zed SDK and the NVIDIA graphics drivers. After downgrading the driver version, we were able to overcome this problem.
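A minimal version of that first spike looks roughly like the following. Our implementation used the OpenCV for Unity plugin in C#; the snippet below shows the equivalent calls through OpenCV’s Python bindings (the legacy aruco contrib API) purely as an illustration, and the marker dictionary and camera index are assumptions.

```python
import cv2

# Dictionary the printed markers were generated from (assumed here).
dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_6X6_250)
params = cv2.aruco.DetectorParameters_create()  # legacy (pre-4.7) aruco API

cap = cv2.VideoCapture(0)  # stand-in for the Zed Mini's left camera feed
while True:
    ok, frame = cap.read()
    if not ok:
        break
    gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)
    corners, ids, _ = cv2.aruco.detectMarkers(gray, dictionary, parameters=params)
    if ids is not None:
        cv2.aruco.drawDetectedMarkers(frame, corners, ids)
    cv2.imshow("marker spike", frame)
    if cv2.waitKey(1) == 27:  # Esc to quit
        break
cap.release()
```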
Feeding the intrinsic calibration parameters of the Zed Mini’s left camera into the OpenCV marker detection script significantly improved the quality of the tracking. However, with detection running on the main thread, the application was limited to 30 fps, which was too slow.
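The intrinsics feed into pose estimation roughly as follows. The focal lengths, principal point, and distortion values below are placeholders rather than the Zed Mini’s actual calibration, and the image path is hypothetical; in practice the values come from the Zed SDK’s calibration data.

```python
import numpy as np
import cv2

# Placeholder intrinsics for the Zed Mini's left camera (assumed values).
fx, fy, cx, cy = 700.0, 700.0, 640.0, 360.0
camera_matrix = np.array([[fx, 0.0, cx],
                          [0.0, fy, cy],
                          [0.0, 0.0, 1.0]])
dist_coeffs = np.zeros(5)  # assuming an already-rectified camera feed

frame = cv2.imread("left_camera_frame.png")  # placeholder image path
gray = cv2.cvtColor(frame, cv2.COLOR_BGR2GRAY)

dictionary = cv2.aruco.getPredefinedDictionary(cv2.aruco.DICT_6X6_250)
corners, ids, _ = cv2.aruco.detectMarkers(gray, dictionary)

if ids is not None:
    # 0.10 m is the physical side length of the printed 10 cm markers.
    poses = cv2.aruco.estimatePoseSingleMarkers(corners, 0.10,
                                                camera_matrix, dist_coeffs)
    rvecs, tvecs = poses[0], poses[1]
    # tvecs[i] is marker i's position in the camera frame (metres);
    # rvecs[i] is a Rodrigues rotation vector for its orientation.
    R, _ = cv2.Rodrigues(rvecs[0])
```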
We then moved the marker detection to a background thread so that the main render thread could continue running at our target 90 fps. This meant that tracked marker positions arrived a few frames behind the current view, but because we stored the tracker positions in world space and smoothed the data, stationary and slow-moving trackers still appeared to track correctly.
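The pattern can be sketched as follows: detection runs on a worker thread and only publishes the most recent world-space position, while the render update reads that value and applies exponential smoothing. The `get_frame` and `detect_marker_position` callables and the smoothing factor are stand-ins; our actual version was a Unity C# component.

```python
import threading

latest_position = None          # most recent marker position in world space
lock = threading.Lock()

def detection_worker(get_frame, detect_marker_position):
    """Runs OpenCV detection off the render thread; only the result is shared."""
    global latest_position
    while True:
        frame = get_frame()
        pos = detect_marker_position(frame)   # returns a 3-vector (numpy) or None
        if pos is not None:
            with lock:
                latest_position = pos

smoothed = None
ALPHA = 0.15  # smoothing factor (assumed); lower values are smoother but laggier

def render_update():
    """Called once per rendered frame (90 fps target); never blocks on detection."""
    global smoothed
    with lock:
        target = None if latest_position is None else latest_position.copy()
    if target is None:
        return smoothed
    # Exponential smoothing: stationary and slow-moving markers converge to
    # the true position even though detections arrive a few frames late.
    smoothed = target if smoothed is None else (1 - ALPHA) * smoothed + ALPHA * target
    return smoothed
```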
Image Tracking
While marker tracking with ArUco markers was very successful, image tracking using OpenCV’s image pattern detection was considerably less reliable. Compared to marker tracking, image tracking both required more computational resources and produced lower-quality tracking. However, it is still possible to use image tracking in situations where the images remain stationary throughout the experience.
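For reference, the feature-based approach works roughly like this: detect keypoints in the reference image and in the camera frame, match the descriptors, and fit a homography that locates the image in the frame. The ORB detector, ratio-test threshold, and file paths below are assumptions about one reasonable configuration, not a record of our exact settings.

```python
import cv2
import numpy as np

reference = cv2.imread("reference_image.png", cv2.IMREAD_GRAYSCALE)  # placeholder path
frame = cv2.imread("camera_frame.png", cv2.IMREAD_GRAYSCALE)         # placeholder path

orb = cv2.ORB_create(nfeatures=1000)
kp_ref, des_ref = orb.detectAndCompute(reference, None)
kp_frm, des_frm = orb.detectAndCompute(frame, None)

# Hamming distance for ORB's binary descriptors, with Lowe's ratio test.
matcher = cv2.BFMatcher(cv2.NORM_HAMMING)
matches = matcher.knnMatch(des_ref, des_frm, k=2)
good = [p[0] for p in matches
        if len(p) == 2 and p[0].distance < 0.75 * p[1].distance]

if len(good) >= 10:
    src = np.float32([kp_ref[m.queryIdx].pt for m in good]).reshape(-1, 1, 2)
    dst = np.float32([kp_frm[m.trainIdx].pt for m in good]).reshape(-1, 1, 2)
    # Homography locating the reference image in the frame; RANSAC rejects outliers.
    H, mask = cv2.findHomography(src, dst, cv2.RANSAC, 5.0)
```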
Locating The Camera - Room Tracking
Both marker and image tracking were not without their limitations, however. Most notably, a trackable had to span a certain fraction of the camera’s total field of view for OpenCV to detect it. Starting from a conservative estimate of the maximum distance at which a 10cm ArUco marker could be detected, we constructed 50cm-wide wall markers, which we estimated would be viewed from an average distance of 3.5m from the center of the room. This worked quite well for placing content at the positions of the markers.
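The sizing estimate follows from simple pinhole-camera arithmetic: the number of pixels a marker spans scales with the focal length times the marker size divided by the distance. The focal length and minimum pixel count below are illustrative assumptions, not measurements from our setup.

```python
# Pinhole approximation: pixels spanned = focal_length_px * marker_size / distance.
FOCAL_PX = 700.0     # assumed horizontal focal length of the camera, in pixels
MIN_PIXELS = 60.0    # assumed minimum marker width in pixels for reliable detection

def max_detection_distance(marker_size_m):
    return FOCAL_PX * marker_size_m / MIN_PIXELS

def min_marker_size(distance_m):
    return MIN_PIXELS * distance_m / FOCAL_PX

print(max_detection_distance(0.10))  # how far a 10 cm marker stays detectable
print(min_marker_size(3.5))          # marker width needed at 3.5 m from room center
```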
While the positional tracking of the markers was relatively stable, we observed heavy jitter when tracking their rotation. When we first encountered this issue, we were investigating the use of the larger wall markers to locate the camera at room scale. Because the camera’s position is recovered by inverting the marker’s pose, slight changes in the marker’s estimated rotation resulted in significant changes in the camera’s position. Attempts to smooth this out by using multiple wall markers yielded limited success, as each additional marker contributed its own jitter.
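The reason small rotation errors matter so much at room scale is the lever arm: an angular error in the marker’s estimated orientation gets multiplied by the camera-to-marker distance when the camera position is recovered. A rough illustration, with an assumed jitter of a couple of degrees:

```python
import math

distance_m = 3.5   # camera-to-marker distance from the room center
jitter_deg = 2.0   # assumed rotational jitter in the marker's estimated pose

# Small-angle approximation: the recovered camera position sweeps an arc of
# roughly distance * angle (in radians) as the rotation estimate wobbles.
position_error_m = distance_m * math.radians(jitter_deg)
print(f"~{position_error_m * 100:.0f} cm of camera-position jitter")  # about 12 cm
```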
In the end, we decided to use multiple 10cm markers placed on the table as our source of room tracking, and to rely on the Zed’s internal positional tracking when no markers were in view. Because the Zed’s internal tracking accumulates drift over time, however, we saw significant jumps when the camera re-acquired a marker after not seeing any markers for a while. This produced a noticeable feeling of disorientation for the player in the headset. The markers would have to stay in view throughout the experience, or at least be looked at at regular intervals.
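Conceptually, this fusion amounts to maintaining a correction transform: while a marker is visible, the correction re-anchors the Zed’s internal pose to the marker-derived pose; while no marker is visible, the last correction is simply reapplied, so drift accumulates until the next marker sighting, at which point the correction (and the user’s view) jumps. The helper below is an illustrative sketch in that spirit, not our actual Unity implementation.

```python
import numpy as np

correction = np.eye(4)  # maps the Zed's (drifting) tracking frame into the room frame

def update_pose(internal_pose, marker_pose=None):
    """internal_pose: 4x4 camera pose from the Zed's positional tracking.
    marker_pose: 4x4 camera pose derived from a visible table marker, or None."""
    global correction
    if marker_pose is not None:
        # Re-anchor: choose the correction so that the corrected internal pose
        # matches the marker-derived pose exactly. If drift accumulated while
        # no marker was visible, this update is discontinuous, which is the
        # jump the player feels in the headset.
        correction = marker_pose @ np.linalg.inv(internal_pose)
    return correction @ internal_pose
```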
About halfway through development, we migrated our VR headset from the Oculus Rift to the Vive, which allowed us to use the outside-in tracking of the Vive lighthouses to locate users in room space.