Lessons learned from front-line AR
In this, the post-Pokémon GO era, we’ve grown used to the idea of AR as plaything.
Niantic’s monster-collecting sensation deserves much credit for its immediate legacy. In demonstrating what AR is to the masses, it has laid the groundwork on which many other developers can build their success, free from the burden of introducing a new medium.
But while its merits demand much attention, Pokémon GO equally – and most likely unintentionally – established the convention that AR is playful, or even laughable. When its light burned brightest, Pokémon GO inspired myriad tongue-in-cheek headlines, from stories bemoaning users taking huge risks to reach a monster to pieces unfairly promoting the notion that anything that makes gamers leave the house is something of a first.
All those Pokémon GO stories in the national press, of course, served as a lighthearted distraction from the headlines they rubbed shoulders with; troubling news of conflict, terrorism and poverty. What you are less likely to find in the national news, however, are the stories that highlight AR’s potential to better the reality behind those most dismal of news items.
Certainly, AR can be fun, but it has applications in the most serious of situations too.
AR FOR GOOD
Steve Richey is a software engineer at Float, a specialist in building custom apps, crafting digital strategies, and delivering human-centered design. For one of its most recent projects, the company looked at the potential of taking AR to the very front lines of counter-terrorism.
Float was commissioned by the Combating Terrorism Technical Support Office – a branch of the US government tasked with funding and researching concepts that could assist counter-terror personnel – to look at the potential of AR as a tool to aid their work.
Starting with an alpha, Richey and his team began to map out an ambitious project that would ultimately generate some 200 pages of research for the client. Float envisioned a platform that could – using AR smartglasses – provide the wearer with facial recognition on the ground, translate text in the field of view, call in vehicle information as a result of the user gazing at a license plate, and provide navigational UI inside buildings. Ambitious concepts, certainly, but such is the spirit of any determined alpha.
A later beta presented the same idea as an Android app running on a smartphone, but in researching and testing the smartglasses version, the Float team learned a great deal about the methods, tools and resources available to AR developers, and – fortunately – are happy to share them here.
“One of the first problems we had to solve going into this project was asking what kind of hardware are we going to deploy,” Richey offers, introducing the technical challenge of developing for smartglasses. “What platforms do we want to target? There are quite a few.”
Richey’s team considered over 20 models of smartglasses – platforms they considered ‘true AR’ relative to mobile options – guided by the idea that supporting a commercially available, affordable option would provide a reliable, suitable platform.
Ultimately, they picked the Epson Moverio BT-200, which runs stock Android via a home screen akin to a conventional smartphone’s, and the Osterhout Design Group R-6, which employs a custom version of Google’s Android operating system.
But what of those lessons learned?
Getting face detection right was one of the prime goals of this research project. Identifying any human on Earth was, of course, out of the question, so Float had to come up with a manageable way to craft a working prototype.
“Once you’ve detected a face in an image, that’s one thing, but you need to recognize it,” confirms Richey. “You need to be able to compare it to a database of known faces. About the same time that we were tackling the problem, and trying to find a good solution for it, the site how-old.net got popular. It was popular for a day on Twitter, like a lot of tech things are.”
You’ll probably remember the website from your own social media feeds; it was fairly simple in practice, but demonstrated the potential of facial analysis technology to the mainstream. You’d upload a profile shot to the website, and it would quickly predict your age. In fact, you may have shared some results when the website grossly misestimated your age, painting you as much older or younger than the reality. How-old.net wasn’t perfect, but it sure was popular.
“Underneath, it was actually using the Microsoft Project Oxford Face API, which has a bunch of face recognition comparison tasks,” Richey continues. “It’s free, basically, when you have a very low number of API calls per day, and it was exactly what we wanted. We added a small sort of server layer underneath of that, to maintain this database of faces we had trained, and run comparisons and stuff like that. But this basically served all our needs in terms of face recognition.”
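Richey’s “server layer” is not public, but the shape of such a service is easy to sketch. The following is a minimal, hypothetical stand-in: it assumes faces have already been reduced to feature vectors (embeddings) by a cloud face API, and simply maintains a database of known faces and runs nearest-match comparisons against it. The class name and threshold are invented for illustration.

```python
import math


class FaceDatabase:
    """Toy stand-in for a server layer that maintains a database of
    trained faces and runs comparisons against incoming embeddings."""

    def __init__(self, threshold=0.9):
        self.known = {}            # name -> embedding vector
        self.threshold = threshold # minimum similarity to accept a match

    def enroll(self, name, embedding):
        self.known[name] = embedding

    @staticmethod
    def _cosine(a, b):
        # Cosine similarity between two equal-length vectors.
        dot = sum(x * y for x, y in zip(a, b))
        na = math.sqrt(sum(x * x for x in a))
        nb = math.sqrt(sum(y * y for y in b))
        return dot / (na * nb)

    def identify(self, embedding):
        """Return (name, score) of the best match, or (None, score)
        if nothing clears the confidence threshold."""
        best_name, best_score = None, -1.0
        for name, known in self.known.items():
            score = self._cosine(embedding, known)
            if score > best_score:
                best_name, best_score = name, score
        if best_score < self.threshold:
            return None, best_score
        return best_name, best_score
```

In a real deployment the embeddings would come from the face API itself, and the comparison would likewise be delegated to its verify/identify endpoints; the local version above only illustrates the data flow.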
A great option for AR developers early on in a project, then. But how did Richey get that text recognition to work?
EASY AS A,B,C?
Text detection in fact proved to be rather more complex than face recognition, despite there being so much diversity in human appearance. For starters, no established technology based on any common mobile OS delivered what Float needed, and commercial options were equally unsuitable. Reading text from a clean image or neatly printed page is one thing, but Float required the technology to work in the real world, perhaps reading a dust-covered license plate or unusual handwriting.
For a while, Float considered the Word Lens technology, which is today part of Google Translate. The platform is easy to use and understand, having been developed over many years. Word Lens uses the PhotoOCR algorithm, which Google also now owns, putting the offering far ahead of the competition in terms of accuracy and ability. Word Lens, however, is, according to Richey, “kept under lock and key”, so was not available for the project. Thus, Float looked at available options, and settled on Tesseract.
“Tesseract is an optical character recognition library managed by Google,” Richey explains. “It was started by HP, and Google kind of acquired it a few years back, and they’ve open-sourced it. The way Tesseract works is that you give it an image, it expects black text on a white background, and if it doesn’t find that, it’s going to threshold your image to generate text on a white background. It then goes through the image, and every character it thinks it sees, it’s going to try and fix and recognize.”
The problem is that, in the conditions Float’s app was designed to operate in, Tesseract will look for letters that aren’t there – much like a human seeing faces in clouds that are really just droplets of water – or be overwhelmed by overlapping strokes in handwritten text. There are advantages, though: Tesseract makes use of machine learning, looping passes over the image it analyzes and applying its new understanding of the text each time.
Richey solved this by turning to Otsu thresholding, a method that separates text from its background so Tesseract can effectively ‘see’ it more clearly. As Float was working with Android, it also adopted tess-two, a small open source compatibility layer available on GitHub. Finally, a stroke width transform algorithm was applied, giving Float the level of text recognition it had targeted.
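Otsu’s method itself is simple enough to sketch in a few lines. This is a minimal, illustrative implementation – not Float’s code – that picks the gray level maximizing between-class variance, then binarizes to the black-text-on-white form Tesseract expects:

```python
def otsu_threshold(pixels, levels=256):
    """Otsu's method: choose the gray level that maximizes the
    between-class variance of the background/foreground split."""
    hist = [0] * levels
    for p in pixels:
        hist[p] += 1
    total = len(pixels)
    sum_all = sum(i * hist[i] for i in range(levels))

    best_t, best_var = 0, -1.0
    weight_bg, sum_bg = 0, 0.0
    for t in range(levels):
        weight_bg += hist[t]          # pixels at or below t
        if weight_bg == 0:
            continue
        weight_fg = total - weight_bg # pixels above t
        if weight_fg == 0:
            break
        sum_bg += t * hist[t]
        mean_bg = sum_bg / weight_bg
        mean_fg = (sum_all - sum_bg) / weight_fg
        var_between = weight_bg * weight_fg * (mean_bg - mean_fg) ** 2
        if var_between > best_var:
            best_var, best_t = var_between, t
    return best_t


def binarize(pixels, t):
    # Map to black ink (0) on a white background (255), as Tesseract expects.
    return [0 if p <= t else 255 for p in pixels]
```

On a bimodal image – dark characters on a lighter, noisy background – the threshold lands in the valley between the two clusters, which is exactly why it helps before OCR.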
“This is an algorithm that you give an image, and it looks for regions of the image that have a continuous stroke,” says Richey. “So the character C might have this particular shape, but the width of the stroke throughout that shape is continuous. We also use libccv, which has a bunch of algorithms for stuff like stroke width transform. We feed it an image, and it gives us back rectangles that are areas of the image it thinks contain text.”
The result? Reliable, accurate and robust text recognition, even when viewing busy images in tough environments.
LOCATION, LOCATION, LOCATION
The final challenge, then, was developing the location tracking and navigational UI to safely guide prospective users through unknown territory and to specific goals. Importantly, the solution had to work indoors – where GPS was too unreliable for the accuracy required – and position the UI at something like chest height – as the chosen smartglasses had a narrow field of view.
As such, the team looked to sensor fusion, a technique where a device merges data from its various inputs to build an accurate picture, combining the likes of Bluetooth triangulation, Wi-Fi triangulation, gravity sensor data and magnetometer readings. Yet such a solution was still too inaccurate, so Float kept looking. Its needs were highly specific, and the effort of considering all options was warranted.
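The simplest form of sensor fusion is a weighted average that trusts each source in proportion to its accuracy. The sketch below – illustrative only, with made-up variances rather than real radio measurements – fuses independent 2D position estimates by inverse-variance weighting:

```python
def fuse_estimates(estimates):
    """Inverse-variance weighted fusion of independent 2D position
    estimates, e.g. one from Wi-Fi and one from Bluetooth triangulation.

    estimates: list of ((x, y), variance) pairs; lower variance
    means a more trusted source and hence a larger weight."""
    wx = wy = wsum = 0.0
    for (x, y), var in estimates:
        w = 1.0 / var
        wx += w * x
        wy += w * y
        wsum += w
    return wx / wsum, wy / wsum
```

With equal variances this reduces to a plain average; as one source degrades, the fused estimate slides toward the more reliable one. Production systems extend the same idea over time with a Kalman filter, but even this fused result was too coarse for Float’s indoor-navigation needs.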
“We wanted to do something that required as little prior knowledge about a space as possible,” Richey says of a further challenge to consider.
Fortunately, and in spite of all the challenges, the depth-sensing cameras and machine learning abilities of Google’s Project Tango hardware proved a fantastic solution. Project Tango, which we’ve covered on this blog previously, learns more about an interior as a user moves through a space, drawing on machine learning as areas are revisited.
That approach would still need a 3D map of the space for areas the user had not yet visited, but on the front lines of counter-terrorism, it would be entirely possible for Float’s application to collaborate with a scoping robot or drone sent in ahead of the user to capture that data.
A TESTING MEDIUM
After a great deal of research and experimentation, Richey finally had a solution that would function as demanded, employing widely available tech and consumer-ready hardware. And all that Float learned will be useful to those working on AR content that strives to recognize, read and guide.
And before he returns to his next project at Float, Richey has one more insight for others embracing the creative and commercial potential of working with AR.
“Testing is actually very hard for an augmented reality application,” Richey warns. “You need some sort of text you can scan, and maybe a face you can scan. When we were tired of scanning our co-workers’ faces, we’d have [a picture of] Robert Downey Jr. at our desks, and have it there for months and months.
“This testing is hard to do, and you can test on a smartphone, but that’s not quite the same as testing on the smartglasses, which means you have to deploy to the smartglasses, wait for it all to build, put them on, do whatever you wanted to test, take them off, and run the test again. Testing can get pretty complicated.”
Testing, infamously, has been reshaped by VR, where developers must find ways around common problems. The VR first-timer can deliver unreliable feedback if their experience is clouded – positively or otherwise – by the novelty of the medium. Testing spaces, meanwhile, must be physically reshaped to allow for the freedom of movement – and safety – that being enclosed in a headset demands. The list goes on.
And as Float has learned, AR testing introduces its own set of parallel challenges, whether employing a user group or keeping it internal.
What's more, all those additional lessons, from picking the right industrial software solution to embracing new hardware, have demonstrated the potential of working on commercial and research projects. The practical, tangible demands set by the Combating Terrorism Technical Support Office gave Float a clear brief, leading them to solutions that were both pragmatic and innovative, arming the studio with an arsenal of techniques to apply to future work, whether it is serving a sneaker brand like Nike, or endeavoring to bolster mobile security.
So whether you're a game developer looking to better understand AR as a medium, or perhaps an architectural pre-viz specialist keen to empower your own technology by combining it with commercial options, looking to projects and opportunities beyond the headlines can reveal powerful options. There's plenty to learn from teams like Float, and even more to be gained from partnering with organizations like the Combating Terrorism Technical Support Office.
Pokémon GO may have got your attention – along with the rest of the world's – but in terms of opportunity, technology and learning it barely makes up the tip of the proverbial iceberg.
If the tip is that large, then it is well worth taking a peek below the water level.