Will Augmented Reality Present A Moderation Problem?

24.07.2024 01:44

Techdirt

Augmented reality is on its way. The ability to layer media over top of the real world is, for the moment, a hardware problem. Our phones can do it – think Pokémon Go, or Snapchat’s dancing hotdog – but a real-time heads-up display remains out of reach. When such technology arrives, it will be a speech problem.

True augmented reality will be social and locative. Users will view media published and placed by other users, and unlike a photo-bound dancing hotdog or dog’s tongue-and-ears mapped to your face, this media will be tied to particular real-world locations. Users’ AR content may be visible to anyone who glances at a particular wall, or only apparent to a select few.

For thousands of years, humanity has sought an escape from the detached scribal hunch that reading, writing, and computing have thus far required. Neck pain and blundering around the world with our faces in our phones have kept the dream of heads-up computing alive through multiple premature hardware hype-cycles.

This pent-up, unmet demand was on full display at the launch of Apple’s Vision Pro – a virtual reality device with effective-enough pass-through to mimic augmented reality, until users moved around. Nevertheless, Vision Pro users did their best to compute while walking, driving, and riding the subway – only to see their apps slide away from them, left behind as they moved (you can, at least, carry your apps with you). Augmented reality, tethered to the user and their immediate surroundings, will satisfy this want.

We have also already grown accustomed to wearing sunglasses and corrective lenses, and seeing them worn by others. As a result, when they arrive, or arrive at an accessible price point, augmented reality glasses are likely to be adopted by most consumers very quickly.

No matter how it is used, social, locative, ubiquitous augmented reality will turbocharge anxieties about online speech by bringing it further into, or onto, the real physical world. If existing uses of speech are any guide, it will be used to exalt, express, opine, teach, trade, harass, titillate, blaspheme, and defame. Some of these uses will present problems, both for budding AR platforms and their users.

When there are controversies generated by AR’s rapid adoption and sheer imposition upon the physical world, features of the AR medium will make classic misuses of speech much more difficult for moderators to govern.

Locative AR media is unlikely to be text-first. Text was developed for flat surfaces – floating text is either difficult to read – let alone interact with as one might on a desktop computer – or obscures the surrounding environment, replicating the problem of walking while holding a book in front of your face. It’s hard to compose an email on the side of a building. This isn’t to say that text won’t be used to communicate in AR – but that messages will be short, overlaid on flat surfaces, and usually presented alongside other media. Phone AR apps, such as Mirage, which allows users to leave multimedia collages for others to view around New York City, seem to confirm this intuition. Confusingly, an earlier, different app called Mirage World pioneered the concept in 2017.

Thus, if platforms fully embrace the natural expressive possibilities of this medium, moderators will not be dealing with easily machine-readable text so much as subjectively interpretable public art. This might be an acceptable issue for a hyperlocal startup, but as with all content moderation problems, the long tail of novel misuse grows large at scale.

The context, meaning, and offensive potential of AR speech will turn in large part on where it appears. What is it layered over? “Who can place what speech where?” will soon matter a lot to already-large platforms moving into the AR space. Whether geospatial AR advertising in Google Maps or sharing locative multimedia with friends via Snapchat Spectacles, limiting the “who” to trusted creators and state tourism boards is a beta test, but eventually these products will launch to the full, diverse public. Even if hardware costs provide another initial gate to adoption, it won’t last forever, and as we have seen with the internet at large and within gaming subcultures, the shift from hobbyist to mass medium can itself be culturally tumultuous.

Their varied uses of the ability to layer speech over the real world are likely to push the boundaries of platform comfort and constitutional speech protections. Who can publish what on a church, a gay bar, an embassy, or their ex-husband’s home? Visible to whom? Although the speech will be virtual, platform answers to these questions may provoke the ire of the owners of physical properties to which AR content is mapped or tethered.

Even how some media or animation appears may matter. Higher and lower resolution versions of the same content will coexist as AR hardware evolves. Moderators won’t just find themselves having to answer questions like “are the AR arms sprouting from the shrub hugging or groping?”, but even “do they look unseemly at a lower frame rate?”

None of this will be easy to solve, but good solutions will provide strong reasons to prefer one platform’s overlay over another’s. User control over what they see in the world around them will be paramount. No one wants to put on a pair of glasses to be assaulted by ads and horrors. Some public/private distinction will also likely work well. The potential controversy, harm, and sheer competition for space that attends speech visible to all users of a given platform will likely spur platforms towards subsidiary layers of visibility – public, followers, close friends, etc. – but the prospect of personal augmented realities raises all the democratic anxieties of individualized algorithmic feeds. Nevertheless, for the best solution, ownership of a new layer of spatial, social computing is the prize.

Will Duffield is an adjunct scholar in the Cato Institute’s Center for Representative Government.