Can AI visualize an environment from sounds? UT researchers put it to the test
AUSTIN (KXAN) -- Just how well can artificial intelligence listen?
University of Texas at Austin researchers worked to determine whether sound clips alone are enough for AI to picture what the environment it is hearing looks like, a skill previously thought to be unique to humans.
To do so, the team sampled 100 YouTube video and audio clips from cities around the globe to initially train its model on what various environments look and sound like. From there, the technology was fed 10-second, audio-only clips and asked to produce high-resolution images of what each setting looks like.
"Our use of advanced AI techniques, supported by cutting edge large language models, in particular a soundscape to image diffusion model, demonstrates that machines have the potential to approximate this human sensory experience," UT assistant professor Dr. Yuhao Kang said.
According to Kang, the results showed a strong correlation in the proportions of sky and greenery between the generated images and the real-world environments they were asked to duplicate. The generated images were slightly less accurate in matching building proportions.
However, when depicting urban environments, researchers were particularly impressed with the AI's ability to accurately match the architectural styles and distances between objects to the actual image. The generated images also often accurately reflected the weather conditions and the time of day the audio was recorded.
The study also worked to determine how well humans could match audio to images. When given an audio clip and three images to choose from, humans were able to correctly identify the setting 80% of the time.
Kang said that success rate was similar to the AI's rate of accurately generating an image of the environment.
Kang said the technology could have multiple potential applications going forward.
"For instance, we may understand our soundscape, such as, how can we reduce noise," Kang said. "We can also enrich our multi-sensory experiences. For instance, when you visit a specific place in a museum or in (virtual reality), we may only see the world, but now we can also generate its soundscape."