Apple researchers develop AI that can ‘see’ and understand screen context

Apple researchers have developed a new artificial intelligence system that can understand ambiguous references to on-screen entities as well as conversational and background context, enabling more natural interactions with voice assistants, according to a paper published on Friday.

The system, called ReALM (Reference Resolution As Language Modeling), leverages large language models to convert the complex task of reference resolution, including understanding references to visual elements on the screen, into a pure language modeling problem. This allows ReALM to achieve substantial performance gains over existing methods.
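
To illustrate the idea (this is a minimal sketch under assumed entity fields and prompt wording, not Apple's actual format), reference resolution becomes language modeling once the candidate entities are serialized into text and the model is asked to name the referent:

```python
# Hypothetical sketch of casting reference resolution as a pure
# language modeling problem. The entity fields, numbering scheme, and
# prompt wording are illustrative assumptions, not ReALM's real format.

def build_prompt(query: str, entities: list[dict]) -> str:
    """Serialize candidate entities into a text prompt so a language
    model can answer with the entity the user is referring to."""
    listing = "\n".join(
        f"{i}. {e['type']}: {e['text']}" for i, e in enumerate(entities, 1)
    )
    return (
        "Entities currently relevant to the assistant:\n"
        f"{listing}\n\n"
        f"User query: {query}\n"
        "Answer with the number of the entity the query refers to."
    )

entities = [
    {"type": "business", "text": "Joe's Pharmacy"},
    {"type": "phone_number", "text": "555-0123"},
]
print(build_prompt("call the bottom one", entities))
```

Framed this way, a fine-tuned model only has to emit an entity identifier, which keeps the whole task inside a standard text-to-text interface.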

“The ability to understand context, including references, is critical for conversational assistants,” a team of Apple researchers wrote. “Enabling users to issue queries about what they see on the screen is a crucial step toward ensuring a truly hands-free experience with voice assistants.”

Enhanced conversational assistant

To handle screen-based references, a key innovation of ReALM is reconstructing the screen from parsed on-screen entities and their positions to produce a textual representation that captures the visual layout. The researchers showed that this approach, combined with language models fine-tuned specifically for reference resolution, can outperform GPT-4 on the task.
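
A minimal sketch of that layout reconstruction, in the spirit of the paper's screen encoding (the field names, row-grouping tolerance, and tab-joining here are assumptions, not the exact published algorithm):

```python
# Hypothetical sketch: render parsed on-screen entities as text that
# preserves the visual layout, reading top-to-bottom, left-to-right.
# Entities at roughly the same vertical position share one line.

def textualize_screen(entities: list[dict], row_tolerance: int = 10) -> str:
    rows: list[list[dict]] = []
    for ent in sorted(entities, key=lambda e: (e["top"], e["left"])):
        # Start a new row unless this entity sits vertically close to
        # the current row's first entity.
        if rows and abs(ent["top"] - rows[-1][0]["top"]) <= row_tolerance:
            rows[-1].append(ent)
        else:
            rows.append([ent])
    return "\n".join(
        "\t".join(e["text"] for e in sorted(row, key=lambda e: e["left"]))
        for row in rows
    )

screen = [
    {"text": "260 Sample Sale", "top": 0, "left": 0},
    {"text": "Tory Burch", "top": 40, "left": 0},
    {"text": "Up to 70% off", "top": 40, "left": 140},
]
print(textualize_screen(screen))
# 260 Sample Sale
# Tory Burch    Up to 70% off
```

The resulting string can then be placed in the model's prompt alongside the conversation, so a reference like “the one on sale” can be resolved from plain text alone.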

Apple’s AI system, ReALM, can understand references to on-screen entities, such as the “260 Sample Sale” listing shown in this mockup, allowing for more natural interactions with voice assistants. (Image source: arxiv.org)

“We demonstrate large improvements over existing systems with similar capabilities across different types of references, with our smallest model obtaining absolute gains of over 5% for on-screen references,” the researchers wrote. “Our larger models substantially outperform GPT-4.”

Practical applications and limitations

This work highlights the potential of focused language models to handle tasks such as reference resolution in production systems, where using massive end-to-end models is infeasible due to latency or compute constraints. By publishing the research, Apple is signaling continued investment in making Siri and other products more conversant and context-aware.

Still, the researchers warn that relying on automatic parsing of the screen has limitations. Processing more complex visual references, such as differentiating multiple images, may require a combination of computer vision and multimodal techniques.

Apple races to close AI gap as rivals rise

Apple has quietly made significant progress in artificial intelligence research, even as it lags behind tech rivals in the race to dominate the fast-growing field.

From multimodal models that fuse vision and language, to AI-driven animation tools, to techniques for building high-performing specialized AI on a budget, the steady stream of breakthroughs from the company’s research labs shows how rapidly its AI ambitions are escalating.

But the famously secretive technology giant faces fierce competition from companies such as Google, Microsoft, Amazon and OpenAI, which are actively productizing generative artificial intelligence in areas such as search, office software, and cloud services.

Apple has long been a fast follower rather than a first mover, and now faces a market where artificial intelligence is changing at breakneck speed. At its closely watched Worldwide Developers Conference in June, the company is expected to unveil a new large-scale language model framework, an “Apple GPT” chatbot, and other AI capabilities across the ecosystem.

“We’re excited to share details about our ongoing AI efforts later this year,” CEO Tim Cook hinted during a recent earnings call. Despite its characteristic opacity, it’s clear that Apple’s AI efforts are widespread.

However, the iPhone maker’s late arrival puts it in an unusually weak position as the battle for artificial intelligence supremacy heats up. Deep financial resources, brand loyalty, elite engineering, and a tightly integrated product portfolio give it a fighting chance, but there are no guarantees in this high-stakes competition.

A new era of ubiquitous, truly intelligent computing is coming. Come June, we’ll see if Apple has done enough to ensure it has a hand in shaping it.
