Apple researchers have developed a new artificial intelligence system that can understand ambiguous references to on-screen entities as well as to conversational and background context, allowing for more natural interactions with the voice assistant, according to a paper published on Friday.
The system, called ReALM (Reference Resolution As Language Modeling), leverages large language models to transform the complex task of reference resolution, including understanding references to visual elements on the screen, into a pure language modeling problem. This enables ReALM to achieve significant performance improvements compared to existing methods.
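The core idea, casting reference resolution as a pure language modeling problem, can be illustrated with a short sketch: candidate entities are serialized into the prompt as a numbered list, and the model is asked to emit the number of the entity the user means. The prompt wording and function name below are illustrative assumptions, not the paper's actual template.

```python
# Hedged sketch: reference resolution framed as a language modeling task.
# Candidate entities become numbered lines of text in the prompt, so the
# model only has to generate an entity number -- no structured parsing.

def build_resolution_prompt(query, entities):
    # Serialize each candidate entity as a numbered list item.
    lines = [f"{i}. {e}" for i, e in enumerate(entities, start=1)]
    return (
        "Entities currently relevant to the conversation:\n"
        + "\n".join(lines)
        + f"\n\nUser request: {query}\n"
        + "Which entity number does the request refer to?"
    )

prompt = build_resolution_prompt(
    "call the second one",
    ["pharmacy on Main St (555-0142)", "pharmacy on Oak Ave (555-0178)"],
)
print(prompt)
```

A model fine-tuned on prompts like this can then resolve "the second one" by generating "2", keeping the entire pipeline inside the language model.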
“The ability to understand context, including references, is critical for conversational assistants,” a team of Apple researchers wrote. “Enabling users to issue queries about what they see on the screen is a crucial step in ensuring a truly hands-free experience with voice assistants.”
Enhanced conversational assistant
To handle screen-based references, a key innovation of ReALM is to reconstruct the screen using parsed on-screen entities and their positions to produce a textual representation that captures the visual layout. The researchers demonstrated that this approach, combined with a language model fine-tuned specifically for reference resolution, can outperform GPT-4 on this task.
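The screen-reconstruction step described above can be sketched as follows. This is a minimal illustration of the general technique, turning parsed entities with coordinates into a plain-text layout an LLM can read; the grouping heuristic, field names, and tolerance value are assumptions for this example, not Apple's implementation.

```python
# Hedged sketch: rebuild a textual view of the screen from parsed
# on-screen entities and their positions, so the layout survives
# as plain text a language model can consume.

def reconstruct_screen(entities, row_tolerance=10):
    """entities: list of dicts with 'text', 'x', 'y' (top-left coords).
    Entities whose y-coordinates fall within row_tolerance are grouped
    onto one text line, preserving left-to-right order."""
    # Sort top-to-bottom, then left-to-right.
    ordered = sorted(entities, key=lambda e: (e["y"], e["x"]))
    rows, current, current_y = [], [], None
    for ent in ordered:
        if current_y is None or abs(ent["y"] - current_y) <= row_tolerance:
            current.append(ent)
            current_y = ent["y"] if current_y is None else current_y
        else:
            rows.append(current)
            current, current_y = [ent], ent["y"]
    if current:
        rows.append(current)
    # Join each row's entities with tabs so horizontal layout survives.
    return "\n".join(
        "\t".join(e["text"] for e in sorted(row, key=lambda e: e["x"]))
        for row in rows
    )

screen = [
    {"text": "Contact: Alice", "x": 0, "y": 0},
    {"text": "555-0100", "x": 120, "y": 2},
    {"text": "Contact: Bob", "x": 0, "y": 40},
    {"text": "555-0199", "x": 120, "y": 41},
]
print(reconstruct_screen(screen))
# Two text rows, each listing a contact and its phone number side by side
```

A query like "call Bob" can then be resolved against this text representation by the same language model, with no separate vision pipeline.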
“We demonstrate large improvements over existing systems with similar capabilities across different types of references, with our smallest model achieving an absolute gain of more than 5% for on-screen references,” the researchers wrote, adding that their larger models substantially outperform GPT-4.
Practical applications and limitations
This work highlights the potential of focused language models to handle tasks such as reference resolution in production systems, where using large end-to-end models is not feasible due to latency or computational constraints. By releasing this research, Apple is signaling that it will continue to invest in making Siri and other products more conversational and contextually aware.
Still, the researchers warn that relying on automatic parsing of the screen has limitations. Handling more complex visual references, such as distinguishing between multiple images, would likely require incorporating computer vision and multimodal techniques.
Apple races to close AI gap as rivals rise
Apple has quietly made significant progress in artificial intelligence research, even as it lags behind tech rivals in the race to dominate the fast-growing field.
From multimodal models that fuse vision and language, to AI-driven animation tools, to techniques for building high-performing, specialized AI on a budget, the steady stream of breakthroughs from the company’s research labs shows that its AI ambitions are rapidly escalating.
But the famously secretive technology giant faces fierce competition from companies such as Google, Microsoft, Amazon and OpenAI, which are actively productizing generative artificial intelligence in areas such as search, office software, and cloud services.
Apple has long been a fast follower rather than a first mover, and it now faces a market where artificial intelligence is changing at breakneck speed. At its closely watched Worldwide Developers Conference in June, the company is expected to unveil a new large language model framework, an “Apple GPT” chatbot, and other AI capabilities across its ecosystem.
“We’re excited to share details about our ongoing AI efforts later this year,” CEO Tim Cook hinted during a recent earnings call. Despite its characteristic opacity, it’s clear that Apple’s AI efforts are widespread.
However, the iPhone maker’s late arrival puts it in an unusually weak position as the battle for artificial intelligence supremacy heats up. Deep financial resources, brand loyalty, elite engineering and a tightly integrated product portfolio provide it with a great opportunity, but there are no guarantees in this high-stakes competition.
A new era of ubiquitous, truly intelligent computing is coming. Come June, we’ll see if Apple has done enough to ensure it has a hand in shaping it.