The biggest AI announcements from Google I/O

Google is going all-in on artificial intelligence, and it wants you to know it. During the keynote at its I/O developer conference on Tuesday, Google mentioned “AI” more than 120 times. That’s a lot!

But not all of Google’s AI announcements were significant in their own right. Some were incremental. Others were rehashed. So, to help you sort the wheat from the chaff, we’ve rounded up the top new AI products and features announced at Google I/O 2024.

Generative AI in Search

Google plans to use generative artificial intelligence to organize the entire Google search results page.

What will AI-organized pages look like? Well, it depends on the search query. But Google said they might show AI-generated summaries of reviews, discussions from social media sites like Reddit and AI-generated lists of suggestions.

For now, Google plans to show AI-enhanced results pages when it detects that a user is looking for inspiration (for example, when planning a trip). Soon, it will also show these results when users search for dining options and recipes, with results for movies, books, hotels, e-commerce, and more to follow.

Project Astra and Gemini Live

Image Source: Google

Google is improving its artificial intelligence chatbot Gemini so that it can better understand the world around it.

The company previewed a new experience in Gemini called Gemini Live, which lets users have “in-depth” voice chats with Gemini on their smartphones. Users can interrupt Gemini while the chatbot is speaking to ask clarifying questions, and it will adapt to their speech patterns in real time. Gemini can also see and respond to a user’s surroundings through photos or video captured by the smartphone’s camera.

Gemini Live, which won’t launch until later this year, can answer questions about things within (or recently within) the smartphone camera’s field of view, such as which neighborhood a user might be in or the name of a part on a broken bicycle. The technical innovations driving Live stem in part from Project Astra, a new initiative within DeepMind to create AI-powered apps and “agents” for real-time, multimodal understanding.

Google Veo

Image Source: Google

Google is taking aim at OpenAI’s Sora with Veo, an artificial intelligence model that can create 1080p video clips roughly a minute long from a text prompt.

Veo can capture different visual and cinematic styles, including landscape and time-lapse shots, and can edit and adjust footage it has already generated. The model understands camera movements and visual effects well from prompts (think descriptors like “pan,” “zoom,” and “explosion”). Veo also has some grasp of physics, such as fluid dynamics and gravity, which contributes to the realism of the videos it generates.

Veo also supports masked editing for changes to specific areas of a video, and it can generate videos from a still image, similar to generative models such as Stability AI’s Stable Video. Perhaps most interestingly, given a sequence of prompts that together tell a story, Veo can generate longer videos that run beyond a minute.

Ask Photos

Image Source: TechCrunch

Google Photos is getting an AI infusion with the launch of an experimental feature called “Ask Photos,” powered by Google’s Gemini family of generative AI models.

Ask Photos, launching later this summer, will allow users to search their Google Photos collection using natural language queries that leverage Gemini’s understanding of photo content and other metadata.

For example, instead of searching for something specific within a photo, such as “One World Trade,” users will be able to perform broader and more complex searches, such as finding the “best photos of every national park I’ve visited.” In that example, Gemini would use signals including lighting, blurriness, and lack of background distortion to determine what makes a photo the “best” in a given set, and combine that with an understanding of geolocation information and dates to return the relevant images.

Gemini in Gmail

Image Source: TechCrunch

Gmail users will soon be able to search, summarize and draft emails with help from Gemini, as well as take action on emails for more complex tasks, such as processing returns.

In a demo at I/O, Google showed how a parent who wants to keep up with what’s going on at their child’s school could ask Gemini to summarize all recent emails from the school. In addition to the email body, Gemini will analyze attachments such as PDFs and produce a summary with key points and action items.

From Gmail’s sidebar, users can ask Gemini to help organize the receipts in their inbox, put them into a Google Drive folder, or extract information from the receipts and paste it into a spreadsheet. If this is something you do regularly (for example, tracking expenses as a business traveler), Gemini can also offer to automate the workflow for future use.

Detect fraud during calls

Google previewed an artificial intelligence feature that alerts users to potential scams during phone calls.

The feature will be built into a future version of Android. It uses Gemini Nano, the smallest of Google’s generative AI models, which can run entirely on the device, to listen in real time for “conversation patterns commonly associated with scams.”

No specific release date has been set for the feature. As with many of these announcements, Google is previewing what Gemini Nano will be able to do at some point in the future. We do know, however, that the feature will be opt-in, which is a good thing. While using Nano means the system won’t automatically upload audio to the cloud, it is still effectively listening in on users’ conversations, which is a potential privacy risk.

AI-assisted accessibility

Image Source: Google

Google is enhancing Android’s TalkBack accessibility feature with some generative AI magic.

Soon, TalkBack will use Gemini Nano to create spoken descriptions of objects for low-vision and blind users. For example, TalkBack might describe a picture of a dress as “a close-up of a black and white plaid dress. The dress is short, has a collar and long sleeves. It is tied with a large bow at the waist.”

According to Google, TalkBack users encounter roughly 90 unlabeled images every day. Using Nano, the system will be able to describe their content, potentially without anyone having to enter that information manually.

Read more about Google I/O 2024 at TechCrunch
