
Google Gemini 1.5 Pro enters public preview on Vertex AI

Gemini 1.5 Pro, Google’s most powerful generative AI model, is now available in public preview on Vertex AI, Google’s AI development platform for enterprises. The company announced the news this week at its annual Cloud Next conference in Las Vegas.

Gemini 1.5 Pro, launched in February, joins Google’s Gemini family of generative AI models. Its headline feature is undoubtedly the amount of context it can handle: from a default of 128,000 tokens up to 1 million tokens, where “tokens” are the subdivided bits of raw data a model works with (e.g. the syllables “fan,” “tas” and “tic” in the word “fantastic”).

One million tokens is equivalent to approximately 700,000 words or approximately 30,000 lines of code. It is approximately four times the amount of data that Anthropic’s flagship model Claude 3 can take as input, and approximately eight times the amount of OpenAI’s GPT-4 Turbo max context.
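The ratios implied by those figures can be checked with some back-of-envelope arithmetic (a rough sketch derived purely from the article’s numbers; real tokenizers vary by language and content):

```python
# Ratios implied by the article's figures: 1M tokens ~ 700,000 words
# or ~30,000 lines of code. These are estimates, not Gemini's actual tokenizer.
TOKENS = 1_000_000
WORDS = 700_000
CODE_LINES = 30_000

tokens_per_word = TOKENS / WORDS       # ~1.43 tokens per English word
tokens_per_line = TOKENS / CODE_LINES  # ~33 tokens per line of code

print(f"~{tokens_per_word:.2f} tokens/word, ~{tokens_per_line:.0f} tokens/line")
```

Those rates are in line with the common rule of thumb that a token is a bit less than a word of English text.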

A model’s context, or context window, refers to the initial set of data (such as text) that the model considers before generating output (such as additional text). A simple question such as “Who won the 2020 U.S. presidential election?” can serve as context, as can a movie script, an email, an essay or an e-book.
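One way to see what a bounded context window means in practice is the common workaround for small windows: trimming a conversation so only the most recent messages fit the token budget. Here is a minimal sketch (the 4-characters-per-token estimate is a rough heuristic, not Gemini’s tokenizer):

```python
# Sketch: a context window bounds how much input a model can consider.
# With a small budget, older messages get dropped -- the "forgetting"
# the article describes. Token estimate is a crude ~4 chars/token heuristic.

def estimate_tokens(text: str) -> int:
    return max(1, len(text) // 4)

def fit_to_context(messages: list[str], budget: int) -> list[str]:
    kept, used = [], 0
    for msg in reversed(messages):   # walk newest-first
        cost = estimate_tokens(msg)
        if used + cost > budget:
            break                    # oldest messages fall out of context
        kept.append(msg)
        used += cost
    return list(reversed(kept))      # restore chronological order

history = [
    "Who won the 2020 U.S. presidential election?",
    "Joe Biden won the 2020 election.",
    "What was the electoral vote margin?",
]
print(fit_to_context(history, budget=20))  # drops the oldest message
```

With a million-token budget, this kind of truncation is rarely needed; with a small one, the model literally never sees the start of the conversation.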

Models with smaller context windows tend to “forget” the content of even recent conversations, causing them to veer off topic. Models with large contexts are less prone to this. As an added benefit, large-context models can better grasp the narrative flow of the data they take in, generate more contextually rich responses, and reduce the need for fine-tuning and fact-based grounding. At least hypothetically.

So, what exactly can you do with a context window of 1 million tokens? Google promises capabilities like analyzing entire code bases, “reasoning” across lengthy documents and holding long conversations with chatbots.

Because Gemini 1.5 Pro is multilingual and multimodal, as of Tuesday it can understand images, video and audio streams in addition to text. That means the model can analyze and compare content such as TV shows, movies, radio broadcasts in different languages and conference call recordings. One million tokens translates to approximately 1 hour of video or approximately 11 hours of audio.
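Working backward from those figures gives a rough sense of how densely video and audio consume the context budget (derived arithmetic only; Google’s actual per-frame and per-second tokenization may differ):

```python
# Token rates implied by the article: 1M tokens ~ 1 hour of video
# or ~11 hours of audio. Derived estimates, not official Gemini rates.
CONTEXT_TOKENS = 1_000_000
VIDEO_SECONDS = 1 * 3600    # ~1 hour of video
AUDIO_SECONDS = 11 * 3600   # ~11 hours of audio

video_rate = CONTEXT_TOKENS / VIDEO_SECONDS  # ~278 tokens per second of video
audio_rate = CONTEXT_TOKENS / AUDIO_SECONDS  # ~25 tokens per second of audio

print(f"video ~{video_rate:.0f} tok/s, audio ~{audio_rate:.0f} tok/s")
```

In other words, a second of video costs roughly ten times as many tokens as a second of audio under these figures, which is why the audio limit stretches so much further.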

With its audio processing capabilities, Gemini 1.5 Pro can also generate transcriptions of video clips, although the jury is still out on the quality of those transcriptions.

In a pre-recorded demo earlier this year, Google showed Gemini 1.5 Pro searching the transcript of the Apollo 11 moon landing TV broadcast (roughly 400 pages) for quotes containing jokes, and then finding the scene in the film footage that matched a pencil sketch.

Google says early users of Gemini 1.5 Pro, including United Wholesale Mortgage, TBS and Replit, are taking advantage of the large context window for tasks ranging from mortgage underwriting and automated metadata tagging of media archives to generating, explaining and transforming code.

Gemini 1.5 Pro can’t process a million tokens at the snap of a finger, though. In the demo mentioned above, each search took between 20 seconds and a minute to complete, much longer than the average ChatGPT query.

Google has previously said that latency is an area of focus and that it is working to “optimize” Gemini 1.5 Pro over time.

Notably, Gemini 1.5 Pro is also making its way into other parts of Google’s enterprise product ecosystem: the company announced on Tuesday that the model (in private preview) will power new features in Code Assist, Google’s generative AI coding assistance tool. Google says developers can now perform “large-scale” changes across code bases, such as updating cross-file dependencies and reviewing large chunks of code.


