
Google updates Imagen 2 with video clip generation

Google doesn’t have the best track record when it comes to image-generating AI.

In February this year, the image generator built into Google’s artificial intelligence chatbot Gemini was found to randomly inject gender and racial diversity into prompts about people, resulting in images of racially diverse Nazis, among other objectionable errors.

Google took down the generator and vowed to improve it and eventually re-release it. While we await its return, the company has launched Imagen 2, an enhanced image generation tool within its Vertex AI developer platform, albeit one with a decidedly more enterprise slant. Google announced Imagen 2 at its annual Cloud Next conference in Las Vegas.

Imagen 2 is actually a family of models, launched in December after being previewed at Google’s I/O conference in May 2023, that can create and edit images from text prompts, much like OpenAI’s DALL-E and Midjourney. Of note for business types, Imagen 2 can render text and logos in multiple languages, with the option to overlay those elements on existing images such as business cards, apparel and products.
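
For developers, access runs through the Vertex AI SDK. Below is a minimal sketch of what text-to-image generation looks like with the SDK’s Python bindings; the project ID, model version string and file names are placeholders, and the versions available to a given project may differ:

```python
# Sketch of text-to-image generation with Imagen on Vertex AI.
# Assumes the google-cloud-aiplatform package is installed and the
# environment is authenticated (e.g., `gcloud auth application-default login`).
import vertexai
from vertexai.preview.vision_models import ImageGenerationModel

# Placeholder project and region; substitute your own.
vertexai.init(project="my-project", location="us-central1")

# The model version string is illustrative; check which Imagen
# versions your project actually has access to.
model = ImageGenerationModel.from_pretrained("imagegeneration@005")

images = model.generate_images(
    prompt="A product photo of a ceramic mug with a company logo, studio lighting",
    number_of_images=1,
)
images[0].save(location="mug.png")
```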

After first launching in preview, image editing with Imagen 2 is now generally available in Vertex AI, along with two new features: inpainting and outpainting. These capabilities, which other popular image generators including DALL-E have offered for some time, let users remove unwanted parts of an image, add new components and extend an image’s borders to create a wider field of view.
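
As a rough idea of how inpainting-style editing is exposed, the same SDK offers an edit_image method that takes a base image plus a mask marking the region to regenerate. The file names and model version below are again placeholders:

```python
# Sketch of mask-based image editing (inpainting) with the Vertex AI SDK.
# Assumes vertexai.init(...) has been called as in the previous sketch.
# "room.png" and "room_mask.png" are hypothetical inputs; the white region
# of the mask marks the area to be regenerated from the prompt.
from vertexai.preview.vision_models import Image, ImageGenerationModel

model = ImageGenerationModel.from_pretrained("imagegeneration@005")

base = Image.load_from_file(location="room.png")
mask = Image.load_from_file(location="room_mask.png")

edited = model.edit_image(
    base_image=base,
    mask=mask,
    prompt="A potted plant on the table",
)
edited[0].save(location="room_edited.png")
```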

But the real heart of the Imagen 2 upgrade is what Google calls “text to live images.”

Imagen 2 can now create short, four-second videos from text prompts, along the lines of AI-powered clip generation tools like Runway, Pika and Irreverent Labs. In keeping with Imagen 2’s enterprise focus, Google pitches live images as a tool for marketers and creatives, for example a GIF generator for ads showing nature, food and animals, subject matter that Imagen 2 was fine-tuned on.

Google says live images can capture “a range of camera angles and motions” while “supporting consistency over the entire sequence.” For now, though, they’re low-resolution: 360 by 640 pixels. Google promises that this will improve in the future.

To allay (or at least try to allay) concerns about the potential for deepfakes, Google says Imagen 2 will employ SynthID, an approach developed by Google DeepMind, to apply invisible, cryptographic watermarks to live images. Of course, detecting these watermarks, which Google claims are resilient to edits including compression, filters and tone adjustments, requires a Google-provided tool that isn’t available to third parties.

No doubt hoping to head off another generated-media controversy, Google stressed that live image generation will be “filtered for safety.” A spokesperson told TechCrunch via email: “The Imagen 2 model in Vertex AI has not experienced the same issues as the Gemini app. We continue to conduct extensive testing and engage with our customers.”

But even generously assuming that Google’s watermarking tech, bias mitigations and filters work as well as it claims, are live images even competitive with the video generation tools already out there?

Not really.

Runway can generate 18-second clips at much higher resolutions. Stability AI’s video clip tool, Stable Video Diffusion, offers greater customizability (in terms of frame rate). And OpenAI’s Sora, which admittedly isn’t commercially available yet, seems poised to blow away the competition with the photorealism it can achieve.

So what’s the real technical advantage of live images? I’m not sure there is one. And I don’t think I’m being too harsh.

After all, Google is behind some genuinely impressive video generation tech, such as Imagen Video and Phenaki. Phenaki, one of Google’s more interesting text-to-video experiments, turns long, detailed prompts into two-minute-plus “movies”, with the caveat that the clips are low-resolution, low-frame-rate and only somewhat coherent.

Recent reporting suggests that the generative AI boom caught Google CEO Sundar Pichai off guard and that the company is still struggling to keep pace with competitors, so it’s no surprise that a product like live images feels like an also-ran. But it’s disappointing all the same. I can’t help but feel that a more impressive product is, or was, lurking in Google’s skunkworks.

Models like Imagen are trained on an enormous number of examples, often sourced from public websites and web datasets. Many generative AI vendors see training data as a competitive advantage and so keep it, and the information pertaining to it, close to the chest. But training data details are also a potential source of IP-related lawsuits, another disincentive to reveal much.

As I do with every generative AI announcement, I asked about the data used to train the updated Imagen 2, and whether creators whose work may have been swept up in the training process will be able to opt out at some point in the future.

Google told me only that its models are trained “primarily” on public web data, drawn from “blog posts, media transcripts and public conversation forums.” Which blogs, transcripts and forums? It’s anyone’s guess.

A spokesperson pointed to Google’s web publisher controls, which allow webmasters to prevent the company from scraping data, including photos and artwork, from their sites. But Google won’t commit to releasing an opt-out tool or to compensating creators for their (unknowing) contributions, something many of its competitors, including OpenAI, Stability AI and Adobe, have done.
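
Concretely, those publisher controls center on the Google-Extended crawler token, which sites can target in robots.txt to keep their content out of Google’s generative model training without affecting normal Search indexing. Roughly:

```
# robots.txt
# Opt this site out of training Google's generative models
# (the Google-Extended token) while still allowing Search crawling.
User-agent: Google-Extended
Disallow: /

User-agent: Googlebot
Allow: /
```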

Another point worth mentioning: text to live images isn’t covered by Google’s generative AI indemnification policy, which protects Vertex AI customers from copyright claims related to both Google’s use of training data and the outputs of its generative AI models. That’s because text to live images is technically in preview; the policy only covers generative AI products in general availability (GA).

Regurgitation, where a generative model spits out a mirror copy of an example (such as an image) it was trained on, is rightly a concern for corporate customers. Research, both informal and academic, has shown that the first-generation Imagen (the predecessor to Imagen 2) wasn’t immune to it, spitting out identifiable photos of people, copyrighted works by artists and more when prompted in particular ways.

Barring controversy, technical issues or some other major unforeseen setback, text to live images will come to GA at some point down the line. But as live images exist today, Google is essentially saying: use at your own risk.

