AI21 Labs’ new AI model can handle more context than most

The AI industry is increasingly turning to generative models with longer contexts. But models with large context windows tend to be computationally intensive. Dagan, head of product at AI startup AI21 Labs, asserts that this doesn't have to be the case, and his company is releasing a generative model to prove it.

A model's context, or context window, refers to the input data (such as text) that it considers before generating output (more text). Models with small context windows tend to forget the content of even very recent conversations, while models with larger contexts avoid this pitfall and, as a bonus, better grasp the flow of data they take in.
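To make the idea concrete, here is a toy Python sketch (nothing from AI21's code; the function name is purely illustrative) of how a fixed-size context window behaves: only the most recent tokens remain visible to the model, so older details simply fall out of view.

```python
# Toy illustration of a fixed context window: only the most recent
# `window` tokens are visible to the model; anything older is dropped.
def visible_context(tokens: list[str], window: int) -> list[str]:
    """Return the slice of the conversation the model can actually see."""
    return tokens[-window:]

conversation = ["user:", "my", "name", "is", "Ada", "...", "what's", "my", "name?"]
print(visible_context(conversation, 4))    # small window: the name "Ada" is already gone
print(visible_context(conversation, 100))  # large window: the whole history still fits
```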

AI21 Labs’ Jamba is a new text generation and analysis model that can perform many of the same tasks as models like OpenAI’s ChatGPT and Google’s Gemini. Jamba is trained on a mix of public and proprietary data and can write text in English, French, Spanish and Portuguese.

Jamba can handle up to 140,000 tokens when running on a single GPU with at least 80GB of memory (such as a high-end Nvidia A100). That works out to roughly 105,000 words, or 210 pages: a decent-sized novel.

In comparison, Meta’s Llama 2 has a context window of 32,000 tokens (small by today’s standards), but only requires a GPU with about 12GB of memory to run. (Context windows are typically measured in tokens, which are bits of raw text and other data.)
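Those figures line up with the usual rough conversion rates, assuming about 0.75 English words per token and roughly 500 words per printed page (both approximations, not numbers from AI21).

```python
# Back-of-the-envelope conversion from tokens to words and pages.
# The ratios are rough rules of thumb, not exact tokenizer statistics.
TOKENS = 140_000
WORDS_PER_TOKEN = 0.75   # common approximation for English text
WORDS_PER_PAGE = 500     # typical manuscript page

words = TOKENS * WORDS_PER_TOKEN   # 105,000 words
pages = words / WORDS_PER_PAGE     # 210 pages
print(f"{TOKENS:,} tokens ≈ {words:,.0f} words ≈ {pages:.0f} pages")
```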

On the surface, Jamba is unremarkable. There are a number of free, downloadable generative AI models available, from Databricks’ recently released DBRX to the aforementioned Llama 2.

But what's unique about Jamba is what's inside. It combines two model architectures: the transformer and the state space model (SSM).

Transformers are the architecture of choice for complex reasoning tasks, powering models such as GPT-4 and Google's Gemini. They have several distinguishing characteristics, but by far the defining one is their “attention mechanism”: for each piece of input data (such as a sentence), the transformer weighs the relevance of every other input (other sentences) and draws on them to generate the output (a new sentence).
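For the curious, here is a minimal NumPy sketch of the scaled dot-product attention at the heart of that mechanism, stripped of the learned projections, masking and multiple heads a real transformer layer would have.

```python
import numpy as np

def scaled_dot_product_attention(q, k, v):
    """Each query scores every key, the scores are softmaxed into weights,
    and the output is a weighted mix of the values."""
    scores = q @ k.T / np.sqrt(q.shape[-1])           # relevance of every input to every other
    weights = np.exp(scores - scores.max(axis=-1, keepdims=True))
    weights /= weights.sum(axis=-1, keepdims=True)    # softmax over the sequence
    return weights @ v                                # attention-weighted summary

seq_len, dim = 6, 8                                   # toy sizes
x = np.random.randn(seq_len, dim)
out = scaled_dot_product_attention(x, x, x)           # self-attention over the sequence
print(out.shape)                                      # (6, 8)
```

Note that `scores` is a sequence-length by sequence-length matrix, which is why attention's compute and memory costs grow quadratically as the context gets longer.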

SSMs, on the other hand, combine several features of older types of AI models, such as recurrent neural networks and convolutional neural networks, to create a more computationally efficient architecture capable of handling long sequences of data.
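By way of contrast, a toy linear state space recurrence looks like this (a deliberately simplified, non-selective sketch with random matrices; real SSM layers such as Mamba learn these matrices and make them input-dependent).

```python
import numpy as np

def ssm_scan(A, B, C, inputs):
    """Linear state space recurrence: h_t = A @ h_{t-1} + B @ x_t, y_t = C @ h_t.
    The state h is a fixed-size summary of everything seen so far."""
    h = np.zeros(A.shape[0])
    outputs = []
    for x in inputs:                 # one constant-cost step per token
        h = A @ h + B @ x
        outputs.append(C @ h)
    return np.stack(outputs)

rng = np.random.default_rng(0)
state_dim, in_dim = 16, 8
A = rng.normal(scale=0.1, size=(state_dim, state_dim))
B = rng.normal(size=(state_dim, in_dim))
C = rng.normal(size=(in_dim, state_dim))
ys = ssm_scan(A, B, C, rng.normal(size=(1000, in_dim)))   # a long toy sequence
print(ys.shape)                                           # (1000, 8)
```

Because the state `h` stays the same size no matter how much input has been seen, each new token costs the same to process, which is where the efficiency on long sequences comes from.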

SSMs have their limitations, to be sure. But some early incarnations, including an open-source model called Mamba from researchers at Princeton University and Carnegie Mellon University, can handle larger inputs than equivalent transformer-based models while outperforming them on language generation tasks.

In fact, Jamba uses Mamba as part of its core model, which Dagan claims provides three times the throughput on long contexts compared to a similarly sized transformer-based model.
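As a purely schematic illustration of how the two block types can be interleaved (the layer ratio and the block internals below are illustrative placeholders, not AI21's published Jamba layout), consider the following sketch.

```python
import numpy as np

def attention_block(x):
    """Quadratic-cost block: every position attends to every other."""
    scores = x @ x.T / np.sqrt(x.shape[-1])
    w = np.exp(scores - scores.max(axis=-1, keepdims=True))
    w /= w.sum(axis=-1, keepdims=True)
    return x + w @ x                      # residual connection

def ssm_block(x, decay=0.9):
    """Linear-cost block: a running state summarizes everything seen so far."""
    h = np.zeros(x.shape[-1])
    out = np.empty_like(x)
    for t, xt in enumerate(x):
        h = decay * h + xt
        out[t] = xt + h                   # residual connection
    return out

def hybrid_stack(x, layers=("ssm", "ssm", "ssm", "attn")):
    """Mostly cheap SSM blocks, with an occasional attention block mixed in."""
    for kind in layers:
        x = ssm_block(x) if kind == "ssm" else attention_block(x)
    return x

tokens = np.random.randn(32, 8)           # toy sequence of 32 embeddings
print(hybrid_stack(tokens).shape)         # (32, 8)
```

The point of such a mix is that most layers pay the cheap, linear SSM cost while a few attention layers retain the transformer's strength at relating distant pieces of the input.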

“While there are a few initial academic examples of SSM models, this is the first commercial-grade, production-scale model,” Dagan told TechCrunch in an interview. “In addition to being innovative and of interest to the community for further research, this architecture opens up tremendous efficiency and throughput possibilities.”

While Jamba is released under the Apache 2.0 license, an open-source license with relatively few usage restrictions, Dagan stresses that it is a research release not intended for commercial use. The model has no safeguards to prevent it from generating toxic text and no mitigations to address potential bias; a fine-tuned, ostensibly “safer” version will be rolled out in the coming weeks.
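For readers who want to experiment with the research release, a minimal loading sketch using the Hugging Face transformers library might look like the following. The repository id and the settings here are assumptions based on AI21's public release rather than details from this article, so check the official model card (and the single-GPU memory figure above) before running it.

```python
# Minimal sketch of loading the released checkpoint with Hugging Face
# transformers. The repo id "ai21labs/Jamba-v0.1" and the dtype/device
# settings are assumptions, not details from the article; depending on
# your transformers version, the model card may also ask for
# trust_remote_code=True.
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "ai21labs/Jamba-v0.1"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    torch_dtype="auto",      # use the dtype stored in the checkpoint
    device_map="auto",       # place layers on the available GPU(s)
)

inputs = tokenizer("In the coming years, long-context models will", return_tensors="pt").to(model.device)
outputs = model.generate(**inputs, max_new_tokens=50)
print(tokenizer.decode(outputs[0], skip_special_tokens=True))
```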

But Dagan asserts that even in its early stages, Jamba shows promise in the SSM architecture.

“The added value of this model lies in its size and its innovative architecture, which can easily fit on a single GPU,” he said. “We believe performance will improve further as additional adjustments are made to Mamba.”
