
AI21 Labs breathes life into a new generation of AI Transformers with Jamba


The concept of Transformers has dominated the field of generative artificial intelligence since the groundbreaking research paper “Attention is All You Need” was first published in 2017.

Transformers aren’t the only path forward for generative AI, however. A new approach from AI21 Labs called “Jamba” looks to go beyond them. Jamba combines Mamba, a model based on the Structured State Space Model (SSM), with the Transformer architecture to create an optimized gen AI model. The name is short for Joint Attention and Mamba architecture, which is designed to combine the best attributes of SSM and Transformers. Jamba is released as an open-source model under the Apache 2.0 license.

To be clear, Jamba is unlikely to replace current Transformer-based large language models (LLMs), but it may complement them in some areas. According to AI21 Labs, Jamba outperforms traditional Transformer-based models on generative reasoning benchmarks such as HellaSwag. However, it does not yet beat Transformer-based models on other key benchmarks, such as Massive Multitask Language Understanding (MMLU), which measures problem solving.

Jamba is more than just a new Jurassic work from AI21 Labs

AI21 Labs has a special focus on next-generation artificial intelligence for enterprise use cases. The company raised $155 million in August 2023 to support its growing efforts.


The company’s enterprise tools include Wordtune, an optimization service that helps businesses generate content matching the organization’s tone and brand. AI21 Labs told VentureBeat in 2023 that it often competes directly with, and wins enterprise business from, gen AI giant OpenAI.

To date, AI21 Labs’ LLM technology, like nearly all other LLMs, has relied on the Transformer architecture. Just a year ago, the company launched its Jurassic-2 LLM family, which is part of the AI21 Studio natural language processing (NLP)-as-a-service platform and is also available for enterprise integration via API.

Jamba is not an evolution of Jurassic, it is a completely different hybrid SSM and Transformer model.

Attention isn’t all you need, you also need context

Transformers have dominated generative AI to date, but they still have shortcomings. Most notably, inference generally slows as the context window grows.

As AI21 Labs researchers point out, a Transformer’s attention mechanism scales quadratically with sequence length and reduces throughput, because each token attends to the entire sequence before it. This puts long-context use cases outside the scope of efficient production.
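To see why attention cost grows quadratically, consider a minimal sketch of scaled dot-product attention (a simplified illustration, not any production implementation): the score matrix compares every token with every other token, so its size is sequence length squared.

```python
import numpy as np

def attention_weights(q, k):
    """Naive scaled dot-product attention weights.

    q, k: (seq_len, d) arrays. The score matrix is (seq_len, seq_len),
    so compute and memory grow quadratically with sequence length.
    """
    d = q.shape[-1]
    scores = q @ k.T / np.sqrt(d)                      # (n, n) matrix
    e = np.exp(scores - scores.max(axis=-1, keepdims=True))
    return e / e.sum(axis=-1, keepdims=True)           # softmax over rows

n, d = 1024, 64
q = np.random.randn(n, d)
k = np.random.randn(n, d)
w = attention_weights(q, k)
print(w.shape)  # (1024, 1024) -- doubling n quadruples this matrix
```

Doubling the context from 1,024 to 2,048 tokens quadruples the score matrix, which is exactly the scaling behavior that makes very long contexts expensive.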

Another issue highlighted by AI21 Labs is the large memory footprint required to scale a Transformer. A Transformer’s memory footprint grows with context length, making it challenging to run long context windows or heavily parallel batches without significant hardware resources.
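Much of that footprint comes from the key-value (KV) cache that Transformers keep per token during inference. A rough back-of-envelope estimator makes the point; the model dimensions below are hypothetical, chosen only to illustrate the scaling, not taken from any real model:

```python
def kv_cache_bytes(layers, kv_heads, head_dim, seq_len, batch, dtype_bytes=2):
    """Rough KV-cache size: keys + values cached for every layer and token.

    Illustrative arithmetic only -- real model configs and caching
    strategies vary.
    """
    return 2 * layers * kv_heads * head_dim * seq_len * batch * dtype_bytes

# Hypothetical 7B-class config at a 256K-token context, fp16:
size = kv_cache_bytes(layers=32, kv_heads=8, head_dim=128,
                      seq_len=256_000, batch=1)
print(f"{size / 2**30:.1f} GiB")  # cache alone is tens of GiB
```

Because the cache grows linearly with context length and batch size, serving long contexts or many parallel requests quickly exhausts GPU memory.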

Context scaling and memory consumption are two of the problems the SSM approach hopes to solve.

The Mamba SSM architecture, originally proposed by researchers at Carnegie Mellon University and Princeton University, has lower memory requirements and uses a different mechanism than attention to handle large context windows. However, on its own, the Mamba approach struggles to match the output quality of Transformer models. Jamba’s hybrid SSM-Transformer approach is an attempt to combine the resource and context efficiency of the SSM architecture with the strong output capabilities of the Transformer.
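The key structural difference is that an SSM processes the sequence through a fixed-size recurrent state rather than an all-pairs attention matrix. The toy linear state-space recurrence below (a simplified sketch, not the actual Mamba kernel, which uses input-dependent, selective parameters) shows why memory per step stays constant no matter how long the sequence gets:

```python
import numpy as np

def ssm_scan(A, B, C, xs):
    """Minimal linear state-space recurrence (illustration, not Mamba).

    h_t = A @ h_{t-1} + B @ x_t ;  y_t = C @ h_t
    The hidden state h has a fixed size, so memory per step is constant
    regardless of sequence length -- unlike attention's (n, n) scores.
    """
    h = np.zeros(A.shape[0])
    ys = []
    for x in xs:
        h = A @ h + B @ x
        ys.append(C @ h)
    return np.array(ys)

A = np.eye(4) * 0.9            # toy state-transition matrix
B = np.ones((4, 1))
C = np.ones((1, 4))
ys = ssm_scan(A, B, C, np.ones((1000, 1)))
print(ys.shape)  # one output per input token, from a constant-size state
```

The trade-off the article describes follows from this: the compact state is cheap, but compressing the whole history into it is what makes matching Transformer output quality hard.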

AI21 Labs’ Jamba model offers a 256K context window and can deliver 3x the throughput on long contexts compared with Mixtral 8x7B. AI21 Labs also claims that Jamba is the only model of its size capable of fitting up to 140K tokens of context on a single GPU.

It is worth noting that, just like Mixtral, Jamba uses a Mixture of Experts (MoE) approach. However, Jamba applies MoE as part of its hybrid SSM-Transformer architecture, enabling further optimization. Specifically, according to AI21 Labs, Jamba’s MoE layers allow it to use only 12B of its 52B available parameters at inference time, making those 12B active parameters more efficient than a Transformer-only model of the same size.
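The 12B-active-of-52B-total figure is the hallmark of sparse MoE routing: a gate picks a few experts per token, and only those experts’ weights do any work. The toy top-k router below is a generic illustration of that mechanism with made-up dimensions, not Jamba’s actual routing code:

```python
import numpy as np

def moe_forward(x, experts, gate_w, top_k=2):
    """Toy top-k mixture-of-experts layer (generic sketch, not Jamba's).

    Only the top_k selected expert weight matrices are multiplied per
    token, so active parameters are a fraction of total parameters.
    """
    logits = gate_w @ x
    top = np.argsort(logits)[-top_k:]            # indices of chosen experts
    gates = np.exp(logits[top]) / np.exp(logits[top]).sum()
    return sum(g * (experts[i] @ x) for g, i in zip(gates, top))

rng = np.random.default_rng(0)
d, n_experts = 8, 16
experts = [rng.standard_normal((d, d)) for _ in range(n_experts)]
gate_w = rng.standard_normal((n_experts, d))
y = moe_forward(rng.standard_normal(d), experts, gate_w)
print(y.shape)  # (8,) -- computed using just 2 of the 16 experts
```

With 2 of 16 experts active per token, only about an eighth of the expert parameters are exercised on any forward pass, which is the same principle behind Jamba’s 12B-active figure.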

Jamba is still in its early stages and is not yet part of AI21 Labs’ enterprise offerings. The company plans to offer an instruct version in beta on the AI21 platform soon.

