Join us in Atlanta on April 10 to explore the future of a safe workforce. We’ll explore the vision, benefits, and use cases of artificial intelligence for security teams. Request an invitation here.

Humans are wired to reason: “what ifs” and “whys” and the ability to “read between the lines” and infer unstated information are all critical to our ability to solve problems.

So far, artificial intelligence models have struggled in this area. But researchers from Stanford University and Notbad AI, Inc. now say they have taught AI models to think before responding to prompts, much as (most) humans consider what to say before speaking.

The researchers introduced Quiet-STaR, an extension of the Self-Taught Reasoner (STaR) model, which is trained on a large corpus of internet text and learns to generate a rationale at each token to explain future text and improve its predictions.

Quiet-STaR, applied to Mistral 7B, showed improvements in zero-shot direct inference on the CommonsenseQA question-answering benchmark (from a 36.3% baseline to 47.2%) and the GSM8K grade-school math word-problem dataset (from a 5.9% baseline to 10.9%). Moreover, these improvements grow as the number of tokens used in the model's internal thoughts increases.


“Quiet-STaR marks a step toward LMs learning to reason in a more general and scalable way,” the researchers wrote.

The shortcomings of artificial intelligence reasoning so far

Previous approaches to helping language models learn to reason have been narrower and less general: models were trained to solve a single task, or a predefined set of tasks, using carefully curated datasets.

For example, the Quiet-STaR developers note that pre-trained language models fine-tuned to output traces of human reasoning before answering multiple-choice questions outperform models trained directly on the answers. Other models, when given “scaffolding,” can generate chain-of-thought solutions without additional supervision. Researchers have also “forced” models to use chain-of-thought reasoning by preventing them from answering outright.

Researchers at Stanford University and Notbad AI, Inc. argue: “However, these methods again only work on question-answering datasets.”

STaR in particular demonstrated that a model can “bootstrap” its reasoning ability on a question-answering dataset: it samples rationales while attempting to answer a question, trains on the rationales that lead to correct answers, and repeats the process iteratively to solve increasingly difficult problems.
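That bootstrapping loop can be sketched in a few lines. This is a hypothetical toy implementation, not the authors' code; `generate_rationale` and `train_on` are stand-in callables for the model's sampling and fine-tuning steps:

```python
def star_bootstrap(questions, answer_key, generate_rationale, train_on, rounds=3):
    """Sketch of the STaR loop: sample a rationale per question, keep only the
    (question, rationale) pairs whose rationale led to the correct answer, and
    fine-tune on the kept pairs before the next round."""
    for _ in range(rounds):
        kept = []
        for q in questions:
            rationale, predicted = generate_rationale(q)
            if predicted == answer_key[q]:  # only correct rationales survive
                kept.append((q, rationale))
        train_on(kept)
```

In the real method, `generate_rationale` samples from the language model and `train_on` runs a fine-tuning pass; the filter on correct answers is what makes the loop self-improving.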

However, the Quiet-STaR researchers note that training on curated datasets limits the “scale and generalizability” of the rationales. High-quality datasets “inherently only ever cover a subset of reasoning tasks.”

The researchers argue that inferring rationales from a handful of Q&A examples is a “highly constrained setting.” “Ideally, a language model could instead learn to infer unstated rationales in arbitrary text.”

By extending STaR, “we allow the LM to learn from the diverse tasks present in language. To our knowledge, this is the first work explicitly training an LM to reason generally from text, rather than on curated reasoning tasks or collections of reasoning tasks.”

Think “Quietly”

Researchers at Stanford University and Notbad AI, Inc. call their technique Quiet-STaR because it applies STaR “quietly.”

The method generates many internal thoughts in parallel, one at each token, to explain future text before responding to the prompt (this is the “thinking” step). When the model finally produces an answer, it mixes the predictions made with and without a rationale.
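As a rough sketch of that mixing step (hypothetical code; the weight `w` is hand-set here, whereas in Quiet-STaR it comes from the learned “mixing head” described below):

```python
def mix_predictions(p_base, p_thought, w):
    """Blend the next-token distribution predicted without a thought (p_base)
    with the one predicted after an internal thought (p_thought), using a
    mixing weight w in [0, 1]."""
    return [w * pt + (1 - w) * pb for pb, pt in zip(p_base, p_thought)]

p_base = [0.7, 0.2, 0.1]     # prediction with no internal thought
p_thought = [0.2, 0.7, 0.1]  # prediction after an internal thought
mixed = mix_predictions(p_base, p_thought, 0.5)
```

Because both inputs are probability distributions and the blend is convex, the mixed output is also a valid distribution.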

The REINFORCE algorithm is then applied: a reinforcement-learning method that uses samples from an episode to update the policy parameters, here including the start-of-thought and end-of-thought token embeddings. The researchers explain that this increases the likelihood that the model accurately predicts future text. As part of this, the model also discards rationales that make its predictions worse.
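The core REINFORCE idea can be illustrated with a tiny pseudo-loss. This is a sketch under stated assumptions, not the paper's exact objective: the reward for a thought is how much it improved the log-likelihood of the true future text, with the no-thought prediction serving as a baseline:

```python
def reinforce_credit(logp_future_with, logp_future_without, logp_thought):
    """REINFORCE-style pseudo-loss for one sampled thought. Thoughts that
    improve prediction of the true future text get positive reward (their
    log-probability is pushed up); harmful thoughts get negative reward and
    are effectively discarded over training."""
    reward = logp_future_with - logp_future_without  # baseline-subtracted
    # Gradient w.r.t. the thought's log-prob scales with the reward.
    return -reward * logp_thought

loss_helpful = reinforce_credit(-1.0, -2.0, -3.0)  # thought helped: reward +1
loss_harmful = reinforce_credit(-2.0, -1.0, -3.0)  # thought hurt: reward -1
```

Minimizing this loss raises the probability of helpful thoughts and lowers that of harmful ones, which matches the article's description of discarding bad rationales.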

“By iteratively optimizing these parameters, Quiet-STaR trains the model to generate more useful rationales throughout the training process,” the researchers wrote.

Because they aimed for generalist reasoning, the researchers used a zero-shot prompt (“Let’s think step by step”) without in-context examples. Quiet-STaR was applied to Mistral 7B using the OpenWebMath web-text dataset and the Colossal Clean Crawled Corpus.

“Quiet-STaR…allows the model to quietly think about each token and train a useful distribution,” the researchers wrote.

They add, “By training on rich inference tasks implicit in diverse web texts, rather than narrowly specializing in a specific dataset, Quiet-STaR points the way to more powerful and adaptable language models.”

Bridging the gap between models and human reasoning capabilities

Notably, the researchers created a parallel sampling algorithm that can generate rationales after every token in a string. Each generated token attends to itself, to the preceding tokens within its own thought, and to the preceding text, but not to tokens from other thoughts. This allows “all thoughts to be continued in parallel,” with each inference call generating one additional token for every thought.

The researchers introduced custom meta-tokens, <|startofthought|> and <|endofthought|>, to mark the beginning and end of each thought. Both were initialized with the embedding of a dash (“-”), which is often used to indicate a pause in text.

“Intuitively, the start-of-thought token can be understood as putting the model into ‘thinking mode,’ while the end-of-thought token tells the model when it has finished thinking,” the researchers explain.
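The initialization step is simple in spirit: copy the dash's embedding into the two new token slots so training starts from a "pause-like" representation. A toy sketch using a dict as the embedding table (real code would extend a model's embedding matrix):

```python
def add_thought_tokens(embeddings, dash_token="-"):
    """Add <|startofthought|> and <|endofthought|> to a toy embedding table,
    initializing both with a copy of the dash token's embedding. Copies (not
    references) let the two tokens diverge during training."""
    dash_vec = embeddings[dash_token]
    embeddings["<|startofthought|>"] = list(dash_vec)
    embeddings["<|endofthought|>"] = list(dash_vec)
    return embeddings

emb = add_thought_tokens({"-": [0.1, -0.2, 0.3]})
```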

The next step incorporates a so-called “mixing head,” a “shallow” multi-layer perceptron. It determines, retrospectively, how much of the next-token prediction made with a given thought should be incorporated into the current next-token prediction.
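A shallow MLP of this kind can be written in a few lines. This is a dependency-free sketch with toy weights, assuming one hidden layer with ReLU and a sigmoid output so the result is a valid mixing weight in (0, 1):

```python
import math

def mixing_head(hidden, w1, b1, w2, b2):
    """Toy "shallow" MLP: hidden-state vector -> scalar mixing weight.
    w1/b1 are the hidden layer (ReLU), w2/b2 the output layer (sigmoid)."""
    h = [max(0.0, sum(x * w for x, w in zip(hidden, row)) + b)
         for row, b in zip(w1, b1)]
    z = sum(x * w for x, w in zip(h, w2)) + b2
    return 1.0 / (1.0 + math.exp(-z))  # sigmoid keeps the weight in (0, 1)
```

The sigmoid output can then serve as the weight that blends the with-thought and without-thought next-token distributions.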

Finally, the researchers optimized the parameters to increase the likelihood of the true future text. Reinforcement techniques provide a “learning signal” to rationales based on their effect on future predictions. To help reduce variance, the researchers also introduced a “teacher forcing” trick, ensuring the network conditions on the true sequence rather than on its own predictions.
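Teacher forcing, in this context, means scoring each future token while conditioning on the ground-truth prefix instead of the model's own samples. A minimal sketch, where `model_logprob` is any stand-in function mapping (context, token) to a log-probability:

```python
def future_logprob_teacher_forced(model_logprob, context, true_future):
    """Sum the log-probability of the true future tokens, feeding the
    ground-truth prefix back in at every step (teacher forcing). Conditioning
    on real tokens, not sampled ones, reduces variance in the learning signal."""
    total, ctx = 0.0, list(context)
    for tok in true_future:
        total += model_logprob(ctx, tok)
        ctx.append(tok)  # extend the context with the TRUE token
    return total
```

The resulting score is exactly the kind of future-text log-likelihood the REINFORCE reward above is built from.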

Ultimately, “Quiet-STaR represents a step toward language models that can learn to reason in a general and scalable way,” the researchers concluded. “Future work can build on these insights to further close the gap between language models and human-like reasoning capabilities.”

