Reinforced Self-Training (ReST) meets a ReAct-style LLM agent for reasoning and acting on external data, from Google, applied here to medicine: an AI agent that self-improves and self-fine-tunes.
Reward-model policy optimization and ranking code now collapses into a simple prompt: prompt engineering continues into 2024!
Advanced Local LLM Update Mechanism
The described system introduces a mechanism for overnight self-updating of local Large Language Models (LLMs), such as those running on Mac Mini or Mac Studio (192 GB unified memory) devices. Users can instruct these LLMs to autonomously update themselves on specific topics, ranging from niche medical fields to newly approved medications. The process culminates in the early morning with a downsized yet highly focused LLM variant for mobile deployment, giving users the latest specialized information at their fingertips.
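The overnight pipeline described above can be sketched as three stages: fetch fresh material on the requested topic, fine-tune the local model on it, and distill a smaller mobile variant. The sketch below is purely illustrative; every function and name (`fetch_new_documents`, `fine_tune`, `distill_for_mobile`, the topic string) is a hypothetical stand-in, not an API from the actual system.

```python
from dataclasses import dataclass

@dataclass
class ModelSnapshot:
    """Stand-in for local model weights; a real system would hold tensors."""
    name: str
    params_billion: float
    topics: tuple

def fetch_new_documents(topic):
    # Hypothetical retrieval step: in practice this would query search
    # APIs or curated feeds for fresh material on the topic.
    return [f"{topic}: new finding #{i}" for i in range(3)]

def fine_tune(model, documents):
    # Placeholder for the overnight fine-tuning run on the fetched data.
    new_topic = documents[0].split(":")[0]
    return ModelSnapshot(model.name, model.params_billion,
                         model.topics + (new_topic,))

def distill_for_mobile(model, target_params_billion=2.0):
    # Compress the updated model into a small variant for phone deployment,
    # mirroring the 2B-7B parameter range mentioned below.
    return ModelSnapshot(model.name + "-mobile", target_params_billion,
                         model.topics)

base = ModelSnapshot("local-llm", 70.0, ())
docs = fetch_new_documents("new anticoagulant dosing")
updated = fine_tune(base, docs)
mobile = distill_for_mobile(updated)
print(mobile.name, mobile.params_billion)  # local-llm-mobile 2.0
```

The point of the structure is that each stage only depends on the previous stage's output, so the whole chain can run unattended on a schedule.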
This development is part of Google's ongoing research pipeline, with a possible public release in the first quarter of 2024. A notable aspect is the performance leap, quantitatively illustrated by an increase from 70% to 77%. This improvement is benchmarked against a challenging dataset curated by Google, specifically designed to test the limits of search engines. The self-distilled, much smaller LLMs, with a reduced parameter count (between two and seven billion), exhibit significant performance boosts when self-updated with cutting-edge data, showing a near-linear enhancement in capability.
Google merges a ReAct-style LLM agent, adept at reasoning and interacting with external data, with a Reinforced Self-Training (ReST) approach. This hybrid facilitates iterative training and self-enhancement of the LLM. Key features of the methodology include AI-generated feedback, creation of synthetic datasets for AI appraisal, and comprehensive model fine-tuning. The framework enables a continuous self-improvement cycle in which the LLM progressively refines its responses and learning efficiency.
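The ReAct side of that combination is a loop in which the model alternates free-text reasoning ("Thought"), tool calls ("Action"), and tool results ("Observation") until it commits to an answer. The sketch below shows only that control flow under toy assumptions: the `search` tool returns a canned string and `scripted_llm` is a hard-coded stand-in for a real model, not anything from Google's implementation.

```python
def react_agent(question, tools, llm_step, max_steps=5):
    """Minimal ReAct-style loop: reason, act on external data, observe,
    repeat until the policy emits a final answer."""
    transcript = f"Question: {question}\n"
    for _ in range(max_steps):
        thought, action, arg = llm_step(transcript)
        transcript += f"Thought: {thought}\n"
        if action == "finish":
            transcript += f"Answer: {arg}\n"
            return arg, transcript
        observation = tools[action](arg)  # call out to external data
        transcript += f"Action: {action}[{arg}]\nObservation: {observation}\n"
    return None, transcript

# Toy tool and scripted "LLM" purely to exercise the control flow.
def search(query):
    return "dabigatran is a direct thrombin inhibitor"  # canned result

def scripted_llm(transcript):
    if "Observation" not in transcript:
        return ("I should look this up", "search", "dabigatran drug class")
    return ("The observation answers it", "finish",
            "direct thrombin inhibitor")

answer, trace = react_agent("What class of drug is dabigatran?",
                            {"search": search}, scripted_llm)
print(answer)  # direct thrombin inhibitor
```

Because each step appends to a single transcript, the full reasoning-and-action trace is available afterwards, which is exactly what the ReST side needs as training data.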
The technical process has the LLM sourcing and synthesizing new data overnight, then generating and refining responses based on that data. These responses undergo evaluation and reinforcement learning, driving iterative model enhancement. The outcome is an LLM that is not only more adept at specific tasks but also able to produce compact, up-to-date models for mobile devices. This represents a significant step in AI development: localized, autonomous learning and adaptation, tailored to user-specified domains.
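The generate-evaluate-reinforce cycle above follows the ReST pattern: a "grow" step samples candidate outputs from the current policy, AI-generated feedback filters them, and an "improve" step fine-tunes on the survivors. The toy sketch below compresses model competence into a single number just to show why the loop can only move quality upward; the functions and the threshold rule are illustrative assumptions, not the paper's actual algorithm.

```python
import random

def generate_answers(prompt, model_quality, n=8):
    # "Grow" step: sample candidate answers from the current policy.
    # A toy scalar plus noise stands in for real model outputs.
    return [model_quality + random.gauss(0, 0.1) for _ in range(n)]

def ai_rank(answers, threshold):
    # AI-generated feedback: keep only answers an evaluator scores above
    # a threshold (here the score IS the answer, for simplicity).
    return [a for a in answers if a >= threshold]

def fine_tune_on(model_quality, kept):
    # "Improve" step: fine-tuning on the filtered set nudges the policy
    # toward its own best outputs; never below current quality.
    if not kept:
        return model_quality
    return 0.5 * model_quality + 0.5 * (sum(kept) / len(kept))

random.seed(0)
quality = 0.70  # mirrors the ~70% starting point cited above
for _ in range(4):  # iterative self-improvement rounds
    answers = generate_answers("compositional question", quality)
    kept = ai_rank(answers, threshold=quality)  # keep above-par attempts
    quality = fine_tune_on(quality, kept)
print(round(quality, 3))
```

Since only above-threshold samples survive filtering, the averaged update can never decrease `quality`, which is the intuition behind the monotone gains the post describes.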
#ai
#airesearch
#reasoning