Forget Siri, Meet Octopus! How Nexa AI is Revolutionizing AI Agents
PLUS: Build your first AI Chatbot in less than 10 minutes
Howdy fellas!
Spark & Trouble are back in action, ready to crack open the latest AI wonders and show you how to leverage them for maximum impact.
Here's the scoop for this edition 👇
Nexa AI's Octopus model outperforms top LLMs in both accuracy & latency while performing actions
3 awesome AI Tools you JUST can't miss!
How you can build your first AI Chatbot in less than 10 minutes!
Time to jump in!
PS: Got thoughts on our content? Share 'em through a quick survey at the end of every edition. It helps us see how our product labs, insights & resources are landing, so we can make them even better.
Hot off the Wires 🔥
We're eavesdropping on the smartest minds in research. 🤫 Don't miss out on what they're cooking up! In this section, we dissect some of the juiciest tech research that holds the key to what's next in tech. ⚡
"What is AI Anyway?" 🤔 That's a deep question to ponder… What are we actually building towards? This is what Mustafa Suleyman (CEO of Microsoft AI) discussed in his recent TED Talk (yes, that was the title of the talk). A key takeaway from the talk was that LLMs today have achieved a pretty good level of IQ, & some amazing work is being done as far as EQ is concerned (think of the recent release of Hume AI). And now, the future lies in "AQ (Action Quotient)" - with AI actually "doing stuff" for you - like a super-helpful assistant on your phone.
And this is exactly where Nexa AI aims to shine, with its state-of-the-art on-device Octopus models, opening up a new paradigm of "Small Action Models (SAMs)".
Nexa AI's Octopus v2 model (source: HuggingFace)
Okay, first let's clarify some key terminology (we'll use these frequently going forward):
Function Calling: To perform actual actions (like "searching YouTube" or "setting an alarm"), language models need to interact with other tools or services. They do this via the APIs those tools expose, calling a function with the appropriate parameters to invoke the API. To know what's available, the model is given a list of these functions, along with their detailed descriptions & parameters (see the sketch after these definitions for a concrete example).
On-Device Models: These are AI models deployed directly on the device the user is interacting with. It's a constrained environment without GPUs or tons of memory, so you can't run really bulky models on-device.
Tokens: In the world of AI, tokens are like the beads on a necklace, each one representing a piece of language, like a word or part of a word. They help AI understand and generate text.
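To tie these definitions together, here's a minimal sketch of a function-calling round trip: the model reads a list of function descriptions, replies with a function name plus arguments, and your code executes the call. The function names, the JSON shape, and the "model output" below are purely illustrative (we hard-code the model's reply to keep the sketch runnable), not any particular vendor's API.

```python
import json

# 1) Functions the assistant is allowed to call, described for the model.
AVAILABLE_FUNCTIONS = [
    {
        "name": "set_alarm",
        "description": "Set an alarm at a given time.",
        "parameters": {"time": "HH:MM, 24-hour format"},
    },
    {
        "name": "search_youtube",
        "description": "Search YouTube and open the results.",
        "parameters": {"query": "free-text search string"},
    },
]

# 2) Concrete implementations the device actually executes.
def set_alarm(time: str) -> str:
    return f"Alarm set for {time}"

def search_youtube(query: str) -> str:
    return f"Opening YouTube results for '{query}'"

DISPATCH = {"set_alarm": set_alarm, "search_youtube": search_youtube}

def handle_model_output(model_output: str) -> str:
    """Parse the model's function call (name + arguments) and execute it."""
    call = json.loads(model_output)
    return DISPATCH[call["name"]](**call["arguments"])

# 3) In practice the model generates this after reading AVAILABLE_FUNCTIONS;
#    here it is hard-coded so the sketch runs standalone.
simulated_model_output = '{"name": "set_alarm", "arguments": {"time": "07:30"}}'
print(handle_model_output(simulated_model_output))  # -> Alarm set for 07:30
```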
So, what's new?
As of today, we've taken a few steps towards building such AI "agents" - stuff like LangChain's "Agent" module & MultiOn's "Agent" API are gaining a lot of traction. But most folks would agree that developing & working with such agents doesn't scale well. They rely on cloud-based models, which brings multiple drawbacks - privacy, a WiFi requirement, and cost (performing a single task may require multiple API calls, and each API call involves ~1000 tokens when function calling is involved).
Folks have started exploring smaller models, or SLMs, which can be deployed on edge computing devices. However, these come with their own set of challenges, including extensive tuning for function calling, lower inference speeds (no GPUs on edge devices) & battery drain on devices like iPhones (a big no-no!).
Did you know?
Energy consumption is roughly 0.1 J per token per billion parameters, so an SLM with 7B parameters, used for function calling (~1000 tokens per call), consumes about 700 J per call = ~1.4% of an iPhone's battery.
This means just 71 function calls will drain the battery fully!
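If you'd like to sanity-check that arithmetic, here's the back-of-the-envelope version. The ~0.1 J per token per billion parameters comes from above; the ~50 kJ iPhone battery capacity is our own assumption, inferred from the "1.4% per call" figure.

```python
# Rough energy math for on-device function calling (all values are estimates).
energy_per_token_per_billion_params = 0.1   # joules (figure quoted above)
model_size_billion_params = 7               # a 7B-parameter SLM
tokens_per_call = 1000                      # typical function-calling context
battery_capacity_joules = 50_000            # assumed ~13-14 Wh iPhone battery

energy_per_call = (
    energy_per_token_per_billion_params * model_size_billion_params * tokens_per_call
)
print(round(energy_per_call))                                     # 700 J per call
print(round(100 * energy_per_call / battery_capacity_joules, 1))  # ~1.4% of the battery
print(int(battery_capacity_joules / energy_per_call))             # ~71 calls to drain it
```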
To address these challenges, the amazing folks at Nexa AI recently released their Octopus v2 model - it's fast, accurate, and energy-efficient, making it perfect for powering future AI agents on all sorts of devices.
Under the hood…
So, what's the magic behind Octopus v2 being such a "desirable" model? 💪
(It's just a 2B parameter model, clocking 98%+ function calling accuracy while achieving a 95% reduction in context lengths - for iPhones, it means 37x more function calls)
Well, folks at Nexa came up with the idea of "functional tokens" - these are special tokens assigned to function names that are added to the model's vocabulary.
To understand this, think of the following:
Say, your phone has a function `get_weather_forecast()` that can fetch the weather at any location. You ask an LLM agent "What's the weather like in Seattle?"
Since this is real-time information that the LLM is not trained on, it will try to perform function calling. However, while predicting the function name, a usual LLM would have to predict multiple tokens (like `get`, `_`, `weather`, `_`, `forecast`), leaving leeway for inaccuracies to creep in, while also bloating the number of tokens.
Instead, Octopus v2 assigns the `get_weather_forecast()` function a new token (say, `<nexa_0>`) - now, when it wants to perform function calling, it needs to predict just this 1 token!
For a set of N functions that the model can leverage, this essentially turns the "function name prediction" problem into a "classification among N tokens" problem (much easier to solve as well).
TBH, although introducing new tokens into the model's vocabulary is not something "new" (we've seen that a lot back in the day with custom BERT models, etc.), its use in LLMs for function calling is definitely a smart move. This is exactly what "creativity in data science" is all about!
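If you're curious what that looks like mechanically, here's a rough sketch using the Hugging Face `transformers` library. This is only an illustration of the general "add tokens, resize embeddings, then fine-tune" recipe, not Nexa AI's actual training code; the `<nexa_end>` token is an illustrative stand-in for the early-stopping token mentioned in the details below.

```python
# Sketch: register functional tokens so each function name becomes ONE token.
from transformers import AutoModelForCausalLM, AutoTokenizer

BASE_MODEL = "google/gemma-2b"  # Octopus v2's base model (gated; requires HF access)

# One functional token per device function the agent can call (names illustrative).
functional_tokens = ["<nexa_0>", "<nexa_1>", "<nexa_2>"]
end_token = "<nexa_end>"  # illustrative name for the early-stopping token

tokenizer = AutoTokenizer.from_pretrained(BASE_MODEL)
tokenizer.add_special_tokens(
    {"additional_special_tokens": functional_tokens + [end_token]}
)

model = AutoModelForCausalLM.from_pretrained(BASE_MODEL)
model.resize_token_embeddings(len(tokenizer))  # new embedding rows for the new tokens

# After fine-tuning, a function like get_weather_forecast is represented by a
# single token id, so the model effectively classifies among N functional tokens.
print(tokenizer.convert_tokens_to_ids("<nexa_0>"))
```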
Here are some more details about Octopus v2 worth knowing:
The base model is Google's Gemma-2B
A prompt template is used that facilitates single, parallel & nested function calls
It implements "early stopping" through the use of another special token, which drastically reduces the context length (basically, the model stops generating as soon as it emits this token) - see the inference sketch below
The training dataset was synthetically created using Google's Gemini, covering 20 Android APIs across System APIs (calling, texting, etc.), App APIs (preinstalled Google Apps) & Smart Device APIs. It included negative sampling as well as a verification mechanism.
Dataset generation process for Octopus v2 (source: Octopus v2 paper)
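To make the prompt-template & early-stopping points concrete, here's a rough inference-time sketch using the standard Hugging Face `transformers` API. The prompt wording, repo id, and the `<nexa_end>` token name are approximations of what the paper describes; double-check the exact template on the official model card before relying on this.

```python
# Sketch: generating a function call with Octopus v2 and stopping at the
# end-of-call token (prompt and token names approximate; verify on the model card).
from transformers import AutoModelForCausalLM, AutoTokenizer

MODEL_ID = "NexaAIDev/Octopus-v2"  # public Hugging Face repo (verify the id)

tokenizer = AutoTokenizer.from_pretrained(MODEL_ID)
model = AutoModelForCausalLM.from_pretrained(MODEL_ID)

query = "What's the weather like in Seattle?"
prompt = (
    "Below is a query from a user. Call the correct function with the right "
    f"parameters.\n\nQuery: {query}\n\nResponse:"
)

inputs = tokenizer(prompt, return_tensors="pt")
outputs = model.generate(
    **inputs,
    max_new_tokens=64,
    # Decoding halts as soon as the model emits the end-of-call token - this is
    # the "early stopping" that keeps the generated context so short.
    eos_token_id=tokenizer.convert_tokens_to_ids("<nexa_end>"),
)
print(tokenizer.decode(outputs[0][inputs["input_ids"].shape[1]:]))
# Expected shape of the output: <nexa_N>('Seattle') <nexa_end>
```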
Why does this matter?
Octopus v2 achieved phenomenal results when compared to the top on-cloud LLMs & SLMs that have looked like promising candidates for AI agents so far - both in terms of function call accuracy & latency:
| Model | Accuracy | Latency |
|---|---|---|
| [Meta] LLaMA + RAG | 68.1% | 13.46s |
| [OpenAI] GPT-3.5 + RAG | 98.1% | 1.97s |
| [OpenAI] GPT-4 | 98.5% | 1.02s |
| [Microsoft] Phi-3 | 45.7% | 10.2s |
| [Apple] OpenELM | (unable to generate function calls) | 50.78s |
| [Nexa AI] Octopus v2 | 99.0% | 0.37s 🤯 |
Moreover, the Nexa folks also extended Octopus v2 to use 3rd-party app APIs, like those of DoorDash & Yelp, and observed similar results, which is actually great! The best part? All of this is open source, with their model publicly available through HuggingFace.
Imagine Siri and Google Assistant (or maybe a new competitor) on steroids, automating app workflows with this tech!
Not just assistants, this tech could eventually become a brainy sidekick for wearables like smartwatches.
Augmented & Virtual Reality are other exciting avenues - Unity's already dipped its toes in AR/VR with Octopus & it looks mind-blowing!
Nexa AI's Octopus model transforms VR/AR experiences with on-device AI. Take a look at our demo below!
Highlights:
Compatibility: Smooth operation on VR headsets like Meta Quest 2.
Offline: Octopus runs entirely on-device, no internet needed.
⚡ Rapid Inference:… twitter.com/i/web/status/1…
- Nexa AI (@nexa4ai), 5:02 PM • Apr 29, 2024
Wait! There's more to this madness…
Folks at Nexa seem to have many more tricks up their sleeve!
Within a few weeks of releasing Octopus v2, they announced Octopus v3 - a multimodal AI agent on par with (GPT-4 + GPT-4V) for function calling, optimized to a size of less than 1B parameters, and able to process multilingual queries. It is compatible with several edge devices, including something as constrained as a Raspberry Pi. The model is currently under research & has not been made public.
Introducing OctopusV3! The smallest, most powerful on-device multimodal model for super AI agents - fast, accurate, energy-efficient
Highlights:
Compact size: Less than 1B parameters
Multimodal: Processes both text and images for function calling
High… twitter.com/i/web/status/1…
- Nexa AI (@nexa4ai), 2:20 AM • Apr 18, 2024
And they didn't stop there! They extended the concept of "functional tokens" to call specialized LLMs, interconnected through a graph, creating Octopus v4 as an orchestrator for this graph. This framework can address scalability challenges as well as deliver faster & "greener" inference.
Octopus v4 can identify the most relevant GPT and transform the initial query into a format best suited for the selected GPT (source: Octopus v4 paper)
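Here's a toy sketch of that orchestration idea: an orchestrator maps the user's query to a functional token, and the token routes a reformatted query to a specialized worker model. The worker names, tokens, and keyword-based routing below are stand-ins for illustration; the real Octopus v4 learns to predict the routing token and rewrite the query itself.

```python
# Toy sketch of functional-token routing across a graph of specialized models.
from dataclasses import dataclass
from typing import Callable

@dataclass
class WorkerNode:
    name: str
    answer: Callable[[str], str]  # stand-in for a call to a specialized LLM

# Graph of specialized workers, keyed by the orchestrator's functional tokens.
WORKER_GRAPH = {
    "<nexa_math>": WorkerNode("math-specialist", lambda q: f"[math model answers: {q}]"),
    "<nexa_code>": WorkerNode("code-specialist", lambda q: f"[code model answers: {q}]"),
}

def orchestrate(query: str) -> str:
    # The real orchestrator *predicts* the functional token and rewrites the
    # query; here both steps are faked with a simple keyword heuristic.
    token = "<nexa_math>" if "integral" in query.lower() else "<nexa_code>"
    reformatted_query = f"Answer precisely: {query}"
    worker = WORKER_GRAPH[token]
    return f"routed via {token} to {worker.name} -> {worker.answer(reformatted_query)}"

print(orchestrate("What is the integral of x^2?"))
```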
Spark and Trouble are super excited to see how this vision unfolds. Imagine the possibilities when different LLMs with unique strengths collaborate seamlessly - a truly intelligent and interconnected future awaits!
10x Your Workflow with AI
Work smarter, not harder! In this section, you'll find prompt templates & bleeding-edge AI tools to free up your time.
Fresh Prompt Alert! 🚨
Spark & Trouble know a killer landing page is the first impression that converts clicks to customers. But crafting compelling copy can feel like staring at a blank canvas.
Fear not! This week's prompt is here to bridge the gap between your brilliant product and persuasive prose. Try it out & see for yourself!
You are a seasoned copywriter experienced in writing high-converting landing page copy for some of the world's top products.
Help me create a persuasive landing page for [my product/service] that converts visitors into customers. Here are some details about my product/service:
[insert details such as benefits, unique selling points, target audience, etc.].
3 AI Tools You JUST Can't Miss 🤩
💻 Replit - Code in your browser, instantly - from Python to Unity - code, share & deploy projects
💬 Momento AI - Lets you create your AI companion (or chat with existing ones) for real conversations and 24/7 support.
🎥 Fliki - Your AI partner for video creation from text, blogs, tweets & much more!
Spark 'n' Trouble Shenanigans
Do you wish to create an AI-powered chatbot, but doubt your skills? Or are you just thinking, who's got the time?
Well, Spark & Trouble have you covered…
Now you can build advanced AI chatbots visually in minutes, powered by GPT and Claude, and add them to any website! Without any coding! 😮
Head over to Chatling, create an account & create your first AI chatbot for FREE.
Creating an AI Chatbot using the Chatling builder (source: Instagram)
Here's a quickstart tutorial for you to get a feel for this awesome platform.
Check it out & do share your experience with us! If you end up building one & showcasing it on social media, don't forget to tag us!
Well, that's a wrap! Until then,