Forget Siri, Meet Octopus! How Nexa AI is Revolutionizing AI Agents

PLUS: Build your first AI Chatbot in less than 10 minutes

Howdy fellas!

Spark & Trouble are back in action, ready to crack open the latest AI wonders and show you how to leverage them for maximum impact.

Here’s the scoop for this edition 👀

  • Nexa AI’s Octopus model outperforms top LLMs in both accuracy & latency while performing actions

  • 3 awesome AI Tools you JUST can't miss!

  • How you can build your first AI Chatbot in less than 10 minutes!

Time to jump in!😄

PS: Got thoughts on our content? Share 'em through a quick survey at the end of every edition. It helps us see how our product labs, insights & resources are landing, so we can make them even better.

Hot off the Wires 🔥

We're eavesdropping on the smartest minds in research. 🤫 Don't miss out on what they're cooking up! In this section, we dissect some of the juiciest tech research that holds the key to what's next in tech.⚡

“What is AI Anyway?” 🤔 That’s a deep question to ponder… What are we actually building towards? This is what Mustafa Suleyman (CEO of Microsoft AI) discussed in his recent TED Talk (yes, that was the title of the talk). A key takeaway from the talk was that LLMs today have achieved a pretty good level of IQ, & some amazing work is being done as far as EQ is concerned (think of the recent release of Hume AI). And now, the future lies in “AQ (Action Quotient)” - with AI actually “doing stuff” for you - like a super-helpful assistant on your phone.

And this is exactly where Nexa AI aims to shine, with its state-of-the-art on-device Octopus models, opening up a new paradigm of “Small Action Models (SAMs)”.

Nexa AI’s Octopus v2 model (source: HuggingFace)

Okay, first let’s clarify some key terminology (we’ll use these terms frequently going forward):

Function Calling: To perform real actions (like “searching YouTube” or “setting an alarm”), language models need to interact with external tools. They do this through the APIs those tools provide, calling functions with appropriate parameters. To know which functions are available, the model is given a list of them, along with detailed descriptions & parameters.
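To make this concrete, here’s a minimal sketch of the function-calling loop. The schema format, function name (`set_alarm`), and fields below are illustrative, not any specific vendor’s API - the common pattern is: describe functions to the model, get back a structured call, parse it, and dispatch.

```python
import json

# Hypothetical function schema, in the style commonly passed to LLMs.
# Names and fields here are illustrative, not a specific vendor's format.
FUNCTIONS = [
    {
        "name": "set_alarm",
        "description": "Set an alarm on the device.",
        "parameters": {
            "time": "Alarm time in HH:MM (24-hour) format",
            "label": "Optional text label for the alarm",
        },
    }
]

def execute_call(model_output: str) -> str:
    """Parse the model's JSON function call and dispatch it."""
    call = json.loads(model_output)
    if call["name"] == "set_alarm":
        args = call["arguments"]
        return f"Alarm set for {args['time']} ({args.get('label', 'no label')})"
    raise ValueError(f"Unknown function: {call['name']}")

# Imagine the LLM, given FUNCTIONS and the query "Wake me at 7", emitted:
model_output = '{"name": "set_alarm", "arguments": {"time": "07:00", "label": "wake up"}}'
print(execute_call(model_output))  # Alarm set for 07:00 (wake up)
```

The key point: the model never executes anything itself - it only emits a structured request, and your code does the actual work.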

On-Device Models: These are AI models deployed directly on the device the user is interacting with. It’s a constrained environment without GPUs or tons of memory - so you can’t run really bulky models on-device.

Tokens: In the world of AI, tokens are like the beads on a necklace, each one representing a piece of language, like a word or part of a word. They help AI understand and generate text.

So, what’s new?

As of today, we’ve taken a few steps towards building such AI “agents” - stuff like LangChain’s “Agent” module & MultiOn’s “Agent” API is gaining a lot of traction. But developing & working with such agents doesn’t scale well. They rely on cloud-based models, which come with multiple drawbacks - privacy concerns, the need for an internet connection, and cost (performing a single task may require multiple API calls, and each call can involve ~1000 tokens of context when function calling is involved).

Folks have started exploring smaller models, or SLMs, which can be deployed on edge computing devices. However, that too has its own set of challenges, including extensive tuning for function calling, lower inference speeds (no GPUs on edge devices) & battery drain on devices like iPhones (a big no-no!).

Did you know?

Energy consumption reaches ~0.1 J per token for 1-billion-parameter models, so even an SLM with 7B parameters, used for function calling (~1000 tokens), consumes 700 J per call ≈ 1.4% of an iPhone’s battery.
That means just 71 function calls would fully drain the battery!
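A quick back-of-the-envelope check of those numbers (the ~50 kJ iPhone battery capacity is our ballpark assumption for a ~13 Wh pack; the other figures come from the estimate above):

```python
# Sanity-checking the battery math above
ENERGY_PER_TOKEN_PER_B = 0.1   # joules per token per billion parameters
PARAMS_B = 7                   # 7B-parameter SLM
TOKENS_PER_CALL = 1000         # typical function-calling context
BATTERY_J = 50_000             # assumed iPhone battery: ~13.9 Wh ≈ 50 kJ

energy_per_call = ENERGY_PER_TOKEN_PER_B * PARAMS_B * TOKENS_PER_CALL
print(round(energy_per_call))                        # 700 J per call
print(round(100 * energy_per_call / BATTERY_J, 1))   # 1.4 (% of battery)
print(int(BATTERY_J // energy_per_call))             # 71 calls to empty
```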

To address these challenges, the amazing folks at Nexa AI recently released their Octopus v2 model - it’s fast, accurate, and energy-efficient, making it perfect for powering future AI agents on all sorts of devices.

Under the hood…

So, what’s the magic behind Octopus v2 being such a ‘desirable’ model? 🪄
(It’s just a 2B parameter model, clocking 98%+ function calling accuracy while achieving 95% reduction in context lengths - for iPhones, it means 37x more function calls)

Well, folks at Nexa came up with the idea of “functional tokens” - these are special tokens assigned to function names that are added to the model’s vocabulary.

To understand this, think of the following:

Say, your phone has a function get_weather_forecast() that can fetch the weather at any location. You ask an LLM agent “What’s the weather like in Seattle?”

Since this is real-time information that the LLM isn’t trained on, it will try to perform function calling. However, while predicting the function name, a typical LLM has to predict multiple tokens (like get, _, weather, _, forecast), leaving leeway for inaccuracies to creep in while also bloating the token count.

Instead, Octopus v2 assigns the get_weather_forecast() function a new token (say, <nexa_0>) ⇒ now, when it wants to perform a function call, it needs to predict just this 1 token!

For a set of N functions the model can leverage, this essentially turns the “function name prediction” problem into a “classification among N tokens” problem (which is much easier to solve).
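Here’s a minimal sketch of the functional-token bookkeeping. The token names (`<nexa_i>`), the function list, and the base vocabulary size (roughly Gemma-scale) are all illustrative assumptions - the real model learns to emit these ids during fine-tuning.

```python
# Each callable function gets ONE dedicated token appended past the base
# vocabulary, so picking a function is a single-token classification
# instead of multi-token name generation.
BASE_VOCAB_SIZE = 256_000  # assumed, roughly Gemma-scale

functions = ["get_weather_forecast", "set_alarm", "search_youtube"]

functional_tokens = {}   # function name -> (token string, token id)
token_to_function = {}   # token id -> function name
for i, fn in enumerate(functions):
    tok_id = BASE_VOCAB_SIZE + i
    functional_tokens[fn] = (f"<nexa_{i}>", tok_id)
    token_to_function[tok_id] = fn

# At inference, the model emits a single token id; mapping it back to a
# function name is just a dictionary lookup.
predicted_token_id = BASE_VOCAB_SIZE + 0  # the model "classified" the query
print(token_to_function[predicted_token_id])  # get_weather_forecast
```

Compare this with the six tokens a subword tokenizer would need for the same name - one token per function is where the context-length savings come from.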

TBH, although introducing new tokens into the model’s vocabulary is not something ‘new’ (we’ve seen that a lot back in the day with custom BERT models, etc.), its use in LLMs for function calling is definitely a smart move. This is exactly what “creativity in data science” is all about!

Here are some more details about Octopus v2 worth knowing:

  • The base model is Gemma-2B

  • A prompt template is used that facilitates single, parallel & nested function calls

  • It implements “early stopping” through the use of another special token, which drastically reduces the context length (basically, the model stops processing when it encounters this token)

  • The training dataset was synthetically created using Google’s Gemini, to consider 20 Android APIs across System APIs (calling, texting, etc.), App APIs (preinstalled Google Apps) & Smart Device APIs. It included negative sampling as well as a verification mechanism.
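The “early stopping” bullet is worth a quick sketch. The stop-token name below is made up (the paper defines its own special token), and the token stream is a stand-in for a real model - the point is simply that decoding halts the moment the special token appears, so no trailing tokens are processed or paid for.

```python
# Minimal sketch of early stopping via a special end-of-call token.
END_TOKEN = "<nexa_end>"  # illustrative name, not the paper's actual token

def generate_tokens():
    # stand-in for a model's token stream: a function call, the stop
    # token, then tokens that should never be reached
    yield from ["<nexa_0>", "(", "'Seattle'", ")", END_TOKEN, "junk", "junk"]

def decode_with_early_stop(stream) -> str:
    out = []
    for tok in stream:
        if tok == END_TOKEN:   # stop immediately; skip everything after
            break
        out.append(tok)
    return "".join(out)

print(decode_with_early_stop(generate_tokens()))  # <nexa_0>('Seattle')
```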

Dataset generation process for Octopus v2 (source: Octopus v2 paper)

Why does this matter?

Octopus v2 achieved phenomenal results when compared against the top on-cloud LLMs & SLMs that have been the leading candidates for AI agents so far - both in terms of function-call accuracy & latency. The models compared:

  • [Meta] LLaMA + RAG

  • [OpenAI] GPT-3.5 + RAG

  • [OpenAI] GPT-4

  • [Microsoft] Phi-3

  • [Apple] OpenELM (unable to generate function calls)

  • [Nexa AI] Octopus v2 - latency of just 0.37s 🤯

Moreover, Nexa folks also extended Octopus v2 to use 3rd party app APIs, like those of DoorDash & Yelp, and observed similar results, which is actually great! The best part? All this is open source, with their model publicly available through HuggingFace.

Imagine Siri and Google Assistant (or maybe a new competitor) on steroids, automating app workflows with this tech!

Not just assistants, this tech could eventually become a brainy sidekick for wearables like smartwatches.

Augmented & Virtual Reality are other exciting avenues - Unity's already dipped its toes in AR/VR with Octopus & it looks mind-blowing!

Wait! There’s more to this madness…

Folks at Nexa seem to have many more tricks up their sleeve!

Within a few weeks of releasing Octopus v2, they announced Octopus v3 - a multimodal AI agent on par with (GPT-4 + GPT-4V) for function calling, optimized to a size of less than 1B parameters and able to process multilingual queries. It is compatible with several edge devices, including something as constrained as a Raspberry Pi. The model is currently under research & hasn’t been made public.

And they didn’t stop there! They extended the concept of “functional tokens” to call specialized LLMs, interconnected through a graph, with Octopus v4 acting as the orchestrator of this graph. This framework can address scalability challenges as well as deliver faster & ‘greener’ inference.
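To get a feel for what such an orchestrator does, here’s a toy routing sketch. The specialist names and the keyword-based router are purely illustrative assumptions - Octopus v4 learns this routing via functional tokens rather than hand-written rules - but the shape is the same: classify the query, pick the best-suited model, and hand the query over.

```python
# Toy orchestrator: route a query to the most relevant specialist model.
# Specialists here are stand-in functions, not real LLM calls.
SPECIALISTS = {
    "math": lambda q: f"[math model] solving: {q}",
    "code": lambda q: f"[code model] writing code for: {q}",
    "general": lambda q: f"[general model] answering: {q}",
}

def route(query: str) -> str:
    """Crude keyword router standing in for learned functional-token routing."""
    q = query.lower()
    if any(w in q for w in ("integral", "solve", "equation")):
        domain = "math"
    elif any(w in q for w in ("python", "function", "bug")):
        domain = "code"
    else:
        domain = "general"
    return SPECIALISTS[domain](query)

print(route("Solve this equation: x^2 = 9"))
# [math model] solving: Solve this equation: x^2 = 9
```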

Octopus v4 can identify the most relevant GPT and transform the initial query into a format best suited for the selected GPT (source: Octopus v4 paper)

Spark and Trouble are super excited to see how this vision unfolds. Imagine the possibilities when different LLMs with unique strengths collaborate seamlessly – a truly intelligent and interconnected future awaits!

10x Your Workflow with AI 📈

Work smarter, not harder! In this section, you’ll find prompt templates 📜 & bleeding-edge AI tools ⚙️ to free up your time.

Fresh Prompt Alert!🚨

Spark & Trouble know a killer landing page is the first impression that converts clicks to customers. But crafting compelling copy can feel like staring at a blank canvas.

Fear not! This week's prompt is here to bridge the gap between your brilliant product and persuasive prose. Try it out & see for yourself 👇

You are a seasoned copywriter experienced in writing high-converting landing page copy for some of the world’s top products.

Help me create a persuasive landing page for [my product/service] that converts visitors into customers. Here are some details about my product/service:
[insert details such as benefits, unique selling points, target audience, etc.].

* Replace the content in brackets with your details

3 AI Tools You JUST Can't Miss 🤩

  • 💻 Replit - Code in your browser, instantly - from Python to Unity - code, share & deploy projects

  • 😊 Momento AI - Lets you create your AI companion (or chat with existing ones) for real conversations and 24/7 support.

  • 🎥 Fliki - Your AI partner for video creation from text, blogs, tweets & much more!

Spark 'n' Trouble Shenanigans 😜

Do you wish to create an AI-powered chatbot, but doubt your skills? Or wondering who’s got the time?

Well, Spark & Trouble have you covered…

Now you can build advanced AI chatbots visually in minutes, powered by GPT and Claude, and add them to any website! Without any coding! 😮

Head over to Chatling, create an account & create your first AI chatbot for FREE.

Creating an AI Chatbot using Chatling Builder (source: Instagram)

Here’s a quickstart tutorial for you to get a feel for this awesome platform.

Check it out & do share your experience with us! If you end up building one & showcasing it on social media, don’t forget to tag us 😁

Well, that’s a wrap!
Thanks for reading 😊

See you next week with more mind-blowing tech insights 💻

Until then,
Stay Curious🧠 Stay Awesome🤩

PS: Do catch us on LinkedIn - Sandra & Tezan
