The Vision, Debugged
How Microsoft’s Phi-3 Redefines Power in Your Pocket
PLUS: We Tried Moshi’s Real-Time Voice AI—Here’s Why It Blew Our Minds!
Howdy fellas!
While large language models (LLMs) dominated 2023, much of the focus in 2024 has shifted toward developing smaller models that can rival these giants of the AI space. In this edition, Spark and Trouble uncover some of the impressive evolution behind compact AI models that could one day fit snugly in your pocket, and much more!
Here’s a sneak peek into today’s edition 👀
💵 This AI prompt could lead to your next investment win. Check it out!
🪄 5 awesome AI tools to skyrocket your productivity
🤖 What makes Microsoft’s Phi-3 models punch so high above their weight category?
🔊 Can Moshi’s Real-Time Voice AI Live Up to the Hype?
Time to jump in!😄
PS: Got thoughts on our content? Share 'em through a quick survey at the end of every edition. It helps us see how our product labs, insights & resources are landing, so we can make them even better.
10x Your Workflow with AI 📈
Work smarter, not harder! In this section, you’ll find prompt templates 📜 & bleeding-edge AI tools ⚙️ to free up your time.
Fresh Prompt Alert!🚨
Ready to turn your investment dreams into reality? 🚀💰
This week's Fresh Prompt Alert is your golden ticket to financial wizardry! Say goodbye to aimless scrolling through stock tickers and hello to personalized, pro-level market insights. Whether you're a Wall Street wolf or a cautious kitten, this prompt will have you navigating market trends like a boss.
Who knows? Your next big investment might be just a prompt away! 👇🏼
You are an experienced Financial Analyst specializing in identifying solid investment opportunities from thorough research of the [country] market, personalized to your client.
Please conduct research and analysis to identify current market trends and potential investment opportunities in [specific industry or sector].
Following are some relevant details that may be helpful for your analysis:
1. [My investment goals]
2. [My risk tolerance]
3. [My time horizon]
Also, please provide me with a report that includes your findings and recommendations.
5 AI Tools You JUST Can't Miss 🤩
📡 ReviewRadar: Figure out what users want, fast, with AI-powered Customer-Review Analysis
📖 Inncivio: An AI-Powered Learning Infrastructure for Businesses
🎙️ AudioNotes AI: Transform Your Thoughts into Clear Text Notes
🧱 Bricks: AI-powered spreadsheets for effortless reports
👨🏫 GuruBase: The tech world’s short-cut search
Hot off the Wires 🔥
We're eavesdropping on the smartest minds in research. 🤫 Don't miss out on what they're cooking up! In this section, we dissect some of the juiciest tech research that holds the key to what's next in tech.⚡
Remember when having a calculator on your phone was mind-blowing? Well, in this AI-powered era, we've come a long way from basic arithmetic. Now, it's all about having increasingly capable AI models right in your pocket!
And Microsoft just raised the bar with their latest breakthrough: the Phi-3 family of Small Language Models (SLMs), which pack some serious punch on the go!
After experiencing the Phi-3 series of SLMs… (source: Gif by bacheloretteau on Giphy)
Forging the fundamentals
As language models evolve, two paths have emerged: massive models with hundreds of billions of parameters, racing toward AGI; and efficient small models that pack impressive power into smartphones and wearables.
A glimpse into the journey of small language models (source: SLM survey paper)
The vision behind SLMs? Democratizing machine intelligence. And Microsoft is leading the charge with the Phi-3 family, built to bring machine intelligence to everyone, anywhere, anytime.
Before we dive into Microsoft's latest creation, let's take a quick look at where Small Language Models stand today. Despite their compact size, SLMs are proving powerful:
Data Quality Over Quantity: Training data quality is proving to be super critical for SLM capability.
Overtraining FTW: SLMs are typically trained on massive amounts of data (>1.5T tokens), regardless of their size. This "overtraining" helps pack more power into smaller packages.
Closing the Gap: Open-source SLMs are catching up to their closed-source counterparts in common sense tasks, though complex reasoning remains a challenge.
Hardware Matters: SLM performance isn't just about size – it's also about how well the model architecture aligns with the hardware it's running on.
Memory Footprint: Generally, a model's memory usage scales linearly with its parameter count.
It's in this context that Microsoft's Phi-3 family enters the scene, ready to push the boundaries even further.
But before we dive into the nitty-gritty, let's decode some tech jargon:
Small Language Models (SLMs): A new frontier of compact AI models with capabilities approaching those of their larger counterparts. How small? From a few hundred million to ~10B parameters (compared to a rumored ~1.8T in GPT-4). Examples include Gemma, Mistral, the Phi series, Llama-3, etc.
Attention: This technique is used in language models to help the model focus on different parts of the input text when making predictions. Think of it like a teacher who pays attention to different students based on who is asking questions. This helps the model understand context better.
Quantization: It's like putting your AI on a diet, switching from heavy 32-bit numbers to lighter 8-bit ones (or even smaller). Slimmer, faster, and more efficient: perfect for running on devices with limited resources.
Mixture of Experts (MoE): Picture a team of specialists, each an expert in their field, with a smart gatekeeper deciding who tackles each problem as it comes in. That's MoE in action! Super-efficient, even on complex tasks.
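To make quantization concrete, here's a minimal toy sketch of symmetric 8-bit quantization in plain Python. It's illustrative only: real low-bit schemes (like the 4-bit quantization used for phi-3-mini) are more sophisticated, with group-wise scales and outlier handling.

```python
# Toy symmetric int8 quantization: one shared scale for all weights.

def quantize_int8(weights):
    """Map floats to signed 8-bit integers using one shared scale."""
    scale = max(abs(w) for w in weights) / 127  # largest magnitude maps to 127
    return [round(w / scale) for w in weights], scale

def dequantize(q, scale):
    """Recover approximate float weights from the int8 values."""
    return [v * scale for v in q]

weights = [0.42, -1.27, 0.003, 0.9]       # made-up example weights
q, scale = quantize_int8(weights)
restored = dequantize(q, scale)
print(q)                                   # -> [42, -127, 0, 90]
max_err = max(abs(a - b) for a, b in zip(weights, restored))
print(round(max_err, 4))                   # error stays within half a step
```

Each int8 value takes a quarter of the memory of a float32, at the cost of a small, bounded rounding error per weight.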
So, what’s new?
Phi-3 is part of Microsoft’s Phi series, which started with Phi-1 (introduced in the seminal paper “Textbooks Are All You Need”), an AI superstar in coding tasks, and evolved into Phi-2, which showcased superior reasoning and language understanding.
Now, the Phi-3 and Phi-3.5 families are here, and they’re not just competing: they’re punching way above their weight class (against models >10x their size) in tasks like language understanding, coding, math, and reasoning.
| Model | Parameters |
| --- | --- |
| phi-3-mini | 3.8B |
| phi-3-small | 7B |
| phi-3-medium | 14B |
Here's the kicker: The phi-3-mini model can be quantized to as little as 4 bits, making it lightweight enough to deploy on a smartphone. That’s less than 2 GB of memory—an absolute feat of engineering & data science!
4-bit quantized phi-3-mini running natively on an iPhone (source: Phi-3 paper)
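The “less than 2 GB” claim checks out with back-of-envelope arithmetic: 3.8 billion parameters at 4 bits each. A quick sketch (weights only; a real deployment also needs memory for the KV cache and activations):

```python
# Back-of-envelope memory estimate for quantized model weights.

def weight_memory_gib(num_params, bits_per_param):
    """GiB needed to store the weights alone."""
    return num_params * bits_per_param / 8 / 2**30  # bits -> bytes -> GiB

# phi-3-mini: 3.8B parameters quantized to 4 bits
print(f"{weight_memory_gib(3.8e9, 4):.2f} GiB")  # -> 1.77 GiB, under 2 GB
```

The same linear scaling explains the earlier point about memory footprint: double the parameters (or the bits per parameter) and the weight memory doubles too.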
Under the hood…
Phi-3’s secret? The models are designed with a highly optimized transformer-decoder architecture similar to Llama-2, boosted by some clever optimizations:
Blocksparse attention: Basically, instead of paying attention to every part of the input text in a conventional manner, this method skips parts intelligently, saving processing time without sacrificing accuracy. It’s a perfect fit for resource-constrained environments. (Interested in knowing more? Check out this deep dive)
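As a toy illustration of the idea (not Phi-3’s actual blocksparse kernel), here’s what a block-sparse attention mask looks like in plain Python: each token attends only within its own fixed-size block, so a large share of query-key pairs is skipped entirely.

```python
# Toy block-sparse attention mask: 1 = attend, 0 = skip.

def blocksparse_mask(seq_len, block_size):
    """Each query token only attends to keys inside its own block."""
    return [[1 if q // block_size == k // block_size else 0
             for k in range(seq_len)]
            for q in range(seq_len)]

mask = blocksparse_mask(seq_len=8, block_size=4)
dense_pairs = 8 * 8
sparse_pairs = sum(sum(row) for row in mask)
print(sparse_pairs, "of", dense_pairs)  # -> 32 of 64: half the pairs skipped
```

Real blocksparse kernels use smarter patterns (e.g. adding local or global blocks) and exploit the sparsity on the hardware level, but the payoff is the same: fewer attention scores to compute and store.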
High-Quality Data: Rather than force-feeding the model endless information, the team curated a "gourmet" dataset. The focus here was less on overtraining and more on finding a sufficiently optimal data mixture for training.
Post-Training Polish:
Supervised Fine-Tuning (SFT): Extra lessons in math, coding, and reasoning.
Direct Preference Optimization (DPO): Teaching the model to be a responsible, chat-savvy AI citizen (yes, these models were developed with a safety-first mindset and a strong focus on responsible AI principles).
Comparing the quality of Phi-3 models against competing SLMs (source: azure.microsoft.com)
But wait, there’s more!
Microsoft didn't stop there. They've also introduced the Phi-3.5 series, designed to tackle multilingual, multimodal, and long-context tasks:
| Model | Parameters |
| --- | --- |
| phi-3.5-mini | 3.8B |
| phi-3.5-MoE | 6.6B (active) |
| phi-3.5-Vision | 4.2B |
Beyond the previous techniques used to train Phi-3 models, these models have the following upgrades:
Phi-3.5-mini uses the “LongRoPE” technique to expand the context window from 4K to a whopping 128K tokens, giving it unprecedented ability to handle complex, long-form content.
Phi-3.5-MoE uses a clever “mixture of experts” approach, activating only the 2 most relevant ‘experts’ among the 16 available for each token. It uses the “SparseMixer” technique, which estimates gradients (the signals that tell the model how to adjust its parameters) efficiently and scalably, even when only a subset of experts is active.
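Here’s a toy sketch of that top-2 routing step in plain Python. It’s illustrative only: in the real model the router is a learned layer, and SparseMixer handles the tricky gradient estimation during training.

```python
import math

# Toy top-2 MoE router: pick the 2 highest-scoring experts for a token
# and mix their outputs using softmax weights over just those 2 scores.

def route_top2(gate_scores):
    """Return indices of the 2 chosen experts and their mixing weights."""
    top2 = sorted(range(len(gate_scores)),
                  key=lambda i: gate_scores[i], reverse=True)[:2]
    exps = [math.exp(gate_scores[i]) for i in top2]
    return top2, [e / sum(exps) for e in exps]

# 16 experts, as in phi-3.5-MoE; the scores here are made up for illustration
scores = [0.1] * 16
scores[3], scores[11] = 2.0, 1.0
experts, weights = route_top2(scores)
print(experts, [round(w, 3) for w in weights])  # -> [3, 11] [0.731, 0.269]
```

Because only 2 of 16 experts run per token, the compute per token is a fraction of what a dense model with the same total parameter count would need.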
Phi-3.5-MoE’s performance across a range of tasks compared against other SLMs (source: techcommunity.microsoft.com)
Phi-3.5-Vision is a standout multimodal model in the family, blending CLIP ViT-L’s image understanding with the power of Phi-3.5’s natural language capabilities. It shines across tasks like image-text reasoning, PowerPoint parsing, and even video summarization. This model represents a significant advancement over its predecessors, offering improved performance and broader applicability while maintaining a relatively compact size.
Why does this matter?
These tiny titans outperform models twice their size (or more) across language, reasoning, coding, and math tasks. The Phi-3-mini is even giving GPT-3.5 a run for its money!
How Phi-3.5 models stack up against other SLMs (source: techcommunity.microsoft.com)
Phi-3 is already making a real-world impact. ITC, a major Indian business conglomerate, is putting Phi-3 to work in its Krishi Mitra app, bringing AI-powered agricultural assistance to over a million farmers, even in areas with limited internet access.
Want to Try It Yourself?
You can! Microsoft has made Phi-3 available on the Azure AI Playground and Hugging Chat playground. Go ahead, give it a spin and see what this pocket-sized powerhouse can do!
Also, check out Microsoft’s Phi-3 Cookbook for hands-on examples to get you started.
As we watch the AI revolution unfold, one thing's becoming clear: sometimes, the biggest breakthroughs come in the smallest packages. So the next time someone asks if that's a supercomputer in your pocket... well, you just might be able to say yes!
Spark 'n' Trouble Shenanigans 😜
Boy, do we have a treat for you! We've been playing around with Moshi, the first real-time open-source voice AI by Kyutai Labs, and it's blown our circuits!
This open-source voice AI from Kyutai Labs is shaking things up with its real-time dialogue skills and the potential for local deployment on your laptops. With male (Moshiko) and female (Moshika) variants, plus the Mimi speech codec, there's plenty to explore.
While it's not perfect – it can be a bit abrupt and goes off on tangents – it's impressively expressive and spontaneous. Industry pros like Karpathy and Saravia have noted its quirks, but we think that's part of its charm.
Having a very similar experience as @karpathy with the Moshi agent.
Moshi is a bit abrupt, interrupts a lot, and ignores some questions in the conversation. I almost lost it in this short conversation I had with it. 😂
Lots of work to do but it's exciting to see the… x.com/i/web/status/1…
— elvis (@omarsar0)
10:11 PM • Sep 18, 2024
Give Moshi a whirl and let us know if you find it as endearingly chaotic as we do.
Does it live up to the hype, or is it just another voice in the crowd?
Well, that’s a wrap! Until then,