Is TOM the Future of Wearable AI Technology?

PLUS: Unlock Event Planning Superpowers with This AI Prompt

Howdy fellas!

When it comes to AI, Spark and Trouble are buzzing with excitement!

This week, they're piecing together the future of wearable tech, exploring AI-powered productivity boosters, and cooking up the ultimate event-planning hack.

Get ready to upgrade your digital sidekick and supercharge your workflow!

Here’s a sneak peek into today’s edition 👀

  • Say Hello to TOM: The Wearable AI Assistant Platform of Tomorrow

  • Plan Like a Pro: Your AI Logistics Prompt Awaits

  • 3 AI Tools to Supercharge Your Productivity

Time to jump in!😄

PS: Got thoughts on our content? Share 'em through a quick survey at the end of every edition. It helps us see how our product labs, insights & resources are landing, so we can make them even better.

Hot off the Wires 🔥

We're eavesdropping on the smartest minds in research. 🤫 Don't miss out on what they're cooking up! In this section, we dissect some of the juiciest tech research that holds the key to what's next in tech.⚡

Remember Tony Stark's AI assistant J.A.R.V.I.S. from the Iron Man movies?
(for the uninitiated, it assisted Stark in various tasks, including managing his household, providing data analysis, controlling the Iron Man suit, and offering strategic advice during combat)

Spark and Trouble here are huge Iron Man fans. Like, really huge! So when they talk about AI assistants, they're basically channelling their inner Tony Stark, and they've always wondered: when can we get our own J.A.R.V.I.S.?

Well, while we're admiring the impressive surge in wearable AI, such as the Humane AI Pin and Meta's Ray-Ban smart glasses, and still waiting for our personal J.A.R.V.I.S., there's something pretty cool on the horizon that might just change the game…

Researchers from the National University of Singapore and the University of Hong Kong have developed a new platform that aims to be the foundation for creating a new generation of wearable AI assistants: TOM (The Other Me).

Forging the Fundamentals

Before we dive into TOM's impressive capabilities, let's break down some key concepts:

Wearable AI: Think beyond smartwatches! These are devices like the Humane AI Pin or Microsoft's Smart Backpack that integrate AI capabilities into everyday accessories.

Interaction Paradigms: These are the ways in which we interact with technology. Newer ones include concepts like “Heads-Up Computing,” where devices provide information without requiring users to look down at screens, and “Dynamicland,” which turns entire physical spaces into interactive computing environments.

Pervasive Augmented Reality (AR): Augmented Reality or AR overlays digital information, like images or sounds, onto the real world using devices like smartphones or AR glasses. Imagine looking at your living room through your phone and seeing a digital sofa that you can move around and see from different angles. Pervasive AR takes this a step further by making AR experiences available everywhere, seamlessly blending digital and real worlds throughout your daily life. For example, AR navigation might guide you through a shopping mall, or virtual information could pop up about a historical site as you walk by.

Mixed Reality (MR): MR goes a step beyond AR, creating new environments where physical and digital objects coexist and interact in real time. Imagine wearing glasses that let you see and manipulate virtual objects as if they were part of your physical environment.

Multimodal Interactions: This involves using multiple methods (or modes) to communicate and interact with technology. Think speaking commands while tapping a screen, using gestures and eye movements to navigate interfaces, or feeling vibrations and other haptic feedback alongside corresponding visuals.

So, what’s new?

The era of smart wearables powered by generative AI is here (source: Analytics India Magazine)

Most of the current interaction paradigms face several limitations:

  • Lack of specific guidelines for developing context-aware AR systems.

  • Toolkits often support specific hardware but not general-purpose daily tasks.

  • Wearable AI systems are typically limited to speech interactions and offer minimal support for data analysis and visualization.

Also, most wearable tech falls short of providing truly intelligent and context-aware assistance. This is where TOM comes into the picture. TOM is not just another gadget – it's a comprehensive platform for creating and analyzing assistive applications. It aims to understand both context and user, supporting multimodal interactions with AR/MR devices and leveraging the latest in ML/AI technologies.

Under the hood…

TOM's architecture is designed to address three core stakeholder requirements:

  • Just-in-time Assistance (for Users): TOM allows users to interact naturally, providing both explicit (voice, gestures) and implicit (gaze, physiological data) inputs. The system understands user context to offer proactive, minimally intrusive assistance.

  • Data Recording and Analysis (for Researchers): TOM can record and visualize data for real-time and retrospective analysis, enabling researchers to train models and analyze their performance.

  • Ease of Development (for Developers): The platform simplifies integration of new devices and sensors, deployment of new assistance models, and access to existing data and models.

TOM supports various inputs from users, including explicit gestures and voice commands, implicit signals like gaze direction and persistence, and physiological data (such as heart rate, body temperature, etc.). To comprehend the context, it relies on visual and auditory scene understanding, social context, and device availability.
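
To make that concrete, here's a minimal sketch (our own illustration, not code from the TOM paper – every class and field name below is hypothetical) of how such multimodal inputs and user context might be modeled:

```python
from dataclasses import dataclass, field
from enum import Enum, auto
from time import time


class Modality(Enum):
    VOICE = auto()          # explicit: spoken commands
    GESTURE = auto()        # explicit: hand gestures
    GAZE = auto()           # implicit: gaze direction & persistence
    PHYSIOLOGICAL = auto()  # implicit: heart rate, body temperature, ...


@dataclass
class InputEvent:
    """One reading from one input channel, e.g. a gaze fixation or a heart-rate sample."""
    modality: Modality
    payload: dict                 # e.g. {"bpm": 152} or {"target": "sign_03"}
    timestamp: float = field(default_factory=time)


@dataclass
class UserContext:
    """Fused picture of the user and their surroundings."""
    visual_scene: list[str]       # objects recognized in view
    audio_scene: str              # e.g. "quiet office", "busy street"
    social_context: str           # e.g. "alone", "in conversation"
    available_devices: list[str]  # e.g. ["smart_glasses", "smartwatch"]


def should_slow_down(events: list[InputEvent], max_bpm: int = 170) -> bool:
    """Toy rule: flag when physiological input suggests the user is overexerting."""
    return any(
        e.modality is Modality.PHYSIOLOGICAL and e.payload.get("bpm", 0) > max_bpm
        for e in events
    )
```

The point of a shared event shape like this is that a proactive assistant can reason over voice, gaze, and sensor streams uniformly, instead of treating each modality as a special case.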

High-level conceptual architecture of TOM (source: TOM paper)

TOM employs a client-server architecture, with the server handling complex computations and the client focusing on user interaction. This design keeps the system performant and efficient (a rough sketch of how the layers fit together follows this list):

  • The server (implemented in Python) consists of three layers:

    • Widgets: Listen to sensors and receive input data

    • Processors: Transform and process input data

    • Services: Produce desired outcomes using data from Widgets and Processors

  • Clients (developed using Unity3D & MRTK or WearOS) stream sensor data to the server and receive real-time feedback for actuators
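
Here's how that Widget → Processor → Service flow might look in Python (our reading of the architecture diagram, not the actual TOM source; the class names and toy logic are ours):

```python
from abc import ABC, abstractmethod


class Widget(ABC):
    """Listens to a sensor stream and yields raw input data."""
    @abstractmethod
    def read(self) -> dict: ...


class Processor(ABC):
    """Transforms raw widget data into something services can use."""
    @abstractmethod
    def process(self, raw: dict) -> dict: ...


class Service(ABC):
    """Produces the desired outcome from widget/processor data."""
    @abstractmethod
    def run(self, processed: dict) -> dict: ...


class CameraWidget(Widget):
    def read(self) -> dict:
        # In a real deployment this would receive a frame streamed from the client
        return {"type": "frame", "data": b"...raw bytes..."}


class SceneUnderstandingProcessor(Processor):
    def process(self, raw: dict) -> dict:
        # Stand-in for an actual vision model
        return {"objects": ["signboard", "menu"], "language": "fr"}


class TranslationService(Service):
    def run(self, processed: dict) -> dict:
        if processed.get("language") != "en":
            return {"action": "overlay_translation", "targets": processed["objects"]}
        return {"action": "noop"}


# Server-side wiring of the three layers:
widget, processor, service = CameraWidget(), SceneUnderstandingProcessor(), TranslationService()
feedback = service.run(processor.process(widget.read()))
print(feedback)  # -> streamed back to the client's actuators (e.g. an AR display)
```

Splitting responsibilities this way means a new sensor only needs a new Widget and a new assistant only needs a new Service – which is exactly the "ease of development" goal above.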

Server (left) & client (right) architectures for TOM (source: TOM paper)

To understand the user and their environment, TOM employs a combination of AI technologies, including computer vision, speech recognition, natural language processing, and object tracking. These capabilities enable the platform to make intelligent decisions and provide relevant assistance.
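
For a taste of what plugging a real model into such a pipeline could look like, here's a hedged sketch using an off-the-shelf object detector (YOLO via the ultralytics package is our pick purely for illustration; the paper lists a broader range of supported technologies):

```python
# pip install ultralytics
from ultralytics import YOLO

model = YOLO("yolov8n.pt")  # small, pre-trained, general-purpose object detector


def detect_objects(image_path: str) -> list[str]:
    """Return the class names of objects detected in one frame."""
    results = model(image_path)
    names = results[0].names  # class-id -> label mapping
    return [names[int(cls_id)] for cls_id in results[0].boxes.cls]


print(detect_objects("street_scene.jpg"))  # e.g. ['person', 'bicycle', 'stop sign']
```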

The wide range of technologies supported in TOM (source: TOM paper)

Why does this matter?

The researchers have already developed some exciting proof-of-concept services using TOM:

  • Running Assistant: Offers personalized coaching, route options, proactive safety alerts, and post-run summaries. Imagine training for a marathon with an AI coach always by your side!

  • Translation and Querying Assistant: Translates and superimposes text from the surroundings, provides information about objects based on gaze or voice commands, and allows follow-up questions for deeper understanding. Imagine exploring a foreign city where every sign and menu is instantly translated into your language! (See the rough sketch after this list.)
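
Here's a back-of-the-envelope sketch of how that translation flow could be wired up, with off-the-shelf components (pytesseract for OCR, a Hugging Face translation model) standing in for whatever TOM actually uses under the hood:

```python
# pip install pytesseract pillow transformers sentencepiece
# (also requires the Tesseract OCR engine installed on the system)
import pytesseract
from PIL import Image
from transformers import pipeline

# Off-the-shelf French -> English model, chosen here purely for illustration
translator = pipeline("translation", model="Helsinki-NLP/opus-mt-fr-en")


def translate_sign(image_path: str) -> str:
    """OCR the text from a photo of a sign or menu, then translate it to English."""
    text = pytesseract.image_to_string(Image.open(image_path), lang="fra").strip()
    if not text:
        return "(no text found)"
    return translator(text)[0]["translation_text"]


print(translate_sign("menu_photo.jpg"))  # the AR client would overlay this on the sign
```

In TOM's architecture, the OCR step would live in a Processor and the translate-and-overlay logic in a Service, with the smart glasses acting as both the camera Widget and the display actuator.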

The potential applications are vast. Here are a few areas where we think businesses could use TOM to develop specialized assistants:

  • Manufacturing: Providing workers with real-time instructions and safety alerts

  • Healthcare: Assisting doctors with patient information and treatment recommendations

  • Education: Creating immersive, personalized learning experiences

  • Customer service: Empowering representatives with instant access to product information and troubleshooting guides

  • Retail: Enhanced shopping experiences with product information and recommendations

  • Tourism: Immersive, informative guided experiences in any language

What’s the intrigue?

While TOM represents a significant step forward, there are still hurdles to overcome:

  • Seamless modality transitions and automatic switching between services

  • Battery life and power efficiency

  • Data privacy and security concerns

Despite these challenges, TOM lays a strong foundation for the next generation of wearable AI assistants.

As the code becomes publicly available on GitHub on August 1st, Spark & Trouble eagerly await the innovative applications developers and researchers will create using TOM.

Meanwhile, here are some of our ideas to enhance TOM even further:

How about integrating Small Action Models (instead of LLMs/SLMs) to improve interaction efficiency? (Wondering what Small Action Models are? Check out this previous edition to know more.)

What if we could use TinyChatEngine to run lightweight models directly on edge devices instead of relying on heavy LLMs? Could that offload work from the server and push performance even further?

10x Your Workflow with AI 📈

Work smarter, not harder! In this section, you’ll find prompt templates 📜 & bleeding-edge AI tools ⚙️ to free up your time.

Fresh Prompt Alert!🚨

Ever feel like herding cats would be easier than organizing your next big event? Well, Spark and Trouble have got your back!

This week's Fresh Prompt Alert turns you into a logistics wizard, capable of juggling attendees, venues, and catering like a pro. It's like having a personal event planner in your pocket, minus the fancy headset and clipboard.

Give it a whirl and watch your event planning skills level up!

I want you to act as a logistician.

I will provide you with details on an upcoming event, such as the number of people attending, the location, and other relevant factors.

Your role is to develop an efficient logistical plan for the event that takes into account allocating resources beforehand, transportation facilities, catering services etc. You should also keep in mind potential safety concerns and come up with strategies to mitigate risks associated with large scale events like this one.

My first request is "I need help organizing a [what type of gathering?] for [number of people] people in [enter location here]."

* Replace the content in brackets with your details

3 AI Tools You JUST Can't Miss 🤩

  • Thunderbit: 1-click, no-code AI app and automation builder for business users

  • 💪 SuperAnnotate: AI Data Platform for LLMs, Computer Vision & Natural Language Processing

  • 💥 Komiko: Create Comics, Webtoon, and Manga with AI

Spark 'n' Trouble Shenanigans 😜

Have you heard about the latest buzz in the AI world? Runway Gen-3 Alpha is taking the internet by storm with its mind-blowing video creation skills! It's giving major Sora vibes, and everyone's going crazy for it!

Guess what? Spark & Trouble couldn't resist jumping on this hype train, and they're diving into Gen-3 too. And here's the best part – you can join in on the fun! 🎉

Want to start creating your own jaw-dropping videos? We've got you covered! Check out this awesome guide that'll walk you through everything you need to know about Runway Gen-3. It's short, sweet, and packed with all the good stuff.

Whether you're into artsy stuff, marketing, or just want to impress your friends, this is your ticket to video magic. Let's get creating! 🎬

Well, that’s a wrap!
Thanks for reading 😊

See you next week with more mind-blowing tech insights 💻

Until then,
Stay Curious🧠 Stay Awesome🤩

PS: Do catch us on LinkedIn - Sandra & Tezan
