Is VisionDroid the Missing Piece in Your App Testing Strategy?

PLUS: How did AI turn a Lemon into a Building? Find Out...

Howdy fellas!

Spark can never quite relate to Trouble's endless rants about the drudgery of testing. But when Spark, weary of Trouble's grumbles, shared a cool new innovation, Trouble's eyes lit up like never before.

Big Eyes Wow GIF by Cartoon Network (via Giphy)

Curious? Get ready to explore this cool new innovation that's set to change the game for mobile GUI testing!

Here's a sneak peek into today's edition 👀

  • Meet VisionDroid - the AI that caught 37 bugs humans missed. How?

  • Get fit fast - your AI personal trainer is just a prompt away

  • 3 AI Tools that you JUST cannot miss!

  • Man creates viral architectural designs using AI, inspired by common foods

Time to jump in! 😄

PS: Got thoughts on our content? Share 'em through a quick survey at the end of every edition. It helps us see how our product labs, insights & resources are landing, so we can make them even better.

Hot off the Wires 🔥

We're eavesdropping on the smartest minds in research. 🤫 Don't miss out on what they're cooking up! In this section, we dissect some of the juiciest tech research that holds the key to what's next in tech. ⚡

Do you notice something peculiar in these two images?

Screenshot from TripAdvisor App (source: applitools.com)

Screenshot from Amazon app (source: applitools.com)

If you spotted that the ratings overlap the hotel name in the first image, and that the quantity popup runs off the screen in the second (preventing users from completing their purchase), then bingo!

These are glaring examples of non-crash UI bugs in Android apps.

Although these screenshots date back to 2018, many of us still run into similar bugs while navigating one app or another. These embarrassing (and sometimes frustrating) glitches highlight a critical, yet often dreaded, aspect of software development: testing.

Most developers acknowledge the need for extensive testing but find it absolutely mundane & tend to prioritize other tasks.

A recent survey showed that more than 44% of developers dedicate less than 20% of their time to testing.

Also, more than 40% of developers feel that their products undergo inadequate testing before being pushed to production.

Thankfully, the world of AI is offering a helping hand. Tools like GitHub Copilot have made it a breeze to automate writing unit tests and more for your codebase.

But what about the automation of UI testing? Enter VisionDroid, a tidy, new vision-driven approach to automated UI testing using the power of multimodal LLMs!

Hold on a second... VisionDroid? A vision-driven approach to debugging? Is it just us, or does this sound eerily similar to our beloved newsletter, "The Vision, Debugged"? 🤔 Looks like great minds think alike... or debug alike! 😂

Forging the Fundamentals

All jokes aside, VisionDroid is seriously cool stuff. But before we dive into VisionDroid's magic, let's break down some key terms:

Graphical User Interface (GUI): A way for users to interact with computer programs using visual elements like windows, buttons, and menus. Your desktops, mobile phones, etc. all involve GUI-based interactions, unlike those older text-based interfaces where geeks had to type in complex commands to get stuff done.

Automated GUI Testing: The process of using software to mimic user actions and verify that an application's visual interface, like buttons and menus, functions correctly.

Functional Bugs: Issues that affect how an app works, i.e., its functionality (for example, buttons that don't work or calculations that are wrong), rather than its performance or security.

Non-Crash Bugs: Software glitches that cause unexpected behavior without completely shutting down the program. They can be intra-page (confined to a single GUI page, mostly display-related) or inter-page (spanning a sequence of multiple GUI screens).

Few-Shot Prompting: A technique where an LLM is given a few examples within the prompt itself. This helps the model understand the desired format and respond more accurately, especially for complex tasks. To learn more, check out this tutorial.
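To make this concrete, here's a tiny, hypothetical few-shot prompt written as a Python string (the example bugs and wording are ours, purely for illustration, and not taken from the VisionDroid paper):

```python
# A toy few-shot prompt: two labelled examples teach the model the
# expected answer format before it sees the real query.
few_shot_prompt = """You are a UI reviewer. Answer 'BUG' or 'OK' with a short reason.

Description: The 'Submit' button overlaps the footer text.
Answer: BUG - overlapping widgets hide content.

Description: All labels are readable and stay within the screen bounds.
Answer: OK - no display issue.

Description: The price label is cut off at the right edge of the screen.
Answer:"""
```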

Louvain Algorithm: A method for detecting communities in large networks by optimizing a score that measures how well connected the nodes within a community are compared to random connections. For the geeks who wish to know the nitty-gritty details, check out this video.

Crash bug

Intra-Page Non-Crash Bug

Inter-Page Non-Crash Bug

So, what's new?

Now, you might be thinking, "Don't we already have automated testing?" Well, yes, but there's a catch. Most current automated UI testing techniques are great at catching "crash bugs": those frustrating moments when an app suddenly closes. But they often miss subtler, non-crash bugs that can be just as annoying for users.

These sneaky bugs might show up as misaligned buttons, overlapping text, or a multi-screen flow that misbehaves without ever crashing the app. Spotting these issues typically requires human testers, which is time-consuming and doesn't scale well as apps grow more complex.

Researchers found that a whopping 51% of bugs reported in popular Google Play Store apps were non-crash, functional issues.

This is where VisionDroid shines. Developed by a team of innovative researchers, it uses the power of multimodal large language models (MLLMs), specifically GPT-4, to detect non-crash, functional bugs in mobile apps.

Under the hood…

Now, let's look at how VisionDroid tackles the challenge of identifying non-crash bugs.

Overview of VisionDroid's architecture (source: VisionDroid paper)

Text-Image Alignment

This first step is like giving the AI a detailed map of the app's interface

  • Basic information about the app, its various GUI pages, and the widgets on each page is extracted from the app's manifest file and view hierarchy file

  • Screenshots of the GUI pages are captured, and the widgets are annotated with bounding boxes whose colors reflect each widget's functionality

  • Most of this is done using a Python wrapper of UIAutomator (a rough sketch of this step is shown below)

  • This alignment ensures that GPT-4 has a clear understanding of the GUI's layout and elements

Example of image-text alignment (source: VisionDroid paper)
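For the tinkerers, here's a rough sketch of what this capture-and-annotate step could look like using uiautomator2 (one popular Python wrapper of UIAutomator). The paper's actual implementation differs; the color scheme, labels, and file names here are our own assumptions:

```python
import re
import xml.etree.ElementTree as ET

import uiautomator2 as u2
from PIL import Image, ImageDraw

d = u2.connect()                    # connect to the attached Android device
d.screenshot("page.png")            # capture the current GUI page
hierarchy_xml = d.dump_hierarchy()  # view hierarchy (widget tree) as XML

# Walk every widget node, recover its bounding box (encoded as "[x1,y1][x2,y2]"),
# and draw a numbered, color-coded rectangle on the screenshot.
img = Image.open("page.png").convert("RGB")
draw = ImageDraw.Draw(img)
for i, node in enumerate(ET.fromstring(hierarchy_xml).iter("node"), start=1):
    x1, y1, x2, y2 = map(int, re.findall(r"\d+", node.get("bounds", "[0,0][0,0]")))
    # Assumed convention: clickable widgets in red, everything else in blue.
    color = "red" if node.get("clickable") == "true" else "blue"
    draw.rectangle([x1, y1, x2, y2], outline=color, width=4)
    draw.text((x1 + 5, y1 + 5), str(i), fill=color)  # numeric label ties image to text

img.save("page_annotated.png")  # the annotated screenshot + XML feed the next step
```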

Function-Aware Explorer

Here's where things get really cool…

  • GPT-4 is prompted to automatically explore the app's functionalities: the aligned textual & visual information is used to construct prompts that identify the functionality currently being tested and the next action to take (like tapping, swiping, or interacting with different features); a simplified sketch of such a prompt follows this list

  • This logical sequence of GUI page screenshots, along with the textual descriptions obtained during exploration, is stored in a testing history database

  • During this exploration, VisionDroid also employs a few-shot prompt to identify intra-page non-crash bugs within individual screenshots (if you're curious, the intra-page bug examples used in the few-shot prompt come from a 'bug example database')
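Curious what such an exploration prompt might look like in code? Here's a loose sketch of assembling a multimodal request with the OpenAI Python client (the prompt wording, model name, and history format are illustrative assumptions on our part, not the paper's exact prompts):

```python
import base64
from openai import OpenAI

client = OpenAI()

def next_action(annotated_png: str, widget_text: str, history: list[str]) -> str:
    """Ask the model which functionality is on screen and what to do next."""
    with open(annotated_png, "rb") as f:
        image_b64 = base64.b64encode(f.read()).decode()

    prompt = (
        "You are testing an Android app. The screenshot shows the current GUI page "
        "with numbered, color-coded widgets, described below.\n"
        f"Widgets: {widget_text}\n"
        f"Actions taken so far: {history}\n"
        "1) Name the functionality currently being tested.\n"
        "2) Choose the next action (tap/swipe/input) and the target widget number."
    )
    response = client.chat.completions.create(
        model="gpt-4o",  # assumption; the paper reports using GPT-4
        messages=[{
            "role": "user",
            "content": [
                {"type": "text", "text": prompt},
                {"type": "image_url",
                 "image_url": {"url": f"data:image/png;base64,{image_b64}"}},
            ],
        }],
    )
    return response.choices[0].message.content  # e.g. "Functionality: add expense; Action: tap widget 7"
```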

Logic-Aware Bug Detector

Now for the really clever part. Finding bugs that span across multiple screens (inter-page bugs) requires a more sophisticated approach, so here's what VisionDroid does…

  • The sequence of screenshots in the testing history database is converted into a graph:

    • Nodes represent screenshots, i.e., the UI page being explored

    • Edges signify transitions between pages

    • Edge weights are assigned based on the semantic similarity of the function names generated by the exploration prompt in the function-aware explorer

  • The Louvain algorithm is applied to detect communities within this graph, highlighting logical sub-sequences of related screenshots (see the sketch below)

  • These sub-sequences are then analyzed by GPT-4 using another few-shot prompt to spot inter-page bugs

Example of sub-sequence segmentation using Louvain algorithm on the testing history sequence (source: VisionDroid paper)
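Want to see the graph-plus-Louvain idea in miniature? Here's a simplified sketch using networkx; we stand in for the paper's semantic-similarity measure with a cheap string-similarity score just to keep the example self-contained, and the function names are invented:

```python
import difflib
import networkx as nx

# Each explored GUI page becomes a node; its label is the function name
# produced during exploration (these examples are made up).
functions = {
    1: "open expense list", 2: "add new expense", 3: "add new expense amount",
    4: "open settings",     5: "change currency",
}

G = nx.Graph()
G.add_nodes_from(functions)
pages = list(functions)
for a, b in zip(pages, pages[1:]):  # edges follow the order pages were visited
    # Stand-in for semantic similarity between the two pages' function names.
    weight = difflib.SequenceMatcher(None, functions[a], functions[b]).ratio()
    G.add_edge(a, b, weight=weight)

# Louvain groups tightly connected (i.e., semantically related) pages into
# communities; each community is a logical sub-sequence handed to GPT-4
# for inter-page bug checking.
communities = nx.community.louvain_communities(G, weight="weight", seed=0)
print(communities)  # something like [{1, 2, 3}, {4, 5}]
```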

Feel free to dive into their code to understand more about these various processes.

Why does this matter?

VisionDroid significantly outperforms existing deep learning-based UI issue detection techniques and automated GUI testing tools.

When taken for a spin in the wild, VisionDroid identified 83 non-crash bugs across 105 popular apps from the Google Play Store (including Expensify, Sygic, DigiCal, etc.). Even more impressive, 37 of these were brand-new issues that had slipped past both human testers and other automated tools!

Developers of the respective apps have already fixed 10 of these bugs and confirmed 9 more, proving VisionDroid's real-world value.

Imagine the potential impact on companies like Uber, Airbnb, or mobile banking apps. These businesses rely heavily on their mobile interfaces, and even small bugs can lead to frustrated users and lost revenue.

Did you know?

About 40% of US-based mobile users will uninstall an app that has too many software issues and switch to a competitor

VisionDroid could be a game-changer in these high-stakes environments, catching issues before they ever reach users, potentially saving millions in support costs and lost business.

Spark & Trouble are rooting for this amazing innovation to make its way into the list of most-used test-automation frameworks very soon. Who knows? Maybe one day, "testing" will no longer be a dreaded word in the developer's vocabulary! 😉

Key Takeaways

(screenshot this!)

Multimodal AI Power: VisionDroid showcases how combining visual and textual understanding can tackle complex real-world problems

Clever Data Structuring: The use of graph theory to group related app screens demonstrates the importance of intelligent data organization in AI applications

Few-Shot Learning: VisionDroid's success with few-shot prompting highlights the power of this technique for adapting large language models to specialized tasks

10x Your Workflow with AI 📈

Work smarter, not harder! In this section, you'll find prompt templates 📜 & bleeding-edge AI tools ⚙️ to free up your time.

Fresh Prompt Alert! 🚨

Feeling like a couch potato with dreams of becoming a gym rat? 🥔💪 We've got just the ticket!

This week's Fresh Prompt Alert is your personal fitness guru in a box. Whether you're aiming to sculpt abs of steel or just want to touch your toes without pulling a muscle, this prompt's got your back (and your core, and your glutes).

So, grab your sweatbands and protein shakes, because it's time to turn those fitness fantasies into sweaty realities! 👇

You are an expert fitness coach.

I am [mention the problem you're facing in detail with context].

Generate a challenging yet engaging 'Workout Challenge of the Week' focused on [state fitness goal]. Ensure it's suitable for someone with access to [mention available equipment or no equipment].

I want you to [mention how you want the output in detail with examples].

* Replace the content in brackets with your details

3 AI Tools You JUST Can't Miss 🤩

  • 📃 Alva AI - Your trusted co-pilot for daily task management

  • 📈 Bitscale - Empower growth teams to research, personalize, and generate content at scale

  • šŸ“ InspNote - AI-powered tool to capture spontaneous thoughts and ideas

Spark 'n' Trouble Shenanigans 😜

"When life gives you lemons 🍋, make … buildings 🏢"

Sounds out of place? 🤔 Well, with AI, almost anything is possible today! Check out these jaw-dropping AI-powered building designs created by architectural designer Fatih Ekși using food as inspiration (biomimetic architecture on a whole new level) that are definitely creating waves on Instagram! 👇

From a "lemon" … (source: Instagram)

… to this wonder (source: Instagram)

From a "tomato" … (source: Instagram)

… to this marvel (source: Instagram)

From an "onion" … (source: Instagram)

… to this spectacle (source: Instagram)

Fatih hasn't revealed the exact tools he used to come up with these mind-bending concepts. What's your guess? Reply and let us know 😉

This is an incredible example of how AI can serve as a powerful tool for creatives, enabling designs that simply weren't possible until now! Absolutely bewildering 🤯

Well, that's a wrap!
Thanks for reading 😊

See you next week with more mind-blowing tech insights 💻

Until then,
Stay Curious 🧠 Stay Awesome 🤩

PS: Do catch us on LinkedIn - Sandra & Tezan
