Step Inside Your Photo 🖼️: GenEx's Mind-Blowing Tech

PLUS: 2.5 Days + AI = A Mind-Blowing Video Game (You Won't Believe This)

Howdy fellas!

This holiday season, Spark and Trouble stumbled upon a hidden map leading to some exciting innovations. They're ready to unwrap what they uncovered, so come along for the ride!

Here's a sneak peek into today's edition 👀

  • The futuristic tech behind GenEx, creating explorable 3D worlds from a single image

  • Turn your business ideas into unforgettable USPs with this prompt

  • 5 AI Tools That Are Making Waves Right Now

  • OpenAI's Sora turns fantasy gaming dreams into instant reality

Time to jump in! 😄

PS: Got thoughts on our content? Share 'em through a quick survey at the end of every edition. It helps us see how our product labs, insights & resources are landing, so we can make them even better.

Hot off the Wires 🔥

We're eavesdropping on the smartest minds in research. 🤫 Don't miss out on what they're cooking up! In this section, we dissect some of the juiciest tech research that holds the key to what's next in tech. ⚡

Ever wished you could step into the panoramic worlds of a video game like Minecraft or No Man's Sky with just a snapshot?

Well, researchers from Johns Hopkins University have just brought us one step closer to that reality with GenEx - a groundbreaking AI system that can create explorable 3D worlds from a single image!

If you caught our previous edition about Imagine360 (check it out here!), you saw how AI can transform simple smartphone videos into immersive 360° experiences.

GenEx takes this concept to a whole new level - not just creating panoramic views, but generating entire explorable worlds from just one image!

Think of it like this:

You show GenEx a photo of a street corner, and it doesn't just understand what's in the photo - it imagines and generates what's around that corner, behind you, and even down the block, creating a fully explorable 3D environment that maintains remarkable consistency with the original scene.

So, what's new?

Traditionally, AI struggled with generating dynamic worlds. Most approaches relied on static 3D models or limited pre-defined environments. This created bottlenecks for applications like robotics, VR/AR, or even autonomous navigation.

GenEx flips the script with fascinating techniques:

  • Dynamic Exploration: It doesn't just create a static scene; it builds worlds that evolve with every movement.

  • High Fidelity: Maintains consistency and realism in 3D environments, even for long explorations.

  • Generative Imagination: AI agents can simulate unseen parts of the world, make predictions, and refine decisions, essentially acting like explorers with creative foresight.

It's a step toward giving AI the imaginative abilities humans use to navigate the world.

Forging the fundamentals

Before we dive in, here are some key terms simplified:

Explorable Generative World: A virtual environment created by AI that evolves as you explore it. Think of it like a video game world that grows and changes with your movements.

Panoramic Representations: There are three ways in which 360° models like GenEx "see" the world:
 ➤ Cubemap: Imagine unfolding a cube into six square images that together show everything around you
 ➤ Equirectangular Panorama: Like taking that cube and stretching it into one flat rectangle, much as a world map flattens the globe
 ➤ Sphere: The same view wrapped around a virtual globe, as if you're standing in its center

Policy: The strategy guiding an AI agent's actions, like where to go or how to explore.

Here's what the 3 interchangeable panoramic representations look like (source: GenEx paper)
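
To make the three representations a little more concrete, here's a minimal Python sketch (ours, not taken from the GenEx paper) of where a single viewing direction lands in an equirectangular image and on a cubemap. The axis convention (+z forward, +x right, +y up), the face names, and both function names are assumptions made purely for illustration.

```python
import math

def direction_to_equirect(x, y, z, width, height):
    """Map a 3D view direction to (u, v) pixel coordinates in an
    equirectangular panorama of size width x height.

    Assumed convention: +z is forward, +x is right, +y is up, and the
    forward direction sits at the center of the image.
    """
    norm = math.sqrt(x * x + y * y + z * z)  # must be non-zero
    x, y, z = x / norm, y / norm, z / norm
    lon = math.atan2(x, z)   # longitude in [-pi, pi], 0 = forward
    lat = math.asin(y)       # latitude in [-pi/2, pi/2], 0 = horizon
    u = (lon / (2 * math.pi) + 0.5) * width
    v = (0.5 - lat / math.pi) * height
    return u, v

def direction_to_cube_face(x, y, z):
    """Return which of the six cubemap faces a direction points at.

    The face is chosen by the axis with the largest absolute component.
    """
    ax, ay, az = abs(x), abs(y), abs(z)
    if ax >= ay and ax >= az:
        return "right" if x > 0 else "left"
    if ay >= ax and ay >= az:
        return "up" if y > 0 else "down"
    return "front" if z > 0 else "back"

# Looking straight ahead lands in the middle of a 2048 x 1024 panorama
# and on the front face of the cube.
print(direction_to_equirect(0, 0, 1, 2048, 1024))
print(direction_to_cube_face(0, 0, 1))
```

The sphere is the "true" view; the cubemap and the equirectangular image are simply two different ways of flattening it, which is why they can be converted into one another.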

Under the hood…

So how does GenEx pull off this magic? It combines two powerful components, and here's what the bird's-eye-view of the overall flow looks like…

Our understanding of GenEx's method of generating an explorable world (source: created by authors)

World Initialization

Using a single image, GenEx generates a complete 360-degree panoramic view. This is achieved by combining text and visual cues with a state-of-the-art text-to-image model.

  • First, it uses physics engines like Unreal Engine & Unity to collect training data in the form of cubemaps, by following pre-defined trajectories

These cubemaps are projected to other 360° representations at the time of video generation during exploration

  • Next, researchers augment an existing text-to-panorama model, fine-tuned from FLUX.1 (this itself is super-cool, so do check it out), to also consider a starting image for consistent panorama generation

Check out this open-source superfast equirectangular panoramic view generator model
(source: created by authors)

World Transition

As the agent explores this initialized world, the environment dynamically updates. These explorations are usually sampled from a set of potential actions that a user can perform (like 'walk straight', 'look down', 'pan out', etc.).

A diffusion-based video model ensures seamless transitions, while advanced spherical learning techniques keep the visuals coherent, even when the agent takes sharp turns or long strides.
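
One reason panoramas cope well with sharp turns: in an equirectangular image, turning the camera left or right is just a wrap-around shift of pixel columns, so nothing that has already been seen gets lost. Here's a tiny sketch of that property. To be clear, this is not GenEx's learned video model, only an illustration of the panorama format itself; the function name and the list-of-rows image layout are our own assumptions.

```python
def rotate_panorama_yaw(panorama, degrees):
    """Turn the camera right by `degrees` in an equirectangular panorama.

    `panorama` is a list of rows, each row a list of pixels, with the
    forward direction at the center column. A yaw rotation is a circular
    shift of the columns: content that was to the right moves to the center.
    """
    width = len(panorama[0])
    shift = round(degrees / 360.0 * width) % width
    return [row[shift:] + row[:shift] for row in panorama]

# A toy 1-row "panorama" with 8 columns; column 4 is straight ahead.
toy = [[0, 1, 2, 3, 4, 5, 6, 7]]
print(rotate_panorama_yaw(toy, 90))   # turn right by a quarter circle
print(rotate_panorama_yaw(toy, 360))  # a full turn brings you back
```

Moving forward is the hard part, since new content has to be generated rather than shifted, and that is where the diffusion-based video model comes in.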

GenEx's diffusion-based video model is adapted from this overall architecture (source: Generative World Explorer paper)

This diffusion-based video model is adapted from one of this research group's older (well, not that old either!) projects: Generative World Explorer

If you're intrigued and want the full breakdown, this paper is a must-check!

Exploration of the Generated Worlds

GenEx introduces 3 modes of exploring these explorable generative worlds:

  • Interactive Exploration: Control the agent manually, like in a sandbox game.

  • GPT-Assisted Free Exploration: Let GPT-4o chart the course for freely exploring the world, with directions & distances

  • Goal-driven navigation: Given an explicit goal, such as "move to the blue car's position & turn back", GPT performs the planning to allow the agent to choose actions & explore

What's the intrigue?

One of the standout features of GenEx is its Imagination-Augmented Policy, where AI agents make decisions based on both real and imagined observations. Just like how humans imagine what might be around a corner before walking around it, GenEx can help AI agents "imagine" unseen parts of the environment to make better decisions!

  • Single-Agent Scenario: The agent simulates potential outcomes to make informed decisions. For instance, it might "imagine" what lies ahead before deciding which path to take.

  • Multi-Agent Collaboration: Agents share their imagined perspectives, enabling them to work together seamlessly. Imagine a team of robots coordinating in a warehouse, each able to understand what the others can and cannot see!
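
The single-agent idea above can be sketched in a few lines of Python. This is a toy illustration under our own assumptions, not the paper's implementation: `choose_action`, `imagine`, and `score` are hypothetical names, where `imagine` stands in for the generative world model and `score` for whatever judges how useful an imagined view is.

```python
def choose_action(observation, actions, imagine, score):
    """Pick the action whose imagined outcome scores highest.

    imagine(observation, action) -> imagined observation after the action
    score(observation, imagined) -> number, higher is better
    Both callables are hypothetical stand-ins supplied by the caller.
    """
    best_action, best_score = None, float("-inf")
    for action in actions:
        imagined = imagine(observation, action)  # "look" before moving
        value = score(observation, imagined)     # judge real + imagined view
        if value > best_score:
            best_action, best_score = action, value
    return best_action

# Toy stand-ins: a lookup table plays the role of the world model.
toy_world = {
    "walk straight": "wall",
    "turn left": "blue car",
    "turn right": "empty road",
}

def toy_imagine(observation, action):
    return toy_world[action]

def toy_score(observation, imagined):
    return 1.0 if imagined == "blue car" else 0.0

print(choose_action("street corner", list(toy_world), toy_imagine, toy_score))
```

In a multi-agent setting, the same loop could take imagined views shared by other agents as extra input to `score`, though how GenEx wires that up is best read from the paper itself.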

An example of how a single LLM agent can imagine previously unobserved views to better understand the environment (source: GenEx paper)

How does this matter?

GenEx significantly outperforms current open-source models in maintaining 3D consistency and video quality.

It excels in applications like generating bird's-eye views & creating 3D maps. It can maintain coherent environments even when exploring paths up to 20 meters long!

The applications could be mind-boggling:

🎮 Gaming & VR: More immersive and dynamic virtual worlds that expand as you explore
🚗 Autonomous Vehicles: Better navigation by "imagining" what's around corners
🏗️ Urban Planning: Visualize how new buildings will affect city landscapes
🤖 Robotics: Smarter robots that can navigate complex environments more naturally
📚 Education: Interactive 3D learning environments from simple photographs

Companies like Epic Games, DeepMind, and even Tesla's AI division could benefit from these advancements.

Want to learn more about this innovative research?

➤ Check out the full technical writeup
➤ Explore examples on their project page
➤ And the code? Well, that'll be shared soon. So, stay tuned…

The future of AI exploration might not be about processing what we can see, but about imagining what we can't - just like humans do!

GenEx takes us one step closer to AI systems that don't just see the world, but understand and navigate it like we do.

What do you think about this technology?
Share your thoughts with Spark & Trouble! ✨

10x Your Workflow with AI 📈

Work smarter, not harder! In this section, you'll find prompt templates 📜 & bleeding-edge AI tools ⚙️ to free up your time.

Fresh Prompt Alert! 🚨

Ever struggled to make your brand the talk of the town? You're not alone.

This week's Fresh Prompt Alert is here to save the day!

Whether you're a startup founder or just dreaming up your next big idea, this prompt is your go-to toolkit for crafting killer USPs that hit your audience right where it matters.

Say goodbye to generic pitches and hello to standout branding that resonates, differentiates, and delivers. Ready to unlock your brand's magic? šŸ‘‡

As a branding expert, your task is to help [insert company name], a business that offers [insert products/services], develop a compelling unique selling proposition (USP).

Analyze the company's target audience, which is [insert demographic details], and identify their main pain points and desires.

Consider the company's strengths, its competitors' weaknesses, and any unique features or benefits it offers.

Suggest 3 potential USPs that clearly communicate the company's value, differentiate it from competitors, and resonate with the target audience.

For each USP, provide a brief explanation of why it would be effective and how it aligns with the company's brand identity.

* Replace the content in brackets with your details

5 AI Tools You JUST Can't Miss 🤩

  • 💻 ImgToCode: Transform your UI to Code

  • ♻️ Lido: Convert PDFs to Excel, fast

  • 🔗 ShowHype: Create videos from URLs & images in a flash

  • 🖋️ Steer: Save hours of writing emails & messages

  • 🧑🏼‍💼 Pin: 10× boost in your recruiting efforts

Spark 'n' Trouble Shenanigans 😜

OpenAI's Sora just levelled up the AI game during its "12 Days of Ship-mas," and the result? Someone built Forever Land, a faux video game ad inspired by Little Big Planet, crafted in just 2.5 days.

Using AI tools like Sora, ChatGPT, and a sprinkle of Final Cut Pro magic, Chad Nelson conjured raccoons with goggles, octo-elephants with button eyes, and even 180-degree camera spins, all while proving AI can go from "huh?" to "wow!" in seconds.

Watching the AI seamlessly craft worlds, animate characters, and stitch it all together felt like magic, and it sparked a tiny debate between Spark & Trouble about who should claim credit if they had AI tools back in the day.

If you've ever dreamed of an AI sidekick that turns your wildest ideas into a polished product, check this out ✨

Well, that's a wrap!
Thanks for reading 😊

See you next week with more mind-blowing tech insights 💻

Until then,
Stay Curious 🧠 Stay Awesome 🤩

PS: Do catch us on LinkedIn - Sandra & Tezan
