Anime Just Got a Whole Lot Cooler with AI! 😎

PLUS: How Does a Handheld Device Transform Reality? Find Out!

Howdy fellas!

Can you believe it's been 6 years since GPT-1 first dropped? It's wild to see how gen AI has evolved, and now, with mind-blowing advancements in text, image, and video generation, the fun just doesn't stop!


Spark & Trouble are back with today's edition to dive into the latest AI marvel that's got everyone buzzing. Buckle up!

Here's a sneak peek into this week's edition 👀

  • Deep dive into the research by Tencent AI Lab that can supercharge anime artists' efforts

  • Our prompt that could help you prioritize features & tasks like a boss

  • 3 incredible AI tools that you JUST canā€™t miss!

  • Check out Transferscope, a prototype that paves the way for 'synthesized reality'

Time to jump in!😄

PS: Got thoughts on our content? Share 'em through a quick survey at the end of every edition. It helps us see how our product labs, insights & resources are landing, so we can make them even better.

Hot off the Wires 🔥

We're eavesdropping on the smartest minds in research. 🤫 Don't miss out on what they're cooking up! In this section, we dissect some of the juiciest tech research that holds the key to what's next in tech.⚡

Imagine being a kid again, sprawled on the couch, mesmerized by the antics of your favorite cartoon characters. From Dragon Ball Z's gravity-defying battles to the slapstick humor of Tom and Jerry, those animated worlds left an indelible mark on our childhood memories.

Dragon Ball Z GIF by Xbox on Giphy

But have you ever wondered about the sheer effort that goes into creating just a few seconds of those magical cartoons?

Creating animations is no child's play. One of the real struggles for animators is "frame interpolation" - the process of generating those in-between frames that smoothly connect the keyframes. It's a laborious task that can consume up to a whopping 60% of an animation project's workload! 🄵

Did you know?
Even the legendary animator & manga artist Hayao Miyazaki (director of "The Boy & The Heron" & countless other anime films) could only produce ~5 min of animation per week due to this time-consuming "frame interpolation" process.

But what if there was a way to streamline this process, allowing aspiring animators to bring their visions to life faster?

Well, the Chinese tech giant Tencent's AI Lab just released "ToonCrafter", an AI tool that automatically generates all the in-between frames of an animation from just the keyframes you provide!

You can try out ToonCrafter for FREE by either cloning their GitHub repository or using their online playground! The model is also available on Hugging Face.

Understanding the 'jargon'

Before diving into ToonCrafter's magic, let's break down some key terms:

Frame Interpolation: This essentially means creating new frames between two existing ones. Think of it like adding missing pages in a flipbook to create a smoother animation. While live-action (using photography instead of animation) frame interpolation leverages real-world motion data, anime & cartoons present a unique challenge due to their stylized, exaggerated movements and high-contrast visuals.

Dis-occlusion: When an object is hidden behind another and then reappears, that's dis-occlusion. Traditional interpolation methods struggle with these situations, often creating nonsensical results.

Motion Priors: These are essentially pre-existing knowledge sets about how things move in the real world. For example, a car typically moves on the ground, not in the air.

Latent Diffusion Models (LDMs): These are powerful AI models that can generate new images or videos from a compressed representation called a latent space. To know more, head over to one of our previous editions, where we've explained latent spaces & the diffusion process in detail.
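
If you're curious what that "compression" looks like in practice, here's a tiny sketch of our own (not ToonCrafter's code) that round-trips an image through the Stable Diffusion VAE using the diffusers library; the file name is just a placeholder:

```python
# A toy round-trip through a latent space (illustrative, not ToonCrafter's code)
import torch
from diffusers import AutoencoderKL
from torchvision import transforms
from PIL import Image

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")

img = Image.open("keyframe.png").convert("RGB").resize((512, 512))  # placeholder image
x = transforms.ToTensor()(img).unsqueeze(0) * 2 - 1                 # scale pixels to [-1, 1]

with torch.no_grad():
    latents = vae.encode(x).latent_dist.sample()   # (1, 4, 64, 64): ~48x fewer numbers
    recon = vae.decode(latents).sample             # back to (1, 3, 512, 512)

print(x.shape, "->", latents.shape)  # fine details can get lost in this squeeze
```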

So, what's new?

Until recently, frame interpolation generally relied on the "linear motion assumption" (think of a car moving in a straight line), but that assumption breaks down for the dramatic leaps and bounds often seen in cartoons.

Moreover, most traditional techniques relying on motion priors from live-action videos don't work well for cartoons. Imagine a character suddenly looking hyper-realistic in the middle of an anime fight! This is because of the "domain gap" between live-action and cartoons.

Additionally, LDMs tend to compress information significantly, causing loss of details.

ToonCrafter transcends these limitations with generative cartoon interpolation: it synthesizes in-between frames that capture the unique styles & motions of anime by augmenting existing LDMs & training them on a large dataset of anime footage.

Examples of generative cartoon interpolation by ToonCrafter, given the starting & end frames (source: ToonCrafter GitHub repo)

Under the hood…

ToonCrafter builds upon the DynamiCrafter model, an LDM designed for live-action videos. DynamiCrafter uses the first and last frames of a video to generate latent representations of the middle frames, which are then decoded into actual video frames. The model consists of the following components (a toy sketch of how they fit together follows this list):

  • Image-Context Projector: Digests the context of input frames.

  • Spatial Layers: Learn the appearance distribution of input frames, similar to Stable Diffusion v2.1.

  • Temporal Layers: Capture the motion dynamics between video frames.
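
Here's a toy skeleton of how those three pieces could slot together - purely our own illustration with made-up layer shapes and names, not the actual DynamiCrafter code:

```python
# Illustrative skeleton only: component names mirror the description above,
# not the real DynamiCrafter implementation.
import torch
import torch.nn as nn

class ToyDynamiCrafterBlock(nn.Module):
    def __init__(self, channels=4, ctx_dim=768):
        super().__init__()
        self.image_context_projector = nn.Linear(ctx_dim, channels)         # digests keyframe context
        self.spatial_layers = nn.Conv2d(channels, channels, 3, padding=1)   # per-frame appearance
        self.temporal_layers = nn.Conv1d(channels, channels, 3, padding=1)  # motion across frames

    def forward(self, latents, keyframe_feats):
        # latents: (B, T, C, H, W) noisy latents; keyframe_feats: (B, ctx_dim) from first/last frames
        b, t, c, h, w = latents.shape
        ctx = self.image_context_projector(keyframe_feats)            # (B, C)
        x = latents + ctx.view(b, 1, c, 1, 1)                         # crude conditioning
        x = self.spatial_layers(x.reshape(b * t, c, h, w)).reshape(b, t, c, h, w)
        x = x.permute(0, 3, 4, 2, 1).reshape(b * h * w, c, t)         # fold space into the batch
        x = self.temporal_layers(x)                                   # convolve across time
        return x.reshape(b, h, w, c, t).permute(0, 4, 3, 1, 2)

out = ToyDynamiCrafterBlock()(torch.randn(1, 16, 4, 32, 32), torch.randn(1, 768))
print(out.shape)  # torch.Size([1, 16, 4, 32, 32])
```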

However, simply applying this pre-trained LDM to cartoons wouldn't cut it, due to the domain gap between live-action videos (which these models are trained on) and the vibrant, high-contrast world of animation.

Overview of the ToonCrafter model architecture (source: ToonCrafter paper)

Toon Rectification Learning

To address the domain gap, ToonCrafter employs Toon Rectification Learning, which works as follows:

  1. The researchers first created a specialized Cartoon Video Dataset, consisting of raw cartoon videos split into high-quality snippets. Each snippet is captioned using BLIP-2, with the first, middle, and last frames annotated.

  2. The DynamiCrafter model was then fine-tuned on this dataset using an efficient rectification learning strategy (needed because the cartoon dataset is much smaller than the original live-action dataset used to train DynamiCrafter). In practice, this involved freezing the temporal layers (since they retain real-world motion priors) while fine-tuning the other components on the cartoon video dataset - see the sketch right after this list.
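
In PyTorch terms, that selective freezing boils down to something like this (a minimal sketch with illustrative module names, not ToonCrafter's real training code):

```python
# Freeze the motion knowledge, fine-tune the rest (illustrative sketch only)
import torch
import torch.nn as nn

# Stand-in for the pretrained DynamiCrafter (made-up layer shapes)
model = nn.ModuleDict({
    "image_context_projector": nn.Linear(768, 4),
    "spatial_layers": nn.Conv2d(4, 4, 3, padding=1),
    "temporal_layers": nn.Conv1d(4, 4, 3, padding=1),
})

# Temporal layers already hold real-world motion priors, so keep them frozen
for p in model["temporal_layers"].parameters():
    p.requires_grad = False

# Only the remaining components get updated on the (much smaller) cartoon dataset
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-5)
```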

Detail Injection While Decoding

LDMs can sometimes cause flickering, blurry, or distorted results due to losses in the latent space used for diffusion. ToonCrafter addresses this with a Dual-Reference-Based 3D Decoder. That might sound like a mouthful, but here's the essential idea - this decoder takes features from the first and final frames and injects them into the latent features of the intermediate frames.

Working of the "Detail Injection" in Decoder (source: ToonCrafter paper)

For the geeks who wish to dig deeper, here's how this works (a toy sketch follows the list):

  • In shallow layers, it selectively focuses on relevant features from the reference frames using a cross-attention mechanism

  • In deeper layers, it attends to all features through residual learning, enhancing detail retention, maintaining consistency, and reducing artifacts
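
Here's a toy version of that idea, where intermediate-frame features query the sharp reference frames via cross-attention - made-up sizes, none of the real decoder's machinery:

```python
# Intermediate frames "ask" the reference keyframes for detail via cross-attention
import torch
import torch.nn as nn

dim = 64
cross_attn = nn.MultiheadAttention(embed_dim=dim, num_heads=4, batch_first=True)

mid_frames = torch.randn(1, 14 * 32 * 32, dim)   # flattened features of the in-between frames
ref_frames = torch.randn(1, 2 * 32 * 32, dim)    # features from the first & last keyframes

# Queries come from the frames being decoded; keys/values from the sharp references,
# so high-frequency detail can be copied back in.
injected, _ = cross_attn(query=mid_frames, key=ref_frames, value=ref_frames)
out = mid_frames + injected                       # residual connection, as in the deeper layers
```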

What's the intrigue?

ToonCrafter goes far beyond just interpolating between keyframes in an animation. For animators seeking greater control, it introduces sketch-based generation guidance.

Animators can now just sketch out rough ideas for a subset of intermediate frames, and the model will use these sketches as a guide during the generation process. To accomplish this, ToonCrafter uses a ControlNet-inspired "sketch encoder" that conditions the generated frames on the sketches, ensuring that the generated animations stay true to the artist's vision.

Sparse sketch guidance for anime generation (source: ToonCrafter GitHub repo)
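
If you want a feel for sketch conditioning in general, here's a generic example using the off-the-shelf ControlNet scribble model from the diffusers library - a stand-in to illustrate the concept, not ToonCrafter's own sketch encoder or API (file names are placeholders):

```python
# Generic ControlNet-style sketch conditioning with diffusers (not ToonCrafter's API)
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-scribble", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

sketch = load_image("rough_pose_sketch.png")   # the animator's rough guide (placeholder path)
frame = pipe(
    "anime character mid-jump, cel-shaded", image=sketch, num_inference_steps=25
).images[0]
frame.save("guided_frame.png")                 # the output follows the scribbled pose
```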

Why does this matter?

The results speak for themselves: ToonCrafter outperforms existing methods by a significant margin, acing various metrics (a quick sketch of the cosine-similarity idea follows this list):

  • FVD (Frechet Video Distance): Measures the realism of generated videos (lower values indicate better performance)

  • KVD (Kernel Video Distance): Evaluates the consistency of the generated frames (lower values indicate better performance)

  • Cosine Similarities: Assesses how well the generated frames align with text prompts or input frames (higher values indicate better alignment)

  • CPBD (Cumulative Probability Blur Detection): Measures the sharpness of the frames (the higher, the better)
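
As a rough illustration of the cosine-similarity idea (not the paper's exact evaluation setup), you could score a generated frame against its prompt with CLIP embeddings; the file name is a placeholder:

```python
# Score how well a generated frame matches its text prompt via CLIP cosine similarity
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

frame = Image.open("generated_frame.png")
inputs = processor(text=["a cel-shaded anime character jumping"], images=frame,
                   return_tensors="pt", padding=True)

with torch.no_grad():
    out = model(**inputs)

sim = torch.nn.functional.cosine_similarity(out.image_embeds, out.text_embeds)
print(f"CLIP cosine similarity: {sim.item():.3f}")   # higher = better prompt alignment
```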

User studies further highlight ToonCrafter's superiority, with participants consistently preferring its outputs over other methods.

But beyond the numbers, ToonCrafter's true potential lies in its ability to revolutionize the animation industry. With its success, we can envision a future where AI assistants become indispensable tools in the animator's toolkit. From real-time animation previews to personalized cartoon avatars, the possibilities are endless. Imagine interactive storybooks that come alive with AI-generated animations, or even immersive virtual worlds where your favorite cartoon characters can interact with you!

Spark & Trouble are eagerly looking forward to seeing the release of some anime with "animated by ToonCrafter" in the credits 😍

Key Takeaways

(Screenshot this!)

Interdisciplinary Innovation: ToonCrafter's integration of generative models, motion priors, and cartoon-specific adaptations showcases the potential of interdisciplinary approaches in AI research.

Advanced Modeling Techniques: Leveraging latent spaces and innovative decoding strategies can overcome traditional limitations in frame interpolation.

User Control and Adaptability: The sketch-based generation guidance exemplifies how AI can be adapted to provide user-friendly, controllable solutions in creative domains.

10x Your Workflow with AI 📈

Work smarter, not harder! In this section, you'll find prompt templates 📜 & bleeding-edge AI tools ⚙ļø to free up your time.

Fresh Prompt Alert!🚨

Do you ever feel like you're juggling features, ideas, and endless to-do lists?

We get it.

That's why this week's Fresh Prompt Alert is a game-changer. Transform into an expert product manager with industry wisdom, armed with a proven prioritization framework.

Dive in and prioritize like a pro👇

Act as an expert product manager with solid experience in the industry, working with highly successful products.

Consider the following:
- Vision of the Company: [specify your company goals or product vision]
- Prioritization Framework: [specify a prioritization framework, e.g. RICE scoring model, ICE scoring model, Value vs. Complexity Quadrant, Kano Model, Weighted Scoring Prioritization, the MoSCoW method, Opportunity Scoring]

Based on the above, prioritize the following:
[list all features or viable initiatives that you'd like to prioritize]

* Replace the content in brackets with your details

3 AI Tools You JUST Can't Miss 🤩

  • 🛒 TextBrew AI - Automated product description generation using unique product identifiers

  • ✅ SmasherOfOdds MVP Machine - 100% FREE Product Plan Generator

  • 🪄 PicToChart - Generate captivating infographics by describing your topic

Spark 'n' Trouble Shenanigans 😜

Have you heard about this incredible new invention? It's called Transferscope, and it's a handheld device that blends human creativity with artificial intelligence in the most mind-blowing way, paving the way for Synthesized Reality!

With just a simple click, you can capture any object or concept, and bam! It seamlessly blends it into any scene, creating entirely new and imaginative realities right before your eyes!

Transferscope v1 handheld device (source: aid-lab.hfg-gmuend.de)

Powered by a Raspberry Pi and leveraging cutting-edge tech like the Kosmos-2 image interpreter and Stable Diffusion with ControlNet, this little marvel can transform captured images in under a second!

Can you believe that? It's like having a magic wand that can bend reality to your will.

While the tech behind the device may be complex, the UX is simple and intuitive. This prototype is a great example of next-generation portable technology, where on-device AI inferencing frees up the user to interact more fully and naturally with their environment and subject.

Well, that's a wrap!
Thanks for reading 😊

See you next week with more mind-blowing tech insights 💻

Until then,
Stay Curious🧠 Stay Awesome🤩

PS: Do catch us on LinkedIn - Sandra & Tezan
