Anime Just Got a Whole Lot Cooler with AI! 😎

PLUS: How Does a Handheld Device Transform Reality? Find Out!

Howdy fellas!

Can you believe it's been 6 years since GPT-1 first dropped? It's wild to see how gen AI has evolved, and now, with mind-blowing advancements in text, image, and video generation, the fun just doesn't stop!


Spark & Trouble are back with today's edition to dive into the latest AI marvel that's got everyone buzzing. Buckle up!

Here's a sneak peek into this week's edition 👀

  • Deep dive into the research by Tencent AI Lab that can supercharge anime artists' efforts

  • Our prompt that could help you prioritize features & tasks like a boss

  • 3 incredible AI tools that you JUST canā€™t miss!

  • Check out Transferscope, a prototype that paves the way for 'synthesized reality'

Time to jump in!😄

PS: Got thoughts on our content? Share 'em through a quick survey at the end of every edition. It helps us see how our product labs, insights & resources are landing, so we can make them even better.

Hot off the Wires 🔥

We're eavesdropping on the smartest minds in research. 🤫 Don't miss out on what they're cooking up! In this section, we dissect some of the juiciest tech research that holds the key to what's next in tech.⚡

Imagine being a kid again, sprawled on the couch, mesmerized by the antics of your favorite cartoon characters. From Dragon Ball Z's gravity-defying battles to the slapstick humor of Tom and Jerry, those animated worlds left an indelible mark on our childhood memories.

Dragon Ball Z GIF by Xbox on Giphy

But have you ever wondered about the sheer effort that goes into creating just a few seconds of those magical cartoons?

Creating animations is no child's play. One of the real struggles for animators is "frame interpolation" - the process of generating those in-between frames that smoothly connect the keyframes. It's a laborious task that can consume up to a whopping 60% of an animation project's workload! 🄵

Did you know?
Even the legendary animator & manga artist Hayao Miyazaki (director of "The Boy & The Heron" & countless other anime films) could only produce ~5 min of animation per week due to this time-consuming "frame interpolation" process.

But what if there was a way to streamline this process, allowing aspiring animators to bring their visions to life faster?

Well, the Chinese tech giant Tencent's AI Lab just released "ToonCrafter", an AI tool that automatically generates all the in-between frames of an animation from just the keyframes you provide!

You can try out ToonCrafter for FREE by either cloning their GitHub repository or using their online playground! The model is also available on Hugging Face.

Understanding the 'jargon'

Before diving into ToonCrafter's magic, let's break down some key terms:

Frame Interpolation: This essentially means creating new frames between two existing ones. Think of it like adding missing pages in a flipbook to create a smoother animation. While live-action (using photography instead of animation) frame interpolation leverages real-world motion data, anime & cartoons present a unique challenge due to their stylized, exaggerated movements and high-contrast visuals.

Dis-occlusion: When an object is hidden behind another and then reappears, that's dis-occlusion. Traditional interpolation methods struggle with these situations, often creating nonsensical results.

Motion Priors: These are essentially pre-existing knowledge sets about how things move in the real world. For example, a car typically moves on the ground, not in the air.

Latent Diffusion Models (LDMs): These are powerful AI models that can generate new images or videos from a compressed representation called a latent space. To know more, head over to one of our previous editions, where we've explained latent spaces & the diffusion process in detail.
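
If you're curious what that "compression" looks like in practice, here's a tiny sketch of our own (not ToonCrafter's code) that round-trips an image through the Stable Diffusion VAE using the diffusers library; the file name is just a placeholder:

```python
# A toy round-trip through a latent space (illustrative, not ToonCrafter's code)
import torch
from diffusers import AutoencoderKL
from torchvision import transforms
from PIL import Image

vae = AutoencoderKL.from_pretrained("stabilityai/sd-vae-ft-mse")

img = Image.open("keyframe.png").convert("RGB").resize((512, 512))  # placeholder image
x = transforms.ToTensor()(img).unsqueeze(0) * 2 - 1                 # scale pixels to [-1, 1]

with torch.no_grad():
    latents = vae.encode(x).latent_dist.sample()   # (1, 4, 64, 64): ~48x fewer numbers
    recon = vae.decode(latents).sample             # back to (1, 3, 512, 512)

print(x.shape, "->", latents.shape)  # fine details can get lost in this squeeze
```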

So, what's new?

Until recently, frame interpolation generally relied on the "linear motion assumption" (think of a car moving in a straight line), but that assumption breaks down for the dramatic leaps and bounds often seen in cartoons.

Moreover, most traditional techniques relying on motion priors from live-action videos don't work well for cartoons. Imagine a character suddenly looking hyper-realistic in the middle of an anime fight! This is because of the "domain gap" between live-action and cartoons.

Additionally, LDMs tend to compress information significantly, causing loss of details.

ToonCrafter transcends these limitations with generative cartoon interpolation: it synthesizes in-between frames that capture the unique styles & motions of anime by augmenting existing LDMs & training them on a large dataset of anime footage.

Examples of generative cartoon interpolation by ToonCrafter, given the starting & end frames (source: ToonCrafter GitHub repo)

Under the hood…

ToonCrafter builds upon the DynamiCrafter model, an LDM designed for live-action videos. DynamiCrafter uses the first and last frames of a video to generate latent representations of the middle frames, which are then decoded into actual video frames. The model consists of the following components (a toy sketch of how they fit together follows this list):

  • Image-Context Projector: Digests the context of input frames.

  • Spatial Layers: Learn the appearance distribution of input frames, similar to Stable Diffusion v2.1.

  • Temporal Layers: Capture the motion dynamics between video frames.
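
Here's a toy skeleton of how those three pieces could slot together - purely our own illustration with made-up layer shapes and names, not the actual DynamiCrafter code:

```python
# Illustrative skeleton only: component names mirror the description above,
# not the real DynamiCrafter implementation.
import torch
import torch.nn as nn

class ToyDynamiCrafterBlock(nn.Module):
    def __init__(self, channels=4, ctx_dim=768):
        super().__init__()
        self.image_context_projector = nn.Linear(ctx_dim, channels)         # digests keyframe context
        self.spatial_layers = nn.Conv2d(channels, channels, 3, padding=1)   # per-frame appearance
        self.temporal_layers = nn.Conv1d(channels, channels, 3, padding=1)  # motion across frames

    def forward(self, latents, keyframe_feats):
        # latents: (B, T, C, H, W) noisy latents; keyframe_feats: (B, ctx_dim) from first/last frames
        b, t, c, h, w = latents.shape
        ctx = self.image_context_projector(keyframe_feats)            # (B, C)
        x = latents + ctx.view(b, 1, c, 1, 1)                         # crude conditioning
        x = self.spatial_layers(x.reshape(b * t, c, h, w)).reshape(b, t, c, h, w)
        x = x.permute(0, 3, 4, 2, 1).reshape(b * h * w, c, t)         # fold space into the batch
        x = self.temporal_layers(x)                                   # convolve across time
        return x.reshape(b, h, w, c, t).permute(0, 4, 3, 1, 2)

out = ToyDynamiCrafterBlock()(torch.randn(1, 16, 4, 32, 32), torch.randn(1, 768))
print(out.shape)  # torch.Size([1, 16, 4, 32, 32])
```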

However, simply applying this pre-trained LDM to cartoons wouldn't cut it, due to the domain gap between live-action videos (which these models are trained on) and the vibrant, high-contrast world of animation.

Overview of the ToonCrafter model architecture (source: ToonCrafter paper)

Toon Rectification Learning

To address the domain gap, ToonCrafter employs Toon Rectification Learning, which works as follows:

  1. The researchers first created a specialized Cartoon Video Dataset, consisting of raw cartoon videos split into high-quality snippets. Each snippet is captioned using BLIP-2, with the first, middle, and last frames annotated.

  2. The DynamiCrafter model was then fine-tuned on this dataset using an efficient rectification learning strategy (needed because the cartoon dataset is much smaller than the original live-action dataset used to train DynamiCrafter). In practice, this involved freezing the temporal layers (since they retain real-world motion priors) while fine-tuning the other components on the cartoon video dataset - see the sketch right after this list.
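
In PyTorch terms, that selective freezing boils down to something like this (a minimal sketch with illustrative module names, not ToonCrafter's real training code):

```python
# Freeze the motion knowledge, fine-tune the rest (illustrative sketch only)
import torch
import torch.nn as nn

# Stand-in for the pretrained DynamiCrafter (made-up layer shapes)
model = nn.ModuleDict({
    "image_context_projector": nn.Linear(768, 4),
    "spatial_layers": nn.Conv2d(4, 4, 3, padding=1),
    "temporal_layers": nn.Conv1d(4, 4, 3, padding=1),
})

# Temporal layers already hold real-world motion priors, so keep them frozen
for p in model["temporal_layers"].parameters():
    p.requires_grad = False

# Only the remaining components get updated on the (much smaller) cartoon dataset
trainable = [p for p in model.parameters() if p.requires_grad]
optimizer = torch.optim.AdamW(trainable, lr=1e-5)
```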

Detail Injection While Decoding

LDMs can sometimes cause flickering, blurry, or distorted results due to losses in the latent space used for diffusion. ToonCrafter addresses this with a Dual-Reference-Based 3D Decoder. That might sound like a mouthful, but here's the essential idea - this decoder takes features from the first and final frames and injects them into the latent features of the intermediate frames.

Working of the "Detail Injection" in Decoder (source: ToonCrafter paper)

For the geeks who wish to dig deeper, here's how this works (a toy sketch follows the list):

  • In shallow layers, it selectively focuses on relevant features from the reference frames using a cross-attention mechanism

  • In deeper layers, it attends to all features through residual learning, enhancing detail retention, maintaining consistency, and reducing artifacts
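
Here's a toy version of that idea, where intermediate-frame features query the sharp reference frames via cross-attention - made-up sizes, none of the real decoder's machinery:

```python
# Intermediate frames "ask" the reference keyframes for detail via cross-attention
import torch
import torch.nn as nn

dim = 64
cross_attn = nn.MultiheadAttention(embed_dim=dim, num_heads=4, batch_first=True)

mid_frames = torch.randn(1, 14 * 32 * 32, dim)   # flattened features of the in-between frames
ref_frames = torch.randn(1, 2 * 32 * 32, dim)    # features from the first & last keyframes

# Queries come from the frames being decoded; keys/values from the sharp references,
# so high-frequency detail can be copied back in.
injected, _ = cross_attn(query=mid_frames, key=ref_frames, value=ref_frames)
out = mid_frames + injected                       # residual connection, as in the deeper layers
```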

What's the intrigue?

ToonCrafter goes far beyond just interpolating between keyframes in an animation. For animators seeking greater control, it introduces sketch-based generation guidance.

Animators can now just sketch out rough ideas for a subset of intermediate frames, and the model will use these sketches as a guide during the generation process. To accomplish this, ToonCrafter uses a ControlNet-inspired "sketch encoder" that conditions the generated frames on the sketches, ensuring that the generated animations stay true to the artist's vision.

Sparse sketch guidance for anime generation (source: ToonCrafter GitHub repo)
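
If you want a feel for sketch conditioning in general, here's a generic example using the off-the-shelf ControlNet scribble model from the diffusers library - a stand-in to illustrate the concept, not ToonCrafter's own sketch encoder or API (file names are placeholders):

```python
# Generic ControlNet-style sketch conditioning with diffusers (not ToonCrafter's API)
import torch
from diffusers import ControlNetModel, StableDiffusionControlNetPipeline
from diffusers.utils import load_image

controlnet = ControlNetModel.from_pretrained(
    "lllyasviel/sd-controlnet-scribble", torch_dtype=torch.float16
)
pipe = StableDiffusionControlNetPipeline.from_pretrained(
    "runwayml/stable-diffusion-v1-5", controlnet=controlnet, torch_dtype=torch.float16
).to("cuda")

sketch = load_image("rough_pose_sketch.png")   # the animator's rough guide (placeholder path)
frame = pipe(
    "anime character mid-jump, cel-shaded", image=sketch, num_inference_steps=25
).images[0]
frame.save("guided_frame.png")                 # the output follows the scribbled pose
```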

Why does this matter?

The results speak for themselves: ToonCrafter outperforms existing methods by a significant margin, acing various metrics (a quick sketch of the cosine-similarity idea follows this list):

  • FVD (Frechet Video Distance): Measures the realism of generated videos (lower values indicate better performance)

  • KVD (Kernel Video Distance): Evaluates the consistency of the generated frames (lower values indicate better performance)

  • Cosine Similarities: Assesses how well the generated frames align with text prompts or input frames (higher values indicate better alignment)

  • CPBD (Cumulative Probability Blur Detection): Measures the sharpness of the frames (the higher, the better)
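
As a rough illustration of the cosine-similarity idea (not the paper's exact evaluation setup), you could score a generated frame against its prompt with CLIP embeddings; the file name is a placeholder:

```python
# Score how well a generated frame matches its text prompt via CLIP cosine similarity
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32")
processor = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

frame = Image.open("generated_frame.png")
inputs = processor(text=["a cel-shaded anime character jumping"], images=frame,
                   return_tensors="pt", padding=True)

with torch.no_grad():
    out = model(**inputs)

sim = torch.nn.functional.cosine_similarity(out.image_embeds, out.text_embeds)
print(f"CLIP cosine similarity: {sim.item():.3f}")   # higher = better prompt alignment
```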

User studies further highlight ToonCrafter's superiority, with participants consistently preferring its outputs over other methods.

But beyond the numbers, ToonCrafter's true potential lies in its ability to revolutionize the animation industry. With its success, we can envision a future where AI assistants become indispensable tools in the animator's toolkit. From real-time animation previews to personalized cartoon avatars, the possibilities are endless. Imagine interactive storybooks that come alive with AI-generated animations, or even immersive virtual worlds where your favorite cartoon characters can interact with you!

Spark & Trouble are eagerly looking forward to seeing the release of some anime with "animated by ToonCrafter" in the credits 😍

Key Takeaways

(Screenshot this!)

Interdisciplinary Innovation: ToonCrafter's integration of generative models, motion priors, and cartoon-specific adaptations showcases the potential of interdisciplinary approaches in AI research.

Advanced Modeling Techniques: Leveraging latent spaces and innovative decoding strategies can overcome traditional limitations in frame interpolation.

User Control and Adaptability: The sketch-based generation guidance exemplifies how AI can be adapted to provide user-friendly, controllable solutions in creative domains.

10x Your Workflow with AI 📈

Work smarter, not harder! In this section, you'll find prompt templates 📜 & bleeding-edge AI tools ⚙ļø to free up your time.

Fresh Prompt Alert!🚨

Do you ever feel like you're juggling features, ideas, and endless to-do lists?

We get it.

That's why this week's Fresh Prompt Alert is a game-changer. Transform into an expert product manager with industry wisdom, armed with a proven prioritization framework.

Dive in and prioritize like a pro👇

Act as an expert product manager with solid experience in the industry, working with highly successful products.

Consider the following:
- Vision of the Company: [specify your company goals or product vision]
- Prioritization Framework: [specify a prioritization framework, e.g. RICE scoring model, ICE scoring model, Value vs. Complexity Quadrant, Kano Model, Weighted Scoring Prioritization, the MoSCoW method, Opportunity Scoring]

Based on the above, prioritize the following:
[list all features or viable initiatives that you'd like to prioritize]

* Replace the content in brackets with your details

3 AI Tools You JUST Can't Miss 🤩

  • 🛒 TextBrew AI - Automated product description generation using unique product identifiers

  • ✅ SmasherOfOdds MVP Machine - 100% FREE Product Plan Generator

  • 🪄 PicToChart - Generate captivating infographics by describing your topic

Spark 'n' Trouble Shenanigans 😜

Have you heard about this incredible new invention? It's called Transferscope, and it's a handheld device that blends human creativity with artificial intelligence in the most mind-blowing way, paving the way for Synthesized Reality!

With just a simple click, you can capture any object or concept, and bam! It seamlessly blends it into any scene, creating entirely new and imaginative realities right before your eyes!

Transferscope v1 handheld device (source: aid-lab.hfg-gmuend.de)

Powered by a Raspberry Pi and leveraging cutting-edge tech like the Kosmos-2 image interpreter and Stable Diffusion with ControlNet, this little marvel can transform captured images in under a second!

Can you believe that? It's like having a magic wand that can bend reality to your will.

While the tech behind the device may be complex, the UX is simple and intuitive. This prototype is a great example of next-generation portable technology, where on-device AI inferencing frees up the user to interact more fully and naturally with their environment and subject.

Well, that's a wrap!
Thanks for reading 😊

See you next week with more mind-blowing tech insights 💻

Until then,
Stay Curious🧠 Stay Awesome🤩

PS: Do catch us on LinkedIn - Sandra & Tezan
