Riding the Wave: Inside the First Generation of AI Filmmaking
- Bella Boak Weinstein
- Nov 11, 2025
- 5 min read
Updated: Jan 13
At Gold Leader AI, we’ve always known filmmaking is equal parts beauty and brutality. Even before artificial intelligence entered the picture, it was a craft built on problem-solving, improvisation, technical headaches, and the occasional descent into madness. Now that AI is part of the filmmaking process, the chaos hasn’t lessened; it has simply evolved into something stranger, faster, and unexpectedly energizing.

In our first-ever podcast, the AI Film Podcast (already out of date), our Director of AI Innovation, Robert Wald, sat down with filmmaker and editor Jake Serlin to talk honestly, sometimes jokingly, sometimes begrudgingly, about what it actually feels like to make films with today’s AI tools. What emerged wasn’t a lecture or a manifesto. It was a raw conversation between two craftsmen trying to understand the future of their own medium.
“Hey man, I don't like the computers taking over the world as much as the next guy, but I would also rather be on the boat than in the water.”
Jake said it laughing, but the truth behind it runs deep. Everyone senses that AI is accelerating. Ignoring it won’t stop the tide. The question is whether we learn to use the tools, or get swept under by them.
Commercials: The First Battleground
Much of our current exploration happens in commercials, not because they’re the peak of artistry, but because they’re the best place to pressure-test an emerging medium. Jake recently built an advertisement using generative tools, posing a simple question: Can an AI-generated spot feel like it was filmed with a script and actors?
Doing that requires clever prompting, classic editing, and the willingness to endure what Jake described as “fascinating and frustrating” swings between brilliance and disaster. Commercials work because they don’t demand rigid continuity. They can be abstract. Their visual logic is looser, and their emotional logic relies more on impact than on storytelling structure.
They also hide one of AI’s biggest current weaknesses: consistency. Beyond eight, twenty, or thirty seconds, character consistency is still fragile. Commercials are short enough to brute-force it, and strange enough to let the chaos become an advantage.
Jake put it plainly: “Spec commercials are where it's at right now… as an advertisement for yourself that's going to get attention.”
The Veo3 Shift
A major part of the conversation revolved around Veo3, a model that has made a genuine leap in generative filmmaking—particularly in articulated speech. Historically, the workflow required generating silent video, then separate audio, then gluing them together with lip-sync software. Even when it worked, something always felt off. The body didn’t move like a person speaking. The head didn’t pivot. The chest didn’t rise. The uncanny valley remained intact.
Veo3 changed that by generating the performance, the face, the neck, the chest, and the voice all at once. As Jake described it, Veo3 “includes all of the ingredients of the person's movement,” unlike dubbing tools that isolate only the face.
Before Veo3’s latest update, there was a hard cap: eight seconds of articulated speech per clip. Anything longer required hacks and stitching. Now, with the introduction of image prompting, you can drop in a frame of two people and instruct: make the person on the right say this. It’s still not as strong as the model’s text-only generations, but it’s finally possible to create longer dialogue sequences with recognizably consistent characters.
As Robert summed up, “Sound is the most important part of video.”
Integrated audio allows integrated performance, and that’s what pulls a shot out of the valley of the uncanny.
The Quiet Feeling of Wrongness
Even with these improvements, the uncanny valley hasn’t evaporated. Jake described generating a scene with eight people. When focused on the main character, everything felt fine, until he realized four people at the edges of the frame were making the exact same expression, hands posed identically. Viewers may not consciously detect it, but “somewhere in the back of their brains they go, something about this image isn’t lining up.”
AI video often works beautifully on a phone screen. Move it to a large monitor or TV, and the seams show. Realism is rising, but inconsistency still whispers through the frame.
Emotion, Animation, and Why Humans Still Matter
Jake was clear about one emotional limit: “I still have yet to see an AI performance… that makes me feel a thing.” He’s been impressed plenty of times. But not moved.
Part of that might be awareness of the artifice. Part might be the slight coldness emerging whenever humans are rendered too realistically. Interestingly, animation may be the first domain AI fully cracks. Stylization gives audiences psychological permission to forgive imperfections. A cartoon character doesn’t need perfect micro-motions to evoke emotion. A photoreal person does.
Robert pointed out that recent AI film festivals have already leaned into tearjerker stories. “Animation offers a certain level of forgiveness,” Jake noted: big eyes, stylization, and implied effort allow audiences to accept emotional cues more freely.
Copyright Wars and “Safe” Models
The discussion moved naturally into the legal battles heating up across the industry. Disney is already in litigation with Midjourney over copyrighted characters appearing in outputs. The debate is thorny: if a user prompts “a Mickey Mouse-like character,” and a model reproduces something too close, whose fault is that?
The model? The company? The user?
There’s no consensus. Some companies, like Moon Valley, are trying to sidestep the issue entirely by claiming their models train only on non-copyrighted data while offering advanced control tools. Historically, such datasets produced inferior visuals. Whether this new generation can overcome that limitation remains to be seen.
The Tidal Wave of Content
Democratizing creativity unleashes opportunity, and noise. AI makes it easy to flood the internet with content designed only to hit algorithms. Robert noted that YouTube recently stopped monetizing certain classes of AI-generated material, a sign that platforms are already reacting to the coming deluge.
But Jake pointed out that this isn’t new: corporate gatekeeping shaped creative output long before AI arrived. The difference now is the scale. And the speed. And the sheer volume of content that can be churned out by anyone.
Pivoting as a Creative Reflex
One of the strongest themes of the conversation was creative adaptability. AI outputs often fail to deliver exactly what you ask for, but sometimes they deliver something unexpectedly compelling. The true skill becomes knowing when to push the model harder and when to pivot gracefully.
As Jake said: “Plenty of times it didn't give me what I wanted, but it gave me something interesting.” Working with AI becomes a dance between control and surrender.
The Hybrid Future: AI Actors and Live Performance
Near the end of the conversation, Robert floated a vision for a hybrid future: live theaters equipped with LED volume walls where human actors perform alongside AI-generated characters backed by persistent LLM “brains.” These AI actors could appear in multiple productions, retain memories, take direction conversationally, and evolve over time.
The idea isn’t to remove humans, but to build new forms of interaction between human presence and synthetic performance. Think interviews with AI actors, live improvisation with virtual performers, hybrid theater experiences that merge real and digital in real time.
In a world of rising automation, audiences may seek the exact opposite: something alive, unpredictable, and deeply present. Jake predicted that live theater (smaller, more immediate, less commercial) may be ready for a comeback.
Where We Stand
Two years into modern AI filmmaking, the landscape shifts daily. Models update. Legal frameworks wobble. Techniques evolve. What was true yesterday becomes obsolete today. Keeping up is nearly a full-time job.
But one thing hasn’t changed.
Filmmaking still needs vision. It still needs taste. It still needs people who care deeply about what images do to us.
We’re not here to celebrate technology for its own sake. We’re here to explore the edges of what’s possible, to make work that moves people, and to stay human, especially when the tools aren’t.
This is only episode one. And if this conversation taught us anything, it’s that the story of AI filmmaking is only just beginning.