Around the same time I was working on A.B.Z.D., a friend of mine told me about an upcoming AMV editing competition.
I’d been fiddling with Deep Dream Generator’s deep-style tool for some time at that point, and had found EbSynth. A quick look around YouTube turned up no significant AI-based music videos at the time.
Some people had posted a few videos of AI art generation layered upon itself to make an infinite-zoom effect, and there’d been one or two five-ish second clips of style transfer, but no music videos.
Painter’s Dream used misty-mood as its base style.
I didn’t want to compete on hard cuts and tight sync, which a lot of the other videos in the competition would focus on, so I went lateral and made something that was at least unexpected.
It took a few tries to find the right style and figure out the aesthetic to go for; here are some test runs I did with various styles:
The AI was actually redrawing the given image, weighted towards the style it was passed. This redrawing was important, since it meant the AI would actively try to make every object in the image feel as if it actually existed in that space.
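Deep Dream Generator doesn’t publish its internals, but the behaviour matches classic Gatys-style neural style transfer: the output is re-optimised from scratch against two losses, one keeping the content in place, one (heavily weighted) pulling the textures towards the style image. Here’s a minimal sketch of that redrawing, assuming PyTorch/torchvision and made-up filenames:

```python
import torch
import torch.nn.functional as F
from torchvision import models, transforms
from PIL import Image

device = "cuda" if torch.cuda.is_available() else "cpu"

# Hypothetical filenames; substitute your own frame and style image.
load = transforms.Compose([transforms.Resize(512), transforms.ToTensor()])
content = load(Image.open("frame.png").convert("RGB")).unsqueeze(0).to(device)
style = load(Image.open("misty_mood.jpg").convert("RGB")).unsqueeze(0).to(device)

vgg = models.vgg19(weights=models.VGG19_Weights.DEFAULT).features.to(device).eval()
for p in vgg.parameters():
    p.requires_grad_(False)
# Out-of-place ReLUs so intermediate features survive for the backward pass.
for i, layer in enumerate(vgg):
    if isinstance(layer, torch.nn.ReLU):
        vgg[i] = torch.nn.ReLU(inplace=False)

STYLE_LAYERS = {0, 5, 10, 19, 28}  # conv layers whose textures define the style
CONTENT_LAYER = 21                 # a deeper layer: keeps objects where they are

def features(x):
    style_feats, content_feat = [], None
    for i, layer in enumerate(vgg):
        x = layer(x)
        if i in STYLE_LAYERS:
            style_feats.append(x)
        if i == CONTENT_LAYER:
            content_feat = x
    return style_feats, content_feat

def gram(f):
    _, c, h, w = f.shape
    f = f.view(c, h * w)
    return (f @ f.t()) / (c * h * w)

style_grams = [gram(f) for f in features(style)[0]]
content_target = features(content)[1]

# Start from the content frame and literally redraw it.
canvas = content.clone().requires_grad_(True)
opt = torch.optim.Adam([canvas], lr=0.02)
STYLE_WEIGHT = 1e6  # how hard the output gets pulled towards the style

for step in range(300):
    opt.zero_grad()
    s_feats, c_feat = features(canvas)
    content_loss = F.mse_loss(c_feat, content_target)
    style_loss = sum(F.mse_loss(gram(f), g) for f, g in zip(s_feats, style_grams))
    (content_loss + STYLE_WEIGHT * style_loss).backward()
    opt.step()

transforms.ToPILImage()(canvas.detach().clamp(0, 1).squeeze(0).cpu()).save("frame_stylised.png")
```

Cranking STYLE_WEIGHT up or down is, roughly, the dial the tool exposes: high values redraw everything in the style’s textures, low values leave the original frame mostly intact.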
I had a great opportunity to blend several sources together.
Now I had a dreamy art style where things blended together, and I could blend all sorts of sources, even really low-quality footage, since the AI redrew it all anyway. That meant I could combine old, low-quality anime. I’d use the most iconic scenes from each show, essentially just harvesting the opening credits, since the stylisation would make more obscure scenes much harder to identify. Most of these openings were in a box aspect ratio, which I felt I could lean into for a melancholic mood.
This all pointed me towards 1979 by The Smashing Pumpkins, and once that clicked, all I had to do was slap together some sloppy masks of anime openings, and generate the stylised video.
Adding static noise to the un-stylised input frames helps stabilise artsy jitter and get temporal coherence. EbSynth latches onto the fixed noise pattern to keep track of where everything is on-screen, and thus needs to do a lot less guesswork.
Non-static noise, the moving sort, is great for getting the AI to just hallucinate. It’s especially useful for transitions.
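As a concrete illustration, here’s a minimal sketch of that static-noise pass in Python, with made-up folder names and an arbitrary blend strength. The key point is that the noise pattern is generated once and reused for every frame:

```python
import os
import numpy as np
from PIL import Image

FRAMES_DIR = "frames"        # hypothetical folder of un-stylised input frames
OUT_DIR = "frames_noised"    # output fed to the styliser / EbSynth
NOISE_OPACITY = 0.12         # keep it subtle; strong noise starts to dominate

os.makedirs(OUT_DIR, exist_ok=True)
names = sorted(f for f in os.listdir(FRAMES_DIR) if f.endswith(".png"))

# One fixed noise pattern, generated once and overlaid on every frame.
# Because the grain never moves, EbSynth can latch onto it as a stable texture.
first = np.asarray(Image.open(os.path.join(FRAMES_DIR, names[0])).convert("RGB"))
rng = np.random.default_rng(seed=1979)
noise = rng.integers(0, 256, size=first.shape, dtype=np.uint8).astype(np.float32)

for name in names:
    frame = np.asarray(
        Image.open(os.path.join(FRAMES_DIR, name)).convert("RGB")
    ).astype(np.float32)
    mixed = (1 - NOISE_OPACITY) * frame + NOISE_OPACITY * noise
    Image.fromarray(mixed.clip(0, 255).astype(np.uint8)).save(
        os.path.join(OUT_DIR, name))
```

Regenerate the noise inside the loop instead and you get the moving sort, which is what pushes the AI into hallucinating during transitions.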
I could generate about two stylised keyframes per minute, manually queuing the styliser and waiting for it to finish each one (it could only do one image at a time), then saving and photoshopping out any artifacts while the next image was being stylised. I ended up making hundreds of keyframes to feed EbSynth, which generated roughly 12,000 stylised frames. There were only about 4,000 frames in the final video, but blending between keyframes, both in and out, meant roughly 3x the required frame count. I manually stacked each sequence atop the next and set up the opacity of each to fade in and out, exporting short but sizeable lossless snippets from After Effects into Premiere Pro.
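Scripted, one of those cross-fades between two overlapping EbSynth output sequences looks roughly like this; the folder names are hypothetical, and a linear opacity ramp stands in for the After Effects fades:

```python
import os
import numpy as np
from PIL import Image

SEQ_A = "ebsynth_out/key_012"  # stylised from keyframe 12, run forward
SEQ_B = "ebsynth_out/key_013"  # stylised from keyframe 13, run backward
OUT = "blended/key_012_013"
os.makedirs(OUT, exist_ok=True)

frames_a = sorted(os.listdir(SEQ_A))
frames_b = sorted(os.listdir(SEQ_B))
n = min(len(frames_a), len(frames_b))  # the overlapping stretch

for i in range(n):
    a = np.asarray(Image.open(os.path.join(SEQ_A, frames_a[i])).convert("RGB"),
                   dtype=np.float32)
    b = np.asarray(Image.open(os.path.join(SEQ_B, frames_b[i])).convert("RGB"),
                   dtype=np.float32)
    t = i / max(n - 1, 1)        # opacity ramp: 0 -> 1 across the overlap
    mixed = (1 - t) * a + t * b  # sequence A fades out while B fades in
    Image.fromarray(mixed.clip(0, 255).astype(np.uint8)).save(
        os.path.join(OUT, f"{i:05d}.png"))
```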
This was an extremely inefficient pipeline, but it was just fast enough to get the video done within the month.
My GPU and PSU both died during the making of this video, but I managed to find another computer to run EbSynth on and finish it just in time for the competition deadline.
Here’s the finished product.
YouTube’s compressor absolutely demolishes the stylisation effect, so for a better experience, watch in 1440p.
Painter’s Dream won Best Artistic Endeavor at RICE 2021.
Compared to where AI-stylised videos will go in future, this one is extremely messy and unpolished. Completely thrown together, absolutely willy-nilly.
But! That doesn’t matter, because it was the first of its kind, and when you’re the first of your kind, people simply don’t know enough about you to notice how haggard you are.
As thrown-together as Painter’s Dream is, I haven’t been able to make something with as much impact since. I moved in the opposite direction for the next competition, using only scribbles and a super-simple composition to get an idea across, as opposed to this effect-fest. For the one after that, I tried to improve the stylisation pipeline, but the resulting video lacked substance; it just felt derivative of Painter’s Dream, a hollow imitation. Fun to look at, though.