This post was originally published on my website. Check it out if you’d like!

In this post, I walk through how I built a fully automated YouTube channel that posts a new video every day using mostly free and open-source AI tools, how it earned a whopping 52 subscribers (and counting), and why I ended up hating what I created. Let’s go.

TLDR

How It Started

Well, none of us could have missed the AI breakthroughs of the past few years, right?

It’s been incredible watching AI technology evolve over the last year or two. It seems like every month there’s a bunch of new things to play around with.

I’m something of an open-source enthusiast myself. I like free stuff, and I don’t mind a bit of jank.

One interesting thing I’ve found about LLMs is that they know about pretty much every book you can think of, at least up to their knowledge cutoff date.

I used that idea to create a whole book summary website with over 3,500 summaries, using a locally running open-source model (a derivative of llama2) and a few Python scripts written with the help of gpt4.

Then I realized I could take this to the next level by turning all these summaries into videos. Short-form videos are all the rage these days, right? All I needed was a (preferably free/open-source) text-to-speech model that I could hook into a Python script. The “motion picture” part of the video could just be, well, a picture of the book cover, since I figured people who want to listen to book summaries probably don’t care much about the visuals. And I figured Google would probably have a way for me to upload videos to YouTube via an API (it does, which is more than I can say for the other video platforms I looked at, like TikTok and Facebook/Instagram Reels).
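For the “picture of the book cover” approach, one common way to turn a still image plus an audio file into a video is ffmpeg. This is a sketch under assumptions: it assumes ffmpeg is installed and on the PATH (the tool used for this step isn’t named here), it assumes a vertical 1080x1920 frame for short-form video, and `build_ffmpeg_command` and `render_video` are hypothetical helper names.

```python
import subprocess
from pathlib import Path


def build_ffmpeg_command(cover: Path, audio: Path, output: Path,
                         width: int = 1080, height: int = 1920) -> list[str]:
    """Build an ffmpeg command that shows a still image for the length of an audio track.

    The image is scaled to fit inside a width x height frame and padded with
    black bars, which suits vertical short-form video.
    """
    fit = (
        f"scale={width}:{height}:force_original_aspect_ratio=decrease,"
        f"pad={width}:{height}:(ow-iw)/2:(oh-ih)/2"
    )
    return [
        "ffmpeg", "-y",                     # overwrite the output if it exists
        "-loop", "1", "-i", str(cover),     # repeat the single image as video frames
        "-i", str(audio),                   # narration track from the TTS step
        "-vf", fit,
        "-c:v", "libx264", "-tune", "stillimage",
        "-c:a", "aac", "-b:a", "192k",
        "-pix_fmt", "yuv420p",              # widely compatible pixel format
        "-shortest",                        # stop encoding when the audio ends
        str(output),
    ]


def render_video(cover: Path, audio: Path, output: Path) -> Path:
    """Run ffmpeg; raises subprocess.CalledProcessError if it fails."""
    subprocess.run(build_ffmpeg_command(cover, audio, output), check=True)
    return output
```

Looping the image with `-loop 1` and cutting with `-shortest` keeps the video about as long as the narration, and `-tune stillimage` lets x264 spend very few bits on frames that never change.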

Lucky me, the (sadly now shut down) CoquiTTS had just released their latest open-source text-to-speech model, xtts, and it was incredible.

(At least it’s open source, so even though the company has shut down, we can still use and fork the latest build, which holds up very well even today.)

It was a huge step forward for the free and open-source TTS community: before it, there was pretty much no free TTS that came close to its quality for the computational cost. If you wanted good TTS, you had to pony up at least $20/month or so for something like ElevenLabs, which is probably the leading player in the TTS space.
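For reference, generating narration with xtts through Coqui’s `TTS` Python package looks roughly like the sketch below. It is not necessarily the setup used for this channel: the model name is XTTS v2 from Coqui’s model catalog, the reference voice clip (`speaker_wav`) and English output are assumptions, and `load_xtts`/`synthesize` are hypothetical names. The model object is passed in as a parameter so the function can be exercised without downloading anything.

```python
from pathlib import Path
from typing import Any, Optional

# XTTS v2 as listed in Coqui's TTS model catalog.
XTTS_MODEL = "tts_models/multilingual/multi-dataset/xtts_v2"


def load_xtts(use_gpu: bool = True) -> Any:
    """Load XTTS via the Coqui TTS Python API (imported lazily; it is a heavy dependency)."""
    import torch
    from TTS.api import TTS

    device = "cuda" if use_gpu and torch.cuda.is_available() else "cpu"
    return TTS(XTTS_MODEL).to(device)


def synthesize(text: str, speaker_wav: Path, out_path: Path,
               tts: Optional[Any] = None) -> Path:
    """Write `text` as speech to `out_path`, cloning the voice in `speaker_wav`."""
    if not text.strip():
        raise ValueError("summary text is empty")
    if tts is None:
        tts = load_xtts()
    tts.tts_to_file(
        text=text,
        speaker_wav=str(speaker_wav),  # short reference clip of the target voice
        language="en",
        file_path=str(out_path),
    )
    return out_path
```

Loading the model once and reusing it across summaries avoids paying the (large) load cost on every call.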

Now that I had all the ingredients, I just needed to cook up the scripts. I’m not a real “developer” by any means, but I do dabble in Python and automation here and there. Also, gpt4 (and now claude opus) is a huge help when it comes to turning my vision into reality (sorry, open-source LLMs, you are still way behind when it comes to coding). With its help, I just need to know enough to guide the AI to write chunks of code for me, fix the occasional bug, and then string them together.

The Technical

I’m not going to open-source my code because, for one, that would probably be insulting to people who actually do this for a living. Plus, I have credentials in the code, and I don’t want to bother rewriting it with good practices. But I will talk about what it does. (If there’s enough interest, I’ll try to rewrite and open-source the code for you guys.)
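On the credentials point: a tidier pattern is to keep OAuth tokens in a file outside the code and let Google’s client libraries load them. Below is a sketch of an upload through the YouTube Data API v3 (`videos.insert`), assuming the `google-api-python-client` and `google-auth` packages and an already-authorized token file. The `token.json` name, the helper names, and the choice of the Education category are all assumptions. The metadata builder is kept separate so it can be checked without network access.

```python
from pathlib import Path
from typing import Any

UPLOAD_SCOPE = "https://www.googleapis.com/auth/youtube.upload"
EDUCATION_CATEGORY = "27"  # YouTube's "Education" category id (assumed fit for book summaries)


def build_metadata(title: str, description: str, tags: list[str]) -> dict[str, Any]:
    """Request body for videos.insert; YouTube caps titles at 100 characters."""
    return {
        "snippet": {
            "title": title[:100],
            "description": description,
            "tags": tags,
            "categoryId": EDUCATION_CATEGORY,
        },
        "status": {
            "privacyStatus": "public",
            "selfDeclaredMadeForKids": False,
        },
    }


def upload_video(video: Path, body: dict[str, Any],
                 token_file: Path = Path("token.json")) -> str:
    """Upload with a resumable request and return the new video id."""
    # Imported here so build_metadata works without the Google libraries installed.
    from google.oauth2.credentials import Credentials
    from googleapiclient.discovery import build
    from googleapiclient.http import MediaFileUpload

    creds = Credentials.from_authorized_user_file(str(token_file), [UPLOAD_SCOPE])
    youtube = build("youtube", "v3", credentials=creds)
    media = MediaFileUpload(str(video), mimetype="video/mp4", chunksize=-1, resumable=True)
    request = youtube.videos().insert(part="snippet,status", body=body, media_body=media)
    response = None
    while response is None:
        # Sends the file; returns (status, None) until the API response arrives.
        _status, response = request.next_chunk()
    return response["id"]
```

Creating the token file takes a one-time OAuth consent flow (for example with the `google-auth-oauthlib` package), after which the script can run unattended. Also note that Google restricts videos uploaded from unverified API projects to private visibility until the project passes an audit.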

All the Random Things I Did Before the Main Script

The Main Script

The main script, which I wrote in chunks with the help of gpt4, does all of these things for me:

I then set up a cron job to run the script every day at around the time I’m usually on my laptop.
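A minimal, hypothetical sketch of how a once-a-day script could string the pieces together: pick the next summary that hasn’t been posted, narrate it, pair the audio with the cover, upload, and record the book as done. The steps (narration, video rendering, upload) are passed in as functions, and the JSON state file, the `Book` fields, and all function names are assumptions rather than a description of the real script.

```python
import json
from dataclasses import dataclass
from pathlib import Path
from typing import Callable, Optional


@dataclass
class Book:
    slug: str      # hypothetical unique id, e.g. derived from the title
    title: str
    summary: str
    cover: Path


def next_unposted(books: list[Book], posted: set[str]) -> Optional[Book]:
    """Return the first book not yet uploaded, or None when everything is posted."""
    return next((b for b in books if b.slug not in posted), None)


def run_once(
    books: list[Book],
    state_file: Path,
    workdir: Path,
    narrate: Callable[[str, Path], Path],
    render: Callable[[Path, Path, Path], Path],
    upload: Callable[[Path, Book], str],
) -> Optional[str]:
    """Post at most one video; return its id, or None if nothing is left."""
    posted = set(json.loads(state_file.read_text())) if state_file.exists() else set()
    book = next_unposted(books, posted)
    if book is None:
        return None
    audio = narrate(book.summary, workdir / f"{book.slug}.wav")
    video = render(book.cover, audio, workdir / f"{book.slug}.mp4")
    video_id = upload(video, book)
    # Only mark the book as done after the upload succeeded.
    posted.add(book.slug)
    state_file.write_text(json.dumps(sorted(posted)))
    return video_id
```

Because the state file is only updated after a successful upload, a run that fails partway simply retries the same book next time. Scheduling it is then a single crontab line such as `0 19 * * * python3 /path/to/main.py` (hypothetical path and time), keeping in mind that plain cron skips runs while the laptop is off or asleep.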

The Result

The Verdict