
This post is available originally on my website. Check it out if you’d like!

This post covers how I created a fully automated YouTube channel that posts new videos every day, using mostly free and open-source AI tools, earned a whopping 52 subscribers (and counting), and hated what I created. Let’s go.

TLDR

With a bunch of free/open source AI tools (coqui tts, local LLMs), and a few Python scripts written with the help of gpt4, I created a fully automated YouTube channel that posts YouTube shorts every day.

I got a whopping 52 subscribers after 6 months of wasting electricity.
In the end, I loved the process of creating the channel, but I hated the videos I made.

Large Language Models (LLMs) have been amazing code monkeys that help me write code. For other writing purposes, not so much. Their writing is generic, bland, and “soulless”, as they say. More than ever, I’m allergic to generic written shit (which seems to be what most search results and articles are these days).

How It Started

Well, none of us missed the AI breakthroughs in the past few years, right?

It’s been an incredible experience seeing the evolution of AI technologies in the last year or two. It seems like every month there’s a bunch of new things to play around with.

I’m something of an open-source enthusiast myself. I like free stuff, and I don’t mind a bit of jank.

An interesting application I’ve found for LLMs is that they know pretty much every book you can think of, at least up to their knowledge cutoff date.

I used that idea to create a whole book summary website with over 3500 summaries, using an open-source llama2 derivative running locally and a few Python scripts written with the help of gpt4.

Then I realized I could take this to the next level by turning all these summaries into videos. Short-form videos are all the rage these days, right? All I needed was a (preferably free/open source) text-to-speech model that I could hook into a Python script. The “motion picture” part of the video could just be, well, a picture of the book cover, since I figured people who want to listen to book summaries probably don’t care about the visuals that much. And then I figured Google would probably have a way for me to upload videos to YouTube via API (it does, which is more than I can say for literally all the other video platforms like TikTok and Facebook/Instagram reels).

Lucky me, the (sadly now shut down) CoquiTTS just released their latest, incredible open-source text-to-speech model called xtts.

(At least it’s open-source, so even though it’s shut down, we can still use and fork the latest build of the program which is still very good even for today)

It’s a huge step forward for the free & open-source TTS community, as before this, there was pretty much no free TTS that got even close to matching the performance/computational cost of this model. If you wanted good TTS, you had to pony up like at least 20$/month for something like ElevenLabs, which is probably the leading player in the TTS space.

Now that I had all the ingredients, I just needed to cook up the scripts. I’m not a real “developer” by any means, but I do dabble in Python and automation here and there. Also, gpt4 (and now claude opus) is a huge help when it comes to turning my vision into reality (sorry open-source LLMs, you are still way behind when it comes to coding). With its help, I just need to know enough to guide the AI to write chunks of code for me, fix the occasional bug, and then string them together.

The Technical

I’m not going to open source my code because, for one, that would probably be insulting to people who actually do this for a living. Plus, I have credentials in the code that I don’t want to bother rewriting with good practices. But I am going to talk about what goes on in it. (If there’s enough interest, I’ll try to rewrite and open-source the code for you guys.)

All the Random Things I Did Before the Main Script

I self-hosted a quantized 13b LLM on my laptop, which has an Nvidia 3060 mobile GPU.

I found a list of thousands of book names and author names on the internet. I then used some Vim magic (which gpt4 happily helped me concoct) to format them as JSON.

I then created a Python script that goes through each JSON entry and asks the self-hosted LLM to write a summary for it.
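A loop like that might look roughly like the sketch below. This is not my actual code: the JSON keys ("title", "author"), the prompt wording, the file naming, and the `generate` callable are all placeholders, and `generate` stands in for whatever function sends a prompt to your local model and returns its reply.

```python
import json
from pathlib import Path
from typing import Callable

# Hypothetical prompt wording; the real prompt is not shown in this post.
PROMPT_TEMPLATE = 'Write a short summary of the book "{title}" by {author}.'


def build_prompt(title: str, author: str) -> str:
    """Fill the prompt template for one book."""
    return PROMPT_TEMPLATE.format(title=title, author=author)


def safe_filename(title: str) -> str:
    """Turn a book title into a filesystem-friendly .txt file name."""
    cleaned = "".join(c if c.isalnum() else "_" for c in title.lower())
    return "_".join(part for part in cleaned.split("_") if part) + ".txt"


def summarize_books(
    books_json: str, out_dir: Path, generate: Callable[[str], str]
) -> list[Path]:
    """Write one summary file per book and return the files written.

    `generate` is an assumed stand-in for the call to the local LLM.
    Books that already have a file are skipped, so a crashed run can
    simply be started again.
    """
    out_dir.mkdir(parents=True, exist_ok=True)
    written = []
    for book in json.loads(books_json):
        path = out_dir / safe_filename(book["title"])
        if path.exists():
            continue
        summary = generate(build_prompt(book["title"], book["author"]))
        path.write_text(summary.strip() + "\n", encoding="utf-8")
        written.append(path)
    return written
```

Passing the model call in as a function keeps the loop independent of how the LLM is hosted, and skipping existing files matters when a run over thousands of titles takes hours.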

A YouTube short has to be less than 1 minute long to qualify as a short, so I limited the AI output to 80 words, which it followed, kinda. I also tried making full YouTube videos, as you can probably see on the channel, but they took a lot longer to generate, and those videos didn’t even do as well as the shorts, probably because the YouTube algo for shorts is more relaxed.

I then ran this script to generate thousands of short book summary txt files on my computer. If I did it now, I’d probably use something like Ollama and work that into the main script instead of generating everything at once, because Ollama can load the model in and out of the GPU at will. Ollama wasn’t around when I made this, and the TTS model needed the GPU as well, so this looked like the best way to do things.
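Since the model only kinda respects the word limit, a quick check like the one below can flag summaries that are likely to run long. The 80-word cap and the 1-minute limit come from above; the speaking rate of 150 words per minute is purely my assumption, and real TTS output will vary with the voice and the text.

```python
MAX_WORDS = 80             # word cap given to the LLM (from the post)
SHORT_LIMIT_SECONDS = 60   # a short has to stay under 1 minute (from the post)
WORDS_PER_MINUTE = 150     # assumed narration speed, not a measured value


def count_words(text: str) -> int:
    """Count whitespace-separated words."""
    return len(text.split())


def estimated_seconds(text: str, wpm: int = WORDS_PER_MINUTE) -> float:
    """Rough narration length in seconds at the assumed speaking rate."""
    return count_words(text) * 60 / wpm


def fits_in_short(text: str) -> bool:
    """True if the text respects the word cap and the estimated time limit."""
    return (
        count_words(text) <= MAX_WORDS
        and estimated_seconds(text) < SHORT_LIMIT_SECONDS
    )
```

At that assumed rate, 80 words come out to about 32 seconds, which leaves a lot of headroom for the times the model overshoots.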

The Main Script

The main script, which I wrote in chunks with the help of gpt4, does all these things for me:

Generate an audio file via coqui xtts by feeding it the text of a summarized title.

Get an image of the cover of the book. I figured there was going to be some sort of free book API on the internet, and there is this site called Open Library. I just need to query it for the cover image of the title.

Create the video using moviepy. I have the audio and the book cover now. I also have a royalty-free background image and music file, and I mesh everything together using moviepy. This was the most challenging aspect of the code (read: it was a pain in the ass), mainly because it was challenging for gpt4 for some reason. It kept hallucinating and giving me nonexistent functions. I figured it’s probably because moviepy is not very well documented, and few people are crazy enough to create videos using Python like me.

Upload to YouTube via API. This was also kind of a problem for gpt4, because of all the API changes and old code getting deprecated, but I eventually figured this out too, with a lot of janky code.

I put an Amazon affiliate link to the book in the description, because I figured I might as well, but the way YouTube shorts work, it’s not very easy for people to click on it anyway. It’s sort of an afterthought.
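For the cover lookup step in the list above, here is a rough sketch of how a query against Open Library can work. It only builds the URLs and parses a search response, so the actual HTTP calls are left to whatever client you like. The function names are mine, and picking the first result that has a cover is an assumption, not necessarily what my script does.

```python
import json
from typing import Optional
from urllib.parse import urlencode

SEARCH_ENDPOINT = "https://openlibrary.org/search.json"
COVERS_ENDPOINT = "https://covers.openlibrary.org/b/id"


def cover_search_url(title: str, author: str) -> str:
    """Build a search URL for a book by title and author."""
    query = urlencode({"title": title, "author": author, "limit": 1})
    return f"{SEARCH_ENDPOINT}?{query}"


def first_cover_id(search_response: str) -> Optional[int]:
    """Return the cover id of the first search result that has one."""
    for doc in json.loads(search_response).get("docs", []):
        if "cover_i" in doc:
            return doc["cover_i"]
    return None  # no cover found; the caller needs a fallback image


def cover_image_url(cover_id: int, size: str = "L") -> str:
    """Build the image URL for a cover id (sizes: S, M, or L)."""
    return f"{COVERS_ENDPOINT}/{cover_id}-{size}.jpg"
```

Not every title has a cover on file, so the `None` case is worth handling before the video step, for example by falling back to the background image alone.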

You can upload 6 videos a day before you have to apply for an additional quota, which is completely fine with me since theoretically, the YouTube algo likes you uploading daily consistently anyway. I have around 3000 scripts, so in theory, I could do this for years.
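The 6-a-day figure lines up with simple quota arithmetic. The two numbers below are not from my code: they are the commonly cited defaults for the YouTube Data API (a daily quota of 10,000 units and 1,600 units per upload call), so treat them as assumptions and check your own API console, since they can change.

```python
import math

DAILY_QUOTA_UNITS = 10_000  # assumed default daily quota for the API project
UPLOAD_COST_UNITS = 1_600   # assumed quota cost of one video upload call


def uploads_per_day(
    quota: int = DAILY_QUOTA_UNITS, cost: int = UPLOAD_COST_UNITS
) -> int:
    """Whole uploads that fit in one day's quota."""
    return quota // cost


def days_of_content(total_scripts: int, per_day: int) -> int:
    """Days until the backlog of scripts runs out at a given posting rate."""
    return math.ceil(total_scripts / per_day)


print(uploads_per_day())           # 6
print(days_of_content(3000, 6))    # 500 days at the maximum rate
print(days_of_content(3000, 1))    # 3000 days at one video a day
```

So the roughly 3000 scripts last about a year and a half at the full 6 uploads a day, and over eight years at one a day.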

I then set up a cronjob to run the script every day around the time I use my laptop.

The Result

As of today, after around 6 months of wasting electricity and depreciating my laptop, I earned a whopping 52 subscribers on YouTube.

I also earned a whopping 30$ through Amazon affiliate links in the last year (the majority of that came from the website anyway, before I started the YouTube channel; the YouTube channel probably got me 2-3$, max).

I also tried manually posting those videos on TikTok since I already had them, but that hasn’t taken off either, and it became more tedious than it was worth.

You need 1000 subs to start earning via ads on YouTube, which probably will never happen for this channel without additional help. Frankly, I’m not sure if I’ll feel better or not if this channel ever blows up. On one hand, success and accomplishment feel good. On the other hand, if a janky AI channel like this can blow up, what does that mean for the future of the content we consume? (and yeah it’s probably already happening on other YouTube and TikTok channels; I just want to live in denial is all)

The Verdict

Generative AI is a huge game-changer when it comes to writing code. It’s not there yet for someone with no technical background, but for someone who dabbles in and out like me, it’s been an enormous help. I’m looking forward to the day that coding agents can write full-fledged programs from scratch.

Not gonna lie, I kinda hated the videos I made (which is probably part of the reason why I never bothered doing any marketing for them). I never watched my own channel, which is probably not a good look. I love automation and making things, but I hate consuming something that’s so bland. The only reason I keep the channel (and the website by extension) is that I’ve gotten an email or two thanking me for the helpful content, and I like to think I keep it going for them.

As incredible as AI technologies are at written, spoken, and drawn things, there is still something about human-made things that is not replicable (which is probably how artisan/handmade things still compete with machine-made things).

It’s the same story with algorithms too, which is why I created this newsletter in the first place and pledged to keep it personal and human-curated. I realized that algo-driven article aggregators don’t know whether an article is actually interesting or worth reading; they just present those articles because of, well, some meaningless metrics.

Please subscribe if you haven’t yet, and consider getting premium for the full experience too because I have no shame.

After reading so many LLM-written things, I realized I’ve subconsciously developed some kind of antenna in my head that detects AI-written shit. “Soulless” sounds like a good description.
Maybe as AI/LLM advances, it will be smart enough to fool all of us. I look forward to that, and I’m terrified of that.

Using Free AI Tools to Create a 100% Automated Youtube Shorts Channel