Building a Video Generator from Scratch: A Comprehensive Guide
Welcome to the exciting world of video generation! In this in-depth blog post, we’ll embark on a journey to build our own video generator from the ground up. We’ll be using powerful AI tools like Stable Diffusion for image creation and Google Cloud AI for object detection and description, all powered by the Flask web framework.
This guide is perfect for beginners, and we’ll break down every step, explaining the code, the technologies involved, and the magic behind making it all work.
What You’ll Learn
- Setting Up a Flask Project: We’ll create a basic Flask application to serve our video generator.
- Integrating Google Cloud AI: Leveraging Google Cloud AI’s Vision API to analyze images.
- Stable Diffusion for Image Generation: Using Stable Diffusion to create stunning images from textual prompts.
- Text-to-Speech with gTTS: Converting text into audio using Google’s Text-to-Speech API.
- Video Editing with MoviePy: Combining images and audio into a cohesive video.
- Designing User Interfaces with HTML: Creating user-friendly forms for text and image input.
- Deployment: Making your video generator accessible online.
Let’s Get Started!
1. Project Setup: Your First Steps
1.1 Create a Virtual Environment
A virtual environment helps us manage project dependencies separately, ensuring that our video generator project doesn’t conflict with other Python projects.
-
Open your terminal and create a new virtual environment:
python3 -m venv my_video_env
python3 -m venv
: This command uses thevenv
module to create a virtual environment.my_video_env
: This is the name of our virtual environment. You can choose a different name.
-
Activate the virtual environment:
source my_video_env/bin/activate
source
: This command runs a script from the specified path.my_video_env/bin/activate
: This is the path to the activation script for our virtual environment.
1.2 Install Required Libraries
Now, let’s install the necessary libraries for our project. We’ll use pip
, the package installer for Python.
pip install flask moviepy gTTS langchain langchain_google_genai pillow requests
- Flask: Web framework for building our application.
- MoviePy: For video editing and combining images and audio.
- gTTS: Text-to-speech conversion.
- LangChain: A framework for building and interacting with AI applications.
- LangChain Google GenAI: A component of LangChain that allows us to use Google Cloud AI models.
- Pillow: Image processing and manipulation.
- Requests: Making API calls.
2. Securely Handling API Keys
2.1 Obtaining API Keys
We’ll need API keys from two services: Google Cloud AI and Hugging Face.
-
Google Cloud AI:
- Sign up for a free trial of Google Cloud Platform (https://cloud.google.com/).
- Enable the Google Cloud AI Platform API.
- Create a service account and get your API key.
-
Hugging Face:
- Sign up for a Hugging Face account (https://huggingface.co/).
- Find the Stable Diffusion model on the Hugging Face model hub (https://huggingface.co/models) and get your API token.
2.2 Setting Up Environment Variables
To store our API keys securely, we’ll use environment variables.
-
Create a
.env
file in your project directory. -
Add the following lines to your
.env
file:GOOGLE_API_KEY=YOUR_GOOGLE_API_KEY HUG_API_KEY=YOUR_HUGGING_FACE_API_KEY
- Replace
YOUR_GOOGLE_API_KEY
andYOUR_HUGGING_FACE_API_KEY
with the actual API keys you obtained.
- Replace
-
Make sure your environment variables are loaded using the
dotenv
package:from dotenv import load_dotenv load_dotenv()
3. HTML Templates: Designing User Interfaces
Now, we’ll create the HTML templates that will form the user interface of our video generator.
3.1 index.html
(Main Landing Page)
<!DOCTYPE html>
<html>
<head>
<title>Video Generator</title>
<link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.5.2/css/bootstrap.min.css">
<link rel="stylesheet" href="/static/css/style.css">
</head>
<body>
<header class="container-fluid bg-white py-3">
<div class="container d-flex justify-content-between align-items-center">
<h1>Video Generator</h1>
<nav class="d-flex">
<a href="/" class="mr-3 text-dark">Get Started</a>
<a href="#" class="text-dark">Help</a>
</nav>
</div>
</header>
<main class="container py-5">
<h3 class="text-center mb-5">Which content would you like to repurpose into videos?</h3>
<div class="row options">
<div class="col-md-6 mb-4">
<div class="option bg-white rounded shadow p-4">
<div class="icon text-center text-primary mb-3">
<i class="fas fa-file-alt fa-3x"></i>
</div>
<h4 class="text-center">Text to Video</h4>
<p class="text-muted text-center">Enter the content and let AI create engaging videos.</p>
<a href="text-to-video"><button class="btn btn-primary btn-block" >Proceed</button></a>
<ul class="recommendations list-unstyled mt-3 text-muted">
<li><i class="fas fa-check-circle mr-2"></i> Educational videos</li>
<li><i class="fas fa-check-circle mr-2"></i> Coaching videos</li>
<li><i class="fas fa-check-circle mr-2"></i> Step-by-step guides</li>
</ul>
</div>
</div>
<div class="col-md-6 mb-4">
<div class="option bg-white rounded shadow p-4">
<div class="icon text-center text-primary mb-3">
<i class="far fa-image fa-3x"></i>
</div>
<h4 class="text-center">Image to Video</h4>
<p class="text-muted text-center">Turn your static images into dynamic video experiences.</p>
<a href="image-to-video"><button class="btn btn-primary btn-block" >Proceed</button></a>
<ul class="recommendations list-unstyled mt-3 text-muted">
<li><i class="fas fa-check-circle mr-2"></i> Marketing and Promotion</li>
<li><i class="fas fa-check-circle mr-2"></i> Social Proof</li>
<li><i class="fas fa-check-circle mr-2"></i> Community Engagement</li>
</ul>
</div>
</div>
</div>
</main>
<footer class="container-fluid bg-white py-3 mt-5">
<div class="container text-center">
<p class="mb-0">© 2024 Video Generator. All rights reserved.</p>
<nav class="nav justify-content-center">
<a href="#" class="nav-link text-muted">Terms of Service</a>
<a href="#" class="nav-link text-muted">Privacy Policy</a>
<a href="#" class="nav-link text-muted">Contact Us</a>
</nav>
</div>
</footer>
<script src="https://kit.fontawesome.com/your-fontawesome-kit-id.js"></script> </body>
</html>
- Bootstrap: This template uses Bootstrap (https://getbootstrap.com/) to provide a nice-looking and responsive layout. Make sure to include the Bootstrap CSS file in your
index.html
(replaceyour-fontawesome-kit-id
with your actual kit ID). - Navigation Links: The header includes links for “Get Started” and “Help.”
- Content Options: This template presents two options for video generation: “Text to Video” and “Image to Video.”
- Buttons: Buttons are used to navigate to the respective pages (
text-to-video
andimage-to-video
).
3.2 text-to-video.html
(Text Input Form)
<!DOCTYPE html>
<html>
<head>
<title>Video Generator</title>
<link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.5.2/css/bootstrap.min.css">
<link rel="stylesheet" href="/static/css/style.css">
</head>
<body>
<header class="container-fluid bg-white py-3">
<div class="container d-flex justify-content-between align-items-center">
<h1>Video Generator</h1>
<nav class="d-flex">
<a href="/" class="mr-3 text-dark">Get Started</a>
<a href="#" class="text-dark">Help</a>
</nav>
</div>
</header>
<main class="container py-5">
<div class="row options">
<div class="col-md-6 mx-auto mb-4">
<div class="option bg-white rounded shadow p-4">
<h3 class="text-center mb-3">Enter your text:</h3>
<form action="/text-to-video" method="post">
<div class="form-group">
<textarea class="form-control" name="textInput" rows="5" placeholder="Describe the scene you want to create..." required></textarea>
</div>
<button type="submit" class="btn btn-primary btn-block">Generate Video</button>
</form>
</div>
</div>
</div>
</main>
<footer class="container-fluid bg-white py-3 mt-5">
<div class="container text-center">
<p class="mb-0">© 2024 Video Generator. All rights reserved.</p>
<nav class="nav justify-content-center">
<a href="#" class="nav-link text-muted">Terms of Service</a>
<a href="#" class="nav-link text-muted">Privacy Policy</a>
<a href="#" class="nav-link text-muted">Contact Us</a>
</nav>
</div>
</footer>
</body>
</html>
- Form Action: The form submits data to the
/text-to-video
route using thePOST
method. - Text Input: The user enters their text description into the
textarea
field (namedtextInput
).
3.3 image-to-video.html
(Image Upload Form)
<!DOCTYPE html>
<html>
<head>
<title>Video Generator</title>
<link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.5.2/css/bootstrap.min.css">
<link rel="stylesheet" href="/static/css/style.css">
</head>
<body>
<header class="container-fluid bg-white py-3">
<div class="container d-flex justify-content-between align-items-center">
<h1>Video Generator</h1>
<nav class="d-flex">
<a href="/" class="mr-3 text-dark">Get Started</a>
<a href="#" class="text-dark">Help</a>
</nav>
</div>
</header>
<main class="container py-5">
<div class="row options">
<div class="col-md-6 mx-auto mb-4">
<div class="option bg-white rounded shadow p-4">
<div class="icon text-center text-primary mb-3">
<i class="fas fa-file-alt fa-3x"></i>
<h3 class="text-center mb-5">Upload your image</h3>
<form action="/image-to-video" method="post" enctype="multipart/form-data">
<label for="file-input" class="btn btn-primary btn-block">Choose File</label>
<input type="file" name="file" id="file-input">
<input type="submit" class="btn btn-primary btn-block" value="Generate Video">
</form>
<ul class="recommendations list-unstyled mt-3 text-muted">
</ul>
</div>
</div>
</div>
</div>
</main>
<footer class="container-fluid bg-white py-3 mt-5">
<div class="container text-center">
<p class="mb-0">© 2024 Video Generator. All rights reserved.</p>
<nav class="nav justify-content-center">
<a href="#" class="nav-link text-muted">Terms of Service</a>
<a href="#" class="nav-link text-muted">Privacy Policy</a>
<a href="#" class="nav-link text-muted">Contact Us</a>
</nav>
</div>
</footer>
<script src="https://kit.fontawesome.com/your-fontawesome-kit-id.js"></script>
</body>
</html>
- Form Action: The form submits data to the
/image-to-video
route using thePOST
method. Theenctype="multipart/form-data"
is essential for handling file uploads. - File Input: The user selects an image file using the
input type="file"
.
3.4 player.html
(Video Player)
<!DOCTYPE html>
<html>
<head>
<title>Video Generator - Video Player</title>
<link rel="stylesheet" href="https://stackpath.bootstrapcdn.com/bootstrap/4.5.2/css/bootstrap.min.css">
<link rel="stylesheet" href="/static/css/style.css">
</head>
<body>
<header class="container-fluid bg-white py-3">
<div class="container d-flex justify-content-between align-items-center">
<h1>Video Generator</h1>
<nav class="d-flex">
<a href="/" class="mr-3 text-dark">Get Started</a>
<a href="#" class="text-dark">Help</a>
</nav>
</div>
</header>
<main class="container py-5">
<div class="row">
<div class="col-md-8 mx-auto">
<h2>Generated Video</h2>
<video width="100%" controls>
<source src="/static/output/gen_video.mp4" type="video/mp4">
Your browser does not support the video tag.
</video>
<!-- Download Button -->
<a href="/static/output/gen_video.mp4" class="btn btn-primary mt-3" download>
Download Video
</a>
<div>
<h3>Detected Objects:</h3>
<p>{{ object_detected }}</p>
</div>
<div>
<h3>Object Description:</h3>
<p>{{ object_description }}</p>
</div>
</div>
</div>
</main>
<footer class="container-fluid bg-white py-3 mt-5">
<div class="container text-center">
<p class="mb-0">© 2024 Video Generator. All rights reserved.</p>
<nav class="nav justify-content-center">
<a href="#" class="nav-link text-muted">Terms of Service</a>
<a href="#" class="nav-link text-muted">Privacy Policy</a>
<a href="#" class="nav-link text-muted">Contact Us</a>
</nav>
</div>
</footer>
</body>
</html>
- Video Player: The
<video>
tag displays the generated video. - Download Button: A button allows users to download the video.
- Object Information: The detected objects and their descriptions are displayed here, using Jinja templating (
{{ object_detected }}
and{{ object_description }}
).
4. Python Code: The Heart of Your Video Generator
Now, let’s delve into the core of our application, the main.py
file.
from flask import Flask, render_template, request, redirect, url_for
import os
import requests
import io
from PIL import Image, UnidentifiedImageError
from gtts import gTTS
from moviepy.editor import ImageSequenceClip, AudioFileClip, concatenate_videoclips
from dotenv import load_dotenv
from langchain_core.messages import HumanMessage
from langchain_google_genai import ChatGoogleGenerativeAI
app = Flask(__name__)
# Load API keys
load_dotenv()
# Set the API keys
GOOGLE_API_KEY = os.environ.get('GOOGLE_API_KEY')
hug_api_key = os.environ.get("HUG_API_KEY")
API_URL = "https://api-inference.huggingface.co/models/stabilityai/stable-diffusion-2-1"
headers = {"Authorization": f"Bearer {hug_api_key}"}
# Initialize model to get inference from the image
llm = ChatGoogleGenerativeAI(model="gemini-pro-vision", google_api_key=GOOGLE_API_KEY, max_output_tokens=100)
# Configure allowed upload extensions
ALLOWED_EXTENSIONS = {'png', 'jpg', 'jpeg'}
# Define paths
output_path = os.path.join('static', 'output')
image_save_path = os.path.join(output_path, 'gen_image1.jpg')
image1_save_path = os.path.join(output_path, 'gen_image2.jpg')
image2_save_path = os.path.join(output_path, 'gen_image3.jpg')
audio_save_path = os.path.join(output_path, 'gen_audio.mp3')
video_save_path = os.path.join(output_path, 'gen_video.mp4')
# Function to check allowed file extensions
def allowed_file(filename):
return '.' in filename and filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS
# Function to detect object from the image
def detect_object(img_path):
message = HumanMessage(content=[
{
"type": "text",
"text": "detect the object in the image, just return the detected objects name, for example if it's a laptop return laptop, don't make things up"
},
{
"type": "image_url",
"image_url": img_path
},
])
response = llm.invoke([message])
return response.content
# Function to get inference from the image
def get_inference(img_path):
message = HumanMessage(content=[
{
"type": "text",
"text": "what's in this image, give a short detail on the image"
},
{
"type": "image_url",
"image_url": img_path
},
])
response = llm.invoke([message])
return response.content
# Function to generate image from response
def query(payload):
response = requests.post(API_URL, headers=headers, json=payload)
if response.status_code != 200:
print(f"Error: API request failed with status code {response.status_code}")
print(f"Response: {response.text}")
return response.content
def generate_image(detect_object):
try:
print(f"Generating images for: {detect_object}")
# First image
image_bytes = query({"inputs": detect_object})
print(f"First image bytes length: {len(image_bytes)}")
if len(image_bytes) < 100:
print(f"Error: Received invalid image data for the first image.")
return
try:
image = Image.open(io.BytesIO(image_bytes))
image.save(image_save_path)
except UnidentifiedImageError as e:
print(f"Error: Cannot identify the first image file. Details: {e}")
return
# Second image
image_bytes = query({"inputs": f'dramatic,{detect_object}'})
print(f"Second image bytes length: {len(image_bytes)}")
if len(image_bytes) < 100:
print(f"Error: Received invalid image data for the second image.")
return
try:
image1 = Image.open(io.BytesIO(image_bytes))
image1.save(image1_save_path)
except UnidentifiedImageError as e:
print(f"Error: Cannot identify the second image file. Details: {e}")
return
# Third image
image_bytes = query({"inputs": f'A photorealistic,{detect_object}'})
print(f"Third image bytes length: {len(image_bytes)}")
if len(image_bytes) < 100:
print(f"Error: Received invalid image data for the third image.")
return
try:
image2 = Image.open(io.BytesIO(image_bytes))
image2.save(image2_save_path)
except UnidentifiedImageError as e:
print(f"Error: Cannot identify the third image file. Details: {e}")
return
except Exception as e:
print(f"Unexpected error occurred: {e}")
# Function to convert text to speech
def generate_audio(object_description):
tts = gTTS(object_description, lang='en')
tts.save(audio_save_path)
# Function for generating video
def generate_video():
# Check if images exist
if not (os.path.exists(image_save_path) and os.path.exists(image1_save_path) and os.path.exists(image2_save_path)):
print("Error: One or more image files are missing.")
return None
# Load audio
audio = AudioFileClip(audio_save_path)
# Split audio
avg_duration = audio.duration / 3
audio1 = audio.subclip(0, avg_duration)
audio2 = audio.subclip(avg_duration, 2 * avg_duration)
audio3 = audio.subclip(2 * avg_duration, audio.duration)
# Load images
image1 = ImageSequenceClip([image_save_path], durations=[audio1.duration]) # Set duration for each image
image2 = ImageSequenceClip([image1_save_path], durations=[audio2.duration])
image3 = ImageSequenceClip([image2_save_path], durations=[audio3.duration])
# Combine images
final_clip = concatenate_videoclips([image1, image2, image3], method="compose")
# Set audio to video clips
final_clip = final_clip.set_audio(audio)
# Write the final video
final_clip.write_videofile(video_save_path, fps=1)
return video_save_path
@app.route('/')
def index():
return render_template('index.html')
@app.route('/text-to-video', methods=['GET', 'POST'])
def text_to_video():
if request.method == 'POST':
# Get the text data from the request
text_data = request.form['textInput'] # Assuming the text input has id "textInput"
# Process the text data
object_description = text_data # Use the text as the description
generate_image(object_description)
generate_audio(object_description)
video_save_path = generate_video()
return redirect(url_for('videoplayer', source='text'))
return render_template('text-to-video.html')
@app.route('/image-to-video', methods=['GET', 'POST'])
def image_to_video():
if request.method == 'POST':
# Check if the post request has the file part
if 'file' not in request.files:
return "No file part"
file = request.files['file']
# If user does not select file, browser also
# submit an empty part without filename
if file.filename == '':
return "No selected file"
if file and allowed_file(file.filename):
file_path = os.path.join(app.static_folder, 'uploads', 'temp_image.jpg')
file.save(file_path)
# Process image with AI functions
object_detected = detect_object(file_path)
object_description = get_inference(file_path)
# Save detected objects to gen_keyword.txt
keywords_file_path = os.path.join(app.static_folder, 'output', 'gen_keyword.txt')
with open(keywords_file_path, 'w') as f:
for obj in object_detected:
f.write(obj)
description_file_path = os.path.join(app.static_folder, 'output', 'gen_desc.txt')
with open(description_file_path, 'w') as f:
for obj in object_description:
f.write(obj)
generate_image(object_description)
generate_audio(object_description)
video_save_path = generate_video()
return redirect(url_for('videoplayer', source='image'))
return render_template('image-to-video.html')
@app.route('/videoplayer')
def videoplayer():
# Read detected keywords from gen_keywords.txt
keywords_file_path = os.path.join(app.static_folder, 'output', 'gen_keyword.txt')
try:
with open(keywords_file_path, 'r') as f:
object_detected = f.read()
except FileNotFoundError:
object_detected = "No objects detected"
description_file_path = os.path.join(app.static_folder, 'output', 'gen_desc.txt')
try:
with open(description_file_path, 'r') as f:
object_description = f.read()
except FileNotFoundError:
object_description = "No description available"
return render_template('player.html', object_detected=object_detected, object_description=object_description)
if __name__ == '__main__':
app.run(host='0.0.0.0', port=80)
4.1 Initializing the Flask Application
from flask import Flask, render_template, request, redirect, url_for
import os
app = Flask(__name__)
Flask(__name__)
: Creates a Flask application instance. The__name__
variable is used to specify the current module, which Flask needs to determine the application’s root directory.
4.2 Loading API Keys
# Load API keys
load_dotenv()
# Set the API keys
GOOGLE_API_KEY = os.environ.get('GOOGLE_API_KEY')
hug_api_key = os.environ.get("HUG_API_KEY")
load_dotenv()
: Loads environment variables from the.env
file we created earlier, making the API keys accessible.os.environ.get('GOOGLE_API_KEY')
andos.environ.get("HUG_API_KEY")
: Retrieves the values of the environment variablesGOOGLE_API_KEY
andHUG_API_KEY
, respectively.
4.3 Setting Up Stable Diffusion
API_URL = "https://api-inference.huggingface.co/models/stabilityai/stable-diffusion-2-1"
headers = {"Authorization": f"Bearer {hug_api_key}"}
API_URL
: The URL for the Hugging Face Stable Diffusion API.headers
: A dictionary containing the authorization header with the Hugging Face API token.
4.4 Initializing Google Cloud AI
# Initialize model to get inference from the image
llm = ChatGoogleGenerativeAI(model="gemini-pro-vision", google_api_key=GOOGLE_API_KEY, max_output_tokens=100)
ChatGoogleGenerativeAI
: This class, from thelangchain_google_genai
module, is used to access Google Cloud AI’s text-to-image and vision models.model="gemini-pro-vision"
: Specifies the Google Gemini Pro vision model for object detection and image description.google_api_key=GOOGLE_API_KEY
: Provides the Google Cloud AI API key.max_output_tokens=100
: Sets the maximum number of tokens for the model’s response.
4.5 Configuring Allowed File Extensions
# Configure allowed upload extensions
ALLOWED_EXTENSIONS = {'png', 'jpg', 'jpeg'}
ALLOWED_EXTENSIONS
: A set containing the allowed file extensions for image uploads.
4.6 Defining File Paths
# Define paths
output_path = os.path.join('static', 'output')
image_save_path = os.path.join(output_path, 'gen_image1.jpg')
image1_save_path = os.path.join(output_path, 'gen_image2.jpg')
image2_save_path = os.path.join(output_path, 'gen_image3.jpg')
audio_save_path = os.path.join(output_path, 'gen_audio.mp3')
video_save_path = os.path.join(output_path, 'gen_video.mp4')
output_path
: The path to the directory where generated files (images, audio, and video) will be saved.- Other paths: Paths for individual files within the
output
directory.
4.7 allowed_file
Function
# Function to check allowed file extensions
def allowed_file(filename):
return '.' in filename and filename.rsplit('.', 1)[1].lower() in ALLOWED_EXTENSIONS
allowed_file
: This function checks if a filename has a valid extension (i.e.,png
,jpg
, orjpeg
).
4.8 detect_object
Function
# Function to detect object from the image
def detect_object(img_path):
message = HumanMessage(content=[
{
"type": "text",
"text": "detect the object in the image, just return the detected objects name, for example if it's a laptop return laptop, don't make things up"
},
{
"type": "image_url",
"image_url": img_path
},
])
response = llm.invoke([message])
return response.content
detect_object
: This function uses the Google Gemini Pro vision model to detect objects in an image.HumanMessage
: Creates a LangChain message object with the prompt “detect the object in the image…” and the image URL.llm.invoke
: Invokes the Google Gemini Pro vision model with the message, sending the image and the prompt.response.content
: Returns the model’s response containing the detected objects.
4.9 get_inference
Function
# Function to get inference from the image
def get_inference(img_path):
message = HumanMessage(content=[
{
"type": "text",
"text": "what's in this image, give a short detail on the image"
},
{
"type": "image_url",
"image_url": img_path
},
])
response = llm.invoke([message])
return response.content
get_inference
: This function obtains a textual description of an image using the Google Gemini Pro vision model.HumanMessage
: Creates a LangChain message object with the prompt “what’s in this image…” and the image URL.llm.invoke
: Invokes the Google Gemini Pro vision model with the message, sending the image and the prompt.response.content
: Returns the model’s response containing a description of the image.
4.10 query
Function
# Function to generate image from response
def query(payload):
response = requests.post(API_URL, headers=headers, json=payload)
if response.status_code != 200:
print(f"Error: API request failed with status code {response.status_code}")
print(f"Response: {response.text}")
return response.content
4.11 generate_image
Function
def generate_image(detect_object):
try:
print(f"Generating images for: {detect_object}")
# First image
image_bytes = query({"inputs": detect_object})
print(f"First image bytes length: {len(image_bytes)}")
if len(image_bytes) < 100:
print(f"Error: Received invalid image data for the first image.")
return
try:
image = Image.open(io.BytesIO(image_bytes))
image.save(image_save_path)
except UnidentifiedImageError as e:
print(f"Error: Cannot identify the first image file. Details: {e}")
return
# Second image
image_bytes = query({"inputs": f'dramatic,{detect_object}'})
print(f"Second image bytes length: {len(image_bytes)}")
if len(image_bytes) < 100:
print(f"Error: Received invalid image data for the second image.")
return
try:
image1 = Image.open(io.BytesIO(image_bytes))
image1.save(image1_save_path)
except UnidentifiedImageError as e:
print(f"Error: Cannot identify the second image file. Details: {e}")
return
# Third image
image_bytes = query({"inputs": f'A photorealistic,{detect_object}'})
print(f"Third image bytes length: {len(image_bytes)}")
if len(image_bytes) < 100:
print(f"Error: Received invalid image data for the third image.")
return
try:
image2 = Image.open(io.BytesIO(image_bytes))
image2.save(image2_save_path)
except UnidentifiedImageError as e:
print(f"Error: Cannot identify the third image file. Details: {e}")
return
except Exception as e:
print(f"Unexpected error occurred: {e}")
generate_image
: This function is responsible for generating three images using Stable Diffusion.query
: This function (which we’ll look at shortly) sends a request to the Stable Diffusion API with a prompt and receives image data.Image.open(io.BytesIO(image_bytes))
: Opens the image data received from the API.image.save(image_save_path)
: Saves the generated image to a file.try-except
blocks: Error handling to catch potential issues with image loading.- Style Variations: This function generates three images with different styles: the first image is generated directly from the prompt, the second image adds a “dramatic” style, and the third image uses a “photorealistic” style.
4.12 query
Function (Stable Diffusion API Call)
# Function to generate image from response
def query(payload):
response = requests.post(API_URL, headers=headers, json=payload)
if response.status_code != 200:
print(f"Error: API request failed with status code {response.status_code}")
print(f"Response: {response.text}")
return response.content
query
: This function makes the actual request to the Hugging Face Stable Diffusion API.requests.post(API_URL, headers=headers, json=payload)
: Sends a POST request to the Stable Diffusion API.API_URL
: The API endpoint.headers
: The authorization header (containing the Hugging Face API token).json=payload
: The prompt for image generation is sent as JSON data.
response.status_code != 200
: Checks if the API request was successful. If not, an error message is printed.response.content
: Returns the image data from the API response.
4.13 generate_audio
Function
# Function to convert text to speech
def generate_audio(object_description):
tts = gTTS(object_description, lang='en')
tts.save(audio_save_path)
generate_audio
: This function converts text into speech using the gTTS library.gTTS(object_description, lang='en')
: Creates agTTS
object with the text to convert (object_description
) and the language (en
for English).tts.save(audio_save_path)
: Saves the generated speech as an MP3 file.
4.14 generate_video
Function
# Function for generating video
def generate_video():
# Check if images exist
if not (os.path.exists(image_save_path) and os.path.exists(image1_save_path) and os.path.exists(image2_save_path)):
print("Error: One or more image files are missing.")
return None
# Load audio
audio = AudioFileClip(audio_save_path)
# Split audio
avg_duration = audio.duration / 3
audio1 = audio.subclip(0, avg_duration)
audio2 = audio.subclip(avg_duration, 2 * avg_duration)
audio3 = audio.subclip(2 * avg_duration, audio.duration)
# Load images
image1 = ImageSequenceClip([image_save_path], durations=[audio1.duration]) # Set duration for each image
image2 = ImageSequenceClip([image1_save_path], durations=[audio2.duration])
image3 = ImageSequenceClip([image2_save_path], durations=[audio3.duration])
# Combine images
final_clip = concatenate_videoclips([image1, image2, image3], method="compose")
# Set audio to video clips
final_clip = final_clip.set_audio(audio)
# Write the final video
final_clip.write_videofile(video_save_path, fps=1)
return video_save_path
generate_video
: This function combines the generated images and audio into a video using MoviePy.- File Check: Ensures that all three generated images exist before proceeding.
- Audio Loading: Loads the generated audio file.
- Audio Splitting: Divides the audio into three equal segments to match the three images.
- Image Loading: Creates
ImageSequenceClip
objects for each image, setting the duration of each image to match the corresponding audio segment. - Video Concatenation: Combines the three image clips using
concatenate_videoclips
. - Setting Audio: Attaches the audio to the video clips.
- Video Writing: Writes the final video to a file.
4.15 Flask Routes: Handling User Requests
4.15.1 index
Route
@app.route('/')
def index():
return render_template('index.html')
@app.route('/')
: This decorator registers theindex
function as the handler for the root route (/
).return render_template('index.html')
: Renders theindex.html
template, which is our main landing page.
4.15.2 text-to-video
Route
@app.route('/text-to-video', methods=['GET', 'POST'])
def text_to_video():
if request.method == 'POST':
# Get the text data from the request
text_data = request.form['textInput'] # Assuming the text input has id "textInput"
# Process the text data
object_description = text_data # Use the text as the description
generate_image(object_description)
generate_audio(object_description)
video_save_path = generate_video()
return redirect(url_for('videoplayer', source='text'))
return render_template('text-to-video.html')
@app.route('/text-to-video', methods=['GET', 'POST'])
: Registers thetext_to_video
function as the handler for the/text-to-video
route. It handles both GET and POST requests.if request.method == 'POST'
: If the request is a POST request (i.e., the form has been submitted), process the data.request.form['textInput']
: Retrieves the text description submitted by the user from the form.generate_image
: Generates images based on the text description.generate_audio
: Creates audio from the text description.generate_video
: Combines the images and audio into a video.redirect(url_for('videoplayer', source='text'))
: Redirects the user to the/videoplayer
route, passing information about the source of the video (“text”).
return render_template('text-to-video.html')
: Renders thetext-to-video.html
template if the request is a GET request.
4.15.3 image-to-video
Route
@app.route('/image-to-video', methods=['GET', 'POST'])
def image_to_video():
if request.method == 'POST':
# Check if the post request has the file part
if 'file' not in request.files:
return "No file part"
file = request.files['file']
# If user does not select file, browser also
# submit an empty part without filename
if file.filename == '':
return "No selected file"
if file and allowed_file(file.filename):
file_path = os.path.join(app.static_folder, 'uploads', 'temp_image.jpg')
file.save(file_path)
# Process image with AI functions
object_detected = detect_object(file_path)
object_description = get_inference(file_path)
# Save detected objects to gen_keyword.txt
keywords_file_path = os.path.join(app.static_folder, 'output', 'gen_keyword.txt')
with open(keywords_file_path, 'w') as f:
for obj in object_detected:
f.write(obj)
description_file_path = os.path.join(app.static_folder, 'output', 'gen_desc.txt')
with open(description_file_path, 'w') as f:
for obj in object_description:
f.write(obj)
generate_image(object_description)
generate_audio(object_description)
video_save_path = generate_video()
return redirect(url_for('videoplayer', source='image'))
return render_template('image-to-video.html')
@app.route('/image-to-video', methods=['GET', 'POST'])
: Registers theimage_to_video
function as the handler for the/image-to-video
route.if request.method == 'POST'
: If the request is a POST request (i.e., the form has been submitted with an image file), process the data.if 'file' not in request.files:
: Checks if the form includes an uploaded file.file = request.files['file']
: Retrieves the uploaded file from the form.if file and allowed_file(file.filename)
: Checks if a file was uploaded and if it has a valid extension.file.save(file_path)
: Saves the uploaded image to a temporary file.object_detected = detect_object(file_path)
: Detects objects in the uploaded image using thedetect_object
function.object_description = get_inference(file_path)
: Gets a description of the uploaded image using theget_inference
function.- Saving Keywords and Description: Saves the detected objects and description to files (
gen_keyword.txt
andgen_desc.txt
) to be used later. generate_image
: Generates images based on the text description.generate_audio
: Creates audio from the text description.generate_video
: Combines the images and audio into a video.redirect(url_for('videoplayer', source='image'))
: Redirects the user to the/videoplayer
route, passing information about the source of the video (“image”).
return render_template('image-to-video.html')
: Renders theimage-to-video.html
template if the request is a GET request.
4.15.4 videoplayer
Route
@app.route('/videoplayer')
def videoplayer():
# Read detected keywords from gen_keywords.txt
keywords_file_path = os.path.join(app.static_folder, 'output', 'gen_keyword.txt')
try:
with open(keywords_file_path, 'r') as f:
object_detected = f.read()
except FileNotFoundError:
object_detected = "No objects detected"
description_file_path = os.path.join(app.static_folder, 'output', 'gen_desc.txt')
try:
with open(description_file_path, 'r') as f:
object_description = f.read()
except FileNotFoundError:
object_description = "No description available"
return render_template('player.html', object_detected=object_detected, object_description=object_description)
@app.route('/videoplayer')
: Registers thevideoplayer
function as the handler for the/videoplayer
route.- Reading Keywords and Description: Reads the detected objects and description from the saved files.
return render_template('player.html', object_detected=object_detected, object_description=object_description)
: Renders theplayer.html
template, passing the detected objects and description to the template.
4.16 Running the Flask Application
- Open your terminal and navigate to the directory where your
main.py
file is located. - Run the Flask app using either of these commands:
flask run
or
python main.py
- This will start the Flask development server, which will usually run on
http://127.0.0.1:5000/
.
- This will start the Flask development server, which will usually run on
5. Using Your Video Generator
- Open your web browser and visit
http://127.0.0.1:5000/
. - You’ll see the main landing page (
index.html
) with the options for “Text to Video” and “Image to Video.” - Text to Video:
- Click “Proceed” under “Text to Video.”
- Enter a text description in the form on the
text-to-video.html
page. - Click “Generate Video.”
- Image to Video:
- Click “Proceed” under “Image to Video.”
- Upload an image using the form on the
image-to-video.html
page. - Click “Generate Video.”
- The application will generate a video based on your input and redirect you to the
/videoplayer
route to view the generated video.
Conclusion
You’ve successfully built a video generator using Flask, Google Cloud AI, Stable Diffusion, gTTS, and MoviePy. This application allows you to turn your words and images into dynamic, engaging videos.
What’s Next?
This is just the beginning of your video generation journey. Consider these enhancements:
- Customization: Experiment with different Stable Diffusion prompts to explore diverse styles and aesthetics for your videos.
- Advanced Features: Implement features like background music, transitions between scenes, text overlays, and more.
- Deployment: Deploy your application to a web server (like Heroku or AWS) so you can share your video generator with others.