The AI Video Production Pipeline
Creating videos traditionally requires a scriptwriter, a voice actor, and a video editor, plus hours of coordination between them.
With the right AI tools, you can automate this entire workflow. In this tutorial, we will build a pipeline from three building blocks:
1. OpenAI GPT - Generates video scripts from a topic
2. Lovo AI - Converts scripts to professional voiceovers
3. Remotion - Renders React components as video with synchronized audio
By the end, you will have a system that can produce a complete video from just a topic prompt.
Architecture Overview
[Topic Input]
|
v
[OpenAI GPT] --> Script with timestamps
|
v
[Lovo AI API] --> Audio file (.mp3) + word timestamps
|
v
[Remotion] --> Synchronized video with visuals + audio
|
v
[Final MP4]

Prerequisites
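Assuming the stack used in this tutorial, you will need Node.js (18 or newer is a safe assumption for Remotion) and API keys for OpenAI and Lovo:

```shell
# Check your Node.js version (18+ assumed here)
node --version

# API keys the pipeline reads from the environment (used in Steps 2, 3, and 6)
export OPENAI_API_KEY=your_openai_key
export LOVO_API_KEY=your_lovo_key
```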
Step 1: Project Setup
npx create-video@latest ai-video-pipeline
cd ai-video-pipeline
npm install openai axios

The `create-video` command sets up a Remotion project with everything configured.
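The render script in Step 5 assumes the project's entry point (`src/index.ts`) registers a composition with the id `MyVideo`. The scaffolded project already contains a registration file; a minimal version consistent with this tutorial might look like the following (the duration, fps, and dimensions are placeholder assumptions):

```typescript
// src/Root.tsx (JSX, so this file needs a .tsx extension)
import React from 'react';
import { Composition } from 'remotion';
import { MyVideo } from './Video';

export const RemotionRoot: React.FC = () => (
  <Composition
    id="MyVideo" // must match the id passed to selectComposition in Step 5
    component={MyVideo}
    durationInFrames={30 * 120} // placeholder: 2 minutes at 30 fps
    fps={30}
    width={1920}
    height={1080}
    defaultProps={{ title: '', segments: [] }}
  />
);

// src/index.ts — the entryPoint referenced by the pipeline in Step 5:
// import { registerRoot } from 'remotion';
// import { RemotionRoot } from './Root';
// registerRoot(RemotionRoot);
```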
Step 2: Script Generation with OpenAI
Create `src/lib/script-generator.ts`:
import OpenAI from 'openai';
const openai = new OpenAI({
apiKey: process.env.OPENAI_API_KEY,
});
export interface ScriptSegment {
text: string;
visualDescription: string;
duration: number; // estimated seconds
}
export interface VideoScript {
title: string;
segments: ScriptSegment[];
totalDuration: number;
}
export async function generateScript(topic: string): Promise<VideoScript> {
const response = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [
{
role: 'system',
content: `You are a professional video scriptwriter. Generate engaging, concise scripts for educational/explainer videos.
Output format (JSON):
{
"title": "Video title",
"segments": [
{
"text": "Narration text for this segment",
"visualDescription": "What should appear on screen",
"duration": 5
}
]
}
Guidelines:
- Each segment should be 3-8 seconds of narration
- Keep total video under 2 minutes
- Write conversational, engaging copy
- Visual descriptions should be simple and achievable with motion graphics`,
},
{
role: 'user',
content: `Create a video script about: ${topic}`,
},
],
response_format: { type: 'json_object' },
});
const script = JSON.parse(response.choices[0].message.content!) as VideoScript;
script.totalDuration = script.segments.reduce((sum, s) => sum + s.duration, 0);
return script;
}

Step 3: Voice Generation with Lovo AI
Create `src/lib/voice-generator.ts`:
import axios from 'axios';
import fs from 'fs';
import path from 'path';
const LOVO_API_URL = 'https://api.genny.lovo.ai/api/v1';
interface LovoVoice {
id: string;
displayName: string;
locale: string;
}
interface WordTimestamp {
word: string;
start: number; // milliseconds
end: number;
}
export interface VoiceResult {
audioPath: string;
duration: number;
wordTimestamps: WordTimestamp[];
}
export async function getVoices(): Promise<LovoVoice[]> {
const response = await axios.get(`${LOVO_API_URL}/speakers`, {
headers: {
'X-API-KEY': process.env.LOVO_API_KEY,
},
});
return response.data.data;
}
export async function generateVoiceover(
text: string,
voiceId: string,
outputPath: string
): Promise<VoiceResult> {
// Step 1: Create TTS job
const createResponse = await axios.post(
`${LOVO_API_URL}/tts`,
{
speaker: voiceId,
text: text,
speed: 1.0,
},
{
headers: {
'X-API-KEY': process.env.LOVO_API_KEY,
'Content-Type': 'application/json',
},
}
);
const jobId = createResponse.data.id;
  // Step 2: Poll until the job finishes, with a retry cap so a stuck job
  // cannot hang the pipeline forever
  let result;
  const maxAttempts = 120; // ~2 minutes at a 1-second polling interval
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const statusResponse = await axios.get(`${LOVO_API_URL}/tts/${jobId}`, {
      headers: { 'X-API-KEY': process.env.LOVO_API_KEY },
    });
    if (statusResponse.data.status === 'succeeded') {
      result = statusResponse.data;
      break;
    } else if (statusResponse.data.status === 'failed') {
      throw new Error('Voice generation failed');
    }
    await new Promise((r) => setTimeout(r, 1000));
  }
  if (!result) {
    throw new Error(`Voice generation timed out after ${maxAttempts} seconds`);
  }
// Step 3: Download audio file
const audioResponse = await axios.get(result.urls[0], {
responseType: 'arraybuffer',
});
fs.writeFileSync(outputPath, Buffer.from(audioResponse.data));
return {
audioPath: outputPath,
duration: result.duration,
wordTimestamps: result.wordTimestamps || [],
};
}
export async function generateFullVoiceover(
segments: { text: string }[],
voiceId: string,
outputDir: string
): Promise<VoiceResult[]> {
const results: VoiceResult[] = [];
for (let i = 0; i < segments.length; i++) {
    const outputPath = path.resolve(outputDir, `segment-${i}.mp3`); // absolute path, so the renderer can locate the file later
console.log(`Generating voiceover for segment ${i + 1}/${segments.length}`);
const result = await generateVoiceover(
segments[i].text,
voiceId,
outputPath
);
results.push(result);
}
return results;
}

Step 4: Video Composition with Remotion
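Steps 2 and 3 together produce the data the composition below consumes; a representative (entirely made-up) example of that shape:

```typescript
// Illustrative inputProps as handed to Remotion in Step 5; all values are invented.
const exampleProps = {
  title: '5 Tips for Better Code Reviews',
  segments: [
    {
      text: 'Tip one: keep pull requests small.',
      visualDescription: 'A shrinking stack of files',
      audioPath: '/abs/path/output/audio/segment-0.mp3',
      duration: 4.2, // seconds (the Lovo result is in ms, divided by 1000)
      wordTimestamps: [{ word: 'Tip', start: 0, end: 280 }], // milliseconds
    },
  ],
};
```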
Create `src/Video.tsx`:
import React from 'react';
import { AbsoluteFill, Audio, Sequence, useCurrentFrame, useVideoConfig, interpolate, spring } from 'remotion';
interface Segment {
text: string;
visualDescription: string;
audioPath: string;
duration: number; // in seconds
wordTimestamps?: { word: string; start: number; end: number }[];
}
interface VideoProps {
title: string;
segments: Segment[];
}
const TitleCard: React.FC<{ title: string }> = ({ title }) => {
const frame = useCurrentFrame();
const { fps } = useVideoConfig();
const opacity = interpolate(frame, [0, 30], [0, 1], { extrapolateRight: 'clamp' });
const scale = spring({ frame, fps, config: { damping: 200 } });
return (
<AbsoluteFill className="bg-gradient-to-br from-blue-900 to-purple-900 flex items-center justify-center">
<h1
style={{ opacity, transform: `scale(${scale})` }}
className="text-6xl font-bold text-white text-center px-20"
>
{title}
</h1>
</AbsoluteFill>
);
};
const ContentSegment: React.FC<{ segment: Segment }> = ({ segment }) => {
const frame = useCurrentFrame();
const { fps } = useVideoConfig();
  const textOpacity = interpolate(frame, [0, 15], [0, 1], { extrapolateRight: 'clamp' });
const slideIn = spring({ frame, fps, config: { damping: 100 } });
return (
<AbsoluteFill className="bg-gradient-to-br from-slate-900 to-slate-800">
      {/* Audio for this segment. Remotion serves assets through its own
          bundler, so audioPath should be an absolute path or a staticFile()
          URL rather than a plain relative path. */}
      <Audio src={segment.audioPath} />
{/* Visual description as background context */}
<div className="absolute top-10 left-10 text-slate-500 text-sm">
{segment.visualDescription}
</div>
{/* Main text with animation */}
<div className="flex items-center justify-center h-full px-20">
<p
style={{
opacity: textOpacity,
transform: `translateY(${(1 - slideIn) * 50}px)`,
}}
className="text-4xl text-white text-center leading-relaxed font-medium"
>
{segment.text}
</p>
</div>
{/* Animated word highlights (if timestamps available) */}
{segment.wordTimestamps && (
<WordHighlighter
words={segment.wordTimestamps}
frame={frame}
fps={fps}
/>
)}
</AbsoluteFill>
);
};
const WordHighlighter: React.FC<{
words: { word: string; start: number; end: number }[];
frame: number;
fps: number;
}> = ({ words, frame, fps }) => {
const currentTimeMs = (frame / fps) * 1000;
return (
<div className="absolute bottom-20 left-0 right-0 flex justify-center gap-2 px-10 flex-wrap">
{words.map((w, i) => {
const isActive = currentTimeMs >= w.start && currentTimeMs <= w.end;
        return (
          // Note: CSS transitions like `transition-colors` have no effect in
          // rendered video (every frame is drawn independently), so the
          // highlight simply switches state per frame.
          <span
            key={i}
            className={`text-2xl ${
              isActive ? 'text-yellow-400 font-bold' : 'text-slate-400'
            }`}
>
{w.word}
</span>
);
})}
</div>
);
};
export const MyVideo: React.FC<VideoProps> = ({ title, segments }) => {
const { fps } = useVideoConfig();
const TITLE_DURATION = 3 * fps; // 3 seconds for title
let currentFrame = TITLE_DURATION;
return (
<>
{/* Title sequence */}
<Sequence from={0} durationInFrames={TITLE_DURATION}>
<TitleCard title={title} />
</Sequence>
{/* Content segments */}
{segments.map((segment, index) => {
const segmentFrames = Math.ceil(segment.duration * fps);
const sequence = (
<Sequence
key={index}
from={currentFrame}
durationInFrames={segmentFrames}
>
<ContentSegment segment={segment} />
</Sequence>
);
currentFrame += segmentFrames;
return sequence;
})}
</>
);
};

Step 5: The Main Pipeline
Create `src/pipeline.ts`:
import { bundle } from '@remotion/bundler';
import { renderMedia, selectComposition } from '@remotion/renderer';
import path from 'path';
import fs from 'fs';
import { generateScript } from './lib/script-generator';
import { generateFullVoiceover, getVoices } from './lib/voice-generator';
const OUTPUT_DIR = './output';
const AUDIO_DIR = './output/audio';
async function createVideo(topic: string) {
console.log('Step 1: Generating script...');
const script = await generateScript(topic);
console.log(`Generated script: "${script.title}" with ${script.segments.length} segments`);
// Ensure output directories exist
fs.mkdirSync(AUDIO_DIR, { recursive: true });
console.log('Step 2: Getting available voices...');
const voices = await getVoices();
const selectedVoice = voices.find((v) => v.locale.startsWith('en-US')) || voices[0];
console.log(`Using voice: ${selectedVoice.displayName}`);
console.log('Step 3: Generating voiceovers...');
const voiceResults = await generateFullVoiceover(
script.segments,
selectedVoice.id,
AUDIO_DIR
);
// Combine script with audio results
const segmentsWithAudio = script.segments.map((segment, i) => ({
...segment,
audioPath: voiceResults[i].audioPath,
duration: voiceResults[i].duration / 1000, // Convert ms to seconds
wordTimestamps: voiceResults[i].wordTimestamps,
}));
console.log('Step 4: Bundling Remotion project...');
const bundleLocation = await bundle({
entryPoint: path.resolve('./src/index.ts'),
webpackOverride: (config) => config,
});
console.log('Step 5: Rendering video...');
const composition = await selectComposition({
serveUrl: bundleLocation,
id: 'MyVideo',
inputProps: {
title: script.title,
segments: segmentsWithAudio,
},
});
const outputPath = path.join(OUTPUT_DIR, `${script.title.replace(/[^a-z0-9]/gi, '-')}.mp4`);
await renderMedia({
composition,
serveUrl: bundleLocation,
codec: 'h264',
outputLocation: outputPath,
inputProps: {
title: script.title,
segments: segmentsWithAudio,
},
});
console.log(`Video rendered successfully: ${outputPath}`);
return outputPath;
}
// Run the pipeline
const topic = process.argv[2] || 'How to learn programming in 2026';
createVideo(topic).catch((err) => {
  console.error(err);
  process.exit(1); // non-zero exit so scripts and CI can detect failure
});

Step 6: Run the Pipeline
# Set environment variables
export OPENAI_API_KEY=your_openai_key
export LOVO_API_KEY=your_lovo_key
# Generate a video
npx ts-node src/pipeline.ts "5 Tips for Better Code Reviews"

The pipeline will:
1. Generate a script with GPT-4o
2. Create voiceovers for each segment with Lovo AI
3. Render a synchronized video with Remotion
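The frame arithmetic that `MyVideo` performs inline can be sketched as a standalone helper, which is also handy for computing the composition's total duration in frames before rendering (the function name and shape are my own, not part of Remotion):

```typescript
// Compute each Sequence's start frame and length, mirroring MyVideo's layout.
interface SegmentTiming {
  from: number;             // first frame of the segment's Sequence
  durationInFrames: number; // frames the segment occupies
}

export function layoutSegments(
  durationsInSeconds: number[],
  fps: number,
  titleSeconds = 3
): { timings: SegmentTiming[]; totalFrames: number } {
  let cursor = Math.ceil(titleSeconds * fps); // the title card fills the first frames
  const timings = durationsInSeconds.map((seconds) => {
    const durationInFrames = Math.ceil(seconds * fps);
    const timing = { from: cursor, durationInFrames };
    cursor += durationInFrames;
    return timing;
  });
  return { timings, totalFrames: cursor };
}
```

At 30 fps with segments of 4.5 s and 6 s, the segments start at frames 90 and 225, and the whole video spans 405 frames.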
Advanced: Adding B-Roll and Images
Enhance your videos with AI-generated images:
import OpenAI from 'openai';
const openai = new OpenAI();
async function generateVisual(description: string): Promise<string> {
const response = await openai.images.generate({
model: 'dall-e-3',
prompt: `Clean, modern illustration for a video: ${description}. Minimal style, suitable for educational content.`,
size: '1792x1024',
quality: 'standard',
});
return response.data[0].url!;
}

Then reference the generated image in your Remotion component (`Img` is imported from `remotion`):

<Img src={segment.visualUrl} className="absolute inset-0 object-cover opacity-30" />

Cost Analysis
For a 2-minute video:
| Service | Usage | Cost |
|---|---|---|
| OpenAI GPT-4o | ~500 tokens | ~$0.01 |
| Lovo AI | ~2 min audio | ~$0.50 |
| DALL-E 3 (optional) | 5 images | ~$0.40 |
| Total | | ~$1.00 |
Compare this to hiring a voice actor ($50-200) and video editor ($100-500).
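The table's arithmetic can be wrapped in a tiny estimator; the unit prices below are back-of-envelope assumptions derived from the table, so check each provider's current pricing before relying on them:

```typescript
// Rough per-video cost estimate; all unit prices are assumptions.
interface CostInputs {
  gptTokens: number;
  audioMinutes: number;
  images?: number;
}

export function estimateVideoCost({ gptTokens, audioMinutes, images = 0 }: CostInputs): number {
  const GPT_PER_1K_TOKENS = 0.02; // blended input/output estimate (~$0.01 per 500 tokens)
  const LOVO_PER_MINUTE = 0.25;   // ~$0.50 for 2 minutes of audio
  const DALLE_PER_IMAGE = 0.08;   // standard-quality DALL-E 3 (~$0.40 for 5 images)
  return (
    (gptTokens / 1000) * GPT_PER_1K_TOKENS +
    audioMinutes * LOVO_PER_MINUTE +
    images * DALLE_PER_IMAGE
  );
}
```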
Production Tips
1. Batch processing - Generate multiple videos in parallel
2. Cache voices - Lovo voices are consistent, cache common phrases
3. Template variations - Create multiple Remotion templates for variety
4. Quality control - Always preview before publishing
5. A/B test intros - Different title cards perform differently
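Tip 1 can be sketched with a simple concurrency limiter, so parallel runs don't exhaust API rate limits (the `createVideo` call in the usage comment and the limit of 3 are assumptions, not fixed requirements):

```typescript
// Run async tasks over a list with at most `limit` in flight at once.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++; // claim the next index (safe: JS is single-threaded)
      results[i] = await fn(items[i]);
    }
  }
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, () => worker())
  );
  return results;
}

// Usage (hypothetical): render several topics, three at a time.
// const paths = await mapWithConcurrency(topics, 3, createVideo);
```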
Conclusion
You now have a complete AI video production pipeline. From a single topic prompt, you can generate professional videos with a structured script, a natural-sounding voiceover, and synchronized motion graphics.
The entire process takes minutes instead of hours, and costs dollars instead of hundreds. Scale this to produce educational content, marketing videos, or social media clips at unprecedented speed.