The AI Video Production Pipeline
Creating videos traditionally requires a scriptwriter, a voice actor, and a video editor, plus hours of coordination between them.
With the right AI tools, you can automate this entire workflow. In this tutorial, we will build a pipeline from three building blocks:
1. OpenAI GPT - Generates video scripts from a topic
2. Lovo AI - Converts scripts to professional voiceovers
3. Remotion - Renders React components as video with synchronized audio
By the end, you will have a system that can produce a complete video from just a topic prompt.
Architecture Overview
[Topic Input]
|
v
[OpenAI GPT] --> Script with timestamps
|
v
[Lovo AI API] --> Audio file (.mp3) + word timestamps
|
v
[Remotion] --> Synchronized video with visuals + audio
|
v
[Final MP4]

Prerequisites
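Assuming the stack used in this tutorial, you will need Node.js (18 or newer is a safe assumption for Remotion) and API keys for OpenAI and Lovo:

```shell
# Check your Node.js version (18+ assumed here)
node --version

# API keys the pipeline reads from the environment (used in Steps 2, 3, and 6)
export OPENAI_API_KEY=your_openai_key
export LOVO_API_KEY=your_lovo_key
```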
Step 1: Project Setup
npx create-video@latest ai-video-pipeline
cd ai-video-pipeline
npm install openai axios

The `create-video` command sets up a Remotion project with everything configured.
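The render script in Step 5 assumes the project's entry point (`src/index.ts`) registers a composition with the id `MyVideo`. The scaffolded project already contains a registration file; a minimal version consistent with this tutorial might look like the following (the duration, fps, and dimensions are placeholder assumptions):

```typescript
// src/Root.tsx (JSX, so this file needs a .tsx extension)
import React from 'react';
import { Composition } from 'remotion';
import { MyVideo } from './Video';

export const RemotionRoot: React.FC = () => (
  <Composition
    id="MyVideo" // must match the id passed to selectComposition in Step 5
    component={MyVideo}
    durationInFrames={30 * 120} // placeholder: 2 minutes at 30 fps
    fps={30}
    width={1920}
    height={1080}
    defaultProps={{ title: '', segments: [] }}
  />
);

// src/index.ts — the entryPoint referenced by the pipeline in Step 5:
// import { registerRoot } from 'remotion';
// import { RemotionRoot } from './Root';
// registerRoot(RemotionRoot);
```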
Step 2: Script Generation with OpenAI
Create `src/lib/script-generator.ts`:
import OpenAI from 'openai';
const openai = new OpenAI({
apiKey: process.env.OPENAI_API_KEY,
});
export interface ScriptSegment {
text: string;
visualDescription: string;
duration: number; // estimated seconds
}
export interface VideoScript {
title: string;
segments: ScriptSegment[];
totalDuration: number;
}
export async function generateScript(topic: string): Promise<VideoScript> {
const response = await openai.chat.completions.create({
model: 'gpt-4o',
messages: [
{
role: 'system',
content: `You are a professional video scriptwriter. Generate engaging, concise scripts for educational/explainer videos.
Output format (JSON):
{
"title": "Video title",
"segments": [
{
"text": "Narration text for this segment",
"visualDescription": "What should appear on screen",
"duration": 5
}
]
}
Guidelines:
- Each segment should be 3-8 seconds of narration
- Keep total video under 2 minutes
- Write conversational, engaging copy
- Visual descriptions should be simple and achievable with motion graphics`,
},
{
role: 'user',
content: `Create a video script about: ${topic}`,
},
],
response_format: { type: 'json_object' },
});
const script = JSON.parse(response.choices[0].message.content!) as VideoScript;
script.totalDuration = script.segments.reduce((sum, s) => sum + s.duration, 0);
return script;
}

Step 3: Voice Generation with Lovo AI
Create `src/lib/voice-generator.ts`:
import axios from 'axios';
import fs from 'fs';
import path from 'path';
const LOVO_API_URL = 'https://api.genny.lovo.ai/api/v1';
interface LovoVoice {
id: string;
displayName: string;
locale: string;
}
interface WordTimestamp {
word: string;
start: number; // milliseconds
end: number;
}
export interface VoiceResult {
audioPath: string;
duration: number;
wordTimestamps: WordTimestamp[];
}
export async function getVoices(): Promise<LovoVoice[]> {
const response = await axios.get(`${LOVO_API_URL}/speakers`, {
headers: {
'X-API-KEY': process.env.LOVO_API_KEY,
},
});
return response.data.data;
}
export async function generateVoiceover(
text: string,
voiceId: string,
outputPath: string
): Promise<VoiceResult> {
// Step 1: Create TTS job
const createResponse = await axios.post(
`${LOVO_API_URL}/tts`,
{
speaker: voiceId,
text: text,
speed: 1.0,
},
{
headers: {
'X-API-KEY': process.env.LOVO_API_KEY,
'Content-Type': 'application/json',
},
}
);
const jobId = createResponse.data.id;
  // Step 2: Poll until the job finishes, with a retry cap so a stuck job
  // cannot hang the pipeline forever
  let result;
  const maxAttempts = 120; // ~2 minutes at a 1-second polling interval
  for (let attempt = 0; attempt < maxAttempts; attempt++) {
    const statusResponse = await axios.get(`${LOVO_API_URL}/tts/${jobId}`, {
      headers: { 'X-API-KEY': process.env.LOVO_API_KEY },
    });
    if (statusResponse.data.status === 'succeeded') {
      result = statusResponse.data;
      break;
    } else if (statusResponse.data.status === 'failed') {
      throw new Error('Voice generation failed');
    }
    await new Promise((r) => setTimeout(r, 1000));
  }
  if (!result) {
    throw new Error(`Voice generation timed out after ${maxAttempts} seconds`);
  }
// Step 3: Download audio file
const audioResponse = await axios.get(result.urls[0], {
responseType: 'arraybuffer',
});
fs.writeFileSync(outputPath, Buffer.from(audioResponse.data));
return {
audioPath: outputPath,
duration: result.duration,
wordTimestamps: result.wordTimestamps || [],
};
}
export async function generateFullVoiceover(
segments: { text: string }[],
voiceId: string,
outputDir: string
): Promise<VoiceResult[]> {
const results: VoiceResult[] = [];
for (let i = 0; i < segments.length; i++) {
    const outputPath = path.resolve(outputDir, `segment-${i}.mp3`); // absolute path, so the renderer can locate the file later
console.log(`Generating voiceover for segment ${i + 1}/${segments.length}`);
const result = await generateVoiceover(
segments[i].text,
voiceId,
outputPath
);
results.push(result);
}
return results;
}

Step 4: Video Composition with Remotion
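Steps 2 and 3 together produce the data the composition below consumes; a representative (entirely made-up) example of that shape:

```typescript
// Illustrative inputProps as handed to Remotion in Step 5; all values are invented.
const exampleProps = {
  title: '5 Tips for Better Code Reviews',
  segments: [
    {
      text: 'Tip one: keep pull requests small.',
      visualDescription: 'A shrinking stack of files',
      audioPath: '/abs/path/output/audio/segment-0.mp3',
      duration: 4.2, // seconds (the Lovo result is in ms, divided by 1000)
      wordTimestamps: [{ word: 'Tip', start: 0, end: 280 }], // milliseconds
    },
  ],
};
```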
Create `src/Video.tsx`:
import React from 'react';
import { AbsoluteFill, Audio, Sequence, useCurrentFrame, useVideoConfig, interpolate, spring } from 'remotion';
interface Segment {
text: string;
visualDescription: string;
audioPath: string;
duration: number; // in seconds
wordTimestamps?: { word: string; start: number; end: number }[];
}
interface VideoProps {
title: string;
segments: Segment[];
}
const TitleCard: React.FC<{ title: string }> = ({ title }) => {
const frame = useCurrentFrame();
const { fps } = useVideoConfig();
const opacity = interpolate(frame, [0, 30], [0, 1], { extrapolateRight: 'clamp' });
const scale = spring({ frame, fps, config: { damping: 200 } });
return (
<AbsoluteFill className="bg-gradient-to-br from-blue-900 to-purple-900 flex items-center justify-center">
<h1
style={{ opacity, transform: `scale(${scale})` }}
className="text-6xl font-bold text-white text-center px-20"
>
{title}
</h1>
</AbsoluteFill>
);
};
const ContentSegment: React.FC<{ segment: Segment }> = ({ segment }) => {
const frame = useCurrentFrame();
const { fps } = useVideoConfig();
  const textOpacity = interpolate(frame, [0, 15], [0, 1], { extrapolateRight: 'clamp' });
const slideIn = spring({ frame, fps, config: { damping: 100 } });
return (
<AbsoluteFill className="bg-gradient-to-br from-slate-900 to-slate-800">
      {/* Audio for this segment. Remotion serves assets through its own
          bundler, so audioPath should be an absolute path or a staticFile()
          URL rather than a plain relative path. */}
      <Audio src={segment.audioPath} />
{/* Visual description as background context */}
<div className="absolute top-10 left-10 text-slate-500 text-sm">
{segment.visualDescription}
</div>
{/* Main text with animation */}
<div className="flex items-center justify-center h-full px-20">
<p
style={{
opacity: textOpacity,
transform: `translateY(${(1 - slideIn) * 50}px)`,
}}
className="text-4xl text-white text-center leading-relaxed font-medium"
>
{segment.text}
</p>
</div>
{/* Animated word highlights (if timestamps available) */}
{segment.wordTimestamps && (
<WordHighlighter
words={segment.wordTimestamps}
frame={frame}
fps={fps}
/>
)}
</AbsoluteFill>
);
};
const WordHighlighter: React.FC<{
words: { word: string; start: number; end: number }[];
frame: number;
fps: number;
}> = ({ words, frame, fps }) => {
const currentTimeMs = (frame / fps) * 1000;
return (
<div className="absolute bottom-20 left-0 right-0 flex justify-center gap-2 px-10 flex-wrap">
{words.map((w, i) => {
const isActive = currentTimeMs >= w.start && currentTimeMs <= w.end;
        return (
          // Note: CSS transitions like `transition-colors` have no effect in
          // rendered video (every frame is drawn independently), so the
          // highlight simply switches state per frame.
          <span
            key={i}
            className={`text-2xl ${
              isActive ? 'text-yellow-400 font-bold' : 'text-slate-400'
            }`}
>
{w.word}
</span>
);
})}
</div>
);
};
export const MyVideo: React.FC<VideoProps> = ({ title, segments }) => {
const { fps } = useVideoConfig();
const TITLE_DURATION = 3 * fps; // 3 seconds for title
let currentFrame = TITLE_DURATION;
return (
<>
{/* Title sequence */}
<Sequence from={0} durationInFrames={TITLE_DURATION}>
<TitleCard title={title} />
</Sequence>
{/* Content segments */}
{segments.map((segment, index) => {
const segmentFrames = Math.ceil(segment.duration * fps);
const sequence = (
<Sequence
key={index}
from={currentFrame}
durationInFrames={segmentFrames}
>
<ContentSegment segment={segment} />
</Sequence>
);
currentFrame += segmentFrames;
return sequence;
})}
</>
);
};

Step 5: The Main Pipeline
Create `src/pipeline.ts`:
import { bundle } from '@remotion/bundler';
import { renderMedia, selectComposition } from '@remotion/renderer';
import path from 'path';
import fs from 'fs';
import { generateScript } from './lib/script-generator';
import { generateFullVoiceover, getVoices } from './lib/voice-generator';
const OUTPUT_DIR = './output';
const AUDIO_DIR = './output/audio';
async function createVideo(topic: string) {
console.log('Step 1: Generating script...');
const script = await generateScript(topic);
console.log(`Generated script: "${script.title}" with ${script.segments.length} segments`);
// Ensure output directories exist
fs.mkdirSync(AUDIO_DIR, { recursive: true });
console.log('Step 2: Getting available voices...');
const voices = await getVoices();
const selectedVoice = voices.find((v) => v.locale.startsWith('en-US')) || voices[0];
console.log(`Using voice: ${selectedVoice.displayName}`);
console.log('Step 3: Generating voiceovers...');
const voiceResults = await generateFullVoiceover(
script.segments,
selectedVoice.id,
AUDIO_DIR
);
// Combine script with audio results
const segmentsWithAudio = script.segments.map((segment, i) => ({
...segment,
audioPath: voiceResults[i].audioPath,
duration: voiceResults[i].duration / 1000, // Convert ms to seconds
wordTimestamps: voiceResults[i].wordTimestamps,
}));
console.log('Step 4: Bundling Remotion project...');
const bundleLocation = await bundle({
entryPoint: path.resolve('./src/index.ts'),
webpackOverride: (config) => config,
});
console.log('Step 5: Rendering video...');
const composition = await selectComposition({
serveUrl: bundleLocation,
id: 'MyVideo',
inputProps: {
title: script.title,
segments: segmentsWithAudio,
},
});
const outputPath = path.join(OUTPUT_DIR, `${script.title.replace(/[^a-z0-9]/gi, '-')}.mp4`);
await renderMedia({
composition,
serveUrl: bundleLocation,
codec: 'h264',
outputLocation: outputPath,
inputProps: {
title: script.title,
segments: segmentsWithAudio,
},
});
console.log(`Video rendered successfully: ${outputPath}`);
return outputPath;
}
// Run the pipeline
const topic = process.argv[2] || 'How to learn programming in 2026';
createVideo(topic).catch((err) => {
  console.error(err);
  process.exit(1); // non-zero exit so scripts and CI can detect failure
});

Step 6: Run the Pipeline
# Set environment variables
export OPENAI_API_KEY=your_openai_key
export LOVO_API_KEY=your_lovo_key
# Generate a video
npx ts-node src/pipeline.ts "5 Tips for Better Code Reviews"

The pipeline will:
1. Generate a script with GPT-4o
2. Create voiceovers for each segment with Lovo AI
3. Render a synchronized video with Remotion
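The frame arithmetic that `MyVideo` performs inline can be sketched as a standalone helper, which is also handy for computing the composition's total duration in frames before rendering (the function name and shape are my own, not part of Remotion):

```typescript
// Compute each Sequence's start frame and length, mirroring MyVideo's layout.
interface SegmentTiming {
  from: number;             // first frame of the segment's Sequence
  durationInFrames: number; // frames the segment occupies
}

export function layoutSegments(
  durationsInSeconds: number[],
  fps: number,
  titleSeconds = 3
): { timings: SegmentTiming[]; totalFrames: number } {
  let cursor = Math.ceil(titleSeconds * fps); // the title card fills the first frames
  const timings = durationsInSeconds.map((seconds) => {
    const durationInFrames = Math.ceil(seconds * fps);
    const timing = { from: cursor, durationInFrames };
    cursor += durationInFrames;
    return timing;
  });
  return { timings, totalFrames: cursor };
}
```

At 30 fps with segments of 4.5 s and 6 s, the segments start at frames 90 and 225, and the whole video spans 405 frames.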
Advanced: Adding B-Roll and Images
Enhance your videos with AI-generated images:
import OpenAI from 'openai';
const openai = new OpenAI();
async function generateVisual(description: string): Promise<string> {
const response = await openai.images.generate({
model: 'dall-e-3',
prompt: `Clean, modern illustration for a video: ${description}. Minimal style, suitable for educational content.`,
size: '1792x1024',
quality: 'standard',
});
return response.data[0].url!;
}

Then reference the generated image in your Remotion component (`Img` is imported from `remotion`):

<Img src={segment.visualUrl} className="absolute inset-0 object-cover opacity-30" />

Cost Analysis
For a 2-minute video:
| Service | Usage | Cost |
|---|---|---|
| OpenAI GPT-4o | ~500 tokens | ~$0.01 |
| Lovo AI | ~2 min audio | ~$0.50 |
| DALL-E 3 (optional) | 5 images | ~$0.40 |
| Total | | ~$1.00 |
Compare this to hiring a voice actor ($50-200) and video editor ($100-500).
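The table's arithmetic can be wrapped in a tiny estimator; the unit prices below are back-of-envelope assumptions derived from the table, so check each provider's current pricing before relying on them:

```typescript
// Rough per-video cost estimate; all unit prices are assumptions.
interface CostInputs {
  gptTokens: number;
  audioMinutes: number;
  images?: number;
}

export function estimateVideoCost({ gptTokens, audioMinutes, images = 0 }: CostInputs): number {
  const GPT_PER_1K_TOKENS = 0.02; // blended input/output estimate (~$0.01 per 500 tokens)
  const LOVO_PER_MINUTE = 0.25;   // ~$0.50 for 2 minutes of audio
  const DALLE_PER_IMAGE = 0.08;   // standard-quality DALL-E 3 (~$0.40 for 5 images)
  return (
    (gptTokens / 1000) * GPT_PER_1K_TOKENS +
    audioMinutes * LOVO_PER_MINUTE +
    images * DALLE_PER_IMAGE
  );
}
```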
Production Tips
1. Batch processing - Generate multiple videos in parallel
2. Cache voices - Lovo voices are consistent, cache common phrases
3. Template variations - Create multiple Remotion templates for variety
4. Quality control - Always preview before publishing
5. A/B test intros - Different title cards perform differently
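Tip 1 can be sketched with a simple concurrency limiter, so parallel runs don't exhaust API rate limits (the `createVideo` call in the usage comment and the limit of 3 are assumptions, not fixed requirements):

```typescript
// Run async tasks over a list with at most `limit` in flight at once.
async function mapWithConcurrency<T, R>(
  items: T[],
  limit: number,
  fn: (item: T) => Promise<R>
): Promise<R[]> {
  const results: R[] = new Array(items.length);
  let next = 0;
  async function worker(): Promise<void> {
    while (next < items.length) {
      const i = next++; // claim the next index (safe: JS is single-threaded)
      results[i] = await fn(items[i]);
    }
  }
  await Promise.all(
    Array.from({ length: Math.min(limit, items.length) }, () => worker())
  );
  return results;
}

// Usage (hypothetical): render several topics, three at a time.
// const paths = await mapWithConcurrency(topics, 3, createVideo);
```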
Conclusion
You now have a complete AI video production pipeline. From a single topic prompt, you can generate professional videos with a structured script, a natural-sounding voiceover, and synchronized motion graphics.
The entire process takes minutes instead of hours, and costs dollars instead of hundreds. Scale this to produce educational content, marketing videos, or social media clips at unprecedented speed.