Tutorial

Automate Video Production with AI: Combining OpenAI + Lovo AI + Remotion (React)

Build a complete AI video pipeline that generates scripts with GPT, creates voiceovers with Lovo AI, and renders professional videos with Remotion. Full code included.

18 min read

The AI Video Production Pipeline

Creating videos traditionally requires:

  • A scriptwriter
  • A voice actor or narrator
  • Video editing software
  • Hours of manual work per video

With the right AI tools, you can automate this entire workflow. In this tutorial, we will build a pipeline with three stages:

    1. OpenAI GPT - Generates video scripts from a topic

    2. Lovo AI - Converts scripts to professional voiceovers

    3. Remotion - Renders React components as video with synchronized audio

    By the end, you will have a system that can produce a complete video from just a topic prompt.

    Architecture Overview

    [Topic Input]
         |
         v
    [OpenAI GPT] --> Script with timestamps
         |
         v
    [Lovo AI API] --> Audio file (.mp3) + word timestamps
         |
         v
    [Remotion] --> Synchronized video with visuals + audio
         |
         v
    [Final MP4]

    Prerequisites

  • Node.js 18+
  • OpenAI API key
  • Lovo AI API key (get from genny.lovo.ai)
  • Basic React knowledge

    Step 1: Project Setup

    npx create-video@latest ai-video-pipeline
    cd ai-video-pipeline
    npm install openai axios

    The `create-video` command sets up a Remotion project with everything configured.
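Both API keys below are read from `process.env`. A tiny guard (the `requireEnv` helper is my own, not part of any SDK) fails fast with a clear message instead of an undefined key surfacing deep inside an API call:

```typescript
// Fail fast if a required environment variable is missing.
export function requireEnv(name: string): string {
  const value = process.env[name];
  if (!value) {
    throw new Error(`Missing required environment variable: ${name}`);
  }
  return value;
}
```

Call `requireEnv('OPENAI_API_KEY')` and `requireEnv('LOVO_API_KEY')` once at startup so misconfiguration is caught before any video work begins.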

    Step 2: Script Generation with OpenAI

    Create `src/lib/script-generator.ts`:

    import OpenAI from 'openai';
    
    const openai = new OpenAI({
      apiKey: process.env.OPENAI_API_KEY,
    });
    
    export interface ScriptSegment {
      text: string;
      visualDescription: string;
      duration: number; // estimated seconds
    }
    
    export interface VideoScript {
      title: string;
      segments: ScriptSegment[];
      totalDuration: number;
    }
    
    export async function generateScript(topic: string): Promise<VideoScript> {
      const response = await openai.chat.completions.create({
        model: 'gpt-4o',
        messages: [
          {
            role: 'system',
            content: `You are a professional video scriptwriter. Generate engaging, concise scripts for educational/explainer videos.
    
    Output format (JSON):
    {
      "title": "Video title",
      "segments": [
        {
          "text": "Narration text for this segment",
          "visualDescription": "What should appear on screen",
          "duration": 5
        }
      ]
    }
    
    Guidelines:
    - Each segment should be 3-8 seconds of narration
    - Keep total video under 2 minutes
    - Write conversational, engaging copy
    - Visual descriptions should be simple and achievable with motion graphics`,
          },
          {
            role: 'user',
            content: `Create a video script about: ${topic}`,
          },
        ],
        response_format: { type: 'json_object' },
      });
    
      const script = JSON.parse(response.choices[0].message.content!) as VideoScript;
      script.totalDuration = script.segments.reduce((sum, s) => sum + s.duration, 0);
      
      return script;
    }
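`response_format: { type: 'json_object' }` guarantees valid JSON, but not that the JSON matches our `VideoScript` shape. A light runtime check before trusting it is cheap insurance; this validator is a sketch (the function name and duration bounds are my own choices, with the interfaces inlined here so the snippet stands alone):

```typescript
interface ScriptSegment {
  text: string;
  visualDescription: string;
  duration: number; // estimated seconds
}

interface VideoScript {
  title: string;
  segments: ScriptSegment[];
  totalDuration: number;
}

// Throw if the parsed JSON does not look like a usable VideoScript.
export function validateScript(script: VideoScript): VideoScript {
  if (!script.title || !Array.isArray(script.segments) || script.segments.length === 0) {
    throw new Error('Script is missing a title or has no segments');
  }
  for (const s of script.segments) {
    if (typeof s.text !== 'string' || s.text.trim() === '') {
      throw new Error('Segment has empty narration text');
    }
    if (typeof s.duration !== 'number' || s.duration <= 0 || s.duration > 15) {
      throw new Error(`Implausible segment duration: ${s.duration}`);
    }
  }
  return script;
}
```

Calling a check like this right after `JSON.parse` turns a confusing downstream render failure into an immediate, explainable error.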

    Step 3: Voice Generation with Lovo AI

    Create `src/lib/voice-generator.ts`:

    import axios from 'axios';
    import fs from 'fs';
    import path from 'path';
    
    const LOVO_API_URL = 'https://api.genny.lovo.ai/api/v1';
    
    interface LovoVoice {
      id: string;
      displayName: string;
      locale: string;
    }
    
    interface WordTimestamp {
      word: string;
      start: number; // milliseconds
      end: number;
    }
    
    export interface VoiceResult {
      audioPath: string;
      duration: number;
      wordTimestamps: WordTimestamp[];
    }
    
    export async function getVoices(): Promise<LovoVoice[]> {
      const response = await axios.get(`${LOVO_API_URL}/speakers`, {
        headers: {
          'X-API-KEY': process.env.LOVO_API_KEY,
        },
      });
      return response.data.data;
    }
    
    export async function generateVoiceover(
      text: string,
      voiceId: string,
      outputPath: string
    ): Promise<VoiceResult> {
      // Step 1: Create TTS job
      const createResponse = await axios.post(
        `${LOVO_API_URL}/tts`,
        {
          speaker: voiceId,
          text: text,
          speed: 1.0,
        },
        {
          headers: {
            'X-API-KEY': process.env.LOVO_API_KEY,
            'Content-Type': 'application/json',
          },
        }
      );
    
      const jobId = createResponse.data.id;
    
      // Step 2: Poll for completion (bail out rather than hanging forever)
      let result;
      const maxAttempts = 120; // ~2 minutes at one poll per second
      for (let attempt = 0; ; attempt++) {
        if (attempt >= maxAttempts) {
          throw new Error('Timed out waiting for voice generation');
        }
        const statusResponse = await axios.get(`${LOVO_API_URL}/tts/${jobId}`, {
          headers: { 'X-API-KEY': process.env.LOVO_API_KEY },
        });
    
        if (statusResponse.data.status === 'succeeded') {
          result = statusResponse.data;
          break;
        } else if (statusResponse.data.status === 'failed') {
          throw new Error('Voice generation failed');
        }
    
        await new Promise((r) => setTimeout(r, 1000));
      }
    
      // Step 3: Download audio file
      const audioResponse = await axios.get(result.urls[0], {
        responseType: 'arraybuffer',
      });
    
      fs.writeFileSync(outputPath, Buffer.from(audioResponse.data));
    
      return {
        audioPath: outputPath,
        duration: result.duration,
        wordTimestamps: result.wordTimestamps || [],
      };
    }
    
    export async function generateFullVoiceover(
      segments: { text: string }[],
      voiceId: string,
      outputDir: string
    ): Promise<VoiceResult[]> {
      const results: VoiceResult[] = [];
    
      for (let i = 0; i < segments.length; i++) {
        const outputPath = path.join(outputDir, `segment-${i}.mp3`);
        console.log(`Generating voiceover for segment ${i + 1}/${segments.length}`);
        
        const result = await generateVoiceover(
          segments[i].text,
          voiceId,
          outputPath
        );
        results.push(result);
      }
    
      return results;
    }
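TTS jobs can fail transiently (rate limits, network blips), so in practice it may be worth wrapping each `generateVoiceover` call in a small retry helper. A sketch, where the helper name and backoff schedule are my own choices:

```typescript
// Retry an async operation with exponential backoff (500ms, 1s, 2s, ...).
export async function withRetry<T>(
  fn: () => Promise<T>,
  attempts = 3,
  baseDelayMs = 500
): Promise<T> {
  let lastError: unknown;
  for (let i = 0; i < attempts; i++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (i < attempts - 1) {
        // Wait longer after each failure before trying again.
        await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** i));
      }
    }
  }
  throw lastError;
}
```

Inside `generateFullVoiceover`, the call would become `await withRetry(() => generateVoiceover(segments[i].text, voiceId, outputPath))`.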

    Step 4: Video Composition with Remotion

    Create `src/Video.tsx`:

    import { AbsoluteFill, Audio, Sequence, staticFile, useCurrentFrame, useVideoConfig, interpolate, spring } from 'remotion';
    
    interface Segment {
      text: string;
      visualDescription: string;
      audioPath: string;
      duration: number; // in seconds
      wordTimestamps?: { word: string; start: number; end: number }[];
    }
    
    interface VideoProps {
      title: string;
      segments: Segment[];
    }
    
    const TitleCard: React.FC<{ title: string }> = ({ title }) => {
      const frame = useCurrentFrame();
      const { fps } = useVideoConfig();
      
      const opacity = interpolate(frame, [0, 30], [0, 1], { extrapolateRight: 'clamp' });
      const scale = spring({ frame, fps, config: { damping: 200 } });
    
      return (
        <AbsoluteFill className="bg-gradient-to-br from-blue-900 to-purple-900 flex items-center justify-center">
          <h1
            style={{ opacity, transform: `scale(${scale})` }}
            className="text-6xl font-bold text-white text-center px-20"
          >
            {title}
          </h1>
        </AbsoluteFill>
      );
    };
    
    const ContentSegment: React.FC<{ segment: Segment }> = ({ segment }) => {
      const frame = useCurrentFrame();
      const { fps } = useVideoConfig();
      
      const textOpacity = interpolate(frame, [0, 15], [0, 1], { extrapolateRight: 'clamp' });
      const slideIn = spring({ frame, fps, config: { damping: 100 } });
    
      return (
        <AbsoluteFill className="bg-gradient-to-br from-slate-900 to-slate-800">
          {/* Audio for this segment; staticFile() resolves assets from the project's public/ folder */}
          <Audio src={staticFile(segment.audioPath)} />
          
          {/* Visual description as background context */}
          <div className="absolute top-10 left-10 text-slate-500 text-sm">
            {segment.visualDescription}
          </div>
          
          {/* Main text with animation */}
          <div className="flex items-center justify-center h-full px-20">
            <p
              style={{
                opacity: textOpacity,
                transform: `translateY(${(1 - slideIn) * 50}px)`,
              }}
              className="text-4xl text-white text-center leading-relaxed font-medium"
            >
              {segment.text}
            </p>
          </div>
          
          {/* Animated word highlights (if timestamps available) */}
          {segment.wordTimestamps && (
            <WordHighlighter
              words={segment.wordTimestamps}
              frame={frame}
              fps={fps}
            />
          )}
        </AbsoluteFill>
      );
    };
    
    const WordHighlighter: React.FC<{
      words: { word: string; start: number; end: number }[];
      frame: number;
      fps: number;
    }> = ({ words, frame, fps }) => {
      const currentTimeMs = (frame / fps) * 1000;
      
      return (
        <div className="absolute bottom-20 left-0 right-0 flex justify-center gap-2 px-10 flex-wrap">
          {words.map((w, i) => {
            const isActive = currentTimeMs >= w.start && currentTimeMs <= w.end;
            return (
              <span
                key={i}
                className={`text-2xl transition-colors ${
                  isActive ? 'text-yellow-400 font-bold' : 'text-slate-400'
                }`}
              >
                {w.word}
              </span>
            );
          })}
        </div>
      );
    };
    
    export const MyVideo: React.FC<VideoProps> = ({ title, segments }) => {
      const { fps } = useVideoConfig();
      const TITLE_DURATION = 3 * fps; // 3 seconds for title
    
      let currentFrame = TITLE_DURATION;
    
      return (
        <>
          {/* Title sequence */}
          <Sequence from={0} durationInFrames={TITLE_DURATION}>
            <TitleCard title={title} />
          </Sequence>
    
          {/* Content segments */}
          {segments.map((segment, index) => {
            const segmentFrames = Math.ceil(segment.duration * fps);
            const sequence = (
              <Sequence
                key={index}
                from={currentFrame}
                durationInFrames={segmentFrames}
              >
                <ContentSegment segment={segment} />
              </Sequence>
            );
            currentFrame += segmentFrames;
            return sequence;
          })}
        </>
      );
    };
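One thing `MyVideo` cannot decide on its own is the total composition length: Remotion needs it when the composition is registered (and `selectComposition` later looks the composition up by the id `MyVideo`). Keeping that arithmetic in a pure helper makes it easy to test and reuse; the helper name is my own:

```typescript
// Total composition length: a 3-second title card plus every segment,
// each rounded up so no sequence is shorter than its audio.
export function videoDurationInFrames(
  segments: { duration: number }[], // durations in seconds
  fps: number
): number {
  const TITLE_SECONDS = 3;
  return (
    TITLE_SECONDS * fps +
    segments.reduce((sum, s) => sum + Math.ceil(s.duration * fps), 0)
  );
}
```

In your Remotion root this can feed the `<Composition>` registration, for example via the `calculateMetadata` prop in Remotion 4 (`calculateMetadata: ({ props }) => ({ durationInFrames: videoDurationInFrames(props.segments, 30) })`), so the rendered file always matches the script length.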

    Step 5: The Main Pipeline

    Create `src/pipeline.ts`:

    import { bundle } from '@remotion/bundler';
    import { renderMedia, selectComposition } from '@remotion/renderer';
    import path from 'path';
    import fs from 'fs';
    import { generateScript } from './lib/script-generator';
    import { generateFullVoiceover, getVoices } from './lib/voice-generator';
    
    const OUTPUT_DIR = './output';
    // Write audio under public/ so Remotion's staticFile() can serve it
    const AUDIO_DIR = './public/audio';
    
    async function createVideo(topic: string) {
      console.log('Step 1: Generating script...');
      const script = await generateScript(topic);
      console.log(`Generated script: "${script.title}" with ${script.segments.length} segments`);
    
      // Ensure output directories exist
      fs.mkdirSync(AUDIO_DIR, { recursive: true });
    
      console.log('Step 2: Getting available voices...');
      const voices = await getVoices();
      const selectedVoice = voices.find((v) => v.locale.startsWith('en-US')) || voices[0];
      console.log(`Using voice: ${selectedVoice.displayName}`);
    
      console.log('Step 3: Generating voiceovers...');
      const voiceResults = await generateFullVoiceover(
        script.segments,
        selectedVoice.id,
        AUDIO_DIR
      );
    
      // Combine script with audio results
      const segmentsWithAudio = script.segments.map((segment, i) => ({
        ...segment,
        // Path relative to public/, as expected by staticFile() in the composition
        audioPath: `audio/segment-${i}.mp3`,
        duration: voiceResults[i].duration / 1000, // Lovo reports ms; Remotion needs seconds
        wordTimestamps: voiceResults[i].wordTimestamps,
      }));
    
      console.log('Step 4: Bundling Remotion project...');
      const bundleLocation = await bundle({
        entryPoint: path.resolve('./src/index.ts'),
        webpackOverride: (config) => config,
      });
    
      console.log('Step 5: Rendering video...');
      const composition = await selectComposition({
        serveUrl: bundleLocation,
        id: 'MyVideo',
        inputProps: {
          title: script.title,
          segments: segmentsWithAudio,
        },
      });
    
      const outputPath = path.join(OUTPUT_DIR, `${script.title.replace(/[^a-z0-9]/gi, '-')}.mp4`);
    
      await renderMedia({
        composition,
        serveUrl: bundleLocation,
        codec: 'h264',
        outputLocation: outputPath,
        inputProps: {
          title: script.title,
          segments: segmentsWithAudio,
        },
      });
    
      console.log(`Video rendered successfully: ${outputPath}`);
      return outputPath;
    }
    
    // Run the pipeline
    const topic = process.argv[2] || 'How to learn programming in 2026';
    createVideo(topic).catch(console.error);

    Step 6: Run the Pipeline

    # Set environment variables
    export OPENAI_API_KEY=your_openai_key
    export LOVO_API_KEY=your_lovo_key
    
    # Generate a video
    npx ts-node src/pipeline.ts "5 Tips for Better Code Reviews"

    The pipeline will:

    1. Generate a script with GPT-4o

    2. Create voiceovers for each segment with Lovo AI

    3. Render a synchronized video with Remotion

    Advanced: Adding B-Roll and Images

    Enhance your videos with AI-generated images:

    import OpenAI from 'openai';
    
    const openai = new OpenAI();
    
    async function generateVisual(description: string): Promise<string> {
      const response = await openai.images.generate({
        model: 'dall-e-3',
        prompt: `Clean, modern illustration for a video: ${description}. Minimal style, suitable for educational content.`,
        size: '1792x1024',
        quality: 'standard',
      });
    
      return response.data[0].url!;
    }

    Then in your Remotion component:

    <Img src={segment.visualUrl} className="absolute inset-0 object-cover opacity-30" />
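One caveat: DALL-E image URLs are temporary, so for anything beyond an immediate render you will likely want to download each image locally first. A sketch using the global `fetch` available in Node 18+ (the helper name and paths are my own assumptions):

```typescript
import fs from 'fs';
import path from 'path';

// Download a remote image and save it locally; returns the saved path.
export async function downloadImage(
  url: string,
  outputDir: string,
  name: string
): Promise<string> {
  const response = await fetch(url);
  if (!response.ok) {
    throw new Error(`Image download failed with status ${response.status}`);
  }
  fs.mkdirSync(outputDir, { recursive: true });
  const outputPath = path.join(outputDir, `${name}.png`);
  fs.writeFileSync(outputPath, Buffer.from(await response.arrayBuffer()));
  return outputPath;
}
```

If you save the images under `public/`, the `<Img>` component can load them the same way the audio files are served.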

    Cost Analysis

    For a 2-minute video:

    | Service             | Usage        | Cost   |
    |---------------------|--------------|--------|
    | OpenAI GPT-4o       | ~500 tokens  | ~$0.01 |
    | Lovo AI             | ~2 min audio | ~$0.50 |
    | DALL-E 3 (optional) | 5 images     | ~$0.40 |
    | Total               |              | ~$1.00 |

    Compare this to hiring a voice actor ($50-200) and video editor ($100-500).

    Production Tips

    1. Batch processing - Generate multiple videos in parallel

    2. Cache voices - Lovo voices are consistent, cache common phrases

    3. Template variations - Create multiple Remotion templates for variety

    4. Quality control - Always preview before publishing

    5. A/B test intros - Different title cards perform differently
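
    The batch-processing tip can be sketched with a small concurrency limiter so parallel runs do not trample API rate limits. The helper name and the example limit of 3 are my own choices:

```typescript
// Run async tasks with at most `limit` in flight at once.
export async function runWithConcurrency<T>(
  tasks: (() => Promise<T>)[],
  limit: number
): Promise<T[]> {
  const results: T[] = new Array(tasks.length);
  let next = 0;
  async function worker(): Promise<void> {
    // Each worker claims the next unstarted task until none remain.
    while (next < tasks.length) {
      const i = next++;
      results[i] = await tasks[i]();
    }
  }
  await Promise.all(
    Array.from({ length: Math.min(limit, tasks.length) }, worker)
  );
  return results;
}
```

    For example, `await runWithConcurrency(topics.map((t) => () => createVideo(t)), 3)` renders three videos at a time; since a single Remotion render can already saturate a machine's CPU, the right limit depends on your hardware.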

    Conclusion

    You now have a complete AI video production pipeline. From a single topic prompt, you can generate professional videos with:

  • AI-written scripts
  • Natural-sounding voiceovers
  • Synchronized animations
  • Consistent branding

    The entire process takes minutes instead of hours, and costs dollars instead of hundreds. Scale this to produce educational content, marketing videos, or social media clips at unprecedented speed.
