ChatGPT + Gemini
This hands-on tutorial shows you how to turn one idea into a short, polished video using
ChatGPT for the script and Google Gemini for visuals, voice, and assembly.
The workflow is simple, repeatable, and designed for global readers. You will plan the idea, write a
natural-sounding spoken script, create or restyle your avatar, animate the delivery, add captions and a
background, and export in the correct sizes for YouTube Shorts, Instagram Reels, and TikTok, all without a studio.
Step 1 — Lock the idea and audience
Start with one clear promise. Pick a topic you can teach in under a minute and define who the video is for.
Keep outcomes specific, for example “3 focus hacks for exams” or “one Canva trick for faster thumbnails.”
Ask ChatGPT to turn that seed into a short outline with a hook, three quick beats, and a single call-to-action.
Prompt for ChatGPT (outline)
You are my shorts producer. Turn this seed topic into a 60–75s outline.
Topic: {your topic}. Audience: {students / creators / small business}.
Give me a hook under 7 seconds, three concise teaching beats, and one CTA.
Use global, simple English and avoid slang.
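If you reuse the outline prompt for many topics, a small template helper keeps the wording identical from run to run. The sketch below is only an illustration: the function name build_outline_prompt and the placeholder names topic and audience are assumptions that stand in for the {your topic} and audience slots above. It only builds the text that you then paste into ChatGPT.

```python
# Prompt text copied from the tutorial; {topic} and {audience} are
# illustrative placeholder names, not part of any ChatGPT API.
OUTLINE_PROMPT = (
    "You are my shorts producer. Turn this seed topic into a 60–75s outline.\n"
    "Topic: {topic}. Audience: {audience}.\n"
    "Give me a hook under 7 seconds, three concise teaching beats, and one CTA.\n"
    "Use global, simple English and avoid slang."
)


def build_outline_prompt(topic: str, audience: str) -> str:
    """Fill the outline prompt, rejecting empty fields so no blank slot is sent."""
    topic, audience = topic.strip(), audience.strip()
    if not topic or not audience:
        raise ValueError("topic and audience must both be non-empty")
    return OUTLINE_PROMPT.format(topic=topic, audience=audience)


if __name__ == "__main__":
    print(build_outline_prompt("3 focus hacks for exams", "students"))
```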
Step 2 — Write a speakable script
Convert the outline into short, spoken lines. Use present tense and keep sentences under fifteen words.
Add light stage directions in brackets so you know where to pause, gesture, or cut to B-roll. Aim for
130–160 words total. End with one clear action like “Comment your biggest blocker” or “Save this for later.”
Prompt for ChatGPT (script)
Turn this outline into a 60–75s script for an on-camera avatar. Short spoken sentences, present tense, global English. Add [stage directions] for gestures, captions, and B-roll cutaways. End with one CTA only.
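Before moving on, you can check the script ChatGPT returns against the limits from this step: sentences under fifteen words and 130–160 spoken words in total. The following is a rough sketch, assuming stage directions are always wrapped in square brackets and that sentences end with a period, exclamation mark, or question mark. The function name check_script and its parameters are hypothetical.

```python
import re


def check_script(script: str, max_sentence_words: int = 14,
                 min_total: int = 130, max_total: int = 160) -> dict:
    """Count spoken words and list sentences that exceed the per-sentence limit."""
    # Drop [stage directions]; they are not spoken aloud.
    spoken = re.sub(r"\[[^\]]*\]", " ", script)
    # Crude sentence split on ., ! and ? (decimals such as "3.5" would split too).
    sentences = [s.strip() for s in re.split(r"[.!?]+", spoken) if s.strip()]
    counts = [len(s.split()) for s in sentences]
    total = sum(counts)
    return {
        "total_words": total,
        "total_ok": min_total <= total <= max_total,
        "long_sentences": [s for s, n in zip(sentences, counts)
                           if n > max_sentence_words],
    }


if __name__ == "__main__":
    report = check_script("Stop scrolling. [pause] Here is one trick that saves time.")
    print(report)
```

If total_ok is false or long_sentences is not empty, ask ChatGPT to tighten only the flagged lines rather than regenerate the whole script.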
Step 3 — Create or restyle your avatar in Gemini
If you have a good photo, upload it and ask Gemini to keep identity while applying soft daylight and natural skin texture.
If you prefer a stylized brand character, provide a style reference and request the same facial identity with simplified details.
Export a shoulders-up PNG on a transparent background and a second version on a neutral background so you can composite easily later.
Prompt for Gemini (avatar from photo)
Create a clean, photoreal avatar from the uploaded photo, keeping the identity exactly. Soft daylight, natural skin texture (avoid a plastic look), subtle catchlights, shoulders-up framing. Export a PNG on a transparent background plus a neutral-background version (2048 px).
Step 4 — Voice and lip-sync
Generate a friendly voice that fits your audience, or record your own and let Gemini clean it up.
Animate the avatar with accurate lip sync, natural eye blinks, and slight head motion.
Keep pacing comfortable for subtitles and silent autoplay.
Prompt for Gemini (animate + voice)
Animate this avatar to speak the script. Keep lip-sync accurate, eye blinks natural, and head motion gentle. Pace for global comprehension. Export vertical 1080×1920 MP4.
Step 5 — Background, B-roll, and captions
Place the talking avatar on a simple blurred background (city, desk, or color gradient) to maintain focus.
Add two quick B-roll moments that visualize your key beats. Generate or auto-transcribe captions and keep
them large, high contrast, and no more than two lines at a time.
Prompt for Gemini (composite + captions)
Composite the speaking avatar over a softly blurred outdoor background. Insert two 5-second B-roll cutaways matching beats 2 and 3. Auto-add captions in large, high-contrast text with safe margins. Keep total duration 60–75 seconds.
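If you prepare caption text yourself instead of relying on auto-captions, the two-line limit is easy to enforce before the text reaches the video. The sketch below uses Python's standard textwrap module to split a spoken line into caption cards of at most two lines each. The 28-character line width is an assumption chosen for large text on a vertical frame, not a platform rule, and caption_cards is a hypothetical helper name.

```python
import textwrap


def caption_cards(text: str, max_chars: int = 28, max_lines: int = 2) -> list[str]:
    """Split text into caption cards, each holding at most max_lines short lines."""
    # Wrap on word boundaries so no line exceeds max_chars.
    lines = textwrap.wrap(text, width=max_chars)
    # Group consecutive lines into cards of max_lines lines.
    return ["\n".join(lines[i:i + max_lines])
            for i in range(0, len(lines), max_lines)]


if __name__ == "__main__":
    line = "Save this for later and comment your biggest blocker"
    for card in caption_cards(line):
        print(card)
        print("---")
```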
Step 6 — Export for every platform
Export a vertical master at 1080×1920 for Shorts/Reels/TikTok. Create a landscape cut at 1920×1080 if
you also want a version for the regular YouTube feed, and a square 1200×1200 cut for Facebook or Instagram
posts. Use the same opening hook and CTA across all versions so your comparisons stay consistent.
Prompt for Gemini (deliverables)
Export three versions of the finished video: 1) Vertical 1080×1920 (primary), 2) Square 1200×1200 (feed), 3) Landscape 1920×1080. Keep audio levels consistent (-14 LUFS), hardcode captions, and include a clean, text-free thumbnail frame.
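It helps to know how much of the vertical master survives in each cut. The sketch below computes the largest centered crop of a 1080×1920 frame that matches each target aspect ratio. It is plain arithmetic under the assumption that the other versions are cropped from the vertical master. The helper name center_crop is hypothetical, and Gemini may reframe the shot rather than crop it.

```python
from fractions import Fraction


def center_crop(src_w: int, src_h: int, dst_w: int, dst_h: int) -> tuple:
    """Largest centered box in the source with the destination aspect ratio.

    Returns (x, y, width, height) in source pixels.
    """
    target = Fraction(dst_w, dst_h)
    if Fraction(src_w, src_h) > target:
        # Source is wider than the target: keep full height, trim the sides.
        h = src_h
        w = int(src_h * target)
    else:
        # Source is taller (or equal): keep full width, trim top and bottom.
        w = src_w
        h = int(src_w / target)
    return ((src_w - w) // 2, (src_h - h) // 2, w, h)


if __name__ == "__main__":
    master = (1080, 1920)
    for name, size in [("vertical", (1080, 1920)),
                       ("square", (1200, 1200)),
                       ("landscape", (1920, 1080))]:
        print(name, center_crop(*master, *size))
```

Under this assumption the square cut keeps a 1080×1080 region and must be upscaled slightly to reach 1200×1200, while the landscape cut keeps only about 607 of the 1,920 rows. For the landscape version it is therefore usually better to ask for a re-framed render with a wider background than to crop the vertical master.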
Step 7 — Title, description, and posting rhythm
Write a title under sixty characters that names the payoff. Keep the description one short paragraph with one link and two or three hashtags.
Maintain a steady rhythm: two or three videos per week beat a single long upload with no follow-up.
Prompt for ChatGPT (metadata)
Write a high-CTR title under 60 characters, a one-paragraph description, and three short hashtags for this script. Keep it clear for a global audience.
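The limits in this step are easy to verify mechanically before you post. The sketch below checks a title, description, and hashtag list against them: a title under sixty characters, a one-paragraph description with at most one link, and two or three hashtags. The function name check_metadata and the simple link and hashtag patterns are assumptions for illustration, not platform validation rules.

```python
import re


def check_metadata(title: str, description: str, hashtags: list[str]) -> list[str]:
    """Return a list of problems; an empty list means every check passed."""
    problems = []
    if len(title) >= 60:
        problems.append(f"title is {len(title)} characters; keep it under 60")
    # A blank line inside the description means more than one paragraph.
    if "\n\n" in description.strip():
        problems.append("description has more than one paragraph")
    if len(re.findall(r"https?://\S+", description)) > 1:
        problems.append("description has more than one link")
    if not 2 <= len(hashtags) <= 3:
        problems.append("use two or three hashtags")
    for tag in hashtags:
        # A hashtag here is '#' followed by letters, digits, or underscores.
        if not re.fullmatch(r"#\w+", tag):
            problems.append(f"malformed hashtag: {tag!r}")
    return problems


if __name__ == "__main__":
    print(check_metadata(
        "3 focus hacks for exams",
        "Three quick habits for exam week. https://example.com",
        ["#study", "#focus"],
    ))
```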
Step 8 — Measure and improve
After posting, watch audience retention. If viewers drop off before ten seconds, tighten the hook or open on a shot of the result.
If they rewatch the tip section, isolate that beat into a new Short. Keep a small spreadsheet of the hook line, watch time,
and thumbnail used for each video so you learn what works for your niche.
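The tracking spreadsheet can be a plain CSV file that any spreadsheet app opens. The sketch below appends one row per video and lists the hooks with the longest watch time. The column names, the file path, and the helpers log_video and best_hooks are assumptions for illustration. Watch time is entered by hand from your analytics page and is not fetched from any platform.

```python
import csv
from pathlib import Path

# Assumed column names matching the three things the tutorial says to track.
FIELDS = ["hook_line", "watch_time_s", "thumbnail"]


def log_video(path, hook_line: str, watch_time_s: float, thumbnail: str) -> None:
    """Append one video to the CSV log, writing the header on first use."""
    path = Path(path)
    new_file = not path.exists() or path.stat().st_size == 0
    with path.open("a", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=FIELDS)
        if new_file:
            writer.writeheader()
        writer.writerow({"hook_line": hook_line,
                         "watch_time_s": watch_time_s,
                         "thumbnail": thumbnail})


def best_hooks(path, top: int = 3) -> list:
    """Return the rows with the longest watch time, best first."""
    with Path(path).open(newline="", encoding="utf-8") as f:
        rows = list(csv.DictReader(f))
    return sorted(rows, key=lambda r: float(r["watch_time_s"]), reverse=True)[:top]


if __name__ == "__main__":
    log_video("shorts_log.csv", "Stop rereading your notes", 21.5, "thumb_a.png")
    for row in best_hooks("shorts_log.csv"):
        print(row["hook_line"], row["watch_time_s"])
```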
Quick troubleshooting
If the face looks too smooth, ask Gemini for more realistic skin texture. If the voice feels too fast, re-synthesize it at a slower rate.
If captions crowd the frame, increase the line height and add a safe lower margin. If the avatar's look drifts from one video to the next,
upload your best base avatar as an identity reference every time and say “match identity exactly.”
One-shot build (optional)
If you want a fast run, paste the script and use the one-shot prompt below. It asks Gemini for a complete vertical short with captions,
plus square and landscape cuts of the same video.
Prompt for Gemini (one-shot)
Create a 60–75 second vertical AI vlog. Use the uploaded avatar as identity reference. Read this script naturally with accurate lip-sync and eye blinks. Place avatar over a clean, blurred outdoor background, add two short B-roll cutaways, and auto-caption in high-contrast text. Export 1080×1920 MP4, plus square 1200×1200 and landscape 1920×1080 versions.
