AI Short Drama Voice and Lip-Sync Guide
What you will learn
- How to choose preset voices and use voice cloning
- How to apply the 9 core emotion labels
- How batch dubbing and incremental generation work
- How the two lip-sync model paths differ
- How to generate sound effects and schedule them on a timeline
- How BGM generation and vocal modes work
- How to balance dialogue, music, and effects in the audio mix panel
1. AI voice overview
Linghui AI uses TTS technology to turn script dialogue into speech, then applies lip-sync processing so characters can appear to speak naturally on screen.
(Figure: voice workflow)
2. Voice selection guide
2.1 Preset voice types
| Voice type | Traits | Best for |
|---|---|---|
| Sweet female | Soft and cute | Young lead actresses and youthful roles |
| Mature female | Calm and steady | Professional women and supporting female roles |
| Sunny male | Bright and energetic | Young male leads and school-age roles |
| Deep male | Low and charismatic | Powerful or mature male characters |
| Child voice | Young and innocent | Child characters |
Every voice can be previewed, the list can be filtered by gender, and additional preset voices are available to explore inside the platform.
2.2 Selection tips
- Match the personality - Softer characters need gentler voices, while dominant roles need stronger presence.
- Keep voices distinct - Different roles should sound clearly different from each other.
- Preview with the same line - Use the same dialogue line to compare multiple voices fairly.
2.3 Voice cloning
Recording mode
- The system provides 5 sample reading scripts and displays one at random.
- Recommended recording length is 10 to 30 seconds.
- A real-time waveform preview is shown while recording.
- Typical flow: start → pause → continue → stop → preview → rerecord or use.
- Submission is disabled when the recording is shorter than 10 seconds.
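The recording-length rules above can be sketched as a simple validation check (the function name and messages are illustrative, not the platform's actual API):

```python
def check_recording(duration_s: float) -> tuple[bool, str]:
    """Validate a voice-cloning recording by length.

    Submission is blocked under 10 s; 10-30 s is the recommended range.
    """
    if duration_s < 10:
        return False, "too short: submission disabled under 10 s"
    if duration_s <= 30:
        return True, "ok: within the recommended 10-30 s range"
    return True, "ok: accepted, but 10-30 s is recommended"

print(check_recording(8))   # blocked
print(check_recording(20))  # recommended range
```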
Upload mode
You can also upload an audio file of at least 5 seconds in a supported format to create a cloned voice.
Cloned voice management
Cloned voices support viewing, renaming, deleting, and batch deletion.
✅ Voice cloning is currently a 0-credit feature and can be used repeatedly.
2.4 Global dubbing setup
At the top of the storyboard page, you can assign voices to all roles at once to avoid repeating the same setup shot by shot.
3. Emotion labels
Emotion labels directly affect speaking tone and delivery. The current base set contains 9 emotions, each with a distinct delivery style:
- Rising tone and lively delivery
- Lower tone and slower pace
- Heavier emphasis and quicker attack
- Tense and trembling delivery
- Rising pitch and short bursts
- Flat and neutral delivery
- Rejecting and dismissive tone
- Soft, close, and private feeling
- Higher volume with strong emphasis
More than 30 extended emotions are also available in script editing. Let emotion evolve with the plot instead of staying flat the whole time.
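Letting emotion evolve with the plot amounts to assigning a label per line rather than one label per character. A minimal sketch (the label names and data layout here are illustrative, not the platform's exact identifiers):

```python
# Per-line emotion assignment: the arc changes with the plot
# instead of staying flat for the whole drama.
script = [
    {"line": "We finally made it!",    "emotion": "happy"},
    {"line": "...but at what cost?",   "emotion": "sad"},
    {"line": "Someone is at the door.", "emotion": "fear"},
]

def emotion_arc(script):
    """Return the sequence of emotion labels across the script."""
    return [seg["emotion"] for seg in script]

print(emotion_arc(script))  # ['happy', 'sad', 'fear']
```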
4. Batch dubbing and incremental generation
- Batch generate voices for all storyboard dialogue in one run
- Incremental mode only processes unfinished storyboard items and skips completed ones
- Dubbing progress is shown as completed N / total M
- Each dialogue segment can be previewed and downloaded independently
- You can upload a custom recorded voice file for a specific storyboard item
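The incremental behavior above can be sketched as a loop that skips finished items and reports progress in the completed N / total M form (the shot schema and TTS placeholder are assumptions for illustration):

```python
def incremental_dub(shots):
    """Dub only unfinished storyboard items; skip completed ones.

    `shots` is a list of dicts with a 'done' flag (hypothetical schema).
    Returns a 'completed N / total M' progress string.
    """
    for shot in shots:
        if shot["done"]:
            continue  # incremental mode: already dubbed, skip
        shot["audio"] = f"tts:{shot['line']}"  # placeholder for a real TTS call
        shot["done"] = True
    n = sum(s["done"] for s in shots)
    return f"completed {n} / total {len(shots)}"

shots = [
    {"line": "Hello",   "done": True, "audio": "tts:Hello"},
    {"line": "Goodbye", "done": False},
]
print(incremental_dub(shots))  # completed 2 / total 2
```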
5. Lip sync
5.1 How it works
- Analyze phonemes and timing from the generated speech
- Generate matching mouth animation
- Apply the animation to the character image
- Keep the rest of the character stable
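The second step, turning phoneme timing into mouth animation, is commonly done with a phoneme-to-viseme (mouth-shape) lookup. A minimal sketch under that assumption; the mapping table below is illustrative, not the platform's actual one:

```python
# Illustrative phoneme -> mouth-shape (viseme) table.
VISEMES = {"AA": "open", "M": "closed", "F": "teeth-on-lip", "SIL": "rest"}

def mouth_track(phonemes):
    """phonemes: list of (label, start_s, end_s) from speech analysis.

    Returns a timed sequence of mouth shapes to drive the animation.
    """
    return [(VISEMES.get(p, "rest"), start, end) for p, start, end in phonemes]

print(mouth_track([("M", 0.0, 0.1), ("AA", 0.1, 0.3), ("SIL", 0.3, 0.4)]))
```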
5.2 Optimization tips
✅ Front-facing characters usually produce the most natural lip-sync result.
⚠️ Side-angle characters may need a more frontal dialogue shot for better results.
5.3 Model selection
| Model | Traits | Recommended for |
|---|---|---|
| Tongyi | Natural and smooth result | Dialogue-heavy scenes |
| Kling | Stable and reliable result | Action-oriented scenes |
5.4 Automatic skip
If the selected video generation model already contains embedded audio, such as Vidu, the system skips the lip-sync step automatically.
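The skip rule is a simple gate on the chosen video model. A sketch, assuming a lookup set of models with embedded audio (only Vidu is named in this guide; the set and function name are illustrative):

```python
# Video generation models whose output already embeds audio.
MODELS_WITH_EMBEDDED_AUDIO = {"vidu"}

def needs_lip_sync(video_model: str) -> bool:
    """Lip-sync runs only when the video model has no embedded audio."""
    return video_model.lower() not in MODELS_WITH_EMBEDDED_AUDIO

print(needs_lip_sync("Vidu"))        # False: lip-sync skipped automatically
print(needs_lip_sync("other-model")) # True: lip-sync runs
```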
5.5 Lip-sync status
Progress states: processing (blue) / completed (green) / pending (gray).
6. Sound effect generation
6.1 Overview
Sound effects are generated from natural-language descriptions through AI audio generation.
6.2 Basic mode
Enter a sound-effect prompt such as "thunder in a storm" or "coffee shop ambience" and generate directly.
6.3 Timeline segment mode
You can place multiple sound-effect segments inside one storyboard shot.
- Each segment defines start time, end time, and a sound description
- Validation rules: end time must be greater than start time, the description cannot be empty, and it must stay within 1500 characters
Example segment layout for one shot:
- 0-3s: Footsteps approaching from a distance
- 3-5s: Door creaks open
- 5-8s: Thunder rolls
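The three validation rules can be sketched as one check per segment (function name and messages are illustrative):

```python
def validate_segment(start_s: float, end_s: float, description: str) -> str:
    """Apply the segment rules: end > start, non-empty description,
    description within 1500 characters."""
    if end_s <= start_s:
        return "end time must be greater than start time"
    if not description.strip():
        return "description cannot be empty"
    if len(description) > 1500:
        return "description exceeds 1500 characters"
    return "ok"

print(validate_segment(0, 3, "Footsteps approaching from a distance"))  # ok
print(validate_segment(5, 5, "Thunder rolls"))  # invalid time range
```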
6.4 AI prompt recommendation
The system can recommend sound-effect prompts based on scene description, dialogue, emotion, and shot type.
6.5 Upload custom sound effects
Custom sound-effect files in common audio formats are supported.
7. BGM generation
7.1 Overview
Background music is generated through the Doubao Music API.
7.2 Music prompt writing
- Edit prompts manually, such as "light piano music" or "tense electronic score"
- Use AI-recommended prompts based on the current scene mood
7.3 Vocal mode selection
| Mode | Description |
|---|---|
| Instrumental | Music only, recommended for most scenes |
| Light vocal hum | Music with soft vocal texture |
| Lead vocal | Full singing track. Lyrics are required, otherwise it falls back to light vocal hum. |
7.4 Upload custom BGM
Custom BGM files such as MP3 are supported.
7.5 Credit usage
Before BGM generation starts, a credit confirmation dialog shows the duration and estimated cost.
8. Audio mixing panel
8.1 Independent three-track volume control
- Dialogue volume (0-100)
- Music volume (0-100)
- Sound-effect volume (0-100)
8.2 Preset mixing profiles
You can apply one-click profiles such as "Dialogue First" with dialogue 100 / music 40 / effects 60.
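A one-click profile is just a stored triple of track volumes. A sketch; only the "Dialogue First" values (100/40/60) come from this guide, and the profile key is an illustrative identifier:

```python
# Preset three-track volumes; each channel is on a 0-100 scale.
PROFILES = {
    "dialogue_first": {"dialogue": 100, "music": 40, "effects": 60},
}

def apply_profile(name: str) -> dict[str, int]:
    """Return the three-track volumes for a preset profile."""
    profile = PROFILES[name]
    assert all(0 <= v <= 100 for v in profile.values())
    return profile

print(apply_profile("dialogue_first"))
```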
8.3 Best practice
✅ Keep dialogue the loudest, music next, and sound effects lowest so viewers can always hear the lines clearly.
FAQ
Q: Can I add background music?
Yes. Linghui AI includes built-in BGM generation during the dubbing stage. It supports three vocal modes and also allows custom music uploads.
Q: What should I do if the voice does not match the visuals?
Check whether the storyboard duration is long enough. Longer dialogue often needs longer shot duration.
Q: What if the lip sync looks unnatural?
Use a front-facing character image, avoid overly long continuous dialogue, try switching the lip-sync model, and regenerate the affected shot if needed.
Q: Which is better, voice cloning or preset voices?
Preset voices are stable and require no sampling. Cloned voices are more personalized but depend on recording quality. Start with presets and only clone if needed.
Q: Can sound effects and BGM be used together?
Yes. Use the audio mixing panel to balance them independently and avoid overlap issues.
Q: How do I make dubbing feel more emotional?
Set an appropriate emotion label for each line and let the emotional profile change with the plot. Voice cloning can strengthen the result further.