If you don’t have a song’s lyrics on hand, Youka can detect them from the audio. The AI listens to the recording, transcribes the words, and syncs each word to the exact moment it’s sung.

When to use auto-transcription

Auto-transcription works best when:
  • You don’t have access to the song’s lyrics
  • The lead vocal is clear and easy to follow
  • The song uses standard pronunciation in its language
  • You want to get a karaoke track up and running quickly without sourcing lyrics manually
If you do have the lyrics, providing them manually typically gives better timing accuracy. See Add Lyrics.

Steps to use auto-transcription

1. Open the Create Karaoke page

Click Create Karaoke from the Youka home page, then upload your file or paste a URL.

2. Select Detect from audio

In the lyrics section, choose Detect from audio instead of I have lyrics.

3. Choose the AI model (optional)

Open Advanced Settings to select the transcription model. Each model offers a different balance of accuracy and credit cost:

| Model | Strengths | Credit cost |
| --- | --- | --- |
| AudioShake | Best accuracy for most songs | Standard |
| MusicAI | Premium detection with syllable-level timing | Higher |
| Whisper | Budget-friendly; good for common languages | Lower |

If you’re unsure which to choose, AudioShake is a reliable default for most use cases.

4. Select the song's language

Choose the language the lyrics are sung in. This helps the model apply the correct pronunciation patterns.

5. Click Create Karaoke

Click Create Karaoke. The AI will listen to the audio, identify the words, and sync each one to the music. Processing typically takes 2–3 minutes.
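
For readers who prefer to see the model choice from step 3 as a decision rule, here is a minimal Python sketch. It only restates the model table; `TranscriptionModel`, `MODELS`, and `pick_model` are hypothetical names made up for illustration and are not part of Youka or any Youka API. The cost values are the relative labels from the table, not actual credit amounts.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class TranscriptionModel:
    name: str
    strengths: str
    relative_cost: str  # "Lower", "Standard", or "Higher", as listed in the table


# The three models exactly as described in the table in step 3.
MODELS = [
    TranscriptionModel("AudioShake", "Best accuracy for most songs", "Standard"),
    TranscriptionModel("MusicAI", "Premium detection with syllable-level timing", "Higher"),
    TranscriptionModel("Whisper", "Budget-friendly; good for common languages", "Lower"),
]


def pick_model(need_syllable_timing: bool = False,
               minimize_credits: bool = False) -> TranscriptionModel:
    """Hypothetical helper: map two priorities onto the table's guidance."""
    if need_syllable_timing:
        wanted = "MusicAI"      # the only model the table lists with syllable-level timing
    elif minimize_credits:
        wanted = "Whisper"      # the budget-friendly option
    else:
        wanted = "AudioShake"   # the suggested default when unsure
    return next(m for m in MODELS if m.name == wanted)


if __name__ == "__main__":
    choice = pick_model()
    print(choice.name, "-", choice.strengths, "-", choice.relative_cost)
```

The order of the checks is an assumption: it treats syllable-level timing as the stronger requirement when both priorities are set. In practice, the actual credit amount shown on the creation screen is the figure to rely on.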

Credit cost

Auto-transcription uses more credits than providing your own lyrics because the AI performs an extra analysis pass. The exact cost depends on:
  • The duration of the song
  • The AI model you select
The precise credit amount is shown on the creation screen before you confirm.

Reviewing transcription results

AI transcription is accurate for most songs, but it can make mistakes. After your karaoke is created, open the project and check for:
  • Misheard words — the AI may transcribe a word phonetically rather than spelling it correctly
  • Names and proper nouns — song-specific references, artist names, and place names are common error points
  • Stylized pronunciations — words that are deliberately mispronounced or altered as part of the song’s style
Use the Studio editor to correct any errors before fine-tuning the timing.
For songs with heavy reverb, background noise, or multi-part harmonies, consider sourcing the lyrics and using the I have lyrics option instead. It gives the AI a precise reference to work from and usually produces tighter timing.
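
The guidance on when to detect lyrics from audio and when to supply them can be summarized as a small decision function. This is only a sketch of the advice on this page; `recommend_lyrics_option` and its parameters are hypothetical and do not correspond to any Youka setting or API.

```python
def recommend_lyrics_option(has_lyrics: bool,
                            heavy_reverb: bool = False,
                            background_noise: bool = False,
                            multi_part_harmonies: bool = False) -> str:
    """Hypothetical helper: return the lyrics option this page suggests."""
    # Supplying lyrics typically gives better timing accuracy, so it wins whenever possible.
    if has_lyrics:
        return "I have lyrics"
    # Difficult audio is where auto-transcription is most likely to struggle.
    if heavy_reverb or background_noise or multi_part_harmonies:
        return "Source the lyrics first, then use I have lyrics"
    # Clear lead vocal and no lyrics on hand: auto-transcription is a good fit.
    return "Detect from audio"


if __name__ == "__main__":
    print(recommend_lyrics_option(has_lyrics=False))
    print(recommend_lyrics_option(has_lyrics=False, heavy_reverb=True))
```

Whichever option is chosen, reviewing the result in the Studio editor is still worthwhile.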

What’s next

Edit lyrics

Fix any transcription errors directly in the Studio editor.

Manual sync

Fine-tune the timing of individual words and lines.