Speech & TTS Options

Speech Recognition Options

Model Configuration Options

| Option     | Type     | Default      | Description                             |
| ---------- | -------- | ------------ | --------------------------------------- |
| language   | string   | 'en'         | Target language for recognition         |
| task       | string   | 'transcribe' | Task type ('transcribe' or 'translate') |
| onProgress | function | -            | Callback for loading progress updates   |
| onComplete | function | -            | Callback when loading completes         |
| onError    | function | -            | Callback for error handling             |
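
A minimal loading sketch that wires up all three callbacks is shown below. The onProgress shape matches the full example at the end of this page; the onComplete and onError signatures (no arguments and an error object, respectively) are assumptions.

```javascript
// Sketch: load a recognition model with progress, completion, and error callbacks
const browserAI = new BrowserAI();

await browserAI.loadModel('whisper-tiny-en', {
  language: 'en',
  task: 'transcribe',
  onProgress: (progress) => console.log(`Loading: ${progress.progress}%`),
  onComplete: () => console.log('Model ready'),           // assumed: called with no arguments
  onError: (error) => console.error('Load failed:', error) // assumed: called with an error object
});
```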

Recording Parameters

| Parameter  | Type   | Default | Description               |
| ---------- | ------ | ------- | ------------------------- |
| sampleRate | number | 16000   | Audio sample rate in Hz   |
| channels   | number | 1       | Number of audio channels  |
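
As a usage sketch, recording can be started with these parameters and stopped once enough audio has been captured. The fixed five-second timer below is purely illustrative, and the snippet assumes the browserAI instance and loaded model from the example at the end of this page.

```javascript
// Sketch: capture roughly five seconds of microphone audio
await browserAI.startRecording({ sampleRate: 16000, channels: 1 });

// Wait five seconds (illustrative only), then stop and receive the audio as a Blob
await new Promise((resolve) => setTimeout(resolve, 5000));
const audioBlob = await browserAI.stopRecording();
```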

Transcription Parameters

| Parameter         | Type    | Default | Description                        |
| ----------------- | ------- | ------- | ---------------------------------- |
| return_timestamps | boolean | false   | Include word timestamps            |
| chunk_length_s    | number  | 30      | Processing chunk length in seconds |
| stride_length_s   | number  | 5       | Overlap between chunks in seconds  |
| language          | string  | 'en'    | Force specific language            |
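
When return_timestamps is enabled, the result can be inspected roughly as follows. The result shape (a text field plus a chunks array carrying [start, end] timestamps) mirrors the Whisper output format used by Transformers.js and is an assumption here, not a documented guarantee.

```javascript
// Sketch: transcribe with timestamps and inspect the result
const transcription = await browserAI.transcribeAudio(audioBlob, {
  return_timestamps: true,
  chunk_length_s: 30,
  stride_length_s: 5,
  language: 'en'
});

console.log(transcription.text); // full transcript (assumed field name)
for (const chunk of transcription.chunks ?? []) {
  console.log(chunk.timestamp, chunk.text); // assumed [start, end] in seconds plus segment text
}
```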

Text-to-Speech Options

Note: The TTS model can generate audio up to 30 seconds in length. Longer texts will be truncated.

Voice Options

BrowserAI supports multiple voices across different languages:

| Prefix | Language                  | Available Voices                    |
| ------ | ------------------------- | ----------------------------------- |
| af_*   | American English (Female) | Bella, Nicole, Sarah, Sky           |
| am_*   | American English (Male)   | Adam, Michael                       |
| bf_*   | British English (Female)  | Emma, Isabella                      |
| bm_*   | British English (Male)    | George, Lewis                       |
| hf_*   | Hindi (Female)            | Alpha, Beta                         |
| hm_*   | Hindi (Male)              | Omega, Psi                          |
| ef_*   | Spanish (Female)          | Dora                                |
| em_*   | Spanish (Male)            | Alex, Santa                         |
| ff_*   | French (Female)           | Siwis                               |
| jf_*   | Japanese (Female)         | Alpha, Gongitsune, Nezumi, Tebukuro |
| jm_*   | Japanese (Male)           | Kumo                                |
| zf_*   | Chinese (Female)          | Xiaobei, Xiaoni, Xiaoxiao, Xiaoyi   |
| zm_*   | Chinese (Male)            | Yunjian, Yunxi, Yunxia, Yunyang     |
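
A voice ID is the prefix joined to a lowercased speaker name, as in 'af_bella'. The sketch below assumes the kokoro-tts model from the example further down is already loaded; the specific IDs 'bf_emma' and 'jf_alpha' simply follow that naming pattern and are assumptions.

```javascript
// Sketch: the same call with voices from different languages
const britishAudio  = await browserAI.textToSpeech('Good morning!', { voice: 'bf_emma' });
const japaneseAudio = await browserAI.textToSpeech('こんにちは', { voice: 'jf_alpha' });
```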

TTS Parameters

| Parameter | Type   | Default | Description                        |
| --------- | ------ | ------- | ---------------------------------- |
| voice     | string | 'af'    | Voice ID to use (e.g., 'af_bella') |
| speed     | number | 1.0     | Speech rate multiplier             |
| dtype     | string | 'fp32'  | Model precision ('fp32' or 'fp16') |
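
Because generated audio is capped at roughly 30 seconds (see the note above), longer passages can be split and synthesized piece by piece. The naive sentence splitting below is a sketch, not part of the library, and it assumes the kokoro-tts model is already loaded.

```javascript
// Sketch: synthesize long text sentence by sentence to stay under the ~30 s limit
const longText = 'First sentence. Second sentence. And so on.';
const sentences = longText.match(/[^.!?]+[.!?]*/g) ?? [longText];

const clips = [];
for (const sentence of sentences) {
  clips.push(await browserAI.textToSpeech(sentence.trim(), {
    voice: 'af_bella',
    speed: 1.0
  }));
}
// `clips` now holds one audio buffer per sentence; play or concatenate as needed
```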

Example

```javascript
const browserAI = new BrowserAI();
 
// Speech Recognition Example
await browserAI.loadModel('whisper-tiny-en', {
  language: 'en',
  task: 'transcribe',
  onProgress: (progress) => {
    console.log('Model loading:', progress.progress + '%');
  }
});
 
await browserAI.startRecording({
  sampleRate: 16000,
  channels: 1
});
 
const audioBlob = await browserAI.stopRecording();
const transcription = await browserAI.transcribeAudio(audioBlob, {
  return_timestamps: true,
  chunk_length_s: 30,
  stride_length_s: 5,
  language: 'en'
});
 
// Load the TTS model
await browserAI.loadModel('kokoro-tts', {
  dtype: 'fp32',
  onProgress: (progress) => {
    console.log('Model loading:', progress.progress + '%');
  }
});
 
// Generate speech from text
const audioData = await browserAI.textToSpeech(
  "Hello, this is a test message!", 
  {
    voice: "af_bella",
    speed: 1.0
  }
);
 
// Play the generated audio
if (audioData) {
  const blob = new Blob([audioData], { type: 'audio/wav' });
  const audioUrl = URL.createObjectURL(blob);
  const audio = new Audio(audioUrl);
  
  audio.onended = () => {
    URL.revokeObjectURL(audioUrl); // Clean up
  };
  
  await audio.play();
}
```