Speech & TTS Options
Speech Recognition Options
Model Configuration Options
| Option | Type | Default | Description |
|---|---|---|---|
| language | string | ’en’ | Target language for recognition |
| task | string | ’transcribe’ | Task type (‘transcribe’ or ‘translate’) |
| onProgress | function | - | Callback for loading progress updates |
| onComplete | function | - | Callback when loading completes |
| onError | function | - | Callback for error handling |
Recording Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| sampleRate | number | 16000 | Audio sample rate |
| channels | number | 1 | Number of audio channels |
Transcription Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| return_timestamps | boolean | false | Include word timestamps |
| chunk_length_s | number | 30 | Processing chunk length in seconds |
| stride_length_s | number | 5 | Overlap between chunks in seconds |
| language | string | ’en’ | Force specific language |
Text-to-Speech Options
Note: The TTS model can generate audio up to 30 seconds in length. Longer texts will be truncated.
Voice Options
BrowserAI supports multiple voices across different languages:
| Prefix | Language | Description |
|---|---|---|
| af_* | American English (Female) | Bella, Nicole, Sarah, Sky |
| am_* | American English (Male) | Adam, Michael |
| bf_* | British English (Female) | Emma, Isabella |
| bm_* | British English (Male) | George, Lewis |
| hf_* | Hindi (Female) | Alpha, Beta |
| hm_* | Hindi (Male) | Omega, Psi |
| ef_* | Spanish (Female) | Dora |
| em_* | Spanish (Male) | Alex, Santa |
| ff_* | French (Female) | Siwis |
| jf_* | Japanese (Female) | Alpha, Gongitsune, Nezumi, Tebukuro |
| jm_* | Japanese (Male) | Kumo |
| zf_* | Chinese (Female) | Xiaobei, Xiaoni, Xiaoxiao, Xiaoyi |
| zm_* | Chinese (Male) | Yunjian, Yunxi, Yunxia, Yunyang |
TTS Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| voice | string | ’af’ | Voice ID to use (e.g., ‘af_bella’) |
| speed | number | 1.0 | Speech rate multiplier |
| dtype | string | ’fp32’ | Model precision (‘fp32’ or ‘fp16’) |
Example
const browserAI = new BrowserAI();
// Speech Recognition Example
await browserAI.loadModel('whisper-tiny-en', {
language: 'en',
task: 'transcribe',
onProgress: (progress) => {
console.log('Model loading:', progress.progress + '%');
}
});
await browserAI.startRecording({
sampleRate: 16000,
channels: 1
});
const audioBlob = await browserAI.stopRecording();
const transcription = await browserAI.transcribeAudio(audioBlob, {
return_timestamps: true,
chunk_length_s: 30,
stride_length_s: 5,
language: 'en'
});
// Load the TTS model
await browserAI.loadModel('kokoro-tts', {
dtype: 'fp32',
onProgress: (progress) => {
console.log('Model loading:', progress.progress + '%');
}
});
// Generate speech from text
const audioData = await browserAI.textToSpeech(
"Hello, this is a test message!",
{
voice: "af_bella",
speed: 1.0
}
);
// Play the generated audio
if (audioData) {
const blob = new Blob([audioData], { type: 'audio/wav' });
const audioUrl = URL.createObjectURL(blob);
const audio = new Audio(audioUrl);
audio.onended = () => {
URL.revokeObjectURL(audioUrl); // Clean up
};
await audio.play();
}