Note
All models are from the repository: snakers4/silero-models
| Language | Model | Speakers | 
|---|---|---|
| Russian | v4_ru | 5: aidar, baya, kseniya, xenia, eugene | 
| Ukrainian | v4_ua | 1: mykyta | 
| Uzbek | v4_uz | 1: dilnavoz | 
| English | v3_en | 118: en_0, en_1, ..., en_117 | 
| Spanish | v3_es | 3: es_0, es_1, es_2 | 
| French | v3_fr | 6: fr_0, fr_1, fr_2, fr_3, fr_4, fr_5 | 
| German | v3_de | 5: bernd_ungerer, eva_k, friedrich, hokuspokus, karlsson | 
| Tatar | v3_tt | 1: dilyara | 
| Kalmyk | v3_xal | 2: erdni, delghir | 
Important
Docker must be installed and the Docker daemon must be running.
`docker run --rm -p 8000:8000 twirapp/silero-tts-api-server`

Build and run from local repository

Clone the repository:

`git clone https://github.com/twirapp/silero-tts-api-server.git && cd silero-tts-api-server`

Build docker image:

`docker build -f docker/Dockerfile -t silero-tts-api-server .`

Run the container:

`docker run --rm -p 8000:8000 silero-tts-api-server`

Or use docker compose:

`docker-compose -f docker/compose.yml up`

Important
Requires Python 3.9 or later.
This project uses rye for dependency management and assumes you have it installed.
- Clone the repository: `git clone https://github.com/twirapp/silero-tts-api-server.git && cd silero-tts-api-server`
- Install dependencies: `rye sync`

  This will automatically create the virtual environment in the `.venv` directory and install the required dependencies.

  Alternative (not recommended): install via pip.
  - Create a virtual environment and activate it: `python3 -m venv .venv && source .venv/bin/activate`
  - Install only the required dependencies: `pip3 install --no-deps -r requirements.lock`
- Download silero tts models: `bash ./install_models.sh`
- Run the server: `litestar run`
Note
By default, the server listens on localhost:8000.
You can view the automatically generated documentation based on OpenAPI at:
- `GET /generate`: Generate audio in wav format from text. Parameters: `text`, `speaker`, `sample_rate`, `pitch`, `rate`
- `GET /speakers`: Get list of speakers
`sample_rate` can be set to 8000, 24000, or 48000.
`pitch` and `rate` can be set to any value from 0 to 100.
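
As a rough illustration of the endpoint and limits above, the sketch below builds a `/generate` request URL and checks the documented ranges on the client side before sending anything. It assumes the parameters are passed as query-string values and that `sample_rate`, `pitch`, and `rate` may be omitted; neither is confirmed here, so check the generated OpenAPI documentation. The helper names `build_generate_url` and `save_speech` are hypothetical and not part of this project.

```python
from urllib.parse import urlencode
from urllib.request import urlopen

BASE_URL = "http://localhost:8000"  # default address mentioned in this README
ALLOWED_SAMPLE_RATES = (8000, 24000, 48000)


def build_generate_url(text, speaker, sample_rate=None, pitch=None, rate=None,
                       base_url=BASE_URL):
    """Build a GET /generate URL, validating the documented value ranges.

    Assumption: parameters are sent as query-string values, and the last
    three can be left out to use the server's defaults.
    """
    if sample_rate is not None and sample_rate not in ALLOWED_SAMPLE_RATES:
        raise ValueError(f"sample_rate must be one of {ALLOWED_SAMPLE_RATES}")
    for name, value in (("pitch", pitch), ("rate", rate)):
        if value is not None and not 0 <= value <= 100:
            raise ValueError(f"{name} must be between 0 and 100")

    params = {"text": text, "speaker": speaker}
    optional = {"sample_rate": sample_rate, "pitch": pitch, "rate": rate}
    # Only include the optional parameters that were explicitly set.
    params.update({k: v for k, v in optional.items() if v is not None})
    return f"{base_url}/generate?{urlencode(params)}"


def save_speech(url, path):
    """Fetch the WAV audio from a running server and write it to a file."""
    with urlopen(url) as response, open(path, "wb") as out:
        out.write(response.read())
```

With the server running, `save_speech(build_generate_url("Hello, world", "en_0", sample_rate=48000), "hello.wav")` would be one way to store the result locally.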
- `TEXT_LENGTH_LIMIT`: Maximum length of the text to be processed. Default is 930 characters.
- `MKL_NUM_THREADS`: Number of threads to use for generating audio. Default number of threads: number of CPU cores.
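
Because of the text length limit, a client with longer input may want to split it before sending requests. The following is a minimal sketch of one way to do that, not something this project provides: the function name `split_text` is hypothetical, and it assumes the server counts length in characters the same way Python's `len` does.

```python
import re

TEXT_LENGTH_LIMIT = 930  # default limit documented above


def split_text(text, limit=TEXT_LENGTH_LIMIT):
    """Split text into chunks of at most `limit` characters.

    Chunks end at sentence boundaries where possible; a single sentence
    longer than the limit is cut at the limit.
    """
    sentences = re.split(r"(?<=[.!?])\s+", text.strip())
    chunks, current = [], ""
    for sentence in sentences:
        # Hard-wrap any single sentence that exceeds the limit on its own.
        while len(sentence) > limit:
            if current:
                chunks.append(current)
                current = ""
            chunks.append(sentence[:limit])
            sentence = sentence[limit:]
        candidate = f"{current} {sentence}".strip()
        if len(candidate) <= limit:
            current = candidate
        else:
            chunks.append(current)
            current = sentence
    if current:
        chunks.append(current)
    return chunks
```

Each chunk can then be sent as its own request, and the resulting audio files played or joined in order.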
This repository is dedicated to twir.app and is designed to meet its requirements.
TwirApp needs to generate audio on the CPU. If you need support for other devices such as CUDA or MPS, please open an issue.
