How this works

gexgd0419 edited this page Feb 27, 2024 · 1 revision

This TTS engine works by converting SAPI commands to Speech Synthesis Markup Language (SSML) text and then sending that text to Azure AI Speech Service via the Speech SDK.

SAPI has traditionally used its own proprietary XML format for TTS; see XML TTS Tutorial (SAPI 5.3). Later versions of SAPI support SSML as well.

The SAPI framework always parses the XML text into fragments before passing them to the TTS engine. So even if the input text is already in SSML, SAPI still parses it into fragments, and this engine then reassembles those fragments into SSML before sending it to Azure AI Speech Service.
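
The reassembly step can be pictured with a minimal sketch. This is not the engine's actual code (the engine is a native SAPI component); it is a simplified Python illustration under stated assumptions. The `Fragment` class, its `rate_percent` field, and the `fragments_to_ssml` function are hypothetical names invented here, and the voice name is only a placeholder. Real SAPI fragments carry more state than this model does.

```python
from dataclasses import dataclass
from xml.sax.saxutils import escape, quoteattr


@dataclass
class Fragment:
    """Hypothetical, simplified stand-in for one parsed SAPI text fragment."""
    text: str
    rate_percent: int = 0  # assumed: relative speaking-rate change for this fragment


def fragments_to_ssml(fragments, voice_name, lang="en-US"):
    """Rebuild a single SSML document from already-parsed fragments."""
    parts = []
    for frag in fragments:
        # Escape the text so characters such as & and < stay valid XML.
        body = escape(frag.text)
        if frag.rate_percent:
            # Wrap only fragments whose rate differs from the default.
            body = f'<prosody rate="{frag.rate_percent:+d}%">{body}</prosody>'
        parts.append(body)
    # Every fragment goes inside one <voice> element: the voice was already
    # chosen on the SAPI side before the fragments reached this function.
    return (
        '<speak version="1.0" '
        'xmlns="http://www.w3.org/2001/10/synthesis" '
        f'xml:lang={quoteattr(lang)}>'
        f'<voice name={quoteattr(voice_name)}>{"".join(parts)}</voice>'
        '</speak>'
    )


if __name__ == "__main__":
    demo = [Fragment("Hello & "), Fragment("welcome", rate_percent=20)]
    # "en-US-JennyNeural" is a placeholder voice name for illustration.
    print(fragments_to_ssml(demo, "en-US-JennyNeural"))
```

Note that the sketch emits exactly one `<voice>` wrapper per request. It never sees a `<voice>` element from the input, because in this model the fragments arrive with the voice already selected.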

This means that voice changing is handled locally by SAPI. <voice> elements won't be passed to the TTS engine or the Speech Service, so you cannot use <voice> to switch to an arbitrary voice supported by the Speech Service. You can still use <voice> to switch to another SAPI voice, including other NaturalVoiceSAPIAdapter voices.