TTS Speaker API

tetos.get_speaker(name: str) type[Speaker]

Get a speaker by name.

Parameters:

name (str) – The lowercase name of the speaker.

Raises:

ValueError – If the speaker is not found.

Returns:

The speaker class.

Return type:

type[Speaker]

Azure

class tetos.azure.AzureSpeaker(speech_key: str, speech_region: str, voice: str | None = None)

Azure TTS speaker.

Parameters:
  • speech_key (str) – The Azure Speech key.

  • speech_region (str) – The Azure Speech region.

  • voice (str, optional) – The voice to use.

classmethod get_command() Command

Return a Click command for the speaker.

Returns:

The Click command.

Return type:

click.Command

classmethod list_voices() list[str]

List the available voices for the speaker.

Returns:

A list of voice names.

Return type:

list[str]

say(text: str, out_file: str | Path | None = None, lang: str = 'en-US') float

A synchronous version of synthesize()

Parameters:
  • text (str) – The text to synthesize.

  • out_file (Path) – The file to save the speech to.

  • lang (str) – The language code of the text. e.g. “en-US”, “fr-FR”.
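
One way a synchronous wrapper of this kind can be built is by driving the coroutine with `asyncio.run`. The sketch below shows that idea on a hypothetical `DemoSpeaker`; it is not taken from the tetos source, and the real say() may behave differently (for example in how it treats `out_file=None`).

```python
import asyncio
import tempfile
from pathlib import Path
from typing import Union


class DemoSpeaker:
    """Hypothetical speaker that mirrors the documented method shapes."""

    async def synthesize(
        self, text: str, out_file: Union[str, Path], lang: str = "en-US"
    ) -> float:
        # Stand-in for a real TTS call: write placeholder bytes
        # and report a made-up duration of 0.1 s per word.
        Path(out_file).write_bytes(text.encode("utf-8"))
        return 0.1 * len(text.split())

    def say(self, text: str, out_file: Union[str, Path], lang: str = "en-US") -> float:
        # Run the async method to completion from synchronous code.
        return asyncio.run(self.synthesize(text, out_file, lang=lang))


with tempfile.TemporaryDirectory() as tmp:
    target = Path(tmp) / "hello.bin"
    duration = DemoSpeaker().say("hello there world", target)
    written = target.read_bytes()
```

Note that `asyncio.run` cannot be called from inside a running event loop, so code that is already asynchronous would await synthesize() directly.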

async stream(text: str, lang: str = 'en-US') AsyncGenerator[bytes, None]

Generate speech from text as a stream.

Parameters:
  • text (str) – The text to synthesize.

  • lang (str) – The language code of the text. e.g. “en-US”, “fr-FR”.

Raises:
  • Duration – If the speech is generated successfully, it contains the duration.

  • SynthesizeError – If the speech is not generated successfully.
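
Since stream() is an async generator of bytes, it is consumed with `async for`. The following is a minimal sketch that collects all chunks into one buffer. `DemoStreamSpeaker` and `collect_audio` are hypothetical and stand in for a real speaker; the sketch does not model the Duration or SynthesizeError behavior listed above.

```python
import asyncio
from typing import AsyncGenerator


class DemoStreamSpeaker:
    """Hypothetical speaker whose stream() yields 4-byte chunks of the text."""

    async def stream(self, text: str, lang: str = "en-US") -> AsyncGenerator[bytes, None]:
        data = text.encode("utf-8")
        for start in range(0, len(data), 4):
            yield data[start:start + 4]


async def collect_audio(speaker, text: str, lang: str = "en-US") -> bytes:
    """Concatenate every chunk yielded by speaker.stream()."""
    buffer = bytearray()
    async for chunk in speaker.stream(text, lang=lang):
        buffer.extend(chunk)
    return bytes(buffer)


audio = asyncio.run(collect_audio(DemoStreamSpeaker(), "streaming"))
```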

async synthesize(text: str, out_file: str | Path, lang: str = 'en-US') float

Generate speech from text and save it to a file.

Parameters:
  • text (str) – The text to synthesize.

  • out_file (Path) – The file to save the speech to.

  • lang (str) – The language code of the text. e.g. “en-US”, “fr-FR”.

Returns:

The duration of the speech in seconds.

Return type:

float
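
Because synthesize() is a coroutine that returns a duration, several clips can be generated concurrently and their durations summed. The sketch below assumes a speaker object with the documented synthesize() signature; `DemoFileSpeaker`, `synthesize_batch`, and the `clip_N.bin` file names are hypothetical, and the fake duration is only there to make the example runnable.

```python
import asyncio
import tempfile
from pathlib import Path
from typing import List


class DemoFileSpeaker:
    """Hypothetical speaker: writes placeholder bytes, returns a fake duration."""

    async def synthesize(self, text: str, out_file, lang: str = "en-US") -> float:
        Path(out_file).write_bytes(text.encode("utf-8"))
        return float(len(text))


async def synthesize_batch(speaker, texts: List[str], out_dir: Path) -> float:
    """Synthesize all texts concurrently and return the total duration."""
    tasks = [
        speaker.synthesize(text, out_dir / f"clip_{index}.bin")
        for index, text in enumerate(texts)
    ]
    durations = await asyncio.gather(*tasks)
    return sum(durations)


with tempfile.TemporaryDirectory() as tmp:
    total = asyncio.run(synthesize_batch(DemoFileSpeaker(), ["ab", "cde"], Path(tmp)))
    files = sorted(path.name for path in Path(tmp).iterdir())
```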

Baidu

class tetos.baidu.BaiduSpeaker(api_key: str, secret_key: str, voice: str | None = None, speed: int = 5, pitch: int = 5, volume: int = 5)

Baidu TTS speaker.

Parameters:
  • api_key (str) – The Baidu API key.

  • secret_key (str) – The Baidu secret key.

  • voice (str) – The voice to use.

  • speed (int) – The speed of speech, from 0 to 15. Defaults to 5.

  • pitch (int) – The pitch of speech, from 0 to 15. Defaults to 5.

  • volume (int) – The volume of speech, from 0 to 9 (basic) or 0 to 15 (high quality). Defaults to 5.
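
The ranges above can be checked before constructing a speaker. This is a minimal sketch: `check_baidu_params` and its `high_quality` flag are hypothetical (the flag is an assumed way to pick between the basic and high-quality volume limits), and nothing here claims the library performs this validation itself.

```python
def check_baidu_params(
    speed: int = 5, pitch: int = 5, volume: int = 5, high_quality: bool = False
) -> None:
    """Raise ValueError if a value falls outside the documented ranges."""
    # Volume tops out at 9 for basic voices and 15 for high-quality ones.
    volume_max = 15 if high_quality else 9
    limits = {
        "speed": (speed, 15),
        "pitch": (pitch, 15),
        "volume": (volume, volume_max),
    }
    for name, (value, upper) in limits.items():
        if not 0 <= value <= upper:
            raise ValueError(f"{name} must be in [0, {upper}], got {value}")


check_baidu_params()  # the documented defaults pass

try:
    check_baidu_params(volume=12)
    basic_ok = True
except ValueError:
    basic_ok = False
```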

classmethod get_command() Command

Return a Click command for the speaker.

Returns:

The Click command.

Return type:

click.Command

classmethod list_voices() list[str]

List the available voices for the speaker.

Returns:

A list of voice names.

Return type:

list[str]

say(text: str, out_file: str | Path | None = None, lang: str = 'en-US') float

A synchronous version of synthesize()

Parameters:
  • text (str) – The text to synthesize.

  • out_file (Path) – The file to save the speech to.

  • lang (str) – The language code of the text. e.g. “en-US”, “fr-FR”.

async stream(text: str, lang: str = 'en-US') AsyncGenerator[bytes, None]

Generate speech from text as a stream.

Parameters:
  • text (str) – The text to synthesize.

  • lang (str) – The language code of the text. e.g. “en-US”, “fr-FR”.

Raises:
  • Duration – If the speech is generated successfully, it contains the duration.

  • SynthesizeError – If the speech is not generated successfully.

async synthesize(text: str, out_file: str | Path, lang: str = 'en-US') float

Generate speech from text and save it to a file.

Parameters:
  • text (str) – The text to synthesize.

  • out_file (Path) – The file to save the speech to.

  • lang (str) – The language code of the text. e.g. “en-US”, “fr-FR”.

Returns:

The duration of the speech in seconds.

Return type:

float

Edge

class tetos.edge.EdgeSpeaker(voice: str | None = None, rate: str = '+0%', pitch: str = '+0Hz', volume: str = '+0%')

Edge TTS speaker.

Parameters:
  • voice (str) – The voice to use.

  • rate (str) – The rate of speech.

  • pitch (str) – The pitch of speech.

  • volume (str) – The volume of speech.
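
The defaults in the signature ('+0%', '+0Hz') suggest that these values are signed offsets followed by a unit. Assuming other values follow the same shape, a small formatter avoids dropping the explicit sign. `signed_offset` is a hypothetical helper, not part of tetos.

```python
def signed_offset(value: int, unit: str) -> str:
    """Format an integer offset with an explicit sign and a unit suffix."""
    return f"{value:+d}{unit}"


rate = signed_offset(0, "%")           # same shape as the default rate
faster = signed_offset(25, "%")        # assumed: 25% faster than normal
lower_pitch = signed_offset(-10, "Hz")  # assumed: 10 Hz below normal
```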

classmethod get_command() Command

Return a Click command for the speaker.

Returns:

The Click command.

Return type:

click.Command

classmethod list_voices() list[str]

List the available voices for the speaker.

Returns:

A list of voice names.

Return type:

list[str]

say(text: str, out_file: str | Path | None = None, lang: str = 'en-US') float

A synchronous version of synthesize()

Parameters:
  • text (str) – The text to synthesize.

  • out_file (Path) – The file to save the speech to.

  • lang (str) – The language code of the text. e.g. “en-US”, “fr-FR”.

async stream(text: str, lang: str = 'en-US') AsyncGenerator[bytes, None]

Generate speech from text as a stream.

Parameters:
  • text (str) – The text to synthesize.

  • lang (str) – The language code of the text. e.g. “en-US”, “fr-FR”.

Raises:
  • Duration – If the speech is generated successfully, it contains the duration.

  • SynthesizeError – If the speech is not generated successfully.

async synthesize(text: str, out_file: str | Path, lang: str = 'en-US') float

Generate speech from text and save it to a file.

Parameters:
  • text (str) – The text to synthesize.

  • out_file (Path) – The file to save the speech to.

  • lang (str) – The language code of the text. e.g. “en-US”, “fr-FR”.

Returns:

The duration of the speech in seconds.

Return type:

float

Google

Synthesizes speech from the input string of text.

class tetos.google.GoogleSpeaker(*, voice: str | None = None, speaking_rate: float = 1.0, pitch: float = 0.0, volume_gain_db: float = 0.0)

Google TTS speaker.

Parameters:
  • voice (str) – The voice to use. Defaults to “en-US-Studio-M”.

  • speaking_rate (float) – Optional. Input only. Speaking rate/speed, in the range [0.25, 4.0]. 1.0 is the normal native speed supported by the specific voice. 2.0 is twice as fast, and 0.5 is half as fast. If unset (0.0), defaults to the native 1.0 speed. Any other value below 0.25 or above 4.0 returns an error.

  • pitch (float) – Optional. Input only. Speaking pitch, in the range [-20.0, 20.0]. 20 means increase 20 semitones from the original pitch. -20 means decrease 20 semitones from the original pitch.

  • volume_gain_db (float) – Optional. Input only. Volume gain (in dB) of the normal native volume supported by the specific voice, in the range [-96.0, 16.0]. If unset, or set to a value of 0.0 (dB), will play at normal native signal amplitude. A value of -6.0 (dB) will play at approximately half the amplitude of the normal native signal amplitude. A value of +6.0 (dB) will play at approximately twice the amplitude of the normal native signal amplitude. Exceeding +10 (dB) is strongly discouraged, as there is usually no effective increase in loudness for any value greater than that.
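
The "half" and "twice" figures above follow from the standard decibel relation for amplitude, ratio = 10^(dB/20), and a semitone shift corresponds to a frequency ratio of 2^(n/12) in equal temperament. The sketch below works through both; the function names are hypothetical helpers for illustration.

```python
def gain_db_to_amplitude_ratio(gain_db: float) -> float:
    """Convert a volume gain in dB to a linear amplitude ratio."""
    if not -96.0 <= gain_db <= 16.0:
        raise ValueError("volume_gain_db must be in [-96.0, 16.0]")
    return 10 ** (gain_db / 20)


def semitones_to_frequency_ratio(semitones: float) -> float:
    """Convert a pitch shift in semitones to a frequency ratio."""
    if not -20.0 <= semitones <= 20.0:
        raise ValueError("pitch must be in [-20.0, 20.0]")
    return 2 ** (semitones / 12)


half = gain_db_to_amplitude_ratio(-6.0)      # roughly 0.5
double = gain_db_to_amplitude_ratio(6.0)     # roughly 2.0
octave_up = semitones_to_frequency_ratio(12)  # exactly one octave
```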

classmethod get_command() Command

Return a Click command for the speaker.

Returns:

The Click command.

Return type:

click.Command

classmethod list_voices() list[str]

List the available voices for the speaker.

Returns:

A list of voice names.

Return type:

list[str]

say(text: str, out_file: str | Path | None = None, lang: str = 'en-US') float

A synchronous version of synthesize()

Parameters:
  • text (str) – The text to synthesize.

  • out_file (Path) – The file to save the speech to.

  • lang (str) – The language code of the text. e.g. “en-US”, “fr-FR”.

async stream(text: str, lang: str = 'en-US') AsyncGenerator[bytes, None]

Generate speech from text as a stream.

Parameters:
  • text (str) – The text to synthesize.

  • lang (str) – The language code of the text. e.g. “en-US”, “fr-FR”.

Raises:
  • Duration – If the speech is generated successfully, it contains the duration.

  • SynthesizeError – If the speech is not generated successfully.

async synthesize(text: str, out_file: str | Path, lang: str = 'en-US') float

Generate speech from text and save it to a file.

Parameters:
  • text (str) – The text to synthesize.

  • out_file (Path) – The file to save the speech to.

  • lang (str) – The language code of the text. e.g. “en-US”, “fr-FR”.

Returns:

The duration of the speech in seconds.

Return type:

float

OpenAI

class tetos.openai.OpenAISpeaker(*, model: str = 'tts-1', voice: str | None = None, speed: float | None = None, api_key: str | None, api_base: str | None)

OpenAI TTS speaker.

Parameters:
  • model (str) – The model to use. Defaults to “tts-1”.

  • voice (str) – The voice to use. Defaults to “alloy”.

  • speed (float, optional) – The speed of the speech.

  • api_key (str, optional) – The OpenAI API key.

  • api_base (str, optional) – The OpenAI API base URL.

classmethod get_command() Command

Return a Click command for the speaker.

Returns:

The Click command.

Return type:

click.Command

classmethod list_voices() list[str]

List the available voices for the speaker.

Returns:

A list of voice names.

Return type:

list[str]

say(text: str, out_file: str | Path | None = None, lang: str = 'en-US') float

A synchronous version of synthesize()

Parameters:
  • text (str) – The text to synthesize.

  • out_file (Path) – The file to save the speech to.

  • lang (str) – The language code of the text. e.g. “en-US”, “fr-FR”.

async stream(text: str, lang: str = 'en-US') AsyncGenerator[bytes, None]

Generate speech from text as a stream.

Parameters:
  • text (str) – The text to synthesize.

  • lang (str) – The language code of the text. e.g. “en-US”, “fr-FR”.

Raises:
  • Duration – If the speech is generated successfully, it contains the duration.

  • SynthesizeError – If the speech is not generated successfully.

async synthesize(text: str, out_file: str | Path, lang: str = 'en-US') float

Generate speech from text and save it to a file.

Parameters:
  • text (str) – The text to synthesize.

  • out_file (Path) – The file to save the speech to.

  • lang (str) – The language code of the text. e.g. “en-US”, “fr-FR”.

Returns:

The duration of the speech in seconds.

Return type:

float

Volcengine

class tetos.volc.VolcSpeaker(access_key: str, secret_key: str, app_key: str, *, voice: str | None = None, sample_rate: int = 24000, speech_rate: int = 0, pitch_rate: int = 0)

Volcengine TTS speaker.

Parameters:
  • access_key (str) – The access key ID.

  • secret_key (str) – The access secret key.

  • app_key (str) – The app key.

  • voice (str, optional) – The voice to use.

  • sample_rate (int, optional) – The sample rate. Available values: 8000, 16000, 22050, 24000, 32000, 44100, 48000. Defaults to 24000.

  • speech_rate (int, optional) – The speech rate. It should be in range [-50,100]. 100 means 2x speed and -50 means half speed. Defaults to 0.

  • pitch_rate (int, optional) – The pitch rate. It should be in range [-12,12]. Defaults to 0.
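
The two documented points for speech_rate (100 is 2x, -50 is half speed) are consistent with a linear mapping, multiplier = 1 + speech_rate / 100, which also gives 1x at the default of 0. The sketch below assumes that linear mapping holds across the whole range; this is an inference from the description, not something the source confirms, and the function name is hypothetical.

```python
def speech_rate_to_multiplier(speech_rate: int) -> float:
    """Map a speech_rate in [-50, 100] to a speed multiplier (assumed linear)."""
    if not -50 <= speech_rate <= 100:
        raise ValueError("speech_rate must be in [-50, 100]")
    return 1.0 + speech_rate / 100


double_speed = speech_rate_to_multiplier(100)
half_speed = speech_rate_to_multiplier(-50)
normal_speed = speech_rate_to_multiplier(0)
```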

classmethod get_command() Command

Return a Click command for the speaker.

Returns:

The Click command.

Return type:

click.Command

classmethod list_voices() list[str]

List the available voices for the speaker.

Returns:

A list of voice names.

Return type:

list[str]

say(text: str, out_file: str | Path | None = None, lang: str = 'en-US') float

A synchronous version of synthesize()

Parameters:
  • text (str) – The text to synthesize.

  • out_file (Path) – The file to save the speech to.

  • lang (str) – The language code of the text. e.g. “en-US”, “fr-FR”.

async stream(text: str, lang: str = 'en-US') AsyncGenerator[bytes, None]

Generate speech from text as a stream.

Parameters:
  • text (str) – The text to synthesize.

  • lang (str) – The language code of the text. e.g. “en-US”, “fr-FR”.

Raises:
  • Duration – If the speech is generated successfully, it contains the duration.

  • SynthesizeError – If the speech is not generated successfully.

async synthesize(text: str, out_file: str | Path, lang: str = 'en-US') float

Generate speech from text and save it to a file.

Parameters:
  • text (str) – The text to synthesize.

  • out_file (Path) – The file to save the speech to.

  • lang (str) – The language code of the text. e.g. “en-US”, “fr-FR”.

Returns:

The duration of the speech in seconds.

Return type:

float

MiniMax

class tetos.minimax.MinimaxSpeaker(api_key: str, group_id: str, model: str = 'speech-01', voice: str | None = None, timber_weights: list[TimberWeight] | None = None, speed: float | None = None, vol: float | int | None = None, pitch: int | None = None)

MiniMax TTS speaker.

Parameters:
  • api_key (str) – The MiniMax API key.

  • group_id (str) – The MiniMax group ID.

  • model (str) – The model to use. Defaults to “speech-01”.

  • voice (str) – The voice to use.

  • timber_weights (list[TimberWeight]) – The timbre weights used to mix voices.

  • speed (float) – The speed of speech. Range [0.5, 2.0]. Defaults to 1.0.

  • vol (float | int) – The volume of speech. Range (0, 10]. Defaults to 1.

  • pitch (int) – The pitch of speech. Range [-12, 12]. Defaults to 0.

classmethod get_command() Command

Return a Click command for the speaker.

Returns:

The Click command.

Return type:

click.Command

classmethod list_voices() list[str]

List the available voices for the speaker.

Returns:

A list of voice names.

Return type:

list[str]

say(text: str, out_file: str | Path | None = None, lang: str = 'en-US') float

A synchronous version of synthesize()

Parameters:
  • text (str) – The text to synthesize.

  • out_file (Path) – The file to save the speech to.

  • lang (str) – The language code of the text. e.g. “en-US”, “fr-FR”.

async stream(text: str, lang: str = 'en-US') AsyncGenerator[bytes, None]

Generate speech from text as a stream.

Parameters:
  • text (str) – The text to synthesize.

  • lang (str) – The language code of the text. e.g. “en-US”, “fr-FR”.

Raises:
  • Duration – If the speech is generated successfully, it contains the duration.

  • SynthesizeError – If the speech is not generated successfully.

async synthesize(text: str, out_file: str | Path, lang: str = 'en-US') float

Generate speech from text and save it to a file.

Parameters:
  • text (str) – The text to synthesize.

  • out_file (Path) – The file to save the speech to.

  • lang (str) – The language code of the text. e.g. “en-US”, “fr-FR”.

Returns:

The duration of the speech in seconds.

Return type:

float

class tetos.minimax.TimberWeight
voice_id: str

The system voice (timbre) ID.

weight: int

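Range [1, 100]. Up to 4 voices can be mixed. The value must be an integer; the higher a single voice's share of the total weight, the more the synthesized voice resembles it.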
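
The constraints on a voice mix (at most 4 voices, integer weights in [1, 100]) can be checked when building the list. This is a minimal sketch that uses plain dictionaries with the `voice_id` and `weight` fields shown above; whether TimberWeight accepts plain dictionaries is an assumption, and `build_timber_weights` and the voice IDs are hypothetical.

```python
from typing import Dict, List, Union


def build_timber_weights(mix: Dict[str, int]) -> List[Dict[str, Union[str, int]]]:
    """Validate a voice mix and return it as a list of weight entries."""
    if not 1 <= len(mix) <= 4:
        raise ValueError("a mix must contain between 1 and 4 voices")
    for voice_id, weight in mix.items():
        if not isinstance(weight, int) or not 1 <= weight <= 100:
            raise ValueError(f"weight for {voice_id!r} must be an integer in [1, 100]")
    return [{"voice_id": voice_id, "weight": weight} for voice_id, weight in mix.items()]


# Hypothetical voice IDs; the first voice dominates the mix.
weights = build_timber_weights({"voice-a": 70, "voice-b": 30})

try:
    build_timber_weights({"voice-a": 0})
    zero_ok = True
except ValueError:
    zero_ok = False
```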