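API Reference

This reference manual details the modules, functions, and variables included in Manim Voiceover, describing what they are and what they do. To learn how to use Manim Voiceover, see the Quickstart.

Voiceover scene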

class VoiceoverScene(renderer=None, camera_class=<class 'manim.camera.camera.Camera'>, always_update_mobjects=False, random_seed=None, skip_animations=False)

Bases: Scene

A scene class that can be used to add voiceover to a scene.

add_voiceover_text(text, subcaption=None, max_subcaption_len=70, subcaption_buff=0.1, **kwargs)

Adds a voiceover to the scene.

Parameters:

text (str) – The text to be spoken.
subcaption (Optional[str], optional) – Alternative subcaption text. If not specified, text is used as the subcaption. Defaults to None.
max_subcaption_len (int, optional) – Maximum number of characters in a subcaption. Longer subcaptions are split into chunks shorter than max_subcaption_len. Defaults to 70.
subcaption_buff (float, optional) – The duration between split subcaption chunks, in seconds. Defaults to 0.1.

Returns:

The tracker object for the voiceover.

Return type:

VoiceoverTracker

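add_wrapped_subcaption(subcaption, duration, subcaption_buff=0.1, max_subcaption_len=70)

Adds a subcaption to the scene. If the subcaption is longer than max_subcaption_len, it is split into chunks shorter than max_subcaption_len.

Parameters:

subcaption (str) – The subcaption text.
duration (float) – The duration of the subcaption, in seconds.
max_subcaption_len (int, optional) – Maximum number of characters in a subcaption. Longer subcaptions are split into chunks shorter than max_subcaption_len. Defaults to 70.
subcaption_buff (float, optional) – The duration between split subcaption chunks, in seconds. Defaults to 0.1.

Return type:

None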
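
The splitting described for add_wrapped_subcaption can be pictured with a small standalone sketch. It is not the library's implementation: the helper name wrap_subcaption, the use of textwrap, and the rule that each chunk receives a share of the duration proportional to its length are assumptions made for illustration. Chunks here are at most max_subcaption_len characters long.

```python
import textwrap


def wrap_subcaption(subcaption, duration, subcaption_buff=0.1, max_subcaption_len=70):
    """Return a list of (text, offset, duration) tuples, one per chunk.

    Hypothetical helper: it only computes a schedule and does not touch a scene.
    """
    text = " ".join(subcaption.split())  # collapse runs of whitespace
    chunks = textwrap.wrap(text, width=max_subcaption_len)
    if not chunks:
        return []
    total_chars = sum(len(chunk) for chunk in chunks)
    schedule = []
    offset = 0.0
    for chunk in chunks:
        # Assumed rule: time is shared in proportion to chunk length.
        share = duration * len(chunk) / total_chars
        # Leave a gap of subcaption_buff seconds before the next chunk.
        schedule.append((chunk, offset, max(share - subcaption_buff, 0.0)))
        offset += share
    return schedule


schedule = wrap_subcaption("word " * 40, duration=6.0)
for text, offset, chunk_duration in schedule:
    print(f"{offset:5.2f}s +{chunk_duration:4.2f}s  {text[:30]}...")
```

Each tuple could then be passed to a subcaption call on the scene, with the offset measured from the start of the voiceover.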

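safe_wait(duration)

Waits for the given duration. If the duration is shorter than one frame, it waits for one frame.

Parameters: duration (float) – The duration to wait for, in seconds.
Return type: None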
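
The one-frame minimum described for safe_wait amounts to clamping the requested duration. The sketch below follows the documented behavior only and is not the library's source. The function name and the explicit frame_rate argument are hypothetical; Manim itself takes the frame rate from its configuration.

```python
def clamp_to_one_frame(duration: float, frame_rate: float = 60.0) -> float:
    """Return the wait time after applying the one-frame minimum."""
    one_frame = 1.0 / frame_rate  # length of a single frame in seconds
    return max(duration, one_frame)


short_wait = clamp_to_one_frame(0.001, frame_rate=60.0)  # shorter than a frame
long_wait = clamp_to_one_frame(2.0, frame_rate=60.0)     # left unchanged
```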

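set_speech_service(speech_service, create_subcaption=True)

Sets the speech service to be used for the voiceover. This method should be called before adding any voiceover to the scene.

Parameters:

speech_service (SpeechService) – The speech service to be used.
create_subcaption (bool, optional) – Whether to create subcaptions for the scene. Defaults to True. If config.save_last_frame is True, this argument is ignored and no subcaptions are created.

Return type:

None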

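voiceover(text=None, ssml=None, **kwargs)

The main function for adding a voiceover to a scene.

Parameters:

text (str, optional) – The text to be spoken. Defaults to None.
ssml (str, optional) – The SSML to be spoken. Defaults to None.

Yields:

Generator[VoiceoverTracker, None, None] – The voiceover tracker object.

Return type: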

Generator[VoiceoverTracker, None, None]
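
The generator return type reflects a context-manager pattern: the tracker is yielded to the body of a with block, and the scene waits for the narration once the block ends. The toy model below illustrates that flow with the standard library only. ToyScene, ToyTracker, the fake clock, and the fixed time-per-word duration model are all hypothetical stand-ins, not Manim Voiceover classes.

```python
from contextlib import contextmanager


class ToyTracker:
    """Hypothetical stand-in for VoiceoverTracker."""

    def __init__(self, start_t, duration):
        self.start_t = start_t
        self.duration = duration
        self.end_t = start_t + duration


class ToyScene:
    """Hypothetical stand-in for VoiceoverScene with a fake clock."""

    def __init__(self):
        self.time = 0.0
        self.tracker = None

    def play(self, run_time):
        # Pretend to play an animation by advancing the clock.
        self.time += run_time

    def wait_for_voiceover(self):
        # Advance the clock to the end of the current voiceover if needed.
        self.time = max(self.time, self.tracker.end_t)

    @contextmanager
    def voiceover(self, text, seconds_per_word=0.4):
        # Assumed duration model: a fixed amount of time per word.
        duration = seconds_per_word * len(text.split())
        self.tracker = ToyTracker(start_t=self.time, duration=duration)
        yield self.tracker         # the body of the `with` block runs here
        self.wait_for_voiceover()  # then the scene catches up with the audio


scene = ToyScene()
with scene.voiceover("This circle is drawn as I speak") as tracker:
    scene.play(run_time=1.0)  # animation is shorter than the narration
```

In a real scene the same shape appears as `with self.voiceover(text=...) as tracker:` inside construct(), with animations placed in the block.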

wait_for_voiceover()

Waits for the voiceover to finish.

Return type: None

wait_until_bookmark(mark)

Waits until a bookmark is reached.

Parameters: mark (str) – The mark attribute of the bookmark to wait for.
Return type: None
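
Bookmarks are placed inside the voiceover text. The sketch below assumes the `<bookmark mark='A'/>` tag syntax used in Manim Voiceover's bookmark documentation, which is not shown in this section, and separates the tags from the spoken text. The helper name split_bookmarks is hypothetical, and the step that maps character offsets to times is deliberately left out because it depends on the speech service.

```python
import re

# Matches tags such as <bookmark mark='A'/> or <bookmark mark="A" />.
BOOKMARK = re.compile(r"<bookmark\s+mark\s*=\s*['\"](\w+)['\"]\s*/>")


def split_bookmarks(text):
    """Return (plain_text, offsets), where offsets maps each mark to its
    character position in plain_text."""
    plain_parts = []
    offsets = {}
    pos = 0     # current position in the original text
    length = 0  # length of the plain text collected so far
    for match in BOOKMARK.finditer(text):
        part = text[pos:match.start()]
        plain_parts.append(part)
        length += len(part)
        offsets[match.group(1)] = length
        pos = match.end()
    plain_parts.append(text[pos:])
    return "".join(plain_parts), offsets


plain, offsets = split_bookmarks("Move left, <bookmark mark='A'/>then right.")
```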

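class VoiceoverTracker(scene, data, cache_dir)

Bases: object

Class to track the progress of a voiceover in a scene.

Initializes a VoiceoverTracker object.

Parameters:

scene (Scene) – The scene to which the voiceover belongs.
path (str) – The path to the JSON file containing the voiceover data.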
data (dict) –
cache_dir (str) –

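get_remaining_duration(buff=0.0)

Returns the remaining duration of the voiceover.

Parameters: buff (float, optional) – A buffer to add to the remaining duration. Defaults to 0.
Returns: The remaining duration of the voiceover, in seconds.
Return type: int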

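time_until_bookmark(mark, buff=0, limit=None)

Returns the time remaining until a bookmark.

Parameters:

mark (str) – The mark attribute of the bookmark to count up to.
buff (int, optional) – A buffer to add to the remaining duration, in seconds. Defaults to 0.
limit (Optional[int], optional) – The maximum value to return. Defaults to None.

Return type: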

int
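
The two timing queries can be sketched against an explicit clock. This is an illustration under stated assumptions rather than the library's code: ToyTracker, its now attribute (a stand-in for the scene's current time), and the choice to clamp results at zero are all hypothetical.

```python
class ToyTracker:
    """Hypothetical stand-in for VoiceoverTracker driven by an explicit clock."""

    def __init__(self, start_t, duration, bookmark_times):
        self.start_t = start_t
        self.end_t = start_t + duration
        self.bookmark_times = bookmark_times  # mark -> absolute scene time
        self.now = start_t                    # stand-in for the scene's clock

    def get_remaining_duration(self, buff=0.0):
        # Time left until the voiceover ends, never negative.
        return max(self.end_t - self.now + buff, 0)

    def time_until_bookmark(self, mark, buff=0, limit=None):
        if mark not in self.bookmark_times:
            raise KeyError(f"no bookmark with mark={mark!r}")
        result = max(self.bookmark_times[mark] - self.now + buff, 0)
        if limit is not None:
            result = min(limit, result)  # cap the returned value
        return result


tracker = ToyTracker(start_t=10.0, duration=4.0, bookmark_times={"A": 11.5})
tracker.now = 11.0
remaining = tracker.get_remaining_duration()
until_a = tracker.time_until_bookmark("A")
capped = tracker.time_until_bookmark("A", limit=0.2)
```

A typical use of such values is as the run_time of an animation, so that it finishes exactly when the bookmark or the end of the narration is reached.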

Speech services

class SpeechService(global_speed=1.0, cache_dir=None, transcription_model=None, transcription_kwargs={}, **kwargs)

Bases: ABC

Abstract base class for a speech service.

Parameters:

global_speed (float, optional) – The speed at which to play the audio. Defaults to 1.00.
cache_dir (str, optional) – The directory in which to save the audio files. Defaults to voiceovers/.
transcription_model (str, optional) – The OpenAI Whisper model to use for transcription. Defaults to None.
transcription_kwargs (dict, optional) – Keyword arguments to pass to the transcribe() function. Defaults to {}.

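audio_callback(audio_path, data, **kwargs)

Callback function invoked when the audio file is ready. Override this method to process the audio file, for example to apply noise reduction.

Parameters:

audio_path (str) – The path to the audio file.
data (dict) – The data dictionary.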

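abstract generate_from_text(text, cache_dir=None, path=None)

Implement this method for each speech service. See AzureService for an example.

Parameters:

text (str) – The text to synthesize speech from.
cache_dir (str, optional) – The output directory in which to save the audio file and data. Defaults to None.
path (str, optional) – The path at which to save the audio file. Defaults to None.

Returns:

Output data dictionary. TODO: Define the format.

Return type: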

dict
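
The subclassing contract can be illustrated with a toy model built on the standard library. ToySpeechService and SilentService are hypothetical and do not import Manim Voiceover. The dictionary keys are illustrative only, since the output format is not specified above, and the hash-based file name is an assumption.

```python
import hashlib
import tempfile
from abc import ABC, abstractmethod
from pathlib import Path


class ToySpeechService(ABC):
    """Hypothetical stand-in for SpeechService; not the library class."""

    def __init__(self, cache_dir="voiceovers"):
        self.cache_dir = cache_dir

    @abstractmethod
    def generate_from_text(self, text, cache_dir=None, path=None):
        """Synthesize `text` and return a data dictionary."""


class SilentService(ToySpeechService):
    """Writes an empty placeholder file instead of calling a real TTS API."""

    def generate_from_text(self, text, cache_dir=None, path=None):
        cache_dir = Path(cache_dir or self.cache_dir)
        cache_dir.mkdir(parents=True, exist_ok=True)
        if path is None:
            # Derive a stable file name from the input text (assumed scheme).
            digest = hashlib.sha256(text.encode("utf-8")).hexdigest()[:16]
            path = f"{digest}.mp3"
        (cache_dir / path).write_bytes(b"")  # placeholder for audio bytes
        # Illustrative keys; the real format is not defined in this reference.
        return {"input_text": text, "original_audio": path}


with tempfile.TemporaryDirectory() as tmp:
    data = SilentService(cache_dir=tmp).generate_from_text("Hello")
    audio_exists = (Path(tmp) / data["original_audio"]).exists()
```

Because generate_from_text is abstract, the base class cannot be instantiated directly; only subclasses that implement it can.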

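set_transcription(model=None, kwargs={})

Sets the transcription model and the keyword arguments to pass to the transcribe() function.

Parameters:

model (str, optional) – The Whisper model to use for transcription. Defaults to None.
kwargs (dict, optional) – Keyword arguments to pass to the transcribe() function. Defaults to {}.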

class RecorderService(format=None, channels=1, rate=44100, chunk=512, device_index=None, transcription_model='base', trim_silence_threshold=-40.0, trim_buffer_start=200, trim_buffer_end=200, callback_delay=0.05, **kwargs)

Bases: SpeechService

Speech service that records from a microphone during rendering.

Initializes the speech service.

Parameters:

format (int, optional) – Audio format. Defaults to pyaudio.paInt16.
channels (int, optional) – Number of channels. Defaults to 1.
rate (int, optional) – Sampling rate. Defaults to 44100.
chunk (int, optional) – Chunk size. Defaults to 512.
device_index (int, optional) – Device index, if you do not want to choose it on every render. Defaults to None.
transcription_model (str, optional) – The OpenAI Whisper model to use for transcription. Defaults to "base".
trim_silence_threshold (float, optional) – Threshold for trimming silence, in decibels. Defaults to -40.0 dB.
trim_buffer_start (int, optional) – Buffer duration kept when trimming silence at the start. Defaults to 200 ms.
trim_buffer_end (int, optional) – Buffer duration kept when trimming silence at the end. Defaults to 200 ms.
callback_delay (float) –
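
How the three trim parameters interact can be sketched on a plain list of samples. This is a simplified illustration, not the method RecorderService uses: the function name is hypothetical, samples are assumed to be floats in [-1.0, 1.0], and loudness is checked per sample rather than over windows as audio tools usually do.

```python
def trim_silence(samples, rate, threshold_db=-40.0,
                 buffer_start_ms=200, buffer_end_ms=200):
    """Drop leading and trailing samples quieter than threshold_db (dBFS),
    keeping a short buffer of silence on each side."""
    threshold = 10 ** (threshold_db / 20.0)  # dBFS -> linear amplitude
    loud = [i for i, s in enumerate(samples) if abs(s) >= threshold]
    if not loud:
        return []  # nothing above the threshold
    # Convert the buffers from milliseconds to sample counts.
    start = max(loud[0] - rate * buffer_start_ms // 1000, 0)
    end = min(loud[-1] + 1 + rate * buffer_end_ms // 1000, len(samples))
    return samples[start:end]


rate = 1000  # samples per second, kept small for readability
recording = [0.0] * 500 + [0.5] * 100 + [0.0] * 500
trimmed = trim_silence(recording, rate)
```

With the defaults, 200 ms of silence is kept before and after the detected speech, so the result is shorter than the recording but does not clip the first or last word.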

class AzureService(voice='en-US-AriaNeural', style=None, output_format='Audio48Khz192KBitRateMonoMp3', prosody=None, **kwargs)

Bases: SpeechService

Speech service for the Azure TTS API.

Parameters:

voice (str, optional) – The voice to use. See the API page for all available options. Defaults to en-US-AriaNeural.
style (str, optional) – The style to use. See the API page for how to find the styles available for a given voice. Defaults to None.
output_format (str, optional) – The output format to use. See the API page for all available options. Defaults to Audio48Khz192KBitRateMonoMp3.
prosody (dict, optional) – Global prosody settings to use. See the API page for all available options. Defaults to None.
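
To make the voice, style, and prosody parameters concrete, the sketch below builds an SSML document of the kind Azure's text-to-speech service accepts. Whether AzureService assembles its request this way is an assumption. build_ssml is a hypothetical helper, the prosody dictionary is assumed to map SSML prosody attribute names such as rate or pitch to values, and the language is hard-coded for brevity.

```python
from xml.sax.saxutils import escape, quoteattr


def build_ssml(text, voice="en-US-AriaNeural", style=None, prosody=None):
    """Wrap `text` in SSML elements for the given voice, style, and prosody."""
    inner = escape(text)  # escape &, < and > in the spoken text
    if prosody:
        attrs = " ".join(
            f"{key}={quoteattr(str(value))}" for key, value in prosody.items()
        )
        inner = f"<prosody {attrs}>{inner}</prosody>"
    if style:
        inner = (
            f"<mstts:express-as style={quoteattr(style)}>"
            f"{inner}</mstts:express-as>"
        )
    return (
        '<speak version="1.0" xmlns="http://www.w3.org/2001/10/synthesis" '
        'xmlns:mstts="https://www.w3.org/2001/mstts" xml:lang="en-US">'
        f"<voice name={quoteattr(voice)}>{inner}</voice></speak>"
    )


ssml = build_ssml("Hi & bye", style="cheerful", prosody={"rate": "+10%"})
```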

class CoquiService(model_name='tts_models/en/ljspeech/tacotron2-DDC', config_path=None, vocoder_path=None, vocoder_config_path=None, progress_bar=True, gpu=False, speaker_idx=0, language_idx=0, **kwargs)

Bases: SpeechService

Speech service for Coqui TTS. Default model: tts_models/en/ljspeech/tacotron2-DDC.

Parameters:

global_speed (float, optional) – The speed at which to play the audio. Defaults to 1.00.
cache_dir (str, optional) – The directory in which to save the audio files. Defaults to voiceovers/.
transcription_model (str, optional) – The OpenAI Whisper model to use for transcription. Defaults to None.
transcription_kwargs (dict, optional) – Keyword arguments to pass to the transcribe() function. Defaults to {}.
model_name (str) –
config_path (str) –
vocoder_path (str) –
vocoder_config_path (str) –
progress_bar (bool) –

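generate_from_text(text, cache_dir=None, path=None, **kwargs)

Implement this method for each speech service. See AzureService for an example.

Parameters:

text (str) – The text to synthesize speech from.
cache_dir (str, optional) – The output directory in which to save the audio file and data. Defaults to None.
path (str, optional) – The path at which to save the audio file. Defaults to None.

Returns:

Output data dictionary. TODO: Define the format.

Return type: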

dict

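class GTTSService(lang='en', tld='com', **kwargs)

Bases: SpeechService

SpeechService class for Google Translate's Text-to-Speech API. This is a wrapper around the gTTS library. See the gTTS documentation for more information.

Parameters:

lang (str, optional) – The language to use for the speech. See the Google Translate documentation for all available options. Defaults to "en".
tld (str, optional) – The top-level domain of the Google Translate URL. Defaults to "com".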

class OpenAIService(voice='alloy', model='tts-1-hd', transcription_model='base', **kwargs)

Bases: SpeechService

SpeechService class for the OpenAI TTS service. See the OpenAI API page for more information about voices and models.

Parameters:

voice (str, optional) – The voice to use. See the API page for all available options. Defaults to "alloy".
model (str, optional) – The TTS model to use. See the API page for all available options. Defaults to "tts-1-hd".

class PyTTSX3Service(engine=None, **kwargs)

Bases: SpeechService

Speech service class for pyttsx3.

Defaults

DEEPL_SOURCE_LANG = {'bg': 'Bulgarian', 'cs': 'Czech', 'da': 'Danish', 'de': 'German', 'el': 'Greek', 'en': 'English', 'es': 'Spanish', 'et': 'Estonian', 'fi': 'Finnish', 'fr': 'French', 'hu': 'Hungarian', 'id': 'Indonesian', 'it': 'Italian', 'ja': 'Japanese', 'lt': 'Lithuanian', 'lv': 'Latvian', 'nl': 'Dutch', 'pl': 'Polish', 'pt': 'Portuguese (all Portuguese varieties mixed)', 'ro': 'Romanian', 'ru': 'Russian', 'sk': 'Slovak', 'sl': 'Slovenian', 'sv': 'Swedish', 'tr': 'Turkish', 'uk': 'Ukrainian', 'zh': 'Chinese'}

Available source languages for DeepL.

DEEPL_TARGET_LANG = {'bg': 'Bulgarian', 'cs': 'Czech', 'da': 'Danish', 'de': 'German', 'el': 'Greek', 'en': 'Alias for en-us', 'en-gb': 'English (British)', 'en-us': 'English (American)', 'es': 'Spanish', 'et': 'Estonian', 'fi': 'Finnish', 'fr': 'French', 'hu': 'Hungarian', 'id': 'Indonesian', 'it': 'Italian', 'ja': 'Japanese', 'lt': 'Lithuanian', 'lv': 'Latvian', 'nl': 'Dutch', 'pl': 'Polish', 'pt': 'Alias for pt-pt', 'pt-br': 'Portuguese (Brazilian)', 'pt-pt': 'Portuguese (all Portuguese varieties excluding Brazilian Portuguese)', 'ro': 'Romanian', 'ru': 'Russian', 'sk': 'Slovak', 'sl': 'Slovenian', 'sv': 'Swedish', 'tr': 'Turkish', 'uk': 'Ukrainian', 'zh': 'Chinese (simplified)'}

Available target languages for DeepL.