Seamless: Foundational Models for State-of-the-Art Speech and Text Translation (Meta)

July 2, 2024
郝彦飞

Seamless is a family of AI models that enable more natural and authentic communication across languages. SeamlessM4T is a massively multilingual and multimodal machine translation model supporting around 100 languages. It serves as the foundation for two further models: SeamlessExpressive, which preserves elements of prosody and voice style across languages, and SeamlessStreaming, which supports simultaneous translation and streaming automatic speech recognition (ASR) for around 100 languages. SeamlessExpressive and SeamlessStreaming are combined into Seamless, a unified model that offers multilingual, real-time, and expressive translation.


SeamlessM4T models support the following tasks:


  • Speech-to-speech translation (S2ST)
  • Speech-to-text translation (S2TT)
  • Text-to-speech translation (T2ST)
  • Text-to-text translation (T2TT)
  • Automatic speech recognition (ASR)
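As a rough illustration of how the five tasks relate, each one can be described by its input modality, its output modality, and whether the output language differs from the input language. The sketch below is hypothetical: `Modality`, `TaskSpec`, `TASKS`, and `pick_task` are illustrative names chosen here, not part of the seamless_communication API.

```python
from enum import Enum
from typing import Dict, NamedTuple


class Modality(Enum):
    SPEECH = "speech"
    TEXT = "text"


class TaskSpec(NamedTuple):
    source: Modality
    target: Modality
    translate: bool  # True if the output language differs from the input language


# Hypothetical table: the five SeamlessM4T tasks, keyed by their abbreviations.
TASKS: Dict[str, TaskSpec] = {
    "S2ST": TaskSpec(Modality.SPEECH, Modality.SPEECH, True),
    "S2TT": TaskSpec(Modality.SPEECH, Modality.TEXT, True),
    "T2ST": TaskSpec(Modality.TEXT, Modality.SPEECH, True),
    "T2TT": TaskSpec(Modality.TEXT, Modality.TEXT, True),
    "ASR": TaskSpec(Modality.SPEECH, Modality.TEXT, False),
}


def pick_task(source: Modality, target: Modality, translate: bool = True) -> str:
    """Return the task code matching an input/output modality pair."""
    for code, spec in TASKS.items():
        if spec == TaskSpec(source, target, translate):
            return code
    raise ValueError(
        f"unsupported combination: {source.value} -> {target.value}, translate={translate}"
    )


if __name__ == "__main__":
    print(pick_task(Modality.SPEECH, Modality.TEXT))                   # S2TT
    print(pick_task(Modality.SPEECH, Modality.TEXT, translate=False))  # ASR
```

In this view, ASR is the only task whose output stays in the source language; the four translation tasks differ only in which side is speech and which is text.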


Source code: https://github.com/facebookresearch/seamless_communication