Tango Text to Sound | www.masayume.it

TANGO text to sound

TANGO generates text-conditioned sound effects, including human speech and music.

It uses the instruction-tuned LLM FLAN-T5 as the text encoder for text-to-audio (TTA) generation. Earlier TTA work either pre-trained a joint text-audio encoder or used a model that was not instruction-tuned, such as T5. As a result, this latent diffusion model (LDM) based approach (TANGO) outperforms the state-of-the-art AudioLDM on most metrics and remains comparable on the others on the AudioCaps test set, even though its LDM was trained on a much smaller dataset.
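To make the control flow concrete, here is a minimal toy sketch of text-conditioned latent generation: the prompt is encoded once, a latent starts as Gaussian noise, and a loop repeatedly removes the "noise" predicted from the latent and the text embedding. Everything in it is an assumption made for illustration: `encode_text`, `predict_noise`, and `generate_latent` are hypothetical names, the hash-like encoder only stands in for FLAN-T5, the update rule is not a real diffusion sampler, and decoding the latent into audio is left out. It is not TANGO's code.

```python
import random


def encode_text(prompt, dim=8):
    """Toy stand-in for a frozen text encoder such as FLAN-T5 (assumption).

    Deterministically maps a prompt to a fixed-length vector.
    """
    vec = [0.0] * dim
    for i, ch in enumerate(prompt.lower()):
        vec[i % dim] += ord(ch) / 1000.0
    return vec


def predict_noise(latent, text_emb, t):
    """Toy stand-in for the denoising network (assumption).

    Treats the gap between the latent and the text embedding as noise.
    The step index t is accepted but unused in this toy version.
    """
    return [z - c for z, c in zip(latent, text_emb)]


def generate_latent(prompt, steps=50, seed=0):
    """Encode the prompt once, then iteratively denoise a random latent."""
    rng = random.Random(seed)
    text_emb = encode_text(prompt)
    # Start from pure Gaussian noise in the latent space.
    latent = [rng.gauss(0.0, 1.0) for _ in text_emb]
    for t in range(steps, 0, -1):
        eps = predict_noise(latent, text_emb, t)
        step_size = 1.0 / t  # toy schedule: the last step removes all remaining noise
        latent = [z - step_size * e for z, e in zip(latent, eps)]
    return latent


if __name__ == "__main__":
    print(generate_latent("a dog barking in the rain"))
```

In this toy version the text embedding is the only thing steering the loop, which is the role the text encoder plays in the description above; a real system would replace both stand-ins with trained networks and decode the final latent into a waveform.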

Hugging Face Space
