Predicting Expressive Speaking Style from Text in End-To-End Speech Synthesis (google.github.io)
64 points by daisystanton 11 days ago | 9 comments





Wow, these audio samples are incredible. I'm surprised to hear the model actually outputting natural-sounding breathing between and inside sentences. Most TTS systems explicitly remove things like that, but the addition of breathing makes it sound so much more natural.

The style tokens result in pretty incredible and realistic audio.


If you want to see more research samples, check out this link: https://google.github.io/tacotron/ What's especially impressive is how fast they're moving along with new ideas (see the dates). Bear in mind that the WaveNet outputs are likely to be pretty slow to generate (but they do yield remarkable quality!).

Good heavens. These give me the shivers. In some of these samples you can hear breathing, emphasis, and even what sounds like genuine emotion.

This seems like it could be great for automatically generating audiobooks. Personally, I would one day like to have a program that can read arbitrary text to me in a more or less human way; that would let me get through papers for work while driving.

You could, except you would get sued. The Kindle 2 was announced with a feature that would read the book to you, and Amazon landed in court. https://sunsteinlaw.com/read-it-aloud-and-weep-controversy-s...

> You could, except you would get sued. The Kindle 2 was announced with a feature that would read the book to you, and Amazon landed in court.

And yet here we are, 9 years later, and the Kindle apps on Android, Windows and iOS support screen reader access to books. Those screen readers can use an array of voices, undoubtedly including the speech engines used in the original TTS feature written about here.


I thought the exact same thing, except I know that if it mispronounced certain names or words over and over, it would drive me crazy and I would have to stop.

> except I know that if it mispronounced certain names or words over and over, it would drive me crazy and I would have to stop.

Most decent TTS or assistive technology systems have a pronunciation dictionary. This is a very real problem for people who use screen readers on a daily basis but luckily it's a (mostly) solved one.
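As a rough illustration of the idea, here is a minimal sketch of a user pronunciation dictionary applied as a text pre-pass before synthesis. The `LEXICON` entries and the `apply_lexicon` helper are hypothetical; real engines usually store phonetic transcriptions (for example in a W3C PLS lexicon or via SSML `<phoneme>` tags) rather than the plain-text respellings assumed here.

```python
import re

# Hypothetical user lexicon: written form -> respelling the TTS engine reads correctly.
LEXICON = {
    "nginx": "engine x",
    "SQL": "sequel",
}

def apply_lexicon(text: str, lexicon: dict[str, str]) -> str:
    """Replace whole-word, case-sensitive matches with their respellings."""
    if not lexicon:
        return text
    # Try longer keys first so multi-word entries win over their prefixes.
    keys = sorted(lexicon, key=len, reverse=True)
    pattern = re.compile(r"\b(" + "|".join(map(re.escape, keys)) + r")\b")
    return pattern.sub(lambda m: lexicon[m.group(0)], text)

print(apply_lexicon("Restart nginx after editing the SQL config.", LEXICON))
# Restart engine x after editing the sequel config.
```

The word-boundary anchors keep an entry like "SQL" from rewriting part of a longer word such as "SQLite", which is the usual failure mode of naive find-and-replace.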


Isn't that something Amazon Polly does a good job of?


