Audio can be a bit dismal on a Raspberry Pi. Once you get a configuration that works, sometimes you’re not sure how you got there and you’ll do anything to keep that arcane setup going. It’s better than it was.
Speech synthesis, or TTS, adds another layer of potential failure. One of the popular Linux TTS systems, eSpeak, hasn’t seen much development in almost a decade and seems to work only through workarounds and hand-waving.
Thankfully, there’s a fork of eSpeak that is maintained: espeak-ng. Better yet, it’s packaged with Raspberry Pi OS and can be installed quite easily:
sudo apt install espeak-ng espeak-ng-data libespeak-ng-dev
In my simple tests, it output everything I expected of it.
eSpeak had a Python module that kinda worked, but espeak-ng’s is much more ambitious, and (mostly) does what it sets out to do. You can install it like this:
sudo pip3 install py-espeak-ng
py-espeak-ng has some documentation, but getting it to work still takes some trial and error. The biggest issue that held me up was that the module needs to be initialized with a voice that espeak-ng already knows about. If you don’t specify a voice, or specify one that the system doesn’t know about, you won’t get any errors, but you won’t get any output, either.
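Since a bad voice name fails silently, it can help to check the name yourself before using it. The following is a minimal sketch, not part of py-espeak-ng: `require_voice` and `speak_checked` are hypothetical helpers, and they assume that `ESpeakNG.voices` returns a list of dicts with a `'language'` key and that `say()` accepts `sync=True`.

```python
def require_voice(voices, wanted):
    """Return `wanted` if it names a known voice, otherwise raise ValueError.

    `voices` is a list of dicts shaped like the entries of ESpeakNG.voices,
    each with a 'language' key such as 'en-gb'.
    """
    known = sorted(v["language"] for v in voices)
    if wanted not in known:
        raise ValueError(
            f"unknown espeak-ng voice {wanted!r}; known: {', '.join(known)}"
        )
    return wanted


def speak_checked(text, wanted="en-gb"):
    """Speak `text`, failing loudly if the voice is unknown (needs py-espeak-ng)."""
    # Imported here so that require_voice can be used without the module.
    from espeakng import ESpeakNG

    esng = ESpeakNG(voice="en-gb")  # start from a voice assumed to be installed
    esng.voice = require_voice(esng.voices, wanted)
    esng.say(text, sync=True)
```

Calling `speak_checked("Hello", "en-gb-scotland")` should then either speak or raise a `ValueError` that lists the voices the system does know about, instead of doing nothing.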
Here’s a small Python example that you’ll probably want to try with no-one else within earshot. It repeats the same English phrase (a favourite of elocution teachers) in every English regional voice that espeak-ng knows about. Since I’m a dictionary nerd, it outputs the phonetics too.
# -*- coding: utf-8 -*-
# an espeakng elocution lesson from scruss, 2020-07
#  I suffered this at school, now you get to as well!
# You will need to:
#  sudo apt install espeak-ng espeak-ng-data libespeak-ng-dev
#  sudo pip3 install py-espeak-ng
from espeakng import ESpeakNG
from time import sleep

# you have to initialize with a voice that exists
#  `espeak-ng --voices=en` will list English ones
esng = ESpeakNG(voice='en-gb')
esng.pitch = 32
esng.speed = 150

phrase = "Father's car is a Jaguar and pa drives rather fast. "\
    "Castles, farms and draughty barns, all go charging past."

print(phrase)
for voice in esng.voices:
    # only use the English regional voices (en-029, en-gb, ...)
    if voice['language'].startswith('en-'):
        print('Using voice:', voice['language'],
              'for', voice['voice_name'], '-')
        esng.voice = voice['language']
        # get the phonetic (IPA) transcription for this voice
        ipa = esng.g2p(phrase, ipa=2)
        print(voice['language'], 'phonetics:', ipa)
        # speak the phrase and wait until it has finished
        esng.say(phrase, sync=True)
        sleep(0.1)
Be thankful you can’t hear the output. The IPA output, however, is a thing of beauty:
Father's car is a Jaguar and pa drives rather fast. Castles, farms and draughty barns, all go charging past.
Using voice: en-029 for English_(Caribbean) -
en-029 phonetics: fˈɑːdaz kˈɑ͡əɹ ɪz a d͡ʒˈaɡwɑ͡ə and pˈɑː dɹˈa͡ɪvz ɹˈɑːda fˈa͡astkˈa͡asɛlzfˈɑ͡əmz and dɹˈa͡afti bˈɑ͡ənzˈɔːl ɡˌo͡ʊ t͡ʃˈɑ͡əd͡ʒɪn pˈa͡ast
Using voice: en-gb for English_(Great_Britain) -
en-gb phonetics: fˈɑːðəz kˈɑːɹ ɪz ɐ d͡ʒˈaɡwɑː and pˈɑː dɹˈa͡ɪvz ɹˈɑːðə fˈastkˈasə͡lzfˈɑːmz and dɹˈafti bˈɑːnzˈɔːl ɡˌə͡ʊ t͡ʃˈɑːd͡ʒɪŋ pˈast
Using voice: en-gb-scotland for English_(Scotland) -
en-gb-scotland phonetics: fˈa:ðɜz kˈaːr ɪz ɐ d͡ʒˈaɡwaːr and pˈa: drˈa͡ɪvz rˈa:ðɜ fˈa:stkˈa:sə͡lzfˈaːrmz and drˈa:fte bˈaːrnzˈɔːl ɡˌoː t͡ʃˈaːrd͡ʒɪŋ pˈa:st
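In the output above, the words run together wherever the phrase has punctuation (fˈastkˈasə͡lzfˈɑːmz, for instance). One possible workaround, sketched here under assumptions, is to transcribe the phrase clause by clause and join the pieces yourself. `split_clauses` and `g2p_by_clause` are hypothetical helpers, not part of py-espeak-ng, and `engine` is assumed to be anything with a `g2p(text, ipa=2)` method that returns a string, as the ESpeakNG object does in the script above.

```python
import re


def split_clauses(text):
    """Split text into clauses at . , ; : ! ? and drop any empty pieces."""
    parts = re.split(r"[.,;:!?]+", text)
    return [p.strip() for p in parts if p.strip()]


def g2p_by_clause(engine, text, sep=" | "):
    """Transcribe each clause separately and join the results with `sep`.

    `engine` is assumed to offer g2p(text, ipa=2) returning a string.
    """
    return sep.join(
        engine.g2p(clause, ipa=2).strip() for clause in split_clauses(text)
    )
```

With the script above, `print(g2p_by_clause(esng, phrase))` would take the place of the plain `esng.g2p(phrase, ipa=2)` call. The split is deliberately crude: it will also break at the dots in abbreviations and decimal numbers, and transcribing clauses in isolation may change stress slightly compared with the full sentence.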