After remarkable success with the SYN-6988 TTS module, then somewhat less success with the SYN-6658 and other modules, I didn’t hold out much hope for the YuTone SYN-6288, which – while boasting a load of background tunes that could play over speech – can only convert Chinese text to speech
The wiring is similar to the SYN-6988: a serial UART connection at 9600 baud, plus a Busy (BY) line to signal when the chip is busy. The serial protocol is slightly more complicated, as the SYN-6288 requires a checksum byte at the end.
As I’m not interested in the text-to-speech output itself, here’s a MicroPython script to play all of the sounds:
# very crude MicroPython demo of SYN6288 TTS chip
# scruss, 2023-07
import machine
import time
### setup device
ser = machine.UART(
0, baudrate=9600, bits=8, parity=None, stop=1
) # tx=Pin(0), rx=Pin(1)
busyPin = machine.Pin(2, machine.Pin.IN, machine.Pin.PULL_UP)
def sendspeak(u2, data, busy):
# modified from https://github.com/TPYBoard/TPYBoard_lib/
# u2 = UART(uart, baud)
eec = 0
buf = [0xFD, 0x00, 0, 0x01, 0x01]
# buf = [0xFD, 0x00, 0, 0x01, 0x79] # plays with bg music 15
buf[2] = len(data) + 3
buf += list(bytearray(data, "utf-8"))
for i in range(len(buf)):
eec ^= int(buf[i])
buf.append(eec)
u2.write(bytearray(buf))
while busy.value() != True:
# wait for busy line to go high
time.sleep_ms(5)
while busy.value() == True:
# wait for it to finish
time.sleep_ms(5)
for s in "abcdefghijklmnopqrstuvwxy":
playstr = "[v10][x1]sound" + s
print(playstr)
sendspeak(ser, playstr, busyPin)
time.sleep(2)
for s in "abcdefgh":
playstr = "[v10][x1]msg" + s
print(playstr)
sendspeak(ser, playstr, busyPin)
time.sleep(2)
for s in "abcdefghijklmno":
playstr = "[v10][x1]ring" + s
print(playstr)
sendspeak(ser, playstr, busyPin)
time.sleep(2)
Each sound starts and stops with a very loud click, and the sound quality is not great. I couldn’t get a good recording of the sounds (some of which of which are over a minute long) as the only way I could get reliable audio output was through tiny headphones. Any recording came out hopelessly distorted:
I’m not too disappointed that this didn’t work well. I now know that the SYN-6988 is the good one to get. It also looks like I may never get to try the XFS5152CE speech synthesizer board: AliExpress has cancelled my shipment for no reason. It’s supposed to have some English TTS function, even if quite limited.
Here’s the auto-translated SYN-6288 manual, if you do end up finding a use for the thing
Yup, it’s another “let’s wire up a SYN6988 board” thing, this time for MMBasic running on the Armmite STM32F407 Module (aka ‘Armmite F4’). This board is also known as the BLACK_F407VE, which also makes a nice little MicroPython platform.
Uh, let’s not dwell too much on how the SYN6988 seems to parse 19:51 as “91 minutes to 20” …
Wiring
SYN6988
Armmite F4
RX
PA09 (COM1 TX)
TX
PA10 (COM1 RX)
RDY
PA08
your choice of 3.3 V and GND connections, of course
Yes, I know it says it’s an XFS5152, but I got a SYN6988 and it seems to be about as reliable a source as one can find. The board is marked YS-V6E-V1.03, and even mentions SYN6988 on the rear silkscreen:
Code
REM SYN6988 speech demo - MMBasic / Armmite F4
REM scruss, 2023-07
OPEN "COM1:9600" AS #5
REM READY line on PA8
SETPIN PA8, DIN, PULLUP
REM you can ignore font/text commands
CLS
FONT 1
TEXT 0,15,"[v1]Hello - this is a speech demo."
say("[v1]Hello - this is a speech demo.")
TEXT 0,30,"[x1]soundy[d]"
say("[x1]soundy[d]"): REM chimes
TEXT 0,45,"The time is "+LEFT$(TIME$,5)+"."
say("The time is "+LEFT$(TIME$,5)+".")
END
SUB say(a$)
LOCAL dl%,maxlof%
REM data length is text length + 2 (for the 1 and 0 bytes)
dl%=2+LEN(a$)
maxlof%=LOF(#5)
REM SYN6988 simple data packet
REM byte 1 : &HFD
REM byte 2 : data length (high byte)
REM byte 3 : data length (low byte)
REM byte 4 : &H01
REM byte 5 : &H00
REM bytes 6-: ASCII string data
PRINT #5, CHR$(&hFD)+CHR$(dl%\256)+CHR$(dl% MOD 256)+CHR$(1)+CHR$(0)+a$;
DO WHILE LOF(#5)<maxlof%
REM pause while sending text
PAUSE 5
LOOP
DO WHILE PIN(PA8)<>1
REM wait until RDY is high
PAUSE 5
LOOP
DO WHILE PIN(PA8)<>0
REM wait until SYN6988 signals READY
PAUSE 5
LOOP
END SUB
The other week’s success with the SYN6988 TTS chip was not repeated with three other modules I ordered, alas. Two of them I couldn’t get a peep out of, the other didn’t support English text-to-speech.
SYN6658
This one looks remarkably like the SYN6988:
Yes, I added the 6658 label so I could tell the boards apart
Apart from the main chip, the only difference appears to be that the board’s silkscreen says YS-V6 V1.15 where the SYN6988’s said YS-V6E V1.02.
To be fair to YuTone (the manufacturer), they claim this only supports Chinese as an input language. If you feed it English, at best you’ll get it spelling out the letters. It does have quite a few amusing sounds, though, so at least you can make it beep and chime. My MicroPython library for the VoiceTX SYN6988 text to speech module can drive it as far as I understand it.
I’ve still got a SYN6288 to look at, plus a XFS5152CE TTSthat’s in the mail that may or may not be in the mail. The SYN6988 is the best of the bunch so far.
I have a bunch of other boards on order to see if the other chips (SYN6288, SYN6658, XF5152) work in the same way. I really wonder which I’ll end up receiving!
Update (2023-07-09): Got the SYN6658. It does not support English TTS and thus is not recommended. It does have some cool sounds, though.
Embedded Text Command Sound Table
The github repo references Embedded text commands, but all of the sound references were too difficult to paste into a table there. So here are all of the ones that the SYN-6988 knows about:
Name is the string you use to play the sound, eg: [x1]sound101
Alias is an alternative name by which you can call some of the sounds. This is for better compatibility with the SYN6288 apparently. So [x1]sound101 is exactly the same as specifying [x1]sounda
Type is the sound description from the manual. Many of these are blank
Link is a playable link for a recording of the sound.
I’ve had one of these cheap(ish – $15) sound modules from AliExpress for a while. I hadn’t managed to get much out of it before, but I poked about at it a little more and found I was trying to drive the wrong chip. Aha! Makes all the difference.
Sensitive listener alert! There is a static click midway through. I edited out the clipped part, but it’s still a little jarring. It would always do this at the same point in playback, for some reason.
The only Pythonish code I could find for these chips was meant for the older SYN6288 and MicroPython (syn6288.py). I have no idea what I’m doing, but with some trivial modification, it makes sound.
I used the simple serial UART connection: RX -> TX, TX -> RX, 3V3 to 3V3 and GND to GND. My board is hard-coded to run at 9600 baud. I used the USB serial adapter that came with the board.
Here’s the code that read that text:
#!/usr/bin/env python3
# -*- coding: utf-8 -*-
import serial
import time
# NB via MicroPython and old too! Also for a SYN6288, which I don't have
# nabbed from https://github.com/TPYBoard/TPYBoard_lib/
def sendspeak(port, data):
eec = 0
buf = [0xFD, 0x00, 0, 0x01, 0x01]
buf[2] = len(data) + 3
buf += list(bytearray(data, encoding='utf-8'))
for i in range(len(buf)):
eec ^= int(buf[i])
buf.append(eec)
port.write(bytearray(buf))
ser = serial.Serial("/dev/ttyUSB1", 9600)
sendspeak(ser, "[t5]I like to think [p100](it [t7]has[t5] to be!)[p100] of a cybernetic ecology [p100]where we are free of our labors and joined back to nature, [p100]returned to our mammal brothers and sisters, [p100]and all watched over by machines of loving grace")
time.sleep(8)
ser.close()
This code is bad. All I did was prod stuff until it stopped not working. Since all I have to work from includes a datasheet in Chinese (from here: ??????-SYN6988???TTS????) there’s lots of stuff I could do better. I used the tone and pause tags to give the reading a little more life, but it’s still a bit flat. For $15, though, a board that makes a fair stab at reading English is not bad at all. We can’t all afford vintage DECtalk hardware.
The one thing I didn’t do is used the SYN6988’s Busy/Ready line to see if it was still busy reading. That means I could send it text as soon as it was ready, rather than pausing for 8 seconds after the speech. This refinement will come later, most likely when I port this to MicroPython.
[tɒk bɒks]: case[tɒk bɒks]: inside. The observant amongst you will notice that the speech board is 1/10″ further in than it should be for ideal alignment with the USB serial adapter.
Note (2025): Parallax Inc have stopped selling the board this project is based on. I’ve added local copies of the reference documents at the end of this post. If you want basic embedded English speech, there are other options available.
Back in the 1980s, the now-defunct Digital Equipment Corporation (“DEC”) sold a hardware speech synthesizer based on Dennis Klatt’s research at MIT. These DECTalk boxes were compact and robust, and — despite not having the greatest speech quality — gave valuable speech, telephone and reading accessibility to many people. Stephen Hawking’s distinctive voice is from a pre-DEC version of the MIT hardware.
DEC is long gone, and the licensing of DECTalk has wandered off into mostly software (ahem …). Much to the annoyance of those in earshot, I’ve always enjoyed dabbling in speech synthesis. DECTalk hardware remains expensive, partly because of demand from electronic music producers (its vocoder-like burr is on countless tracks), but also because there are still many people who rely on it for daily life. I couldn’t justify buying a real DECTalk, but I found this: the Parallax Emic 2 Text-to-Speech Module. For about $80, this stamp-sized board brings a hardware DECTalk implementation to embedded projects.
The Emic 2 is really marketed to microcontroller hobbyists: Make Your Arduino Speak! sorta thing. But I wanted to make a DECTalk-ish hardware box, with serial input, a speaker, and switchable headphone/line jack. [tɒk bɒks] (a fair IPA approximation of how I pronounce “Talk Box”) is the result.
OSEPP FTDI USB-Serial Breakout — there are many USB-Serial boards that would do this, but two points in this one’s favour are: i) it has header pins for breadboard use, and ii) I had a spare one.
Small 8Ω speaker element — the one I used is most likely a headphone element, bought from Active Surplus (RIP). This should be as small as you can get away with (and still hear) as the USB-Serial connection isn’t designed to supply audio power.
You’ll need some kind of serial terminal connection. In a pinch, you can use the serial monitor that is in the Arduino development environment. Either way, identify your serial port (/dev/ttyUSBN, COMN:, or /dev/tty-usbserialNNNN) and find a way to send 9600 baud, 8N1 characters to it. Hit Return, and you should be greeted by the Emic 2’s : prompt (or a ?, followed by :). Whether you get the prompt or not depends on whether local echo is set or not. Either way, try sending this line:
SAll watched over by machines of loving grace.
You should hear a voice say the title of Richard Brautigan’s lovely poem All Watched Over by Machines of Loving Grace (caution: video link contains nekkid hippies). You should get the : prompt back once the the speech has stopped. And that’s all there is to it: send an S, followed by up to 1023 bytes of (basically ASCII) text, followed by a newline, and it will be spoken. There’s more detail, of course, in the Emic 2 documentation and the Emic 2 Epson/Fonix DECTalk 501 User’s Guide for changing voices, etc. Yes, you can make it sing. No, you probably shouldn’t, though.
Notes
The Emic 2 has no serial flow control, so you have to wait until the module stops speaking (or you send it the stop command) before you can send more. The easiest way is to poll the serial port and see if there’s the : prompt waiting. Until you see the prompt, any text you send it may be lost.
The Emic 2 is an embedded device: Unicode is a bit of a stretch. It’s supposed to accept ISO Latin-1 8-bit characters (handy for Spanish mode), though.
Starting every speech line with S may make this board incompatible with assistive technology software such as the JAWS screen reader. I don’t think that this was the goal for Emic 2’s designers (Grand Idea Studio), however.
The output from the audio jack has a fair bit of noise on it, and you need to set the volume quite low to avoid hiss and hum. Your experience may be different, as I may have accidentally made a ground loop. There is an audible click at the start and end of the text, too.
The Emic 2 uses DECTalk v5 commands and phonemes. Many DECTalk resources on the web (like these songs) use v4 or older, which are subtly incompatible. I haven’t found a reliable conversion protocol yet.
The board also starts in Epson-style command mode, which uses slightly different commands from standard DECTalk.
To end, here’s the Emic 2’s “Dennis” voice reading all of Brautigan’s All Watched Over By Machines of Loving Grace:
Someone asked how the automatic podcast works. It’s a bit complex, and they probably will be sorry they asked.
I have all my music saved as MP3s on a server running Firefly Media Server. It stores all its information about tracks in a SQLite database, so I can very easily grab a random selection of tracks.
Since I know the name of the track and the artist from the Firefly database, I have a selection of script lines that I can feed to flite, a very simple speech synthesizer. Each of these spoken lines is stored as as wav file, and then each candidate MP3 is converted to wav, and the whole mess is joined together using SoX. SoX also created the nifty (well, I think so) intro and outro sweeps.
The huge wav file of the whole show is converted to MP3 using LAME and uploaded to my webhost with scp. All of this process is done by one Perl script – it also creates the web page, the RSS feed, and even logs the tracks on Last.fm.
I am trying to speak to my computer I’m not sure it understands the two well actually it’s doing not badly I’m speaking in a rather disjointed manner I have to go to New York state tomorrow I’m not quite sure why I have to look at twin turbo inns no I don’t have to look at twin turbo ends I have to look at wind turbine what’s
While it’s remarkably accurate I’m going to be really mean and bald face and sure I don’t know “bold face is that I’m going to be located anyway, to your things have calmed pear shaped nine
This is the voice of the Mr. Owns and I do I think I’m going to live. I suppose this is better than I expected especially since Mike accent is unusual most people in Canada do not understand a and so finally I have a computer that understands the this is a bit worrying isn’t it?
Well that wraps it up for dictation. You have a pleasant evening. Good night!
– what Microsoft speech recognition thinks I said. The random “what” and “nine” is me starting to laugh, and “bald face” is “blog this”.