SYN6288 TTS board from AliExpress

After remarkable success with the SYN-6988 TTS module, then somewhat less success with the SYN-6658 and other modules, I didn’t hold out much hope for the YuTone SYN-6288, which – while boasting a load of background tunes that could play over speech – can only convert Chinese text to speech

small blue circuit board with 6 MHz crystal oscillator, main chip, input headers at bottom and headphone jack/speaker output at top
as bought from quason official store: SYN6288 speech synthesis module

The wiring is similar to the SYN-6988: a serial UART connection at 9600 baud, plus a Busy (BY) line to signal when the chip is busy. The serial protocol is slightly more complicated, as the SYN-6288 requires a checksum byte at the end.

As I’m not interested in the text-to-speech output itself, here’s a MicroPython script to play all of the sounds:

# very crude MicroPython demo of SYN6288 TTS chip
# scruss, 2023-07
import machine
import time

### setup device
ser = machine.UART(
    0, baudrate=9600, bits=8, parity=None, stop=1
)  # tx=Pin(0), rx=Pin(1)

busyPin = machine.Pin(2, machine.Pin.IN, machine.Pin.PULL_UP)


def sendspeak(u2, data, busy):
    # modified from https://github.com/TPYBoard/TPYBoard_lib/
    # u2 = UART(uart, baud)
    eec = 0
    buf = [0xFD, 0x00, 0, 0x01, 0x01]
    # buf = [0xFD, 0x00, 0, 0x01, 0x79]  # plays with bg music 15
    buf[2] = len(data) + 3
    buf += list(bytearray(data, "utf-8"))
    for i in range(len(buf)):
        eec ^= int(buf[i])
    buf.append(eec)
    u2.write(bytearray(buf))
    while busy.value() != True:
        # wait for busy line to go high
        time.sleep_ms(5)
    while busy.value() == True:
        # wait for it to finish
        time.sleep_ms(5)


for s in "abcdefghijklmnopqrstuvwxy":
    playstr = "[v10][x1]sound" + s
    print(playstr)
    sendspeak(ser, playstr, busyPin)
    time.sleep(2)

for s in "abcdefgh":
    playstr = "[v10][x1]msg" + s
    print(playstr)
    sendspeak(ser, playstr, busyPin)
    time.sleep(2)

for s in "abcdefghijklmno":
    playstr = "[v10][x1]ring" + s
    print(playstr)
    sendspeak(ser, playstr, busyPin)
    time.sleep(2)

Each sound starts and stops with a very loud click, and the sound quality is not great. I couldn’t get a good recording of the sounds (some of which of which are over a minute long) as the only way I could get reliable audio output was through tiny headphones. Any recording came out hopelessly distorted:

I’m not too disappointed that this didn’t work well. I now know that the SYN-6988 is the good one to get. It also looks like I may never get to try the XFS5152CE speech synthesizer board: AliExpress has cancelled my shipment for no reason. It’s supposed to have some English TTS function, even if quite limited.

Here’s the auto-translated SYN-6288 manual, if you do end up finding a use for the thing

Adding speech to MMBasic

Yup, it’s another “let’s wire up a SYN6988 board” thing, this time for MMBasic running on the Armmite STM32F407 Module (aka ‘Armmite F4’). This board is also known as the BLACK_F407VE, which also makes a nice little MicroPython platform.

Uh, let’s not dwell too much on how the SYN6988 seems to parse 19:51 as “91 minutes to 20” …

Wiring

SYN6988Armmite F4
RXPA09 (COM1 TX)
TXPA10 (COM1 RX)
RDYPA08
your choice of 3.3 V and GND connections, of course

Where to buy: AliExpress — KAIKAI Electronics Wholesale Store : High-end Speech Synthesis Module Chinese/English Speech Synthesis XFS5152 Real Pronunciation TTS

Yes, I know it says it’s an XFS5152, but I got a SYN6988 and it seems to be about as reliable a source as one can find. The board is marked YS-V6E-V1.03, and even mentions SYN6988 on the rear silkscreen:

Code

REM                 SYN6988 speech demo - MMBasic / Armmite F4
REM                 scruss, 2023-07

OPEN "COM1:9600" AS #5
REM                 READY line on PA8
SETPIN PA8, DIN, PULLUP

REM    you can ignore font/text commands
CLS
FONT 1
TEXT 0,15,"[v1]Hello - this is a speech demo."
say("[v1]Hello - this is a speech demo.")
TEXT 0,30,"[x1]soundy[d]"
say("[x1]soundy[d]"): REM    chimes
TEXT 0,45,"The time is "+LEFT$(TIME$,5)+"."
say("The time is "+LEFT$(TIME$,5)+".")
END

SUB say(a$)
  LOCAL dl%,maxlof%
  REM     data length is text length + 2 (for the 1 and 0 bytes)
  dl%=2+LEN(a$)
  maxlof%=LOF(#5)
  REM     SYN6988 simple data packet
  REM      byte  1 : &HFD
  REM      byte  2 : data length (high byte)
  REM      byte  3 : data length (low byte)
  REM      byte  4 : &H01
  REM      byte  5 : &H00
  REM      bytes 6-: ASCII string data
  PRINT #5, CHR$(&hFD)+CHR$(dl%\256)+CHR$(dl% MOD 256)+CHR$(1)+CHR$(0)+a$;
  DO WHILE LOF(#5)<maxlof%
  REM       pause while sending text
    PAUSE 5
  LOOP
  DO WHILE PIN(PA8)<>1
    REM       wait until RDY is high
    PAUSE 5
  LOOP
  DO WHILE PIN(PA8)<>0
    REM       wait until SYN6988 signals READY
    PAUSE 5
  LOOP
END SUB

For more commands, please see Embedded text commands

Heres the auto-translated manual for the SYN6988:

Markedly less success with three TTS boards from AliExpress

The other week’s success with the SYN6988 TTS chip was not repeated with three other modules I ordered, alas. Two of them I couldn’t get a peep out of, the other didn’t support English text-to-speech.

SYN6658

This one looks remarkably like the SYN6988:

Yes, I added the 6658 label so I could tell the boards apart

Apart from the main chip, the only difference appears to be that the board’s silkscreen says YS-V6 V1.15 where the SYN6988’s said YS-V6E V1.02.

To be fair to YuTone (the manufacturer), they claim this only supports Chinese as an input language. If you feed it English, at best you’ll get it spelling out the letters. It does have quite a few amusing sounds, though, so at least you can make it beep and chime. My MicroPython library for the VoiceTX SYN6988 text to speech module can drive it as far as I understand it.

Here are the sounds:

NameTypeLink
msgaPolyphonic Chord Beep
msgbPolyphonic Chord Beep
msgcPolyphonic Chord Beep
msgdPolyphonic Chord Beep
msgePolyphonic Chord Beep
msgfPolyphonic Chord Beep
msggPolyphonic Chord Beep
msghPolyphonic Chord Beep
msgiPolyphonic Chord Beep
msgjPolyphonic Chord Beep
msgkPolyphonic Chord Beep
msglPolyphonic Chord Beep
msgmPolyphonic Chord Beep
msgnPolyphonic Chord Beep
sound101Prompt Tone
sound102Prompt Tone
sound103Prompt Tone
sound104Prompt Tone
sound105Prompt Tone
sound106Prompt Tone
sound107Prompt Tone
sound108Prompt Tone
sound109Prompt Tone
sound110Prompt Tone
sound111Prompt Tone
sound112Prompt Tone
sound113Prompt Tone
sound114Prompt Tone
sound115Prompt Tone
sound116Prompt Tone
sound117Prompt Tone
sound118Prompt Tone
sound119Prompt Tone
sound120Prompt Tone
sound121Prompt Tone
sound122Prompt Tone
sound123Prompt Tone
sound124Prompt Tone
sound201phone ringtone
sound202phone ringtone
sound203phone ringtone
sound204phone ringing
sound205phone ringtone
sound206door bell
sound207door bell
sound208doorbell
sound209door bell
sound210alarm
sound211alarm
sound212alarm
sound213alarm
sound214wind chimes
sound215wind chimes
sound216wind chimes
sound217wind chimes
sound218wind chimes
sound219wind chimes
sound301alarm
sound302alarm
sound303alarm
sound304alarm
sound305alarm
sound306alarm
sound307alarm
sound308alarm
sound309alarm
sound310alarm
sound311alarm
sound312alarm
sound313alarm
sound314alarm
sound315alert-emergency
sound316alert-emergency
sound317alert-emergency
sound318alert-emergency
sound319alert-emergency
sound401credit card successful
sound402credit card successful
sound403credit card successful
sound404credit card successful
sound405credit card successful
sound406credit card successful
sound407credit card successful
sound408successfully swiped the card
sound501cuckoo
sound502error
sound503applause
sound504laser
sound505laser
sound506landing
sound507gunshot
sound601alarm sound / air raid siren (long)
sound602prelude to weather forecast (long)
SYN-6658 Sound Reference

Where I bought it: Electronic Component Module Store : Chinese-to-real-life Speech Synthesis Playing Module TTS Announcer SYN6658 of Bank Bus Broadcasting.

Auto-translated manual:

Unknown “TTS Text-to-speech Broadcast Synthesis Module”

All I could get from this one was a power-on chime. The main chip has had its markings ground off, so I’ve no idea what it is.

Red and black wires seem to be standard 5 V power. Yellow seems to be serial in, white is not connected.

Where I bought it: Electronic Component Module Store / Chinese TTS Text-to-speech Broadcast Synthesis Module MCU Serial Port Robot Plays Prompt Advertising Board

HLK-V40 Speech Synthesis Module

In theory, this little board has a lot going for it: wifi, bluetooth, commands sent by AT commands. In practice, I couldn’t get it to do a thing.

Where I bought it: HI-LINK Component Store / HLK-V40 Speech Synthesis Module TTS Pure Text to Speech Playback Hailinco AI intelligent Speech Synthesis Broadcast

I’ve still got a SYN6288 to look at, plus a XFS5152CE TTS that’s in the mail that may or may not be in the mail. The SYN6988 is the best of the bunch so far.

SYN-6988 Speech with MicroPython

Full repo, with module and instructions, here: scruss/micropython-SYN6988: MicroPython library for the VoiceTX SYN6988 text to speech module

(and for those that CircuitPython is the sort of thing they like, there’s this: scruss/circuitpython-SYN6988: CircuitPython library for the YuTone VoiceTX SYN6988 text to speech module.)

I have a bunch of other boards on order to see if the other chips (SYN6288, SYN6658, XF5152) work in the same way. I really wonder which I’ll end up receiving!

Update (2023-07-09): Got the SYN6658. It does not support English TTS and thus is not recommended. It does have some cool sounds, though.

Embedded Text Command Sound Table

The github repo references Embedded text commands, but all of the sound references was too difficult to paste into a table there. So here are all of the ones that the SYN-6988 knows about:

  • Name is the string you use to play the sound, eg: [x1]sound101
  • Alias is an alternative name by which you can call some of the sounds. This is for better compatibility with the SYN6288 apparently. So [x1]sound101 is exactly the same as specifying [x1]sounda
  • Type is the sound description from the manual. Many of these are blank
  • Link is a playable link for a recording of the sound.
NameAliasTypeLink
sound101sounda
sound102soundb
sound103soundc
sound104soundd
sound105sounde
sound106soundf
sound107soundg
sound108soundh
sound109soundi
sound110soundj
sound111soundk
sound112soundl
sound113soundm
sound114soundn
sound115soundo
sound116soundp
sound117soundq
sound118soundr
sound119soundt
sound120soundu
sound121soundv
sound122soundw
sound123soundx
sound124soundy
sound201phone ringtone
sound202phone ringtone
sound203phone ringtone
sound204phone rings
sound205phone ringtone
sound206doorbell
sound207doorbell
sound208doorbell
sound209doorbell
sound301alarm
sound302alarm
sound303alarm
sound304alarm
sound305alarm
sound306alarm
sound307alarm
sound308alarm
sound309alarm
sound310alarm
sound311alarm
sound312alarm
sound313alarm
sound314alarm
sound315alert/emergency
sound316alert/emergency
sound317alert/emergency
sound318alert/emergency
sound401credit card successful
sound402credit card successful
sound403credit card successful
sound404credit card successful
sound405credit card successful
sound406credit card successful
sound407credit card successful
sound408successfully swiped the card
SYN-6988 Sound Reference

Speech from Python with the SYN6988 module

I’ve had one of these cheap(ish – $15) sound modules from AliExpress for a while. I hadn’t managed to get much out of it before, but I poked about at it a little more and found I was trying to drive the wrong chip. Aha! Makes all the difference.

So here’s a short narration from my favourite Richard Brautigan poem, read by the SYN6988.

Sensitive listener alert! There is a static click midway through. I edited out the clipped part, but it’s still a little jarring. It would always do this at the same point in playback, for some reason.

The only Pythonish code I could find for these chips was meant for the older SYN6288 and MicroPython (syn6288.py). I have no idea what I’m doing, but with some trivial modification, it makes sound.

I used the simple serial UART connection: RX -> TX, TX -> RX, 3V3 to 3V3 and GND to GND. My board is hard-coded to run at 9600 baud. I used the USB serial adapter that came with the board.

Here’s the code that read that text:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import serial
import time

# NB via MicroPython and old too! Also for a SYN6288, which I don't have
# nabbed from https://github.com/TPYBoard/TPYBoard_lib/

def sendspeak(port, data):
    eec = 0
    buf = [0xFD, 0x00, 0, 0x01, 0x01]
    buf[2] = len(data) + 3
    buf += list(bytearray(data, encoding='utf-8'))
    for i in range(len(buf)):
        eec ^= int(buf[i])
    buf.append(eec)
    port.write(bytearray(buf))

ser = serial.Serial("/dev/ttyUSB1", 9600)
sendspeak(ser, "[t5]I like to think [p100](it [t7]has[t5] to be!)[p100] of a cybernetic ecology [p100]where we are free of our labors and joined back to nature, [p100]returned to our mammal brothers and sisters, [p100]and all watched over by machines of loving grace")
time.sleep(8)
ser.close()

This code is bad. All I did was prod stuff until it stopped not working. Since all I have to work from includes a datasheet in Chinese (from here: ??????-SYN6988???TTS????) there’s lots of stuff I could do better. I used the tone and pause tags to give the reading a little more life, but it’s still a bit flat. For $15, though, a board that makes a fair stab at reading English is not bad at all. We can’t all afford vintage DECtalk hardware.

The one thing I didn’t do is used the SYN6988’s Busy/Ready line to see if it was still busy reading. That means I could send it text as soon as it was ready, rather than pausing for 8 seconds after the speech. This refinement will come later, most likely when I port this to MicroPython.

More resources:

[tɒk bɒks] — a tiny hardware speech synthesizer/TTS

[tÉ’k bÉ’ks]: case
[tÉ’k bÉ’ks]: case
[tÉ’k bÉ’ks]: inside
[tÉ’k bÉ’ks]: inside. The observant amongst you will notice that the speech board is 1/10″ further in than it should be for ideal alignment with the USB serial adapter.
Back in the 1980s, the now-defunct Digital Equipment Corporation (“DEC”) sold a hardware speech synthesizer based on Dennis Klatt’s research at MIT.  These DECTalk boxes were compact and robust, and — despite not having the greatest speech quality — gave valuable speech, telephone and reading accessibility to many people. Stephen Hawking’s distinctive voice is from a pre-DEC version of the MIT hardware.

DEC is long gone, and the licensing of DECTalk has wandered off into mostly software. Much to the annoyance of those in earshot, I’ve always enjoyed dabbling in speech synthesis. DECTalk hardware remains expensive, partly because of demand from electronic music producers (its vocoder-like burr is on countless tracks), but also because there are still many people who rely on it for daily life. I couldn’t justify buying a real DECTalk, but I found this: the Parallax Emic 2 Text-to-Speech Module.  For about $80, this stamp-sized board brings a hardware DECTalk implementation to embedded projects.

The Emic 2 is really marketed to microcontroller hobbyists: Make Your Arduino Speak! sorta thing. But I wanted to make a DECTalk-ish hardware box, with serial input, a speaker, and switchable headphone/line jack.  [tɒk bɒks] (a fair approximation of how I pronounce “Talk Box”) is the result.

Hardware

  • Parallax Emic 2 Text-to-Speech Module
  • OSEPP FTDI USB-Serial Breakout — there are many USB-Serial boards that would do this, but two points in this one’s favour are: i) it has header pins for breadboard use, and ii) I had a spare one.
  • Small 8Ω speaker element — the one I used is most likely a headphone element, bought from Active Surplus (RIP). This should be as small as you can get away with (and still hear) as the USB-Serial connection isn’t designed to supply audio power.
  • Header pins and sockets
  • Toggle switch
  • Small project box with perfboard
  • Jumper wires and solder

Connections

Emic 2             Serial
======             ======
 GND                GND
 5V                 Vcc
 SOUT               RXD
 SIN                TXD

Emic 2             Speaker
======             =======
 SP-                -
 SP+  (via switch)  +

Using it

You’ll need some kind of serial terminal connection. In a pinch, you can use the serial monitor that is in the Arduino development environment. Either way, identify your serial port (/dev/ttyUSBN, COMN:, or /dev/tty-usbserialNNNN) and find a way to send 9600 baud, 8N1 characters to it. Hit Return, and you should be greeted by the Emic 2’s : prompt (or a ?, followed by :). Whether you get the prompt or not depends on whether local echo is set or not. Either way, try sending this line:

SAll watched over by machines of loving grace.

You should hear a voice say the title of Richard Brautigan’s lovely poem All Watched Over by Machines of Loving Grace (caution: video link contains nekkid hippies). You should get the : prompt back once the the speech has stopped. And that’s all there is to it: send an S, followed by up to 1023 bytes of (basically ASCII) text, followed by a newline, and it will be spoken. There’s more detail, of course, in the Emic 2 documentation and the Emic 2 Epson/Fonix DECTalk 501 User’s Guide for changing voices, etc. Yes, you can make it sing. No, you probably shouldn’t, though.

Notes

  1. The Emic 2 has no serial flow control, so you have to wait until the module stops speaking (or you send it the stop command) before you can send more. The easiest way is to poll the serial port and see if there’s the : prompt waiting. Until you see the prompt, any text you send it may be lost.
  2. The Emic 2 is an embedded device; Unicode is a bit of a stretch. It’s supposed to accept ISO Latin-1 8-bit characters (handy for Spanish mode), though.
  3. Starting every speech line with S may make this board incompatible with assistive technology software such as the JAWS screen reader. I don’t think that this was the goal for Emic 2’s designers (Grand Idea Studio), however.
  4. The output from the audio jack has a fair bit of noise on it, and you need to set the volume quite low to avoid hiss and hum. Your experience may be different, as I may have accidentally made a ground loop. There is a faintly  audible click at the start and end of the text, too.
  5. The Emic 2 uses DECTalk v5 commands and phonemes. Many DECTalk resources on the web (like these songs) use v4 or older, which are subtly incompatible. I haven’t found a reliable conversion protocol yet.

To end, here’s the Emic 2’s “Dennis” voice reading all of Brautigan’s All Watched Over By Machines of Loving Grace:


(plain link: molg-dennis-140wpm-16khz.mp3)

(even plainer link if you can’t decode MP2 files: molg-dennis-140wpm.mp3)

(recorded and edited for length with Audacity. No hippies — nekkid, or otherwise — were harmed in the making of this recording.)

how does he do that?

Someone asked how the automatic podcast works. It’s a bit complex, and they probably will be sorry they asked.

I have all my music saved as MP3s on a server running Firefly Media Server. It stores all its information about tracks in a SQLite database, so I can very easily grab a random selection of tracks.

Since I know the name of the track and the artist from the Firefly database, I have a selection of script lines that I can feed to flite, a very simple speech synthesizer. Each of these spoken lines is stored as as wav file, and then each candidate MP3 is converted to wav, and the whole mess is joined together using SoX. SoX also created the nifty (well, I think so) intro and outro sweeps.

The huge wav file of the whole show is converted to MP3 using LAME and uploaded to my webhost with scp. All of this process is done by one Perl script – it also creates the web page, the RSS feed, and even logs the tracks on Last.fm.

Couldn’t be simpler.

the voice of the Mr. Owns

I am trying to speak to my computer I’m not sure it understands the two well actually it’s doing not badly I’m speaking in a rather disjointed manner I have to go to New York state tomorrow I’m not quite sure why I have to look at twin turbo inns no I don’t have to look at twin turbo ends I have to look at wind turbine what’s

While it’s remarkably accurate I’m going to be really mean and bald face and sure I don’t know “bold face is that I’m going to be located anyway, to your things have calmed pear shaped nine

This is the voice of the Mr. Owns and I do I think I’m going to live. I suppose this is better than I expected especially since Mike accent is unusual most people in Canada do not understand a and so finally I have a computer that understands the this is a bit worrying isn’t it?

Well that wraps it up for dictation. You have a pleasant evening. Good night!

– what Microsoft speech recognition thinks I said. The random “what” and “nine” is me starting to laugh, and “bald face” is “blog this”.