speech – We Saw a Chicken …

SYN6288 TTS board from AliExpress

After remarkable success with the SYN-6988 TTS module, then somewhat less success with the SYN-6658 and other modules, I didn’t hold out much hope for the YuTone SYN-6288, which – while boasting a load of background tunes that could play over speech – can only convert Chinese text to speech

small blue circuit board with 6 MHz crystal oscillator, main chip, input headers at bottom and headphone jack/speaker output at top — as bought from quason official store: SYN6288 speech synthesis module

The wiring is similar to the SYN-6988: a serial UART connection at 9600 baud, plus a Busy (BY) line to signal when the chip is busy. The serial protocol is slightly more complicated, as the SYN-6288 requires a checksum byte at the end.

As I’m not interested in the text-to-speech output itself, here’s a MicroPython script to play all of the sounds:

# very crude MicroPython demo of SYN6288 TTS chip
# scruss, 2023-07
import machine
import time

### setup device
ser = machine.UART(
    0, baudrate=9600, bits=8, parity=None, stop=1
)  # tx=Pin(0), rx=Pin(1)

busyPin = machine.Pin(2, machine.Pin.IN, machine.Pin.PULL_UP)


def sendspeak(u2, data, busy):
    # modified from https://github.com/TPYBoard/TPYBoard_lib/
    # u2 = UART(uart, baud)
    eec = 0
    buf = [0xFD, 0x00, 0, 0x01, 0x01]
    # buf = [0xFD, 0x00, 0, 0x01, 0x79]  # plays with bg music 15
    buf[2] = len(data) + 3
    buf += list(bytearray(data, "utf-8"))
    for i in range(len(buf)):
        eec ^= int(buf[i])
    buf.append(eec)
    u2.write(bytearray(buf))
    while busy.value() != True:
        # wait for busy line to go high
        time.sleep_ms(5)
    while busy.value() == True:
        # wait for it to finish
        time.sleep_ms(5)


for s in "abcdefghijklmnopqrstuvwxy":
    playstr = "[v10][x1]sound" + s
    print(playstr)
    sendspeak(ser, playstr, busyPin)
    time.sleep(2)

for s in "abcdefgh":
    playstr = "[v10][x1]msg" + s
    print(playstr)
    sendspeak(ser, playstr, busyPin)
    time.sleep(2)

for s in "abcdefghijklmno":
    playstr = "[v10][x1]ring" + s
    print(playstr)
    sendspeak(ser, playstr, busyPin)
    time.sleep(2)

Each sound starts and stops with a very loud click, and the sound quality is not great. I couldn’t get a good recording of the sounds (some of which of which are over a minute long) as the only way I could get reliable audio output was through tiny headphones. Any recording came out hopelessly distorted:

I’m not too disappointed that this didn’t work well. I now know that the SYN-6988 is the good one to get. It also looks like I may never get to try the XFS5152CE speech synthesizer board: AliExpress has cancelled my shipment for no reason. It’s supposed to have some English TTS function, even if quite limited.

Here’s the auto-translated SYN-6288 manual, if you do end up finding a use for the thing

SYN6288-translated Download

Adding speech to MMBasic

Yup, it’s another “let’s wire up a SYN6988 board” thing, this time for MMBasic running on the Armmite STM32F407 Module (aka ‘Armmite F4’). This board is also known as the BLACK_F407VE, which also makes a nice little MicroPython platform.

Uh, let’s not dwell too much on how the SYN6988 seems to parse 19:51 as “91 minutes to 20” …

Wiring

SYN6988	Armmite F4
RX	PA09 (COM1 TX)
TX	PA10 (COM1 RX)
RDY	PA08

your choice of 3.3 V and GND connections, of course

Where to buy: AliExpress — KAIKAI Electronics Wholesale Store : High-end Speech Synthesis Module Chinese/English Speech Synthesis XFS5152 Real Pronunciation TTS

Yes, I know it says it’s an XFS5152, but I got a SYN6988 and it seems to be about as reliable a source as one can find. The board is marked YS-V6E-V1.03, and even mentions SYN6988 on the rear silkscreen:

Code

REM                 SYN6988 speech demo - MMBasic / Armmite F4
REM                 scruss, 2023-07

OPEN "COM1:9600" AS #5
REM                 READY line on PA8
SETPIN PA8, DIN, PULLUP

REM    you can ignore font/text commands
CLS
FONT 1
TEXT 0,15,"[v1]Hello - this is a speech demo."
say("[v1]Hello - this is a speech demo.")
TEXT 0,30,"[x1]soundy[d]"
say("[x1]soundy[d]"): REM    chimes
TEXT 0,45,"The time is "+LEFT$(TIME$,5)+"."
say("The time is "+LEFT$(TIME$,5)+".")
END

SUB say(a$)
  LOCAL dl%,maxlof%
  REM     data length is text length + 2 (for the 1 and 0 bytes)
  dl%=2+LEN(a$)
  maxlof%=LOF(#5)
  REM     SYN6988 simple data packet
  REM      byte  1 : &HFD
  REM      byte  2 : data length (high byte)
  REM      byte  3 : data length (low byte)
  REM      byte  4 : &H01
  REM      byte  5 : &H00
  REM      bytes 6-: ASCII string data
  PRINT #5, CHR$(&hFD)+CHR$(dl%\256)+CHR$(dl% MOD 256)+CHR$(1)+CHR$(0)+a$;
  DO WHILE LOF(#5)<maxlof%
  REM       pause while sending text
    PAUSE 5
  LOOP
  DO WHILE PIN(PA8)<>1
    REM       wait until RDY is high
    PAUSE 5
  LOOP
  DO WHILE PIN(PA8)<>0
    REM       wait until SYN6988 signals READY
    PAUSE 5
  LOOP
END SUB

For more commands, please see Embedded text commands

Heres the auto-translated manual for the SYN6988:

SYN6988-translated Download

Markedly less success with three TTS boards from AliExpress

The other week’s success with the SYN6988 TTS chip was not repeated with three other modules I ordered, alas. Two of them I couldn’t get a peep out of, the other didn’t support English text-to-speech.

SYN6658

This one looks remarkably like the SYN6988:

Yes, I added the 6658 label so I could tell the boards apart

Apart from the main chip, the only difference appears to be that the board’s silkscreen says YS-V6 V1.15 where the SYN6988’s said YS-V6E V1.02.

To be fair to YuTone (the manufacturer), they claim this only supports Chinese as an input language. If you feed it English, at best you’ll get it spelling out the letters. It does have quite a few amusing sounds, though, so at least you can make it beep and chime. My MicroPython library for the VoiceTX SYN6988 text to speech module can drive it as far as I understand it.

Here are the sounds:

Name	Type	Link
msga	Polyphonic Chord Beep
msgb	Polyphonic Chord Beep
msgc	Polyphonic Chord Beep
msgd	Polyphonic Chord Beep
msge	Polyphonic Chord Beep
msgf	Polyphonic Chord Beep
msgg	Polyphonic Chord Beep
msgh	Polyphonic Chord Beep
msgi	Polyphonic Chord Beep
msgj	Polyphonic Chord Beep
msgk	Polyphonic Chord Beep
msgl	Polyphonic Chord Beep
msgm	Polyphonic Chord Beep
msgn	Polyphonic Chord Beep
sound101	Prompt Tone
sound102	Prompt Tone
sound103	Prompt Tone
sound104	Prompt Tone
sound105	Prompt Tone
sound106	Prompt Tone
sound107	Prompt Tone
sound108	Prompt Tone
sound109	Prompt Tone
sound110	Prompt Tone
sound111	Prompt Tone
sound112	Prompt Tone
sound113	Prompt Tone
sound114	Prompt Tone
sound115	Prompt Tone
sound116	Prompt Tone
sound117	Prompt Tone
sound118	Prompt Tone
sound119	Prompt Tone
sound120	Prompt Tone
sound121	Prompt Tone
sound122	Prompt Tone
sound123	Prompt Tone
sound124	Prompt Tone
sound201	phone ringtone
sound202	phone ringtone
sound203	phone ringtone
sound204	phone ringing
sound205	phone ringtone
sound206	door bell
sound207	door bell
sound208	doorbell
sound209	door bell
sound210	alarm
sound211	alarm
sound212	alarm
sound213	alarm
sound214	wind chimes
sound215	wind chimes
sound216	wind chimes
sound217	wind chimes
sound218	wind chimes
sound219	wind chimes
sound301	alarm
sound302	alarm
sound303	alarm
sound304	alarm
sound305	alarm
sound306	alarm
sound307	alarm
sound308	alarm
sound309	alarm
sound310	alarm
sound311	alarm
sound312	alarm
sound313	alarm
sound314	alarm
sound315	alert-emergency
sound316	alert-emergency
sound317	alert-emergency
sound318	alert-emergency
sound319	alert-emergency
sound401	credit card successful
sound402	credit card successful
sound403	credit card successful
sound404	credit card successful
sound405	credit card successful
sound406	credit card successful
sound407	credit card successful
sound408	successfully swiped the card
sound501	cuckoo
sound502	error
sound503	applause
sound504	laser
sound505	laser
sound506	landing
sound507	gunshot
sound601	alarm sound / air raid siren (long)
sound602	prelude to weather forecast (long)

SYN-6658 Sound Reference

Where I bought it: Electronic Component Module Store : Chinese-to-real-life Speech Synthesis Playing Module TTS Announcer SYN6658 of Bank Bus Broadcasting.

Auto-translated manual:

syn6658-translated Download

Unknown “TTS Text-to-speech Broadcast Synthesis Module”

All I could get from this one was a power-on chime. The main chip has had its markings ground off, so I’ve no idea what it is.

Red and black wires seem to be standard 5 V power. Yellow seems to be serial in, white is not connected.

Where I bought it: Electronic Component Module Store / Chinese TTS Text-to-speech Broadcast Synthesis Module MCU Serial Port Robot Plays Prompt Advertising Board

HLK-V40 Speech Synthesis Module

In theory, this little board has a lot going for it: wifi, bluetooth, commands sent by AT commands. In practice, I couldn’t get it to do a thing.

Where I bought it: HI-LINK Component Store / HLK-V40 Speech Synthesis Module TTS Pure Text to Speech Playback Hailinco AI intelligent Speech Synthesis Broadcast

I’ve still got a SYN6288 to look at, plus a XFS5152CE TTS ~~that’s in the mail~~ that may or may not be in the mail. The SYN6988 is the best of the bunch so far.

SYN-6988 Speech with MicroPython

Full repo, with module and instructions, here: scruss/micropython-SYN6988: MicroPython library for the VoiceTX SYN6988 text to speech module

(and for those that CircuitPython is the sort of thing they like, there’s this: scruss/circuitpython-SYN6988: CircuitPython library for the YuTone VoiceTX SYN6988 text to speech module.)

I have a bunch of other boards on order to see if the other chips (SYN6288, SYN6658, XF5152) work in the same way. I really wonder which I’ll end up receiving!

Update (2023-07-09): Got the SYN6658. It does not support English TTS and thus is not recommended. It does have some cool sounds, though.

Embedded Text Command Sound Table

The github repo references Embedded text commands, but all of the sound references was too difficult to paste into a table there. So here are all of the ones that the SYN-6988 knows about:

Name is the string you use to play the sound, eg: [x1]sound101
Alias is an alternative name by which you can call some of the sounds. This is for better compatibility with the SYN6288 apparently. So [x1]sound101 is exactly the same as specifying [x1]sounda
Type is the sound description from the manual. Many of these are blank
Link is a playable link for a recording of the sound.

Name	Alias	Type
sound101	sounda
sound102	soundb
sound103	soundc
sound104	soundd
sound105	sounde
sound106	soundf
sound107	soundg
sound108	soundh
sound109	soundi
sound110	soundj
sound111	soundk
sound112	soundl
sound113	soundm
sound114	soundn
sound115	soundo
sound116	soundp
sound117	soundq
sound118	soundr
sound119	soundt
sound120	soundu
sound121	soundv
sound122	soundw
sound123	soundx
sound124	soundy
sound201		phone ringtone
sound202		phone ringtone
sound203		phone ringtone
sound204		phone rings
sound205		phone ringtone
sound206		doorbell
sound207		doorbell
sound208		doorbell
sound209		doorbell
sound301		alarm
sound302		alarm
sound303		alarm
sound304		alarm
sound305		alarm
sound306		alarm
sound307		alarm
sound308		alarm
sound309		alarm
sound310		alarm
sound311		alarm
sound312		alarm
sound313		alarm
sound314		alarm
sound315		alert/emergency
sound316		alert/emergency
sound317		alert/emergency
sound318		alert/emergency
sound401		credit card successful
sound402		credit card successful
sound403		credit card successful
sound404		credit card successful
sound405		credit card successful
sound406		credit card successful
sound407		credit card successful
sound408		successfully swiped the card

SYN-6988 Sound Reference

Speech from Python with the SYN6988 module

I’ve had one of these cheap(ish – $15) sound modules from AliExpress for a while. I hadn’t managed to get much out of it before, but I poked about at it a little more and found I was trying to drive the wrong chip. Aha! Makes all the difference.

So here’s a short narration from my favourite Richard Brautigan poem, read by the SYN6988.

Sensitive listener alert! There is a static click midway through. I edited out the clipped part, but it’s still a little jarring. It would always do this at the same point in playback, for some reason.

The only Pythonish code I could find for these chips was meant for the older SYN6288 and MicroPython (syn6288.py). I have no idea what I’m doing, but with some trivial modification, it makes sound.

I used the simple serial UART connection: RX -> TX, TX -> RX, 3V3 to 3V3 and GND to GND. My board is hard-coded to run at 9600 baud. I used the USB serial adapter that came with the board.

Here’s the code that read that text:

#!/usr/bin/env python3
# -*- coding: utf-8 -*-

import serial
import time

# NB via MicroPython and old too! Also for a SYN6288, which I don't have
# nabbed from https://github.com/TPYBoard/TPYBoard_lib/

def sendspeak(port, data):
    eec = 0
    buf = [0xFD, 0x00, 0, 0x01, 0x01]
    buf[2] = len(data) + 3
    buf += list(bytearray(data, encoding='utf-8'))
    for i in range(len(buf)):
        eec ^= int(buf[i])
    buf.append(eec)
    port.write(bytearray(buf))

ser = serial.Serial("/dev/ttyUSB1", 9600)
sendspeak(ser, "[t5]I like to think [p100](it [t7]has[t5] to be!)[p100] of a cybernetic ecology [p100]where we are free of our labors and joined back to nature, [p100]returned to our mammal brothers and sisters, [p100]and all watched over by machines of loving grace")
time.sleep(8)
ser.close()

This code is bad. All I did was prod stuff until it stopped not working. Since all I have to work from includes a datasheet in Chinese (from here: ??????-SYN6988???TTS????) there’s lots of stuff I could do better. I used the tone and pause tags to give the reading a little more life, but it’s still a bit flat. For $15, though, a board that makes a fair stab at reading English is not bad at all. We can’t all afford vintage DECtalk hardware.

The one thing I didn’t do is used the SYN6988’s Busy/Ready line to see if it was still busy reading. That means I could send it text as soon as it was ready, rather than pausing for 8 seconds after the speech. This refinement will come later, most likely when I port this to MicroPython.

More resources:

Board front image (labelled YS-V6E V1.02)
Board back image
Auto-translated programming manual (thanks, Google Translate!): SYN6988-translated.pdf

[tÉ’k bÉ’ks] â€” a tiny hardware speech synthesizer/TTS

[tÉ’k bÉ’ks]: inside. The observant amongst you will notice that the speech board is 1/10″ further in than it should be for ideal alignment with the USB serial adapter.

Back in the 1980s, the now-defunct Digital Equipment Corporation (â€œDECâ€) sold a hardware speech synthesizer based on Dennis Klatt’s research at MIT.Â These DECTalk boxes were compact and robust, and â€” despite not having the greatest speech quality â€” gave valuable speech, telephone and reading accessibility to many people. Stephen Hawking’s distinctive voice is from a pre-DEC version of the MIT hardware.

DEC is long gone, and the licensing of DECTalk has wandered off into mostly software. Much to the annoyance of those in earshot, I’ve always enjoyed dabbling in speech synthesis. DECTalk hardware remains expensive, partly because of demand from electronic music producers (its vocoder-like burr is on countless tracks), but also because there are still many people who rely on it for daily life. I couldn’t justify buying a real DECTalk, but I found this: the Parallax Emic 2 Text-to-Speech Module.Â For about $80, this stamp-sized board brings a hardware DECTalk implementation to embedded projects.

The Emic 2 is really marketed to microcontroller hobbyists: Make Your Arduino Speak! sorta thing. But I wanted to make a DECTalk-ish hardware box, with serial input, a speaker, and switchable headphone/line jack.Â [tÉ’k bÉ’ks] (a fair approximation of how I pronounce â€œTalk Boxâ€) is the result.

Hardware

Parallax Emic 2 Text-to-Speech Module
OSEPP FTDI USB-Serial BreakoutÂ â€” there are many USB-Serial boards that would do this, but two points in this one’s favour are: i) it has header pins for breadboard use, and ii) I had a spare one.
Small 8Î© speaker element â€” the one I used is most likely a headphone element, bought from Active Surplus (RIP). This should be as small as you can get away with (and still hear) as the USB-Serial connection isn’t designed to supply audio power.
Header pins and sockets
Toggle switch
Small project box with perfboard
Jumper wires and solder

Connections

Emic 2             Serial
======             ======
 GND                GND
 5V                 Vcc
 SOUT               RXD
 SIN                TXD

Emic 2             Speaker
======             =======
 SP-                -
 SP+  (via switch)  +

Using it

You’ll need some kind of serial terminal connection. In a pinch, you can use the serial monitor that is in the Arduino development environment. Either way, identify your serial port (/dev/ttyUSBN, COMN:, or /dev/tty-usbserialNNNN) and find a way to send 9600 baud, 8N1 characters to it. Hit Return, and you should be greeted by the Emic 2’s : prompt (or a ?, followed by :). Whether you get the prompt or not depends on whether local echo is set or not. Either way, try sending this line:

SAll watched over by machines of loving grace.

You should hear a voice say the title of Richard Brautigan’s lovely poem All Watched Over by Machines of Loving Grace (caution: video link contains nekkid hippies). You should get the : prompt back once the the speech has stopped. And that’s all there is to it: send an S, followed by up to 1023 bytes of (basically ASCII) text, followed by a newline, and it will be spoken. There’s more detail, of course, in the Emic 2 documentation and the Emic 2 Epson/Fonix DECTalk 501 User’s Guide for changing voices, etc. Yes, you can make it sing. No, you probably shouldn’t, though.

Notes

The Emic 2 has no serial flow control, so you have to wait until the module stops speaking (or you send it the stop command) before you can send more. The easiest way is to poll the serial port and see if there’s the : prompt waiting. Until you see the prompt, any text you send it may be lost.
The Emic 2 is an embedded device; Unicode is a bit of a stretch. It’s supposed to accept ISO Latin-1 8-bit characters (handy for Spanish mode), though.
Starting every speech line with S may make this board incompatible with assistive technology software such as the JAWS screen reader. I don’t think that this was the goal for Emic 2’s designers (Grand Idea Studio), however.
The output from the audio jack has a fair bit of noise on it, and you need to set the volume quite low to avoid hiss and hum. Your experience may be different, as I may have accidentally made a ground loop. There is a faintlyÂ audible click at the start and end of the text, too.
The Emic 2 uses DECTalk v5 commands and phonemes. Many DECTalk resources on the web (like these songs) use v4 or older, which are subtly incompatible. I haven’t found a reliable conversion protocol yet.

To end, here’s the Emic 2’s â€œDennisâ€ voice reading all of Brautigan’s All Watched Over By Machines of Loving Grace:

(plain link: molg-dennis-140wpm-16khz.mp3)

(even plainer link if you can’t decode MP2 files: molg-dennis-140wpm.mp3)

(recorded and edited for length with Audacity. No hippies â€” nekkid, or otherwise â€” were harmed in the making of this recording.)

something went wrong

At the automatic podcast today, something went very wrong with the announcements. Hear what I mean.

I was playing with flite‘s new voices, and I think the command line went up the chute.

how does he do that?

Someone asked how the automatic podcast works. It’s a bit complex, and they probably will be sorry they asked.

I have all my music saved as MP3s on a server running Firefly Media Server. It stores all its information about tracks in a SQLite database, so I can very easily grab a random selection of tracks.

Since I know the name of the track and the artist from the Firefly database, I have a selection of script lines that I can feed to flite, a very simple speech synthesizer. Each of these spoken lines is stored as as wav file, and then each candidate MP3 is converted to wav, and the whole mess is joined together using SoX. SoX also created the nifty (well, I think so) intro and outro sweeps.

The huge wav file of the whole show is converted to MP3 using LAME and uploaded to my webhost with scp. All of this process is done by one Perl script – it also creates the web page, the RSS feed, and even logs the tracks on Last.fm.

Couldn’t be simpler.

the voice of the Mr. Owns

I am trying to speak to my computer Iâ€™m not sure it understands the two well actually itâ€™s doing not badly Iâ€™m speaking in a rather disjointed manner I have to go to New York state tomorrow Iâ€™m not quite sure why I have to look at twin turbo inns no I donâ€™t have to look at twin turbo ends I have to look at wind turbine whatâ€™s

While itâ€™s remarkably accurate Iâ€™m going to be really mean and bald face and sure I donâ€™t know â€œbold face is that Iâ€™m going to be located anyway, to your things have calmed pear shaped nine

This is the voice of the Mr. Owns and I do I think Iâ€™m going to live. I suppose this is better than I expected especially since Mike accent is unusual most people in Canada do not understand a and so finally I have a computer that understands the this is a bit worrying isnâ€™t it?

Well that wraps it up for dictation. You have a pleasant evening. Good night!

– what Microsoft speech recognition thinks I said. The random “what” and “nine” is me starting to laugh, and “bald face” is “blog this”.