Project - Text 2 Speech Voice Announcer

Uses an XFS5152CE Text to Speech module to create a Voice Announcer.

(bought for just over £10 with free postage from AliExpress)

The script uses Serial2 to connect to the TTS module at 115200 baud.

This leaves Serial(1) free for optionally sending data to be spoken.

The script can read incoming Serial data, and send it suitably pre-packaged via a Queue to the Serial2 port for the TTS module to speak.

(so the Serial Monitor can be used to send messages to speak if wished)

(and yes, this project could be used to add serial speech to any platform)

An alternative mode of use is for the ESP to send pre-programmed phrases.

This needs the user to have some way of selecting an appropriate message.
One or more sensors or switches might trigger a corresponding announcement message embedded in the appropriate 'triggering' subroutines.
Or messages might be saved as text files, eg: "msg1.txt", or "door_alert.txt", etc... to be read and sent to TTS by the appropriate trigger events.

The script offers examples of both triggered by the same gpio0 button - a short press <1s speaks an embedded msg$, a long press loads from file.
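The short/long press decision can be sketched as follows. This is an illustrative Python sketch rather than Annex BASIC; the function name classify_press and the constant LONG_PRESS_MS are made up for the example, and only the 1000 ms threshold comes from the script's 'pressed:' branch.

```python
LONG_PRESS_MS = 1000  # threshold taken from the script's "stop - start < 1000" test


def classify_press(start_ms, stop_ms):
    """Return 'short' or 'long' for a completed press, or None otherwise.

    start_ms is the time the button went low, stop_ms the time it was released.
    """
    if stop_ms <= start_ms:
        # Release not seen yet (or stale timestamps): nothing to announce.
        return None
    return "short" if stop_ms - start_ms < LONG_PRESS_MS else "long"


print(classify_press(5000, 5400))  # a 400 ms press
print(classify_press(5000, 6500))  # a 1500 ms press
```

In the script the 'short' result corresponds to speaking the embedded msg$, and the 'long' result to loading the message from a file.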

Many applications might benefit from voice feedback confirmation announcements - and it offers great potential for creating voice menu projects.
Using I2C port expander(s) makes it possible to hardware-trigger as many individual messages as could realistically be needed.

Message lengths should be less than 4K, otherwise a verbal warning will be issued and the message queue contents will be lost.
If too many messages are sent to the queue, the script as listed below prints "ERROR: Queue is full" and stops (other infringements may be dealt with more severely!).

Inter-connections are: serial TX and RX, 3.3v and GND, and 2 wires for Audio Out, plus a BUSY pin for controlling the message queue.

If you have an LED on pin 13 it will echo the Busy signal on pin 12, but you could change the logic in the 'busy:' subroutine to make it show the inverse instead.
The audio out should be connected to the aux-in of an amplified speaker (eg: a computer speaker, or bluetooth speaker, etc).

You can't expect a hardware chip to match the quality of an online speech service with its huge available libraries, but I doubt you'll find a cheaper, better-sounding hardware alternative any time soon - and that's without even trying to get the speech recognition going.

It is not necessary to know the details, but in case anyone is interested:
Essentially, the module is sent a serial control frame that tells it what to do, which includes 2 bytes containing the length (high and low bytes) of everything that follows. Everything that follows must include the control (Speak) and encoding bytes, plus of course the text to be spoken, but it can also include a variety of optional parameters to change how the speech sounds, and how some words and abbreviations are pronounced.
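For anyone who wants to see the frame layout spelled out, here is a minimal Python sketch (not Annex BASIC). The byte values are the ones the script uses (start 253 / &hFD, synth 1, enc 0); the function name build_frame is invented for the example, and plain ASCII text is assumed.

```python
START_BYTE = 0xFD  # 'strt' in the script
CMD_SPEAK = 0x01   # 'synth' in the script
ENCODING = 0x00    # 'enc' in the script


def build_frame(text):
    """Build one speak frame: start byte, length hi/lo, command, encoding, text."""
    payload = bytes([CMD_SPEAK, ENCODING]) + text.encode("ascii")
    length = len(payload)  # counts everything after the two length bytes
    return bytes([START_BYTE, length >> 8, length & 0xFF]) + payload


frame = build_frame("Hello")
print(frame.hex(" "))  # prints: fd 00 07 01 00 48 65 6c 6c 6f
```

The length here is 7: the 5 characters of "Hello" plus the command and encoding bytes, which is the same "+ 2" that appears in the script's length calculation.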

To make things easier, this project splits the serial frame into a control header, an options block, and the text message to be spoken.
This makes it possible to send just the text message to be spoken, and everything else will be taken care of automatically.

Firstly, the options block is prefixed to the text message, then the combined length value is calculated and fed into the control header bytes.
Then the control header is prefixed to the combined options block plus text message text.
Finally the combined control header + options block + text message is all sent to the TTS module via an interrupt-controlled message queue.
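The pack-then-send sequence above can be sketched in Python (not Annex BASIC) as shown here. It imitates how the script stores each queue item as text: a comma-separated header line followed by "opt=" and "msg=" lines. The function names prepack and unpack_for_wire are made up for the example, and the sketch assumes ASCII text with no line breaks inside the message.

```python
def prepack(opt, msg):
    """Combine options and message into one queue item, as the script's 'prepack:' does."""
    msg = msg + " "                       # the script appends a trailing space
    length = len(opt) + len(msg) + 2      # + 2 for the command and encoding bytes
    header = [0xFD, length >> 8, length & 0xFF, 0x01, 0x00]
    ctl = ",".join(str(b) for b in header)
    return ctl + "\n" + "opt=" + opt + "\n" + "msg=" + msg + "\n"


def unpack_for_wire(qitem):
    """Turn a queue item back into the raw bytes that go to the TTS module."""
    ctl, opt_line, msg_line = qitem.split("\n")[:3]
    header = bytes(int(v) for v in ctl.split(","))  # sent byte by byte, zeros included
    opt = opt_line[len("opt="):]
    msg = msg_line[len("msg="):]
    return header + (opt + msg).encode("ascii")


wire = unpack_for_wire(prepack("[v8]", "Hi"))
print(wire.hex(" "))
```

Keeping the header as decimal numbers in the queue item avoids storing zero bytes inside a string, which is the same reason the script sends the header with serial2.byte.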

Some control header bytes are 0 (zero), which cannot be sent to the serial ports using print or print2 chr$(0), so the serial2.byte instruction is used to send some byte values, followed by print2 for character strings.

Messages are not sent directly to the module. They are pushed onto a FIFO (First In, First Out) circular buffer array, and the next waiting item is only sent when the module's BUSY pin interrupts ESP pin 12 to signal that the module is READY to speak the next message. This prevents a new message from overwriting the current one before it has finished. Adjust buffersize to suit your own needs.
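The queue logic can be sketched in Python (not Annex BASIC) like this. The class and function names are invented for the example; the pointer arithmetic follows the script's qpush and qpull, including the rule that one slot always stays empty, so a buffersize of 7 holds at most 6 messages.

```python
class MessageQueue:
    """Circular FIFO buffer; one slot is kept free to tell 'full' from 'empty'."""

    def __init__(self, buffersize=7):
        self.buf = [""] * buffersize
        self.front = 0
        self.back = 0

    def size(self):
        # Python's % never returns a negative value for a positive modulus.
        return (self.back - self.front) % len(self.buf)

    def push(self, item):
        if self.size() + 1 >= len(self.buf):
            return False                       # queue is full, item rejected
        self.buf[self.back] = item
        self.back = (self.back + 1) % len(self.buf)
        return True

    def pull(self):
        if self.size() < 1:
            return None                        # queue is empty
        item = self.buf[self.front]
        self.front = (self.front + 1) % len(self.buf)
        return item


def dequeue_if_ready(queue, module_ready, send):
    """Send the next waiting item only when the module reports READY."""
    if module_ready and queue.size() > 0:
        send(queue.pull())
        return True
    return False


q = MessageQueue(7)
for n in range(1, 4):
    q.push("message %d" % n)
dequeue_if_ready(q, module_ready=False, send=print)  # busy: nothing is sent
dequeue_if_ready(q, module_ready=True, send=print)   # ready: sends "message 1"
```

In the script, dequeue_if_ready corresponds to the 'dQ:' branch, which is called from the Busy pin interrupt, after every push, and from the timer.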

Timer0 is used to periodically keep checking the queue to ensure no messages remain unspoken because of a missed Busy/Ready interrupt.

Basic:

'TTSQ by Electroguard - Text 2 Speech using XFS5152 TTS module, with hardware Busy pin interrupt to control queued messages with a circular FIFO array
buffersize = 7                             'needs to be 1 more than the max message queue size
dim q$(buffersize)
qitem$ = ""                                 'queue data variable
qsize = 0                                  'size of queue
qfront = 0
qback = 0
serial.mode 115200                    'Hardware serial
serial2.mode 115200,4,5            'Software serial2 TX & RX
led_pin = 13                                'Optional Busy/Ready status LED
pin.mode led_pin, output
busy_pin = 12                             'modules hardware Busy signal
ready = 0                                     'Ready state of Busy pin signal
buttonpin = 0                               'gpio0 button (active low)
pin.mode buttonpin, input, pullup
interrupt buttonpin, pressed
start = 0
stop = 0
pin.mode busy_pin, input, pullup
interrupt busy_pin, busy
onserial serialin
onserial2 serial2in
strt = 253                                     '&hFD start byte for TTS module
lenhi = 0                                 'hi byte of data length
lenlo = 0                                 'lo byte of data length
synth = 1                                'synthesise talk instruction byte
stat = 33                                     '&h21 status query byte
enc = 0                                       'encoder type byte
ctl$ = ""
opt$ = "[d][x1][t5][s6][m51][g2][h2][n1][y1][v8]"      'optional parameter defaults for adjusting speech and text interpretation
msg$ = "sound218 [p500] [m3] [t3] Text 2 Speech Voice Announcer,[p1500]"
gosub speak
msg$ = "Hello, and welcome"       'message to be spoken
gosub speak                                  'send message to voice queue
msg$ = "[m3]yes-hello from me [m54] and me[m53][s8] and don't forget me [m55]me also [m3][p100][s2] and me too[m52][s5][t1] shut up all of you."
gosub speak
msg$ = word$(ip$,1)
msg$ = "[t2][p1000] This I P address is [y1][i1][n1]" + replace$(msg$,"."," dot ")
gosub speak
msg$ = "[m53][s7] [p1300] To send a message to the Voice Announcer, add the required text into the msg dollar variable, then GO-SUB SPEAK."
gosub speak
msg$ = "[m52] this demonstrates that multiple messages can be queued even when the Announcer is still busy speaking."
gosub speak
msg$ = "[p300]Oh, and by the way, Obviously you should remove these introductory messages when they start annoying you. [p600][m51] Have fun."
gosub speak
gosub screen
timer0 2000, dQ
onhtmlreload screen
wait

speak:
L = len(opt$) + len(msg$) + 2
if L >4000 then
qfront = 0
qback = 0
qsize = (qback - qfront) mod buffersize
msg$ = "[p400][m53] ATTENTION. WARNING, the message was ignored because it was too long."
endif
gosub prepack
qpush
gosub dQ
return

prepack:
msg$ = msg$ + " "
L = len(opt$) + len(msg$) + 2
lenhi = L >> 8
lenlo = L and 255
ctl$ = str$(strt) + "," + str$(lenhi) + "," + str$(lenlo)+","+str$(synth)+","+str$(enc) + chr$(10)
print "before prepack, opt$="; opt$;", msg$=";msg$
qitem$ = ctl$ + "opt=" + opt$ + chr$(10) + "msg=" + msg$ + chr$(10)
print "after prepack, opt$="; opt$;", msg$=";msg$;", qitem$=";qitem$
return

dQ:
if pin(busy_pin) = ready then
qitem$ = ""
if qsize > 0 then qpull
if qitem$ <> "" then
for pos = 1 to 5
   serial2.byte val(word$(qitem$,pos,","))
next pos
print "dQ, opt$="; word.getparam$(qitem$,"opt");", msg$=";word.getparam$(qitem$,"msg")
print2 word.getparam$(qitem$,"opt")
print2 word.getparam$(qitem$,"msg")
endif
endif
return

serialin:
msg$ = serial.input$
gosub speak
return

serial2in:
temp$ = serial2.input$
print "Response=";hex$(asc(temp$))
return

busy:
if pin(busy_pin) = ready then
pin(led_pin) = 0
gosub dQ
else
pin(led_pin) = 1
endif
return

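sub qpush
if qsize + 1 >= buffersize then
print "ERROR: Queue is full"                     ' the END on the next line stops the whole script
end
else
q$(qback) = qitem$                               ' Push item to back of q
qback = (qback + 1) mod buffersize      ' Adjust the back pointer
endif
qsize = (qback - qfront + buffersize) mod buffersize      ' + buffersize keeps the count non-negative after qback wraps
end sub

sub qpull
if qsize < 1 then
print "ERROR: Queue is empty"                    ' the END on the next line stops the whole script
end
else
qitem$ = q$(qfront)                                 ' Pull item from front of q
qfront = (qfront + 1) mod (buffersize)      ' Adjust the front pointer
endif
qsize = (qback - qfront + buffersize) mod buffersize      ' + buffersize keeps the count non-negative after qback wraps
end sub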

shortpress:
msg$ = "[x1]sound208 Short button press"
gosub speak
return

longpress:
msg$ = "[x1]sound322 [m53] WARNING, long press, file name not found."
filename$ = "/program/longpress.txt"
if FILE.EXISTS(filename$) > 0 then msg$ = FILE.READ$(filename$)
gosub speak
return

pressed:
if pin(buttonpin) = 0 then start = millis else stop = millis
if stop > start then
if stop - start < 1000 then gosub shortpress else gosub longpress
endif
return

screen:
cls
msg$ = "[m52][t3][s5] Chitchio CB is a Wizard, and Ee lec tro guard rocks"
a$ = |<div style='display: table; margin-right: auto; margin-left: auto; text-align: center; '|
a$ = a$ + |<br><h2> Text 2 Speech Voice Announcer </h2><br>|
a$ = a$ + "opt$ " + textbox$(opt$,"optcss") + |<br><br>|
a$ = a$ + "msg$ " + textbox$(msg$,"msgcss") + |<br><br>|
a$ = a$ + cssid$("optcss", "color:gray; background:ghostwhite; width:800px; text-align: center; font-size: 1.1em;")
a$ = a$ + cssid$("msgcss", "color:darkblue; background: LightYellow ; width:800px; text-align: center; font-size: 1.1em; ")
a$ = a$ + |<button data-var='speak' onclick="connection.send('cmd:immediate'+'msg$='+String.fromCharCode(34)+ _$('msgcss').value + String.fromCharCode(34));cmdButton(this)"> Speak </button>|
a$ = a$ + |<br><br><table style="width:100%;color:gray;text-align: left;">|
a$ = a$ + |<tr><td style="color:blue;">| + "Param" + string$(4," ") + |</td>|
a$ = a$ + |<td style="color:red;">| + "values" + string$(6," ") + |</td>|
a$ = a$ + |<td style="color:orange;">| + "Subject" + string$(24," ") + |</td>|
a$ = a$ + |<td style="width:100%; color:green;">| + " Choices" + |</td></tr>|
a$ = a$ + |<tr><td>|+string$(3," ")+|d</td><td> </td><td>Reset</td><td>Reset parameters back to default values </td></tr>|
a$ = a$ + |<tr><td>|+string$(3," ")+|f</td><td>|+string$(2," ")+|0-1</td><td>stress</td><td>0=stress syllables, 1=normal</td></tr>|
a$ = a$ + |<tr><td>|+string$(3," ")+|h</td><td>|+string$(2," ")+|0-2</td><td>pronunciation</td><td>0=auto, 1=as letters, 2=as words</td></tr>|
a$ = a$ + |<tr><td>|+string$(3," ")+|i</td><td>|+string$(2," ")+|0-1</td><td>phonetics</td><td>0=not recognize phonetic, 1=phonetic recognition</td></tr>|
refresh
pause 300
a$ = a$ + |<tr><td>|+string$(3," ")+|m</td><td>3,51-55</td><td>voices</td><td>3=female 1, 51=male 1, 52=male 2, 53=female 2, 54=Donald Duck, 55=Minnie Mouse</td></tr>|
a$ = a$ + |<tr><td>|+string$(3," ")+|n</td><td>|+string$(2," ")+|0-2</td><td>numbers</td><td>0=auto, 1=digital number, 2=digital val</td></tr>|
a$ = a$ + |<tr><td>|+string$(3," ")+|o</td><td>|+string$(2," ")+|0-1</td><td>zero</td><td>0=zero, 1=oh</td></tr>|
a$ = a$ + |<tr><td>|+string$(3," ")+|p</td><td>|+string$(1," ")+|millis</td><td>pause</td><td>insert pause for specified millis duration</td></tr>|
a$ = a$ + |<tr><td>|+string$(3," ")+|r</td><td>|+string$(2," ")+|0-1</td><td>name policy</td><td>0=auto, 1=mandatory</td></tr>|
a$ = a$ + |<tr><td>|+string$(3," ")+|s</td><td>|+string$(1," ")+|0-10</td><td>speed</td><td>0=slow, 10=fast</td></tr>|
a$ = a$ + |<tr><td>|+string$(3," ")+|t</td><td>|+string$(1," ")+|0-10</td><td>tone</td><td>0=low, 10=high</td></tr>|
a$ = a$ + |<tr><td>|+string$(3," ")+|v</td><td>|+string$(1," ")+|0-10</td><td>volume</td><td>0=quiet, 10=loud</td></tr>|
a$ = a$ + |<tr><td>|+string$(3," ")+|x</td><td>sound<i>nnn</i></td><td>sound effects</td><td>0=off, 1=on, sound101-125, ringtones=sound201-225, alarms=sound301-330</td></tr>|
a$ = a$ + |<tr><td>|+string$(3," ")+|y</td><td>|+string$(2," ")+|0-1</td><td>one</td><td>0=as unitary, 1=as one</td></tr>|
a$ = a$ + |<tr><td>|+string$(3," ")+|z</td><td>|+string$(2," ")+|* #</td><td>rhythm break</td><td>0=read as characters, 1=use * to break word rhythm, # to pause</td></tr>|
a$ = a$ + |</table><br><i style="color: dimgrey; text-align: center;">(This is only a rough 'crib sheet' interpretation offered to make 'on the fly' parameter evaluations a bit handier, so don't take the info as gospel)</i></div>|
html a$
return

'---------------- end -----------------

You may prefer to change some of the default parameter settings contained in the opt$ variable, but they should do to get you started.
The script has examples showing how the parameters can also be inserted anywhere in the text to tailor how something is spoken or interpreted.
In general, they are a single lower case letter followed by a number (from 0 to whatever) enclosed within square brackets. Most applied parameter changes take effect until changed again, but the pause [pmillis] is a special case allowing a one-off pause of the specified duration to be inserted.
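To illustrate the [letter + number] pattern, here is a small Python sketch (not Annex BASIC) that builds parameter tags and prefixes them to a message. The helper names tag and with_params are invented for the example; the letters and values used are ones that already appear in the script's messages.

```python
def tag(letter, value=""):
    """Build one bracketed parameter, e.g. tag('s', 5) gives '[s5]'."""
    if len(letter) != 1 or not letter.islower():
        raise ValueError("parameter must be a single lower case letter")
    return "[" + letter + str(value) + "]"


def with_params(text, **params):
    """Prefix text with one tag per keyword argument, in the order given."""
    return "".join(tag(k, v) for k, v in params.items()) + text


print(tag("d"))                            # reset to defaults, no number needed
print(tag("p", 500))                       # one-off 500 ms pause
print(with_params("Hello", m=53, s=7))     # voice 53 at speed 7
```

Because most parameters stay in force until changed again, a message that needs a different voice for one phrase only has to switch back afterwards with another tag.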

Although it isn't an operational requirement, there is a web page available on the Output browser page.

This offers the opportunity to change the message text and any parameters on the fly - simply edit msg$ then click the Speak button.

So you can add any [parameter] you wish and quickly try out all its available values (just by editing its number) to hear what difference they make.

Sounds [x1] has been turned on by default, so you can just enter 'sound' + an appropriate 3 digit number to play the appropriate sound effect.
There are 3 groups - sound101 to sound125 are general noises, sound201 to sound225 are called ring tones, and sound301 to sound330 are alarms.
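As a quick way of keeping the three ranges straight, here is a Python sketch (not Annex BASIC) that checks a sound number against the groups listed above before building the text to send. The names SOUND_GROUPS, sound_group and sound_text are invented for the example, and the ranges are simply the ones quoted in this article.

```python
SOUND_GROUPS = {
    "general": range(101, 126),    # sound101 to sound125
    "ring tone": range(201, 226),  # sound201 to sound225
    "alarm": range(301, 331),      # sound301 to sound330
}


def sound_group(number):
    """Return the group name for a sound number, or None if it is out of range."""
    for name, numbers in SOUND_GROUPS.items():
        if number in numbers:
            return name
    return None


def sound_text(number):
    """Return the text that triggers the sound, assuming [x1] is already on."""
    if sound_group(number) is None:
        raise ValueError("no such sound: %d" % number)
    return "sound%d" % number


print(sound_group(218), sound_text(218))
print(sound_group(322), sound_text(322))
```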

A 'heads-up' regarding the web page textbox$ components:
When editing the screen contents of a Basic textbox$, the textbox$ variable will only be updated with the edits after pressing carriage return.
So if you edit the screen contents of the opt$ textbox window then click the Speak button before pressing enter on your edits, the variable contents will not yet have been updated, so your edits will have no effect... you need to Enter your edits first.
Except - in the case of the msg$ textbox, CiccioCB has provided a javascript snippet to create a web button component (not a Basic button) which will automatically update the msg$ variable when the button is clicked without needing to press carriage return first.
So if you have not been confused by all of that, you will now know of a good javascript button snippet - but otherwise, just remember to press the enter button after editing any textbox contents.

I've added a 'best guess' crib sheet of translated parameters to help make things a bit easier - but there is much scope for learning.
A look through the 'better-than-nothing' documentation in the iforce2d links (see below) shows there is still much in-built functionality waiting to be utilised by intrepid explorers, including voice recognition with built-in commands, and recording of your own codecs (it has an input for an onboard mic!).

The module also has capability for i2c or SPI operation, and also an intriguing unmarked switch (which must do something, but who knows what).

A few quirks I've noticed:

The switch speed setting options printed on the module don't show what the 'All Off' defaults are (but change them anyway to 'Off, On, Off, On').

Some words when spelled correctly do not pronounce correctly, so experiment with 'wrong' spellings until you get it to sound how you want it to.

I couldn't persuade the module to pronounce "." as 'dot', so I ripped the dots out and replaced them with " dot " which does pronounce correctly.

msg$ = "[t2][p1000] This I P address is [y1][i1][n1]" + replace$(msg$,"."," dot ")

The 'serialin:' branch should grab all incoming serial(1) text and drop it straight into the msg$ variable for normal 'speak:' processing, but I noticed that if it included a [bracketed] parameter at the front of the serial string, the leading bracket was dropped... but I didn't want to spoil all the fun, so have left it as room for improvement.

I became aware of this module after stumbling across a very handy video from iforce2d that I 'liked', and which contains useful links, details, and a sample of audio to listen to:
https://www.youtube.com/watch?v=kuBG0U6X7Jw&t=46s

TTSQ.zip (3k), Margaret Baker, May 17, 2018, 2:33 PM, v.1