aytwartoofyoroo

【ART-Pi】Offline TTS speech synthesis + recognition

Overview

exhibit

Function

1. The baseboard uses a modular, compact layout to save resources and cost.

2. The module works fully offline.

3. The board is a double-layer design. From left to right it carries the interface, power supply, voice module, and power amplifier, together with the voice input microphone and the speaker interface.

4. Supports synthesis of arbitrary Chinese and English text, including mixed Chinese and English reading. Four text encodings can be used: GB2312, GBK, BIG5 and UNICODE. Up to 4K bytes of text can be synthesized at a time.

The chip analyzes the text before synthesis. Common formats such as numeric values, number strings, times, dates, and weights-and-measures symbols are identified and processed according to built-in text matching rules, and the reading of common polyphonic characters is determined from their context. Text that contains both Chinese and English is read as mixed Chinese and English.

5. Supports a voice codec function, so the chip can be used directly for recording and playback. The chip integrates a voice encoding unit and a decoding unit that encode and decode voice. The codec offers a high compression rate, low distortion, and low latency, and supports several encoding and decoding rates. This makes it well suited to digital voice communication, voice storage, and other applications that process voice digitally, such as vehicle-mounted WeChat and command centers.
6. Supports speech recognition of 30 command words. By default the chip is loaded with 30 command words commonly used in automotive, early-warning, and similar industries.
7. Supports three communication interfaces: UART, I2C and SPI. The UART supports four baud rates: 4800 bps, 9600 bps, 57600 bps and 115200 bps. The baud rate is selected through hardware configuration.
8. Supports a variety of control commands, such as text synthesis, stop synthesis, pause synthesis, resume synthesis, status query, enter power-saving mode, and wake up. The controller sends these commands over the communication interface to control the chip. The command set is simple and easy to use.

For example, the chip plays prompt tones and Chinese text through a single unified "synthesis command" interface, and synthesis parameters can also be set through tagged text.
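
As a rough illustration of the text-free control commands in item 8, the sketch below builds the byte frames a controller might send. The 0xFD frame header, the two-byte big-endian length, and every command byte are assumptions recalled from the XFS5152CE user manual, not values stated in this article, so check them against the manual before use. The names `COMMANDS` and `build_control_frame` are hypothetical.

```python
# Assumed values: the frame header and command bytes are recalled from the
# XFS5152CE user manual and are NOT given in this article. Verify before use.
FRAME_HEADER = 0xFD

COMMANDS = {
    "stop": 0x02,
    "pause": 0x03,
    "resume": 0x04,
    "status_query": 0x21,
    "power_saving": 0x88,
    "wake_up": 0xFF,
}


def build_control_frame(name: str) -> bytes:
    """Build a frame for a control command that carries no text payload."""
    data_area = bytes([COMMANDS[name]])
    # Header, then the data-area length as two bytes (high byte first),
    # then the data area itself.
    return bytes([FRAME_HEADER]) + len(data_area).to_bytes(2, "big") + data_area
```

Under these assumptions, `build_control_frame("stop")` yields the four bytes FD 00 01 02, which the controller would write to the UART, I2C, or SPI link.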

Principle

1. A minimal speech synthesis system consists of a controller module, the XFS5152CE chip, a power amplifier module, and a speaker. To use the speech recognition or voice codec function, a microphone must be added to the system.
2. The main controller and the XFS5152CE chip can be connected through the UART, I2C, or SPI interface. The controller sends control commands and text to the XFS5152CE chip over this interface. The chip synthesizes the received text into a voice signal, which is amplified by the power amplifier and played through the speaker.
3. To use speech recognition, the host sends the chip a command that starts recognition. The chip's internal recognition module converts the speech captured by the microphone into a recognition result and returns it to the controller over the communication interface.
4. To use the voice codec function, the communication interface must be UART with the baud rate set to 115200 bps. The host sends the chip a command that starts the codec. The codec module inside the chip then either encodes the captured audio and streams it to the host over UART in real time, or decodes audio data sent by the host and plays it in real time.

5. For the recording function, the microphone module can be designed with reference to the following circuit. The microphone bias voltage MIC_BIAS is output from pin 12 of the chip. The net label MIC in the circuit diagram is the microphone connection point.
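
The interface constraints described in this article (three interfaces, four UART baud rates, and the codec's requirement for UART at 115200 bps) can be captured in a small pre-flight check. This is only a sketch; `check_link_config` and its parameters are hypothetical names, not part of any vendor library.

```python
SUPPORTED_INTERFACES = {"UART", "I2C", "SPI"}
SUPPORTED_UART_BAUD_RATES = {4800, 9600, 57600, 115200}
CODEC_BAUD_RATE = 115200


def check_link_config(interface, baud_rate=None, use_codec=False):
    """Return a list of problems with the chosen link settings (empty if OK)."""
    problems = []
    if interface not in SUPPORTED_INTERFACES:
        problems.append(f"unsupported interface: {interface}")
    if interface == "UART" and baud_rate not in SUPPORTED_UART_BAUD_RATES:
        problems.append(f"unsupported UART baud rate: {baud_rate}")
    # The voice codec streams audio in real time and needs UART at 115200 bps.
    if use_codec and (interface != "UART" or baud_rate != CODEC_BAUD_RATE):
        problems.append("voice codec requires UART at 115200 bps")
    return problems
```

For example, `check_link_config("SPI", use_codec=True)` reports the codec restriction, while `check_link_config("UART", 115200, use_codec=True)` returns an empty list.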

Text markup control

1. How to use text control tags

①The speech synthesis function of the XFS5152CE chip supports a variety of text control tags, which let the user set the synthesis speaker, volume, speaking speed, intonation, and so on.
②A text control tag is generally a lowercase letter followed by an Arabic numeral, enclosed in half-width square brackets ("[]"), for example [m3]. Tags are sent in exactly the same way as text to be synthesized. For details of the communication protocol, see the chapter "8.2.1 Speech Synthesis Command" of the development guide.
③A tag can be sent to the chip on its own as text: sending only "[v3]" sets the synthesis volume to level 3. It can also be sent together with text to be synthesized, for example: "[v3] I'm talking quietly, [v10] I'm talking loudly".
④A tag only applies a setting and is never synthesized into sound. For example, with "[s1] I am slow and leisurely. [s8] I speak quickly", the first sentence is synthesized very slowly and the second very quickly, but "s1" and "s8" are not read aloud.
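
To make the tag format concrete, here is a minimal host-side sketch that separates control tags from the text that would actually be spoken. It only mirrors the format described above (a lowercase letter plus optional digits inside half-width brackets); it is not the chip's own parser, and `split_tags` is a hypothetical helper name.

```python
import re

# A lowercase letter followed by optional digits, inside half-width brackets.
# Digits are optional so that a bare tag such as [d] also matches.
TAG_PATTERN = re.compile(r"\[([a-z])(\d*)\]")


def split_tags(marked_text):
    """Return (tags, spoken_text) for a string containing control tags.

    tags is a list of (letter, digits) pairs in order of appearance;
    spoken_text is the input with every tag removed.
    """
    tags = [(m.group(1), m.group(2)) for m in TAG_PATTERN.finditer(marked_text)]
    spoken_text = TAG_PATTERN.sub("", marked_text)
    return tags, spoken_text
```

Calling `split_tags("[v3]quiet[v10]loud")` gives the tags `("v", "3")` and `("v", "10")` and the spoken text `"quietloud"`, which matches the rule that tags set parameters but are never read out.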

2. Text control tag list

Note:
i. All control tags use half-width characters.
ii. Control tags are sent in the speech synthesis command format and are treated as text to synthesize. That is, the command has the form "frame header + data area length + synthesis command word + text encoding format + control tag text".
iii. Control tags are global. Once a tag has been used, it applies to all text subsequently sent to the chip until the chip is reset or powered off, or until [d] is used to restore the default settings.
iv. When the chip is powered off or reset, previously set tags no longer apply and the chip returns to all default values.
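
The frame layout in note ii can be sketched as follows. The field order and the 4K-byte text limit come from this article; the concrete byte values (0xFD header, 0x01 synthesis command word, and the encoding parameter bytes) and the big-endian two-byte length are assumptions recalled from the XFS5152CE user manual and should be verified there. UNICODE is left out because its byte order is not stated here. `build_synthesis_frame` is a hypothetical name.

```python
# Assumed byte values (not given in this article); verify against the manual.
FRAME_HEADER = 0xFD
CMD_SYNTHESIZE = 0x01
ENCODING_PARAMS = {"gb2312": 0x00, "gbk": 0x01, "big5": 0x02}

# Stated in the article: at most 4K bytes of text per synthesis command.
MAX_TEXT_BYTES = 4 * 1024


def build_synthesis_frame(text, encoding="gb2312"):
    """Build: header + data area length + command word + encoding + text."""
    encoding_param = ENCODING_PARAMS[encoding]
    payload = text.encode(encoding)
    if len(payload) > MAX_TEXT_BYTES:
        raise ValueError("text exceeds the 4K-byte limit for one command")
    data_area = bytes([CMD_SYNTHESIZE, encoding_param]) + payload
    # The length field counts the data area only (command, encoding, text).
    return bytes([FRAME_HEADER]) + len(data_area).to_bytes(2, "big") + data_area
```

Because tags travel as ordinary text, `build_synthesis_frame("[v3]")` produces a normal synthesis frame whose payload is just the four ASCII bytes of the tag.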

Chip integrated beep

1. Information tone list

2. Ringtone prompt list

3. Alarm tone list

Speech recognition command words

Debug

1. After power-on, once the chip has initialized successfully, it automatically returns 0x4A.
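
When debugging, it helps to translate the bytes the chip sends back into readable messages. In the sketch below only 0x4A (initialization succeeded) comes from this article; the other status bytes are assumptions recalled from the XFS5152CE user manual and must be confirmed there. `describe_status` is a hypothetical helper.

```python
STATUS_BYTES = {
    0x4A: "initialization succeeded",           # stated in this article
    0x41: "command frame received correctly",   # assumed, check the manual
    0x45: "command frame not recognized",       # assumed, check the manual
    0x4E: "chip busy",                          # assumed, check the manual
    0x4F: "chip idle",                          # assumed, check the manual
}


def describe_status(value):
    """Return a readable description of one status byte from the chip."""
    return STATUS_BYTES.get(value, f"unknown status 0x{value:02X}")
```

A debug loop could call `describe_status` on each byte read from the serial port and log the result, flagging anything reported as unknown.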

2. Speech synthesis


Video link: https://b23.tv/CCKl3s

3. Prompt tone demonstration


Video link: https://b23.tv/xCVxiR

4. Lightweight recognition of 30 command words


Video link: https://b23.tv/j6ouiL

Note: In this demo the recognized word is broadcast back, but the program can be set to skip that and broadcast a custom answer instead. For example: recognition "turn on the music", answer "OK".
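
The custom-answer idea in the note can be sketched as a simple lookup on the controller side. This assumes the firmware has already turned the chip's recognition result into a phrase; the table contents and the names `CUSTOM_ANSWERS` and `answer_for` are hypothetical.

```python
# Hypothetical table: recognized command phrase -> text to speak in reply.
CUSTOM_ANSWERS = {
    "turn on the music": "OK",
}


def answer_for(recognized_phrase, echo_unknown=True):
    """Pick the reply text for a recognized command phrase.

    Phrases without a custom answer are echoed back (the demo's default
    behaviour) unless echo_unknown is False, in which case nothing is spoken.
    """
    key = recognized_phrase.strip().lower()
    if key in CUSTOM_ANSWERS:
        return CUSTOM_ANSWERS[key]
    return recognized_phrase if echo_unknown else ""
```

The returned string would then be sent to the chip as ordinary text to synthesize.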

Data link

XFS5152CE user manual: http://www.iflytek.com/upload/contents/2014/07/53be5e3ec4047.pdf

iFlytek open platform community: http://bbs.xfyun.cn/portal.php


Update:2025-08-08 15:03:29
