ESP32 Direct Dialogue Large Language Model AI Voice Assistant

 
Overview
The ESP32-S3 captures audio with an INMP441 microphone, sends the PCM data to a speech-to-text (STT) service, forwards the recognized text to a large language model API as a question, and passes the text reply to a text-to-speech (TTS) service. The synthesized audio is played back through a MAX98357A amplifier, and a TFT touchscreen handles display and interaction.
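As an illustration of the first step in this chain, here is a minimal sketch of turning raw microphone words into 16-bit PCM. It assumes the INMP441's 24-bit samples arrive left-justified in 32-bit I2S slots and that the STT service expects 16-bit samples. The names slotToPcm16 and convertBuffer are hypothetical and are not taken from the project's source, which may handle this differently.

```cpp
#include <cstddef>
#include <cstdint>

// Convert one 32-bit I2S slot to a 16-bit PCM sample.
// Assumption: the 24-bit sample sits in the upper bits of the slot.
// shift = 16 keeps the top 16 bits; a smaller shift adds digital gain,
// so the result is clamped to the int16_t range.
inline int16_t slotToPcm16(int32_t slot, int shift = 16) {
    // Right shift of a negative value is arithmetic on GCC/Clang
    // (and guaranteed by the standard from C++20 on).
    int32_t v = slot >> shift;
    if (v > INT16_MAX) v = INT16_MAX;
    if (v < INT16_MIN) v = INT16_MIN;
    return static_cast<int16_t>(v);
}

// Convert a whole buffer of n slots into n 16-bit samples.
inline void convertBuffer(const int32_t* in, int16_t* out,
                          std::size_t n, int shift = 16) {
    for (std::size_t i = 0; i < n; ++i) {
        out[i] = slotToPcm16(in[i], shift);
    }
}
```

With shift = 14 the signal is four times louder than with shift = 16, at the cost of clipping on loud input, which is why the clamp is there.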
Video link:
https://www.bilibili.com/video/BV1F1421k7Sv/?vd_source=922712da2bcef8666165702c88f19f89
This project grew out of an earlier ESP32 MP3 player. Since the hardware could already play audio, it seemed it could record audio as well, and that idea led to this simple ESP32 audio processing tool.
Both STT (Chinese speech recognition) and TTS (speech synthesis) use the iFlytek API. Register an iFlytek account (https://www.xfyun.cn/) and claim the trial package to use the services for free.
The large language model is the same engine behind Doubao, served by Volcano Engine (https://www.volcengine.com/product/ark). It also requires registering an account and claiming a free trial, which is valid for a limited period.
For all three APIs, obtain the keys from the respective official consoles and fill in the corresponding fields in the code so it can connect. The WiFi connection details in the code also need to be changed to match your network.
The iFlytek services are accessed over WebSocket. TTS is streamed: audio is received and played at the same time. However, the data handling is not rigorous and errors occasionally occur; these are still unresolved, so please check and fix them yourself.
The Volcano Engine API is accessed over HTTP (I could not get Volcano's WebSocket interface to work; my knowledge is limited, so please bear with me). Streaming is not implemented, which wastes response time.
STT also connects over WebSocket but does not use streaming (out of laziness). This matters little for short sentences, but long sentences lose some time here.
As someone with limited expertise, I barely managed to piece together some basic knowledge to realize this idea. The code is offered as a reference for fellow beginners; experts are welcome to laugh it off.
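One place where stricter handling of streamed TTS data can help is the decoding step. The sketch below assumes each WebSocket frame carries a Base64-encoded audio chunk (the project ships a Base64_Arturo library, but the function shown here is a hypothetical stand-in, not that library's API). It rejects malformed input instead of passing garbage to the audio output.

```cpp
#include <cstddef>
#include <cstdint>
#include <string>
#include <vector>

// Map one Base64 character to its 6-bit value, or -1 if it is invalid.
inline int b64Value(char c) {
    if (c >= 'A' && c <= 'Z') return c - 'A';
    if (c >= 'a' && c <= 'z') return c - 'a' + 26;
    if (c >= '0' && c <= '9') return c - '0' + 52;
    if (c == '+') return 62;
    if (c == '/') return 63;
    return -1;
}

// Decode standard padded Base64. Returns false (and leaves `out` empty)
// if the length, alphabet, or padding is wrong.
inline bool decodeBase64(const std::string& in, std::vector<uint8_t>& out) {
    out.clear();
    if (in.size() % 4 != 0) return false;
    std::vector<uint8_t> result;
    result.reserve(in.size() / 4 * 3);
    for (std::size_t i = 0; i < in.size(); i += 4) {
        const bool lastGroup = (i + 4 == in.size());
        int v[4] = {0, 0, 0, 0};
        int pad = 0;
        for (int k = 0; k < 4; ++k) {
            const char c = in[i + k];
            if (c == '=') {
                // '=' is only legal in the last two slots of the final group.
                if (!lastGroup || k < 2) return false;
                ++pad;
            } else {
                if (pad > 0) return false;  // data after padding
                v[k] = b64Value(c);
                if (v[k] < 0) return false;
            }
        }
        const uint32_t triple = (static_cast<uint32_t>(v[0]) << 18) |
                                (static_cast<uint32_t>(v[1]) << 12) |
                                (static_cast<uint32_t>(v[2]) << 6) |
                                static_cast<uint32_t>(v[3]);
        result.push_back(static_cast<uint8_t>((triple >> 16) & 0xFF));
        if (pad < 2) result.push_back(static_cast<uint8_t>((triple >> 8) & 0xFF));
        if (pad < 1) result.push_back(static_cast<uint8_t>(triple & 0xFF));
    }
    out.swap(result);
    return true;
}
```

A caller would skip (or log) any chunk for which decodeBase64 returns false rather than queueing it for playback.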
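For readers who want to add streaming over plain HTTP, here is a minimal sketch of the line-parsing side. It assumes the endpoint can return an OpenAI-style server-sent-events stream, where each chunk arrives as a line beginning with "data:" and the stream ends with "data: [DONE]". That behavior is an assumption to verify against the Volcano Engine documentation, and parseSseLine is a hypothetical helper, not part of the project.

```cpp
#include <cstddef>
#include <string>

enum class SseLine { Ignore, Data, Done };

// Classify one line of a server-sent-events stream.
// On Data, `payload` holds the text after "data:" (typically a JSON chunk).
inline SseLine parseSseLine(const std::string& line, std::string& payload) {
    payload.clear();
    const std::string prefix = "data:";
    // Anything that is not a data field (comments, blank lines) is skipped.
    if (line.compare(0, prefix.size(), prefix) != 0) return SseLine::Ignore;

    std::size_t start = prefix.size();
    if (start < line.size() && line[start] == ' ') ++start;  // one optional space

    std::size_t end = line.size();
    if (end > start && line[end - 1] == '\r') --end;          // tolerate CRLF

    payload = line.substr(start, end - start);
    if (payload == "[DONE]") return SseLine::Done;  // assumed end-of-stream marker
    return SseLine::Data;
}
```

Each Data payload could be handed to the TTS stage as soon as it arrives, instead of waiting for the complete reply.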
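If you want to try streaming the STT upload, the recording can be cut into small frames and sent while recording continues. The sketch below only computes the frame layout. It assumes 16 kHz, 16-bit mono audio in 40 ms frames, and a status convention of 0 = first frame, 1 = middle frame, 2 = final (empty) frame; check these values against the iFlytek documentation before relying on them. AudioFrame, bytesPerFrame and splitIntoFrames are hypothetical names.

```cpp
#include <cstddef>
#include <vector>

struct AudioFrame {
    std::size_t offset;  // byte offset into the PCM buffer
    std::size_t length;  // number of bytes in this frame
    int status;          // assumed: 0 = first, 1 = middle, 2 = last
};

// Bytes in one frame of mono PCM audio.
constexpr std::size_t bytesPerFrame(unsigned sampleRateHz,
                                    unsigned bitsPerSample,
                                    unsigned frameMs) {
    return static_cast<std::size_t>(sampleRateHz / 1000) * frameMs *
           (bitsPerSample / 8);
}

// Split totalBytes of PCM into frames, followed by an empty terminator
// frame with status 2 that tells the server the audio has ended.
inline std::vector<AudioFrame> splitIntoFrames(std::size_t totalBytes,
                                               std::size_t frameBytes = 1280) {
    std::vector<AudioFrame> frames;
    if (totalBytes == 0 || frameBytes == 0) return frames;
    for (std::size_t off = 0; off < totalBytes; off += frameBytes) {
        const std::size_t remaining = totalBytes - off;
        const std::size_t len = remaining < frameBytes ? remaining : frameBytes;
        frames.push_back({off, len, off == 0 ? 0 : 1});
    }
    frames.push_back({totalBytes, 0, 2});
    return frames;
}
```

In a real streaming upload the frames would be sent as the microphone fills them, so recognition can start before the speaker has finished.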
If you intend to replicate this project, please download the source code and compile it for testing first. Only attempt replication if it compiles successfully; otherwise, please reconsider.
The compilation environment is as follows:
Compilation software: Arduino IDE 2.3.2
SDK: ESP32 SDK 2.0.13 (installable from the Boards Manager)
Main libraries used: TFT_eSPI 2.5.43 (installable from the Library Manager),
                     U8g2_for_TFT_eSPI (download address: https://github.com/Bodmer/U8g2_for_TFT_eSPI)
Development board setup:
The attached source code package contains a Base64_Arturo library, which needs to be copied to the libraries/ directory.
On the board, resistors R38 and R39 are the pull-up and pull-down resistors for the GAIN pin of the MAX98357A module. Solder at most one of them, never both at the same time. By default neither is fitted, which leaves GAIN floating.
GAIN pin function:
GAIN is, well, the gain setting. You can have a gain of 3dB, 6dB, 9dB, 12dB or 15dB.

15dB if a 100K resistor is connected between GAIN and GND
12dB if GAIN is connected directly to GND
9dB if GAIN is not connected to anything (this is the default)
6dB if GAIN is connected directly to Vin
3dB if a 100K resistor is connected between GAIN and Vin
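The table above can be expressed as a small lookup, which is handy when documenting which strap option a given board uses. This is a sketch only: GainStrap, gainDb and gainToVoltageRatio are hypothetical names, and the code does not say which of R38 or R39 corresponds to which option.

```cpp
#include <cmath>

// The five ways the GAIN pin can be strapped, per the list above.
enum class GainStrap {
    Res100kToGnd,  // 100K resistor between GAIN and GND
    Gnd,           // GAIN tied directly to GND
    Floating,      // GAIN not connected (default)
    Vin,           // GAIN tied directly to Vin
    Res100kToVin   // 100K resistor between GAIN and Vin
};

// Amplifier gain in dB for a given strap option.
constexpr int gainDb(GainStrap s) {
    return s == GainStrap::Res100kToGnd ? 15
         : s == GainStrap::Gnd          ? 12
         : s == GainStrap::Floating     ? 9
         : s == GainStrap::Vin          ? 6
                                        : 3;
}

// Convert a gain in dB to a linear voltage ratio: 10^(dB / 20).
inline double gainToVoltageRatio(int db) {
    return std::pow(10.0, db / 20.0);
}
```

For example, moving from the floating default (9dB) to GAIN tied to GND (12dB) raises the output voltage by a factor of about 1.41.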

 
 
TFT_eSPI pin settings:
#define ILI9341_DRIVER
#define TFT_WIDTH 320
#define TFT_HEIGHT 240
#define TFT_MISO 19
#define TFT_MOSI 4 // In some display driver board, it might be written as "SDA" and so on.
#define TFT_SCLK 5
#define TFT_CS 16 // Chip Select control pin 5
#define TFT_DC 6 // Data Command control pin
#define TFT_RST -1 // Reset pin (could connect to Arduino RESET pin)
#define TFT_BL 7 // LED back-light
#define TFT_BACKLIGHT_ON HIGH
#define TOUCH_CS 15 // Chip select pin (T_CS) of touch screen
 
 
 
Regarding the components:
ESP32-S3 module: the S3-WROOM-1-N16R8 version. The link I used: https://item.taobao.com/item.htm?spm=a1z09.2.0.0.54402e8dfVHHff&id=675349632310&_u=o2oqo1kf26cd
INMP441 and MAX98357A: both are off-the-shelf modules; you can search for them on Taobao and buy them directly.
The battery plug is a 1.25mm pitch positive connector and supports only a single 3.7V lithium battery. The Type-C interface has a charging function and can charge the lithium battery.
Speaker: the MAX98357A can drive a 3W speaker.
The serial module is a CH340C; note the "C" suffix.
The display is a 3.2-inch touchscreen. Reference link: https://item.taobao.com/item.htm?spm=a1z09.2.0.0.54402e8dfVHHff&id=643516677167&_u=o2oqo1kfcc57
The board has an MPU-6050 gyroscope, originally intended for tracking screen rotation, but it is not used in this project and can be left unsoldered.
For other details, please refer to the video introduction. I generally don't reply to messages, because I don't know how to answer most questions.
If you absolutely must ask, you are more likely to get a reply on Douyin (the Chinese version of TikTok).
 
Update:2026-03-26 20:18:33
