How to implement fast search in embedded devices

xiaohei85

How to implement fast search in embedded devices [Copy link]

Prerequisite: embedded device, CPU400, each record content is tagData, there are 1 million or more records, requirement: how to complete the search within 1 second according to the keyword search szText field, and return the records that meet the requirements. No database involved, the data is stored in the file itself. typedef struct tagData{ int iTextLen; int iOtherLen; char* szText; //Keyword char* szTextSpell; //Pinyin char* szDetails; }Data; I stored the data in sequence according to Data, and built an index table according to szText (mostly 1:1). Although the speed is satisfied, it also leads to new problems. If I want to search a field in the Data structure (such as szTextSpell), I need to rebuild an index, so the capacity will definitely not meet the requirements. I have thought about it for a long time, but I still can't get a more perfect method. If any prawn has come into contact with this problem, please guide me, or give me an idea, I will be very grateful.

chiwawa

Let's talk about how the first index table is built. According to the current situation, we can only optimize the first index table to put down multiple index tables. Generally speaking, it is not necessary to build a 1:1 index table.

zserfv8210

to 91program: I have seen others do fast retrieval, and their data volume is the same as my real data. I am sure they have also done indexing, but it seems to have optimized the real data after indexing. Of course, their indexing method is definitely better than mine, at least the data volume is smaller than mine. Since someone can do it, there must be a way. I am stupid and have never thought of a better way, so I hope someone can help "crack" it.

winner123

This is a problem that combines data structure and optimization algorithm. It is challenging. This is the first time I encounter it. Thinking...

zjgczx

typedef struct tagData{ int iTextLen; int iOtherLen; char* szText; //Keyword char* szTextSpell; //Pinyin char* szDetails; }Data; When my colleague did sorting before, the structure was fixed, there was no pointer type. All were limited in size. For example, char szText[20]; All were processed in BYTE format. I am not sure how it was done. It seems that tree sorting was used. The speed is very fast. (Used in GPS navigation to retrieve national data)

k275868177

I'd like to make a suggestion and let's discuss it. Some principles of database must be used here. Since you used szText as the keyword when you created the index for the first time, you can also use szText and szTextspell as the primary key to create the index as needed, and then search. The speed will definitely be better than rebuilding an index table. Of course, this method is static after all and doesn't make sense. There is another way, which is to dynamically modify the index table. My idea is to first add two items to your structure typedef struct tagData{ int iTextLen; int iOtherLen; char* szText; //Keyword char* szTextSpell;//Pinyin char* szDetails; Data *front; Date *next; }Data; This is equivalent to adding two pointers, so that a linked list can be created in the memory. Using a linked list to search is definitely faster than reading the file directly. Then create an index. Create an index based on the keyword. When the keyword changes, you can add, modify, and delete it from the original index table. These steps are similar to GROUP in database operations BY operation, modify and refresh the index table as needed at any time.

ccqingzhi

CPU400, a total of 1 million or more records, complete the search keyword char* Text within 1 second. This task is quite severe. Excluding the time of reading data, 1 million records are assumed to be in memory, and it is just a memory search. If the Text can be listed according to each letter, then if your Text is 6 bits long, there is no need to traverse, just take it directly from the linked list, and then take the next letter from the linked list. This will become a fork tree. This should meet the requirements. However, considering the speed of reading so much data, it may be necessary to separate these index information from the real data. As for how to get the data directly, it depends on your own way. It's just an idea... I hope it will be useful to you.

dw772

No matter how to manage 1 million data, there are at least 1 million records after all. The fastest way is to change more to less, change big to small, change traversal to direct retrieval, or the least retrieval operation. Then the space will be wasted and you must use a method to exchange space for speed. Then you will need to do more in managing and maintaining data. Regarding sorting, you may need to compromise here, and maybe only partial sorting can be performed. See if others have good ideas...

houcs01

Space for time, time for space. It is difficult to balance the two. Assuming that each keyword is indexed and saved in a file, it will take up more storage space. When the indexing actually occurs, it needs to be read into the memory (otherwise, the speed cannot be met), so the memory requirement is a key point. Simply put, what is the acceptable indexing time and storage and memory space for your project? It also includes your work platform, CPU model, main frequency, FLASH, SDRAM speed, etc. Ask an experienced friend to help identify whether it can be achieved. If it is far beyond the technical limit, it is meaningless to discuss the algorithm here.

hzhanhai

TO dthxman: If you want to create an index, it must have been created first and then stored in the file. Of course, your keywords are used to directly determine which part of the index data to read, which reduces the amount of data to read and the number of comparisons. TO slyzhang: Using space for speed can be done in this way, and I have also achieved it. The index is 1:1 in size after creation, but I am amazed when I see others optimize the storage space on this basis!~ TO shuiyan: This can definitely meet the requirements, because lenux mentioned that it can be achieved, and I also know that others have achieved it. So I have always wanted to know how to do it. Another point is that using pinyin search and keywords must establish the connection between Chinese characters and pinyin, so you only need to share one index.

shawkle2000

If you want to reduce the 1:1 index space further, you can try to introduce a "compression" algorithm, but I haven't studied it in detail. If you are the same, you have to learn and apply it from the beginning. Pinyin search and Chinese character keywords can be linked, but what if you want to add other keyword indexes? Because what you asked is "If I need to search a certain field in the Data structure (such as szTextSpell), I need to rebuild an index", then I can't consider the special relationship between Pinyin and keywords. It's better to use tree sorting honestly. Although it consumes some resources, it only needs to solve the speed problem.

How to implement fast search in embedded devices [Copy link]

Latest reply