4201 views|10 replies

63

Posts

0

Resources
The OP

How to implement fast search in embedded devices [Copy link]

Prerequisite: embedded device, CPU400, each record content is tagData, there are 1 million or more records, requirement: how to complete the search within 1 second according to the keyword search szText field, and return the records that meet the requirements. No database involved, the data is stored in the file itself. typedef struct tagData{ int iTextLen; int iOtherLen; char* szText; //Keyword char* szTextSpell; //Pinyin char* szDetails; }Data; I stored the data in sequence according to Data, and built an index table according to szText (mostly 1:1). Although the speed is satisfied, it also leads to new problems. If I want to search a field in the Data structure (such as szTextSpell), I need to rebuild an index, so the capacity will definitely not meet the requirements. I have thought about it for a long time, but I still can't get a more perfect method. If any prawn has come into contact with this problem, please guide me, or give me an idea, I will be very grateful.


This post is from Embedded System

Latest reply

If you want to reduce the 1:1 index space further, you can try to introduce a "compression" algorithm, but I haven't studied it in detail. If you are the same, you have to learn and apply it from the beginning. Pinyin search and Chinese character keywords can be linked, but what if you want to add other keyword indexes? Because what you asked is "If I need to search a certain field in the Data structure (such as szTextSpell), I need to rebuild an index", then I can't consider the special relationship between Pinyin and keywords. It's better to use tree sorting honestly. Although it consumes some resources, it only needs to solve the speed problem.  Details Published on 2008-6-25 22:27

70

Posts

0

Resources
2
Let's talk about how the first index table is built. According to the current situation, we can only optimize the first index table to put down multiple index tables. Generally speaking, it is not necessary to build a 1:1 index table.
This post is from Embedded System

86

Posts

0

Resources
3
to 91program: I have seen others do fast retrieval, and their data volume is the same as my real data. I am sure they have also done indexing, but it seems to have optimized the real data after indexing. Of course, their indexing method is definitely better than mine, at least the data volume is smaller than mine. Since someone can do it, there must be a way. I am stupid and have never thought of a better way, so I hope someone can help "crack" it.
This post is from Embedded System

67

Posts

0

Resources
4
This is a problem that combines data structure and optimization algorithm. It is challenging. This is the first time I encounter it. Thinking...
This post is from Embedded System

73

Posts

0

Resources
5
typedef struct tagData{ int iTextLen; int iOtherLen; char* szText; //Keyword char* szTextSpell; //Pinyin char* szDetails; }Data; When my colleague did sorting before, the structure was fixed, there was no pointer type. All were limited in size. For example, char szText[20]; All were processed in BYTE format. I am not sure how it was done. It seems that tree sorting was used. The speed is very fast. (Used in GPS navigation to retrieve national data)
This post is from Embedded System

66

Posts

0

Resources
6
I'd like to make a suggestion and let's discuss it. Some principles of database must be used here. Since you used szText as the keyword when you created the index for the first time, you can also use szText and szTextspell as the primary key to create the index as needed, and then search. The speed will definitely be better than rebuilding an index table. Of course, this method is static after all and doesn't make sense. There is another way, which is to dynamically modify the index table. My idea is to first add two items to your structure typedef struct tagData{ int iTextLen; int iOtherLen; char* szText; //Keyword char* szTextSpell;//Pinyin char* szDetails; Data *front; Date *next; }Data; This is equivalent to adding two pointers, so that a linked list can be created in the memory. Using a linked list to search is definitely faster than reading the file directly. Then create an index. Create an index based on the keyword. When the keyword changes, you can add, modify, and delete it from the original index table. These steps are similar to GROUP in database operations BY operation, modify and refresh the index table as needed at any time.
This post is from Embedded System

69

Posts

0

Resources
7
CPU400, a total of 1 million or more records, complete the search keyword char* Text within 1 second. This task is quite severe. Excluding the time of reading data, 1 million records are assumed to be in memory, and it is just a memory search. If the Text can be listed according to each letter, then if your Text is 6 bits long, there is no need to traverse, just take it directly from the linked list, and then take the next letter from the linked list. This will become a fork tree. This should meet the requirements. However, considering the speed of reading so much data, it may be necessary to separate these index information from the real data. As for how to get the data directly, it depends on your own way. It's just an idea... I hope it will be useful to you.

This post is from Embedded System

54

Posts

0

Resources
8
No matter how to manage 1 million data, there are at least 1 million records after all. The fastest way is to change more to less, change big to small, change traversal to direct retrieval, or the least retrieval operation. Then the space will be wasted and you must use a method to exchange space for speed. Then you will need to do more in managing and maintaining data. Regarding sorting, you may need to compromise here, and maybe only partial sorting can be performed. See if others have good ideas...
This post is from Embedded System

74

Posts

0

Resources
9
Space for time, time for space. It is difficult to balance the two. Assuming that each keyword is indexed and saved in a file, it will take up more storage space. When the indexing actually occurs, it needs to be read into the memory (otherwise, the speed cannot be met), so the memory requirement is a key point. Simply put, what is the acceptable indexing time and storage and memory space for your project? It also includes your work platform, CPU model, main frequency, FLASH, SDRAM speed, etc. Ask an experienced friend to help identify whether it can be achieved. If it is far beyond the technical limit, it is meaningless to discuss the algorithm here.
This post is from Embedded System

80

Posts

0

Resources
10
TO dthxman: If you want to create an index, it must have been created first and then stored in the file. Of course, your keywords are used to directly determine which part of the index data to read, which reduces the amount of data to read and the number of comparisons. TO slyzhang: Using space for speed can be done in this way, and I have also achieved it. The index is 1:1 in size after creation, but I am amazed when I see others optimize the storage space on this basis!~ TO shuiyan: This can definitely meet the requirements, because lenux mentioned that it can be achieved, and I also know that others have achieved it. So I have always wanted to know how to do it. Another point is that using pinyin search and keywords must establish the connection between Chinese characters and pinyin, so you only need to share one index.

  

This post is from Embedded System

78

Posts

0

Resources
11
If you want to reduce the 1:1 index space further, you can try to introduce a "compression" algorithm, but I haven't studied it in detail. If you are the same, you have to learn and apply it from the beginning. Pinyin search and Chinese character keywords can be linked, but what if you want to add other keyword indexes? Because what you asked is "If I need to search a certain field in the Data structure (such as szTextSpell), I need to rebuild an index", then I can't consider the special relationship between Pinyin and keywords. It's better to use tree sorting honestly. Although it consumes some resources, it only needs to solve the speed problem.
This post is from Embedded System

Guess Your Favourite
Find a datasheet?

EEWorld Datasheet Technical Support

Related articles more>>

    EEWorld
    subscription
    account

    EEWorld
    service
    account

    Automotive
    development
    circle

    Robot
    development
    community

    Copyright © 2005-2025 EEWORLD.com.cn, Inc. All rights reserved 京B2-20211791 京ICP备10001474号-1 电信业务审批[2006]字第258号函 京公网安备 11010802033920号
    快速回复 返回顶部 Return list