搜索大量大文本的最快方法 [英] fastest way to search huge list of big texts

查看:73
本文介绍了搜索大量大文本的最快方法的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个用 C# 编写的 Windows 应用程序,它需要从数据库加载 250,000 行并提供键入时搜索"功能,这意味着只要用户在文本框中键入内容,应用程序就需要搜索所有 250,000记录(顺便说一句,单列,每行 1000 个字符)使用 like 搜索并显示找到的记录.

I have a windows application written in C# that needs to load load 250,000 rows from database and provide a "search as you type" feature which means as soon as user types something in a text box, the application needs to search all 250,000 records (which are btw, single column with 1000 characters each row) using like search and display the found records.

我采用的方法是:

1- 应用程序将所有记录加载到类型化的 List

1- The application loads all the records into a typed List<EmployeeData>

while (objSQLReader.Read())
{
    lstEmployees.Add(new EmployeesData(
        Convert.ToInt32(objSQLReader.GetString(0)),
        objSQLReader.GetString(1), 
        objSQLReader.GetString(2)));
}

2- 在 TextChanged 事件中,使用 LINQ,我搜索(结合正则表达式)并将 IEnumerable 附加到处于虚拟模式的 ListView.

2- In TextChanged event, Using LINQ, I search (with combination of Regular Expression) and attach the IEnumerable<EmployeesData> to a ListView which is in Virtual Mode.

String strPattern = "(?=.*wood*)(?=.*james*)";
    IEnumerable<EmployeesData> lstFoundItems = from objEmployee in lstEmployees
    where Regex.IsMatch(Employee.SearchStr, strPattern, RegexOptions.IgnoreCase)
    select objEmployee;
    lstFoundEmployees = lstFoundItems;

3- 处理 RetrieveVirtualItem 事件以在 ListView 中显示项目以显示项目.

3- RetrieveVirtualItem event is handled to display items in ListView to display the item.

e.Item = new ListViewItem(new String[] { 
    lstFoundEmployees.ElementAt(e.ItemIndex).DateProjectTaskClient, 
    e.ItemIndex.ToString() });

尽管 lstEmployees 从 SQL Server 加载列表的加载速度相对较快(1.5 秒),但要在 TextChanged 上进行搜索,使用 LINQ 进行搜索需要 7 多分钟.通过执行 LIKE 搜索直接通过 SQL Server 进行搜索只需不到 7 秒.

Though the lstEmployees is loaded relatively fast (1.5 seconds) for loading the list from SQL Server, to search on TextChanged, it takes more than 7 minutes to search using LINQ. Searching thru SQL Server directly by performing a LIKE search takes less than 7 seconds.

我在这里做错了什么?我怎样才能使这个搜索更快(不超过 2 秒)?这是我客户的要求.因此,任何帮助都受到高度赞赏.请帮助...

What am I doing wrong here? How can I make this search faster (not more 2 seconds)? This is a requirement from my client. So, any help is highly appreciated. Please Help...

推荐答案

存储文本数据的数据库列是否有索引?如果是这样,类似于 Nicholas 描述的 trie 结构正在使用.SQL Server 中的索引是使用 B+ 树 实现的,它具有a 以 n 的对数为底为 2 的平均搜索时间,其中 n 是树的高度.这意味着,如果表中有 250,000 条记录,则搜索所需的操作数是对数基数 2 ( 250,000 ) 或大约 18 次操作.

Does the database column that stores the text data have an index on it? If so, something similar to the trie structure that Nicholas described is already in use. Indexes in SQL Server are implemented using B+ trees, which have a an average search time on the order of log base 2 of n, where n is the height of the tree. This means that if you have 250,000 records in the table the number of operations required to search are log base 2 ( 250,000 ) or approximately 18 operations.

当您将所有信息加载到数据读取器中,然后使用 LINQ 表达式时,这是一个线性运算 (O) n,其中 n 是列表的长度.所以最坏的情况是 250,000 次操作.如果您使用 DataView,将有可用于帮助搜索的索引,这将大大提高性能.

When you load all of the information into a data reader and then use a LINQ expression it's a linear operation, (O) n, where n is the length of the list. So worst case, it's going to be 250,000 operations. If you use a DataView there will be indexes that can be used to help with searching, which will drastically improve performance.

在一天结束时,如果不会针对数据库服务器提交太多请求,则利用查询优化器来执行此操作.只要 LIKE 操作没有在字符串的前面使用通配符(即 LIKE %some_string)(否定使用索引)并且表上有一个索引,你就会有非常快的性能.如果有太多请求将提交到数据库服务器,要么将所有信息放入 DataView 以便索引可以使用,要么使用上面 Tim 建议的字典,它的搜索时间为 O(1)(以 1 为单位),假设字典是使用哈希表实现的.

At the end of the day if there will not be too many requests submitted against the database server leverage the query optimizer to do this. As long as the LIKE operation isn't performed with a wildcard at the front of the string (i.e. LIKE %some_string) (negates the use of an index) and there is an index on the table you will have really fast performance. If there are just too many requests that will be submitted to the database server, either put all of the information into a DataView so an index can be used, or use a dictionary as Tim suggested above, which has a search time of O(1) (on the order of one), assuming the dictionary is implemented using a hash table.

这篇关于搜索大量大文本的最快方法的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆