在字符串集合中搜索的最快方式 [英] Fastest way to search in a string collection
问题描述
问题:
我有一个大约 120,000 个用户(字符串)的文本文件
搜索方法将在用户每次更改的文本时发生, TextBox
,结果应为 包含 字符串 TextBox
/ p>
我不需要更改列表,只需拉结果,并将它们放在 ListBox
p>
我到目前为止所尝试的:
容器,我从一个外部文本文件(当然是一次)转储字符串条目:
-
列表< string> allUsers;
-
HashSet< string& allUsers;
使用以下 LINQ 查询:
allUsers.Where(item => item.Contains(textBox_search.Text)) .ToList();
我的搜索事件(当用户更改搜索文本时触发):
private void textBox_search_TextChanged(object sender,EventArgs e)
{
if(textBox_search.Text.Length> 2)
{
listBox_choices.DataSource = allUsers.Where(item => item.Contains(textBox_search.Text))。ToList();
}
else
{
listBox_choices.DataSource = null;
}
}
结果 p>
两者都给我一个很差的响应时间(每次按键之间大约1-3秒)。
问题:
您认为我的瓶颈在哪里?我使用的集合?搜索方法?
如何获得更好的性能和更流畅的功能?
您可以考虑在后台线程中执行过滤任务,在完成后调用回调方法,或者在输入更改后重新启动过滤。
想法是能够使用它像这样:
public partial class YourForm:Form
{
private readonly BackgroundWordFilter _filter;
public YourForm()
{
InitializeComponent();
//设置后台worker返回不超过10个项目,
//并在结果准备好时设置ListBox.DataSource
_filter = new BackgroundWordFilter
(
items:GetDictionaryItems(),
maxItemsToMatch:10,
callback:results =>
this.Invoke(new Action(()=> listBox_choices .DataSource = results))
);
}
private void textBox_search_TextChanged(object sender,EventArgs e)
{
//这将更新后台worker的当前条目
_filter。 SetCurrentEntry(textBox_search.Text);
}
}
粗略草图如下:
public class BackgroundWordFilter:IDisposable
{
private readonly List< string& _items;
private readonly AutoResetEvent _signal = new AutoResetEvent(false);
private readonly Thread _workerThread;
private readonly int _maxItemsToMatch;
private readonly Action< List< string>> _回电话;
private volatile bool _shouldRun = true;
private volatile字符串_currentEntry = null;
public BackgroundWordFilter(
List< string> items,
int maxItemsToMatch,
Action< List< string>> callback)
{
_items = items;
_callback = callback;
_maxItemsToMatch = maxItemsToMatch;
//启动长寿命的背景线程
_workerThread = new Thread(WorkerLoop)
{
IsBackground = true,
Priority = ThreadPriority.BelowNormal
};
_workerThread.Start();
}
public void SetCurrentEntry(string currentEntry)
{
//设置当前条目并指示工作线程
_currentEntry = currentEntry;
_signal.Set();
}
void WorkerLoop()
{
while(_shouldRun)
{
//这里等待,直到有一个新的条目
_signal.WaitOne();
if(!_shouldRun)
return;
var entry = _currentEntry;
var results = new List< string>();
//如果没有要处理,
//返回一个空列表
if(string.IsNullOrEmpty(entry))
{
_callback (结果);
continue;
}
//在for循环中搜索
//允许在当前条目提前终止
//在不同的线程上更改
foreach(var i in _items)
{
//如果匹配,添加到结果列表
if(i.Contains(entry))
results.Add一世);
//检查当前条目是否被更新,
//或者我们找到了足够的项目
if(entry!= _currentEntry || results.Count> = _maxItemsToMatch)
break;
}
if(entry == _currentEntry)
_callback(results);
}
}
public void Dispose()
{
//我们使用AutoResetEvent和后台线程
//必须明确处理
Dispose(true);
}
private void Dispose(bool disposal)
{
if(!
//关闭线程
if(_workerThread.IsAlive)
{
_shouldRun = false;
_currentEntry = null;
_signal.Set();
_workerThread.Join();
}
//如果定位.NET 3.5或更高版本,我们必须
//使用显式IDisposable实现
(_signal as IDisposable).Dispose() ;
}
}
此外,你应该把 _filter
实例,当父 Form
处理时。这意味着您应该打开并编辑表单
的 Dispose
方法(在 YourForm。 Designer.cs
file)看起来像:
// insidexxxxxx.Designer.cs
protected override void Dispose(bool disposal)
{
if(处置)
{
if(_filter!= null)
_filter.Dispose );
//此部分由Visual Studio设计器添加
if(components!= null)
components.Dispose();
}
base.Dispose(disposal);
}
在我的机器上,它的工作速度很快,所以你应该测试和之前要一个更复杂的解决方案。
话虽如此,一个更复杂的解决方案可能是将最后几个结果存储在字典中,然后只有过滤它们,如果结果是新的条目只有最后一个字符的第一个不同。
Problem:
I have a text file of around 120,000 users (strings) which I would like to store in a collection and later to perform a search on that collection.
The search method will occur every time the user change the text of a TextBox
and the result should be the strings that contain the text in TextBox
.
I don't have to change the list, just pull the results and put them in a ListBox
.
What I've tried so far:
I tried with two different collections/containers, which I'm dumping the string entries from an external text file (once, of course):
List<string> allUsers;
HashSet<string> allUsers;
With the following LINQ query:
allUsers.Where(item => item.Contains(textBox_search.Text)).ToList();
My search event (fires when user change the search text):
private void textBox_search_TextChanged(object sender, EventArgs e)
{
if (textBox_search.Text.Length > 2)
{
listBox_choices.DataSource = allUsers.Where(item => item.Contains(textBox_search.Text)).ToList();
}
else
{
listBox_choices.DataSource = null;
}
}
Results:
Both gave me a poor response time (around 1-3 seconds between each key press).
Question:
Where do you think my bottleneck is? The collection I've used? The search method? Both?
How can I get better performance and more fluent functionality?
You could consider doing the filtering task on a background thread which would invoke a callback method when it's done, or simply restart filtering if input is changed.
The general idea is to be able to use it like this:
public partial class YourForm : Form
{
private readonly BackgroundWordFilter _filter;
public YourForm()
{
InitializeComponent();
// setup the background worker to return no more than 10 items,
// and to set ListBox.DataSource when results are ready
_filter = new BackgroundWordFilter
(
items: GetDictionaryItems(),
maxItemsToMatch: 10,
callback: results =>
this.Invoke(new Action(() => listBox_choices.DataSource = results))
);
}
private void textBox_search_TextChanged(object sender, EventArgs e)
{
// this will update the background worker's "current entry"
_filter.SetCurrentEntry(textBox_search.Text);
}
}
A rough sketch would be something like:
public class BackgroundWordFilter : IDisposable
{
private readonly List<string> _items;
private readonly AutoResetEvent _signal = new AutoResetEvent(false);
private readonly Thread _workerThread;
private readonly int _maxItemsToMatch;
private readonly Action<List<string>> _callback;
private volatile bool _shouldRun = true;
private volatile string _currentEntry = null;
public BackgroundWordFilter(
List<string> items,
int maxItemsToMatch,
Action<List<string>> callback)
{
_items = items;
_callback = callback;
_maxItemsToMatch = maxItemsToMatch;
// start the long-lived backgroud thread
_workerThread = new Thread(WorkerLoop)
{
IsBackground = true,
Priority = ThreadPriority.BelowNormal
};
_workerThread.Start();
}
public void SetCurrentEntry(string currentEntry)
{
// set the current entry and signal the worker thread
_currentEntry = currentEntry;
_signal.Set();
}
void WorkerLoop()
{
while (_shouldRun)
{
// wait here until there is a new entry
_signal.WaitOne();
if (!_shouldRun)
return;
var entry = _currentEntry;
var results = new List<string>();
// if there is nothing to process,
// return an empty list
if (string.IsNullOrEmpty(entry))
{
_callback(results);
continue;
}
// do the search in a for-loop to
// allow early termination when current entry
// is changed on a different thread
foreach (var i in _items)
{
// if matched, add to the list of results
if (i.Contains(entry))
results.Add(i);
// check if the current entry was updated in the meantime,
// or we found enough items
if (entry != _currentEntry || results.Count >= _maxItemsToMatch)
break;
}
if (entry == _currentEntry)
_callback(results);
}
}
public void Dispose()
{
// we are using AutoResetEvent and a background thread
// and therefore must dispose it explicitly
Dispose(true);
}
private void Dispose(bool disposing)
{
if (!disposing)
return;
// shutdown the thread
if (_workerThread.IsAlive)
{
_shouldRun = false;
_currentEntry = null;
_signal.Set();
_workerThread.Join();
}
// if targetting .NET 3.5 or older, we have to
// use the explicit IDisposable implementation
(_signal as IDisposable).Dispose();
}
}
Also, you should actually dispose the _filter
instance when the parent Form
is disposed. This means you should open and edit your Form
's Dispose
method (inside the YourForm.Designer.cs
file) to look something like:
// inside "xxxxxx.Designer.cs"
protected override void Dispose(bool disposing)
{
if (disposing)
{
if (_filter != null)
_filter.Dispose();
// this part is added by Visual Studio designer
if (components != null)
components.Dispose();
}
base.Dispose(disposing);
}
On my machine, it works pretty quickly, so you should test and profile this before going for a more complex solution.
That being said, a "more complex solution" would possibly be to store the last couple of results in a dictionary, and then only filter them if it turns out that the new entry differs by only the first of last character.
这篇关于在字符串集合中搜索的最快方式的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!