在字符串集合中搜索的最快方式 [英] Fastest way to search in a string collection

查看:147
本文介绍了在字符串集合中搜索的最快方式的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

问题:



我有一个大约 120,000 个用户(字符串)的文本文件



搜索方法将在用户每次更改的文本时发生, TextBox ,结果应为 包含 字符串 TextBox / p>

我不需要更改列表,只需拉结果,并将它们放在 ListBox p>

我到目前为止所尝试的



容器,我从一个外部文本文件(当然是一次)转储字符串条目:


  1. 列表< string> allUsers;

  2. HashSet< string& allUsers;

使用以下 LINQ 查询:



allUsers.Where(item => item.Contains(textBox_search.Text)) .ToList();



我的搜索事件(当用户更改搜索文本时触发):

  private void textBox_search_TextChanged(object sender,EventArgs e)
{
if(textBox_search.Text.Length> 2)
{
listBox_choices.DataSource = allUsers.Where(item => item.Contains(textBox_search.Text))。ToList();
}
else
{
listBox_choices.DataSource = null;
}
}

结果 p>

两者都给我一个很差的响应时间(每次按键之间大约1-3秒)。



问题:



您认为我的瓶颈在哪里?我使用的集合?搜索方法?

解决方案



如何获得更好的性能和更流畅的功能?

您可以考虑在后台线程中执行过滤任务,在完成后调用回调方法,或者在输入更改后重新启动过滤。



想法是能够使用它像这样:

  public partial class YourForm:Form 
{
private readonly BackgroundWordFilter _filter;

public YourForm()
{
InitializeComponent();

//设置后台worker返回不超过10个项目,
//并在结果准备好时设置ListBox.DataSource

_filter = new BackgroundWordFilter

items:GetDictionaryItems(),
maxItemsToMatch:10,
callback:results =>
this.Invoke(new Action(()=> listBox_choices .DataSource = results))
);
}

private void textBox_search_TextChanged(object sender,EventArgs e)
{
//这将更新后台worker的当前条目
_filter。 SetCurrentEntry(textBox_search.Text);
}
}

粗略草图如下:

  public class BackgroundWordFilter:IDisposable 
{
private readonly List< string& _items;
private readonly AutoResetEvent _signal = new AutoResetEvent(false);
private readonly Thread _workerThread;
private readonly int _maxItemsToMatch;
private readonly Action< List< string>> _回电话;

private volatile bool _shouldRun = true;
private volatile字符串_currentEntry = null;

public BackgroundWordFilter(
List< string> items,
int maxItemsToMatch,
Action< List< string>> callback)
{
_items = items;
_callback = callback;
_maxItemsToMatch = maxItemsToMatch;

//启动长寿命的背景线程
_workerThread = new Thread(WorkerLoop)
{
IsBackground = true,
Priority = ThreadPriority.BelowNormal
};

_workerThread.Start();
}

public void SetCurrentEntry(string currentEntry)
{
//设置当前条目并指示工作线程
_currentEntry = currentEntry;
_signal.Set();
}

void WorkerLoop()
{
while(_shouldRun)
{
//这里等待,直到有一个新的条目
_signal.WaitOne();
if(!_shouldRun)
return;

var entry = _currentEntry;
var results = new List< string>();

//如果没有要处理,
//返回一个空列表
if(string.IsNullOrEmpty(entry))
{
_callback (结果);
continue;
}

//在for循环中搜索
//允许在当前条目提前终止
//在不同的线程上更改
foreach(var i in _items)
{
//如果匹配,添加到结果列表
if(i.Contains(entry))
results.Add一世);

//检查当前条目是否被更新,
//或者我们找到了足够的项目
if(entry!= _currentEntry || results.Count> = _maxItemsToMatch)
break;
}

if(entry == _currentEntry)
_callback(results);
}
}

public void Dispose()
{
//我们使用AutoResetEvent和后台线程
//必须明确处理
Dispose(true);
}

private void Dispose(bool disposal)
{
if(!

//关闭线程
if(_workerThread.IsAlive)
{
_shouldRun = false;
_currentEntry = null;
_signal.Set();
_workerThread.Join();
}

//如果定位.NET 3.5或更高版本,我们必须
//使用显式IDisposable实现
(_signal as IDisposable).Dispose() ;
}
}

此外,你应该把 _filter 实例,当父 Form 处理时。这意味着您应该打开并编辑表单 Dispose 方法(在 YourForm。 Designer.cs file)看起来像:

  // insidexxxxxx.Designer.cs 
protected override void Dispose(bool disposal)
{
if(处置)
{
if(_filter!= null)
_filter.Dispose );

//此部分由Visual Studio设计器添加
if(components!= null)
components.Dispose();
}

base.Dispose(disposal);
}



在我的机器上,它的工作速度很快,所以你应该测试和之前要一个更复杂的解决方案。



话虽如此,一个更复杂的解决方案可能是将最后几个结果存储在字典中,然后只有过滤它们,如果结果是新的条目只有最后一个字符的第一个不同。


Problem:

I have a text file of around 120,000 users (strings) which I would like to store in a collection and later to perform a search on that collection.

The search method will occur every time the user change the text of a TextBox and the result should be the strings that contain the text in TextBox.

I don't have to change the list, just pull the results and put them in a ListBox.

What I've tried so far:

I tried with two different collections/containers, which I'm dumping the string entries from an external text file (once, of course):

  1. List<string> allUsers;
  2. HashSet<string> allUsers;

With the following LINQ query:

allUsers.Where(item => item.Contains(textBox_search.Text)).ToList();

My search event (fires when user change the search text):

private void textBox_search_TextChanged(object sender, EventArgs e)
{
    if (textBox_search.Text.Length > 2)
    {
        listBox_choices.DataSource = allUsers.Where(item => item.Contains(textBox_search.Text)).ToList();
    }
    else
    {
        listBox_choices.DataSource = null;
    }
}

Results:

Both gave me a poor response time (around 1-3 seconds between each key press).

Question:

Where do you think my bottleneck is? The collection I've used? The search method? Both?

How can I get better performance and more fluent functionality?

解决方案

You could consider doing the filtering task on a background thread which would invoke a callback method when it's done, or simply restart filtering if input is changed.

The general idea is to be able to use it like this:

public partial class YourForm : Form
{
    private readonly BackgroundWordFilter _filter;

    public YourForm()
    {
        InitializeComponent();

        // setup the background worker to return no more than 10 items,
        // and to set ListBox.DataSource when results are ready

        _filter = new BackgroundWordFilter
        (
            items: GetDictionaryItems(),
            maxItemsToMatch: 10,
            callback: results => 
              this.Invoke(new Action(() => listBox_choices.DataSource = results))
        );
    }

    private void textBox_search_TextChanged(object sender, EventArgs e)
    {
        // this will update the background worker's "current entry"
        _filter.SetCurrentEntry(textBox_search.Text);
    }
}

A rough sketch would be something like:

public class BackgroundWordFilter : IDisposable
{
    private readonly List<string> _items;
    private readonly AutoResetEvent _signal = new AutoResetEvent(false);
    private readonly Thread _workerThread;
    private readonly int _maxItemsToMatch;
    private readonly Action<List<string>> _callback;

    private volatile bool _shouldRun = true;
    private volatile string _currentEntry = null;

    public BackgroundWordFilter(
        List<string> items,
        int maxItemsToMatch,
        Action<List<string>> callback)
    {
        _items = items;
        _callback = callback;
        _maxItemsToMatch = maxItemsToMatch;

        // start the long-lived backgroud thread
        _workerThread = new Thread(WorkerLoop)
        {
            IsBackground = true,
            Priority = ThreadPriority.BelowNormal
        };

        _workerThread.Start();
    }

    public void SetCurrentEntry(string currentEntry)
    {
        // set the current entry and signal the worker thread
        _currentEntry = currentEntry;
        _signal.Set();
    }

    void WorkerLoop()
    {
        while (_shouldRun)
        {
            // wait here until there is a new entry
            _signal.WaitOne();
            if (!_shouldRun)
                return;

            var entry = _currentEntry;
            var results = new List<string>();

            // if there is nothing to process,
            // return an empty list
            if (string.IsNullOrEmpty(entry))
            {
                _callback(results);
                continue;
            }

            // do the search in a for-loop to 
            // allow early termination when current entry
            // is changed on a different thread
            foreach (var i in _items)
            {
                // if matched, add to the list of results
                if (i.Contains(entry))
                    results.Add(i);

                // check if the current entry was updated in the meantime,
                // or we found enough items
                if (entry != _currentEntry || results.Count >= _maxItemsToMatch)
                    break;
            }

            if (entry == _currentEntry)
                _callback(results);
        }
    }

    public void Dispose()
    {
        // we are using AutoResetEvent and a background thread
        // and therefore must dispose it explicitly
        Dispose(true);
    }

    private void Dispose(bool disposing)
    {
        if (!disposing)
            return;

        // shutdown the thread
        if (_workerThread.IsAlive)
        {
            _shouldRun = false;
            _currentEntry = null;
            _signal.Set();
            _workerThread.Join();
        }

        // if targetting .NET 3.5 or older, we have to
        // use the explicit IDisposable implementation
        (_signal as IDisposable).Dispose();
    }
}

Also, you should actually dispose the _filter instance when the parent Form is disposed. This means you should open and edit your Form's Dispose method (inside the YourForm.Designer.cs file) to look something like:

// inside "xxxxxx.Designer.cs"
protected override void Dispose(bool disposing)
{
    if (disposing)
    {
        if (_filter != null)
            _filter.Dispose();

        // this part is added by Visual Studio designer
        if (components != null)
            components.Dispose();
    }

    base.Dispose(disposing);
}

On my machine, it works pretty quickly, so you should test and profile this before going for a more complex solution.

That being said, a "more complex solution" would possibly be to store the last couple of results in a dictionary, and then only filter them if it turns out that the new entry differs by only the first of last character.

这篇关于在字符串集合中搜索的最快方式的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆