Best container for searching large data in a C# WinForms application

Problem description

Can somebody suggest a container that:

1) can search large data in the least amount of time while using the least amount of memory, and
2) can also search the data by its content?

Until now I have been storing my data in a string array whose element count is 1,000,000,000.

Thanks in advance.

Recommended answer

This isn't a question we can answer: it depends far too much on your data, how you have it organised, and what you are trying to find. And remember that we can't see your screen, access your HDD, or read your mind!

If you are looking for the whole string, then that's relatively simple: you could use a hash table of some form, or a sorted list and a binary chop (binary search) to find the one you need.
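
As a rough illustration of those two options, here is a minimal sketch using only standard .NET collections. The class and method names (`ExactLookup`, `BuildHashSet`, `BuildSorted`, `ContainsSorted`) are made up for this example, and it assumes the data fits in memory, which is unlikely to hold for a billion strings.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper for illustration; none of these names come from the thread.
public static class ExactLookup
{
    // Option 1: hash-based lookup. Average O(1) per search,
    // but the hash table itself costs extra memory.
    public static HashSet<string> BuildHashSet(IEnumerable<string> items)
    {
        return new HashSet<string>(items, StringComparer.Ordinal);
    }

    // Option 2: sorted array. No extra table, O(log n) per search.
    public static string[] BuildSorted(IEnumerable<string> items)
    {
        var sorted = new List<string>(items).ToArray();
        Array.Sort(sorted, StringComparer.Ordinal);
        return sorted;
    }

    public static bool ContainsSorted(string[] sorted, string key)
    {
        // Array.BinarySearch returns a negative number when the key is absent.
        // The comparer must match the one used for sorting.
        return Array.BinarySearch(sorted, key, StringComparer.Ordinal) >= 0;
    }
}
```

The hash set trades memory for speed, while the sorted array stays compact but needs a one-time sort; which one suits better depends on how often the data changes and how tight memory is.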

If you are looking for a substring or substrings, then that's a whole new kettle of fish, and you probably want to start indexing words, or similar, and using those indexes to cut down on the number of comparisons you need to do.
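
One way to picture "indexing words" is an inverted index that maps each word to the entries containing it. The sketch below is only an assumption about what such an index might look like: `WordIndex`, its separator list, and the case-insensitive matching are illustrative choices, not something the answer specifies. It finds whole words only, not arbitrary substrings.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical inverted index: word -> numbers of the entries that contain it.
public sealed class WordIndex
{
    private static readonly char[] Separators = { ' ', '\t', ',', '.', ';', ':', '!', '?' };

    private readonly Dictionary<string, List<int>> _postings =
        new Dictionary<string, List<int>>(StringComparer.OrdinalIgnoreCase);

    // Splits the text into words and records the entry number under each word.
    public void Add(int entryNumber, string text)
    {
        foreach (var word in text.Split(Separators, StringSplitOptions.RemoveEmptyEntries))
        {
            if (!_postings.TryGetValue(word, out var entries))
            {
                entries = new List<int>();
                _postings[word] = entries;
            }
            // Skip the duplicate when a word repeats within the same entry.
            if (entries.Count == 0 || entries[entries.Count - 1] != entryNumber)
                entries.Add(entryNumber);
        }
    }

    // Returns the entry numbers for a word, or an empty list if it was never seen.
    public IReadOnlyList<int> Find(string word)
    {
        return _postings.TryGetValue(word, out var entries)
            ? entries
            : (IReadOnlyList<int>)Array.Empty<int>();
    }
}
```

A search then compares against only the entries listed for the word instead of scanning every string, which is the reduction in comparisons the answer refers to.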

But frankly, there are as many efficient ways to do this as there are different possibilities for exactly what you want to do!
So sit back, think about your data and your task in more detail, and try to work out exactly what you are trying to do - from that you might at least get some ideas to let you ask a more specific question. And it needs to be a lot more specific before we can help!


From your question, it seems that you have 1 000 000 000 strings, which is really a lot. It does not make much sense to load that much data into memory at once.

Thus you should work with one part of the data at a time, so that only a fraction of it is in memory.
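
A minimal sketch of that idea, assuming (the thread does not say) that the strings live in a text file with one entry per line. `StreamingSearch.FindLines` is a hypothetical name; `File.ReadLines` reads lazily, so only the current line is held in memory.

```csharp
using System;
using System.Collections.Generic;
using System.IO;

// Hypothetical helper: scans a file without loading it all into memory.
public static class StreamingSearch
{
    // Lazily yields the zero-based numbers of the lines that contain `term`.
    public static IEnumerable<long> FindLines(string path, string term)
    {
        long lineNumber = 0;
        // File.ReadLines enumerates the file line by line instead of
        // building one giant array like File.ReadAllLines would.
        foreach (var line in File.ReadLines(path))
        {
            if (line.IndexOf(term, StringComparison.Ordinal) >= 0)
                yield return lineNumber;
            lineNumber++;
        }
    }
}
```

This keeps memory use flat, but every search still reads the whole file, so it only suits occasional searches.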

If you want to run many searches over that much data, then you need to build some kind of index. Otherwise, every search will take a long time.

Or maybe it is possible to use Windows indexing. I have never looked into that; it's just an idea.

By the way, it is hard to name the best solution when we don't know how it will be used (a single search, frequent searches, fixed or changing data...) or who will use it. It also depends on whether you need exact, whole-word, or wildcard search, and whether you need to handle upper/lower case, accents, and the like.

Finally, as a developer, you typically have to try a few approaches to see how they compare and whether they are adequate. By the way, do you have a problem with your existing code, or have you even tried it yet?

Also, depending on the intended use, you might consider buying something designed for this.


When you are working with large numbers of strings and need them indexed, by far the fastest and most efficient structure I've used is a trie. I must have tried half a dozen structures when creating an error-tolerant autocomplete, but by far the "best" was that particular tree type. I later found out that Google also implements its autocomplete using one, so I guess I'm in good company.
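
For readers unfamiliar with the structure, here is a minimal trie sketch supporting exact lookup and prefix search. It is not the answerer's implementation: the `Trie` class and its member names are invented for this example, and the error-tolerant matching mentioned above is not shown.

```csharp
using System;
using System.Collections.Generic;
using System.Text;

// Hypothetical minimal trie; one node per character along each stored word.
public sealed class Trie
{
    private sealed class Node
    {
        public readonly Dictionary<char, Node> Children = new Dictionary<char, Node>();
        public bool IsWord; // true when a stored word ends at this node
    }

    private readonly Node _root = new Node();

    public void Add(string word)
    {
        var node = _root;
        foreach (var c in word)
        {
            if (!node.Children.TryGetValue(c, out var next))
            {
                next = new Node();
                node.Children[c] = next;
            }
            node = next;
        }
        node.IsWord = true;
    }

    public bool Contains(string word)
    {
        var node = FindNode(word);
        return node != null && node.IsWord;
    }

    // Returns every stored word that starts with `prefix` (in no particular order).
    public List<string> StartingWith(string prefix)
    {
        var results = new List<string>();
        var start = FindNode(prefix);
        if (start != null)
            Collect(start, new StringBuilder(prefix), results);
        return results;
    }

    // Walks down from the root; returns null if the path does not exist.
    private Node FindNode(string key)
    {
        var node = _root;
        foreach (var c in key)
        {
            if (!node.Children.TryGetValue(c, out var next))
                return null;
            node = next;
        }
        return node;
    }

    // Depth-first walk that rebuilds each word from the characters on its path.
    private static void Collect(Node node, StringBuilder current, List<string> results)
    {
        if (node.IsWord)
            results.Add(current.ToString());
        foreach (var pair in node.Children)
        {
            current.Append(pair.Key);
            Collect(pair.Value, current, results);
            current.Length--; // backtrack
        }
    }
}
```

Lookup cost depends on the length of the key rather than the number of stored strings, which is what makes the structure attractive for autocomplete. A dictionary per node is simple but memory-hungry, so a data set of the size in the question would likely need a more compact node layout or an on-disk variant.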

