How to search a directory of .txt files for a specific string (word or phrase)


Problem description

I am not good with coding. In VB.NET, I need to search one directory (c:\K_txt) of almost 3,000 .txt files for either a word or an exact phrase that the user enters, and then load a listbox with the names of the .txt files that contain the word or phrase. Then, when a file in the listbox is clicked, it will be loaded into a text file with each instance of the word or phrase highlighted. Thank you.

What I have tried:

I have tried nothing except trying to find code that does this, as it seems like a routine that has likely been written many times. I am collecting snippets for this or that aspect so that I might be able to fit the pieces together, but since there are so many files to go through (3,000), I know that any routine I kludge together will be slow and inefficient. I have two books on programming in VB, but nothing in them helps with this. Thank you.

Recommended answer

Quote:

I need to search one directory (c:\K_txt) of almost 3,000 .txt files

Searching the files on the fly is a bad idea: your users would have to wait every time they change the query. A better approach is to read the files once and split their contents into tokens (words, for English text). That tells your algorithm which file contains which words. You can extend this to sentences, say by splitting on periods.

You can then search for the words in your own data structure: a tree, a trie, a heap, you pick. Your users can quickly check which words appear in which files, because the application only has to consult that data structure instead of traversing the file system again.

The file system is traversed only once. Your structure holds the data in an ordered, search-friendly way.
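
As a rough illustration of the read-once idea, here is a minimal sketch of a word-to-files index. It is written in Python for brevity rather than VB.NET, and everything in it is an assumption made for the example: the names `tokenize`, `build_index` and `files_containing`, the token pattern, and the choice of a plain dictionary of sets instead of a tree or trie. In VB.NET the same shape could be held in a `Dictionary(Of String, HashSet(Of String))`.

```python
import re
from collections import defaultdict
from pathlib import Path

# Assumption: a "word" is a run of letters, digits or apostrophes.
WORD_RE = re.compile(r"[A-Za-z0-9']+")


def tokenize(text):
    """Split text into lowercase word tokens."""
    return [w.lower() for w in WORD_RE.findall(text)]


def build_index(directory):
    """Read every .txt file in `directory` once and map each word
    to the set of file names that contain it."""
    index = defaultdict(set)
    for path in sorted(Path(directory).glob("*.txt")):
        text = path.read_text(encoding="utf-8", errors="ignore")
        for word in tokenize(text):
            index[word].add(path.name)
    return index


def files_containing(index, word):
    """Answer a single-word query from the index alone,
    without touching the file system again."""
    return sorted(index.get(word.lower(), set()))
```

After `index = build_index(r"c:\K_txt")` has run once, each call such as `files_containing(index, "file")` is a dictionary lookup, so a changed or mistyped query costs almost nothing.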

Quote:

either a word or exact phrase that the user enters,

Exactly my point: what happens when a user wants to search for "file" but enters "fole"? Your algorithm searches the directory for "fole", and then, after traversing the whole directory once, searches it again for "file". That is not a good approach, and you need an alternative. One such approach is MapReduce: you read the files one by one, counting the words that exist and their number of occurrences. You can then feed this result into your own structure and query that, which is a much better approach than the one you are considering.

See the following links and learn something from there:
mapreduce - Hadoop searching words from one file in another file - Stack Overflow
algorithms - Hadoop MapReduce Word Counting Example - Computer Science Stack Exchange (you can call it word finding)
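
The counting idea does not require Hadoop for a few thousand small files. The following is a minimal in-process sketch of the map and reduce steps in Python, not the Hadoop API; the function names and the `documents` dictionary (file name to text) are assumptions made for the example.

```python
import re
from collections import Counter
from functools import reduce

# Assumption: a "word" is a run of letters, digits or apostrophes.
WORD_RE = re.compile(r"[A-Za-z0-9']+")


def map_document(text):
    """Map step: count the words of a single document."""
    return Counter(w.lower() for w in WORD_RE.findall(text))


def reduce_counts(total, partial):
    """Reduce step: merge one document's counts into the running total."""
    total.update(partial)  # Counter.update adds counts together
    return total


def count_words(documents):
    """Map every document, then fold the partial counts into one Counter."""
    partials = (map_document(text) for text in documents.values())
    return reduce(reduce_counts, partials, Counter())
```

A word that never occurs, such as a mistyped "fole", simply has a count of zero in the result, so the application can report "no matches" without rereading any file.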

Quote:

then load up a listbox with the names of those txt files that contain the word or phrase

Your structure will return everything the users need: it knows which files contain "file" and returns them in a list, or in whatever form you have specified.
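
Since the question also asks for exact phrases, here is one hedged way such a structure could answer both kinds of query, sketched in Python. It assumes the file texts are kept in memory in a `documents` dictionary (file name to text), and it treats a phrase as a sequence of consecutive word tokens, so case and punctuation are ignored. All names are illustrative.

```python
import re
from collections import defaultdict

# Assumption: a "word" is a run of letters, digits or apostrophes.
WORD_RE = re.compile(r"[A-Za-z0-9']+")


def tokenize(text):
    """Split text into lowercase word tokens."""
    return [w.lower() for w in WORD_RE.findall(text)]


def build_index(documents):
    """Map each word to the set of document names that contain it."""
    index = defaultdict(set)
    for name, text in documents.items():
        for word in tokenize(text):
            index[word].add(name)
    return index


def contains_phrase(tokens, phrase_tokens):
    """True if phrase_tokens occurs as a consecutive run inside tokens."""
    n = len(phrase_tokens)
    return any(tokens[i:i + n] == phrase_tokens
               for i in range(len(tokens) - n + 1))


def search(documents, index, query):
    """Return the sorted names of documents containing the word or phrase."""
    phrase = tokenize(query)
    if not phrase:
        return []
    # Only documents that contain every word of the query can match.
    candidates = set.intersection(*(index.get(w, set()) for w in phrase))
    # Confirm that the words appear next to each other, in order.
    return sorted(name for name in candidates
                  if contains_phrase(tokenize(documents[name]), phrase))
```

The returned list is what would be loaded into the listbox. The index narrows the candidates first, so the slower consecutive-word check only runs on files that contain every word of the query.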

Quote:

a text file with each instance of the word or phrase highlighted

That depends on the app framework, and I will leave you with that here. :-)
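
The framework-independent part of highlighting is finding where the matches are. As a sketch only, with an assumed function name and an assumed optional whole-word switch, the offsets can be computed like this in Python:

```python
import re


def find_matches(text, query, whole_word=False):
    """Return (start, length) for each case-insensitive occurrence
    of `query` in `text`."""
    if not query:
        return []
    pattern = re.escape(query)  # treat the query as literal text
    if whole_word:
        pattern = r"\b" + pattern + r"\b"
    regex = re.compile(pattern, re.IGNORECASE)
    return [(m.start(), m.end() - m.start()) for m in regex.finditer(text)]
```

How the offsets are used depends on the control. In a Windows Forms `RichTextBox`, for instance, each pair could be passed to `Select(start, length)` before setting `SelectionBackColor`; that is an assumption about the UI, which the question does not specify.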

Good luck.


You could adapt the technique used in the CodeProject article "File Searcher in C#",
or the one in "WinSearchFile: how to search files on your PC".

Or use the techniques discussed in ".net - c# Fastest string search in all files - Stack Overflow".


See the answer here: "How to search specific string into text file from VB.NET"
And here: "VB.Net 2005: Search specific word in text file within a directory - VBForums"

