Find all occurrences of the string "the" in a .txt file

Question

Here is my code:

// Import io so we can use file objects
import java.io.*;

public class SearchThe {
    public static void main(String args[]) {
        try {
            String stringSearch = "the";
            // Open test.txt (relative to the working directory) as a buffered reader
            BufferedReader bf = new BufferedReader(new FileReader("test.txt"));

            // Start a line count and declare a string to hold our current line.
            int linecount = 0;
            String line;

            // Let the user know what we are searching for
            System.out.println("Searching for " + stringSearch + " in file...");

            // Loop through each line, stashing the line into our line variable.
            while (( line = bf.readLine()) != null){
                // Increment the count and find the index of the word
                linecount++;
                int indexfound = line.indexOf(stringSearch);

                // If greater than -1, means we found the word
                if (indexfound > -1) {
                    System.out.println("Word was found at position " + indexfound + " on line " + linecount);
                }
            }

            // Close the file after done searching
            bf.close();
        }
        catch (IOException e) {
            System.out.println("IO Error Occurred: " + e.toString());
        }
    }
}

I want to find every occurrence of the word "the" in the file test.txt. The problem is that after my program finds the first "the", it stops looking for more.

Also, when the file contains a word like "then", my program treats it as the word "the".

Recommended answer

Use a case-insensitive regex with word boundaries to find all instances and variations of "the".

indexOf("the") cannot distinguish between "the" and "then", since each starts with "the". Likewise, "the" is found in the middle of "anathema".
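As a minimal sketch of both symptoms from the question: indexOf(String) only ever reports the first occurrence, so finding the rest means calling indexOf(String, int) in a loop. Even then, plain substring search still reports hits inside "then" and "anathema". The class name, helper name, and sample line below are made up for illustration and are not part of the original code.

```java
import java.util.ArrayList;
import java.util.List;

class IndexOfDemo {
    // Collects the start index of every occurrence of needle in text,
    // resuming the search one character past each hit.
    static List<Integer> allIndexesOf(String text, String needle) {
        List<Integer> hits = new ArrayList<>();
        int from = text.indexOf(needle);
        while (from >= 0) {
            hits.add(from);
            from = text.indexOf(needle, from + 1);
        }
        return hits;
    }

    public static void main(String[] args) {
        // Hypothetical sample line, not taken from the asker's test.txt
        String line = "then the anathema";
        // Three hits are reported, but only the one at index 5 is the word "the"
        System.out.println(allIndexesOf(line, "the")); // prints [0, 5, 12]
    }
}
```

The loop fixes the "stops after the first match" problem, but not the false positives, which is why the answer switches to a regex.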

To avoid this, use a regex and search for "the" with a word boundary (\b) on either side. Word boundaries work better than splitting on " " or calling indexOf(" the ") with a space on either side, since neither of those finds "the." or other instances next to punctuation. You can also make the search case-insensitive so that it finds "The" as well.
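The difference between the two approaches can be seen on a single line of text. This is only a sketch: the class name, method names, and sample sentence are assumptions chosen for illustration.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class BoundaryDemo {
    // Whole-word "the", any letter case
    private static final Pattern THE =
            Pattern.compile("\\bthe\\b", Pattern.CASE_INSENSITIVE);

    // Counts whole-word matches of "the", ignoring case.
    static int countWholeWord(String text) {
        Matcher m = THE.matcher(text);
        int count = 0;
        while (m.find()) {
            count++;
        }
        return count;
    }

    // Returns true only if "the" appears in lower case with a space on each side.
    static boolean containsSpacePadded(String text) {
        return text.contains(" the ");
    }

    public static void main(String[] args) {
        // Hypothetical sample sentence
        String line = "The end is the. Then came anathema.";
        System.out.println(countWholeWord(line));      // prints 2 ("The" and "the.")
        System.out.println(containsSpacePadded(line)); // prints false
    }
}
```

The space-padded search misses both real occurrences here, one because of the capital letter at the start of the line and one because of the trailing period, while the word-boundary pattern finds both and skips "Then" and "anathema".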

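// These imports go at the top of the file, next to import java.io.*;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Compile the pattern once, before the loop: whole-word "the", any letter case
Pattern p = Pattern.compile("\\bthe\\b", Pattern.CASE_INSENSITIVE);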

while ( (line = bf.readLine()) != null) {
    linecount++;

    Matcher m = p.matcher(line);

    // indicate all matches on the line
    while (m.find()) {
        System.out.println("Word was found at position " + 
                       m.start() + " on line " + linecount);
    }
}
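Putting the fragment above back into the program from the question might look like the following. This is a sketch rather than the answerer's exact code: the class name SearchTheRegex and the helper findAll are hypothetical, and the switch to try-with-resources (so the file is closed even if reading fails) is an addition not present in the original.

```java
import java.io.BufferedReader;
import java.io.FileReader;
import java.io.IOException;
import java.io.Reader;
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class SearchTheRegex {
    // Whole-word "the", any letter case
    private static final Pattern THE =
            Pattern.compile("\\bthe\\b", Pattern.CASE_INSENSITIVE);

    // Reads the source line by line and returns one message per whole-word match.
    static List<String> findAll(Reader source) throws IOException {
        List<String> found = new ArrayList<>();
        BufferedReader reader = new BufferedReader(source);
        String line;
        int lineCount = 0;
        while ((line = reader.readLine()) != null) {
            lineCount++;
            Matcher m = THE.matcher(line);
            // find() advances past each match, so every hit on the line is reported
            while (m.find()) {
                found.add("Word was found at position " + m.start()
                        + " on line " + lineCount);
            }
        }
        return found;
    }

    public static void main(String[] args) {
        // test.txt is the file name used in the question
        try (Reader file = new FileReader("test.txt")) {
            for (String message : findAll(file)) {
                System.out.println(message);
            }
        } catch (IOException e) {
            System.out.println("IO Error Occurred: " + e);
        }
    }
}
```

Taking a Reader instead of a file name keeps the search logic separate from file handling, so it can also be run against a StringReader when trying it out.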
