Issue with below snippet on boundary matchers regex (\b)

Problem description

My input:

 1. end 
 2. end of the day or end of the week 
 3. endline
 4. something 
 5. "something" end

Based on the above discussions, if I replace a single string using this snippet, it successfully removes the matching words from each line:

import java.io.BufferedReader;
import java.io.File;
import java.io.FileInputStream;
import java.io.FileOutputStream;
import java.io.InputStreamReader;
import java.io.OutputStreamWriter;
import java.io.PrintWriter;

public class DeleteTest {

    public static void main(String[] args) {
        try {
            File file = new File("C:/Java samples/myfile.txt");
            File temp = File.createTempFile("myfile1", ".txt", file.getParentFile());
            String delete = "end";
            BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
            PrintWriter writer = new PrintWriter(new OutputStreamWriter(new FileOutputStream(temp)));

            // Remove every whole-word occurrence of the word from each line.
            for (String line; (line = reader.readLine()) != null;) {
                line = line.replaceAll("\\b" + delete + "\\b", "");
                writer.println(line);
            }
            reader.close();
            writer.close();
        } catch (Exception e) {
            System.out.println("Something went Wrong");
        }
    }
}

My output if I use the above snippet (this is also my expected output):

 1.  
 2. of the day or of the week
 3. endline
 4. something
 5. "something"

But when I want to delete more words and use a Set for that purpose, I use the snippet below:

public static void main(String[] args) {
    try {
        File file = new File("C:/Java samples/myfile.txt");
        File temp = File.createTempFile("myfile1", ".txt", file.getParentFile());
        BufferedReader reader = new BufferedReader(new InputStreamReader(new FileInputStream(file)));
        PrintWriter writer = new PrintWriter(new OutputStreamWriter(new FileOutputStream(temp)));

        Set<String> toDelete = new HashSet<>();
        toDelete.add("end");
        toDelete.add("something");

        for (String line; (line = reader.readLine()) != null;) {
            line = line.replaceAll("\\b" + toDelete + "\\b", "");
            writer.println(line);
        }
        reader.close();
        writer.close();
    } catch (Exception e) {
        System.out.println("Something went Wrong");
    }
}

I get this output (it just removes the spaces):

 1. end
 2. endofthedayorendoftheweek
 3. endline
 4. something
 5. "something" end 

Can you help me?

Recommended answer

You need to create an alternation group out of the set with

String.join("|", toDelete)

and use it as

line = line.replaceAll("\\b(?:"+String.join("|", toDelete)+")\\b", "");

The pattern will look like

\b(?:end|something)\b

Here, (?:...) is a non-capturing group. It groups several alternatives without creating a memory buffer for the capture, which you do not need because the matches are simply removed.
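To make the difference concrete, here is a minimal sketch comparing the pattern from the question with the alternation group. The class name AlternationDemo and the helper buildPattern are made up for illustration, and a LinkedHashSet is used instead of the HashSet from the question only so that the order of the alternatives is predictable. Concatenating the set into a string calls its toString(), which yields "[end, something]", and the regex engine reads that as a character class matching a single character. That is why only the spaces between words were removed.

```java
import java.util.LinkedHashSet;
import java.util.Set;

class AlternationDemo {

    // Hypothetical helper: builds \b(?:w1|w2|...)\b from the set.
    // Assumes the words contain no regex metacharacters.
    static String buildPattern(Set<String> words) {
        return "\\b(?:" + String.join("|", words) + ")\\b";
    }

    public static void main(String[] args) {
        // LinkedHashSet keeps insertion order (an assumption made for readable output).
        Set<String> toDelete = new LinkedHashSet<>();
        toDelete.add("end");
        toDelete.add("something");

        // String concatenation calls toDelete.toString(), giving a character class.
        String broken = "\\b" + toDelete + "\\b";
        String fixed = buildPattern(toDelete);
        System.out.println(broken); // \b[end, something]\b
        System.out.println(fixed);  // \b(?:end|something)\b

        String line = "end of the day or end of the week";
        // The character class matches each single space that sits between two word characters.
        System.out.println(line.replaceAll(broken, ""));
        // The alternation removes the whole words "end" and "something".
        System.out.println(line.replaceAll(fixed, ""));
    }
}
```

Note that "endline" is left alone by the fixed pattern, because there is no word boundary between "end" and "line".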

Or, better, compile the regex once before entering the loop:

Pattern pat = Pattern.compile("\\b(?:" + String.join("|", toDelete) + ")\\b");
...
    line = pat.matcher(line).replaceAll("");
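As a rough sketch of how the precompiled pattern fits into the read/write loop, the example below filters lines from a reader into a writer. The class name PrecompiledDelete and the method filter are hypothetical, and an in-memory StringReader and StringWriter stand in for the files from the question so the example runs on its own.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.PrintWriter;
import java.io.StringReader;
import java.io.StringWriter;
import java.util.LinkedHashSet;
import java.util.Set;
import java.util.regex.Pattern;

class PrecompiledDelete {

    // Hypothetical helper: copies every line from reader to writer,
    // deleting all matches of the already compiled pattern.
    static void filter(BufferedReader reader, PrintWriter writer, Pattern pat) {
        reader.lines().forEach(line -> writer.println(pat.matcher(line).replaceAll("")));
    }

    public static void main(String[] args) throws IOException {
        Set<String> toDelete = new LinkedHashSet<>();
        toDelete.add("end");
        toDelete.add("something");

        // Compiled once, outside the loop, and reused for every line.
        Pattern pat = Pattern.compile("\\b(?:" + String.join("|", toDelete) + ")\\b");

        // In-memory stand-ins for the input and output files (an assumption).
        String input = "end\nendline\n\"something\" end\n";
        StringWriter out = new StringWriter();
        try (BufferedReader reader = new BufferedReader(new StringReader(input));
             PrintWriter writer = new PrintWriter(out)) {
            filter(reader, writer, pat);
        }
        System.out.print(out);
    }
}
```

String.replaceAll compiles its regex on every call, so compiling once avoids repeating that work for each line. The try-with-resources block also closes the reader and writer if an exception is thrown partway through.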

Update

To match whole "words" that may contain special characters, apply Pattern.quote to each word to escape those characters, and replace \b with unambiguous word boundaries. Use the negative lookbehind (?<!\w) instead of the leading \b to make sure there is no word character before the match, and the negative lookahead (?!\w) instead of the trailing \b to make sure there is no word character after it.

In Java 8, you may use this code:

// Quote each word so that regex metacharacters in it are matched literally.
Set<String> nToDel = toDelete.stream()
    .map(Pattern::quote)
    .collect(Collectors.toCollection(HashSet::new));
String pattern = "(?<!\\w)(?:" + String.join("|", nToDel) + ")(?!\\w)";

For a set containing +end and something-, the regex will look like (?<!\w)(?:\Q+end\E|\Qsomething-\E)(?!\w). Note that the symbols between \Q and \E are parsed as literal symbols.
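The following sketch shows the quoted pattern at work on words that start or end with a non-word character. The class name QuotedWordRemover and the method build are illustrative, and the sample words +end and something- are taken from the regex shown above. A plain \b before +end would require a word character directly before the plus sign, so a standalone +end would never match; the lookarounds only require that no word character is adjacent.

```java
import java.util.Arrays;
import java.util.Collection;
import java.util.regex.Pattern;
import java.util.stream.Collectors;

class QuotedWordRemover {

    // Hypothetical helper: quotes each word, joins the results with "|",
    // and wraps the alternation in lookaround-based word boundaries.
    static Pattern build(Collection<String> words) {
        String alternation = words.stream()
            .map(Pattern::quote)
            .collect(Collectors.joining("|"));
        return Pattern.compile("(?<!\\w)(?:" + alternation + ")(?!\\w)");
    }

    public static void main(String[] args) {
        Pattern pat = build(Arrays.asList("+end", "something-"));
        System.out.println(pat.pattern());

        // "+end" and "something-" are removed when they stand alone;
        // "x+end" is kept because a word character precedes the "+".
        System.out.println(pat.matcher("a +end b something- c x+end").replaceAll(""));
    }
}
```

Collectors.joining("|") does the same job as String.join here and avoids building an intermediate set.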
