How would you use a regular expression to ignore strings that contain a specific substring?

Problem description

How would I go about using a negative lookbehind (or any other method) in a regular expression to ignore strings that contain a specific substring?

I've read two previous Stack Overflow questions:
java-regexp-for-file-filtering
regex-to-match-against-something-that-is-not-a-specific-substring

They are nearly what I want, but in my case the string doesn't end with the part I want to ignore. If it did, this would not be a problem.

I have a feeling this has to do with the fact that lookarounds are zero-width and something is matching on the second pass through the string, but I'm not too sure of the internals.

Anyway, if anyone is willing to take the time to explain it, I would greatly appreciate it.

Here is an example of an input string that I want to ignore:

192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] "GET /FOO/BAR/ HTTP/1.1" 200 2246

Here is an example of an input string that I want to keep for further evaluation:

192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] "GET /FOO/BAR/content.js HTTP/1.1" 200 2246

The key for me is that I want to ignore any HTTP GET that requests a document root default page, that is, a path ending in /.

Following is my little test harness and the best regex I've come up with so far.

import java.util.regex.Matcher;
import java.util.regex.Pattern;
import java.util.regex.PatternSyntaxException;

// The class name is arbitrary; the original post showed only main().
public class RegexTest {
    public static void main(String[] args) {
        String inString = "192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] \"GET /FOO/BAR/ HTTP/1.1\" 200 2246";
        //String inString = "192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] \"GET /FOO/BAR/content.js HTTP/1.1\" 200 2246";
        //String inString = "192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] \"GET /FOO/BAR/content.js HTTP/"; // This works
        //String inString = "192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] \"GET /FOO/BAR/ HTTP/"; // This works
        String inRegEx = "^.*(?:GET).*$(?<!.?/ HTTP/)";
        try {
            Pattern pattern = Pattern.compile(inRegEx);
            Matcher matcher = pattern.matcher(inString);

            // find() reports whether the pattern matches anywhere in the input.
            if (matcher.find()) {
                System.out.printf("I found the text \"%s\" starting at index %d and ending at index %d.%n",
                        matcher.group(), matcher.start(), matcher.end());
            } else {
                System.out.printf("No match found.%n");
            }
        } catch (PatternSyntaxException pse) {
            System.out.println("Invalid RegEx: " + inRegEx);
            pse.printStackTrace();
        }
    }
}

Recommended answer

Could you just match any path that doesn't end with a /?

String inRegEx = "^.* \"GET (.*[^/]) HTTP/.*$";
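
As a quick check, the pattern above can be run against the two sample log lines from the question. This is a minimal sketch: the class name TrailingSlashFilter and the helper requestedPath are made up for illustration, and returning null for ignored lines is just one possible convention.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
class TrailingSlashFilter {
    // Pattern from the answer: the captured path must end in a character other than '/'.
    static final Pattern NON_DIRECTORY_GET =
            Pattern.compile("^.* \"GET (.*[^/]) HTTP/.*$");

    // Returns the requested path, or null when the line should be ignored.
    static String requestedPath(String logLine) {
        Matcher m = NON_DIRECTORY_GET.matcher(logLine);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String prefix = "192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] ";
        String ignored = prefix + "\"GET /FOO/BAR/ HTTP/1.1\" 200 2246";
        String kept = prefix + "\"GET /FOO/BAR/content.js HTTP/1.1\" 200 2246";

        System.out.println(requestedPath(ignored)); // null
        System.out.println(requestedPath(kept));    // /FOO/BAR/content.js
    }
}
```

Because [^/] must consume one character, the path has to be at least one character long and that last character cannot be a slash.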

This can also be done with a negative lookbehind:

String inRegEx = "^.* \"GET (.+)(?<!/) HTTP/.*$";

Here, (?<!/) says "the character immediately before this position must not be /".
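
Where the lookbehind sits in the pattern matters, and that seems to be what goes wrong in the question's attempt: there the lookbehind follows $, so it inspects the text just before the end of the line ("200 2246") rather than the text before " HTTP/". The sketch below compares the two patterns on the sample lines; the class name LookbehindPosition and the helper matches are hypothetical.

```java
import java.util.regex.Pattern;

// Hypothetical comparison class, not part of the original post.
class LookbehindPosition {
    // The question's attempt: the lookbehind comes after '$', so it looks at the end of the line.
    static final Pattern ATTEMPT = Pattern.compile("^.*(?:GET).*$(?<!.?/ HTTP/)");
    // The answer's version: the lookbehind comes right after the path, before " HTTP/".
    static final Pattern ANSWER = Pattern.compile("^.* \"GET (.+)(?<!/) HTTP/.*$");

    static boolean matches(Pattern p, String line) {
        return p.matcher(line).find();
    }

    public static void main(String[] args) {
        String prefix = "192.168.1.10 - - [08/Feb/2009:16:33:54 -0800] ";
        String ignored = prefix + "\"GET /FOO/BAR/ HTTP/1.1\" 200 2246";
        String kept = prefix + "\"GET /FOO/BAR/content.js HTTP/1.1\" 200 2246";

        System.out.println(matches(ATTEMPT, ignored)); // true: the line ends in "2246", not "/ HTTP/"
        System.out.println(matches(ANSWER, ignored));  // false: the path ends in '/'
        System.out.println(matches(ANSWER, kept));     // true
    }
}
```

This also fits the question's observation that the attempt "works" on the truncated test strings: those end in "/ HTTP/", which is exactly the text the end-of-line lookbehind examines.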
