Java - Best way to grab ALL Strings between two Strings? (regex?)

Question

This question has been bugging me for a long time now but essentially I'm looking for the most efficient way to grab all Strings between two Strings.

The way I have been doing it for many months now is by using a bunch of temporary indices, strings, and substrings, and it's really messy. (Why does Java not have a native method such as String substring(String start, String end)?)

Say I have a String:

abcabc [pattern1] foo [pattern2] abcdefg [pattern1] bar [pattern2] morestuff

The end goal would be to output foo and bar. (And later to be added into a JList)

I've been trying to incorporate regex in .split() but haven't been successful. I've tried syntax using *'s and .'s but I don't think it's quite what my intention is especially since .split() only takes one argument to split against.

Otherwise I think another way is to use the Pattern and Matcher classes? But I'm really fuzzy on the appropriate procedure.

Recommended answer

You can construct the regex to do this for you:

// pattern1 and pattern2 are String objects
String regexString = Pattern.quote(pattern1) + "(.*?)" + Pattern.quote(pattern2);

This will treat pattern1 and pattern2 as literal text, and the text in between them is captured in the first capturing group. You can remove Pattern.quote() if you want the delimiters to be interpreted as regex, but I don't guarantee anything if you do that.
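
To illustrate why the quoting matters, here is a minimal sketch using the delimiters from the question, which happen to contain square brackets. The class name QuoteDemo and the helper firstBetween are made up for this example and are not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the names here are illustrative only.
class QuoteDemo {
    // Returns the first text found between start and end, or null if there is none.
    // When quote is false, start and end are interpreted as regex syntax.
    static String firstBetween(String text, String start, String end, boolean quote) {
        String s = quote ? Pattern.quote(start) : start;
        String e = quote ? Pattern.quote(end) : end;
        Matcher m = Pattern.compile(s + "(.*?)" + e).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "abcabc [pattern1] foo [pattern2] abcdefg";
        // Quoted: the brackets are matched literally, so this captures " foo ".
        System.out.println(firstBetween(text, "[pattern1]", "[pattern2]", true));
        // Unquoted: "[pattern1]" is read as a character class that matches a
        // single character, so the capture is an unrelated fragment of the input.
        System.out.println(firstBetween(text, "[pattern1]", "[pattern2]", false));
    }
}
```

Without Pattern.quote() the pattern still compiles and still matches, which is what makes the mistake easy to miss.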

You can customize how the match occurs by adding flags to regexString.


  • If you want Unicode-aware case-insensitive matching, then add (?iu) at the beginning of regexString, or supply the Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE flags to the Pattern.compile method.
  • If you want to capture the content even if the two delimiting strings appear on different lines, then add (?s) before (.*?), i.e. "(?s)(.*?)", or supply the Pattern.DOTALL flag to the Pattern.compile method.
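
As a rough sketch of the two options above, the following passes the flags to Pattern.compile. The class FlagsDemo, the method extract, and the sample text are assumptions made for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; names and sample text are illustrative only.
class FlagsDemo {
    // Collects every capture between start and end, compiling with the given flags.
    static List<String> extract(String text, String start, String end, int flags) {
        String regex = Pattern.quote(start) + "(.*?)" + Pattern.quote(end);
        Matcher m = Pattern.compile(regex, flags).matcher(text);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "BEGIN one\ntwo end begin three END";
        // No flags: matching is case-sensitive and '.' stops at line breaks.
        System.out.println(extract(text, "begin", "end", 0));
        // Case-insensitive, Unicode-aware, and '.' also matches line terminators.
        int flags = Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE | Pattern.DOTALL;
        System.out.println(extract(text, "begin", "end", flags));
    }
}
```

With no flags this input yields no matches. With all three flags it yields two, the first of which spans the line break.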

Then compile the regex, obtain a Matcher object, iterate through the matches and save them into a List (or any Collection, it's up to you).

Pattern pattern = Pattern.compile(regexString);
// text contains the full text from which you want to extract data
Matcher matcher = pattern.matcher(text);

while (matcher.find()) {
  String textInBetween = matcher.group(1); // Since (.*?) is capturing group 1
  // You can insert match into a List/Collection here
}
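
Put together, the steps above could be wrapped in a small helper, roughly the substring(start, end) style method the question wishes for. This is only a sketch: the class BetweenDemo and the method allBetween are hypothetical names, and trimming the results is an assumption about what the asker wants.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class; not part of the JDK or the original answer.
class BetweenDemo {
    // Returns every piece of text found between the literal delimiters start and end.
    static List<String> allBetween(String text, String start, String end) {
        Pattern pattern = Pattern.compile(Pattern.quote(start) + "(.*?)" + Pattern.quote(end));
        Matcher matcher = pattern.matcher(text);
        List<String> results = new ArrayList<>();
        while (matcher.find()) {
            results.add(matcher.group(1)); // group 1 is the (.*?) capture
        }
        return results;
    }

    public static void main(String[] args) {
        String text = "abcabc [pattern1] foo [pattern2] abcdefg [pattern1] bar [pattern2] morestuff";
        for (String s : allBetween(text, "[pattern1]", "[pattern2]")) {
            // The raw captures keep the surrounding spaces, so trim before printing.
            System.out.println(s.trim());
        }
    }
}
```

For the sample string from the question this prints foo and bar on separate lines. The returned list could then be handed to a JList, for instance after converting it with toArray(new String[0]).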

Test code:

String pattern1 = "hgb";
String pattern2 = "|";
String text = "sdfjsdkhfkjsdf hgb sdjfkhsdkfsdf |sdfjksdhfjksd sdf sdkjfhsdkf | sdkjfh hgb sdkjfdshfks|";

Pattern p = Pattern.compile(Pattern.quote(pattern1) + "(.*?)" + Pattern.quote(pattern2));
Matcher m = p.matcher(text);
while (m.find()) {
  System.out.println(m.group(1));
}

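Do note that if you search for the text between foo and bar in the input "foo text foo text bar text bar" with the method above, you will get one match, which is " text foo text ". The match starts at the first foo and ends at the first bar that follows it, so the second foo ends up inside the captured text.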
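
The snippet below reproduces that behavior. It also shows one possible variation, which is not part of the original answer: a negative lookahead that stops the capture from running over another start delimiter, so that only the innermost text is returned. The class NestedDemo and its method names are made up for this sketch.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical demo class; the lookahead variant is an assumption, not the answer's method.
class NestedDemo {
    // The regex from the answer: lazy capture between two literal delimiters.
    static String lazyRegex(String start, String end) {
        return Pattern.quote(start) + "(.*?)" + Pattern.quote(end);
    }

    // Variant: each captured character must not begin another start delimiter.
    static String innermostRegex(String start, String end) {
        String s = Pattern.quote(start);
        return s + "((?:(?!" + s + ").)*?)" + Pattern.quote(end);
    }

    // Collects group 1 of every match of regex in text.
    static List<String> find(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        String text = "foo text foo text bar text bar";
        System.out.println(find(lazyRegex("foo", "bar"), text));      // one match: " text foo text "
        System.out.println(find(innermostRegex("foo", "bar"), text)); // one match: " text "
    }
}
```

Which of the two results is wanted depends on the data, so treat the lookahead version as an option to evaluate rather than a drop-in replacement.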
