Java - Best way to grab ALL Strings between two Strings? (regex?)


Question

This question has been bugging me for a long time now, but essentially I'm looking for the most efficient way to grab all Strings between two Strings.

The way I have been doing it for many months now is with a bunch of temporary indices, strings, and substrings, and it's really messy. (Why does Java not have a native method such as String substring(String start, String end)?)

Say I have a String:

abcabc [pattern1]foo[pattern2] abcdefg [pattern1]bar[pattern2] morestuff

The end goal would be to output foo and bar. (They will later be added to a JList.)

I've been trying to incorporate regex in .split() but haven't been successful. I've tried syntax using *'s and .'s, but I don't think it does quite what I intend, especially since .split() only takes one argument to split against.

Otherwise, I think another way is to use the Pattern and Matcher classes? But I'm really fuzzy on the appropriate procedure.

Recommended answer

You can construct the regex to do this for you:

// pattern1 and pattern2 are String objects
String regexString = Pattern.quote(pattern1) + "(.*?)" + Pattern.quote(pattern2);

This treats pattern1 and pattern2 as literal text, and the text between them is captured in the first capturing group. The quantifier in (.*?) is reluctant (lazy), so each match ends at the nearest occurrence of pattern2. You can remove Pattern.quote() if you want the delimiters to be interpreted as regex, but I don't guarantee anything if you do that.
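To illustrate why the quoting matters, here is a minimal sketch using the delimiters from the question, which happen to contain the regex metacharacters [ and ]. The class name QuoteDemo, the method firstBetween, and the quoteDelimiters parameter are made up for this example and are not part of the original answer.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class QuoteDemo {
    // Hypothetical helper: returns the first text found between start and end,
    // or null if there is no match. When quoteDelimiters is false, start and end
    // are interpreted as regex rather than as literal text.
    static String firstBetween(String text, String start, String end, boolean quoteDelimiters) {
        String s = quoteDelimiters ? Pattern.quote(start) : start;
        String e = quoteDelimiters ? Pattern.quote(end) : end;
        Matcher m = Pattern.compile(s + "(.*?)" + e).matcher(text);
        return m.find() ? m.group(1) : null;
    }

    public static void main(String[] args) {
        String text = "abcabc [pattern1]foo[pattern2] abcdefg [pattern1]bar[pattern2] morestuff";

        // Quoted: "[pattern1]" is matched literally.
        System.out.println(firstBetween(text, "[pattern1]", "[pattern2]", true));  // prints foo

        // Unquoted: "[pattern1]" becomes a character class matching one of p, a, t, e, r, n, 1,
        // so the match starts at the very first 'a' of the input.
        System.out.println(firstBetween(text, "[pattern1]", "[pattern2]", false)); // prints bc
    }
}
```

The unquoted call still returns something, which is arguably worse than failing outright: the result looks plausible but has nothing to do with the intended delimiters.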

You can customize how the match occurs by adding flags to regexString.

  • If you want Unicode-aware case-insensitive matching, add (?iu) at the beginning of regexString, or supply the Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE flags to the Pattern.compile method.
  • If you want to capture the content even when the two delimiting strings appear on different lines, add (?s) before (.*?), i.e. "(?s)(.*?)", or supply the Pattern.DOTALL flag to the Pattern.compile method.
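As a rough sketch of the flag variants described above, the following passes the flags to Pattern.compile instead of embedding them in the regex string. The class name FlagDemo, the method between, and the sample <b>...</b> delimiters are assumptions chosen only for illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class FlagDemo {
    // Hypothetical helper: collects every text between start and end,
    // compiling the pattern with the given Pattern flags (0 for none).
    static List<String> between(String text, String start, String end, int flags) {
        Pattern p = Pattern.compile(Pattern.quote(start) + "(.*?)" + Pattern.quote(end), flags);
        Matcher m = p.matcher(text);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    public static void main(String[] args) {
        // The opening delimiter differs in case and the content spans two lines.
        String text = "<B>one\ntwo</b>";

        // No flags: "<b>" does not match "<B>", and '.' does not match the line break.
        System.out.println(between(text, "<b>", "</b>", 0).size());

        // Case-insensitive, Unicode-aware, and with '.' matching line terminators.
        int flags = Pattern.CASE_INSENSITIVE | Pattern.UNICODE_CASE | Pattern.DOTALL;
        System.out.println(between(text, "<b>", "</b>", flags).size());
    }
}
```

Passing flags to Pattern.compile applies them to the whole pattern, including the quoted delimiters, which is the same effect as putting (?iu) and (?s) at the very beginning of the regex string.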

Then compile the regex, obtain a Matcher object, iterate through the matches, and save them into a List (or any Collection, it's up to you).

Pattern pattern = Pattern.compile(regexString);
// text contains the full text that you want to extract data
Matcher matcher = pattern.matcher(text);

while (matcher.find()) {
  String textInBetween = matcher.group(1); // Since (.*?) is capturing group 1
  // You can insert match into a List/Collection here
}
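Putting the steps together, one possible shape for the helper the question wishes for is sketched below. The class name BetweenDemo and the method name substringsBetween are hypothetical; the body only combines the calls already shown in the answer and runs them on the question's sample string.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class BetweenDemo {
    // Hypothetical helper: returns every text found between the literal
    // delimiters start and end, in order of appearance.
    static List<String> substringsBetween(String text, String start, String end) {
        Pattern pattern = Pattern.compile(Pattern.quote(start) + "(.*?)" + Pattern.quote(end));
        Matcher matcher = pattern.matcher(text);
        List<String> results = new ArrayList<>();
        while (matcher.find()) {
            results.add(matcher.group(1)); // group 1 is the (.*?) part
        }
        return results;
    }

    public static void main(String[] args) {
        String text = "abcabc [pattern1]foo[pattern2] abcdefg [pattern1]bar[pattern2] morestuff";
        System.out.println(substringsBetween(text, "[pattern1]", "[pattern2]")); // prints [foo, bar]
    }
}
```

The returned List can then be copied into whatever model backs the JList mentioned in the question.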

Test code:

String pattern1 = "hgb";
String pattern2 = "|";
String text = "sdfjsdkhfkjsdf hgb sdjfkhsdkfsdf |sdfjksdhfjksd sdf sdkjfhsdkf | sdkjfh hgb sdkjfdshfks|";

Pattern p = Pattern.compile(Pattern.quote(pattern1) + "(.*?)" + Pattern.quote(pattern2));
Matcher m = p.matcher(text);
while (m.find()) {
  System.out.println(m.group(1));
}

Note that if you search for the text between foo and bar in the input foo text foo text bar text bar with the method above, you will get one match, which is " text foo text " (including the surrounding spaces).
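The sketch below reproduces this behaviour. It also shows one possible variation, which is not part of the original answer: a negative lookahead that stops the captured text from running across another start delimiter, so the match begins at the start delimiter closest to the end delimiter. The class and method names (NestedDemo, outerBetween, innerBetween) are invented for the example.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

final class NestedDemo {
    // Collects capturing group 1 of every match of regex in text.
    private static List<String> collect(String regex, String text) {
        Matcher m = Pattern.compile(regex).matcher(text);
        List<String> out = new ArrayList<>();
        while (m.find()) {
            out.add(m.group(1));
        }
        return out;
    }

    // The method from the answer: matches from the first start delimiter
    // to the nearest end delimiter after it.
    static List<String> outerBetween(String text, String start, String end) {
        return collect(Pattern.quote(start) + "(.*?)" + Pattern.quote(end), text);
    }

    // Assumed variation: each captured character must not begin another
    // occurrence of the start delimiter.
    static List<String> innerBetween(String text, String start, String end) {
        String s = Pattern.quote(start);
        return collect(s + "((?:(?!" + s + ").)*?)" + Pattern.quote(end), text);
    }

    public static void main(String[] args) {
        String text = "foo text foo text bar text bar";
        System.out.println(outerBetween(text, "foo", "bar")); // prints [ text foo text ]
        System.out.println(innerBetween(text, "foo", "bar")); // prints [ text ]
    }
}
```

Which of the two behaviours is wanted depends on the data; the lookahead version does extra work at every character, so it is only worth using when start delimiters can actually repeat before an end delimiter.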
