How to find the index of a whole word in a string in Java


Question


I want to find all starting indexes of a whole word in a given string. Let's say I have the string given below.



"an ancient manuscripts, another means to divide sentences into paragraphs was a line break (newline) followed by an initial at the beginning of the next paragraph. An initial is an oversize capital letter, sometimes outdented beyond the margin of text. This style can be seen, for example, in the original Old English manuscript of Beowulf. Outdenting is still used in English typography, though not commonly.[4] Modern English typography usually indicates a new paragraph by indenting the first line"


I would like to find the starting index of "paragraph" only. The result should not include "paragraphs" or "paragraph.".


Can anyone give me an idea of how to do this in Java? Thanks in advance.

Recommended answer


You can use a regular expression with the word boundary matcher (\b):

import java.util.regex.Matcher;
import java.util.regex.Pattern;

String text = "an ancient manuscripts, another means to divide sentences into paragraphs was a line break (newline) followed by an initial at the beginning of the next paragraph. An initial is an oversize capital letter, sometimes outdented beyond the margin of text. This style can be seen, for example, in the original Old English manuscript of Beowulf. Outdenting is still used in English typography, though not commonly.[4] Modern English typography usually indicates a new paragraph by indenting the first line";

// \b matches at a word boundary, so "paragraphs" is skipped
Matcher m = Pattern.compile("\\bparagraph\\b").matcher(text);
while (m.find()) {
    // m.start() is the index of the first character of the match
    System.out.println("Matching at: " + m.start());
}
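
To make the behaviour of \b easier to check, here is a minimal sketch that wraps the same loop in a method and returns the start indexes as a list. The class name WordBoundaryDemo, the method findAll and the sample sentence are illustrative assumptions, not part of the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, not part of the original answer.
final class WordBoundaryDemo {

    // Returns the start index of every occurrence of `word` that has a
    // word boundary (\b) on both sides.
    // Assumes `word` begins and ends with a letter, digit or underscore.
    static List<Integer> findAll(String text, String word) {
        List<Integer> starts = new ArrayList<>();
        Matcher m = Pattern.compile("\\b" + Pattern.quote(word) + "\\b").matcher(text);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // Made-up sample sentence for illustration.
        String sample = "one paragraph, two paragraphs, last paragraph.";
        System.out.println(findAll(sample, "paragraph"));
    }
}
```

On this sample the method returns the indexes 4 and 36: "paragraphs" is skipped, but "paragraph," and "paragraph." are both reported, because punctuation counts as a word boundary.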


If you don't want to match "paragraph." ("paragraph" followed by a dot), you can try:

Matcher m = Pattern.compile("\\bparagraph($| )").matcher(text);


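which means "paragraph" followed by a space or the end of the input. The match then includes the trailing space, but m.start() still returns the index where "paragraph" begins.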


If the string you are looking for can include special characters (like "("), you can use Pattern.quote() to escape them:

String mySearchString = "paragraph";
Matcher m = Pattern.compile("\\b" + Pattern.quote(mySearchString) + "($| )").matcher(text);
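
One caveat with combining \b and Pattern.quote(): \b only matches between a word character and a non-word character, so a search string that itself starts with punctuation, such as "(newline)", is not found after a space. A possible alternative, swapped in here and not taken from the original answer, is to require whitespace or the edge of the input on both sides using lookarounds. The class and method names below are hypothetical.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper class, a sketch of a whitespace-delimited search.
final class WholeWordFinder {

    // Returns the start index of every occurrence of `word` that is
    // preceded by whitespace or the start of the input, and followed by
    // whitespace or the end of the input.
    static List<Integer> findAll(String text, String word) {
        // (?<!\S) : the previous character, if any, is not a non-whitespace character
        // (?!\S)  : the next character, if any, is not a non-whitespace character
        Pattern p = Pattern.compile("(?<!\\S)" + Pattern.quote(word) + "(?!\\S)");
        List<Integer> starts = new ArrayList<>();
        Matcher m = p.matcher(text);
        while (m.find()) {
            starts.add(m.start());
        }
        return starts;
    }

    public static void main(String[] args) {
        // Made-up sample sentences for illustration.
        System.out.println(findAll("a paragraph here, paragraphs there, a paragraph. and paragraph", "paragraph"));
        System.out.println(findAll("a line break (newline) followed", "(newline)"));
    }
}
```

Because the lookarounds do not consume any characters, the match covers only the word itself. The trade-off is that "paragraph," (followed by a comma) is also excluded, which may or may not be what you want.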
