Extract words starting with a particular character from a string

Problem description

I have the following string:

 String line = "#food was testy. #drink lots of. #night was fab. #three #four";

I want to extract #food, #drink, #night, #three and #four from it.

I tried this code:

    String[] words = line.split("#");
    for (String word: words) {
        System.out.println(word);
    }

But it gives "food was testy.", "drink lots of.", "night was fab.", "three" and "four".

Recommended answer

split only cuts the whole string at each place where it finds a #. That explains your current result.
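To make the behaviour of split concrete, here is a small sketch that prints the pieces it produces for the string from the question. The class and method names (SplitDemo, pieces) are made up for illustration.

```java
import java.util.Arrays;

class SplitDemo {
    // Returns the pieces that String.split produces when cutting at every '#'.
    static String[] pieces(String line) {
        return line.split("#");
    }

    public static void main(String[] args) {
        String line = "#food was testy. #drink lots of. #night was fab. #three #four";
        // The line starts with '#', so the first piece is an empty string,
        // and the '#' characters themselves are discarded.
        System.out.println(Arrays.toString(pieces(line)));
    }
}
```

Note that the delimiter is consumed and each piece keeps everything up to the next #, which is why whole phrases come back instead of single words.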

You could extract the first word of each piece of the string, but the better tool for this task is a regular expression (regex).
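For comparison, the "first word of each piece" idea could look roughly like the sketch below. This is only an illustration of the alternative, not the recommended approach; FirstWordDemo and firstWords are hypothetical names, and "word" is assumed here to mean everything up to the next whitespace character.

```java
import java.util.ArrayList;
import java.util.List;

class FirstWordDemo {
    // Splits at '#' and keeps the leading non-whitespace run of each piece.
    static List<String> firstWords(String line) {
        List<String> tags = new ArrayList<>();
        String[] pieces = line.split("#");
        // pieces[0] is whatever precedes the first '#', so start at index 1.
        for (int i = 1; i < pieces.length; i++) {
            String piece = pieces[i];
            int end = 0;
            while (end < piece.length() && !Character.isWhitespace(piece.charAt(end))) {
                end++;
            }
            // Skip a '#' that is followed directly by whitespace or another '#'.
            if (end > 0) {
                tags.add("#" + piece.substring(0, end));
            }
        }
        return tags;
    }

    public static void main(String[] args) {
        String line = "#food was testy. #drink lots of. #night was fab. #three #four";
        System.out.println(firstWords(line)); // [#food, #drink, #night, #three, #four]
    }
}
```

This version keeps trailing punctuation attached (for example "#a." in "hello #a. b") and needs manual index handling, which is part of why a regex is the more convenient tool here.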

Here is how you can achieve it:

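import java.util.regex.Matcher;
import java.util.regex.Pattern;

String line = "#food was testy. #drink lots of. #night was fab. #three #four";

// The Java string literal "#\\w+" holds the regex #\w+
Pattern pattern = Pattern.compile("#\\w+");

Matcher matcher = pattern.matcher(line);
// find() moves to the next match; group() returns the text it matched
while (matcher.find())
{
    System.out.println(matcher.group());
}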

The output is:

#food
#drink
#night
#three
#four

The magic happens in "#\w+".

  • # The pattern starts with a literal #.
  • \w Matches any letter (a-z, A-Z), digit (0-9), or underscore.
  • + Matches one or more consecutive \w characters.

So we search for text that starts with # and is followed by one or more letters, digits, or underscores.
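If the matches are needed as values rather than printed, the same pattern can be wrapped in a small helper. The sketch below assumes a hypothetical class HashtagExtractor with a method extract; neither name comes from the original answer.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class HashtagExtractor {
    // Compiled once and reused; a compiled Pattern is immutable.
    private static final Pattern TAG = Pattern.compile("#\\w+");

    // Collects every "#word" occurrence in order of appearance.
    static List<String> extract(String line) {
        List<String> tags = new ArrayList<>();
        Matcher matcher = TAG.matcher(line);
        while (matcher.find()) {
            tags.add(matcher.group());
        }
        return tags;
    }

    public static void main(String[] args) {
        String line = "#food was testy. #drink lots of. #night was fab. #three #four";
        System.out.println(extract(line)); // [#food, #drink, #night, #three, #four]
    }
}
```

Because + requires at least one \w character, a lone # followed by a space is not reported as a match.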

We write '\\' for '\' because, in a Java string literal, the backslash starts an escape sequence, so a literal backslash must itself be escaped.
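A quick way to see the escaping at work is to print the string that the regex engine actually receives. EscapeDemo and regexSource below are illustrative names only.

```java
class EscapeDemo {
    // The Java source text "#\\w+" denotes a 4-character string: '#', '\', 'w', '+'.
    static String regexSource() {
        return "#\\w+";
    }

    public static void main(String[] args) {
        System.out.println(regexSource());          // #\w+
        System.out.println(regexSource().length()); // 4
    }
}
```

Writing "#\w+" with a single backslash would not compile, because \w is not a valid escape sequence in a Java string literal.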

Explanation of find and group:


  • The find method scans the input sequence looking for the next subsequence that matches the pattern.
  • group() returns the input subsequence matched by the previous match.
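To watch find and group step through the input, the sketch below also records Matcher.start() and Matcher.end() for each match. The class FindGroupDemo, the method trace, and the "text@start-end" output format are invented for this illustration.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class FindGroupDemo {
    // Records each match as "text@start-end", where end is exclusive.
    static List<String> trace(String line) {
        Matcher m = Pattern.compile("#\\w+").matcher(line);
        List<String> out = new ArrayList<>();
        // Each successful find() resumes scanning where the previous match ended.
        while (m.find()) {
            out.add(m.group() + "@" + m.start() + "-" + m.end());
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(trace("a #bc d #e")); // [#bc@2-5, #e@8-10]
    }
}
```

Calling group() before a successful find() throws an IllegalStateException, so it belongs inside the loop.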

[edit]

Using \w can be a problem if you need to detect accented or non-Latin characters.

For example, given:

"Bonjour mon #bébé #chat."

The matches will be:

  • #b
  • #chat
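The truncated match can be reproduced with a short sketch. AsciiWordDemo and matches are hypothetical names; the accented letters are written as \u00e9 escapes so the example does not depend on the source file encoding.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class AsciiWordDemo {
    // Uses the default \w, which in Java covers only [a-zA-Z_0-9].
    static List<String> matches(String text) {
        Matcher m = Pattern.compile("#\\w+").matcher(text);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // "\u00e9" is the letter e with an acute accent.
        String text = "Bonjour mon #b\u00e9b\u00e9 #chat.";
        System.out.println(matches(text)); // [#b, #chat]
    }
}
```

The first match stops at the accented letter because it falls outside the ASCII range that \w covers by default.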

It depends on what you will accept as a valid hashtag. But that is another question, and multiple discussions exist about it.

For example, if you want any letter from any language, #\p{L}+ looks good, but it does not include the underscore (or digits)...
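As a rough comparison, the sketch below runs three candidate patterns over the same text: #\p{L}+ from above, a character class that adds digits and the underscore, and #\w+ compiled with the Pattern.UNICODE_CHARACTER_CLASS flag. Which one fits depends on your own definition of a hashtag; UnicodeTagDemo, find, and the "#snake_case" test tag are illustrative assumptions.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

class UnicodeTagDemo {
    // Returns every match of the given regex (compiled with the given flags) in text.
    static List<String> find(String regex, int flags, String text) {
        Matcher m = Pattern.compile(regex, flags).matcher(text);
        List<String> found = new ArrayList<>();
        while (m.find()) {
            found.add(m.group());
        }
        return found;
    }

    public static void main(String[] args) {
        // "\u00e9" is the letter e with an acute accent.
        String text = "Bonjour mon #b\u00e9b\u00e9 #chat. #snake_case";

        // Letters of any script only: the third tag is cut at the underscore.
        System.out.println(find("#\\p{L}+", 0, text));

        // Letters, numbers and underscore, spelled out explicitly.
        System.out.println(find("#[\\p{L}\\p{N}_]+", 0, text));

        // Same regex as before, but \w is made Unicode-aware by the flag.
        System.out.println(find("#\\w+", Pattern.UNICODE_CHARACTER_CLASS, text));
    }
}
```

The last two variants both keep the accented tag and the underscore in this example; they are not identical in general, since the flag-based \w also accepts combining marks and other connector punctuation.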
