R regex - extract words beginning with @ symbol

Views: 73 | Published: 2021/7/6 20:28:59 | Tags: r, regex, stringr


Question

I'm trying to extract Twitter handles from tweets using R's stringr package. For example, suppose I want to get all words in a vector that begin with "A". I can do this like so:

library(stringr)

# Get all words that begin with "A"
str_extract_all(c("hAi", "hi Ahello Ame"), "(?<=\\b)A[^\\s]+")

[[1]]
character(0)

[[2]]
[1] "Ahello" "Ame"

Great. Now let's try the same thing using "@" instead of "A":

str_extract_all(c("h@i", "hi @hello @me"), "(?<=\\b)\\@[^\\s]+")

[[1]]
[1] "@i"

[[2]]
character(0)

Why does this example give the opposite of the result I was expecting, and how can I fix it?

Recommended answer

It looks like you mean:

str_extract_all(c("h@i", "hi @hello @me", "@twitter"), "(?<=^|\\s)@[^\\s]+")
# [[1]]
# character(0)
# [[2]]
# [1] "@hello" "@me" 
# [[3]]
# [1] "@twitter"

The \b in a regular expression is a word boundary. It occurs "between two characters in the string, where one is a word character and the other is not a word character." Since a space and "@" are both non-word characters, there is no boundary before the "@" in "hi @hello @me", so nothing matches. In "h@i", by contrast, "h" is a word character and "@" is not, so there is a boundary right before the "@", which is why "@i" was returned.
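The boundary behaviour described above can be checked directly. This is a minimal sketch that assumes stringr's default regex engine; the variable names `tweets` and `boundary_before_at` are only illustrative.

```r
library(stringr)

tweets <- c("h@i", "hi @hello @me", "@twitter")

# \b sits between a word character and a non-word character.
# "h@i": "h" is a word character and "@" is not, so a boundary precedes "@".
# "hi @hello @me" and "@twitter": "@" follows a space or the start of the
# string, so no boundary precedes it.
boundary_before_at <- str_detect(tweets, "\\b@")
print(boundary_before_at)
# [1]  TRUE FALSE FALSE
```

Only the first string has a boundary in front of "@", which matches the results shown in the question.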

With this revision, the lookbehind requires the "@" to be either at the start of the string or immediately after a whitespace character.
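If the pattern is needed in several places, it could be wrapped in a small helper. The sketch below is an assumption on top of the answer, not part of it: `extract_handles` is a hypothetical name, and the handle body uses `\\w+` instead of `[^\\s]+` so that trailing punctuation such as a comma is not captured.

```r
library(stringr)

# Hypothetical helper (not from the original answer): "@" must sit at the
# start of the string or right after whitespace; the handle itself is
# limited to word characters (\w), so trailing punctuation is dropped.
extract_handles <- function(text) {
  str_extract_all(text, "(?<=^|\\s)@\\w+")
}

handles <- extract_handles(c("h@i", "hi @hello @me", "thanks @twitter, bye"))
print(handles)
# [[1]]
# character(0)
#
# [[2]]
# [1] "@hello" "@me"
#
# [[3]]
# [1] "@twitter"
```

Whether `\\w+` or `[^\\s]+` is the better choice depends on the data; the original pattern keeps everything up to the next whitespace.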
