Match multiple strings containing escape characters in R

Problem description

I have a vector of text strings containing smileys and a dictionary containing only the smileys.

A <- c("This :/ :/ :) ^^","is :/ ^^", "weird^^ :)")
B <- c(":)",":/","^^")

I would like to extract every smiley match from each text string, including duplicates, so my output should look like this:

[[1]]
[1] ":/" ":/" ":)" "^^"

[[2]]
[1] ":/" "^^"

[[3]]
[1] "^^" ":)"

This is what I have tried so far:

library(stringr)

# Does not return duplicates: str_detect() only reports whether each smiley occurs
sapply(A, function(x) B[str_detect(x, fixed(B))], USE.NAMES = FALSE)

[[1]]
[1] ":)" ":/" "^^"

[[2]]
[1] ":/" "^^"

[[3]]
[1] ":)" "^^"

# Returns one smiley per string: the pattern is vectorised, so A[i] is only searched for B[i]
str_extract_all(A, fixed(B))

[[1]]
[1] ":)"

[[2]]
[1] ":/"

[[3]]
[1] "^^"

library(qdapRegex)

# Returns an error because of unescaped characters
rm_default(A, pattern = B, fixed = TRUE, extract = TRUE)
Error in stringi::stri_extract_all_regex(text.var, pattern) : 
  Incorrectly nested parentheses in regexp pattern. (U_REGEX_MISMATCHED_PAREN)
In addition: Warning messages:
1: In if (substring(pattern, 1, 4) == "@rm_") { :
  the condition has length > 1 and only the first element will be used
2: In if (substring(pattern, 1, 1) == "@") { :
  the condition has length > 1 and only the first element will be used

Any help would be appreciated.

Recommended answer

One option is to split each string with strsplit on letters and spaces, and then keep only the pieces that are contained in 'B':

lapply(strsplit(A, "[A-Za-z ]"), function(x) x[x %in% B])
#[[1]]
#[1] ":/" ":/" ":)" "^^"

#[[2]]
#[1] ":/" "^^"

#[[3]]
#[1] "^^" ":)"
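
The strsplit approach relies on every smiley being separated from its neighbours by letters or spaces. As a possible alternative that searches for the smileys directly, here is a minimal base R sketch. It assumes the regex is run with perl = TRUE, so that PCRE's \Q...\E quoting can be used to treat each smiley literally, and that no smiley itself contains the sequence \E. The names B_sorted, pattern, and matches are illustrative and not part of the original question.

```r
A <- c("This :/ :/ :) ^^", "is :/ ^^", "weird^^ :)")
B <- c(":)", ":/", "^^")

# Try longer smileys first, in case one smiley is a prefix of another
# (alternation in PCRE picks the first alternative that matches).
B_sorted <- B[order(nchar(B), decreasing = TRUE)]

# Wrap each smiley in \Q...\E so characters such as ")" and "^" are matched
# literally, then join the alternatives with "|".
pattern <- paste0("\\Q", B_sorted, "\\E", collapse = "|")

# gregexpr() locates every match in each string; regmatches() extracts the
# matched text, keeping duplicates and the order of appearance.
matches <- regmatches(A, gregexpr(pattern, A, perl = TRUE))
print(matches)
```

Because the pattern is a single alternation, each string is scanned once from left to right, so repeated smileys are kept in the order in which they occur.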
