How do I extract appearances of a vector of strings in another vector of strings using R?


Question

I have a vector of strings like this:

strings <- tibble(string = c("apple, orange, plum, tomato",
                             "plum, beat, pear, cactus",
                             "centipede, toothpick, pear, fruit"))

I have a vector of fruits:

fruits <- tibble(fruit =c("apple", "orange", "plum", "pear"))

What I'd like is a data.frame/tibble containing the original strings column plus a second list or character column holding all the fruits found in that original column. Something like this:

strings <- tibble(string = c("apple, orange, plum, tomato",
                             "plum, beat, pear, cactus",
                             "centipede, toothpick, pear, fruit"),
                   match = c("apple, orange, plum",
                             "plum, pear",
                             "pear")
                  )

I've tried str_extract(strings, fruits) and I get a list where everything is blank, along with the warning:

Warning message:
In stri_detect_regex(string, pattern, opts_regex = opts(pattern)):
longer object length is not a multiple of shorter object length

I've tried str_extract_all(strings, paste0(fruits, collapse = "|")) and I get the same warning message.

I've looked at "Find matches of a vector of strings in another vector of strings", but that doesn't seem to help here.

Any help would be greatly appreciated.

Recommended answer

Here's an example using purrr:

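library(dplyr)
library(purrr)
library(stringr)
library(tidyr)

strings <- tibble(string = c("apple, orange, plum, tomato",
                             "plum, beat, pear, cactus",
                             "centipede, toothpick, pear, fruit"))

fruits <- tibble(fruit = c("apple", "orange", "plum", "pear"))

# Return every pattern found in string_to_parse, dropping patterns with no match
extract_if_exists <- function(string_to_parse, pattern){
  extraction <- stringi::stri_extract_all_regex(string_to_parse, pattern)
  extraction <- unlist(extraction[!(is.na(extraction))])
  return(extraction)
}

strings %>%
  # list column: one character vector of matched fruits per row
  mutate(matches = map(string, extract_if_exists, fruits$fruit)) %>%
  # collapse each vector of matches (not the original string) into one string
  mutate(matches = map(matches, str_c, collapse = ", ")) %>%
  unnest(cols = matches)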
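
As a possible alternative, here is a minimal sketch that stays close to the second attempt in the question. One likely cause of the trouble there is that whole tibbles, rather than their character columns, are passed to stringr; the sketch assumes that passing the column itself and building a single alternation pattern is acceptable. Plain data.frame and character vectors are used to keep it self-contained, and the `\\b` word boundaries are an added assumption so that a fruit name does not match inside a longer word.

```r
library(stringr)

strings <- data.frame(
  string = c("apple, orange, plum, tomato",
             "plum, beat, pear, cactus",
             "centipede, toothpick, pear, fruit"),
  stringsAsFactors = FALSE
)
fruits <- c("apple", "orange", "plum", "pear")

# One alternation pattern, e.g. \b(apple|orange|plum|pear)\b
pattern <- paste0("\\b(", paste(fruits, collapse = "|"), ")\\b")

# Pass the character column, not the whole data frame
found <- str_extract_all(strings$string, pattern)

# Collapse each row's matches into a single comma-separated string
strings$match <- vapply(found, paste, character(1), collapse = ", ")

print(strings)
```

Rows with no matching fruit would end up with an empty string in match; keep found as a list column instead if a list is preferred over a character column.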
