Using dplyr to filter rows which contain a partial string of a column

Published: 2021/5/2 20:47:56
Tags: r, filter, dplyr, mutate, summarize


Question

Assuming I have a data frame like

term        cnt
apple        10
apples        5
a apple on    3
blue pears    3
pears         1

how could I filter out all strings in this column that are partial matches of another entry, e.g. getting as a result

term     cnt
apple     10
pears      1

without specifying which terms I want to filter on (apple|pears), but in a self-referencing manner (i.e. each term is checked against the whole column, and terms that are a partial match are removed). Neither the number of tokens nor the form of the strings is restricted (i.e. "mapples" would get matched by "apple"). This would amount to an inverted, generalized, dplyr-based version of

d[grep("^apple$|^pears$", d$term), ]
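For reference, here is a runnable sketch of that hard-coded filter on the sample data. The column types are assumed, and the dplyr variant (with the hypothetical names `exact` and `exact_dplyr`) is only an illustration, not part of the original question.

```r
library(dplyr)  # assumed to be installed; only needed for the second variant

# Sample data from the question; the name `d` matches the snippet above.
d <- data.frame(
  term = c("apple", "apples", "a apple on", "blue pears", "pears"),
  cnt  = c(10, 5, 3, 3, 1),
  stringsAsFactors = FALSE
)

# The anchors ^ and $ force a whole-string match, so only the
# explicitly listed terms survive.
exact <- d[grep("^apple$|^pears$", d$term), ]

# Same filter written with dplyr; grepl() returns one logical per row.
exact_dplyr <- d %>% filter(grepl("^apple$|^pears$", term))
```

Both variants require the terms to be spelled out in the pattern, which is exactly what the question wants to avoid.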

Additionally, it would be interesting to use this departialisation to get a cumulative sum, e.g.

term     cnt
apple     18
pears      4

I couldn't get it to work with contains() or grep().

Thanks

Recommended answer

Hopefully this is the complete answer. It is not very idiomatic (as Pythonistas would say), but perhaps someone can suggest improvements to this:

> library(dplyr)  # provides %>%, filter(), group_by() and summarise()
> ssss <- data.frame(c('apple','red apple','apples','pears','blue pears'),c(15,3,10,4,3))
> 
> names(ssss) <- c('Fruit','Count')
> 
> ssss
       Fruit Count
1      apple    15
2  red apple     3
3     apples    10
4      pears     4
5 blue pears     3
> 
> root_list <- as.vector(ssss$Fruit[unlist(lapply(ssss$Fruit,function(x){length(grep(x,ssss$Fruit))>1}))])
> 
> 
> ssss %>% filter(Fruit %in% root_list)
  Fruit Count
1 apple    15
2 pears     4
> 
> data <- data.frame(lapply(root_list, function(x){y <- stringr::str_extract(ssss$Fruit,x); ifelse(is.na(y),'',y)}))
> 
> cols <- colnames(data)
> 
> #data$x <- do.call(paste0, c(data[cols]))
> #for (co in cols) data[co] <- NULL
> 
> ssss$Fruit <- do.call(paste0, c(data[cols]))
> 
> ssss %>% group_by(Fruit) %>% summarise(val = sum(Count))
# A tibble: 2 x 2
  Fruit   val
  <chr> <dbl>
1 apple    28
2 pears     7
>
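
The same idea can be sketched more compactly on the data from the question. This is a minimal sketch, not the answerer's code, and it rests on a few assumptions: dplyr is installed; matching is done on literal substrings (`fixed = TRUE`) rather than regular expressions; and a term counts as a "root" when no other distinct term occurs inside it, which differs slightly from the "matches more than one entry" rule used above. The helper names `contains_other` and `find_root` are hypothetical.

```r
library(dplyr)

# Sample data from the question.
d <- data.frame(
  term = c("apple", "apples", "a apple on", "blue pears", "pears"),
  cnt  = c(10, 5, 3, 3, 1),
  stringsAsFactors = FALSE
)

terms <- unique(d$term)

# TRUE if some other distinct term occurs as a literal substring of x.
contains_other <- function(x, terms) {
  others <- setdiff(terms, x)
  any(vapply(others, function(o) grepl(o, x, fixed = TRUE), logical(1)))
}

# Roots are the terms that contain no other term.
roots <- terms[!vapply(terms, contains_other, logical(1), terms = terms)]

# Part 1: keep only the rows whose term is a root.
filtered <- d %>% filter(term %in% roots)

# First root found inside x (a root always contains itself).
find_root <- function(x, roots) {
  hits <- roots[vapply(roots, function(r) grepl(r, x, fixed = TRUE), logical(1))]
  hits[1]
}

# Part 2: map every term to its root and sum the counts per root.
totals <- d %>%
  mutate(root = vapply(term, find_root, character(1),
                       roots = roots, USE.NAMES = FALSE)) %>%
  group_by(root) %>%
  summarise(cnt = sum(cnt))
```

On the question's data, `filtered` keeps apple (10) and pears (1), and `totals` gives apple 18 and pears 4. A term that contains several roots is assigned only to the first one, which may or may not be the desired behavior.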
