Counting the occurrence of substrings in a column in R with group by

Views: 675 | Published: 2018/5/28 19:20:01 | Tags: r, grep, summarize


Problem description


I would like to count the occurrences of a string in a column, per group. In this case the string is often a substring in a character column.

I have some data, e.g.

    ID  String          village
    1   fd_sec, ht_rm,  A
    2   NA, ht_rm       A
    3   fd_sec,         B
    4   san, ht_rm,     C

The code that I began with is obviously incorrect, but my search has not turned up how I could use the grep function on a column and group by village:

    impacts <- se %>%
      group_by(village) %>%
      summarise(
        c_NA     = round(sum(sub$en41_1 == "NA")),
        c_ht_rm  = round(sum(sub$en41_1 == "ht_rm")),
        c_san    = round(sum(sub$en41_1 == "san")),
        c_fd_sec = round(sum(sub$en41_1 == "fd_sec"))
      )

Ideally my output would be:

    village  fd_sec  NA  ht_rm  san
    A        1       1   2
    B        1
    C                    1      1

Thank you in advance
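
For the grep-based route the question mentions, a minimal base R sketch follows. It is an illustration and not part of the original post. The data frame `mydf` is rebuilt from the table above, and the names `patterns` and `counts` are assumptions made for the example. The idea is that `==` only tests whole-string equality, whereas `grepl()` tests whether a substring occurs, so the per-row flags can be summed within each village.

```r
# Sample data reconstructed from the table in the question (assumed layout).
mydf <- data.frame(
  ID      = 1:4,
  String  = c("fd_sec, ht_rm,", "NA, ht_rm", "fd_sec,", "san, ht_rm,"),
  village = c("A", "A", "B", "C"),
  stringsAsFactors = FALSE
)

# Substrings to look for; "NA" here is the literal text, not a missing value.
patterns <- c("fd_sec", "NA", "ht_rm", "san")

# For each pattern, flag the rows that contain it (fixed = TRUE means a plain
# substring match, not a regular expression), then sum the flags per village.
counts <- sapply(patterns, function(p) {
  tapply(grepl(p, mydf$String, fixed = TRUE), mydf$village, sum)
})

# Matrix with one row per village and one column per pattern.
counts
```

Note that this counts rows containing each substring, not repeated occurrences within one row, and a short pattern such as "san" would also match inside a longer value. If either matters, splitting the column on commas first is the safer choice.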

Solution

You can also use cSplit() from my "splitstackshape" package. Since this package also loads "data.table", you can then just use dcast() to tabulate the result.

Example:

    library(splitstackshape)
    # Split "String" on commas into one row per value, then count values per village
    cSplit(mydf, "String", direction = "long")[, dcast(.SD, village ~ String)]
    # Using 'village' as value column. Use 'value.var' to override
    # Aggregate function missing, defaulting to 'length'
    #    village fd_sec ht_rm san NA
    # 1:       A      1     2   0  1
    # 2:       B      1     0   0  0
    # 3:       C      0     1   1  0
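
If installing a package is not an option, the same split-then-tabulate idea can be sketched in base R. This is not the answerer's code. `mydf` is rebuilt here from the question's table, and the names `parts`, `long`, and `tab` are illustrative assumptions.

```r
# Sample data reconstructed from the table in the question (assumed layout).
mydf <- data.frame(
  ID      = 1:4,
  String  = c("fd_sec, ht_rm,", "NA, ht_rm", "fd_sec,", "san, ht_rm,"),
  village = c("A", "A", "B", "C"),
  stringsAsFactors = FALSE
)

# Split each String on commas and trim surrounding whitespace.
parts <- strsplit(mydf$String, ",", fixed = TRUE)
parts <- lapply(parts, trimws)

# Long form: one row per (village, value) pair, dropping any empty pieces.
long <- data.frame(
  village = rep(mydf$village, lengths(parts)),
  value   = unlist(parts),
  stringsAsFactors = FALSE
)
long <- long[nzchar(long$value), ]

# Cross-tabulate: rows are villages, columns are the distinct values.
# The text "NA" is an ordinary string here, so it gets its own column.
tab <- table(long$village, long$value)
tab
```

Because the values are split before counting, each one is matched exactly, so a value is never counted merely because it appears inside a longer one.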
