Counting the occurrences of substrings in a column in R with group by


Problem description


I would like to count the occurrences of a string in a column, per group. In this case the string is often a substring of the values in a character column.

I have some data e.g.

ID   String              village
1    fd_sec, ht_rm,      A
2    NA, ht_rm           A
3    fd_sec,             B
4    san, ht_rm,         C

The code I began with is obviously incorrect, but my searching has not turned up how I could use the grep function on a column, grouped by village.

impacts <- se %>%  group_by(village) %>%
summarise(c_NA = round(sum(sub$en41_1 ==  "NA")),
          c_ht_rm = round(sum(sub$en41_1 ==  "ht_rm")),
          c_san = round(sum(sub$en41_1 ==  "san")),
          c_fd_sec = round(sum(sub$en41_1 ==  "fd_sec")))

Ideally my output would be:

village  fd_sec  NA  ht_rm  san
A        1       1   2 
B        1
C                    1      1

Thank you in advance

Solution

You can also use cSplit() from my "splitstackshape" package. Since this package also loads "data.table", you can then just use dcast() to tabulate the result.

Example:

library(splitstackshape)
cSplit(mydf, "String", direction = "long")[, dcast(.SD, village ~ String)]
# Using 'village' as value column. Use 'value.var' to override
# Aggregate function missing, defaulting to 'length'
#    village fd_sec ht_rm san NA
# 1:       A      1     2   0  1
# 2:       B      1     0   0  0
# 3:       C      0     1   1  0
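
If you would rather avoid extra packages, the same split-then-tabulate idea can probably be sketched in base R. This is only a rough sketch, not a replacement for the answer above: the `mydf` data frame is reconstructed from the sample in the question, and the helper names `pieces`, `long` and `counts` are made up for illustration.

```r
# Sample data reconstructed from the question (an assumption; trailing
# commas are kept as posted, and "NA" is treated as literal text).
mydf <- data.frame(
  ID = 1:4,
  String = c("fd_sec, ht_rm,", "NA, ht_rm", "fd_sec,", "san, ht_rm,"),
  village = c("A", "A", "B", "C"),
  stringsAsFactors = FALSE
)

# Split each string on commas, trim surrounding whitespace, and drop any
# empty pieces so trailing commas do not produce blank categories.
pieces <- lapply(strsplit(mydf$String, ",", fixed = TRUE), function(x) {
  x <- trimws(x)
  x[nzchar(x)]
})

# Build a long table with one row per (village, substring) pair.
long <- data.frame(
  village = rep(mydf$village, lengths(pieces)),
  String = unlist(pieces),
  stringsAsFactors = FALSE
)

# Cross-tabulate: rows are villages, columns are the distinct substrings.
counts <- table(long$village, long$String)
print(counts)
```

If your column holds real missing values rather than the text "NA", you would likely want to pass useNA = "ifany" to table() so they are still counted.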
