Counting the occurrence of substrings in a column in R with group by
I would like to count the occurrences of a string in a column, per group. In this case the string is often a substring in a character column.
I have some data, e.g.:
ID  String           village
1   fd_sec, ht_rm,   A
2   NA, ht_rm        A
3   fd_sec,          B
4   san, ht_rm,      C
The code that I began with is obviously incorrect, and my searching has not turned up how I could use the grep function on a column and group by village:
impacts <- se %>% group_by(village) %>%
summarise(c_NA = round(sum(sub$en41_1 == "NA")),
c_ht_rm = round(sum(sub$en41_1 == "ht_rm")),
c_san = round(sum(sub$en41_1 == "san")),
c_fd_sec = round(sum(sub$en41_1 == "fd_sec")))
Ideally my output would be:
village  fd_sec  NA  ht_rm  san
A        1       1   2
B        1
C                    1      1
Thank you in advance
You can also use cSplit() from my "splitstackshape" package. Since this package also loads "data.table", you can then just use dcast() to tabulate the result.
Example:
library(splitstackshape)
cSplit(mydf, "String", direction = "long")[, dcast(.SD, village ~ String)]
# Using 'village' as value column. Use 'value.var' to override
# Aggregate function missing, defaulting to 'length'
# village fd_sec ht_rm san NA
# 1: A 1 2 0 1
# 2: B 1 0 0 0
# 3: C 0 1 1 0
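For anyone who wants to try this without installing a package, here is a minimal base R sketch of the same split-then-tabulate idea. It is not the method from the answer above. The data frame `mydf` is an assumed reconstruction of the sample data in the question (the post does not show how the real data is stored), and the names `parts`, `long` and `counts` are only illustrative.

```r
# Assumed reconstruction of the question's sample data; the real column
# types and exact contents are not shown in the original post.
mydf <- data.frame(
  ID = 1:4,
  String = c("fd_sec, ht_rm,", "NA, ht_rm", "fd_sec,", "san, ht_rm,"),
  village = c("A", "A", "B", "C"),
  stringsAsFactors = FALSE
)

# Split each String on commas, ignoring whitespace around them.
# strsplit() does not return a trailing empty piece, so the trailing
# commas in the sample data add no extra elements.
parts <- strsplit(mydf$String, "[[:space:]]*,[[:space:]]*")

# Reshape to long form: one row per (village, substring) pair.
long <- data.frame(
  village = rep(mydf$village, lengths(parts)),
  String = unlist(parts),
  stringsAsFactors = FALSE
)

# Cross-tabulate villages against substrings. The text "NA" is an ordinary
# string here, not a missing value, so it gets its own column.
counts <- table(long$village, long$String)
print(counts)
```

Combinations that never occur show up as 0 rather than blank, as in the `dcast()` output above.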