Using Tidyr/Dplyr to summarise counts of groups of strings
Problem description
I need to summarise the counts of strings I am assigning to groups, and I know I can do it in dplyr/tidyr but I am missing something.
Example dataset:
Owner <- c('bob', 'julia', 'cheryl', 'bob', 'julia', 'cheryl')
Day <- c('Mon', 'Tue')
Locn <- c('house', 'store', 'apartment', 'office', 'house', 'shop')
data <- data.frame(Owner, Day, Locn)
which looks like this:
   Owner Day      Locn
1    bob Mon     house
2  julia Tue     store
3 cheryl Mon apartment
4    bob Tue    office
5  julia Mon     house
6 cheryl Tue      shop
I want to group by name and day, and then count up grouped locations in columns. In this example I want 'house' and 'apartment' to add to a column titled 'Home', and 'store', 'office' and 'shop' to be counted in a column 'Work'.
My current code (which doesn't work):
grouped_locn <- data %>%
  dplyr::arrange(Owner, Day) %>%
  dplyr::group_by(Owner, Day) %>%
  dplyr::summarize(Home = which(data$Locn %in% c('house', 'apartment')),
                   Work = which(data$Locn %in% c("store", "office", "apartment")))
I have only included my current attempt at the summarize step to show how I have been approaching it. The Home and Work code currently returns vectors of the row numbers that contain an element of the group (i.e. Home = 1 3 5).
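A possible reading of why the attempt returns row numbers, sketched in base R: `which()` gives the positions of the matching elements, while `sum()` on the same logical vector gives how many there are. In addition, `data$Locn` refers to the whole column rather than the rows of the current group. The name `is_home` is introduced here only for illustration.

```r
# Same Locn values as in the example dataset
Locn <- c('house', 'store', 'apartment', 'office', 'house', 'shop')

# Logical vector: TRUE where the location counts as "home"
is_home <- Locn %in% c('house', 'apartment')

which(is_home)  # positions of the matches: 1 3 5
sum(is_home)    # number of matches: 3
```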
My intended output:
   Owner Day Home Work
1    bob Mon    1    0
2    bob Tue    0    1
3  julia Mon    1    0
4  julia Tue    0    1
5 cheryl Mon    1    0
6 cheryl Tue    0    1
In the actual dataset (30k+ rows) there are multiple Locn values per Owner per Day, so the Home and Work counts can be numbers other than 1 and 0 (so no booleans).
Many thanks.
Solution
Try this:
data %>%
  group_by(Owner, Day) %>%
  summarise(Home = sum(Locn %in% c("house", "apartment")),
            Work = sum(Locn %in% c("store", "office", "shop")))
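To check that this approach also handles several Locn values per Owner per Day, here is a small self-contained sketch. The `visits` data frame and the `counts` name are made up for illustration and are not from the question. It assumes dplyr 1.0.0 or later for the `.groups` argument. Inside `summarise()`, `Locn` refers to the current group's values, so `sum()` of the logical vector counts matches per group.

```r
library(dplyr)

# Hypothetical data: bob has three locations recorded on Monday
visits <- data.frame(
  Owner = c('bob', 'bob', 'bob', 'julia'),
  Day   = c('Mon', 'Mon', 'Mon', 'Tue'),
  Locn  = c('house', 'apartment', 'office', 'store')
)

counts <- visits %>%
  group_by(Owner, Day) %>%
  summarise(
    Home = sum(Locn %in% c('house', 'apartment')),       # TRUEs count as 1
    Work = sum(Locn %in% c('store', 'office', 'shop')),
    .groups = 'drop'                                     # return an ungrouped result
  )

print(counts)
```

With this data, bob's Monday row should have Home = 2 and Work = 1, so the counts are not limited to 0 and 1.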