Using Tidyr/Dplyr to summarise counts of groups of strings

Views: 140 | Posted: 2017/7/13 21:49:39 | Tags: r, dplyr, tidyr


Question

I need to summarise the counts of strings I am assigning to groups, and I know I can do it in dplyr/tidyr but I am missing something.

Example dataset:

Owner = c('bob','julia','cheryl','bob','julia','cheryl')
Day = c('Mon', 'Tue')
Locn = c('house','store','apartment','office','house','shop')
data <- data.frame(Owner, Day, Locn)

which looks like this:

   Owner Day      Locn
1    bob Mon     house
2  julia Tue     store
3 cheryl Mon apartment
4    bob Tue    office
5  julia Mon     house
6 cheryl Tue      shop

I want to group by name and day, and then count up grouped locations in columns. In this example I want 'house' and 'apartment' to add to a column titled 'Home', and 'store', 'office' and 'shop' to be counted in a column 'Work'.

My current code (which doesn't work):

grouped_locn <- data %>%
  dplyr::arrange(Owner, Day) %>%
  dplyr::group_by(Owner, Day) %>%
  dplyr::summarize(Home = which(data$Locn %in% c('house', 'apartment')),
                   Work = which(data$Locn %in% c("store", "office", "apartment")))

I have only included my current attempt at the summarize step to show how I have been approaching it. The Home and Work code currently returns vectors of the row numbers that contain an element of the group (i.e. Home = 1 3 5).
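
To make the behaviour described above concrete, here is a small base-R sketch using the Locn vector from the example. It shows that which() reports the positions of the matches across the whole, ungrouped vector, while sum() on the same logical vector gives a count. The names is_home, home_rows and home_count are illustrative only and do not appear in the original post.

```r
# Locn vector copied from the example dataset above.
Locn <- c('house', 'store', 'apartment', 'office', 'house', 'shop')

# %in% yields one TRUE/FALSE value per element of Locn.
is_home <- Locn %in% c('house', 'apartment')

# which() returns the positions of the TRUE values: row numbers, not a count.
home_rows <- which(is_home)
print(home_rows)   # 1 3 5

# sum() treats TRUE as 1 and FALSE as 0, so it returns how many elements matched.
home_count <- sum(is_home)
print(home_count)  # 3
```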

My intended output:

   Owner Day   Home  Work
1    bob Mon      1     0
2    bob Tue      0     1
3  julia Mon      1     0
4  julia Tue      0     1
5 cheryl Mon      1     0
6 cheryl Tue      0     1

In the actual dataset (30k+ rows) there are multiple Locn values per Owner per Day, so the Home and Work counts can be numbers other than 1 and 0 (so a TRUE/FALSE flag is not enough).

Many thanks.
Solution

Try this, which sums the logical vector returned by %in% within each group:

data %>%
  group_by(Owner, Day) %>%
  summarise(Home = sum(Locn %in% c("house", "apartment")),
            Work = sum(Locn %in% c("store", "office", "shop")))
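
As a self-contained check, the sketch below rebuilds the example data and runs the pipeline from the answer. It assumes the dplyr package is installed; stringsAsFactors = FALSE and the final ungroup() are additions for illustration and are not part of the original answer. The key difference from the question's attempt is that the bare column name Locn inside summarise() refers only to the rows of the current group, whereas data$Locn always refers to the whole column.

```r
library(dplyr)

# Example data from the question. Day has length 2 and is recycled
# by data.frame() to match the 6 rows of Owner and Locn.
Owner <- c('bob', 'julia', 'cheryl', 'bob', 'julia', 'cheryl')
Day   <- c('Mon', 'Tue')
Locn  <- c('house', 'store', 'apartment', 'office', 'house', 'shop')
data  <- data.frame(Owner, Day, Locn, stringsAsFactors = FALSE)  # assumption: keep strings as character

result <- data %>%
  group_by(Owner, Day) %>%
  # Within each Owner/Day group, count how many Locn values fall in each set.
  summarise(Home = sum(Locn %in% c('house', 'apartment')),
            Work = sum(Locn %in% c('store', 'office', 'shop'))) %>%
  ungroup()  # added for illustration: drop any remaining grouping

print(result)
```

The rows come back ordered by the grouping keys (bob, cheryl, julia), so the order can differ from the table in the question. With several rows per Owner and Day, as in the full dataset, the same code should give counts above 1.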


                        