R: nested grouped summaries with dplyr?

Problem description

I'm trying to practise the R dplyr package with a hypothetical dataset (link to pastebin) of people's drinking records at different bars:

bar_name,person,drink_ordered,times_ordered,liked_it
Moe’s Tavern,Homer,Romulan ale,2,TRUE
Moe’s Tavern,Homer,Scotch whiskey,1,FALSE
Moe’s Tavern,Guinan,Romulan ale,1,TRUE
Moe’s Tavern,Guinan,Scotch whiskey,3,FALSE
Moe’s Tavern,Rebecca,Romulan ale,2,FALSE
Moe’s Tavern,Rebecca,Scotch whiskey,4,TRUE
Cheers,Rebecca,Budweiser,1,TRUE
Cheers,Rebecca,Black Hole,1,TRUE
Cheers,Bender,Budweiser,1,FALSE
Cheers,Bender,Black Hole,1,TRUE
Cheers,Krusty,Budweiser,1,TRUE
Cheers,Krusty,Black Hole,1,FALSE
The Hip Joint,Homer,Scotch whiskey,3,FALSE
The Hip Joint,Homer,Corona,1,TRUE
The Hip Joint,Homer,Budweiser,1,FALSE
The Hip Joint,Krusty,Romulan ale,3,TRUE
The Hip Joint,Krusty,Black Hole,4,FALSE
The Hip Joint,Krusty,Corona,1,TRUE
The Hip Joint,Rebecca,Corona,2,TRUE
The Hip Joint,Rebecca,Romulan ale,4,FALSE
The Hip Joint,Bender,Corona,1,TRUE
Ten Forward,Bender,Romulan ale,1,
Ten Forward,Bender,Black Hole,,FALSE
Ten Forward,Guinan,Romulan ale,2,TRUE
Ten Forward,Guinan,Budweiser,,FALSE
Ten Forward,Krusty,Budweiser,1,
Ten Forward,Krusty,Black Hole,1,FALSE
Mos Eisley,Krusty,Black Hole,1,TRUE
Mos Eisley,Krusty,Corona,2,FALSE
Mos Eisley,Krusty,Romulan ale,1,TRUE
Mos Eisley,Homer,Black Hole,1,TRUE
Mos Eisley,Homer,Corona,2,FALSE
Mos Eisley,Homer,Romulan ale,1,TRUE
Mos Eisley,Bender,Black Hole,1,TRUE
Mos Eisley,Bender,Corona,2,FALSE
Mos Eisley,Bender,Romulan ale,1,TRUE

I have used dplyr's group_by() and summarise() functions a couple of times, but I am not sure how to deal with more nested situations. Specifically, I want to ask questions like:

  1. For each unique bar_name, did each person order the exact same combination of drinks (drink_ordered)? In this dataset, this would be marked TRUE for the bars Moe's Tavern, Cheers, and Mos Eisley.

  2. Even if each person ordered the exact same combination of drinks in a particular bar_name, did they order the drinks the same number of times (times_ordered)? For example, Cheers and Mos Eisley would be marked as TRUE for this question.

  3. Then, even if each person ordered the exact same combination of drinks in a particular bar the same number of times, are their opinions (liked_it) of the drinks exactly the same? In this dataset that would be TRUE for Mos Eisley.

Observe that in the dataset there are cases (The Hip Joint) where the answer would be FALSE for all three questions, and there are missing values (Ten Forward).

Ideally, I hope to produce a table where the first column is bar_name, and three more boolean columns saying TRUE or FALSE for each of the three questions.

How do I efficiently achieve this with dplyr in R? Thank you very much.

Solution

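You can build one signature string per person for each question, then check whether each bar has only one distinct signature: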

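library(dplyr)

DF %>%
  # Sort so each person's drinks are listed in the same order
  arrange(drink_ordered, times_ordered, liked_it) %>%
  group_by(bar_name, person) %>%
  # One signature per person: drinks, drinks + times, drinks + times + opinions
  summarise(
    Ld   = toString(drink_ordered),
    Ldt  = paste(Ld, toString(times_ordered), sep = "_"),
    Ldtl = paste(Ldt, toString(liked_it), sep = "_")
  ) %>%
  group_by(bar_name) %>%
  # Count distinct values per bar (summarise_each(), mutate_each() and funs() are deprecated in current dplyr)
  summarise_each(funs(n_distinct)) %>%
  # TRUE when every person in the bar shares the same signature
  mutate_each(funs(. == 1), -person, -bar_name)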

#        bar_name person    Ld   Ldt  Ldtl
#           (chr)  (int) (lgl) (lgl) (lgl)
# 1        Cheers      3  TRUE  TRUE FALSE
# 2  Moe’s Tavern      3  TRUE FALSE FALSE
# 3    Mos Eisley      3  TRUE  TRUE  TRUE
# 4   Ten Forward      3 FALSE FALSE FALSE
# 5 The Hip Joint      4 FALSE FALSE FALSE
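
The summarise_each(), mutate_each() and funs() helpers used above are deprecated in current dplyr releases. Here is a minimal sketch of the same idea written with across(), assuming dplyr 1.0.0 or later. To stay self-contained it uses only the Moe's Tavern, Cheers and Mos Eisley rows from the question, and the name `result` is purely illustrative:

```r
library(dplyr)

# Subset of the question's data (three of the five bars), read from inline text.
DF <- read.csv(text = "bar_name,person,drink_ordered,times_ordered,liked_it
Moe's Tavern,Homer,Romulan ale,2,TRUE
Moe's Tavern,Homer,Scotch whiskey,1,FALSE
Moe's Tavern,Guinan,Romulan ale,1,TRUE
Moe's Tavern,Guinan,Scotch whiskey,3,FALSE
Moe's Tavern,Rebecca,Romulan ale,2,FALSE
Moe's Tavern,Rebecca,Scotch whiskey,4,TRUE
Cheers,Rebecca,Budweiser,1,TRUE
Cheers,Rebecca,Black Hole,1,TRUE
Cheers,Bender,Budweiser,1,FALSE
Cheers,Bender,Black Hole,1,TRUE
Cheers,Krusty,Budweiser,1,TRUE
Cheers,Krusty,Black Hole,1,FALSE
Mos Eisley,Krusty,Black Hole,1,TRUE
Mos Eisley,Krusty,Corona,2,FALSE
Mos Eisley,Krusty,Romulan ale,1,TRUE
Mos Eisley,Homer,Black Hole,1,TRUE
Mos Eisley,Homer,Corona,2,FALSE
Mos Eisley,Homer,Romulan ale,1,TRUE
Mos Eisley,Bender,Black Hole,1,TRUE
Mos Eisley,Bender,Corona,2,FALSE
Mos Eisley,Bender,Romulan ale,1,TRUE", stringsAsFactors = FALSE)

result <- DF %>%
  # Sort by drink so every person's signature lists drinks in the same order
  arrange(drink_ordered, times_ordered, liked_it) %>%
  group_by(bar_name, person) %>%
  # Build the three cumulative signature strings per person
  summarise(
    Ld   = toString(drink_ordered),
    Ldt  = paste(Ld, toString(times_ordered), sep = "_"),
    Ldtl = paste(Ldt, toString(liked_it), sep = "_"),
    .groups = "drop"
  ) %>%
  group_by(bar_name) %>%
  # Per bar: number of people, and whether all signatures are identical
  summarise(
    person = n_distinct(person),
    across(c(Ld, Ldt, Ldtl), ~ n_distinct(.x) == 1)
  )

print(result)
```

For these three bars the logical columns should agree with the corresponding rows of the output above. The arrange() step matters: without it, two people who ordered the same drinks in a different row order would get different signature strings.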
