从子集中选择观察值以基于 R 中的大型数据框创建新子集 [英] Select observations from a subset to create a new subset based on a large dataframe in R

查看：12 发布时间：2022/1/21 13:34:09 r dataframe dataset subset

本文介绍了从子集中选择观察值以基于 R 中的大型数据框创建新子集的处理方法，对大家解决问题具有一定的参考价值，需要的朋友们下面随着小编来一起学习吧！

问题描述

我有一个包含许多列和行的数据集 (Purchase.df).这个问题的重要变量名称是Customer"、OrderDate"、DateRank"(对日期进行排序以便我可以找到最小的日期)和BrandName".以下是我正在使用的一个非常小的示例:(我是这个网站的新手，所以我希望我在下面粘贴的内容有效)

I have a dataset (Purchase.df) that contains many columns and rows. The important variable names for this question are "Customer", "OrderDate", "DateRank" (which ranks the dates so I can find the smallest date) and "BrandName." Below is a very small sample of what I'm working with: (I'm new to this website, so I hope what I paste below works)

Purchase.df<-structure(list(Customer = c(10071535L, 10071535L, 10071535L, 
10071535L, 10071535L, 10071535L, 10071711L, 10071711L, 10071711L, 
10071711L, 10071711L, 10071711L, 10071711L, 10071711L, 10071711L, 
10071711L, 10071711L, 10071711L, 10072059L, 10072059L, 10072059L, 
10072113L, 10072113L, 10072113L, 10072113L, 10072113L, 10072113L, 
10072113L), BrandName = structure(c(1L, 2L, 2L, 2L, 3L, 3L, 2L, 
2L, 2L, 2L, 3L, 3L, 1L, 3L, 1L, 2L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 
2L, 3L, 3L, 3L, 3L), .Label = c("X", "Y", "Z"), class = "factor"), 
OrderDate = structure(c(14L, 14L, 15L, 16L, 19L, 20L, 11L, 
18L, 5L, 6L, 1L, 17L, 21L, 22L, 23L, 8L, 10L, 13L, 7L, 9L, 
12L, 4L, 4L, 2L, 2L, 2L, 3L, 3L), .Label = c("1/17/2011 0:00", 
"1/19/2010 0:00", "1/25/2010 0:00", "1/4/2010 0:00", "10/22/2010 0:00", 
"11/15/2010 0:00", "11/23/2011 0:00", "12/14/2011 0:00", 
"12/16/2011 0:00", "2/7/2012 0:00", "3/16/2010 0:00", "3/21/2012 0:00", 
"4/16/2012 0:00", "4/27/2012 0:00", "5/16/2012 0:00", "5/30/2012 0:00", 
"5/5/2011 0:00", "6/1/2010 0:00", "6/12/2012 0:00", "7/3/2012 0:00", 
"8/1/2011 0:00", "8/16/2011 0:00", "9/19/2011 0:00"), class = "factor"), 
DateRank = c(18.5, 18.5, 20, 21, 24, 25, 15, 23, 9, 10, 1, 
22, 26, 27, 28, 12, 14, 17, 11, 13, 16, 7.5, 7.5, 3, 3, 3, 
5.5, 5.5)), .Names = c("Customer", "BrandName", "OrderDate", 
"DateRank"), row.names = c(NA, -28L), class = "data.frame")

我创建了这个大型数据集 (subset.df) 的一个子集，它为每个客户找到了第一个 OrderDate，并告诉我他们购买了哪个品牌.我使用以下代码来执行此操作:

I've created a subset of this large dataset (subset.df) which finds the first OrderDate for each customer, and tells me which brand they purchased. I used the following code to do this:

subset1<-split(Purchase.df,Purchase.df$Customer)
subset2<-lapply(split(Purchase.df,Purchase.df$Customer), function(chunk) chunk[which(chunk$DateRank==min(chunk$DateRank)),])
subset.df<-do.call(rbind, as.list(subset2))

现在，我想弄清楚哪些客户在第一个 OrderDate 订购了品牌 X，并创建一个新数据集 (BigSubset.df)，其中包含在第一个订购日期购买品牌 X 的客户的所有 OrderDates.

Now, I want to figure out which customers ordered Brand X on their first OrderDate, and create a new dataset (BigSubset.df) that contains all of the OrderDates for the customers that purchased Brand X on their first order date.

应该是这样的:

Customer    BrandName   OrderDate   DateRank
10071535    X   4/27/2012 0:00  18.5
10071535    Y   4/27/2012 0:00  18.5
10071535    Y   5/16/2012 0:00  20
10071535    Y   5/30/2012 0:00  21
10071535    Z   6/12/2012 0:00  24
10071535    Z   7/3/2012 0:00   25
10072059    X   11/23/2011 0:00 11
10072059    X   12/16/2011 0:00 13
10072059    X   3/21/2012 0:00  16
10072113    X   1/4/2010 0:00   7.5
10072113    Y   1/4/2010 0:00   7.5
10072113    Y   1/19/2010 0:00  3
10072113    Z   1/19/2010 0:00  3
10072113    Z   1/19/2010 0:00  3
10072113    Z   1/25/2010 0:00  5.5
10072113    Z   1/25/2010 0:00  5.5

当我尝试从 Purchase.df 创建 BigSubset.df 时，我似乎无法让 R 引用较小的数据集，因为行数不相等.我在 Google 上搜索过，但没有看到任何答案，所以我什至不确定这在 R 中是否可行.让我知道你的想法.

I can't seem to get R to reference the smaller dataset when I attempt to create BigSubset.df from Purchase.df because the number of rows are not equal. I've searched on Google and haven't seen any answers, so I'm not even sure if this is possible in R. Let me know what you think.

从子集中选择观察值以基于 R 中的大型数据框创建新子集 [英] Select observations from a subset to create a new subset based on a large dataframe in R

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录关闭

从子集中选择观察值以基于 R 中的大型数据框创建新子集 [英] Select observations from a subset to create a new subset based on a large dataframe in R

问题描述

推荐答案

相关文章

其他开发最新文章

热门教程

热门工具

登录 关闭

登录关闭