R data.table subsetting on multiple conditions
Problem description
With the data set below, how do I write a data.table call that subsets this table and returns all customer IDs and associated orders for a customer if that customer has ever purchased SKU 1?
The expected result is a table that excludes cid 3 and 5 under that condition and contains every row for each customer who has at least one row with sku == 1.
I am stuck because I don't know how to write a "contains" statement; a literal == returns only the rows whose sku matches the condition. I am sure there is a better way.
library("data.table")
df <- data.frame(cid   = c(1,1,1,1,1,2,2,2,2,2,3,4,5,5,6,6),
                 order = c(1,1,1,2,3,4,4,4,5,5,6,7,8,8,9,9),
                 sku   = c(1,2,3,2,3,1,2,3,1,3,2,1,2,3,1,2))
dt <- as.data.table(df)
Recommended answer
This is similar to a previous answer, but here the subsetting is done in a more data.table-like manner.
First, let's take the cids that meet our condition:
match_cids <- dt[sku == 1, cid]
The %in% operator lets us filter to just those rows whose cid is contained in that vector. So, using the above:
dt[cid %in% match_cids]
Or on one line:
> dt[cid %in% dt[sku == 1, cid]]
    cid order sku
 1:   1     1   1
 2:   1     1   2
 3:   1     1   3
 4:   1     2   2
 5:   1     3   3
 6:   2     4   1
 7:   2     4   2
 8:   2     4   3
 9:   2     5   1
10:   2     5   3
11:   4     7   1
12:   6     9   1
13:   6     9   2
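
For reference, the two steps above can be put together in a self-contained sketch. The names by_membership and by_group are made up for illustration, and the grouped variant is an assumption on my part rather than part of the original answer: it keeps a customer's rows only when any of that customer's rows has sku == 1.

```r
library(data.table)

# Same sample data as in the question.
dt <- data.table(
  cid   = c(1,1,1,1,1,2,2,2,2,2,3,4,5,5,6,6),
  order = c(1,1,1,2,3,4,4,4,5,5,6,7,8,8,9,9),
  sku   = c(1,2,3,2,3,1,2,3,1,3,2,1,2,3,1,2)
)

# Step 1: customers who bought SKU 1 at least once.
# unique() is optional here; %in% gives the same rows with duplicates present.
match_cids <- unique(dt[sku == 1, cid])

# Step 2: keep every row that belongs to one of those customers.
by_membership <- dt[cid %in% match_cids]

# Alternative sketch (not from the original answer): test each customer
# group and return its rows (.SD) only if the group contains sku == 1.
# Groups where the condition is FALSE return NULL and are dropped.
by_group <- dt[, if (any(sku == 1)) .SD, by = cid]
```

Both by_membership and by_group should hold the same 13 rows as the output above, with cid 3 and 5 excluded. The %in% version is probably the easier one to read; the grouped version may be handy when the per-customer condition is more involved than a single membership test.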