Count Occurrences of a List in R
Question
I have a list of roughly 100,000 occurrences of items being ordered together that I have pasted into one column so I can count the number of times each combination occurs.
4845 Curly Fries California Burger 1
4846 French Fries California Burger 1
4847 Hamburger California Burger 1
4848 $1 Fountain Drinks Curly Fries 1
4849 $1 Fountain Drinks Curly Fries 1
4850 California Burger Curly Fries 1
4851 Curly Fries Curly Fries 1
I have explored the aggregate function which gives me the following error:
aggregate(t1$count,list(t1$pc), sum)
Error in sort.list(y) : 'x' must be atomic for 'sort.list'
Have you called 'sort' on a list?
I have also tried variations of ddply:
ddply(t1,t1$pc,transform,occurances=sum(t1$count))
but I receive this error:
Error in UseMethod("as.quoted") :
no applicable method for 'as.quoted' applied to an object of class "c('matrix', 'list')"
I am assuming I get this because I am essentially trying to "group" by a character value. I have also explored tapply and recast based on answers to similar questions, but to no avail.
How can I get this count of combinations?
For consideration, a sample of items listed separately (again, apologies for the formatting issues):
Var1 Var2 Var3
>2 Onion Rings Onion Rings 1
>3 Pineapple Cheddar Burger Onion Rings 1
>4 Onion Rings Pineapple Cheddar Burger 1
>5 Pineapple Cheddar Burger Pineapple Cheddar Burger 1
>5 Onion Rings Onion Rings 1
>6 Pineapple Cheddar Burger Onion Rings 1
>7 Onion Rings Pineapple Cheddar Burger 1
>8 Pineapple Cheddar Burger Pineapple Cheddar Burger 1
>9 Fountain Soda Fountain Soda 1
>10 French Fries Fountain Soda 1
Recommended answer
Your initial approach was pretty close to what I think you want. Combining those into a single factor will definitely work, provided you combine them in the same order, such that you don't end up with "Fries, Burger" and "Burger, Fries."
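To illustrate why the order matters, here is a minimal sketch with two made-up orders (the vectors `a` and `b` are hypothetical, not taken from the question's data). Pasting the items as-is gives two different keys for the same combination, while sorting each pair first gives a single shared key:

```r
# Two hypothetical orders containing the same items in opposite column order.
a <- c("Fries", "Burger")
b <- c("Burger", "Fries")

# Pasting as-is yields two different keys for the same combination.
naive <- c(paste(a, collapse = ", "), paste(b, collapse = ", "))

# Sorting each pair before pasting yields one shared key.
sorted <- c(paste(sort(a), collapse = ", "), paste(sort(b), collapse = ", "))

length(unique(naive))   # number of distinct keys without sorting
length(unique(sorted))  # number of distinct keys with sorting
```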
There may be an easier way of doing what you want, but I'm failing to brain what that is. Nevertheless, I think this does what you're looking for:
# Let's assume your data looks like this:
> df
Var1 Var2 Var3
1 Onion Rings Onion Rings 1
2 Pineapple Cheddar Burger Onion Rings 1
3 Onion Rings Pineapple Cheddar Burger 1
4 Pineapple Cheddar Burger Pineapple Cheddar Burger 1
5 Onion Rings Onion Rings 1
6 Pineapple Cheddar Burger Onion Rings 1
7 Onion Rings Pineapple Cheddar Burger 1
8 Pineapple Cheddar Burger Pineapple Cheddar Burger 1
9 Fountain Soda Fountain Soda 1
10 French Fries Fountain Soda 1
# Now, for each row
# 1. sort the Var1 and Var2,
# 2. combine the sorted vars, and
# 3. convert them back into a factor
df$sortcomb <- as.factor(apply(df[,1:2], 1, function(x) paste(sort(x), collapse=", ")))
table(df$sortcomb) # then use table as per normal
library(plyr) # ddply and .() come from the plyr package
ddply(df, .(sortcomb), summarize, count=length(sortcomb)) # or ddply
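As a possible alternative for a large table, the same order-independent key can be built without a row-wise `apply`, using base R only. This is a sketch under assumptions, not the answer's own method: it rebuilds the ten sample rows by hand, assumes `Var1` and `Var2` are character vectors (convert factors with `as.character` first), and introduces the column name `key` purely for illustration. `pmin` and `pmax` compare the two columns element by element, so each row's pair comes out in a consistent order; `aggregate` then sums `Var3` per key.

```r
# Rebuild the sample data from the answer (values copied from the rows shown above).
or  <- "Onion Rings"
pcb <- "Pineapple Cheddar Burger"
df <- data.frame(
  Var1 = c(rep(c(or, pcb), 4), "Fountain Soda", "French Fries"),
  Var2 = c(rep(c(or, or, pcb, pcb), 2), "Fountain Soda", "Fountain Soda"),
  Var3 = 1,
  stringsAsFactors = FALSE
)

# pmin/pmax pick the alphabetically first/last item of each row, so
# ("A", "B") and ("B", "A") end up with the same key.
# "key" is a hypothetical column name used only in this sketch.
df$key <- paste(pmin(df$Var1, df$Var2), pmax(df$Var1, df$Var2), sep = ", ")

# Sum the Var3 counts for each distinct combination.
counts <- aggregate(Var3 ~ key, data = df, FUN = sum)
counts
```

Summing `Var3` instead of counting rows gives the same totals here because every row has a count of 1; it would differ if the data were already partly aggregated.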