Create count per item by year/decade
Question
I have data in a data.table that looks like this:
> x <- df[sample(nrow(df), 10), ]
> x
         Importer       Exporter       Date
 1:       Ecuador United Kingdom 2004-01-13
 2:        Mexico  United States 2013-11-19
 3:     Australia  United States 2006-08-11
 4: United States  United States 2009-05-04
 5:         India  United States 2007-07-16
 6:     Guatemala      Guatemala 2014-07-02
 7:        Israel         Israel 2000-02-22
 8:         India  United States 2014-02-11
 9:          Peru           Peru 2007-03-26
10:        Poland         France 2014-09-15
I am trying to create summaries so that, for a given time period (say a decade), I can find the number of times each country appears as Importer and as Exporter. In the example above, the desired output when grouping by decade would be something like:
Decade  Country.Name   Importer.Count  Exporter.Count
2000    Ecuador        1               0
2000    Mexico         1               1
2000    Australia      1               0
2000    United States  1               3
.
.
.
2010    United States  0               2
.
.
.
So far, I have tried the aggregate and data.table methods suggested in this post (http://stackoverflow.com/questions/14641874/summary-of-data-for-each-year-in-r), but both seem to give me only the total number of Importers/Exporters per year (or per decade, which is what I am more interested in).
> x$Decade<-year(x$Date)-year(x$Date)%%10
> importer_per_yr<-aggregate(Importer ~ Decade, FUN=length, data=x)
> importer_per_yr
Decade Importer
2 2000 6
3 2010 4
Since aggregate uses the formula interface, I tried adding another criterion, but got the following error:
> importer_per_yr<-aggregate(Importer~ Decade + unique(Importer), FUN=length, data=x)
Error in model.frame.default(formula = Importer ~ Decade + :
variable lengths differ (found for 'unique(Importer)')
Is there a way to create the summary by decade and by importer/exporter? It does not matter if the summaries for importers and exporters end up in different tables.
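For reference, the "variable lengths differ" error above comes from unique(Importer), which returns one element per distinct country and is therefore shorter than the Decade column. A minimal base-R sketch of the grouping the question is after follows; the data frame is a hypothetical reconstruction of the ten sampled rows, and the helper column n is an assumption introduced only for illustration.

```r
# Hypothetical reconstruction of the ten sampled rows shown in the question.
x <- data.frame(
  Importer = c("Ecuador", "Mexico", "Australia", "United States", "India",
               "Guatemala", "Israel", "India", "Peru", "Poland"),
  Exporter = c("United Kingdom", "United States", "United States",
               "United States", "United States", "Guatemala", "Israel",
               "United States", "Peru", "France"),
  Date = as.Date(c("2004-01-13", "2013-11-19", "2006-08-11", "2009-05-04",
                   "2007-07-16", "2014-07-02", "2000-02-22", "2014-02-11",
                   "2007-03-26", "2014-09-15")),
  stringsAsFactors = FALSE
)

# Decade = calendar year rounded down to a multiple of 10.
yr <- as.integer(format(x$Date, "%Y"))
x$Decade <- yr - yr %% 10

# Helper column of ones (illustrative name) so aggregate() has something to sum.
x$n <- 1L

# Group by the column itself, not by unique(column).
importer_counts <- aggregate(n ~ Decade + Importer, data = x, FUN = sum)
exporter_counts <- aggregate(n ~ Decade + Exporter, data = x, FUN = sum)
```

This yields two separate tables, and country/decade combinations that never occur are simply absent instead of being reported with a count of 0.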
Recommended answer
We can do this using data.table methods. Create the 'Decade' column by assignment (:=), then melt the data from 'wide' to 'long' format by specifying the measure columns, and reshape it back to 'wide' using dcast with length as the fun.aggregate.
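library(data.table)
# Decade = year rounded down to a multiple of 10 (x is modified by reference)
x[, Decade := year(Date) - year(Date) %% 10]
# Stack Importer/Exporter into one 'Country' column, then count rows per Decade + Country
dcast(melt(x, measure.vars = c("Importer", "Exporter"), value.name = "Country"),
      Decade + Country ~ variable, length)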
# Decade Country Importer Exporter
# 1: 2000 Australia 1 0
# 2: 2000 Ecuador 1 0
# 3: 2000 India 1 0
# 4: 2000 Israel 1 1
# 5: 2000 Peru 1 1
# 6: 2000 United Kingdom 0 1
# 7: 2000 United States 1 3
# 8: 2010 France 0 1
# 9: 2010 Guatemala 1 1
#10: 2010 India 1 0
#11: 2010 Mexico 1 0
#12: 2010 Poland 1 0
#13: 2010 United States 0 2
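To make the reshaping easier to follow, here is a self-contained sketch that splits the same idea into explicit steps: melt, count with .N, then dcast with fill = 0L. It should be equivalent to passing length as the aggregate function. The data are a hypothetical reconstruction of the ten sampled rows, and the names long, counts, wide and Role are illustrative assumptions, not part of the original answer.

```r
library(data.table)

# Hypothetical reconstruction of the ten sampled rows shown in the question.
x <- data.table(
  Importer = c("Ecuador", "Mexico", "Australia", "United States", "India",
               "Guatemala", "Israel", "India", "Peru", "Poland"),
  Exporter = c("United Kingdom", "United States", "United States",
               "United States", "United States", "Guatemala", "Israel",
               "United States", "Peru", "France"),
  Date = as.Date(c("2004-01-13", "2013-11-19", "2006-08-11", "2009-05-04",
                   "2007-07-16", "2014-07-02", "2000-02-22", "2014-02-11",
                   "2007-03-26", "2014-09-15"))
)

# Decade = year rounded down to a multiple of 10.
x[, Decade := year(Date) - year(Date) %% 10]

# Step 1: wide -> long, one row per (record, role).
long <- melt(x, measure.vars = c("Importer", "Exporter"),
             variable.name = "Role", value.name = "Country")

# Step 2: count rows per decade, country and role.
counts <- long[, .N, by = .(Decade, Country, Role)]

# Step 3: long -> wide, one column per role; missing combinations become 0.
wide <- dcast(counts, Decade + Country ~ Role, value.var = "N", fill = 0L)
print(wide)
```

Keeping counts as an intermediate table can be handy when the long format is also needed, for example for plotting by role.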