Dividing values in a column of a data frame by values from a different data frame when row values match
Problem description
I have a data.frame x with the following format:
species site count
1: A 1.1 25
2: A 1.2 152
3: A 2.1 26
4: A 3.5 1
5: A 3.7 98
---
101: B 1.2 6
102: B 1.3 10
103: B 2.1 8
104: B 2.2 8
105: B 2.3 5
I also have another data.frame area with the following format:
species area
1: A 59.7
2: B 34.4
3: C 37.7
4: D 22.8
I would like to divide the count column of data.frame x by the values in the area column of data.frame area wherever the values in the species column of the two data.frames match.
I have been trying to make it work with a ddply function:
density = ddply(x, "species", mutate, density = x$count/area[,2])
But I can't figure out the proper index syntax for the area[] call so that it selects only the row matching the values found in x$species. However, I am super new to the plyr package (and the apply* functions as a whole), so this may be the completely wrong approach.
I'm hoping to return a data.frame of the following format:
species site count density
1: A 1.1 25 0.419
2: A 1.2 152 2.546
3: A 2.1 26 0.436
4: A 3.5 1 0.017
5: A 3.7 98 1.641
---
101: B 1.2 6 0.174
102: B 1.3 10 0.291
103: B 2.1 8 0.233
104: B 2.2 8 0.233
105: B 2.3 5 0.145
Recommended answer
This is easy with data.table:
library(data.table)
#converting your data to the native type for the package (by reference)
setDT(x); setDT(area)
x[area, density:=count/i.area, on="species"]
:= is the natural way to add columns in data.table (by reference; see this vignette, particularly point b), for more about this and why it's important), so x:=y adds a column named x to your data.table and assigns it the value y.
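As a minimal sketch of what "by reference" means here, the snippet below adds a column with := and never reassigns the result. The table dt and the column doubled are made up for illustration and are not part of the question's data.

```r
library(data.table)

# Hypothetical toy table (not the asker's data)
dt <- data.table(species = c("A", "A", "B"), count = c(25, 26, 6))

# Memory address of the table before the update
addr_before <- address(dt)

# := adds the column in place; note there is no `dt <- ...` reassignment
dt[, doubled := count * 2]

# Compare the address after the update with the one recorded before
same_object <- identical(addr_before, address(dt))

print(dt)
print(same_object)
```

Because the update happens in place, any other variable that points to the same data.table will also see the new column; use copy() first if that is not what you want.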
When merging in the form X[Y,], we can think of Y as selecting the rows of X to operate on. Further, when Y is a data.table, all objects in both X and Y are available in j (i.e., what comes after the comma), so we could have said density:=count/area. When we want to be sure that we're referring to one of Y's columns, we prepend its name with the i. prefix, so that we know we're referring to one of the columns in i, i.e., what precedes the comma. There should be a vignette on merges forthcoming.
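To make the X[Y, j, on=] form concrete, here is a small self-contained sketch. It rebuilds a few rows from the question by hand (using count 152 for site 1.2, as shown in the desired output table) and rounds the result to three decimals for comparison; the rounding step is an addition for display only.

```r
library(data.table)

# A handful of rows typed in from the question; the real x has 100+ rows
x <- data.table(
  species = c("A", "A", "A", "B", "B"),
  site    = c(1.1, 1.2, 2.1, 1.2, 1.3),
  count   = c(25, 152, 26, 6, 10)
)
area <- data.table(
  species = c("A", "B", "C", "D"),
  area    = c(59.7, 34.4, 37.7, 22.8)
)

# Join x to area on species and update x in place.
# i.area is the area column of the table in the i position (area).
x[area, density := count / i.area, on = "species"]

# Round only so the values are easy to compare with the question
x[, density := round(density, 3)]

print(x)
```

Species C and D appear in area but not in x, so they match no rows and the update simply skips them; x keeps its original five rows.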
In general, as soon as you think "match across different data sets", your instinct should be to merge. For more on data.table, see here.
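The same "merge first" idea also works without data.table. The following is a rough base R sketch, not part of the original answer; it uses merge() on a few hand-typed rows, and the name merged is chosen here for illustration.

```r
# A few rows typed in from the question, as plain data.frames
x <- data.frame(
  species = c("A", "A", "B"),
  site    = c(1.1, 1.2, 1.2),
  count   = c(25, 152, 6)
)
area <- data.frame(
  species = c("A", "B", "C", "D"),
  area    = c(59.7, 34.4, 37.7, 22.8)
)

# merge() does an inner join by default, so species C and D are dropped
merged <- merge(x, area, by = "species")

# After the merge, each row carries its own area, so the division is row-wise
merged$density <- merged$count / merged$area

print(merged)
```

Unlike the := approach, this creates a new data.frame and leaves x unchanged, and merge() may reorder rows by the key column.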