Dividing values in a column of a data frame by values from a different data frame when row values match

Question

I have a data.frame x with the following format:

     species      site  count
1:         A       1.1     25
2:         A       1.2    152
3:         A       2.1     26
4:         A       3.5      1
5:         A       3.7     98
---                         
101:       B       1.2      6
102:       B       1.3     10
103:       B       2.1      8
104:       B       2.2      8
105:       B       2.3      5

I also have another data.frame area with the following format:

      species    area
1:          A    59.7
2:          B    34.4
3:          C    37.7
4:          D    22.8

I would like to divide the count column of data.frame x by the values in the area column of data.frame area, matching rows where the species column of each data.frame has the same value.

I have been trying to make it work with a ddply function:

density = ddply(x, "species", mutate, density = x$count/area[,2])

But I can't figure out the proper indexing syntax for the area[] call to select only the row that matches each value found in x$species. I am also very new to the plyr package (and to the apply* family of functions as a whole), so this may be completely the wrong approach.
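
For reference, a minimal base R sketch of the lookup the question is reaching for (a swapped-in alternative, not the plyr approach): match() returns, for each species in x, the position of the row in area with the same species, so indexing area$area by that vector lines each area value up with its row of x. The data are a few rows copied by hand from the tables above, and idx is just an illustrative name.

```r
# Toy data: a few rows copied by hand from the question's tables
x <- data.frame(species = c("A", "A", "B", "B"),
                site    = c(1.1, 1.2, 1.2, 1.3),
                count   = c(25, 152, 6, 10))
area <- data.frame(species = c("A", "B", "C", "D"),
                   area    = c(59.7, 34.4, 37.7, 22.8))

# For each row of x, the index of the row in `area` with the same species
idx <- match(x$species, area$species)   # c(1, 1, 2, 2)

# Pick out the matching areas and divide element-wise
x$density <- x$count / area$area[idx]
print(x)
```

Because match() is vectorized, no per-group ddply call is needed for this step; species in area with no rows in x (C and D here) are simply never looked up.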

I'm hoping to return a data.frame of the following format:

     species      site  count   density
1:         A       1.1     25     0.419
2:         A       1.2    152     2.546
3:         A       2.1     26     0.436
4:         A       3.5      1     0.017
5:         A       3.7     98     1.641
---                         
101:       B       1.2      6     0.174
102:       B       1.3     10     0.291
103:       B       2.1      8     0.233
104:       B       2.2      8     0.233
105:       B       2.3      5     0.145

Answer

This is easy with data.table:

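library(data.table)
# convert both data.frames to data.tables, the package's native type (in place, by reference)
setDT(x); setDT(area)
# join area onto x by species; for each matched row of x, divide count by
# area's "area" column (i.area) and store the result in a new column, density
x[area, density := count/i.area, on = "species"]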
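
As a quick check, here is the one-liner run on a handful of rows copied by hand from the question's tables (the full data aren't shown, so this only covers those rows). The rounded densities can be compared with the desired output in the question.

```r
library(data.table)

# Toy data: rows 1-3, 101 and 105 copied by hand from the question
x <- data.frame(species = c("A", "A", "A", "B", "B"),
                site    = c(1.1, 1.2, 2.1, 1.2, 2.3),
                count   = c(25, 152, 26, 6, 5))
area <- data.frame(species = c("A", "B", "C", "D"),
                   area    = c(59.7, 34.4, 37.7, 22.8))

setDT(x); setDT(area)
x[area, density := count / i.area, on = "species"]

# Show densities rounded to three places, without changing the stored column
print(x[, .(species, site, count, density = round(density, 3))])
```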

:= is the natural way to add columns in data.table. It works by reference (see this vignette, particularly point b, for more about this and why it's important), so x := y adds a column named x to your data.table and assigns it the value y.
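
A small illustration of what "by reference" means, using a hypothetical table dt and a hypothetical helper add_flag(): := modifies the data.table in place, so even a function that never returns or reassigns its argument changes the caller's table.

```r
library(data.table)

dt <- data.table(a = 1:3)

# x := y form: add column b with value a * 2, modifying dt in place
dt[, b := a * 2]

# Hypothetical helper: := inside a function updates the caller's table too
add_flag <- function(d) {
  d[, flag := TRUE]
  invisible(NULL)
}
add_flag(dt)

print(names(dt))   # "a" "b" "flag"
```

By contrast, d$flag <- TRUE inside a function would only modify a local copy of a plain data.frame.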

X[Y,]形式合并时,我们可以将Y视为选择要操作的X行;此外,当Ydata.table时,XY中的所有对象都可以在j中使用(即,逗号后面的内容),因此我们可以说density:=count/area;当我们想确定要引用Y的一列时,我们在其名称前加上i.,以便我们知道引用的是Y的一列i,即逗号前面的内容.合并即将发布.

When merging in the form X[Y, ...], we can think of Y as selecting the rows of X to operate on. Further, when Y is a data.table, the columns of both X and Y are available in j (i.e., what comes after the comma), so we could have written density:=count/area. When we want to be sure that we're referring to one of Y's columns, we prefix its name with i., which marks it as a column of i, i.e., the argument before the comma. There should be a vignette on merges forthcoming.
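
To see why the i. prefix matters, consider two hypothetical tables X and Y that both have a column named val (all names here are illustrative). In the join X[Y, ...], a bare val refers to X's column, and i.val refers to the column of Y, the table passed as i.

```r
library(data.table)

# Hypothetical tables that share a non-key column name, val
X <- data.table(k = c("a", "a", "b"), val = c(10, 20, 30))
Y <- data.table(k = c("a", "b"),      val = c(2, 5))

# In j, bare val is X's column; i.val is the column from Y (the i argument)
X[Y, ratio := val / i.val, on = "k"]
print(X$ratio)   # 5 10 6
```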

In general, as soon as you think "match across different data sets", your instinct should be to merge. For more on data.table, see here.
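
In base R, the same merge-then-divide idea can be sketched with merge(); this is a swapped-in alternative to the data.table answer, shown on a few hand-copied rows. By default merge() performs an inner join on the by column, so species C and D, which have no rows in x, drop out, and the result is sorted by species.

```r
# Toy data: a few rows copied by hand from the question's tables
x <- data.frame(species = c("A", "A", "B"),
                site    = c(1.1, 2.1, 1.3),
                count   = c(25, 26, 10))
area <- data.frame(species = c("A", "B", "C", "D"),
                   area    = c(59.7, 34.4, 37.7, 22.8))

# Inner join on species: each row of x gains the matching area value
merged <- merge(x, area, by = "species")
merged$density <- merged$count / merged$area
print(merged)
```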
