How to copy specific values from one data column to another while matching other columns in R?

Question

I've searched a number of places (stackoverflow, r-blogger, etc), but haven't quite found a good option for doing this in R. Hopefully someone has some ideas.

I have a set of environmental sampling data. The data includes a variety of fields (visit date, region, location, sample medium, sample component, result, etc.).

Here's a subset of the pertinent fields. This is where I start...

visit_date   region    location     media      component     result
1990-08-20   LAKE      555723       water       Mg            *Nondetect
1999-07-01   HILL      432422       water       Ca            3.2
2010-09-12   LAKE      555723       water       pH            6.8
2010-09-12   LAKE      555723       water       Mg            2.1
2010-09-12   HILL      432423       water       pH            7.2
2010-09-12   HILL      432423       water       N             0.8
2010-09-12   HILL      432423       water       NH4          112

What I hope to reach is a table/dataframe like this:

visit_date   region    location     media      component     result        pH
1990-08-20   LAKE      555723       water       Mg            *Nondetect  *Not recorded
1999-07-01   HILL      432422       water       Ca            3.2         *Not recorded
2010-09-12   LAKE      555723       water       pH            6.8         6.8
2010-09-12   LAKE      555723       water       Mg            2.1         6.8
2010-09-12   HILL      432423       water       pH            7.2         7.2
2010-09-12   HILL      432423       water       N             0.8         7.2
2010-09-12   HILL      432423       water       NH4          112          7.2

I attempted to use the method described in "R finding rows of a data frame where certain columns match those of another", but unfortunately didn't get the result I wanted. Instead, the pH column contained either my pre-populated value -999 or NA, not the pH value for that particular visit date (when one was collected). Since the result data set is around 500k records, I'm using unique(tResults$pH) to check the values of the pH column.

Here's that attempt. res is the original result data.frame and component would be the pH result subset (the pH sample results from the main results table).

keys <- c("region", "location", "visit_date", "media")

tResults <- data.table(res, key=keys)
tComponent <- data.table(component, key=keys)

tResults[tComponent, pH>0]

I've attempted using match, merge, and within on the original data frame without success. Since then I've generated a subset for the components (pH in this example) where I copied over the results column to a new "pH" column, thinking I could match the keys and update a new "pH" column in the main result set.

Since not all result values are numeric (some are values like *Not recorded), I attempted to substitute numeric placeholders such as -888 so that I could force at least the result and pH columns to be numeric. Aside from the dates, which are POSIXct values, the remaining columns are character columns. The original data frame was created using stringsAsFactors=FALSE.

Once I can do this, I'll be able to generate similar columns for other components that can be used to populate and calculate other values for a given sample. At least that's my goal.

So I'm stumped on this one. In my mind it should be easy but I'm certainly NOT seeing it!

Your help and ideas are certainly welcome and appreciated!

Recommended answer

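# df1 is your original data set, stored as a data.frame
# Keep the result only on pH rows; every other row becomes NA
df1$phtem <- with(df1, ifelse(component == "pH", result, NA))

library(data.table)
library(zoo) # provides na.locf (last observation carried forward)

# Fill each NA with the most recent pH value above it
setDT(df1)[, pH := na.locf(phtem, na.rm = FALSE)]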
    visit_date region location media component     result phtem  pH
1: 1990-08-20   LAKE   555723 water        Mg *Nondetect    NA  NA
2: 1999-07-01   HILL   432422 water        Ca        3.2    NA  NA
3: 2010-09-12   LAKE   555723 water        pH        6.8   6.8 6.8
4: 2010-09-12   LAKE   555723 water        Mg        2.1    NA 6.8
5: 2010-09-12   HILL   432423 water        pH        7.2   7.2 7.2
6: 2010-09-12   HILL   432423 water         N        0.8    NA 7.2
7: 2010-09-12   HILL   432423 water       NH4        112    NA 7.2

# You can delete phtem if you don't need it.
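
One caveat with the fill above: it runs down the whole table, so it depends on row order. A pH value can carry over into a later visit that has no pH reading, and rows that sit above the pH row of their own visit stay NA. The following is a minimal sketch of a per-visit fill, not part of the original answer. The three-row sample data and the keys vector are made up for illustration, and visit_date is stored as Date here rather than POSIXct.

```r
library(data.table)
library(zoo)  # provides na.locf

# Hypothetical sample: the pH row comes last within its visit, and a later
# visit has no pH reading at all.
df1 <- data.table(
  visit_date = as.Date(c("2010-09-12", "2010-09-12", "2011-05-03")),
  region     = c("LAKE", "LAKE", "HILL"),
  location   = c("555723", "555723", "432423"),
  media      = "water",
  component  = c("Mg", "pH", "N"),
  result     = c("2.1", "6.8", "0.8")
)

keys <- c("region", "location", "visit_date", "media")

# Keep only the pH readings; every other row becomes NA.
df1[, phtem := ifelse(component == "pH", result, NA_character_)]

# Fill forward, then backward, but only within each visit.
df1[, pH := na.locf(na.locf(phtem, na.rm = FALSE), fromLast = TRUE, na.rm = FALSE),
    by = keys]

# Drop the helper column.
df1[, phtem := NULL]
```

With this sample, both rows of the LAKE visit receive the pH reading, and the HILL visit, which has no pH row, stays NA.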

Edit:

library(data.table)
setDT(df1)[,pH:=result[component=="pH"],by="region,location,visit_date,media"]
df1

   visit_date region location media component     result  pH
1: 1990-08-20   LAKE   555723 water        Mg *Nondetect  NA
2: 1999-07-01   HILL   432422 water        Ca        3.2  NA
3: 2010-09-12   LAKE   555723 water        pH        6.8 6.8
4: 2010-09-12   LAKE   555723 water        Mg        2.1 6.8
5: 2010-09-12   HILL   432423 water        pH        7.2 7.2
6: 2010-09-12   HILL   432423 water         N        0.8 7.2
7: 2010-09-12   HILL   432423 water       NH4        112 7.2
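
Since the question started from a keyed join, here is a sketch of the same result written as a data.table update join. It is an alternative, not part of the original answer. It assumes a data.table version that supports the on= argument. The sample data reproduces the question's table, with visit_date stored as Date rather than POSIXct, and the names ph and pH_num are made up for illustration.

```r
library(data.table)

# Sample data mirroring the question's table (results kept as character).
res <- data.table(
  visit_date = as.Date(c("1990-08-20", "1999-07-01", "2010-09-12", "2010-09-12",
                         "2010-09-12", "2010-09-12", "2010-09-12")),
  region     = c("LAKE", "HILL", "LAKE", "LAKE", "HILL", "HILL", "HILL"),
  location   = c("555723", "432422", "555723", "555723", "432423", "432423", "432423"),
  media      = "water",
  component  = c("Mg", "Ca", "pH", "Mg", "pH", "N", "NH4"),
  result     = c("*Nondetect", "3.2", "6.8", "2.1", "7.2", "0.8", "112")
)

keys <- c("region", "location", "visit_date", "media")

# One row per visit holding that visit's pH reading.
ph <- res[component == "pH", c(keys, "result"), with = FALSE]

# Update join: copy the pH reading onto every row with the same key values.
# Rows whose visit has no pH reading are left as NA.
res[ph, on = keys, pH := i.result]

# Optional: a numeric copy. Non-numeric text would become NA here, which
# avoids placeholder values such as -888.
res[, pH_num := suppressWarnings(as.numeric(pH))]
```

If a visit could contain more than one pH row, the join would keep only one of them, so the subset may need deduplicating first.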
