How to copy specific values from one data column to another while matching other columns in R?
Problem description
I've searched a number of places (Stack Overflow, R-bloggers, etc.) but haven't quite found a good option for doing this in R. Hopefully someone has some ideas.
I have a set of environmental sampling data. The data includes a variety of fields (visit date, region, location, sample medium, sample component, result, etc.).
Here's a subset of the pertinent fields. This is where I start:
visit_date region location media component result
1990-08-20 LAKE 555723 water Mg *Nondetect
1999-07-01 HILL 432422 water Ca 3.2
2010-09-12 LAKE 555723 water pH 6.8
2010-09-12 LAKE 555723 water Mg 2.1
2010-09-12 HILL 432423 water pH 7.2
2010-09-12 HILL 432423 water N 0.8
2010-09-12 HILL 432423 water NH4 112
What I hope to end up with is a table/data frame like this:
visit_date region location media component result pH
1990-08-20 LAKE 555723 water Mg *Nondetect *Not recorded
1999-07-01 HILL 432422 water Ca 3.2 *Not recorded
2010-09-12 LAKE 555723 water pH 6.8 6.8
2010-09-12 LAKE 555723 water Mg 2.1 6.8
2010-09-12 HILL 432423 water pH 7.2 7.2
2010-09-12 HILL 432423 water N 0.8 7.2
2010-09-12 HILL 432423 water NH4 112 7.2
I attempted to use the method from the question "R finding rows of a data frame where certain columns match those of another", but unfortunately didn't get the result I wanted. Instead, the pH column was either my pre-populated value -999 or NA, and not the pH value for that particular visit date if it was collected. Since the result data set is around 500k records, I'm using unique(tResult$pH) to determine the values of the pH column.
Here's that attempt. res is the original result data.frame, and component would be the pH result subset (the pH sample results from the main results table).
keys <- c("region", "location", "visit_date", "media")
tResults <- data.table(res, key=keys)
tComponent <- data.table(component, key=keys)
tResults[tComponent, pH>0]
I've attempted using match, merge, and within on the original data frame without success. Since then I've generated a subset for the components (pH in this example) where I copied the result column over to a new "pH" column, thinking I could match on the keys and update a new "pH" column in the main result set.
Since not all result values are numeric (there are values like *Not recorded), I tried substituting numeric placeholders such as -888 so that I could force at least the result and pH columns to be numeric. Aside from the dates, which are POSIXct values, the remaining columns are character columns. The original data frame was created using stringsAsFactors=FALSE.
Once I can do this, I'll be able to generate similar columns for other components, which can then be used to populate and calculate other values for a given sample. At least that's my goal.
So I'm stumped on this one. In my mind it should be easy, but I'm certainly NOT seeing it!
Your help and ideas are certainly welcome and appreciated!
Recommended answer
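# df1 is your first data set, as a data.frame
library(data.table)
library(zoo)  # provides na.locf (last observation carried forward)
# Helper column: the result on pH rows, NA on every other row
df1$phtem <- with(df1, ifelse(component == "pH", result, NA))
# Fill each NA with the most recent non-NA pH above it; leading NAs are kept.
# Note: this fill ignores region/location/visit_date, so it relies on row order.
setDT(df1)[, pH := na.locf(phtem, na.rm = FALSE)]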
visit_date region location media component result phtem pH
1: 1990-08-20 LAKE 555723 water Mg *Nondetect NA NA
2: 1999-07-01 HILL 432422 water Ca 3.2 NA NA
3: 2010-09-12 LAKE 555723 water pH 6.8 6.8 6.8
4: 2010-09-12 LAKE 555723 water Mg 2.1 NA 6.8
5: 2010-09-12 HILL 432423 water pH 7.2 7.2 7.2
6: 2010-09-12 HILL 432423 water N 0.8 NA 7.2
7: 2010-09-12 HILL 432423 water NH4 112 NA 7.2
# You can delete phtem if you don't need it.
Edit:
library(data.table)
setDT(df1)[,pH:=result[component=="pH"],by="region,location,visit_date,media"]
df1
visit_date region location media component result pH
1: 1990-08-20 LAKE 555723 water Mg *Nondetect NA
2: 1999-07-01 HILL 432422 water Ca 3.2 NA
3: 2010-09-12 LAKE 555723 water pH 6.8 6.8
4: 2010-09-12 LAKE 555723 water Mg 2.1 6.8
5: 2010-09-12 HILL 432423 water pH 7.2 7.2
6: 2010-09-12 HILL 432423 water N 0.8 7.2
7: 2010-09-12 HILL 432423 water NH4 112 7.2
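
For readers without data.table, the same key-matched copy can be sketched in base R with merge. This is not the answerer's method, only an alternative under stated assumptions: the sample rows are retyped from the question with all columns as character, row_id is a hypothetical helper column used to restore the original row order, the first pH is kept if a visit has more than one, and pH_num is an illustrative name for a numeric copy.

```r
# Sample data retyped from the question; every column is character here
# (an assumption: the real data stores visit_date as POSIXct).
df1 <- data.frame(
  visit_date = c("1990-08-20", "1999-07-01", rep("2010-09-12", 5)),
  region     = c("LAKE", "HILL", "LAKE", "LAKE", "HILL", "HILL", "HILL"),
  location   = c("555723", "432422", "555723", "555723",
                 "432423", "432423", "432423"),
  media      = "water",
  component  = c("Mg", "Ca", "pH", "Mg", "pH", "N", "NH4"),
  result     = c("*Nondetect", "3.2", "6.8", "2.1", "7.2", "0.8", "112"),
  stringsAsFactors = FALSE
)

keys <- c("region", "location", "visit_date", "media")

# Lookup table: one pH value per visit, with result renamed to pH.
ph <- df1[df1$component == "pH", c(keys, "result")]
names(ph)[names(ph) == "result"] <- "pH"
ph <- ph[!duplicated(ph[keys]), ]  # assumption: keep the first pH per visit

# Left join on the keys: every original row is kept, and visits without
# a pH sample get NA. merge() reorders rows, so restore the order after.
tmp <- cbind(df1, row_id = seq_len(nrow(df1)))  # hypothetical helper column
out <- merge(tmp, ph, by = keys, all.x = TRUE)
out <- out[order(out$row_id), ]
out$row_id <- NULL
rownames(out) <- NULL

# Optional numeric copy of pH; non-numeric text would become NA.
out$pH_num <- suppressWarnings(as.numeric(out$pH))
```

Using NA for missing pH values, rather than a placeholder such as -999 or -888, keeps later numeric summaries from being skewed by the placeholder. Note that merge moves the key columns to the front of the result.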