Using R to insert a value for missing data with a value from another data frame
Question
All,
I have a question that I fear might be too pedestrian to ask here, but searching for it elsewhere is leading me astray. I may not be using the right search terms.
I have a panel data frame (country-year) in R with some missing values on a given variable. I'm trying to impute them with the value from another vector in another data frame. Here's an illustration of what I am trying to do.
Assume Data is the data frame of interest, which has missing values on a given vector that I'm trying to impute from another donor data frame. It looks like this.
country year x
70 1920 9.234
70 1921 9.234
70 1922 9.234
70 1923 9.234
70 1924 9.234
80 1920 NA
80 1921 NA
80 1922 NA
80 1923 NA
80 1924 NA
90 1920 7.562
90 1921 7.562
90 1922 7.562
90 1923 7.562
90 1924 7.562
This would be the Donor frame, which has a value for country == 80.
country x
70 9.234
80 1.523
90 7.562
I'm trying to find a seamless way to automate this, beyond a command of Data$x[Data$country == 80] <- 1.523. There are a lot of countries with missingness on x.
It may be worth clarifying that a simple merge would be the easiest, but not necessarily appropriate for what I'm trying to do. Some countries will see variation on x over different years. Basically, what I'm trying to accomplish is a command that says that if the value of x is missing from Data for all years for a given country, take the corresponding value for the country from the Donor data and paste it over all country years as a "best guess" of sorts.
Thanks for any input. I suspect this is a rookie question, but I didn't know the right terms to search for it.
Reproducible code for the above data follows.
country <- c(70,70,70,70,70,80,80,80,80,80,90,90,90,90,90)
year <- c(1920,1921,1922,1923,1924,1920,1921,1922,1923,1924,1920,1921,1922,1923,1924)
x <- c(9.234,9.234,9.234,9.234,9.234,NA,NA,NA,NA,NA,7.562,7.562,7.562,7.562,7.562)
Data=data.frame(country=country,year=year,x=x)
summary(Data)
country <- c(70,80,90)
x <- c(9.234,1.523,7.562)
Donor=data.frame(country=country,x=x)
summary(Donor)
Recommended answer
Use merge:
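# Join the donor value onto every row of Data; the two x columns become x.Data and x.Donor.
r = merge(Data, Donor, by="country", suffixes=c(".Data", ".Donor"))
# Assumes r has the same row order as Data (merge sorts by country; Data is already sorted that way here).
Data$x = ifelse(is.na(r$x.Data), r$x.Donor, r$x.Data)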
If for some reason the idea of overwriting all values of x seems bad, then use which to overwrite only the NAs (with the same merge):
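# Same merge as before; r is again assumed to share Data's row order.
r = merge(Data, Donor, by="country", suffixes=c(".Data", ".Donor"))
# Row positions where x is missing.
na.idx = which(is.na(Data$x))
# Copy the donor value into just those rows.
Data[na.idx,"x"] = r[na.idx,"x.Donor"]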
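The question asks for a stricter rule than filling every NA: use the donor value only when x is missing in all years for a country. This is not part of the answer above. It is a minimal sketch of one way to express that rule in base R, using ave() to flag the all-missing countries and match() to look up donor values by country, so it does not depend on row order. The names all_missing and donor_row are made up for illustration.

```r
# Rebuild the example data from the question.
Data <- data.frame(
  country = rep(c(70, 80, 90), each = 5),
  year    = rep(1920:1924, times = 3),
  x       = c(rep(9.234, 5), rep(NA, 5), rep(7.562, 5))
)
Donor <- data.frame(country = c(70, 80, 90), x = c(9.234, 1.523, 7.562))

# TRUE on every row whose country has x missing in all of its years.
# ave() applies all() within each country and puts the result back
# at the original row positions.
all_missing <- as.logical(ave(is.na(Data$x), Data$country, FUN = all))

# Position of each flagged row's country in Donor
# (NA if the country has no donor entry, in which case x stays NA).
donor_row <- match(Data$country[all_missing], Donor$country)

# Overwrite only the flagged rows with the donor value.
Data$x[all_missing] <- Donor$x[donor_row]
```

Under this rule, a country with x missing in only some years is left untouched, so any year-to-year variation it already has is preserved.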