Using R to insert a value for missing data with a value from another data frame

Question

All,

I have a question that I fear might be too pedestrian to ask here, but searching for it elsewhere is leading me astray. I may not be using the right search terms.

I have a panel data frame (country-year) in R with some missing values on a given variable. I'm trying to impute them with the value from another vector in another data frame. Here's an illustration of what I am trying to do.

Assume Data is the data frame of interest, which has missing values on a given vector that I'm trying to impute from another donor data frame. It looks like this.

country    year      x
  70       1920    9.234
  70       1921    9.234
  70       1922    9.234
  70       1923    9.234
  70       1924    9.234
  80       1920      NA
  80       1921      NA
  80       1922      NA
  80       1923      NA
  80       1924      NA
  90       1920    7.562
  90       1921    7.562
  90       1922    7.562
  90       1923    7.562
  90       1924    7.562

This would be the Donor frame, which has a value for country == 80.

country      x
  70       9.234
  80       1.523
  90       7.562

I'm trying to find a seamless way to automate this, beyond a command of Data$x[Data$country == 80] <- 1.523. There are a lot of countries with missingness on x.

It may be worth clarifying that a simple merge would be the easiest, but not necessarily appropriate for what I'm trying to do. Some countries will see variation on x over different years. Basically, what I'm trying to accomplish is a command that says that if the value of x is missing from Data for all years for a given country, take the corresponding value for the country from the Donor data and paste it over all country years as a "best guess" of sorts.

Thanks for any input. I suspect this is a rookie question, but I didn't know the right terms to search for it.

Reproducible code for the above data follows.

country <- c(70,70,70,70,70,80,80,80,80,80,90,90,90,90,90)
year <- c(1920,1921,1922,1923,1924,1920,1921,1922,1923,1924,1920,1921,1922,1923,1924)
x <- c(9.234,9.234,9.234,9.234,9.234,NA,NA,NA,NA,NA,7.562,7.562,7.562,7.562,7.562)

Data <- data.frame(country = country, year = year, x = x)
summary(Data)

country <- c(70,80,90)
x <- c(9.234,1.523,7.562)
Donor <- data.frame(country = country, x = x)
summary(Donor)

Recommended answer

Use merge:

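# Attach the donor value to every row; x is split into x.Data and x.Donor
r <- merge(Data, Donor, by = "country", suffixes = c(".Data", ".Donor"))
# Keep the observed x where present, otherwise fall back to the donor value
Data$x <- ifelse(is.na(r$x.Data), r$x.Donor, r$x.Data)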

If for some reason the idea of overwriting all values of x seems bad, use which to overwrite only the NAs (with the same merge):

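r <- merge(Data, Donor, by = "country", suffixes = c(".Data", ".Donor"))
# Positions of the missing values in Data
na.idx <- which(is.na(Data$x))
# Overwrite only those positions with the donor value
Data[na.idx, "x"] <- r[na.idx, "x.Donor"]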
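
Both versions above fill every NA in x, and the row-wise assignment relies on the merged frame having the same row order as Data. The question asks for something a little stricter: fill a country only when x is missing in all of its years. A minimal base-R sketch of that rule follows, using match() for the lookup so row order does not matter. The names donor_x and all_missing are made up here for illustration, and the example data is rebuilt so the block runs on its own.

```r
# Rebuild the example data from the question.
Data <- data.frame(
  country = rep(c(70, 80, 90), each = 5),
  year    = rep(1920:1924, times = 3),
  x       = c(rep(9.234, 5), rep(NA, 5), rep(7.562, 5))
)
Donor <- data.frame(country = c(70, 80, 90), x = c(9.234, 1.523, 7.562))

# Donor value for each row of Data, looked up by country
# (NA where Donor has no row for that country).
donor_x <- Donor$x[match(Data$country, Donor$country)]

# TRUE for every row whose country has x missing in all of its years.
all_missing <- as.logical(ave(is.na(Data$x), Data$country, FUN = all))

# Fill only those countries; countries with some observed x are left alone.
Data$x[all_missing] <- donor_x[all_missing]
```

With the example data only country 80 qualifies, so its five rows receive the donor value. A country with x observed in some years and missing in others would keep its NAs under this rule, which may or may not be what you want.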
