Fill up missing values using the other data?


Question

A <- data.frame(Item_A = c("00EF", "00EF", "00EF", "00EF", "00EF", "00FR", "00FR"),  
                Item_B = c(NA, NA, NA, NA, "JAMES RIVER", NA, NA))

B <- data.frame(Item_A = c("00EF", "00EF", "00EF", "00FR", "00FR"), 
                Item_B = c("JAMES RIVER", NA, "JAMES RIVER",
                           "RICE MIDSTREAM", "RICE MIDSTREAM"))

Expected:

A <- data.frame(Item_A = c("00EF", "00EF", "00EF", "00EF", "00EF", "00FR", "00FR"),  
                Item_B = c("JAMES RIVER", "JAMES RIVER", "JAMES RIVER", 
                         "JAMES RIVER", "JAMES RIVER", "RICE MIDSTREAM", "RICE MIDSTREAM"))

B <- data.frame(Item_A = c("00EF", "00EF", "00EF", "00FR", "00FR"), 
                Item_B = c("JAMES RIVER", "JAMES RIVER", "JAMES RIVER", 
                           "RICE MIDSTREAM", "RICE MIDSTREAM"))

I need to fill in Item_B using the Item_B of other rows that have the same Item_A. For example, the first through fourth observations of Item_B in data set A need to become "JAMES RIVER".

Can you please suggest a way to fill in the missing values in R? I tried many techniques but couldn't get what I wanted.

Recommended answer

As far as I understand the question, this is not just an exercise in filling up missing values within one column of each data.frame. I believe it requires filling in the Item_B values that belong to each Item_A with the help of a lookup (mapping) table built from both data.frames:

library(data.table)
# create mapping table from both data.frames
map <- unique(rbindlist(list(A, B)))[!is.na(Item_B)]
# or, in case there are additional columns besides Item_A and Item_B
map <- unique(rbindlist(list(A, B))[!is.na(Item_B), .(Item_A, Item_B)])
map

   Item_A         Item_B
1:   00EF    JAMES RIVER
2:   00FR RICE MIDSTREAM

# join and replace
setDT(A)[map, on = c("Item_A"), Item_B := i.Item_B][]

   Item_A         Item_B
1:   00EF    JAMES RIVER
2:   00EF    JAMES RIVER
3:   00EF    JAMES RIVER
4:   00EF    JAMES RIVER
5:   00EF    JAMES RIVER
6:   00FR RICE MIDSTREAM
7:   00FR RICE MIDSTREAM

setDT(B)[map, on = c("Item_A"), Item_B := i.Item_B][]

   Item_A         Item_B
1:   00EF    JAMES RIVER
2:   00EF    JAMES RIVER
3:   00EF    JAMES RIVER
4:   00FR RICE MIDSTREAM
5:   00FR RICE MIDSTREAM

During the join, there are two columns named Item_B: one from the first data table, A (or B, respectively), and the other from the second data table, map. To distinguish them, the i. prefix indicates that i.Item_B should be taken from map.
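For comparison, the same lookup-and-fill idea can be sketched in base R with match(). This is not the approach from the answer above, only a minimal illustration. No prefix is needed here because the lookup column is indexed explicitly. The helper name fill_from_map is hypothetical, and the sketch assumes each Item_A maps to exactly one non-missing Item_B (match() takes the first hit). Unlike the update join, it only touches rows where Item_B is NA.

```r
# Toy data mirroring the question; stringsAsFactors = FALSE keeps
# plain character columns on older R versions as well.
A <- data.frame(Item_A = c("00EF", "00EF", "00EF", "00EF", "00EF", "00FR", "00FR"),
                Item_B = c(NA, NA, NA, NA, "JAMES RIVER", NA, NA),
                stringsAsFactors = FALSE)
B <- data.frame(Item_A = c("00EF", "00EF", "00EF", "00FR", "00FR"),
                Item_B = c("JAMES RIVER", NA, "JAMES RIVER",
                           "RICE MIDSTREAM", "RICE MIDSTREAM"),
                stringsAsFactors = FALSE)

# Lookup table: distinct (Item_A, Item_B) pairs with a non-missing Item_B
map <- unique(rbind(A, B))
map <- map[!is.na(map$Item_B), ]

# Hypothetical helper (not part of any package): fill only the NA
# entries of Item_B with the value found in the lookup table.
fill_from_map <- function(df, map) {
  idx <- match(df$Item_A, map$Item_A)  # row of map for each row of df
  df$Item_B <- ifelse(is.na(df$Item_B), map$Item_B[idx], df$Item_B)
  df
}

A_filled <- fill_from_map(A, map)
B_filled <- fill_from_map(B, map)
```

If an Item_A could map to more than one Item_B, checking anyDuplicated(map$Item_A) before filling would flag the ambiguity.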
