Partially merge two datasets and fill in NAs in R


Problem description

I have two datasets

a = raw dataset with thousands of observations of different weather events

   STATE       EVTYPE
1     AL WINTER STORM
2     AL      TORNADO
3     AL    TSTM WIND
4     AL    TSTM WIND
5     AL    TSTM WIND
6     AL         HAIL
7     AL    HIGH WIND
8     AL    TSTM WIND
9     AL    TSTM WIND
10    AL    TSTM WIND

b = a dictionary table, which has a standard spelling for some weather events.

                    EVTYPE       evmatch
1    HIGH SURF ADVISORY          <NA>
2         COASTAL FLOOD COASTAL FLOOD
3           FLASH FLOOD   FLASH FLOOD
4             LIGHTNING     LIGHTNING
5             TSTM WIND          <NA>
6       TSTM WIND (G45)          <NA>

Both are merged into df_new by EVTYPE:

library(dplyr)
df_new <- left_join(a, b, by = c("EVTYPE"))

   STATE       EVTYPE           evmatch
1     AL WINTER STORM      WINTER STORM
2     AL      TORNADO                NA
3     AL    TSTM WIND THUNDERSTORM WIND
4     AL    TSTM WIND THUNDERSTORM WIND
5     AL    TSTM WIND THUNDERSTORM WIND
6     AL         HAIL                NA
7     AL    HIGH WIND         HIGH WIND
8     AL    TSTM WIND THUNDERSTORM WIND
9     AL    TSTM WIND THUNDERSTORM WIND
10    AL    TSTM WIND THUNDERSTORM WIND
11    AL   HEAVY RAIN                NA
12    AL  FLASH FLOOD                NA
13    AL    TSTM WIND THUNDERSTORM WIND
14    AL   HEAVY RAIN                NA
15    AL    TSTM WIND THUNDERSTORM WIND
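The left_join() above keeps every row of a and leaves evmatch as NA wherever EVTYPE has no dictionary entry. The same lookup can be sketched in base R with match(). This is a minimal sketch assuming each EVTYPE appears at most once in b; the rows of a are taken from the question, but the dictionary is a small hypothetical one, not the asker's real table.

```r
# First rows of `a` from the question
a <- data.frame(
  STATE  = rep("AL", 4),
  EVTYPE = c("WINTER STORM", "TORNADO", "TSTM WIND", "HAIL"),
  stringsAsFactors = FALSE
)

# Hypothetical dictionary (assumption: one row per EVTYPE)
b <- data.frame(
  EVTYPE  = c("WINTER STORM", "TSTM WIND", "COASTAL FLOOD"),
  evmatch = c("WINTER STORM", "THUNDERSTORM WIND", "COASTAL FLOOD"),
  stringsAsFactors = FALSE
)

# Position of each a$EVTYPE in b$EVTYPE; NA when it is not in the dictionary
idx <- match(a$EVTYPE, b$EVTYPE)

# Keep every row of `a` in its original order, as left_join() does
df_new <- a
df_new$evmatch <- b$evmatch[idx]
df_new
```

Unlike left_join(), match() uses only the first dictionary row when an EVTYPE is listed twice, so it never duplicates rows of a.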

Fill in the missing NAs

As you can see in df_new$evmatch, there are some NAs. How can I merge the datasets but have every NA in evmatch filled in with the corresponding value from EVTYPE? For example:

Wanted output

   STATE       EVTYPE           evmatch
1     AL WINTER STORM      WINTER STORM
2     AL      TORNADO           TORNADO
3     AL    TSTM WIND THUNDERSTORM WIND
4     AL    TSTM WIND THUNDERSTORM WIND
5     AL    TSTM WIND THUNDERSTORM WIND
6     AL         HAIL              HAIL
7     AL    HIGH WIND         HIGH WIND
8     AL    TSTM WIND THUNDERSTORM WIND
9     AL    TSTM WIND THUNDERSTORM WIND
10    AL    TSTM WIND THUNDERSTORM WIND
11    AL   HEAVY RAIN        HEAVY RAIN
12    AL  FLASH FLOOD       FLASH FLOOD
13    AL    TSTM WIND THUNDERSTORM WIND
14    AL   HEAVY RAIN        HEAVY RAIN
15    AL    TSTM WIND THUNDERSTORM WIND

Solution

Answers as given in the comments to the question:

1: using base R

Method 1:

df_new$evmatch <- with(df_new, ifelse(is.na(evmatch), EVTYPE, evmatch))

Method 2:

# Overwrite only the NA positions of evmatch with EVTYPE from the same rows
df_new$evmatch[is.na(df_new$evmatch)] <- df_new$EVTYPE[is.na(df_new$evmatch)]

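Note: Make sure that both columns are character vectors, not factors. With factors, ifelse() returns the underlying integer codes, and Method 2 produces NA with an "invalid factor level" warning. If needed, convert them with as.character() first.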
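The note above matters mainly when the columns are factors, which was the default for data.frame() and read.csv() before R 4.0.0. A small sketch with made-up values that reproduces the problem and the as.character() fix:

```r
# Columns deliberately stored as factors to reproduce the problem
df_new <- data.frame(
  EVTYPE  = c("TORNADO", "TSTM WIND"),
  evmatch = c(NA, "THUNDERSTORM WIND"),
  stringsAsFactors = TRUE
)

# With factor columns, ifelse() returns integer level codes instead of labels
bad <- with(df_new, ifelse(is.na(evmatch), EVTYPE, evmatch))
print(bad)

# Converting both columns to character first gives the intended labels
df_new$EVTYPE  <- as.character(df_new$EVTYPE)
df_new$evmatch <- as.character(df_new$evmatch)
df_new$evmatch <- with(df_new, ifelse(is.na(evmatch), EVTYPE, evmatch))
print(df_new$evmatch)
```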

2: using data.table

library(data.table)
setDT(df_new)[is.na(evmatch), evmatch := EVTYPE]

3: using dplyr

library(dplyr)
# coalesce() takes evmatch where present and EVTYPE where evmatch is NA
df_new <- df_new %>%
  mutate(evmatch = coalesce(evmatch, EVTYPE))
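Since the goal is to merge and fill in one go, the lookup and the fill can also be combined into a single step. A minimal base-R sketch, assuming the dictionary has one row per EVTYPE; standardize_evtype() is a hypothetical helper name, not part of any package, and the sample data are a small made-up subset in the spirit of the question:

```r
# Hypothetical helper: look up each event type in a dictionary and fall
# back to the original spelling when there is no (non-NA) standard one
standardize_evtype <- function(evtype, dict_from, dict_to) {
  evtype  <- as.character(evtype)
  dict_to <- as.character(dict_to)
  matched <- dict_to[match(evtype, as.character(dict_from))]
  # NA here means "not in the dictionary" or "dictionary entry is NA"
  ifelse(is.na(matched), evtype, matched)
}

a <- data.frame(
  STATE  = rep("AL", 5),
  EVTYPE = c("WINTER STORM", "TORNADO", "TSTM WIND", "HAIL",
             "HIGH SURF ADVISORY"),
  stringsAsFactors = FALSE
)
b <- data.frame(
  EVTYPE  = c("HIGH SURF ADVISORY", "TSTM WIND"),
  evmatch = c(NA, "THUNDERSTORM WIND"),
  stringsAsFactors = FALSE
)

df_new <- a
df_new$evmatch <- standardize_evtype(a$EVTYPE, b$EVTYPE, b$evmatch)
df_new
```

Dictionary rows whose evmatch is NA, such as HIGH SURF ADVISORY in the question's table, are treated the same as missing entries and keep their original spelling.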
