How to apply functions to a specific set of columns in a data frame in R to replace NAs

Views: 80 | Published: 2021/5/3 18:49:57 | Tags: r function dataframe dry na

Problem description

I have a data set in which I want to replace the NAs in different columns in different ways. Below is a dummy data set and the code to reproduce it.

test <- data.frame(ID = c(1:5),
               FirstName = c(NA,"Sid",NA,"Harsh","CJ"),
               LastName = c("Snow",NA,"Lapata","Khan",NA),
               BillNum = c(6:10),
               Phone = c(1213,3123,3123,NA,NA),
               Married = c("Yes","Yes",NA,"NO","Yes"),
               ZIP = c(1111,2222,333,444,555),
               Gender = c("M",NA,"F",NA,"M"),
               Address = c("A","B",NA,"C","D"))
> test
  ID FirstName LastName BillNum Phone Married  ZIP Gender Address
1  1      <NA>     Snow       6  1213     Yes 1111      M       A
2  2       Sid     <NA>       7  3123     Yes 2222   <NA>       B
3  3      <NA>   Lapata       8  3123    <NA>  333      F    <NA>
4  4     Harsh     Khan       9    NA      NO  444   <NA>       C
5  5        CJ     <NA>      10    NA     Yes  555      M       D

In some columns I only want to indicate whether the customer supplied a value, without retaining the supplied value, as follows.

Availability_Indicator <- function(x){
  x <- ifelse(is.na(x),"NotAvialable","Available")
  return(x)
}
test$FirstName <- Availability_Indicator(test$FirstName)
test$LastName <- Availability_Indicator(test$LastName)
test$Phone <- Availability_Indicator(test$Phone)
test$Address <- Availability_Indicator(test$Address)

I get the following data:

> test
ID    FirstName     LastName BillNum        Phone Married  ZIP Gender      Address
 1 NotAvialable    Available       6    Available     Yes 1111      M    Available
 2    Available NotAvialable       7    Available     Yes 2222   <NA>    Available
 3 NotAvialable    Available       8    Available    <NA>  333      F NotAvialable
 4    Available    Available       9 NotAvialable      NO  444   <NA>    Available
 5    Available NotAvialable      10 NotAvialable     Yes  555      M    Available

In the Married and Gender variables I don't want to lose the existing values; I only want to replace the NAs, as follows.

NotAvailable_Indicator <- function(x){
  x[is.na(x)]<-"NotAvailable"
  return(x)
}
test$Married <- NotAvailable_Indicator(test$Married)
test$Gender <- NotAvailable_Indicator(test$Gender)

I get the following data set:

ID    FirstName     LastName BillNum        Phone      Married  ZIP       Gender      Address
 1 NotAvialable    Available       6    Available          Yes 1111            M    Available
 2    Available NotAvialable       7    Available          Yes 2222 NotAvailable    Available
 3 NotAvialable    Available       8    Available NotAvailable  333            F NotAvialable
 4    Available    Available       9 NotAvialable           NO  444 NotAvailable    Available
 5    Available NotAvialable      10 NotAvialable          Yes  555            M    Available

My problem is that I don't want to repeat the function call for each column separately, as I have about 200 columns. I could not use the apply functions conveniently: I had to subset the data, apply the functions with lapply, and then cbind the result back to the original data, which changed the order of the columns. Is there a method where I supply the column names and the function and either get back a data set containing the modified columns along with the unchanged ones, or have the columns modified in place without anything being returned (like DataFrame.fillna in Python, which has the argument inplace=logical)?

Recommended answer

data

It is easier to change the values of a character column than of a factor column. Passing stringsAsFactors=FALSE in the data.frame() call makes the non-numeric columns character class (this has been the default since R 4.0.0).

test <- data.frame(ID = c(1:5),
           FirstName = c(NA,"Sid",NA,"Harsh","CJ"),
           LastName = c("Snow",NA,"Lapata","Khan",NA),
           BillNum = c(6:10),
           Phone = c(1213,3123,3123,NA,NA),
           Married = c("Yes","Yes",NA,"NO","Yes"),
           ZIP = c(1111,2222,333,444,555),
           Gender = c("M",NA,"F",NA,"M"),
           Address = c("A","B",NA,"C","D"), stringsAsFactors=FALSE)
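
The rest of the answer is not shown on this page, so what follows is only a minimal base R sketch of one common way to do what the question asks, not necessarily the method the original answer used. It relies on the fact that assigning a list to test[cols] replaces just those columns, so column order and the untouched columns are preserved. The vectors indicator_cols and fill_cols are hypothetical names introduced here for illustration; the two helper functions are the ones from the question, with the label spelling kept as posted.

```r
# Dummy data as in the question; stringsAsFactors = FALSE keeps text columns as character.
test <- data.frame(ID = 1:5,
                   FirstName = c(NA, "Sid", NA, "Harsh", "CJ"),
                   LastName = c("Snow", NA, "Lapata", "Khan", NA),
                   BillNum = 6:10,
                   Phone = c(1213, 3123, 3123, NA, NA),
                   Married = c("Yes", "Yes", NA, "NO", "Yes"),
                   ZIP = c(1111, 2222, 333, 444, 555),
                   Gender = c("M", NA, "F", NA, "M"),
                   Address = c("A", "B", NA, "C", "D"),
                   stringsAsFactors = FALSE)

# Helper functions from the question (label spelling unchanged).
Availability_Indicator <- function(x) ifelse(is.na(x), "NotAvialable", "Available")
NotAvailable_Indicator <- function(x) {
  x[is.na(x)] <- "NotAvailable"
  x
}

# Hypothetical name vectors: the columns each function should touch.
indicator_cols <- c("FirstName", "LastName", "Phone", "Address")
fill_cols <- c("Married", "Gender")

# lapply() runs the function on each selected column; assigning the result
# back into test[cols] overwrites only those columns, leaving the rest alone.
test[indicator_cols] <- lapply(test[indicator_cols], Availability_Indicator)
test[fill_cols] <- lapply(test[fill_cols], NotAvailable_Indicator)

test
```

With about 200 columns, the name vectors could presumably be built programmatically (for example from names(test)) instead of being typed out. R has no direct counterpart to the inplace argument of pandas; the subset assignment above is the usual idiom for updating selected columns of an existing data frame.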
