How do I get R to ignore N/A values without having it delete the whole row?

Question

I am working on a big dataset (around 40 columns) and I need to aggregate the values of different columns by month, averaging the values within each month. The dataset looks something like this:

dd <-
mo  yr Na   NH4 NO2
1 2009 0.4  N/A N/A
1 2009 0.2  0.1 N/A
2 2009 0.5  0.6 0.4
2 2009 0.7  0.2 0.1

I used

dd.agg=aggregate(.~mo+yr, dd, FUN=mean)

to create a new dataset, but since I have some N/A data in the NO2 column (and I can't remove them or change them into 0s, because they are due to some problem in the sampling procedure), the whole month of January is removed from the dd.agg dataset. I've tried adding na.rm=TRUE, but it doesn't seem to help.
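
To see why na.rm = TRUE alone does not help, here is a small sketch that rebuilds the example data by hand (assuming the N/A cells are already stored as real NA values) and shows the row filtering that the formula interface of aggregate() applies before FUN ever sees the values:

```r
# Example data from the question, assuming the N/A cells are real NA values
dd <- data.frame(
  mo  = c(1L, 1L, 2L, 2L),
  yr  = c(2009L, 2009L, 2009L, 2009L),
  Na  = c(0.4, 0.2, 0.5, 0.7),
  NH4 = c(NA, 0.1, 0.6, 0.2),
  NO2 = c(NA, NA, 0.4, 0.1)
)

# The formula method of aggregate() uses na.action = na.omit by default,
# so any row with an NA in *any* column is dropped before grouping.
kept <- na.omit(dd)
print(kept)      # only the two February rows survive

# January is therefore already gone before mean() runs,
# so na.rm = TRUE (forwarded to mean) has nothing left to rescue.
dd.agg <- aggregate(. ~ mo + yr, dd, FUN = mean, na.rm = TRUE)
print(dd.agg)
```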

What I need is, essentially, for R to just ignore the presence of the N/A data: not to treat them as 0s (that would affect the average), but to obtain something like this from the dataset:

dd.agg <-
mo  yr Na   NH4 NO2
1 2009 0.3  0.1 N/A
2 2009 0.6  0.4 0.25

If a whole month consists of N/As, the average should just be N/A (or an empty cell; it doesn't really matter to me, since I don't need them in the plots), and when a month has just a couple of N/As, the non-N/A values should be averaged. I could repeat the same aggregate procedure for each column separately and then put everything into a new dataset manually, but for 40 columns that is a bit of a pain... Any ideas?

Answer

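We can use na.action = na.pass. The formula method of aggregate() defaults to na.action = na.omit, which drops every row that has an NA in any column before the data are grouped; that is why January disappears and why na.rm = TRUE alone does not help. With na.pass the rows are kept, and na.rm = TRUE is passed on to mean, which then skips the NAs column by column: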

aggregate(.~mo+yr, dd, FUN=mean, na.rm = TRUE, na.action = na.pass)
#   mo   yr  Na NH4  NO2
#1  1 2009 0.3 0.1  NaN
#2  2 2009 0.6 0.4 0.25
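
The all-missing month comes out as NaN because mean(x, na.rm = TRUE) averages zero remaining values. The question says NA or an empty cell is equally fine, but if a real NA is preferred, a minimal sketch (rebuilding dd by hand, as in the Data section) is to replace NaN column by column afterwards; is.nan() has no data.frame method, hence the lapply():

```r
dd <- data.frame(
  mo  = c(1L, 1L, 2L, 2L),
  yr  = c(2009L, 2009L, 2009L, 2009L),
  Na  = c(0.4, 0.2, 0.5, 0.7),
  NH4 = c(NA, 0.1, 0.6, 0.2),
  NO2 = c(NA, NA, 0.4, 0.1)
)

res <- aggregate(. ~ mo + yr, dd, FUN = mean, na.rm = TRUE, na.action = na.pass)

# Turn the NaN produced by all-NA groups into NA, one column at a time;
# res[] <- ... keeps res a data.frame with the same column names.
res[] <- lapply(res, function(x) replace(x, is.nan(x), NA))
print(res)
```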

With the tidyverse, we can use

library(tidyverse)
dd %>% 
    group_by(mo, yr) %>% 
    summarise_all(mean, na.rm = TRUE)
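
summarise_all() still works but is superseded in current dplyr. A sketch of the same aggregation with across(), assuming dplyr 1.0.0 or later (for across() and the .groups argument):

```r
library(dplyr)

dd <- data.frame(
  mo  = c(1L, 1L, 2L, 2L),
  yr  = c(2009L, 2009L, 2009L, 2009L),
  Na  = c(0.4, 0.2, 0.5, 0.7),
  NH4 = c(NA, 0.1, 0.6, 0.2),
  NO2 = c(NA, NA, 0.4, 0.1)
)

# Inside summarise(), everything() covers the non-grouping columns only,
# so mo and yr are left as grouping keys.
dd.agg <- dd %>%
  group_by(mo, yr) %>%
  summarise(across(everything(), function(x) mean(x, na.rm = TRUE)),
            .groups = "drop")
print(dd.agg)
```

As with aggregate(), the all-NA January value of NO2 comes out as NaN.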

NOTE: the string N/A is not regarded as NA in R. It must first be converted to NA before attempting this.

When reading the data with read.table/read.csv, use na.strings to specify which entries should become NA:

dd <- read.csv('file.csv', na.strings = "N/A")
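
A self-contained sketch of the difference, using a hypothetical CSV string that mirrors the question's table (read.csv(text = ...) stands in for the real file):

```r
# Hypothetical CSV contents mirroring the question's table
csv_text <- "mo,yr,Na,NH4,NO2
1,2009,0.4,N/A,N/A
1,2009,0.2,0.1,N/A
2,2009,0.5,0.6,0.4
2,2009,0.7,0.2,0.1"

raw <- read.csv(text = csv_text)                      # "N/A" kept as text
dd  <- read.csv(text = csv_text, na.strings = "N/A")  # "N/A" read as NA

# Without na.strings the NH4/NO2 columns cannot be numeric;
# with it they are numeric and contain real NA values.
print(sapply(raw[c("NH4", "NO2")], class))
print(sapply(dd[c("NH4", "NO2")], class))
```

Note that na.strings replaces the default value "NA"; if a file mixes both spellings, na.strings = c("NA", "N/A") covers them.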

Data

dd <- structure(list(mo = c(1L, 1L, 2L, 2L), yr = c(2009L, 2009L, 2009L, 
 2009L), Na = c(0.4, 0.2, 0.5, 0.7), NH4 = c(NA, 0.1, 0.6, 0.2
 ), NO2 = c(NA, NA, 0.4, 0.1)), class = "data.frame", row.names = c(NA, 
 -4L))

Here we specify N/A as NA because otherwise "N/A" is read as an ordinary string, which makes the whole column character or factor (depending on how it was read, i.e. the stringsAsFactors option) instead of numeric.
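
If the data have already been loaded with "N/A" stored as text, one possible fix (a sketch; na_to_numeric is a hypothetical helper, not part of base R) is to convert every column after the fact. Going through as.character() first matters for factor columns, where as.numeric() alone would return the level codes rather than the values:

```r
# Hypothetical: data already loaded with "N/A" stored as text
dd <- data.frame(
  mo  = c(1L, 1L, 2L, 2L),
  yr  = c(2009L, 2009L, 2009L, 2009L),
  Na  = c(0.4, 0.2, 0.5, 0.7),
  NH4 = c("N/A", "0.1", "0.6", "0.2"),
  NO2 = c("N/A", "N/A", "0.4", "0.1")
)

# Hypothetical helper: "N/A" -> NA, everything else -> number.
# %in% never returns NA, so the logical index is always TRUE/FALSE.
na_to_numeric <- function(x) {
  x <- as.character(x)       # factor labels, not level codes
  x[x %in% "N/A"] <- NA
  as.numeric(x)
}

dd[] <- lapply(dd, na_to_numeric)
str(dd)
```

This also turns the integer mo and yr columns into doubles, which makes no difference to aggregate().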
