Median returning an error when using data.table in R

Problem description

I have the following dataset:

> head(DT)
    V1 V2 V3   V4   V5     V6 V7
1:  2  1  2 0.91 0.02 880.00  1
2:  3  2  1 0.02 0.00   2.24  2
3:  1  1  1 0.15 0.01   3.41  3
4:  1  2  1 3.92 0.05 268.67  2
5:  1  1  2 0.10 0.01   1.59  3
6:  0  1  1 1.20 0.04   1.43  3

> sapply(DT, class)
       V1        V2        V3        V4        V5        V6        V7 
"integer" "integer" "integer" "numeric" "numeric" "numeric"  "factor" 

The dataset extends to thousands of rows. I am trying to calculate the median of V1-V6 within each of the 8 groups defined by the factor variable V7:

> levels(DT$V7)
[1] "1" "2" "3" "4" "5" "6" "7" "8"

At the moment I am using the following command, which returns an error:

> DT[, lapply(.SD, median), by = V7]
 Error in `[.data.table`(DF, , lapply(.SD, median), by = V7) : 
 Column 1 of result for group 4 is type 'integer' but expecting type 'double'. Column types must be consistent for each group.

I read somewhere that a workaround is to use as.double(median(X)). This works for an individual column, DT[, as.double(median(X)), by = V7], but not when considering all columns: DT[, lapply(.SD, as.double(median)), by = V7] fails (as expected, because median has to be given an input).

I can use aggregate:

> aggregate(DT[,c(1:6), with = FALSE], by = list(DT$V7), FUN = median)
  Group.1 V1 V2 V3     V4   V5      V6
1       1  0  1  1  1.285 0.04 401.500
2       2  1  2  1  3.565 0.06   6.400
3       3  0  1  1  0.360 0.03  11.200
4       4  1  1  1 74.290 0.26 325.960
5       5  2  1  0  1.145 0.04   1.415
6       6  0  1  1 10.100 0.18  93.000
7       7  1  1  0  0.740 0.04   1.080
8       8  1  1  0  7.970 0.40   0.050

But I'd like to know whether there is a way to resolve the error described above and do this calculation using data.table.

Recommended answer

median is unusual because it can return values of different types for the same input type:


The default method returns a length-one object of the same type as x, except when x is integer of even length, when the result will be double.
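
The type switch described in that passage is easy to reproduce in base R. The following is a minimal sketch using made-up vectors (odd and even are illustrative names, not the asker's data):

```r
# Odd-length integer input: the median is one of the elements, so it stays integer.
odd <- c(1L, 2L, 3L)
odd_class <- class(median(odd))    # "integer"

# Even-length integer input: the median is the mean of the two middle
# elements, and mean() returns a double even when the result is whole.
even <- c(1L, 2L, 3L, 4L)
even_class <- class(median(even))  # "numeric"

print(c(odd = odd_class, even = even_class))
```

When groups differ in size, some of them hit the odd case and others the even case, which is how one integer column can yield mixed result types across groups.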

However, data.table needs a consistent return value type. You have two possibilities:

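Convert all value columns to numeric first, so that the original DT[, lapply(.SD, median), by = V7] call then works (the conversion is done without by, because a grouped := cannot change the type of an existing column):

DT[, paste0("V", 1:6) := lapply(.SD, as.numeric), .SDcols = paste0("V", 1:6)]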

Or convert the return value of median:

DT[, lapply(.SD, function(x) as.numeric(median(x))), by = V7]
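
To compare the two options side by side, here is a small sketch on hypothetical toy data (two value columns and two groups of odd and even size, not the asker's dataset). It assumes the data.table package is installed; cols, DT1, res1 and res2 are names introduced only for this illustration.

```r
library(data.table)

# Hypothetical toy data: group "a" has 3 rows (odd), group "b" has 4 rows (even),
# so median() on the integer column V1 would return integer for "a" and double for "b".
DT <- data.table(
  V1 = c(1L, 2L, 3L, 4L, 5L, 6L, 8L),
  V4 = c(0.5, 1.5, 2.5, 3.5, 4.5, 5.5, 6.5),
  V7 = factor(c("a", "a", "a", "b", "b", "b", "b"))
)

# Option 1: convert the value columns to double once (on a copy, to leave DT
# untouched), then take the grouped medians as usual.
cols <- c("V1", "V4")
DT1 <- copy(DT)
DT1[, (cols) := lapply(.SD, as.numeric), .SDcols = cols]
res1 <- DT1[, lapply(.SD, median), by = V7]

# Option 2: keep the data as is and coerce each median to double instead.
res2 <- DT[, lapply(.SD, function(x) as.numeric(median(x))), by = V7]

print(res2)
print(all.equal(res1, res2))
```

Option 2 leaves the stored column types alone, which may be preferable when the integer columns are used elsewhere; option 1 changes the data once and keeps later calls shorter.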
