How to calculate row medians efficiently with data.table

Views: 110 | Published: 2020/10/15 20:38:20 | Tags: r, data.table

Question

I have a fairly large data.table (15M rows, 15 columns) for which I want to calculate the median of each row. I can do this using

apply(DT, 1, median)  # DT is my data.table

but this is very slow. Is there a faster, data.table-friendly alternative?

As a small working example, if I have

DT = data.table(a = c(1, 2, 4), b = c(6, 4, 7), 
                c = c(3, 9, 9), d = c(18, 1, -5))
#    a b c  d
# 1: 1 6 3 18
# 2: 2 4 9  1
# 3: 4 7 9 -5

what is the most efficient way of computing the row medians?

apply(DT, 1, median)
# [1] 4.5 3.0 5.5

Recommended answer

One option is the rowMedians function from the matrixStats package:

library(matrixStats)
DT[, med := rowMedians(as.matrix(.SD))][]

which gives:

> DT
   a b c  d med
1: 1 6 3 18 4.5
2: 2 4 9  1 3.0
3: 4 7 9 -5 5.5
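
If the table already has (or later gains) columns that should not enter the median, the same call can be restricted with .SDcols. The following is a minimal sketch, not part of the original answer; the vector name `cols` and the use of `na.rm = TRUE` are assumptions made for illustration.

```r
library(data.table)
library(matrixStats)

DT <- data.table(a = c(1, 2, 4), b = c(6, 4, 7),
                 c = c(3, 9, 9), d = c(18, 1, -5))

# Columns to include in the median (assumed here: the four original columns)
cols <- c("a", "b", "c", "d")

# .SDcols limits .SD to `cols`, so an existing `med` column is never included;
# na.rm = TRUE skips missing values within each row
DT[, med := rowMedians(as.matrix(.SD), na.rm = TRUE), .SDcols = cols]

# Sanity check against the slower apply() approach
stopifnot(isTRUE(all.equal(DT$med, unname(apply(DT[, ..cols], 1, median)))))
```

Because `med` is outside `cols`, running the line a second time gives the same result.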

Or using only data.table:

DT[, med := melt(DT, measure.vars = names(DT))[, r := 1:.N, variable][, median(value), by = r]$V1][]
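
The one-liner above melts every column in names(DT), so if a med column already exists it would be melted too. A step-by-step variant with an explicit row id may be easier to follow. This is only a sketch under assumptions: the names `cols`, `long`, `med_by_row`, and the helper column `r` are introduced here for illustration.

```r
library(data.table)

DT <- data.table(a = c(1, 2, 4), b = c(6, 4, 7),
                 c = c(3, 9, 9), d = c(18, 1, -5))

# Columns to include in the median (assumed: the four original columns)
cols <- c("a", "b", "c", "d")

# Add a row id to a copy, then reshape to long format: one row per (r, variable)
long <- melt(copy(DT)[, r := .I], id.vars = "r", measure.vars = cols)

# Median of the values belonging to each original row; keyby keeps r sorted
med_by_row <- long[, .(med = median(value)), keyby = r]

# Attach the result; row order matches because r runs 1..nrow(DT)
DT[, med := med_by_row$med]
```

The long table holds nrow(DT) * length(cols) rows, so on a table with millions of rows this route needs noticeably more memory than the matrix-based one.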
