Rowwise median for multiple columns using dplyr

Views: 173 · Published: 2020/10/26 3:09:38 · Tags: r, dplyr

Problem Description

Given the following dataset, I want to compute, for each row, the median of columns M1, M2 and M3. I am looking for a solution that adds the result to the data frame as a final column named 'Median'. The column names (M1:M3) should not be used directly, because the original dataset has many more such columns, not just 3.

# A tibble: 8 x 5
     I1    M1    M2    I2    M3
  <int> <int> <int> <int> <int>
1     3     4     5     3     5
2     2     2     2     2     1
3     2     2     2     2     2
4     3     1     3     3     1
5     2     1     3     3     1
6     3     2     4     4     3
7     3     1     3     4     1
8     2     1     3     2     3

You can load the dataset using:

df = structure(list(I1 = c(3L, 2L, 2L, 3L, 2L, 3L, 3L, 2L), M1 = c(4L, 
2L, 2L, 1L, 1L, 2L, 1L, 1L), M2 = c(5L, 2L, 2L, 3L, 3L, 4L, 3L, 
3L), I2 = c(3L, 2L, 2L, 3L, 3L, 4L, 4L, 2L), M3 = c(5L, 1L, 2L, 
1L, 1L, 3L, 1L, 3L)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -8L), .Names = c("I1", "M1", "M2", "I2", 
"M3"))

I know that several similar questions have already been asked. However, most of the posted solutions use rowMeans or rowSums. I'm looking for a solution where:

1. No row functions are used.
2. The solution is a simple dplyr solution.

The reason for (2) is that I am teaching the 'tidyverse' to total beginners.
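
For reference, working through M1, M2 and M3 by hand gives row medians of 5, 2, 2, 1, 1, 3, 1 and 3. The sketch below is not an answer to the question (it uses base R's apply(), not dplyr); it only computes these values independently so that any of the solutions can be checked against them. It rebuilds the same data as a plain data.frame so it runs on its own, and it assumes the relevant columns are exactly those named M followed by digits.

```r
# Reference values only: base R, no dplyr, not meant as a solution
df <- data.frame(
  I1 = c(3L, 2L, 2L, 3L, 2L, 3L, 3L, 2L),
  M1 = c(4L, 2L, 2L, 1L, 1L, 2L, 1L, 1L),
  M2 = c(5L, 2L, 2L, 3L, 3L, 4L, 3L, 3L),
  I2 = c(3L, 2L, 2L, 3L, 3L, 4L, 4L, 2L),
  M3 = c(5L, 1L, 2L, 1L, 1L, 3L, 1L, 3L)
)

# Assumption: the columns of interest are exactly those named M<digits>
m_cols <- grep("^M\\d+$", names(df), value = TRUE)

# apply(x, 1, f) converts the selected columns to a matrix and calls f on each row
expected <- apply(df[m_cols], 1, median)
print(expected)
```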

Recommended Answer

We can use rowMedians from the matrixStats package:

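library(matrixStats)
library(dplyr)
# Select the columns whose names match M<digits>, convert them to a matrix,
# and take the median of each row
df %>% 
    mutate(Median = rowMedians(as.matrix(.[grep('M\\d+', names(.))])))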
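
The .[grep(...)] idiom depends on the magrittr placeholder. A minimal sketch of the same rowMedians approach, assuming dplyr >= 1.1.0 (which provides pick()), selects the columns with a tidyselect helper instead. Like the answer above, it still uses a row function from matrixStats, so it does not satisfy requirement (1).

```r
library(dplyr)        # assumes dplyr >= 1.1.0, which provides pick()
library(matrixStats)  # rowMedians()

df <- tibble::tibble(
  I1 = c(3L, 2L, 2L, 3L, 2L, 3L, 3L, 2L),
  M1 = c(4L, 2L, 2L, 1L, 1L, 2L, 1L, 1L),
  M2 = c(5L, 2L, 2L, 3L, 3L, 4L, 3L, 3L),
  I2 = c(3L, 2L, 2L, 3L, 3L, 4L, 4L, 2L),
  M3 = c(5L, 1L, 2L, 1L, 1L, 3L, 1L, 3L)
)

result <- df %>%
  # pick() returns the selected columns as a tibble; as.matrix() turns them
  # into the matrix that rowMedians() expects
  mutate(Median = rowMedians(as.matrix(pick(matches("^M\\d+$")))))

print(result)
```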

Or, if we need to use only tidyverse functions, convert the data to 'long' format with gather, summarise by row, and take the median of the 'value' column:

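library(tibble)  # rowid_to_column()
library(tidyr)   # gather()
df %>% 
    rowid_to_column('rn') %>%                 # integer row id, so group order matches row order
    gather(key, value, starts_with('M')) %>%  # long format: one row per (rn, M column)
    group_by(rn) %>% 
    summarise(Median = median(value)) %>%
    ungroup() %>% 
    select(-rn) %>%
    bind_cols(df, .)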
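
gather() still works but has been superseded in tidyr by pivot_longer(). A sketch of the same long-format approach with pivot_longer(), assuming tidyr >= 1.0.0. It uses row_number() as an integer row id so the grouped medians come back in the original row order before bind_cols() attaches them.

```r
library(dplyr)
library(tidyr)  # assumes tidyr >= 1.0.0, which provides pivot_longer()

df <- tibble::tibble(
  I1 = c(3L, 2L, 2L, 3L, 2L, 3L, 3L, 2L),
  M1 = c(4L, 2L, 2L, 1L, 1L, 2L, 1L, 1L),
  M2 = c(5L, 2L, 2L, 3L, 3L, 4L, 3L, 3L),
  I2 = c(3L, 2L, 2L, 3L, 3L, 4L, 4L, 2L),
  M3 = c(5L, 1L, 2L, 1L, 1L, 3L, 1L, 3L)
)

medians <- df %>%
  mutate(rn = row_number()) %>%                            # integer row id
  pivot_longer(starts_with("M"), values_to = "value") %>%  # one row per (rn, M column)
  group_by(rn) %>%
  summarise(Median = median(value)) %>%                    # one row per original row, ordered by rn
  select(-rn)

result <- bind_cols(df, medians)
print(result)
```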

Another option is rowwise() from dplyr (hopefully rowwise itself does not count as a row function):

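library(dplyr)
# rowwise() makes mutate() evaluate once per row; rlang::syms() turns the
# matching column names into symbols that !!! splices into c()
df %>% 
   rowwise() %>% 
   mutate(Median = median(c(!!! rlang::syms(grep('M', names(.), value = TRUE))))) %>%
   ungroup()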
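
With dplyr >= 1.0.0, c_across() gathers the selected columns of the current row into a single vector, which avoids building symbols with rlang. A sketch, assuming that version is available:

```r
library(dplyr)  # assumes dplyr >= 1.0.0, which provides c_across()

df <- tibble::tibble(
  I1 = c(3L, 2L, 2L, 3L, 2L, 3L, 3L, 2L),
  M1 = c(4L, 2L, 2L, 1L, 1L, 2L, 1L, 1L),
  M2 = c(5L, 2L, 2L, 3L, 3L, 4L, 3L, 3L),
  I2 = c(3L, 2L, 2L, 3L, 3L, 4L, 4L, 2L),
  M3 = c(5L, 1L, 2L, 1L, 1L, 3L, 1L, 3L)
)

result <- df %>%
  rowwise() %>%                                            # treat each row as its own group
  mutate(Median = median(c_across(starts_with("M")))) %>%  # this row's M values as one vector
  ungroup()                                                # return an ordinary tibble

print(result)
```

Because starts_with() is a tidyselect helper, the same call picks up any additional M columns without naming them. Note that starts_with() ignores case by default; matches("^M\\d+$") is stricter if other columns could begin with "m".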
