Rowwise median for multiple columns using dplyr
Problem description
Given the following dataset, I want to compute, for each row, the median of the columns M1, M2, and M3. I am looking for a solution that adds the result to the data frame as a new final column named 'Median'. The column names (M1:M3) should not be written out directly (the original dataset has many more such columns, not just 3).
# A tibble: 8 x 5
     I1    M1    M2    I2    M3
  <int> <int> <int> <int> <int>
1     3     4     5     3     5
2     2     2     2     2     1
3     2     2     2     2     2
4     3     1     3     3     1
5     2     1     3     3     1
6     3     2     4     4     3
7     3     1     3     4     1
8     2     1     3     2     3
You can load the dataset using:
df = structure(list(I1 = c(3L, 2L, 2L, 3L, 2L, 3L, 3L, 2L), M1 = c(4L,
2L, 2L, 1L, 1L, 2L, 1L, 1L), M2 = c(5L, 2L, 2L, 3L, 3L, 4L, 3L,
3L), I2 = c(3L, 2L, 2L, 3L, 3L, 4L, 4L, 2L), M3 = c(5L, 1L, 2L,
1L, 1L, 3L, 1L, 3L)), class = c("tbl_df", "tbl", "data.frame"
), row.names = c(NA, -8L), .Names = c("I1", "M1", "M2", "I2",
"M3"))
I know that several similar questions have already been asked. However, most of the posted solutions use rowMeans or rowSums. I'm looking for a solution where:
1. No row functions (such as rowMeans or rowSums) are used.
2. The solution is a simple dplyr solution.
The reason for (2) is that I am teaching the 'tidyverse' to total beginners.
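Because the M columns must not be listed by hand, every approach has to pick them by pattern. A minimal sketch using dplyr's matches() selection helper, assuming (as in this sample) that the relevant columns are exactly those named "M" followed by digits; the data frame is rebuilt with tibble() so the snippet runs on its own:

```r
library(dplyr)

# Same values as the structure() call above, rebuilt with tibble()
df <- tibble(
  I1 = c(3L, 2L, 2L, 3L, 2L, 3L, 3L, 2L),
  M1 = c(4L, 2L, 2L, 1L, 1L, 2L, 1L, 1L),
  M2 = c(5L, 2L, 2L, 3L, 3L, 4L, 3L, 3L),
  I2 = c(3L, 2L, 2L, 3L, 3L, 4L, 4L, 2L),
  M3 = c(5L, 1L, 2L, 1L, 1L, 3L, 1L, 3L)
)

# Assumed naming scheme: "M" followed by one or more digits
m_cols <- names(select(df, matches("^M\\d+$")))
print(m_cols)
#> [1] "M1" "M2" "M3"
```

The same pattern works in base R as grep("^M\\d+$", names(df), value = TRUE), which is what the answers below build on.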
Recommended answer
We can use rowMedians from the matrixStats package:
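library(matrixStats)
library(dplyr)
# Keep the columns whose names match "M<digits>", convert them to a matrix,
# and take the median of each row
df %>%
  mutate(Median = rowMedians(as.matrix(.[grep('M\\d+', names(.))])))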
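A variant of the same idea, sketched under the assumption that matrixStats is installed: the M columns are picked with select(matches()) instead of grep() on names(.), converted to a matrix once outside the pipe, and rowMedians() is applied to that matrix. Note that this still relies on a row function, so it does not satisfy requirement (1) of the question.

```r
library(matrixStats)
library(dplyr)

df <- tibble(
  I1 = c(3L, 2L, 2L, 3L, 2L, 3L, 3L, 2L),
  M1 = c(4L, 2L, 2L, 1L, 1L, 2L, 1L, 1L),
  M2 = c(5L, 2L, 2L, 3L, 3L, 4L, 3L, 3L),
  I2 = c(3L, 2L, 2L, 3L, 3L, 4L, 4L, 2L),
  M3 = c(5L, 1L, 2L, 1L, 1L, 3L, 1L, 3L)
)

# Matrix holding only the columns named "M<digits>" (assumed naming scheme)
m <- as.matrix(select(df, matches("^M\\d+$")))

# rowMedians() returns one median per row of the matrix
result <- mutate(df, Median = rowMedians(m))
print(result)
```

Building the matrix first keeps the mutate() call short, which may be easier for beginners to read than the `.[...]` subsetting inside the pipe.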
Or, if we need to use only tidyverse functions, convert the data to 'long' format with gather, group by the row id and summarise, taking the median of the 'value' column:
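library(dplyr)
library(tidyr)
df %>%
  mutate(rn = row_number()) %>%             # integer row id, so groups sort in row order
  gather(key, value, starts_with('M')) %>%  # long format: one row per (row id, M column)
  group_by(rn) %>%
  summarise(Median = median(value)) %>%     # one median per original row
  ungroup() %>%
  select(-rn) %>%
  bind_cols(df, .)                          # append Median to the original columns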
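In tidyr 1.0.0 and later, gather() is superseded by pivot_longer(). A sketch of the same long-format approach with pivot_longer(), which also joins the medians back by an integer row id instead of relying on row order for bind_cols(); the names rn, key, and value are only illustrative:

```r
library(dplyr)
library(tidyr)

df <- tibble(
  I1 = c(3L, 2L, 2L, 3L, 2L, 3L, 3L, 2L),
  M1 = c(4L, 2L, 2L, 1L, 1L, 2L, 1L, 1L),
  M2 = c(5L, 2L, 2L, 3L, 3L, 4L, 3L, 3L),
  I2 = c(3L, 2L, 2L, 3L, 3L, 4L, 4L, 2L),
  M3 = c(5L, 1L, 2L, 1L, 1L, 3L, 1L, 3L)
)

# Integer row id used as the join key
df_id <- mutate(df, rn = row_number())

# Long format: one row per (row id, M column), then one median per row id
medians <- df_id %>%
  pivot_longer(starts_with("M"), names_to = "key", values_to = "value") %>%
  group_by(rn) %>%
  summarise(Median = median(value))

# left_join() keeps the row order of df_id; drop the helper id afterwards
result <- df_id %>%
  left_join(medians, by = "rn") %>%
  select(-rn)
print(result)
```

Joining by the id means the result stays correct even if the summarised rows come back in a different order.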
Or another option is rowwise() from dplyr (hopefully a row-wise grouping does not count as one of the excluded row functions):
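# rowwise() feeds median() one row at a time; !!! splices the M column names in as symbols
df %>%
  rowwise() %>%
  mutate(Median = median(c(!!! rlang::syms(grep('M', names(.), value=TRUE))))) %>%
  ungroup()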
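With dplyr 1.0.0 or later, c_across() lets rowwise() use tidyselect helpers directly, which avoids the rlang::syms() splicing. A minimal sketch, assuming the M columns can be picked with starts_with("M"); the final ungroup() drops the row-wise grouping so later verbs behave normally:

```r
library(dplyr)

df <- tibble(
  I1 = c(3L, 2L, 2L, 3L, 2L, 3L, 3L, 2L),
  M1 = c(4L, 2L, 2L, 1L, 1L, 2L, 1L, 1L),
  M2 = c(5L, 2L, 2L, 3L, 3L, 4L, 3L, 3L),
  I2 = c(3L, 2L, 2L, 3L, 3L, 4L, 4L, 2L),
  M3 = c(5L, 1L, 2L, 1L, 1L, 3L, 1L, 3L)
)

result <- df %>%
  rowwise() %>%                                         # one group per row
  mutate(Median = median(c_across(starts_with("M")))) %>% # this row's M values
  ungroup()                                             # back to a plain tibble
print(result)
```

This is probably the most beginner-friendly of the options, since it reads almost like the problem statement.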