Summary stats across columns, where column names indicate groups

Posted: 2021/4/29 18:40:07. Tags: r, loops, dplyr, data-manipulation, summary


Problem description

My data frame "have" contains a few thousand vectors that follow a naming pattern: each vector name is a noun followed by _a, _b, or _c. Below are the first 10 variables and observations:

id  turtle_a   banana_a   castle_a   turtle_b   banana_b   castle_b   turtle_c   banana_c   castle_c
A      -0.58      -0.88      -0.56      -0.53      -0.32      -0.42      -0.52      -0.89      -0.72
B         NA         NA         NA      -0.84      -0.36      -0.26         NA         NA         NA
C       0.00      -0.43      -0.75      -0.35      -0.88      -0.14      -0.26      -0.15      -0.81
D      -0.81      -0.63      -0.77      -0.82      -0.83      -0.50      -0.77      -0.25      -0.07
E      -0.25      -0.33      -0.09      -0.51      -0.27      -0.81      -0.06      -0.23      -0.97
F      -0.80      -0.88      -0.05         NA         NA         NA         NA         NA         NA
G      -0.25      -0.76      -0.21         NA         NA         NA         NA         NA         NA
H      -0.47      -0.10      -0.67      -0.46      -0.71      -0.24      -0.76      -0.04      -0.11
I      -0.15      -0.34      -0.57      -0.40      -0.14      -0.49         NA         NA         NA
J      -0.65      -0.86      -0.37      -0.67      -0.81      -0.63         NA         NA         NA
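To make the question reproducible, the table above can be rebuilt as an R data frame. This is a minimal base-R sketch. It includes only the 10 rows and 10 columns shown here, whereas the real data has thousands more columns.

```r
# Rebuild the excerpt of `have` shown in the table (rows A to J, first 9 value columns)
have <- data.frame(
  id       = LETTERS[1:10],
  turtle_a = c(-0.58,    NA,  0.00, -0.81, -0.25, -0.80, -0.25, -0.47, -0.15, -0.65),
  banana_a = c(-0.88,    NA, -0.43, -0.63, -0.33, -0.88, -0.76, -0.10, -0.34, -0.86),
  castle_a = c(-0.56,    NA, -0.75, -0.77, -0.09, -0.05, -0.21, -0.67, -0.57, -0.37),
  turtle_b = c(-0.53, -0.84, -0.35, -0.82, -0.51,    NA,    NA, -0.46, -0.40, -0.67),
  banana_b = c(-0.32, -0.36, -0.88, -0.83, -0.27,    NA,    NA, -0.71, -0.14, -0.81),
  castle_b = c(-0.42, -0.26, -0.14, -0.50, -0.81,    NA,    NA, -0.24, -0.49, -0.63),
  turtle_c = c(-0.52,    NA, -0.26, -0.77, -0.06,    NA,    NA, -0.76,    NA,    NA),
  banana_c = c(-0.89,    NA, -0.15, -0.25, -0.23,    NA,    NA, -0.04,    NA,    NA),
  castle_c = c(-0.72,    NA, -0.81, -0.07, -0.97,    NA,    NA, -0.11,    NA,    NA),
  stringsAsFactors = FALSE
)
```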

The data frame "want" should contain, for each noun group, the row-wise mean across all of that group's columns, ignoring missing values. For example, for id = A, averaging turtle_a, turtle_b, and turtle_c gives (-0.58 + -0.53 + -0.52) / 3 ≈ -0.54. For id = B, only turtle_b is present, so turtle_m is -0.84. Here is what "want" looks like for the handful of noun groups in the example:

id   turtle_m    banana_m    castle_m
A       -0.54       -0.70       -0.57
B       -0.84       -0.36       -0.26
C       -0.20       -0.49       -0.57
D       -0.80       -0.57       -0.45
E       -0.27       -0.28       -0.62
F       -0.80       -0.88       -0.05
G       -0.25       -0.76       -0.21
H       -0.56       -0.29       -0.34
I       -0.27       -0.24       -0.53
J       -0.66       -0.83       -0.50
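The averaging described above can be checked with base R's rowMeans(). This small sketch uses only rows A and B of the turtle columns, copied from the "have" table. The argument na.rm = TRUE is what makes row B average over its single non-missing value.

```r
# Turtle columns for rows A and B, copied from the `have` table
turtle <- data.frame(
  turtle_a = c(-0.58, NA),
  turtle_b = c(-0.53, -0.84),
  turtle_c = c(-0.52, NA),
  row.names = c("A", "B")
)

# Row-wise mean across the three columns; na.rm = TRUE skips missing cells,
# so row B is averaged over turtle_b alone
turtle_m <- rowMeans(turtle, na.rm = TRUE)
round(turtle_m, 2)
```

Rounded to two decimals this gives -0.54 for A and -0.84 for B, matching turtle_m in the table.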

Options so far:

1. Convert to long format, summarize with group_by() in dplyr, and pivot back to wide.
2. Re-sort the vectors so that each noun group's columns sit next to each other, then write a loop that computes row means across columns, stepping three columns at each iteration.
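The first option (long, summarize, wide) can be sketched with tidyr's pivot_longer() and pivot_wider(). This sketch assumes dplyr 1.0 or later (for .groups) and tidyr 1.1 or later (for names_glue). It uses a hypothetical three-row, two-noun subset of "have" copied from the table. The names_pattern regex splits each name at its final underscore, so nouns that themselves contain underscores stay intact.

```r
library(dplyr)
library(tidyr)

# Subset of `have` (rows A, B, I; turtle and banana only) copied from the table
have <- tibble(
  id       = c("A", "B", "I"),
  turtle_a = c(-0.58, NA, -0.15),
  banana_a = c(-0.88, NA, -0.34),
  turtle_b = c(-0.53, -0.84, -0.40),
  banana_b = c(-0.32, -0.36, -0.14),
  turtle_c = c(-0.52, NA, NA),
  banana_c = c(-0.89, NA, NA)
)

want <- have %>%
  # Long format: one row per id / noun / suffix
  pivot_longer(-id,
               names_to = c("noun", "suffix"),
               names_pattern = "(.*)_([abc])$") %>%
  # Mean per id and noun, skipping missing cells
  group_by(id, noun) %>%
  summarise(m = mean(value, na.rm = TRUE), .groups = "drop") %>%
  # Back to wide: one <noun>_m column per noun group
  pivot_wider(names_from = noun, values_from = m, names_glue = "{noun}_m")
want
```

If every column of a group is NA for some id, that cell comes out as NaN rather than NA.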

It seems like summarize_at() or summarize_all() could do this more efficiently than either of my current options, but I'm not sure how to use them so that variables are grouped dynamically by their naming convention.
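Note that summarize_at() and summarize_all() apply a function down each column and collapse rows; since dplyr 1.0.0 they are superseded by across(). A mean across several columns within each row is what rowMeans() computes. The sketch below swaps in a base-R technique rather than dplyr's scoped verbs. It derives each column's noun group from its name by stripping the trailing _a/_b/_c, splits the columns by that group with split.default(), and applies rowMeans() to each piece. It assumes that id is the only non-value column and that every value column ends in exactly _a, _b, or _c. The data is a hypothetical three-row subset copied from the table.

```r
# Subset of `have` (rows A, B, I) copied from the table
have <- data.frame(
  id       = c("A", "B", "I"),
  turtle_a = c(-0.58, NA, -0.15),
  banana_a = c(-0.88, NA, -0.34),
  castle_a = c(-0.56, NA, -0.57),
  turtle_b = c(-0.53, -0.84, -0.40),
  banana_b = c(-0.32, -0.36, -0.14),
  castle_b = c(-0.42, -0.26, -0.49),
  turtle_c = c(-0.52, NA, NA),
  banana_c = c(-0.89, NA, NA),
  castle_c = c(-0.72, NA, NA),
  stringsAsFactors = FALSE
)

# Every column except id is a value column named <noun>_<a|b|c>
value_cols <- setdiff(names(have), "id")
# Strip only the final suffix, so nouns containing "_" are kept whole
noun <- sub("_[abc]$", "", value_cols)

# split.default() on a data frame splits its columns (not its rows),
# giving one data frame per noun, with nouns in alphabetical order
groups <- split.default(have[value_cols], noun)

# Row-wise mean within each noun group, skipping NA cells
means <- lapply(groups, rowMeans, na.rm = TRUE)
names(means) <- paste0(names(means), "_m")

want <- data.frame(id = have$id, means)
want
```

Because the grouping comes from the names rather than from column positions, the columns do not need to be re-sorted first, and rowMeans() is vectorized, so no explicit loop in three-column steps is needed. As with mean(), a row whose group is entirely NA yields NaN.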

Any ideas?
