使用tabular(){表格:如何为多个采样定义函数(例如p值)? [英] Using tabular(){tables: How to define a function (for p value, e.g.) for multiple samplings?

查看:116
本文介绍了使用tabular(){表格:如何为多个采样定义函数(例如p值)?的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我经常不得不产生包含例如均值,标准差和简单测试结果的表.为了具有可复制和共享的工作流程,我尝试使用tables::tabular做到这一点. (请参阅

I often have to produce tables containing, e.g., mean, sd, and the result of a simple test. To have a reproducable and shareable workflow, I tried to do this with tables::tabular . (See here on how to include a test as a function.)

这有效:

nicetable <- tabular(sampling~treatment*var1*(mean+sd), data=tab)

但是,我没有为例如成对的Wilcoxon签名秩检验定义一个函数来比较多次采样情况下的处理,这似乎失败了:我似乎无法将正确的参数或数据传递给该函数.

However, I failed to define a function for, e.g., a paired Wilcoxon signed rank test to compare treatments in a multiple sampling case: it seems like I'm failing to pass the right arguments or data to the function.

没有比我自己愚蠢的人可以帮忙吗?

Can anybody who is less stupid than myself help out?

如果您愿意,以下是一些可重复性数据:

Here's some data for reproducibility, if you care:

structure(list(plot = structure(c(6L, 9L, 6L, 9L, 6L, 9L, 6L, 
9L, 12L, 15L, 12L, 15L, 12L, 15L, 12L, 15L, 5L, 16L, 5L, 16L, 
5L, 16L, 5L, 16L, 8L, 17L, 8L, 17L, 8L, 17L, 8L, 17L, 4L, 10L, 
4L, 10L, 4L, 10L, 4L, 10L, 2L, 11L, 2L, 11L, 2L, 11L, 2L, 11L, 
3L, 13L, 3L, 13L, 3L, 13L, 3L, 13L, 1L, 14L, 1L, 14L, 1L, 14L, 
1L, 14L, 24L, 19L, 24L, 19L, 24L, 19L, 24L, 19L, 22L, 23L, 22L, 
23L, 22L, 23L, 22L, 23L, 20L, 21L, 20L, 21L, 20L, 21L, 20L, 21L, 
7L, 18L, 7L, 18L, 7L, 18L, 7L, 18L), .Label = c("1", "2", "3", 
"4", "5", "6", "7", "8", "9", "10", "11", "12", "13", "14", "15", 
"16", "17", "18", "19", "20", "21", "22", "23", "24"), class = "factor"), 
    sampling = structure(c(1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 1L, 
    1L, 2L, 2L, 3L, 3L, 4L, 4L, 1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 
    1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 1L, 1L, 2L, 2L, 3L, 3L, 4L, 
    4L, 1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 1L, 1L, 2L, 2L, 3L, 3L, 
    4L, 4L, 1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 1L, 1L, 2L, 2L, 3L, 
    3L, 4L, 4L, 1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L, 1L, 1L, 2L, 2L, 
    3L, 3L, 4L, 4L, 1L, 1L, 2L, 2L, 3L, 3L, 4L, 4L), .Label = c("1", 
    "2", "3", "4"), class = "factor"), pairs = structure(c(1L, 
    1L, 1L, 1L, 1L, 1L, 1L, 1L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 2L, 
    3L, 3L, 3L, 3L, 3L, 3L, 3L, 3L, 4L, 4L, 4L, 4L, 4L, 4L, 4L, 
    4L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 5L, 6L, 6L, 6L, 6L, 6L, 6L, 
    6L, 6L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 7L, 8L, 8L, 8L, 8L, 8L, 
    8L, 8L, 8L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 9L, 10L, 10L, 10L, 
    10L, 10L, 10L, 10L, 10L, 11L, 11L, 11L, 11L, 11L, 11L, 11L, 
    11L, 12L, 12L, 12L, 12L, 12L, 12L, 12L, 12L), .Label = c("pair_1", 
    "pair_10", "pair_11", "pair_12", "pair_2", "pair_3", "pair_4", 
    "pair_5", "pair_6", "pair_7", "pair_8", "pair_9"), class = "factor"), 
    treatment = structure(c(1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
    2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 
    1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
    2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 
    1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 
    2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 
    1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L, 1L, 2L), .Label = c("A", 
    "B"), class = "factor"), var1 = c(4, 6, 21, 11, 6, 11, 21, 
    16, 2, 5, 18, 18, 8, 5, 26, 24, 0, 4, 28, 26, 7, 11, 20, 
    29, 1, 4, 17, 28, 20, 11, 20, 24, 11, 8, 19, 15, 11, 10, 
    16, 17, 7, 4, 18, 21, 6, 6, 18, 16, 7, 2, 17, 15, 3, 12, 
    18, 26, 8, 5, 23, 17, 9, 8, 24, 20, 7, 7, 1, 17, 9, 10, 7, 
    0, 0, 5, 18, 0, 8, 10, 15, 17, 5, 7, 24, 19, 13, 8, 20, 17, 
    1, 2, 22, 19, 6, 8, 17, 13), var2 = c(0.531857406453951, 
    0.99147016935005, 0.978386084625517, 0.547701177005542, 0.557590884845267, 
    0.986951076487171, 0.562417675727868, 0.986951076487171, 
    0.483984835736487, 0.726909676798849, 0.388579012270421, 
    0.745553604701919, 0.465094116634877, 0.726909676798849, 
    0.488003757207879, 0.557184406817338, 0.701676487711027, 
    0.869080260649975, 0.720173845681177, 0.750917673793786, 
    0.755303408639525, 0.506987878760014, 0.686245881868453, 
    0.60763119427203, 0.548453587721443, 0.703832816328718, 0.412731402996848, 
    0.717973047643672, 0.550210159561483, 0.671791216125084, 
    0.361548563337832, 0.606668062640702, 0.518806412571116, 
    0.742554357381421, 0.507677339941509, 0.923200219631054, 
    0.341071242549443, 0.681636160803754, 0.384435345144425, 
    0.61998338971563, 0.557812388143911, 0.632317782224629, 0.603677751166685, 
    0.632317782224629, 0.624604514381939, 0.623183042284434, 
    0.589665731283708, 0.338738325837909, 0.448751068565499, 
    0.620695986589587, 0.412147458001507, 0.354008373981433, 
    0.444023865279733, 0.366742726110414, 0.368307839974067, 
    0.338054566392881, 0.492950438718815, 0.722825772176568, 
    0.529502336899605, 0.834207208644564, 0.523569852219379, 
    0.834207208644564, 0.591655754114154, 0.725359004030846, 
    0.604856790039767, 0.787389376103932, 0.491331714116263, 
    0.828838159960298, 0.506594233666576, 0.75537998521935, 0.477785779781003, 
    0.925304881641062, 0.425400499022199, 0.537980402016095, 
    0.443113792876767, 0.991210220561304, 0.366372451776005, 
    0.585051630458758, 0.363869227771921, 0.67007984346546, 0.37054162796269, 
    0.574771389575503, 0.446535654066238, 0.700306153200489, 
    0.358793598876081, 0.309159322200134, 0.372983177758783, 
    0.353384010493424, 0.492456412584678, 0.359873708654463, 
    0.436447650900556, 0.591291884661869, 0.436447650900556, 
    0.603360031882414, 0.453002902987777, 0.370462648444931)), .Names = c("plot", 
"sampling", "pairs", "treatment", "var1", "var2"), row.names = c(NA, 
96L), class = "data.frame")

推荐答案

如果您显示希望使用的语法以及输出的外观,则可能会有所帮助. tabular当前工作方式的问题在于,它会将单个向量(正在处理的数据列的子集,与表的当前单元格相对应)传递给摘要函数,而没有其他参数.但是Wilcox.test和其他测试功能需要多个参数,而tabular当前无法传递这些参数.

It might help if you showed what syntax you would like to be able to use and what you would like the output to look like. The problem with how tabular currently works is that it passes a single vector (the subset of the data column being processed corresponding to the current cell of the table) to the summary function, and no other arguments. But Wilcox.test and other testing functions need multiple arguments that tabular currently has no way of passing along.

已添加了计算百分比的可能性(在公式中使用Percent),因此将来向作者/维护者提出的功能请求可能会产生类似的测试可能性,但是对于所有人而言,这将变得更加复杂可能要传入的参数.

The possibility of computing percentages (using Percent in the formula) has been added, so a feature request to the author/maintainer may yield a similar possibility for tests in the future, but this will be much more complicated with all the possible arguments to be passed in.

您可以使用上一答案中的硬编码方式创建自己的函数,但这可能会导致难以维护可重复和可共享的工作流.

You can create your own function with the extra arguments hard coded in as in the previous answer, but this could cause difficulty with trying to maintain a reproducible and sharable workflow.

另一种选择是预先计算p值和任何其他感兴趣的摘要,并将其附加到数据框,然后在公式中仅包含一个术语即可获取计算列的第一个元素.这是一个使用ddply进行计算并返回增强数据帧,然后调用tabular进行显示的示例:

Another option is to pre compute the p-value and any other summaries of interest and attach them to the data frame, then just include a term in the formula to grab the 1st element of the computed column(s). Here is an example that uses ddply to do the computations and return an augmented data frame, then calls tabular to display:

library(plyr)

tab2 <- ddply(tab, .(sampling), function(df) {
    x <- df$var1
    g <- df$treatment
    df$P.value <- wilcox.test(x[g=='A'], x[g=='B'], paired=TRUE)$p.value
    df$diff <- x[g=='A'] - x[g=='B']
    df
})

` ` <- function(x) x[1]

tabular(sampling ~ treatment*var1*(mean+sd) + 
     ` `*diff +(Wilcoxon = ` `*P.value), data=tab2)

它不如仅向tabular的公式中添加一个术语那样好,但是您可以完全控制要计算的内容然后显示.

It is not as nice as being able to just add a term to the formula for tabular, but you can control exactly what you want computed, then displayed.

这篇关于使用tabular(){表格:如何为多个采样定义函数(例如p值)?的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆