Running a Fisher test on each row of a data frame in R


Problem description

 

I have a data frame of ~50k measurements taken by ~3k investigators.

INVESTIGATOR_ID  SAMPLE_ID  MEASUREMENT
1000             38942      20.1
1000             38942      10.2
1001             38432      5.6
1002             553        10.6
...

My goal is to compare sample measurements per investigator to measurements from the entire data set:

  1. For each investigator, count the measurements that are +/- one standard deviation from the mean of the measurements collected by that investigator.
  2. For the entire data frame, count the measurements that are +/- one standard deviation from the overall mean.
  3. For each investigator who has measurements +/- one standard deviation from the mean, run a Fisher's exact test to determine whether that number of samples is significant (compared to the entire data frame).
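The counting in steps 1 and 2 could be sketched in base R roughly as follows. This is only an illustration: the toy data is invented, the helper name count_outside_1sd is hypothetical, and it assumes that "+/- one standard deviation from the mean" means more than one SD away from the mean in either direction, which the question does not state explicitly.

```r
# Toy data standing in for the real ~50k-row data frame (values are made up).
measurements <- data.frame(
  INVESTIGATOR_ID = c(1000, 1000, 1000, 1001, 1001, 1001, 1002, 1002),
  MEASUREMENT     = c(20.1, 10.2, 15.0, 5.6, 5.8, 30.0, 10.6, 10.4)
)

# Assumption: count values lying more than one SD away from the mean of x,
# in either direction. (Hypothetical helper, not from the original post.)
count_outside_1sd <- function(x) {
  if (length(x) < 2) return(0L)  # sd() is NA for a single value
  sum(abs(x - mean(x)) > sd(x))
}

# Step 1: per-investigator count, using each investigator's own mean and SD.
per_investigator <- tapply(measurements$MEASUREMENT,
                           measurements$INVESTIGATOR_ID,
                           count_outside_1sd)

# Step 2: the same count over the whole data frame.
overall <- count_outside_1sd(measurements$MEASUREMENT)
```

If the intended meaning is "within one SD" instead, flipping the comparison to `<=` inside the helper would be the only change needed.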

I've used the plyr library (ddply) to summarise the data by INVESTIGATOR_ID. After merging the summaries together, the end result is a data frame where each row consists of an investigator ID, the number of samples measured by that investigator, the number of samples measured by that investigator +/- 1 SD, 15000, and 50000 (where 15000 and 50000 are, for the entire data frame, the number of samples +/- 1 SD and the total number of samples).

INVESTIGATOR_ID  NUMBER_OF_SAMPLES  NUMBER_OF_SAMPLES_SD  15000  50000
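For readers who want to reproduce a table of this shape, here is one possible sketch. It uses base R's aggregate() and merge() instead of ddply, so it is not the asker's actual code. The toy data, the helper outside_1sd, and the column names ALL_SAMPLES_SD and ALL_SAMPLES (standing in for the 15000 and 50000 columns) are all assumptions made for illustration, as is the reading of "+/- 1 SD" as "more than one SD from the mean".

```r
# Toy measurements (made up); the real data has ~50k rows.
measurements <- data.frame(
  INVESTIGATOR_ID = c(1000, 1000, 1000, 1001, 1001, 1001, 1002, 1002),
  MEASUREMENT     = c(20.1, 10.2, 15.0, 5.6, 5.8, 30.0, 10.6, 10.4)
)

# Assumption: count values more than one SD away from the mean of x.
outside_1sd <- function(x) {
  if (length(x) < 2) 0L else sum(abs(x - mean(x)) > sd(x))
}

# One row per investigator: total samples, and samples outside 1 SD.
n_total <- aggregate(MEASUREMENT ~ INVESTIGATOR_ID, data = measurements,
                     FUN = length)
n_sd    <- aggregate(MEASUREMENT ~ INVESTIGATOR_ID, data = measurements,
                     FUN = outside_1sd)
names(n_total)[2] <- "NUMBER_OF_SAMPLES"
names(n_sd)[2]    <- "NUMBER_OF_SAMPLES_SD"
summary_df <- merge(n_total, n_sd, by = "INVESTIGATOR_ID")

# Whole-data-set counts, repeated on every row
# (these play the role of 15000 and 50000 in the question).
summary_df$ALL_SAMPLES_SD <- outside_1sd(measurements$MEASUREMENT)
summary_df$ALL_SAMPLES    <- nrow(measurements)
```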

How do I take each row of the data frame, convert fields c(2:5) to a matrix, run a Fisher's test, and create a new data frame of the results?

Thanks for any suggestions.

Solution

Something like this (adapted from a script of mine; it may need more modifications to fit your needs):

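# Run Fisher's exact test on one row; fields 2:5 hold the four counts.
get_fisher <- function(row) {
  # matrix() fills column by column: column 1 = fields 2:3, column 2 = fields 4:5
  mat <- matrix(as.numeric(row[2:5]), ncol = 2)
  f <- fisher.test(mat, alternative = "two.sided")
  f$p.value
}

# apply() passes each row of df to get_fisher as a vector: one p-value per row
p_values <- apply(df, 1, get_fisher)

# Collect the results in a new data frame, keyed by the first column of df
fishers <- data.frame(INVESTIGATOR_ID = df[[1]], P_VALUE = p_values)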
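To try the row-wise approach end to end, a small self-contained example might look like the following. The counts are invented, and the column names ALL_SAMPLES_SD and ALL_SAMPLES and the helper row_p_value are hypothetical stand-ins, so treat it as a sketch of the technique, not as a check of the statistical design.

```r
# Toy summary table; the counts are invented for illustration only.
df <- data.frame(
  INVESTIGATOR_ID      = c(1000, 1001, 1002),
  NUMBER_OF_SAMPLES    = c(40, 25, 10),
  NUMBER_OF_SAMPLES_SD = c(12, 20, 3),
  ALL_SAMPLES_SD       = 15000,
  ALL_SAMPLES          = 50000
)

# p-value for one row of four counts, laid out as in the answer.
# matrix() fills by column, so the 2x2 table is:
#   NUMBER_OF_SAMPLES     ALL_SAMPLES_SD
#   NUMBER_OF_SAMPLES_SD  ALL_SAMPLES
row_p_value <- function(counts) {
  fisher.test(matrix(counts, ncol = 2), alternative = "two.sided")$p.value
}

# Restrict apply() to the numeric count columns so no type coercion is needed.
p_values <- apply(as.matrix(df[, 2:5]), 1, row_p_value)

# New data frame with one p-value per investigator.
fishers <- data.frame(INVESTIGATOR_ID = df$INVESTIGATOR_ID, P_VALUE = p_values)
```

The result can then be filtered like any data frame, for example `fishers[fishers$P_VALUE < 0.05, ]`. If a different cell layout is wanted for the 2x2 table, reordering the four counts before the call to matrix() is the place to do it.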
