How can I calculate an empirical CDF in R?

Question

I'm reading a sparse table from a file which looks like:

1 0 7 0 0 1 0 0 0 5 0 0 0 0 2 0 0 0 0 1 0 0 0 1
1 0 0 1 0 0 0 3 0 0 0 0 1 0 0 0 1
0 0 0 1 0 0 0 2 0 0 0 0 1 0 0 0 1 0 1 0 0 1
1 0 0 1  0 3 0 0 0 0 1 0 0 0 1
0 0 0 1 0 0 0 2 0 0 0 0 1 0 0 0 1 0 1 0 0 1 1 2 1 0 1 0 1

Note that the rows have different lengths.

Each row represents a single simulation. The value in the i-th column in each row says how many times value i-1 was observed in this simulation. For example, in the first simulation (first row), we got a single result with value '0' (first column), 7 results with value '2' (third column) etc.

I wish to create an average cumulative distribution function (CDF) for all the simulation results, so I could later use it to calculate an empirical p-value for true results.

To do this I can first sum up each column, but the shorter rows have undefined ('undef') entries in their trailing columns, and those need to be counted as zeros.

How do I read such a table with different row lengths? How do I sum up the columns while replacing 'undef' values with 0? And finally, how do I create the CDF? (I can do this manually, but I guess there is some package which can do that.)

Recommended answer

This will read the data in:

dat <- textConnection("1 0 7 0 0 1 0 0 0 5 0 0 0 0 2 0 0 0 0 1 0 0 0 1
1 0 0 1 0 0 0 3 0 0 0 0 1 0 0 0 1
0 0 0 1 0 0 0 2 0 0 0 0 1 0 0 0 1 0 1 0 0 1
1 0 0 1  0 3 0 0 0 0 1 0 0 0 1
0 0 0 1 0 0 0 2 0 0 0 0 1 0 0 0 1 0 1 0 0 1 1 2 1 0 1 0 1")
df <- data.frame(scan(dat, fill = TRUE, what = as.list(rep(1, 29))))
names(df) <- paste("Val", 1:29)
close(dat)

Resulting in:

> head(df)
  Val 1 Val 2 Val 3 Val 4 Val 5 Val 6 Val 7 Val 8 Val 9 Val 10 Val 11 Val 12
1     1     0     7     0     0     1     0     0     0      5      0      0
2     1     0     0     1     0     0     0     3     0      0      0      0
3     0     0     0     1     0     0     0     2     0      0      0      0
4     1     0     0     1     0     3     0     0     0      0      1      0
5     0     0     0     1     0     0     0     2     0      0      0      0
....

If the data are in a file, provide the file name instead of dat. This code presumes that there are a maximum of 29 columns, as per the data you supplied. Alter the 29 to suit the real data.
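If the maximum number of columns is not known in advance, one possible alternative is to read the lines as text and pad the shorter rows with zeros. The following is only a minimal sketch under assumptions: the three input lines are made up for illustration, and the names txt, rows, max_len, counts and col_totals are hypothetical, not part of the answer above. For a real file, txt could come from readLines() on that file.

```r
# Hypothetical input lines; for a real file use readLines() on the file name
txt <- c("1 0 7 0 0 1",
         "1 0 0 1",
         "0 0 0 1 0 0 0 2")

# Split each line on runs of whitespace and convert the pieces to numbers
rows <- lapply(strsplit(trimws(txt), "\\s+"), as.numeric)

# The longest row determines the number of columns
max_len <- max(lengths(rows))

# Pad shorter rows with zeros; the result has one matrix row per simulation
counts <- t(vapply(rows,
                   function(r) c(r, rep(0, max_len - length(r))),
                   numeric(max_len)))

# Missing cells are already zero, so no na.rm is needed here
col_totals <- colSums(counts)
```

Because the padding writes zeros rather than NA, the column totals can be taken directly; with the scan() approach the missing cells are NA and have to be dropped when summing.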

We sum the columns, ignoring the missing values, using

df.csum <- colSums(df, na.rm = TRUE)

The ecdf() function generates the ECDF you wanted,

df.ecdf <- ecdf(df.csum)

and we can plot it using the plot() method:

plot(df.ecdf, verticals = TRUE)
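
One caveat: ecdf(df.csum) treats the column totals themselves as the sample. If, as the question describes, column i holds the number of times value i-1 was observed, a CDF over the observed values may be closer to what is wanted. The following is a minimal sketch of that reading, not the answer's method; the totals in counts are made up, and the names values, cdf, F_hat, obs and p_upper are hypothetical.

```r
# Hypothetical pooled column totals for the values 0..5
counts <- c(3, 0, 7, 4, 0, 4)
values <- seq_along(counts) - 1

# CDF at each value: cumulative share of all observations
cdf <- cumsum(counts) / sum(counts)

# The same CDF as a step function, built by expanding counts into observations
F_hat <- ecdf(rep(values, times = counts))

# Upper-tail empirical p-value for a hypothetical observed value:
# the share of simulated outcomes at least as large as obs
obs <- 3
p_upper <- sum(counts[values >= obs]) / sum(counts)
```

Pooling the totals this way weights each simulation by its number of observations; if every simulation should count equally, the per-row CDFs would need to be averaged instead.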
