How can I calculate an empirical CDF in R?

Views: 135. Published: 2020/5/7 19:28:44. Tags: r, matrix, cumulative-sum


Question

I'm reading a sparse table from a file which looks like:

1 0 7 0 0 1 0 0 0 5 0 0 0 0 2 0 0 0 0 1 0 0 0 1
1 0 0 1 0 0 0 3 0 0 0 0 1 0 0 0 1
0 0 0 1 0 0 0 2 0 0 0 0 1 0 0 0 1 0 1 0 0 1
1 0 0 1  0 3 0 0 0 0 1 0 0 0 1
0 0 0 1 0 0 0 2 0 0 0 0 1 0 0 0 1 0 1 0 0 1 1 2 1 0 1 0 1

Note that the rows are of different lengths.

Each row represents a single simulation. The value in the i-th column of a row says how many times the value i-1 was observed in that simulation. For example, in the first simulation (first row), we got a single result with value '0' (first column), 7 results with value '2' (third column), and so on.

I wish to create an average cumulative distribution function (CDF) over all the simulation results, so that I can later use it to calculate an empirical p-value for true results.

To do this I can first sum up each column, but the columns that are missing ('undef') in the shorter rows need to count as zeros.

How do I read such a table with different row lengths? How do I sum up the columns, replacing 'undef' values with '0'? And finally, how do I create the CDF? (I can do this manually, but I guess there is some package that can do it.)

Recommended answer

This will read the data in:

dat <- textConnection("1 0 7 0 0 1 0 0 0 5 0 0 0 0 2 0 0 0 0 1 0 0 0 1
1 0 0 1 0 0 0 3 0 0 0 0 1 0 0 0 1
0 0 0 1 0 0 0 2 0 0 0 0 1 0 0 0 1 0 1 0 0 1
1 0 0 1  0 3 0 0 0 0 1 0 0 0 1
0 0 0 1 0 0 0 2 0 0 0 0 1 0 0 0 1 0 1 0 0 1 1 2 1 0 1 0 1")
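# 'what' declares 29 numeric fields per row; fill = TRUE pads shorter rows with NA
df <- data.frame(scan(dat, fill = TRUE, what = as.list(rep(1, 29))))
# Label the columns "Val 1" ... "Val 29"
names(df) <- paste("Val", 1:29)
close(dat)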

Resulting in:

> head(df)
  Val 1 Val 2 Val 3 Val 4 Val 5 Val 6 Val 7 Val 8 Val 9 Val 10 Val 11 Val 12
1     1     0     7     0     0     1     0     0     0      5      0      0
2     1     0     0     1     0     0     0     3     0      0      0      0
3     0     0     0     1     0     0     0     2     0      0      0      0
4     1     0     0     1     0     3     0     0     0      0      1      0
5     0     0     0     1     0     0     0     2     0      0      0      0
....

If the data are in a file, provide the file name instead of dat. This code presumes that there are at most 29 columns, as in the data you supplied. Alter the 29 to suit the real data.
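If the maximum number of columns is not known in advance, it can be computed rather than hard-coded. The following is a minimal sketch, assuming the data sit in a whitespace-separated text file. The temporary file and the names `path` and `n.col` are made up for illustration. It uses count.fields() from the utils package to count the fields on each line.

```r
# Write the sample data to a temporary file (a stand-in for the real file).
path <- tempfile(fileext = ".txt")
writeLines(c(
  "1 0 7 0 0 1 0 0 0 5 0 0 0 0 2 0 0 0 0 1 0 0 0 1",
  "1 0 0 1 0 0 0 3 0 0 0 0 1 0 0 0 1",
  "0 0 0 1 0 0 0 2 0 0 0 0 1 0 0 0 1 0 1 0 0 1",
  "1 0 0 1  0 3 0 0 0 0 1 0 0 0 1",
  "0 0 0 1 0 0 0 2 0 0 0 0 1 0 0 0 1 0 1 0 0 1 1 2 1 0 1 0 1"
), path)

# Number of whitespace-separated fields on each line;
# the widest row determines how many columns are needed.
n.col <- max(utils::count.fields(path))

# Same scan() approach as in the answer, with the hard-coded 29 replaced by n.col.
df <- data.frame(scan(path, fill = TRUE, what = as.list(rep(1, n.col)), quiet = TRUE))
names(df) <- paste("Val", seq_len(n.col))

print(dim(df))
```

Cells beyond the end of a short row come back as NA, exactly as with the hard-coded version.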

Next, we compute the column sums, skipping the NA padding, using

df.csum <- colSums(df, na.rm = TRUE)
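The na.rm = TRUE argument is what makes the missing ('undef') cells behave like zeros when summing. As a small illustration on a made-up 2 x 3 matrix (the object `m` and the other names here are hypothetical, not part of the data above), dropping the NAs gives the same column sums as replacing them with 0 first:

```r
# Toy matrix standing in for the padded table: the short second row ends in NA.
m <- rbind(c(1, 0, 7),
           c(1, 2, NA))

# Option 1: ignore NA cells while summing.
sums.narm <- colSums(m, na.rm = TRUE)

# Option 2: overwrite NA cells with 0, then sum.
m0 <- m
m0[is.na(m0)] <- 0
sums.zero <- colSums(m0)

print(sums.narm)
print(identical(sums.narm, sums.zero))
```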

The ecdf() function generates the ECDF you wanted,

df.ecdf <- ecdf(df.csum)

and we can plot it using the plot() method:

plot(df.ecdf, verticals = TRUE)
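Note that ecdf(df.csum) treats the 29 column sums themselves as the observations. If column i is instead read, as in the question, as the number of times the value i - 1 was observed, one possible alternative is to expand the pooled counts back into individual observations before calling ecdf(). The sketch below assumes that reading and pools all simulations together. The names `counts`, `values`, `sim.ecdf`, `cdf.manual`, `obs`, and `p.upper` are made up for illustration, and the observed value 9 is arbitrary.

```r
# Re-read the sample data the same way as in the answer.
dat <- textConnection("1 0 7 0 0 1 0 0 0 5 0 0 0 0 2 0 0 0 0 1 0 0 0 1
1 0 0 1 0 0 0 3 0 0 0 0 1 0 0 0 1
0 0 0 1 0 0 0 2 0 0 0 0 1 0 0 0 1 0 1 0 0 1
1 0 0 1  0 3 0 0 0 0 1 0 0 0 1
0 0 0 1 0 0 0 2 0 0 0 0 1 0 0 0 1 0 1 0 0 1 1 2 1 0 1 0 1")
df <- data.frame(scan(dat, fill = TRUE, what = as.list(rep(1, 29)), quiet = TRUE))
close(dat)

# Pooled counts: counts[i] is how often the value i - 1 occurred over all simulations.
counts <- colSums(df, na.rm = TRUE)
values <- seq_along(counts) - 1

# Expand the counts into one entry per observation, then build the ECDF.
sim.ecdf <- ecdf(rep(values, times = counts))

# The same CDF by hand: running total of the counts divided by the grand total.
cdf.manual <- cumsum(counts) / sum(counts)

# Upper-tail empirical p-value for a hypothetical observed value:
# the fraction of simulated observations that are >= obs.
obs <- 9
p.upper <- sum(counts[values >= obs]) / sum(counts)
print(p.upper)
```

Under this reading, sim.ecdf(x) is the proportion of simulated observations at or below x, and because the values are integers, p.upper equals 1 - sim.ecdf(obs - 1).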
