How can I calculate an empirical CDF in R?
Question
I'm reading a sparse table from a file that looks like this:
1 0 7 0 0 1 0 0 0 5 0 0 0 0 2 0 0 0 0 1 0 0 0 1
1 0 0 1 0 0 0 3 0 0 0 0 1 0 0 0 1
0 0 0 1 0 0 0 2 0 0 0 0 1 0 0 0 1 0 1 0 0 1
1 0 0 1 0 3 0 0 0 0 1 0 0 0 1
0 0 0 1 0 0 0 2 0 0 0 0 1 0 0 0 1 0 1 0 0 1 1 2 1 0 1 0 1
Note that the rows have different lengths.
Each row represents a single simulation. The value in the i-th column of a row says how many times the value i-1 was observed in that simulation. For example, in the first simulation (first row), we got a single result with value 0 (first column), 7 results with value 2 (third column), etc.
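To make the encoding concrete, here is a minimal sketch (the names counts, values and obs are illustrative) that expands the first row back into the individual observations it summarizes: the value i - 1 repeated as many times as column i says.

```r
# First row of the table: column i holds the number of times value i - 1 was seen
counts <- c(1, 0, 7, 0, 0, 1, 0, 0, 0, 5, 0, 0, 0, 0, 2, 0, 0, 0, 0, 1, 0, 0, 0, 1)

# Value represented by each column (0-based)
values <- seq_along(counts) - 1

# Repeat each value by its count to recover the raw observations
obs <- rep(values, times = counts)

# obs holds 18 values: one 0, seven 2s, one 5, five 9s, two 14s, one 19 and one 23
print(obs)
```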
I want to create an average cumulative distribution function (CDF) over all the simulation results, so that I can later use it to calculate an empirical p-value for the true results.
To do this, I can first sum up each column, but the shorter rows have no ('undef') entries in the trailing columns, and those need to be counted as zeros.
How do I read a table with different row lengths? How do I sum up the columns, replacing the 'undef' values with 0? And finally, how do I create the CDF? (I can do this manually, but I guess there is some package that can do it.)
Recommended answer
This reads the data:
dat <- textConnection("1 0 7 0 0 1 0 0 0 5 0 0 0 0 2 0 0 0 0 1 0 0 0 1
1 0 0 1 0 0 0 3 0 0 0 0 1 0 0 0 1
0 0 0 1 0 0 0 2 0 0 0 0 1 0 0 0 1 0 1 0 0 1
1 0 0 1 0 3 0 0 0 0 1 0 0 0 1
0 0 0 1 0 0 0 2 0 0 0 0 1 0 0 0 1 0 1 0 0 1 1 2 1 0 1 0 1")
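# Read each line as a record of 29 numeric fields; fill = TRUE pads shorter lines with NA
df <- data.frame(scan(dat, fill = TRUE, what = as.list(rep(1, 29))))
# Column i holds the count for value i - 1
names(df) <- paste("Val", 1:29)
close(dat)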
Result:
> head(df)
  Val 1 Val 2 Val 3 Val 4 Val 5 Val 6 Val 7 Val 8 Val 9 Val 10 Val 11 Val 12
1     1     0     7     0     0     1     0     0     0      5      0      0
2     1     0     0     1     0     0     0     3     0      0      0      0
3     0     0     0     1     0     0     0     2     0      0      0      0
4     1     0     0     1     0     3     0     0     0      0      1      0
5     0     0     0     1     0     0     0     2     0      0      0      0
....
If the data are in a file, provide the file name instead of dat. This code presumes that there are at most 29 columns, as in the data you supplied; alter the 29 to suit the real data.
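If you would rather not hard-code 29, one option (a sketch, assuming whitespace-separated fields and no header line) is to count the fields on every line with count.fields() and pass the maximum to read.table() through col.names; fill = TRUE pads the shorter rows, and the padded numeric fields are read as NA. The temporary file and the names path and n.cols are only stand-ins for your real input.

```r
# Write the sample data to a temporary file (stand-in for the real input file)
path <- tempfile(fileext = ".txt")
writeLines(c(
  "1 0 7 0 0 1 0 0 0 5 0 0 0 0 2 0 0 0 0 1 0 0 0 1",
  "1 0 0 1 0 0 0 3 0 0 0 0 1 0 0 0 1",
  "0 0 0 1 0 0 0 2 0 0 0 0 1 0 0 0 1 0 1 0 0 1",
  "1 0 0 1 0 3 0 0 0 0 1 0 0 0 1",
  "0 0 0 1 0 0 0 2 0 0 0 0 1 0 0 0 1 0 1 0 0 1 1 2 1 0 1 0 1"
), path)

# Number of whitespace-separated fields on each line; the widest row sets the column count
n.cols <- max(count.fields(path, sep = ""))

# col.names fixes the number of columns; fill = TRUE pads short rows with NA
df <- read.table(path, fill = TRUE, col.names = paste0("Val", seq_len(n.cols)))
```

Supplying col.names matters because read.table() otherwise guesses the column count from the first five lines only.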
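We first sum the columns, using na.rm = TRUE so that the NA values padding the shorter rows are skipped (which gives the same totals as counting them as 0):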
df.csum <- colSums(df, na.rm = TRUE)
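The question asked for the 'undef' entries to be replaced by 0. A minimal sketch showing that the explicit replacement gives the same column sums as na.rm = TRUE, on a small illustrative table (not the question's data; the names sums.narm and sums.zero are hypothetical):

```r
# Illustrative table: two simulations of different lengths, padded with NA
df <- data.frame(Val1 = c(1, 0), Val2 = c(2, 1), Val3 = c(NA, 4))

# Option 1: skip the NA padding while summing
sums.narm <- colSums(df, na.rm = TRUE)

# Option 2: replace NA with 0 explicitly, then sum
df.zero <- df
df.zero[is.na(df.zero)] <- 0
sums.zero <- colSums(df.zero)

print(sums.zero)
```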
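The ecdf() function then generates the ECDF you wanted. Note that ecdf() expects the individual observations rather than counts, and df.csum[i] is the number of times the value i - 1 was observed, so each value is repeated by its count first:
df.ecdf <- ecdf(rep(seq_along(df.csum) - 1, times = df.csum))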
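Since the question mentions doing this by hand: because column i counts how often the value i - 1 occurred, the pooled CDF at value k is the cumulative count up to column k + 1 divided by the grand total. A sketch using illustrative column sums (the vector csum and the name cdf.manual are assumptions) that computes this with cumsum() and checks it against ecdf() applied to the expanded observations:

```r
# Illustrative column sums: csum[i] = number of simulated results equal to i - 1
csum <- c(3, 0, 7, 4, 0, 4)
values <- seq_along(csum) - 1   # 0, 1, ..., 5

# Manual CDF: cumulative count up to each value divided by the total count
cdf.manual <- cumsum(csum) / sum(csum)

# Same thing via ecdf(), which needs the individual observations
Fn <- ecdf(rep(values, times = csum))

# The two agree at every value
print(all.equal(cdf.manual, Fn(values)))
```

The manual version is enough when only the pooled proportions are needed; the ecdf object additionally comes with plot(), quantile() and summary() methods.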
We can then plot it using the plot() method:
plot(df.ecdf, verticals = TRUE)
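For the empirical p-value the question mentions, one common choice (an assumption here: an upper-tail test in which larger values are more extreme) is P(X >= x) = 1 - F(x - 1) for integer-valued results. A sketch with illustrative column sums; the helper emp.pvalue is hypothetical:

```r
# Illustrative column sums: csum[i] = number of simulated results equal to i - 1
csum <- c(3, 0, 7, 4, 0, 4)
values <- seq_along(csum) - 1

# Pooled empirical CDF of the simulated values
Fn <- ecdf(rep(values, times = csum))

# Hypothetical helper: upper-tail empirical p-value P(X >= x) for integer x
emp.pvalue <- function(x) 1 - Fn(x - 1)

# Share of simulated results that are >= 3 (8 of the 18 here)
emp.pvalue(3)
```

Whether to use this upper tail, the lower tail Fn(x), or a two-sided version depends on the test you have in mind.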