Call R from Java to get the chi-squared statistic and p-value


Question

I have two 4x4 matrices in Java: one holds observed counts and the other holds expected counts.

I need an automated way to calculate the p-value from the chi-squared statistic between these two matrices; however, as far as I am aware, Java has no built-in function for this.

I can calculate the chi-squared statistic and its p-value by reading the two matrices into R from .csv files and then calling the chisq.test function, as follows:

obs<-read.csv("obs.csv")
exp<-read.csv("exp.csv")
chisq.test(obs,exp)

where the .csv files are formatted as follows:

A, C, G, T
A, 197.136, 124.32, 63.492, 59.052
C, 124.32, 78.4, 40.04, 37.24
G, 63.492, 40.04, 20.449, 19.019
T, 59.052, 37.24, 19.019, 17.689

Given these commands, R produces output of the form:

X-squared = 20.6236, df = 9, p-value = 0.01443

which includes the p-value I am looking for.

Does anyone know of an efficient way to automate the following process?

1) Output my matrices from Java into .csv files
2) Load the .csv files into R
3) Call chisq.test on the loaded data in R
4) Return the resulting p-value to Java

Thanks for your help.

Recommended answer

There are (at least) two ways of going about this.


Rscript

You can run R scripts from the command line with Rscript (Rscript.exe on Windows). For example, your script would contain:

# Parse arguments.
# ...
# ...

chisq.test(obs, exp)

Rather than creating CSVs in Java and having R read them, you should be able to pass the values straight to R as command-line arguments. I don't see the need to create CSVs and pass data that way unless your matrices are quite big: there are limits on the size of the command-line arguments you can pass (these vary across operating systems, I think).
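As a rough illustration of passing the matrices directly, the Java sketch below flattens both matrices into a single argument list for Rscript. The class name RscriptArgs, the script name and the argument layout (row count, column count, then the observed and expected values row by row) are assumptions made for this example; the R script would have to parse the same layout, for instance with commandArgs(trailingOnly = TRUE).

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical helper: the argument layout is an assumption, not a convention of Rscript.
final class RscriptArgs {

    /** Flattens a matrix row by row into string arguments. */
    static List<String> flatten(double[][] matrix) {
        List<String> args = new ArrayList<>();
        for (double[] row : matrix) {
            for (double value : row) {
                args.add(Double.toString(value));
            }
        }
        return args;
    }

    /** Builds: Rscript <script> <nrow> <ncol> <observed values...> <expected values...> */
    static List<String> buildCommand(String script, double[][] obs, double[][] exp) {
        List<String> command = new ArrayList<>();
        command.add("Rscript");
        command.add(script);
        command.add(Integer.toString(obs.length));     // number of rows
        command.add(Integer.toString(obs[0].length));  // number of columns
        command.addAll(flatten(obs));
        command.addAll(flatten(exp));
        return command;
    }
}
```

For two 4x4 matrices this yields 36 arguments after "Rscript", which is well within typical command-line limits; much larger matrices are where the CSV route becomes the safer choice.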

You can pass arguments to an R script and parse them with the commandArgs() function or with various packages (e.g. optparse or getopt). See this thread for more information.

There are several ways of calling a command-line program and reading its output in Java. I don't know enough about them to give you advice, but a bit of googling will turn up a solution. Calling a script from the command line is done like this:

Rscript my_script.R
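One possible way to make that call from Java is the standard ProcessBuilder class. The sketch below is only an illustration: the class name RscriptRunner is made up, and the regular expression assumes the script prints the chisq.test summary line in the format shown in the question.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStreamReader;
import java.nio.charset.StandardCharsets;
import java.util.List;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper for running an R script and extracting the p-value.
final class RscriptRunner {

    // Matches "p-value = 0.01443"; also accepts "p-value < 2.2e-16",
    // in which case the number is only an upper bound.
    private static final Pattern P_VALUE =
            Pattern.compile("p-value\\s*[=<]\\s*([0-9.eE+-]+)");

    /** Extracts the p-value from the text printed by chisq.test. */
    static double parsePValue(String output) {
        Matcher matcher = P_VALUE.matcher(output);
        if (!matcher.find()) {
            throw new IllegalArgumentException("No p-value found in: " + output);
        }
        return Double.parseDouble(matcher.group(1));
    }

    /** Runs the command, e.g. ["Rscript", "my_script.R"], and returns its output. */
    static String run(List<String> command) throws IOException, InterruptedException {
        ProcessBuilder builder = new ProcessBuilder(command);
        builder.redirectErrorStream(true); // merge stderr into stdout
        Process process = builder.start();
        StringBuilder output = new StringBuilder();
        try (BufferedReader reader = new BufferedReader(
                new InputStreamReader(process.getInputStream(), StandardCharsets.UTF_8))) {
            String line;
            while ((line = reader.readLine()) != null) {
                output.append(line).append('\n');
            }
        }
        int exitCode = process.waitFor();
        if (exitCode != 0) {
            throw new IOException("Rscript exited with code " + exitCode + ": " + output);
        }
        return output.toString();
    }
}
```

With Rscript on the PATH, usage would look like parsePValue(run(List.of("Rscript", "my_script.R"))). Parsing printed text is fragile; having the script print only the number, for example with cat(chisq.test(obs)$p.value), is likely to be more robust.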


JRI

JRI lets you talk to R directly from Java. Here is an example of how you would pass a double array to R and have R sum it (this is Java now):

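// Imports (at the top of the file).
import org.rosuda.JRI.REXP;
import org.rosuda.JRI.Rengine;

// Start an R session (no main loop, no callbacks).
Rengine re = new Rengine(new String[] {"--vanilla"}, false, null);

// Check that the session started correctly.
if (!re.waitForR()) {
    return;
}

// Assign the array to the R variable x, evaluate sum(x), and print the result.
re.assign("x", new double[] {1.5, 2.5, 3.5});
REXP result = re.eval("(sum(x))");
System.out.println(result.asDouble());

// Shut down the R session.
re.end();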

The assign() call here is equivalent to doing this in R:

x <- c(1.5, 2.5, 3.5)

You should be able to work out how to extend this to work with a matrix.
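One way this extension might look, sketched under stated assumptions: flatten the Java matrix in column-major order (the order in which R's matrix() fills values by default), assign the flat array as in the example above, and rebuild the matrix on the R side. The class JriMatrixHelper and the variable names obs and obs_flat are hypothetical, and the block deliberately contains no JRI calls so that it compiles without the JRI library.

```java
// Hypothetical helper for moving a double[][] into R as a matrix.
final class JriMatrixHelper {

    /** Flattens a rectangular matrix column by column, matching R's default fill order. */
    static double[] toColumnMajor(double[][] matrix) {
        int rows = matrix.length;
        int cols = matrix[0].length;
        double[] flat = new double[rows * cols];
        for (int c = 0; c < cols; c++) {
            for (int r = 0; r < rows; r++) {
                flat[c * rows + r] = matrix[r][c];
            }
        }
        return flat;
    }

    /** Builds the R expression that reshapes the flat vector into a matrix. */
    static String matrixExpression(String target, String flatName, int rows, int cols) {
        return target + " <- matrix(" + flatName
                + ", nrow = " + rows + ", ncol = " + cols + ")";
    }
}
```

With an Rengine named re as above, the pieces would presumably be combined as re.assign("obs_flat", JriMatrixHelper.toColumnMajor(obs)) followed by re.eval(JriMatrixHelper.matrixExpression("obs", "obs_flat", 4, 4)), after which obs is available to R code such as chisq.test.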

I think JRI is quite difficult to get started with. So if you want to get this done quickly, the command-line option is probably best. I would say the JRI approach is less messy once it is set up, though. And if there is a lot of back and forth between R and Java, it is definitely better than calling multiple scripts.

  1. JRI link.
  2. Recommended Eclipse plugin for setting up JRI.
