Applying a function (ks.test) between two data frames column-wise in R


Question

My simple question is: how do you run ks.test between two data frames, column by column?

For example, we have two data frames:

D1 <- data.frame(D$Ag, D$Al, D$As, D$Ba, D$Be, D$Ca, D$Cd, D$Co, D$Cu, D$Cr)
D2 <- data.frame(S$Ag, S$Al, S$As, S$Ba, S$Be, S$Ca, S$Cd, S$Co, S$Cu, S$Cr)

Note: this is just an example. The real case includes many more columns, each holding the concentration of a certain element at a specific location.

Now I would like to run ks.test between the two data frames:

ks.test(D$Ag, S$Ag)
ks.test(D$Al, S$Al)
ks.test(D$As, S$As)

and so on. How can this be done without typing out every call by hand?

When I ran shapiro.test on a single data frame, I simply used:

lshap1 <- lapply(D1, shapiro.test)
lres1 <- sapply(lshap1, `[`, c("statistic","p.value"))

I have read about loops, aggregate, and mapply, and tried various things, such as:

apply(D1, 2, function(D2) ks.test(D2,D1[,1])$p.value)

but then I get a lot of p-values equal to 0, which is not the case when I run the tests manually.

Edit (09.10.2017): I import the data as two data frames and then extract some of the data into "smaller" data frames for analysis, e.g. in this case looking at toxic elements and excluding the others.

Sample data: dput(head(D1)) and dput(head(D2)).

## Output dput(head(D1)):
structure(list(DF.As = c(-0.154868225169351, -0.291459578010276,
0.0355227595866723, 0.0892191549433623, 0.189115121672669,
-0.365222418641706
), DF.Cd = c(1.28810277421719, 1.45844987179892, 0.642331353138319,
0.673164023466527, 0.131548822144598, 0.146964746525726), DF.Cu =
c(8.01131080231879,
6.52606822875086, 2.93449454196807, 4.08720148249298, 1.55494291704341,
1.73663851851503), DF.Cr = c(0.164849379809527, 0.196759436988158,
0.307645386162046, 0.302917612808149, 0.187202322026229, 0.25358922601195
), DF.Ni = c(0.362592459542858, 0.527078409257359, 0.477116357433909,
0.469287608844157, 0.225865184678244, 0.355321456594576), DF.Pb =
c(0.414448963979605,
0.616598678960665, -0.0531899082482045, 0.47477978516042,
0.422106471495816,
0.0326241032568164), DF.Zn = c(74.7657982668, 74.2978919524635,
36.6575117549406, 47.8440365300156, 21.4962811912273, 23.3823413091772
)), .Names = c("DF.As", "DF.Cd", "DF.Cu", "DF.Cr", "DF.Ni", "DF.Pb",
"DF.Zn"), row.names = c(NA, 6L), class = "data.frame")

## Output dput(head(D2)):
structure(list(DO.As = c(0.0150158517208966, -0.0477743050574027,
-0.121541780066373, -0.0376195600535572, 0.115393920133327,
0.265450918075612), DO.Cd = c(0.367936811743133, 0.445545318262818,
0.350071986298948, 
0.331513644782201, 0.603874629105229, 0.598527030667747), DO.Cu =
c(1.65127139067621,
1.90306634226191, 1.08280240161368, 1.12130376047927, 1.23137174481965,
1.16618813144813), DO.Cr = c(0.162996340978278, 0.493799568371693,
0.18441814919492, 0.179883906525139, 0.128058190333676, 0.030406737049484
), DO.Ni = c(0.290717040452464, 0.331891307317008, 0.387987078391917,
0.36147470695146, 0.774910299821917, 0.323259411199816), DO.Pb =
c(-0.0584055598838365,
0.377799120780818, -0.0741768575020139, 0.511278669452117,
0.320822577941608, 0.250377389869303), DO.Zn = c(16.5625482436821,
14.5084409384572, 16.571001044493, 18.4509635406253, 15.6876446591721,
12.7649440587945)), .Names = c("DO.As", "DO.Cd", "DO.Cu", "DO.Cr", "DO.Ni",
"DO.Pb", "DO.Zn"), row.names = c(NA, 6L), class = "data.frame")

I am posting this because I still get an error:

## This is code for execution:
col.names = colnames(D1)
lapply(col.names, function(t, d1, d2){ks.test(d1[, t], d2[, t])}, D1, D2)

## Output:
Error in `[.data.frame`(d2, , t) : undefined columns selected

(The traceback button shows):

6. stop("undefined columns selected")
5. `[.data.frame`(d2, , t)
4. d2[, t]
3. ks.test(d1[, t], d2[, t])
2. FUN(X[[i]], ...)
1. lapply(col.names, function(t, d1, d2) {ks.test(d1[, t], d2[, t])}, D1, D2)

Recommended answer

I created two data frames, D1 and D2, with some random numbers and the same column names.

set.seed(12)
D1 = data.frame(A=rnorm(n = 30,mean = 5,sd = 2.5),B=rnorm(n = 30,mean = 4.5,sd = 2.2),C=rnorm(n = 30,mean = 2.5,sd = 12))
D2 = data.frame(A=rnorm(n = 30,mean = 5,sd = 2.49),B=rnorm(n = 30,mean = 4.4,sd = 2.2),C=rnorm(n = 30,mean = 2,sd = 12))

Now we can loop over the column names and pass each one to D1 and D2, running ks.test on the corresponding columns of the two data frames.

col.names = colnames(D1)
lapply(col.names,function(t,d1,d2){ks.test(d1[,t],d2[,t])},D1,D2)

#[[1]]

#Two-sample Kolmogorov-Smirnov test

#data:  d1[, t] and d2[, t]
#D = 0.167, p-value = 0.81
#alternative hypothesis: two-sided


#[[2]]

#Two-sample Kolmogorov-Smirnov test

#data:  d1[, t] and d2[, t]
#D = 0.233, p-value = 0.39
#alternative hypothesis: two-sided


#[[3]]

#Two-sample Kolmogorov-Smirnov test

#data:  d1[, t] and d2[, t]
#D = 0.2, p-value = 0.59
#alternative hypothesis: two-sided
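
If only the test statistic and p-value are needed, as in the shapiro.test snippet from the question, the columns can also be paired by position. The following is a minimal sketch, not part of the original answer: it assumes both data frames list their columns in the same order, and the name ks_summary is made up for illustration.

```r
# Same simulated data as above.
set.seed(12)
D1 <- data.frame(A = rnorm(30, 5, 2.5), B = rnorm(30, 4.5, 2.2), C = rnorm(30, 2.5, 12))
D2 <- data.frame(A = rnorm(30, 5, 2.49), B = rnorm(30, 4.4, 2.2), C = rnorm(30, 2, 12))

# Map() walks both data frames in parallel: column i of D1 with column i of D2.
# Assumption: the columns appear in the same order in both data frames.
tests <- Map(ks.test, D1, D2)

# Collect one row per column (ks_summary is a hypothetical name).
ks_summary <- data.frame(
  column    = names(tests),
  statistic = unname(sapply(tests, function(x) x$statistic)),
  p.value   = sapply(tests, function(x) x$p.value),
  row.names = NULL
)
ks_summary
```

Pairing by position avoids any dependence on column names, but it silently compares the wrong columns if the two data frames are ordered differently, so it is worth checking names(D1) against names(D2) first.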

In the notation used in the question, the following code should ideally work:

col.names = colnames(S)
lapply(col.names,function(t,d1,d2){ks.test(d1[,t],d2[,t])},D,S)
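
Note that this approach relies on both data frames having identical column names. In the sample data posted in the question, the columns of D1 are named DF.As, DF.Cd, ... while those of D2 are named DO.As, DO.Cd, ..., so d2[, t] is asked for a column that does not exist, which produces the "undefined columns selected" error. One possible workaround, sketched below on made-up toy data (the values, the column order, and the names common and res are assumptions for illustration), is to strip the prefixes and then match by element name.

```r
# Toy stand-ins for the question's data: same elements, different prefixes.
set.seed(1)
D1 <- data.frame(DF.As = rnorm(20), DF.Cd = rnorm(20), DF.Zn = rnorm(20, 50, 10))
D2 <- data.frame(DO.Zn = rnorm(20, 15, 2), DO.As = rnorm(20), DO.Cd = rnorm(20))

# Drop everything up to and including the first dot, leaving the element name.
names(D1) <- sub("^[^.]*\\.", "", names(D1))
names(D2) <- sub("^[^.]*\\.", "", names(D2))

# Test only the elements present in both data frames, matched by name.
common <- intersect(names(D1), names(D2))
res <- lapply(setNames(common, common), function(el) ks.test(D1[[el]], D2[[el]]))

# Named vector of p-values, one per element.
sapply(res, function(x) x$p.value)
```

Matching by name rather than by position keeps the comparison correct even when the two data frames store the elements in a different order.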
