R - Inconsistent p-value in running Spearman correlation

Views: 744 | Published: 2020/6/18 19:08:14 | Tags: r ggplot2 graph correlation hmisc


Problem description

My problem is that when I compute a running correlation, for some odd reason I do not get the same p-value for the same estimate (correlation) values.

My goal is to calculate a running Spearman correlation on two vectors in the same data.frame (subject1 and subject2 in the example below). My window (the length of the vector) and stride (the jump/step between consecutive windows) are constant. So, going by the formula for the t statistic (from the wiki), I should get the same critical t, and hence the same p-value, for the same Spearman correlation. This is because n stays the same (the window size is constant) and r is the same. However, my resulting p-values differ.
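The expectation above can be checked in isolation. The sketch below assumes the formula in question is the usual t approximation for Spearman's rho, t = r * sqrt((n - 2) / (1 - r^2)) with n - 2 degrees of freedom. The helper name spearman_t_pvalue is made up for illustration and is not part of pspearman.

```r
# Two-sided p-value from the t approximation for Spearman's rho.
# Assumption: t = r * sqrt((n - 2) / (1 - r^2)), compared against a
# Student t distribution with n - 2 degrees of freedom.
spearman_t_pvalue <- function(r, n) {
  t_stat <- r * sqrt((n - 2) / (1 - r^2))
  2 * pt(-abs(t_stat), df = n - 2)
}

# Same r and same n must give the same p-value.
p_a <- spearman_t_pvalue(0.5, 20)
p_b <- spearman_t_pvalue(0.5, 20)
print(c(p_a, p_b))
```

If this reasoning holds, any difference in p-value between two windows of equal width has to come from a difference in the estimate itself, possibly one too small to see on the plot.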

#Needed pkgs    
require(tidyverse)
require(pspearman)
require(gtools)

#Sample data
set.seed(528)
subject1 <- rnorm(40, mean = 85, sd = 5)

set.seed(528)
subject2 <- c(
  lag(subject1[1:21]) - 10, 
  rnorm(n = 6, mean = 85, sd = 5), 
  lag(subject1[length(subject1):28]) - 10)

df <- data.frame(subject1 = subject1, 
                 subject2 = subject2) %>% 
  rowid_to_column(var = "Time") 

df[is.na(df)] <- subject1[1] - 10

rm(subject1, subject2)

#Function for Spearman
psSpearman <- function(x, y) 
{
  out <- pspearman::spearman.test(x, y,
                                  alternative = "two.sided", 
                                  approximation = "t-distribution") %>% 
    broom::tidy()
  return(data.frame(estimate = out$estimate,
                    statistic = out$statistic,
                    p.value = out$p.value))
}

#Running correlation along the subjects
dfRunningCor <- running(df$subject1, df$subject2, 
                        fun = psSpearman,
                        width = 20,
                        allow.fewer = FALSE, 
                        by = 1,
                        pad = FALSE, 
                        align = "right") %>% 
  t() %>% 
  as.data.frame() 

#Arranging the Results into easy to handle data.frame 
Results <- do.call(rbind.data.frame, dfRunningCor) %>% 
  t() %>%
  as.data.frame() %>%
  rownames_to_column(var = "Win") %>% 
  gather(CorValue, Value, -Win) %>% 
  separate(Win, c("fromIndex", "toIndex")) %>%
  mutate(fromIndex = as.numeric(substring(fromIndex, 2)),
         toIndex = as.numeric(toIndex)) %>%
  spread(CorValue, Value) %>% 
  arrange(fromIndex) %>% 
  select(fromIndex, toIndex, estimate, statistic, p.value)

My problem is that when I plot the Results with the estimate (Spearman rho) against the window number (fromIndex) and color by p-value, I expect a "tunnel"/"path" of the same color across the same height, but I don't get one. For example, in my plot, the points at the same height inside the red circle should have the same color, but they don't.

Code for the plot:

Results %>% 
  ggplot(aes(fromIndex, estimate, color = p.value)) + 
  geom_line()

What I have found so far is that it might be due to:

1. Functions like Hmisc::rcorr() tend not to give the same p-value with small samples or many ties. This is why I use pspearman::spearman.test, which, from what I read here, is supposed to solve this problem.
2. Small sample size. I tried a bigger sample size and still get the same problem.
3. Rounding. I tried rounding my p-values and still get the same problem.
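One way to tell whether the p-values themselves differ, or only their colors in the plot, is to tabulate them per estimate. The following is a minimal base R sketch. The data frame res is a made-up stand-in for Results with invented numbers, and the second estimate was deliberately given two different p-values.

```r
# Hypothetical results table standing in for `Results` (numbers are invented).
res <- data.frame(
  fromIndex = 1:4,
  estimate  = c(0.50, 0.50, 0.62, 0.62),
  p.value   = c(0.0248, 0.0248, 0.0036, 0.0040)
)

# Count distinct p-values per estimate, after trimming floating-point noise.
n_distinct_p <- tapply(signif(res$p.value, 8), signif(res$estimate, 8),
                       function(p) length(unique(p)))

# Estimates that map to more than one p-value.
inconsistent <- names(n_distinct_p)[n_distinct_p > 1]
print(inconsistent)
```

Running the same tabulation on the real Results would show whether equal estimates really carry different p-values. An empty result would point to the plot rather than the test.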

Thanks for your help!

Could it be "pseudo" coloring by ggplot? Could it be that ggplot just interpolates the "last" color until the next point? Would that explain why I get "light blue" from point 5 to 6 but "dark blue" from point 7 to 8?
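This hypothesis can be tested directly. ggplot2's geom_line() appears to draw each segment between two consecutive points in a single color taken from the point where the segment starts, so a line colored by a continuous variable can show different colors at the same height even when the values at the points agree. The sketch below, under that assumption, overlays the points themselves on the line. The data frame demo is invented for illustration.

```r
library(ggplot2)

# Invented data: equal estimates always carry equal p-values.
demo <- data.frame(
  fromIndex = 1:6,
  estimate  = c(0.5, 0.5, 0.7, 0.5, 0.5, 0.7),
  p.value   = c(0.025, 0.025, 0.001, 0.025, 0.025, 0.001)
)

# The line shows one color per segment; the points show each value's own color.
p <- ggplot(demo, aes(fromIndex, estimate, color = p.value)) +
  geom_line() +
  geom_point(size = 3)

print(p)
```

If the points at equal heights share a color while the line segments around them do not, the mismatch comes from how the line is drawn and not from the p-values.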
