Using Mann-Kendall in Python with a lot of data


Question

I have 46 years' worth of rainfall data. It is in the form of 46 numpy arrays, each with shape (145, 192), so each year is a separate array of maximum rainfall at each lat and lon coordinate in the given model.

I need to create a global map of tau values by running a Mann-Kendall (M-K) test at each coordinate over the 46 years.

I'm still learning Python, so I've been having trouble finding a simple way to go through all the data that doesn't involve making 27840 new arrays, one for each coordinate.

So far I've looked into using scipy.stats.kendalltau and the definition from here: https://github.com/mps9506/Mann-Kendall-Trend

To clarify and add a little more detail, I need to perform the test for each coordinate, not just for each file individually. For example, for the first M-K test, I would want x=46 and y=data1[0,0],data2[0,0],data3[0,0]...data46[0,0]. I would then repeat this for every coordinate in the arrays. In total the M-K test would be run 27840 times, leaving me with 27840 tau values that I can then plot on a global map.
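The per-coordinate series described here can be pulled out with a single slice once the yearly arrays are stacked along a new leading axis. A minimal sketch, assuming the 46 arrays are held in a list (called `yearly` here, a hypothetical name, and filled with random stand-in data rather than real rainfall):

```python
import numpy as np

rng = np.random.default_rng(0)
# Stand-in for the 46 yearly arrays of shape (145, 192); real data would be loaded here.
yearly = [rng.random((145, 192)) for _ in range(46)]

# Stack along a new first axis: the shape becomes (46, 145, 192).
yrmax = np.stack(yearly, axis=0)

# The 46-year series at grid cell (0, 0): data1[0,0], data2[0,0], ..., data46[0,0].
series_00 = yrmax[:, 0, 0]
print(yrmax.shape, series_00.shape)
```

Slicing this way returns a view of the stacked array, so no per-coordinate copies need to be created up front.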

Edit 2:

I'm now running into a different problem. Going off of the suggested code, I have the following:

    for i in range(145):
        for j in range(192):
            out[i,j] = mk_test(yrmax[:,i,j],alpha=0.05)
    print(out)

I used numpy.stack to stack all 46 arrays into a single array (yrmax) with shape (46L, 145L, 192L). I've tested it, and it calculates p and tau correctly if I change the code from out[i,j] to just out. However, doing this breaks the for loop, so it only keeps the result from the last coordinate instead of all of them. And if I leave the code as it is above, I get the error: TypeError: list indices must be integers, not tuple

My first guess was that it has to do with mk_test and how the results are returned in the definition. So I've tried altering the code from the link above to change how the data is returned, but I keep getting errors related to tuples. Now I'm not sure where it's going wrong or how to fix it.
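That TypeError is what Python raises when a plain list is indexed with a tuple such as `out[i, j]`. One likely cause is that `out` was created as a list rather than a NumPy array, though this is an assumption because the definition of `out` is not shown. A minimal sketch of preallocating a NumPy array for the results, using `scipy.stats.kendalltau` as a stand-in for the modified `mk_test` and a small random grid in place of the real data:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(1)
yrmax = rng.random((46, 4, 5))   # small stand-in for the (46, 145, 192) array
years = np.arange(yrmax.shape[0])

# Preallocate a NumPy array; a list (out = []) cannot be indexed as out[i, j].
out = np.empty((yrmax.shape[1], yrmax.shape[2], 2))
for i in range(yrmax.shape[1]):
    for j in range(yrmax.shape[2]):
        tau, p = stats.kendalltau(years, yrmax[:, i, j])
        out[i, j] = (tau, p)     # store both values for this grid cell
```

The trailing axis of length 2 is there because each test returns two numbers; a function that returns a different number of values would need a matching last dimension.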

Edit 3:

One more clarification I thought I should add: I've already modified the definition in the link so it returns only the two numeric values I want for creating maps, p and z.
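For reference, p and z in a Mann-Kendall test are usually derived from the S statistic, the sum of the signs of all pairwise differences in the series. The following is a rough sketch of that textbook calculation, not the code from the linked repository. It assumes there are no tied values (no tie correction is applied), and `mk_p_z` is a hypothetical helper name:

```python
import math
import numpy as np

def mk_p_z(series):
    """Two-sided Mann-Kendall p-value and z-score, with no correction for ties."""
    y = np.asarray(series, dtype=float)
    n = y.size
    # S = sum over all pairs i < j of sign(y[j] - y[i])
    diffs = y[None, :] - y[:, None]           # diffs[i, j] = y[j] - y[i]
    s = np.sign(diffs[np.triu_indices(n, k=1)]).sum()
    var_s = n * (n - 1) * (2 * n + 5) / 18.0  # variance of S when there are no ties
    if s > 0:
        z = (s - 1) / math.sqrt(var_s)
    elif s < 0:
        z = (s + 1) / math.sqrt(var_s)
    else:
        z = 0.0
    p = math.erfc(abs(z) / math.sqrt(2))      # equals 2 * (1 - Phi(|z|))
    return p, z

p, z = mk_p_z(np.arange(46))                  # strictly increasing 46-point series
```

Rainfall maxima can contain repeated values, in which case a tie-corrected variance would be more appropriate than the simplified one used here.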

Recommended answer

Thanks to the answers provided and some further work, I was able to work out a solution, which I'll provide here for anyone else who needs to use the Mann-Kendall test for data analysis.

The first thing I needed to do was flatten the original array into a 1D array. I know there is probably an easier way to do this, but I ultimately used the following code, based on the code Grr suggested.

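```python
import numpy as np

# yrmax is the stacked array of shape (46, 145, 192) from np.stack
out = np.empty(0)
for i in range(145):        # latitude index
    for j in range(192):    # longitude index
        # append the 46-year series at grid cell (i, j)
        out = np.append(out, yrmax[:, i, j], axis=0)
```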

Then I reshaped the resulting array (out) as follows:

out2 = np.reshape(out,(27840,46))

I did this so my data would be in a format compatible with scipy.stats.kendalltau. Here 27840 is the total number of coordinates on my map (i.e. it's just 145*192), and 46 is the number of years the data spans.
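As the answer suspects, there is a shorter route to the same (cells, years) layout: reshape the stacked array so the two spatial axes are merged, then transpose. A minimal sketch on a small stand-in array, checked against the loop-and-append approach:

```python
import numpy as np

yrmax = np.arange(3 * 4 * 5, dtype=float).reshape(3, 4, 5)  # stand-in for (46, 145, 192)
n_years, n_lat, n_lon = yrmax.shape

# Loop version: append each cell's time series, then reshape to (cells, years).
out = np.empty(0)
for i in range(n_lat):
    for j in range(n_lon):
        out = np.append(out, yrmax[:, i, j], axis=0)
out2_loop = out.reshape(n_lat * n_lon, n_years)

# Vectorized version: flatten the two spatial axes, then put cells first.
out2_fast = yrmax.reshape(n_years, -1).T

print(np.array_equal(out2_loop, out2_fast))
```

Besides being shorter, the reshape avoids reallocating the output on every `np.append` call, which matters more as the grid grows.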

I then used the following loop, modified from Grr's code, to find Kendall's tau and its corresponding p-value at each latitude and longitude over the 46-year period.

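```python
import numpy as np
from scipy import stats

x = range(46)       # time index, one entry per year
y = np.zeros(0)
for j in range(27840):
    # Kendall's tau and its p-value for the series at grid cell j
    tau, p = stats.kendalltau(x, out2[j, :])
    y = np.append(y, [tau, p], axis=0)
```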

Finally, I reshaped the data one more time with `newdata = np.reshape(y,(145,192,2))`, so the final array is in a suitable format for creating global maps of both tau and the p-values.
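As a possible alternative to flattening and reshaping, `np.apply_along_axis` can run the test along the time axis of the stacked array directly. This is a sketch on a small random stand-in grid, not the approach used in the answer, and `tau_and_p` is a hypothetical helper name:

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(2)
yrmax = rng.random((46, 3, 4))            # small stand-in for (46, 145, 192)
years = np.arange(yrmax.shape[0])

def tau_and_p(series):
    # Return Kendall's tau and its p-value for one grid cell's time series.
    tau, p = stats.kendalltau(years, series)
    return np.array([tau, p])

# Apply along the time axis; the result has shape (2, 3, 4).
result = np.apply_along_axis(tau_and_p, 0, yrmax)
tau_map, p_map = result[0], result[1]     # each of shape (3, 4), ready for plotting
```

With the answer's own layout, the equivalent maps would be `newdata[..., 0]` for tau and `newdata[..., 1]` for the p-values.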

Thanks for all the help!
