ValueError: Data must be positive (boxcox scipy)

Question

I'm trying to transform my dataset to a normal distribution.

0      8.298511e-03
1      3.055319e-01
2      6.938647e-02
3      2.904091e-02
4      7.422441e-02
5      6.074046e-02
6      9.265747e-04
7      7.521846e-02
8      5.960521e-02
9      7.405019e-04
10     3.086551e-02
11     5.444835e-02
12     2.259236e-02
13     4.691038e-02
14     6.463911e-02
15     2.172805e-02
16     8.210005e-02
17     2.301189e-02
18     4.073898e-07
19     4.639910e-02
20     1.662777e-02
21     8.662539e-02
22     4.436425e-02
23     4.557591e-02
24     3.499897e-02
25     2.788340e-02
26     1.707958e-02
27     1.506404e-02
28     3.207647e-02
29     2.147011e-03
30     2.972746e-02
31     1.028140e-01
32     2.183737e-02
33     9.063370e-03
34     3.070437e-02
35     1.477440e-02
36     1.036309e-02
37     2.000609e-01
38     3.366233e-02
39     1.479767e-03
40     1.137169e-02
41     1.957088e-02
42     4.921303e-03
43     4.279257e-02
44     4.363429e-02
45     1.040123e-01
46     2.930958e-02
47     1.935434e-03
48     1.954418e-02
49     2.980253e-02
50     3.643772e-02
51     3.411437e-02
52     4.976063e-02
53     3.704608e-02
54     7.044161e-02
55     8.101365e-03
56     9.310477e-03
57     7.626637e-02
58     8.149728e-03
59     4.157399e-01
60     8.200258e-02
61     2.844295e-02
62     1.046601e-01
63     6.565680e-02
64     9.825436e-04
65     9.353639e-02
66     6.535298e-02
67     6.979044e-04
68     2.772859e-02
69     4.378422e-02
70     2.020185e-02
71     4.774493e-02
72     6.346146e-02
73     2.466264e-02
74     6.636585e-02
75     2.548934e-02
76     1.113937e-06
77     5.723409e-02
78     1.533288e-02
79     1.027341e-01
80     4.294570e-02
81     4.844853e-02
82     5.579620e-02
83     2.531824e-02
84     1.661426e-02
85     1.430836e-02
86     3.157232e-02
87     2.241722e-03
88     2.946256e-02
89     1.038383e-01
90     1.868837e-02
91     8.854596e-03
92     2.391759e-02
93     1.612714e-02
94     1.007823e-02
95     1.975513e-01
96     3.581289e-02
97     1.199747e-03
98     1.263381e-02
99     1.966746e-02
100    4.040786e-03
101    4.497264e-02
102    4.030524e-02
103    8.627087e-02
104    3.248317e-02
105    5.727582e-03
106    1.781355e-02
107    2.377991e-02
108    4.299568e-02
109    3.664353e-02
110    5.167902e-02
111    4.006848e-02
112    7.072990e-02
113    6.744938e-03
114    1.064900e-02
115    9.823497e-02
116    8.992714e-03
117    1.792453e-01
118    6.817763e-02
119    2.588843e-02
120    1.048027e-01
121    6.468491e-02
122    1.035536e-03
123    8.800684e-02
124    5.975065e-02
125    7.365861e-04
126    4.209485e-02
127    4.232421e-02
128    2.371866e-02
129    5.894714e-02
130    7.177195e-02
131    2.116566e-02
132    7.579219e-02
133    3.174744e-02
134    0.000000e+00
135    5.786439e-02
136    1.458493e-02
137    9.820156e-02
138    4.373873e-02
139    4.271649e-02
140    5.532575e-02
141    2.311324e-02
142    1.644508e-02
143    1.328273e-02
144    3.908473e-02
145    2.355468e-03
146    2.519321e-02
147    1.131868e-01
148    1.708967e-02
149    1.027661e-02
150    2.439899e-02
151    1.604058e-02
152    1.134323e-02
153    2.247722e-01
154    3.408590e-02
155    2.222239e-03
156    1.659830e-02
157    2.284733e-02
158    4.618550e-03
159    3.674162e-02
160    4.131283e-02
161    8.846273e-02
162    2.504404e-02
163    6.004396e-03
164    1.986309e-02
165    2.347111e-02
166    3.865636e-02
167    3.672307e-02
168    6.658419e-02
169    3.726879e-02
170    7.600138e-02
171    7.184871e-03
172    1.142840e-02
173    9.741311e-02
174    8.165448e-03
175    1.529210e-01
176    6.648081e-02
177    2.617601e-02
178    9.547816e-02
179    6.857775e-02
180    8.129399e-04
181    7.107914e-02
182    5.884794e-02
183    8.398721e-04
184    6.972981e-02
185    4.461767e-02
186    2.264404e-02
187    5.566633e-02
188    6.595136e-02
189    2.301914e-02
190    7.488919e-02
191    3.108619e-02
192    4.989364e-07
193    4.834949e-02
194    1.422578e-02
195    9.398186e-02
196    4.870391e-02
197    3.841369e-02
198    6.406801e-02
199    2.603315e-02
200    1.692629e-02
201    1.409982e-02
202    4.099215e-02
203    2.093724e-03
204    2.640732e-02
205    1.032129e-01
206    1.581881e-02
207    8.977325e-03
208    1.941141e-02
209    1.502126e-02
210    9.923589e-03
211    2.757357e-01
212    3.096234e-02
213    4.388900e-03
214    1.784778e-02
215    2.179550e-02
216    3.944159e-03
217    3.703552e-02
218    4.033897e-02
219    1.157076e-01
220    2.400446e-02
221    5.761179e-03
222    1.899621e-02
223    2.401468e-02
224    4.458745e-02
225    3.357898e-02
226    5.331003e-02
227    3.488753e-02
228    7.466599e-02
229    6.075236e-03
230    9.815318e-03
231    9.598735e-02
232    7.103607e-03
233    1.100602e-01
234    5.677641e-02
235    2.420500e-02
236    9.213369e-02
237    4.024043e-02
238    6.987694e-04
239    8.612055e-02
240    5.663353e-02
241    4.871693e-04
242    4.533811e-02
243    3.593244e-02
244    1.982537e-02
245    5.490786e-02
246    5.603109e-02
247    1.671653e-02
248    6.522711e-02
249    3.341356e-02
250    2.378629e-06
251    4.299939e-02
252    1.223163e-02
253    8.392798e-02
254    4.272826e-02
255    3.183946e-02
256    4.431299e-02
257    2.661024e-02
258    1.686707e-02
259    4.070924e-03
260    3.325947e-02
261    2.023611e-03
262    2.402284e-02
263    8.369778e-02
264    1.375093e-02
265    8.899898e-03
266    2.148740e-02
267    1.301483e-02
268    8.355791e-03
269    2.549934e-01
270    2.792516e-02
271    4.652563e-03
272    1.556313e-02
273    1.936942e-02
274    3.547794e-03
275    3.412516e-02
276    3.932606e-02
277    5.305868e-02
278    2.354438e-02
279    5.379380e-03
280    1.904203e-02
281    2.045495e-02
282    3.275855e-02
283    3.007389e-02
284    8.227664e-02
285    2.479949e-02
286    6.573835e-02
287    5.165842e-03
288    7.599650e-03
289    9.613557e-02
290    6.690175e-03
291    1.779880e-01
292    5.076263e-02
293    3.117607e-02
294    7.495692e-02
295    3.707768e-02
296    7.086975e-04
297    8.935981e-02
298    5.624249e-02
299    7.105331e-04
300    3.339868e-02
301    3.354603e-02
302    2.041988e-02
303    3.862522e-02
304    5.977081e-02
305    1.730081e-02
306    6.909621e-02
307    3.729478e-02
308    3.940647e-07
309    4.385336e-02
310    1.391891e-02
311    8.898305e-02
312    3.840141e-02
313    3.214408e-02
314    4.284080e-02
315    1.841022e-02
316    1.528207e-02
317    3.106559e-03
318    3.945481e-02
319    2.085094e-03
320    2.464190e-02
321    7.844914e-02
322    1.526590e-02
323    9.922147e-03
324    1.649218e-02
325    1.341602e-02
326    8.124446e-03
327    2.867380e-01
328    2.663867e-02
329    5.342012e-03
330    1.752612e-02
331    2.010863e-02
332    3.581845e-03
333    3.652284e-02
334    4.484362e-02
335    4.600939e-02
336    2.213280e-02
337    5.494917e-03
338    2.016594e-02
339    2.118010e-02
340    2.964000e-02
341    3.405549e-02
342    1.014185e-01
343    2.451624e-02
344    7.966998e-02
345    5.301538e-03
346    8.198895e-03
347    8.789368e-02
348    7.222417e-03
349    1.448276e-01
350    5.676056e-02
351    2.987054e-02
352    6.851434e-02
353    4.193034e-02
354    7.025054e-03
355    8.557358e-02
356    5.812736e-02
357    2.263676e-02
358    2.922588e-02
359    3.363161e-02
360    1.495056e-02
361    5.871619e-02
362    6.235094e-02
363    1.691340e-02
364    5.361939e-02
365    3.722318e-02
366    9.828477e-03
367    4.155345e-02
368    1.327760e-02
369    7.205372e-02
370    4.151130e-02
371    3.265365e-02
372    2.879418e-02
373    2.314340e-02
374    1.653692e-02
375    1.077611e-02
376    3.481427e-02
377    1.815487e-03
378    2.232305e-02
379    1.005192e-01
380    1.491262e-02
381    3.752658e-02
382    1.271613e-02
383    1.223707e-02
384    8.088923e-03
385    2.572550e-01
386    2.300194e-02
387    2.847960e-02
388    1.782098e-02
389    1.900759e-02
390    3.647629e-03
391    3.723368e-02
392    4.079514e-02
393    5.510332e-02
394    3.072313e-02
395    4.183566e-03
396    1.891549e-02
397    1.870293e-02
398    3.182769e-02
399    4.167840e-02
400    1.343152e-01
401    2.451973e-02
402    7.567017e-02
403    4.837843e-03
404    6.477297e-03
405    7.664675e-02
Name: value, dtype: float64

This is the code I used to transform the dataset:

from scipy import stats
x,_ = stats.boxcox(df)

I get this error:

            if any(x <= 0):
-> 1031         raise ValueError("Data must be positive.")
   1032 
   1033     if lmbda is not None:  # single transformation

ValueError: Data must be positive

Is the error caused by my values being too small? I'm not sure what I'm doing wrong. I'm new to boxcox and may be using it incorrectly here. I'm open to suggestions and alternatives. Thanks!

Recommended answer

Your data contains the value 0 (at index 134). When boxcox says the data must be positive, it means strictly positive.
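A quick way to confirm this on your own data is to look for entries that are not strictly positive. The snippet below is a minimal sketch; `values` is a hypothetical stand-in holding a few numbers copied from the series in the question, not the full dataset.

```python
import numpy as np

# Hypothetical stand-in: a few entries copied from the question's `value` series.
values = np.array([8.298511e-03, 4.073898e-07, 0.0, 5.786439e-02])

# Box-Cox requires x > 0 for every entry, so flag anything with x <= 0.
bad_idx = np.flatnonzero(values <= 0)

print("positions of non-positive entries:", bad_idx)
print("smallest value:", values.min())
```

Note that a tiny but positive value such as 4.073898e-07 passes the check; only the exact 0 is flagged, so the size of the values is not what triggers the error.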

What does your data represent? Does a value of 0 make sense? Or is that 0 actually a very small number that was rounded down to 0?

You could simply discard that 0. Alternatively, you could do something like the following. (This amounts to temporarily discarding the 0, and then using -1/λ as the transformed value of 0, where λ is the Box-Cox transformation parameter.)
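The value -1/λ comes from taking the limit of the Box-Cox transform as x approaches 0 from above. Written out with the usual definition of the transform (the one given in the SciPy documentation for boxcox), the limit is finite only when λ is positive:

```latex
y(x;\lambda) =
\begin{cases}
\dfrac{x^{\lambda} - 1}{\lambda}, & \lambda \neq 0,\\[4pt]
\ln x, & \lambda = 0,
\end{cases}
\qquad
\lim_{x \to 0^{+}} y(x;\lambda) =
\begin{cases}
-\dfrac{1}{\lambda}, & \lambda > 0,\\[4pt]
-\infty, & \lambda \le 0.
\end{cases}
```

For λ > 0 the term x^λ goes to 0, leaving (0 - 1)/λ. For λ ≤ 0 the transform diverges, so there is no finite value to assign to 0.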

First, create some data that contains one 0 (all other values are positive):

In [11]: import numpy as np

In [12]: from scipy.stats import boxcox

In [13]: np.random.seed(8675309)

In [14]: data = np.random.gamma(1, 1, size=405)

In [15]: data[100] = 0

(In your code, you would replace that with, say, data = df.values.)

Copy the strictly positive data to posdata:

In [16]: posdata = data[data > 0]

Find the optimal Box-Cox transformation, and verify that λ is positive. This workaround does not work if λ ≤ 0.

In [17]: bcdata, lam = boxcox(posdata)

In [18]: lam
Out[18]: 0.244049919975582

Make a new array to hold that result, along with the limiting value of the transform of 0 (which is -1/λ):

In [19]: x = np.empty_like(data)

In [20]: x[data > 0] = bcdata

In [21]: x[data == 0] = -1/lam
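The steps above can be collected into one function. This is a minimal sketch rather than anything provided by SciPy: the name `boxcox_with_zeros` is hypothetical, it assumes 1-D non-negative input, and it raises instead of guessing when the fitted λ is not positive. The sample data is generated here purely for illustration.

```python
import numpy as np
from scipy.stats import boxcox


def boxcox_with_zeros(data):
    """Box-Cox transform for 1-D non-negative data that may contain exact zeros.

    Hypothetical helper, not part of SciPy.
    """
    data = np.asarray(data, dtype=float)
    if np.any(data < 0):
        raise ValueError("negative values are not handled by this workaround")

    pos = data > 0
    # Fit lambda on the strictly positive values only.
    bcdata, lam = boxcox(data[pos])
    if lam <= 0:
        raise ValueError(f"fitted lambda={lam:.4g} is not positive; "
                         "the transform has no finite limit at 0")

    out = np.empty_like(data)
    out[pos] = bcdata
    out[~pos] = -1.0 / lam  # limiting value of the transform as x -> 0+
    return out, lam


# Illustrative data: gamma-distributed sample with one value forced to 0.
rng = np.random.default_rng(0)
sample = rng.gamma(1.0, 1.0, size=405)
sample[100] = 0.0

x, lam = boxcox_with_zeros(sample)
print("lambda:", lam)
print("transformed value at the zero:", x[100])
```

With a pandas Series, something like `boxcox_with_zeros(df.values)` would be the equivalent call. The transformed zero is always the smallest value in the output, because every positive input maps to something greater than -1/λ.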

The following plot shows the histograms of data and x.

[Plot: histograms of data and x]
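The histograms can be complemented by a numeric check. The sketch below repeats the steps from the session and compares the sample skewness before and after the transform. Using `scipy.stats.skew` as the measure is an assumption of this example, not something the answer itself does; a value near 0 indicates a roughly symmetric distribution.

```python
import numpy as np
from scipy.stats import boxcox, skew

# Same synthetic data as in the session above.
np.random.seed(8675309)
data = np.random.gamma(1, 1, size=405)
data[100] = 0

# Fit on the positive values, then fill in the limit -1/lambda for the zero.
pos = data > 0
bcdata, lam = boxcox(data[pos])
x = np.empty_like(data)
x[pos] = bcdata
x[~pos] = -1 / lam

print("skewness before:", skew(data))
print("skewness after: ", skew(x))
```

A symmetric result does not by itself prove normality, so a normal probability plot or a formal test is still worth running on the transformed values.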
