Pandas interpolate returning ValueErrors for some methods and some sizes of dataframes
Question
I am having some issues interpolating a Pandas dataframe.
Basically, I have a dataframe of 295339 rows and have artificially generated NaNs to study different sampling rates and completion methods.
The issue is that some combinations of sampling rate and completion method work fine, while for others I get the following error message:
ValueError: The number of derivatives at boundaries does not match: expected. 1, got 0+0.
The type of ValueError depends on the combination of sampling rate and completion method I'm using.
So for example, if I make one NaN per hour per customer and then interpolate using either the linear or the cubic method, it works. But if I sample once every four hours per customer, it works for the linear method but not for the cubic method (code for the interpolation below):
latitude = my_frame.filter(['Customer_id', 'Lat'], axis=1)
latitude = latitude.groupby('Customer_id').apply(lambda group: group.interpolate(method='cubic'))
The weird thing is that during my tests I limited my approach to 3 customers (representing 8500 rows) for speed purposes, and no issues were raised.
So, my question is: why does this happen, and is there any workaround?
Recommended answer
I found that the issue was that for customers with fewer records I could not interpolate using the cubic method, because they did not have at least 4 known points.
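One possible workaround, sketched here under assumptions rather than taken from the original answer, is to count the known points per customer and fall back to linear interpolation when a group has fewer than 4. The helper `interpolate_group`, the constant `MIN_CUBIC_POINTS`, the output column `Lat_filled`, and the sample data are all hypothetical; only `Customer_id` and `Lat` come from the question. The cubic method requires SciPy to be installed.

```python
import numpy as np
import pandas as pd

# Assumption: a cubic fit needs at least 4 known (non-NaN) values per group.
MIN_CUBIC_POINTS = 4


def interpolate_group(series: pd.Series) -> pd.Series:
    """Interpolate one customer's values, falling back to linear if too sparse."""
    known_points = series.notna().sum()
    method = "cubic" if known_points >= MIN_CUBIC_POINTS else "linear"
    return series.interpolate(method=method)


# Hypothetical sample data: customer 1 has 4 known points, customer 2 only 2.
my_frame = pd.DataFrame({
    "Customer_id": [1] * 6 + [2] * 4,
    "Lat": [10.0, np.nan, 10.4, 10.9, np.nan, 12.0,
            20.0, np.nan, np.nan, 21.5],
})

# transform keeps the original row order and index of the dataframe.
my_frame["Lat_filled"] = (
    my_frame.groupby("Customer_id")["Lat"].transform(interpolate_group)
)
print(my_frame)
```

Whether a linear fallback is acceptable depends on the study: it may be preferable to leave sparse customers uninterpolated, or to exclude them, so that results for the cubic method are not mixed with linear estimates.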