Can a dask dataframe with an unordered index cause silent errors?


Question

The methods around dask.DataFrame all seem to make sure that the index column is sorted. However, by using from_delayed, it is possible to construct a dask dataframe whose index column is not sorted:

import pandas as pd
import dask.dataframe as dd
from dask import delayed
pdf1 = delayed(pd.DataFrame(dict(A=[1, 2, 3], B=[1, 1, 1])).set_index('A'))
pdf2 = delayed(pd.DataFrame(dict(A=[1, 2, 3], B=[1, 1, 1])).set_index('A'))
ddf = dd.from_delayed([pdf1, pdf2])  # dask.DataFrame with unordered index

The combination [index is set, index is not sorted, divisions are unknown] is something that I have never seen in dataframes that dask created itself. So my questions are:


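  • Has dask been tested to work well with such dataframes?

  • Could computations on such dataframes silently give wrong results, for example because they assume the index is sorted, or because they perform indexing on an incomplete subset of the data?

  • Or, more generally: if the index column is not sorted, does that only slow down access by index, or can it break functionality?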

Recommended answer

Many dask.dataframe operations will either refuse to run or fall back to slower algorithms on dataframes without known divisions. See http://dask.pydata.org/en/latest/dataframe-design.html#partitions

For example, df.loc is fast if dask.dataframe knows that the index is sorted and knows the min/max of each partition. If this information is not known, df.loc has to look through all of the partitions exhaustively.
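The difference between the two lookup strategies can be illustrated with a small standard-library sketch. This is a toy model, not dask's implementation: partitions are modelled as lists of (label, value) pairs, and the helper names loc_known and loc_unknown are hypothetical.

```python
from bisect import bisect_right

def loc_known(partitions, divisions, key):
    """Toy lookup when sorted division boundaries are known.

    `divisions` has len(partitions) + 1 entries; partition i holds labels
    from divisions[i] up to divisions[i + 1] (the last partition also
    includes its upper bound). Only one partition is inspected.
    """
    if key < divisions[0] or key > divisions[-1]:
        return []
    i = min(bisect_right(divisions, key) - 1, len(partitions) - 1)
    return [value for label, value in partitions[i] if label == key]

def loc_unknown(partitions, key):
    """Toy lookup without division information: every partition is scanned."""
    return [value for part in partitions for label, value in part if label == key]

# Sorted partitions with known boundaries: a single partition is searched.
sorted_parts = [[(1, "a"), (2, "b")], [(3, "c"), (4, "d")]]
divisions = (1, 3, 4)
hit_known = loc_known(sorted_parts, divisions, 3)

# Overlapping, unsorted partitions like the ones in the question:
# the exhaustive scan still finds the label in both partitions.
unsorted_parts = [[(1, "x"), (2, "y"), (3, "z")], [(1, "p"), (2, "q"), (3, "r")]]
hit_unknown = loc_unknown(unsorted_parts, 2)
```

The exhaustive scan costs time proportional to the number of partitions, but it does not assume any ordering, so it cannot miss rows that live in an unexpected partition.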

Generally speaking, dask.dataframe is aware of the situation you describe and should act accordingly: some operations will be slower, and some will refuse to run.
