ValueError: Not all divisions are known, can't align partitions error on dask dataframe

Views: 186 Published: 2020/8/10 18:57:59 Tags: python dataframe dask dask-distributed


Problem description

I have a pandas dataframe with the following columns:

user_id user_agent_id requests

All columns contain integers. I want to perform some operations on them and run them using a dask dataframe. This is what I do:

user_profile = cache_records_dataframe[['user_id', 'user_agent_id', 'requests']] \
    .groupby(['user_id', 'user_agent_id']) \
    .size().to_frame(name='appearances') \
    .reset_index() # I am not sure I can run this on dask dataframe

user_profile_ddf = df.from_pandas(user_profile, npartitions=4)
user_profile_ddf['percent'] = user_profile_ddf.groupby('user_id')['appearances'] \
    .apply(lambda x: x / x.sum(), meta=float) #Percentage of appearance for each user group

But I get the following error:

raise ValueError("Not all divisions are known, can't align "
ValueError: Not all divisions are known, can't align partitions. Please use `set_index` to set the index.

Am I doing something wrong? In pure pandas it works fine, but it gets slow for many rows (although they fit in memory), so I want to parallelize the computation.
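
Since the data fits in memory, one thing that may be worth trying before parallelizing is to replace the per-group Python lambda with a built-in aggregation. The sketch below is not from the original post: it stays in pure pandas, uses a small made-up sample in place of `cache_records_dataframe`, and computes the same per-user share with `transform("sum")`. It does not address the dask alignment error itself.

```python
import pandas as pd

# Hypothetical sample data standing in for cache_records_dataframe.
cache_records_dataframe = pd.DataFrame({
    "user_id":       [1, 1, 1, 2, 2],
    "user_agent_id": [10, 10, 20, 10, 30],
    "requests":      [5, 3, 2, 7, 1],
})

# Same first step as in the question: count rows per (user, user agent) pair.
user_profile = (
    cache_records_dataframe[["user_id", "user_agent_id", "requests"]]
    .groupby(["user_id", "user_agent_id"])
    .size()
    .to_frame(name="appearances")
    .reset_index()
)

# Broadcast each user's total back onto its rows with a built-in aggregation,
# then divide column by column instead of calling a lambda once per group.
user_totals = user_profile.groupby("user_id")["appearances"].transform("sum")
user_profile["percent"] = user_profile["appearances"] / user_totals
```

Because `transform` returns a Series aligned to the original index, the division is a plain element-wise operation and no per-group Python function call is involved.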
