Load many feather files in a folder into dask

Question

I have a folder containing many .feather files, and I would like to load all of them into dask in Python.

So far, I have tried the following, taken from a similar question on GitHub: https://github.com/dask/dask/issues/1277

import dask
import dask.dataframe as dd
import feather
files = [...]
dfs = [dask.delayed(feather.read_dataframe)(f) for f in files]
df = dd.concat(dfs)  # raises the TypeError described below

Unfortunately, this raises the error TypeError: Truth of Delayed objects is not supported, which is mentioned in that issue, but no workaround is clear from it.

Is it possible to do the above in dask?

Recommended answer

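You want to use dd.from_delayed instead of dd.concat. dd.concat operates on dask dataframes, whereas dd.from_delayed turns a list of delayed objects, each representing a pandas dataframe, into a single dask dataframe.

If possible, you should also supply the meta= (a zero-length dataframe describing the columns, index and dtypes) and divisions= (the boundary values of the index along the partitions) kwargs.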
