Load many feather files in a folder into dask
Question
I have a folder with many .feather files, and I would like to load all of them into dask in Python.
So far, I have tried the following, sourced from a similar question on GitHub (https://github.com/dask/dask/issues/1277):
import dask
import dask.dataframe as dd
import feather
files = [...]
dfs = [dask.delayed(feather.read_dataframe)(f) for f in files]
df = dd.concat(dfs)
Unfortunately, this gives me the error TypeError: Truth of Delayed objects is not supported, which is mentioned in that issue, but a workaround is not clear.
Is it possible to do the above in dask?
Recommended answer
You want to use … If possible, you should also supply the meta= (a zero-length dataframe describing the columns, index and dtypes) and divisions= (the boundary values of the index along the partitions) kwargs.