How to read a single large parquet file into multiple partitions using dask/dask-cudf?

Views: 237 Published: 2020/10/15 18:46:29 Tags: dask, cudf

Problem description

I am trying to read a single large parquet file (size > gpu_size) using dask_cudf/dask, but it is currently being read into a single partition. Judging from the docstring, I am guessing this is the expected behavior:

dask.dataframe.read_parquet(path, columns=None, filters=None, categories=None, index=None, storage_options=None, engine='auto', gather_statistics=None, **kwargs):

    Read a Parquet file into a Dask DataFrame
    This reads a directory of Parquet data into a Dask.dataframe, one file per partition. 
    It selects the index among the sorted columns if any exist.

Is there a workaround that would let me read it into multiple partitions?
