Streaming parquet file python and only downsampling
Question
I have data in parquet format which is too big to fit into memory (6 GB). I am looking for a way to read and process the file using Python 3.6. Is there a way to stream the file, down-sample, and save to a dataframe? Ultimately, I would like to have the data in dataframe format to work with.
Am I wrong to attempt to do this without using a Spark framework?
I have tried using pyarrow and fastparquet, but I get memory errors when trying to read the entire file in.

Any tips or suggestions would be greatly appreciated!
Answer
Spark is certainly a viable choice for this task.
We're planning to add streaming read logic in pyarrow this year (2019; see https://issues.apache.org/jira/browse/ARROW-3771 and related issues). In the meantime, I would recommend reading one row group at a time to mitigate the memory use issues. You can do this with pyarrow.parquet.ParquetFile and its read_row_group method.