How to use PyArrow to achieve a stream-writing effect
Question
My data arrives as a stream, and I want to store it in a single Parquet file. However, PyArrow overwrites the Parquet file on every write. What should I do?
I tried keeping the writer open, but that does not seem workable: the file cannot be read until the writer is closed.
Here is the code:
import pyarrow as pa
import pyarrow.parquet as pq

# The schema must exist before the writer is created
fields = [pa.field('name', pa.string()), pa.field('age', pa.int64())]
schema = pa.schema(fields)

for name in ['LEE', 'LSY', 'asd', 'wer']:
    # Creating a new writer on the same path overwrites the file each time
    writer = pq.ParquetWriter('d:/test.parquet', schema)
    arrays = [pa.array([name]), pa.array([2])]
    table = pa.Table.from_arrays(arrays, schema=schema)
    writer.write_table(table)
writer.close()
What I actually want is to close the writer each time and then reopen it to append one row to the existing data, like this:
fields = [pa.field('name', pa.string()), pa.field('age', pa.int64())]
schema = pa.schema(fields)
for name in ['LEE', 'LSY', 'asd', 'wer']:
    # Intended as "reopen and append one row"; in practice this overwrites the file
    writer = pq.ParquetWriter('d:/test.parquet', schema)
    arrays = [pa.array([name]), pa.array([2])]
    table = pa.Table.from_arrays(arrays, schema=schema)
    writer.write_table(table)
    writer.close()
Recommended answer
Parquet files cannot be appended to once they are written. The typical solution is to write a new Parquet file each time (together, the files can form a single partitioned Parquet dataset) or, if there is not much data, to first gather the data in Python into a single table and then write it once.
See this email thread for more discussion: https://lists.apache.org/thread.html/07b1e3f13b5dae7e34ee3752f3cd4d16a94deb3a5f43893b73475900@%3Cdev.arrow.apache.org%3E