How to read a Parquet file with a condition using pyarrow in Python

Question

I have created a Parquet file with three columns (id, author, title) from a database and want to read it back with a condition (title='Learn Python'). Below is the Python code I am using for this POC.

import pyarrow as pa
import pyarrow.parquet as pq
import pandas as pd
import pyodbc

def write_to_parquet(df, out_path, compression='SNAPPY'):
    # Convert the pandas DataFrame to an Arrow table and write it as Parquet
    arrow_table = pa.Table.from_pandas(df)
    if compression == 'UNCOMPRESSED':
        compression = None
    pq.write_table(arrow_table, out_path, use_dictionary=False,
                   compression=compression)

def read_pyarrow(path, use_threads=False):
    # Newer pyarrow releases replaced `nthreads` with the boolean `use_threads`;
    # use_threads=False keeps the original single-threaded behaviour (nthreads=1)
    return pq.read_table(path, use_threads=use_threads).to_pandas()


path = './test.parquet'
sql = "SELECT * FROM [dbo].[Book] (NOLOCK)"

conn = pyodbc.connect(r'Driver={SQL Server};Server=.;Database=APP_BBG_RECN;'
                      r'Trusted_Connection=yes;')
df = pd.read_sql(sql, conn)

write_to_parquet(df, path)

df1 = read_pyarrow(path)

How can I put a condition (title='Learn Python') in the read_pyarrow method?

Recommended Answer

This is not yet supported. We intend to develop this functionality in the future. For now, I recommend doing the filtering with pandas after converting the Arrow table to a DataFrame.
