Using pandas read_excel to read from stdin
Problem description
Note: I have since solved this problem; the solution is described below.
I can use to_csv to write to stdout in Python/pandas. Something like this works fine:
final_df.to_csv(sys.stdout, index=False)
I would like to read in an actual Excel file (not a CSV): the input is xlsx and the output should be CSV. I have this:
bls_df = pd.read_excel(sys.stdin, sheet_name="MSA_dl", index_col=None)
But that doesn't seem to work. Is it possible to do what I'm trying and, if so, how does one do it?
Notes:
- The actual input file is "MSA_M2018_dl.xlsx", which is inside the zip file at https://www.bls.gov/oes/special.requests/oesm18ma.zip .
I download and extract the data file like this:
curl -o oesm18ma.zip 'https://www.bls.gov/oes/special.requests/oesm18ma.zip'
7z x oesm18ma.zip
- I have solved the problem with the script test01.py below, which reads from stdin and writes to stdout. Note the use of sys.stdin.buffer in the read_excel() call: sys.stdin is a text stream, whereas an xlsx file is binary data.
import sys
import os
import pandas as pd
BLS_DF = pd.read_excel(sys.stdin.buffer, sheet_name="MSA_dl", index_col=None)
BLS_DF.to_csv(sys.stdout, index=False)
I invoke it as:
cat MSA_M2018_dl.xlsx | python3 test01.py
This is a small test program that illustrates the idea while removing complexity. It's not the actual program I'm working on.
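As a variation on the script above, here is a minimal sketch that first copies the piped bytes into an in-memory buffer. The assumption behind it is that, depending on the pandas version and Excel engine, the reader may need to seek within the file, which a pipe does not support. The helper names buffer_binary_stream and excel_stdin_to_csv are hypothetical and not part of pandas; the sheet name "MSA_dl" is taken from the question.

```python
import io


def buffer_binary_stream(stream):
    """Read a (possibly non-seekable) binary stream fully into a seekable BytesIO."""
    return io.BytesIO(stream.read())


def excel_stdin_to_csv(in_stream, out_stream, sheet_name="MSA_dl"):
    """Convert one sheet of an xlsx workbook read from in_stream to CSV on out_stream."""
    # Imported lazily so the buffering helper above stays usable without pandas.
    import pandas as pd

    # Assumption: the whole workbook fits in memory.
    workbook = buffer_binary_stream(in_stream)
    df = pd.read_excel(workbook, sheet_name=sheet_name, index_col=None)
    df.to_csv(out_stream, index=False)
```

In a script such as test01.py, this could be called as excel_stdin_to_csv(sys.stdin.buffer, sys.stdout) after importing sys, and invoked with the same cat MSA_M2018_dl.xlsx | python3 test01.py pipeline.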
Recommended answer
Based on this answer, it could be something like:
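import io
import sys
import pandas as pd
# Read all CSV text from stdin at once (avoids repeated string concatenation).
csv_text = sys.stdin.read()
# Parse the text from an in-memory buffer.
df = pd.read_csv(io.StringIO(csv_text))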
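The snippet above parses CSV text, while the question pipes in an xlsx file, which is a ZIP container of binary data. If a script has to accept either kind of input, one possible approach is to read the raw bytes (for example from sys.stdin.buffer) and inspect them before choosing a reader. The sketch below is only a heuristic under that assumption: the function name sniff_input_format is hypothetical, and a ZIP archive is not necessarily an Excel workbook.

```python
import io
import zipfile


def sniff_input_format(data):
    """Return "xlsx" if the bytes form a ZIP container (as .xlsx files do), else "csv"."""
    # Heuristic only: any ZIP archive is reported as "xlsx".
    if zipfile.is_zipfile(io.BytesIO(data)):
        return "xlsx"
    return "csv"
```

Input reported as "xlsx" could then be passed to pd.read_excel wrapped in io.BytesIO, and anything else decoded and passed to pd.read_csv through io.StringIO as in the answer.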