python: use CSV reader with single file extracted from tarfile
I am trying to use Python's csv reader to read a CSV file that I extract from a .tar.gz file using Python's tarfile library.
I have this:
tarFile = tarfile.open(name=tarFileName, mode="r")
for file in tarFile.getmembers():
    tarredCSV = tarFile.extractfile(file)
    reader = csv.reader(tarredCSV)
    next(reader) # skip header
    for row in reader:
        if row[3] not in CSVRows.values():
            CSVRows[row[3]] = row
All the files in the tar file are CSVs.
I get an exception on the first file. It is raised by the first next(reader) call:
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
How do I open said file (without extracting the file then opening it)?
tarfile.extractfile returns an io.BufferedReader object, which is a bytes stream, whereas csv.reader expects a text stream. You can use io.TextIOWrapper to convert the bytes stream to a text stream:
import io
...
reader = csv.reader(io.TextIOWrapper(tarredCSV, encoding='utf-8'))
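
Putting the pieces together, the loop from the question might look like the sketch below. It is only an illustration under a few assumptions: the function name read_csv_rows_from_tar and its parameters are hypothetical, the files are assumed to be UTF-8, and the first row seen for each key is kept, which appears to be the intent of the original membership check. Non-file members such as directories are skipped, since extractfile does not return a readable stream for them.

```python
import csv
import io
import tarfile


def read_csv_rows_from_tar(tar_path, key_column=3, encoding="utf-8"):
    """Collect rows from every CSV member of a tar archive, keyed by one column.

    Hypothetical helper for illustration; the name and parameters are assumptions.
    """
    rows_by_key = {}
    # mode="r" lets tarfile detect compression (for example gzip) on its own.
    with tarfile.open(name=tar_path, mode="r") as archive:
        for member in archive.getmembers():
            if not member.isfile():
                continue  # skip directories, links and other non-file members
            raw = archive.extractfile(member)  # binary (bytes) stream
            # Wrap the bytes stream as text; newline="" is what the csv module
            # recommends so that it can handle line endings itself.
            with io.TextIOWrapper(raw, encoding=encoding, newline="") as text_stream:
                reader = csv.reader(text_stream)
                next(reader, None)  # skip the header, tolerate an empty file
                for row in reader:
                    # Assumption: keep the first row seen for each key.
                    rows_by_key.setdefault(row[key_column], row)
    return rows_by_key
```

Nothing is extracted to disk here: each member is read straight from the archive through the wrapped stream.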