How to use Python csv.DictReader with a binary file? (For a Babel custom extraction method)
Question
I'm trying to write a custom extraction method for Babel that extracts strings from a specific column in a CSV file. I followed the documentation here.
Here is my extraction method code:
def extract_csv(fileobj, keywords, comment_tags, options):
    import csv
    reader = csv.DictReader(fileobj, delimiter=',')
    for row in reader:
        if row and row['caption'] != '':
            yield (reader.line_num, '', row['caption'], '')
When I try to run the extraction, I get this error:
  File "/Users/tiagosilva/repos/naltio/csv_extractor.py", line 18, in extract_csv
    for row in reader:
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/csv.py", line 111, in __next__
    self.fieldnames
  File "/usr/local/Cellar/python/3.6.5/Frameworks/Python.framework/Versions/3.6/lib/python3.6/csv.py", line 98, in fieldnames
    self._fieldnames = next(self.reader)
_csv.Error: iterator should return strings, not bytes (did you open the file in text mode?)
It seems the fileobj that is passed to the function was opened in binary mode.
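The same error can be reproduced without Babel. The following is a minimal sketch that uses an in-memory io.BytesIO as a stand-in for the binary file object; the sample data and the helper name first_row_error are made up for illustration, and the exact wording of the message may differ between Python versions.

```python
import csv
import io

# Hypothetical sample data standing in for the real CSV file.
raw = b"id,caption\n1,Hello\n2,World\n"


def first_row_error(binary_file):
    """Return the message csv raises when it is fed a binary file object."""
    reader = csv.DictReader(binary_file, delimiter=',')
    try:
        # Reading the header row already fails, because the iterator yields bytes.
        next(reader)
    except csv.Error as exc:
        return str(exc)
    return None


message = first_row_error(io.BytesIO(raw))
print(message)
```

Because csv only accepts an iterator of str, any file object opened with mode 'rb' triggers this, whichever tool opened it.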
How can I make this work? I can think of two possible solutions, but I don't know how to code them:
1) Is there a way to use the binary file object with DictReader?
2) Is there a way to signal Babel to open the file in text mode?
I'm open to other solutions not listed here.
Recommended answer
I actually found a way to do it!
It's solution 1, a way to handle the binary file. Wrap the binary file object in an io.TextIOWrapper, which decodes the bytes to text as they are read, and pass that wrapper to DictReader.
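import csv
import io

def extract_csv(fileobj, keywords, comment_tags, options):
    # fileobj arrives in binary mode; wrap it so csv receives decoded text
    with io.TextIOWrapper(fileobj, encoding='utf-8') as text_file:
        reader = csv.DictReader(text_file, delimiter=',')
        for row in reader:
            if row and 'caption' in row:
                yield (reader.line_num, '', row['caption'], '')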
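For anyone who wants to try this outside Babel, here is a self-contained sketch of the same idea, exercised with an in-memory io.BytesIO. It makes a few assumptions that are not part of the answer above: the file is UTF-8, empty captions are skipped (as in the question's version), newline='' is passed as the csv documentation recommends, and the wrapper is detached rather than closed so the caller's file object stays open. The sample data is invented.

```python
import csv
import io


def extract_csv(fileobj, keywords, comment_tags, options):
    """Yield (lineno, funcname, message, comments) tuples for the 'caption' column."""
    # Assumption: the file is UTF-8. newline='' leaves line-ending handling to csv.
    text_file = io.TextIOWrapper(fileobj, encoding='utf-8', newline='')
    try:
        reader = csv.DictReader(text_file, delimiter=',')
        for row in reader:
            caption = row.get('caption')
            if caption:  # skips missing and empty captions
                yield (reader.line_num, '', caption, '')
    finally:
        # Detach instead of closing, so the underlying binary file stays usable.
        text_file.detach()


# Hypothetical sample data; the second row has an empty caption.
sample = io.BytesIO("id,caption\n1,Hello\n2,\n3,Olá\n".encode('utf-8'))
messages = list(extract_csv(sample, [], [], {}))
print(messages)
```

Detaching is a design choice, not a requirement: the with-statement form also works, but it closes the underlying file object when the block exits.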