Using DictWriter to write a CSV when the fields are not known beforehand

Problem description

I am parsing a large piece of text into dictionaries, with the end objective of creating a CSV file with the keys as column headers.

csv.DictWriter(csvfile, fieldnames, restval='', extrasaction='raise', dialect='excel', *args, **kwds)
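As a minimal sketch of what two of these parameters do (the field names and values below are made up for illustration, and an in-memory buffer stands in for a real file): `restval` supplies the value for keys a row lacks, and `extrasaction='ignore'` silently drops keys that are not in `fieldnames`.

```python
import csv
import io

# Hypothetical field names, chosen only for this illustration.
fieldnames = ["Header_1", "Header_2"]

buffer = io.StringIO()  # stands in for an open CSV file
writer = csv.DictWriter(buffer, fieldnames, restval="N/A", extrasaction="ignore")
writer.writeheader()

# Header_2 is missing here, so restval ("N/A") is written in its place.
writer.writerow({"Header_1": "data_1"})

# Header_9 is not in fieldnames; with extrasaction="ignore" it is dropped.
writer.writerow({"Header_1": "data_2", "Header_2": "data_3", "Header_9": "dropped"})

output = buffer.getvalue()
print(output)
```

Neither option adds a new column, which is why they do not solve the problem described next on their own.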

The problem arises because the dict for any nth row can include a new, never-before-used key. I then want the CSV to contain a column for this new key as well. In short, not all of my fields are known beforehand, so I cannot compile a complete fieldnames list at the beginning.
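To make the failure concrete, here is a small sketch (with invented keys and an in-memory buffer instead of a file) of what happens under the default `extrasaction='raise'` when a later row brings a key that was not in `fieldnames`:

```python
import csv
import io

buffer = io.StringIO()  # stands in for an open CSV file
writer = csv.DictWriter(buffer, fieldnames=["Header_1", "Header_2"])
writer.writeheader()
writer.writerow({"Header_1": "data_1", "Header_2": "data_2"})

error_message = None
try:
    # Header_4 was never declared, so DictWriter refuses to write the row.
    writer.writerow({"Header_1": "data_3", "Header_4": "data_4"})
except ValueError as exc:
    error_message = str(exc)

print(error_message)
```

The header line has already been written by the time the error occurs, so the new column cannot simply be appended afterwards.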

Is there a recommended way to have csv.DictWriter add unknown fields to fieldnames instead of rejecting or ignoring them? Merely changing fieldnames at that point would leave the earlier lines with too few fields.

Recommended answer

Instead of using DictWriter, which can be confusing in your case because dictionaries are not ordered, I tried the writerow method of csv.writer. Here is what I did:

"""
a) Collect all the keys of the dictionaries and sort them (sorting is optional).
b) Build a result list with one row per input dict: for each header, append the
   dict's value, or None (returned by .get()) if the key is missing.
c) Write the headers, then each row from the result list, to the CSV file.
"""

data_dict = [{ "Header_1":"data_1", "Header_2":"data_2", "Header_3":"data_3"},
             { "Header_1":"data_4", "Header_2":"data_5", "Header_3":"data_6"},
             { "Header_1":"data_7", "Header_2":"data_8", "Header_3":"data_9", "Header_4":"data_10"},
             { "Header_1":"data_11", "Header_3":"data_12"},
             { "Header_1":"data_13", "Header_2":"data_14", "Header_3":"data_15"}]

"""
   The third dict has an extra key/value pair.
   The fourth has no Header_2, so we expect a blank value there in the CSV file.
"""
import csv

# Collect every key that appears in any row, then sort for a stable column order.
headers = sorted({key for _dict in data_dict for key in _dict})

# Build one list per row; keys missing from a row become None,
# which csv.writer writes as an empty field.
result = []
for _dict in data_dict:
    row = [_dict.get(header) for header in headers]
    result.append(row)

# Python 3: open in text mode with newline='' (Python 2 would use 'wb').
with open('demo.csv', 'w', newline='') as csvfile:
    spamwriter = csv.writer(csvfile, delimiter=';', dialect='excel',
                            quotechar='|', quoting=csv.QUOTE_MINIMAL)
    spamwriter.writerow(headers)
    for r in result:
        spamwriter.writerow(r)
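If you would still rather use DictWriter, the same two-pass idea applies. The following is only a sketch under a few assumptions: all rows fit in memory (or the input can be read twice), Python 3.7+ is used so dicts keep insertion order, and the sample rows and the in-memory buffer are placeholders for your real data and file. It gathers the keys in first-seen order, then lets `restval` fill the gaps.

```python
import csv
import io

# Placeholder rows; in practice these come from the parsed text.
rows = [
    {"Header_1": "data_1", "Header_2": "data_2", "Header_3": "data_3"},
    {"Header_1": "data_7", "Header_3": "data_9", "Header_4": "data_10"},
    {"Header_1": "data_11", "Header_3": "data_12"},
]

# First pass: collect every key, keeping the order in which keys first appear.
fieldnames = []
seen = set()
for row in rows:
    for key in row:
        if key not in seen:
            seen.add(key)
            fieldnames.append(key)

# Second pass: write all rows; keys a row lacks are filled with restval.
buffer = io.StringIO()  # for a real file use open(path, "w", newline="")
writer = csv.DictWriter(buffer, fieldnames=fieldnames, restval="")
writer.writeheader()
writer.writerows(rows)

csv_text = buffer.getvalue()
print(csv_text)
```

Because the full set of field names is known before the header is written, every line ends up with the same number of fields.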
