How to skip pre header lines with csv.DictReader?

Problem description

I want csv.DictReader to deduce the field names from the file. The docs say "If the fieldnames parameter is omitted, the values in the first row of the csvfile will be used as the fieldnames.", but in my case the first row contains a title and the second row contains the names.

I can't apply next(reader) as suggested in "Python 3.2 skip a line in csv.DictReader", because the field name assignment takes place when the reader is initialized (or I'm doing it wrong).
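A minimal sketch of the behaviour described here, assuming Python 3 and made-up in-memory data (the `sample` string is hypothetical, not the real CanVec file). Whichever row the reader consumes first becomes the field names, so calling next(reader) only hands back the row after it:

```python
import csv
import io

# Hypothetical sample: one title line, then the real header, then one data row.
sample = "Title v1.0,,\r\nname,code,theme\r\npark,2260009,LX\r\n"

reader = csv.DictReader(io.StringIO(sample, newline=""))

# The first call consumes the title line as field names and returns the
# real header row as if it were data.
first = next(reader)
fieldnames = reader.fieldnames

print(fieldnames)
print(first["Title v1.0"])
```

Here the real header row ends up being treated as the first data record, which is why skipping has to happen before the reader sees the file.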

The csvfile (exported from Excel 2010, original source):

CanVec v1.1.0,,,,,,,,,^M
Entity,Attributes combination,"Specification Code
Point","Specification Code
Line","Specification Code
Area",Generic Code,Theme,"GML - Entity name
Shape - File name
Point","GML - Entity name
Shape - File name
Line","GML - Entity name
Shape - File name
Area"^M
Amusement park,Amusement park,,,2260012,2260009,LX,,,LX_2260009_2^M
Auto wrecker,Auto wrecker,,,2360012,2360009,IC,,,IC_2360009_2^M

My code:

f = open(entities_table,'rb')
try:
    dialect = csv.Sniffer().sniff(f.read(1024))
    f.seek(0)

    reader = csv.DictReader(f, dialect=dialect)
    print 'I think the field names are:\n%s\n' % (reader.fieldnames)

    i = 0
    for row in reader:
        if i < 20:
            print row
            i = i + 1

finally:
    f.close()

Current results:

I think the field names are:
['CanVec v1.1.0', '', '', '', '', '', '', '', '', '']

Desired result:

I think the field names are:
['Entity','Attributes combination','"Specification Code Point"',...snip]

I realize it would be expedient to simply delete the first row and carry on, but I'm trying to get as close to just reading the data in situ as I can and minimize manual intervention.

Solution

I used islice from itertools. My header was on the last line of a big preamble. I skipped past the preamble and used the header line for the field names:

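import csv
from itertools import islice

path = "data.csv"  # placeholder: path to the CSV file

with open(path, "r", newline="") as f:
    # Pass the preamble: count lines up to and including the header line
    n = 0
    for line in f:
        n += 1
        if 'some_field_name' in line:  # placeholder text that identifies the header line
            h = line.rstrip("\r\n").split(',')
            break

# Reopen the file and skip the first n lines (preamble plus header line)
with open(path, "r", newline="") as f:
    reader = csv.DictReader(islice(f, n, None), fieldnames=h)
    for row in reader:
        print(row)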
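When the header contains quoted fields that span several lines, as in the question's file, splitting one physical line on commas would not recover the full header. A possible alternative, sketched here under the assumption that the preamble is a known number of single-line rows, is to advance the file object itself before handing it to csv.DictReader, so the csv module parses the header. The helper name `dict_reader_after_title` and the `sample` data are hypothetical, and the sketch targets Python 3:

```python
import csv
import io

def dict_reader_after_title(f, skip=1, **kwargs):
    """Return a DictReader whose header is the first record after `skip` physical lines."""
    for _ in range(skip):
        next(f)  # advance the file object itself, not the reader
    return csv.DictReader(f, **kwargs)

# Hypothetical sample modelled on the question: a title line, then a header
# with a quoted field that contains a line break, then one data row.
sample = (
    'CanVec v1.1.0,,\r\n'
    'Entity,"Specification Code\nPoint",Theme\r\n'
    'Amusement park,2260012,LX\r\n'
)

reader = dict_reader_after_title(io.StringIO(sample, newline=""))
rows = list(reader)

print(reader.fieldnames)
print(rows[0]["Entity"])
```

With a real file, the same helper could be called as `dict_reader_after_title(open(path, newline=""))`. The `skip` count refers to physical lines, so this only fits when the lines being skipped contain no embedded line breaks.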
