Python DictReader - Skipping rows with missing columns?


Question


I have an Excel .CSV file that I'm attempting to read in with DictReader.

Everything seems fine, except that it appears to omit rows, specifically those with missing columns.

Our input looks like:

mail,givenName,sn,lorem,ipsum,dolor,telephoneNumber
ian.bay@blah.com,ian,bay,3424,8403,2535,+65(2)34523534545
mike.gibson@blah.com,mike,gibson,3424,8403,2535,+65(2)34523534545
ross.martin@blah.com,ross,martin,,,,+65(2)34523534545
david.connor@blah.com,david,connor,,,,+65(2)34523534545
chris.call@blah.com,chris,call,3424,8403,2535,+65(2)34523534545

So some of the rows have missing lorem/ipsum/dolor columns, and it's just a string of commas for those.

We're reading it in with:

import csv

def read_gd_dump(input_file="blah 20100423.csv"):
    with open(input_file, newline='') as f:
        gd_extract = csv.DictReader(f, restval='missing', dialect='excel')
        return dict([(row['something'], row) for row in gd_extract])

I checked that "something" (the key for our dict) isn't one of the missing columns; I had originally suspected it might be. It's one of the columns after those.

However, DictReader seems to skip over those rows completely. I tried setting restval to something, but it didn't seem to make any difference. I can't find anything in Python's CSV docs (http://docs.python.org/library/csv.html) that would explain this behaviour, but I may have misread something.
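
One likely reason restval made no difference: per the csv module documentation, restval is only used when a row has fewer fields than there are fieldnames. A row such as ross.martin@blah.com,ross,martin,,,,+65(2)34523534545 still has all seven fields; the middle ones are simply empty strings. The minimal sketch below contrasts the two cases. The second data row (short.row@blah.com) is hypothetical, added only to show a genuinely short row.

```python
import csv
import io

# Row 1: all seven fields present, three of them empty (as in the question).
# Row 2: hypothetical short row with only three fields.
sample = (
    "mail,givenName,sn,lorem,ipsum,dolor,telephoneNumber\n"
    "ross.martin@blah.com,ross,martin,,,,+65(2)34523534545\n"
    "short.row@blah.com,short,row\n"
)

rows = list(csv.DictReader(io.StringIO(sample), restval="missing"))

# Empty fields are read as empty strings; restval is not applied to them.
print(repr(rows[0]["lorem"]))  # ''
# Fields absent from a short row are filled in with restval.
print(rows[1]["lorem"])  # missing
```

Either way, no row is dropped: both rows come back from the reader.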

Solution

I can't reproduce your problem. When I save that data and then evaluate list(gd_extract), I see:

[{'telephoneNumber': '+65(2)34523534545', 'ipsum': '8403', 'sn': 'bay', 'dolor': '2535', 'mail': 'ian.bay@blah.com', 'givenName': 'ian', 'lorem': '3424'},
 {'telephoneNumber': '+65(2)34523534545', 'ipsum': '8403', 'sn': 'gibson', 'dolor': '2535', 'mail': 'mike.gibson@blah.com', 'givenName': 'mike', 'lorem': '3424'},
 {'telephoneNumber': '+65(2)34523534545', 'ipsum': '', 'sn': 'martin', 'dolor': '', 'mail': 'ross.martin@blah.com', 'givenName': 'ross', 'lorem': ''},
 {'telephoneNumber': '+65(2)34523534545', 'ipsum': '', 'sn': 'connor', 'dolor': '', 'mail': 'david.connor@blah.com', 'givenName': 'david', 'lorem': ''},
 {'telephoneNumber': '+65(2)34523534545', 'ipsum': '8403', 'sn': 'call', 'dolor': '2535', 'mail': 'chris.call@blah.com', 'givenName': 'chris', 'lorem': '3424'}]

Five dicts, including those with missing ipsum, etc. I fear that in your laudable attempt at simplifying the problem you've simplified it excessively, so that your bug has gone away.
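
A self-contained way to repeat that check, assuming the sample data is held in memory (io.StringIO) rather than saved to a file; this is a sketch, not the answerer's exact session:

```python
import csv
import io

# The sample data from the question, kept in memory instead of a file.
SAMPLE = """mail,givenName,sn,lorem,ipsum,dolor,telephoneNumber
ian.bay@blah.com,ian,bay,3424,8403,2535,+65(2)34523534545
mike.gibson@blah.com,mike,gibson,3424,8403,2535,+65(2)34523534545
ross.martin@blah.com,ross,martin,,,,+65(2)34523534545
david.connor@blah.com,david,connor,,,,+65(2)34523534545
chris.call@blah.com,chris,call,3424,8403,2535,+65(2)34523534545
"""

rows = list(csv.DictReader(io.StringIO(SAMPLE)))
print(len(rows))  # 5

# The rows with empty lorem/ipsum/dolor are present, with '' values.
for row in rows:
    print(row["mail"], repr(row["lorem"]))
```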

If you have duplicates in column something (I can't check, since that column isn't in your sample data), that would of course explain the "apparently missing" rows: they're not missing from the csv reader's returned stream, they get "overwritten" in the dict you're returning. Could that be the issue?
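
To illustrate the overwriting effect, the sketch below keys the sample data on telephoneNumber, which happens to be identical in all five rows. That column is only a stand-in, since the real key column ("something") isn't in the sample. The helper names read_keyed, find_duplicate_keys and read_grouped are hypothetical. The last helper shows one alternative: keep every row by mapping each key to a list.

```python
import csv
import io
from collections import Counter, defaultdict

SAMPLE = """mail,givenName,sn,lorem,ipsum,dolor,telephoneNumber
ian.bay@blah.com,ian,bay,3424,8403,2535,+65(2)34523534545
mike.gibson@blah.com,mike,gibson,3424,8403,2535,+65(2)34523534545
ross.martin@blah.com,ross,martin,,,,+65(2)34523534545
david.connor@blah.com,david,connor,,,,+65(2)34523534545
chris.call@blah.com,chris,call,3424,8403,2535,+65(2)34523534545
"""

def read_keyed(text, key):
    """Build {key value: row} as the question does; a later row with the
    same key silently replaces an earlier one."""
    return {row[key]: row for row in csv.DictReader(io.StringIO(text))}

def find_duplicate_keys(text, key):
    """Return each key value that occurs more than once, with its count."""
    counts = Counter(row[key] for row in csv.DictReader(io.StringIO(text)))
    return {value: n for value, n in counts.items() if n > 1}

def read_grouped(text, key):
    """Keep every row by mapping each key value to a list of rows."""
    groups = defaultdict(list)
    for row in csv.DictReader(io.StringIO(text)):
        groups[row[key]].append(row)
    return dict(groups)

keyed = read_keyed(SAMPLE, "telephoneNumber")
print(len(keyed))  # 1: only the last row survives
print(find_duplicate_keys(SAMPLE, "telephoneNumber"))  # {'+65(2)34523534545': 5}

grouped = read_grouped(SAMPLE, "telephoneNumber")
print(len(grouped["+65(2)34523534545"]))  # 5
```

Running find_duplicate_keys on the real file and key column is a quick way to confirm or rule out this explanation.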
