Tab-delimited file using csv.reader not delimiting where I expect it to


Problem description


I am trying to loop through a tab-delimited file of election results using Python. The following code does not work, but when I use a local file with the same results (the commented out line), it does work as expected.

The only thing I can think of is that I need to pass some headers or a content type with the request, but I cannot figure it out.

Why is this happening?

import csv
import requests

r = requests.get('http://vote.wa.gov/results/current/export/MediaResults.txt') 
data = r.text
#data = open('data/MediaResults.txt', 'r')
reader = csv.reader(data, delimiter='\t')
for row in reader:
    print(row)

Results in:

...
['', '']
['', '']
['2']
['3']
['1']
['1']
['8']
['', '']
['D']
['a']
['v']
['i']
['d']
[' ']
['F']
['r']
['a']
['z']
['i']
['e']
['', '']
...

Solution

So what's happening? A call to help may shed some light.

>>> help(csv.reader)
 reader(...)
    csv_reader = reader(iterable [, dialect='excel']
                            [optional keyword args])
        for row in csv_reader:
            process(row)

    The "iterable" argument can be any object that returns a line
    of input for each iteration, such as a file object or a list.  The
    optional "dialect" parameter is discussed below.  The function
    also accepts optional keyword arguments which override settings
    provided by the dialect.

So it appears that csv.reader expects an iterable that returns one line per iteration, but we are passing a string, which iterates on a per-character basis. That is why it is parsing character by character. One way to fix this would be to generate a temp file, but we don't need to; we just need to pass any iterable object that yields lines.
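To see the difference without hitting the network, here is a minimal sketch using a made-up two-field string (the name sample and its contents are just for illustration, not part of the real results file). It shows that a str hands csv.reader one character at a time, while a list of lines gives it whole rows.

```python
import csv

# Hypothetical sample: one line with two tab-separated fields.
sample = "a\tb"

# Iterating a str yields single characters, not lines.
chars = list(sample)

# csv.reader treats each of those characters as its own "line",
# so the tab on its own becomes a row of two empty fields.
per_char_rows = list(csv.reader(sample, delimiter="\t"))

# Splitting into lines first gives the reader real lines to parse.
per_line_rows = list(csv.reader(sample.splitlines(), delimiter="\t"))

print(chars)          # ['a', '\t', 'b']
print(per_char_rows)  # [['a'], ['', ''], ['b']]
print(per_line_rows)  # [['a', 'b']]
```

The ['', ''] rows in the question's output are the tab characters being parsed as lines by themselves.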

Note the following, which simply splits the string into a list of lines before it is fed to the reader.

import csv
import requests

r = requests.get('http://vote.wa.gov/results/current/export/MediaResults.txt') 
data = r.text
reader = csv.reader(data.splitlines(), delimiter='\t')
for row in reader:
    print(row)

This seems to work.
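Another option, not from the original answer but worth a mention, is to wrap the text in an in-memory file object with io.StringIO, so csv.reader gets the same kind of line-by-line iterable it would get from open(). This is only a sketch: the text value below is invented for illustration, and on Python 2 you would have to adapt it. Passing newline="" keeps line endings untouched, which is what the csv docs recommend for file objects and which matters when a quoted field contains a newline.

```python
import csv
import io

# Hypothetical data: the second row has a quoted field with an embedded newline.
text = 'Name\tNote\nAlice\t"line one\nline two"\n'

# StringIO iterates line by line, like a file opened with open().
# newline="" disables newline translation so the csv module sees the
# original line endings.
reader = csv.reader(io.StringIO(text, newline=""), delimiter="\t")
rows = list(reader)

print(rows)
```

For a plain file like the election results, splitlines() is simpler and does the job; the file-object route is mostly useful if the data might contain quoted multi-line fields.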

I also recommend using csv.DictReader; it's quite useful.

>>> reader = csv.DictReader(data.splitlines(), delimiter='\t')
>>> for row in reader:
...      print(row)
{'Votes': '417141', 'BallotName': 'Michael Baumgartner', 'RaceID': '2', 'RaceName': 'U.S. Senator', 'PartyName': '(Prefers Republican Party)', 'TotalBallotsCastByRace': '1387059', 'RaceJurisdictionTypeName': 'Federal', 'BallotID': '23036'}
{'Votes': '15005', 'BallotName': 'Will Baker', 'RaceID': '2', 'RaceName': 'U.S. Senator', 'PartyName': '(Prefers Reform Party)', 'TotalBallotsCastByRace': '1387059', 'RaceJurisdictionTypeName': 'Federal', 'BallotID': '27435'}

Basically, it returns a dictionary for every row, using the header fields as the keys. This way we don't need to keep track of the column order, just the names, which makes things a bit easier for us: row['Votes'] seems more readable than row[4].
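As a small illustration of looking fields up by name, here is a sketch that tallies votes per candidate. The sample string is hand-made: it reuses three of the column names and two rows from the output above, but the column order and the reduced set of columns are assumptions, not the real file layout.

```python
import csv

# Hand-made sample; column order and subset are assumptions for illustration.
sample = (
    "BallotName\tPartyName\tVotes\n"
    "Michael Baumgartner\t(Prefers Republican Party)\t417141\n"
    "Will Baker\t(Prefers Reform Party)\t15005\n"
)

votes_by_name = {}
for row in csv.DictReader(sample.splitlines(), delimiter="\t"):
    # Fields are looked up by header name, so column position does not matter.
    # DictReader returns strings, hence the int() conversion.
    votes_by_name[row["BallotName"]] = int(row["Votes"])

print(votes_by_name)
```

If the columns were reordered in a later export, this code would keep working, whereas index-based access like row[4] would silently pick up the wrong field.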
