Python: split files using multiple split delimiters


Problem description

I have multiple CSV files which I need to parse in a loop to gather information. The problem is that while they all have the same format, some are delimited by '\t' and others by ','. After splitting, I want to remove the double quotes from around each string.

Can Python split on multiple possible delimiters?

At the moment, I can split each line on a single delimiter using:

with open(filename, "r") as f:
    for fs in f:
        # Split on tabs only, then strip the surrounding double quotes.
        sf = fs.rstrip("\n").split('\t')
        tf = [fi.strip('"') for fi in sf]

Any suggestions are welcome.

Solution

Splitting the file like that is not a good idea: once you split on commas as well as tabs, it will fail if there is a comma within one of the fields. For example, in a tab-delimited file, the line "field1"\t"Hello, world"\t"field3" will be split into 4 fields instead of 3.

Instead, you should use the csv module. It contains the helpful Sniffer class, which can detect which delimiter is used in a file. The csv module will also remove the double quotes for you.
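To make the failure concrete, here is a minimal sketch that compares a naive split on both delimiters with the csv module's parsing of the same line. The use of re.split is an assumption about how the "multiple delimiter" split might be written; it is not taken from the question.

```python
import csv
import re

# The example line from the answer: three quoted, tab-separated fields.
line = '"field1"\t"Hello, world"\t"field3"'

# Naive approach (assumed): split on either a tab or a comma, then strip quotes.
naive = [part.strip('"') for part in re.split(r"[\t,]", line)]

# csv.reader with a tab delimiter respects the quoting around each field.
parsed = next(csv.reader([line], delimiter="\t"))

print(naive)   # ['field1', 'Hello', ' world', 'field3']
print(parsed)  # ['field1', 'Hello, world', 'field3']
```

The naive version breaks "Hello, world" in two because it cannot tell a delimiter from a comma inside a quoted field.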

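import csv

with open("example.csv", newline="") as csvfile:
    # Let the Sniffer guess the dialect from the first 1024 characters.
    dialect = csv.Sniffer().sniff(csvfile.read(1024))
    csvfile.seek(0)  # rewind so the reader starts at the first row
    reader = csv.reader(csvfile, dialect)

    for line in reader:
        print(line)  # process line: a list of strings, quotes already removed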
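Since the question involves several files parsed in a loop, the same idea could be wrapped in a small helper. This is only a sketch: the function name read_rows, the candidates parameter, and the fallback to the comma-based csv.excel dialect are illustrative assumptions, not part of the original answer. Sniffer.sniff accepts an optional delimiters argument that restricts the guess, and it raises csv.Error when no delimiter can be determined.

```python
import csv


def read_rows(path, candidates="\t,"):
    """Return all rows of a delimited text file as lists of strings.

    `read_rows` and `candidates` are hypothetical names for this sketch.
    """
    with open(path, newline="") as f:
        sample = f.read(1024)
        f.seek(0)  # rewind so the reader sees the whole file
        try:
            # Only consider the delimiters we expect (tab or comma).
            dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
        except csv.Error:
            # Assumed fallback: treat the file as ordinary comma-separated.
            dialect = csv.excel
        return list(csv.reader(f, dialect))
```

A loop such as `for name in filenames: rows = read_rows(name)` would then handle tab-delimited and comma-delimited files alike. The Sniffer works from heuristics, so very short or irregular samples may still be misdetected.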
