Python: split files using multiple split delimiters
Problem description
I have multiple CSV files that I need to parse in a loop to gather information. The problem is that, while they all share the same format, some are delimited by '\t' and others by ','. After splitting, I want to remove the double quotes from around each string.
Can Python split on multiple possible delimiters?
At the moment, I can split a line on a single delimiter using:
f = open(filename, "r")
fields = f.readlines()
for fs in fields:
    sf = fs.split('\t')
    tf = [fi.strip('"') for fi in sf]
Any suggestions are welcome.
Splitting the file by hand like that is not a good idea: once you split on both tabs and commas, it will fail whenever there is a comma within one of the fields. For example, in a tab-delimited file, the line "field1"\t"Hello, world"\t"field3"
will be split into 4 fields instead of 3.
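To make the failure concrete, here is a minimal sketch that compares a naive split on both delimiters with csv.reader for that same example line. The use of re.split for the naive version is an assumption for illustration; the question does not say how the two delimiters would be combined.

```python
import csv
import io
import re

# The example line from above: tab-delimited, with a comma inside a quoted field.
line = '"field1"\t"Hello, world"\t"field3"'

# Naive approach (illustrative assumption): split on either a tab or a comma,
# then strip the surrounding double quotes from each piece.
naive = [part.strip('"') for part in re.split(r"[\t,]", line)]

# csv.reader respects the quoting, so the comma inside the quotes is preserved.
parsed = next(csv.reader(io.StringIO(line), delimiter="\t"))

print(len(naive), naive)
print(len(parsed), parsed)
```

The naive version breaks "Hello, world" in two, while csv.reader returns the three intended fields with the quotes already removed.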
Instead, you should use the csv module. It contains the helpful Sniffer class, which can detect which delimiter is used in a file. The csv module will also remove the double quotes for you.
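import csv

# newline="" lets the csv module handle line endings itself
with open("example.csv", newline="") as csvfile:
    # Guess the dialect (delimiter, quoting) from the first 1024 characters
    dialect = csv.Sniffer().sniff(csvfile.read(1024))
    # Rewind so the reader starts from the beginning of the file
    csvfile.seek(0)
    reader = csv.reader(csvfile, dialect)
    for line in reader:
        # process line (a list of field strings, quotes already removed)
        pass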
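Since the question involves several files in a loop, the same idea can be wrapped in a small helper. This is only a sketch: read_rows and the file names in the usage note are hypothetical, and restricting the Sniffer to comma and tab (with a fallback to the default comma dialect when detection fails) is an assumption based on the two delimiters mentioned in the question.

```python
import csv


def read_rows(path, candidates=",\t"):
    """Return all rows of a delimited file, guessing its delimiter.

    `read_rows` and `candidates` are hypothetical names for illustration.
    """
    with open(path, newline="") as handle:
        sample = handle.read(1024)
        handle.seek(0)
        try:
            # Only consider the delimiters we expect (comma or tab).
            dialect = csv.Sniffer().sniff(sample, delimiters=candidates)
        except csv.Error:
            # The Sniffer could not decide; assume a plain comma-separated file.
            dialect = csv.excel
        return list(csv.reader(handle, dialect))
```

It could then be called once per file, for example `for name in ["a.csv", "b.csv"]: rows = read_rows(name)`. Sniffing works from a sample, so very short or unusual files may still be misdetected.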