Keep double quotes in a text file using csv reader

Question

Hi, I have a text file containing this string:

hello,"foo, bar"

I want to split it into a list like this:

['hello', '"foo, bar"']

Is there a way I can achieve this?

This is what I am trying so far:

import csv
import sys
import StringIO

for line in sys.stdin:
    # Wrap each input line in a file-like object so csv.reader can parse it
    csv_file = StringIO.StringIO(line)
    csv_reader = csv.reader(csv_file)

I want the line split into two strings, i.e.:

'hello' and '"foo, bar"'

Recommended answer

Say you read a row from a CSV:

from StringIO import StringIO
import csv

infile = StringIO('hello,"foo, bar"')
reader = csv.reader(infile)
row = reader.next()  # row is ['hello', 'foo, bar']

The second value in the row is foo, bar rather than "foo, bar". This isn't a Python oddity; it's a reasonable interpretation of CSV syntax. The quotes probably weren't placed there to be part of the value, but to show that foo, bar is a single value that shouldn't be split into foo and bar at the comma (,). An alternative would be to escape the comma when creating the CSV file, so that the line looks like:

hello,foo \,bar
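If the file were written that way, the csv module could read it back once quote handling is switched off and the escape character is declared. Here is a minimal sketch in Python 3 (the answer's own snippets are Python 2), assuming the backslash is the escape character. The name escaped_row is only for illustration.

```python
import csv
from io import StringIO

# Sample line with the comma escaped by a backslash instead of quoted.
infile = StringIO('hello,foo \\,bar\n')

# QUOTE_NONE: double quotes get no special treatment.
# escapechar: the character following a backslash loses its special meaning,
# so the escaped comma stays inside the field.
reader = csv.reader(infile, quoting=csv.QUOTE_NONE, escapechar='\\')
escaped_row = next(reader)
print(escaped_row)
```

The backslash itself is consumed by the reader, so the second field comes back with the comma but without the escape character.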

So wanting to keep those quotes is quite a strange request. If we knew more about your use case and the bigger picture, we could help you better. What are you trying to achieve? Where does the input file come from? Is it really CSV, or some other syntax that looks similar? For example, if you know that every line consists of two values separated by a comma, and that the first value never contains a comma, then you can just split on the first comma:

print 'hello,"foo, bar"'.split(',', 1)  # => ['hello', '"foo, bar"']
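The same idea can be wrapped in a small helper for lines read from a file or stdin. This is only a sketch in Python 3, under the same assumption that the first value never contains a comma. The function name split_on_first_comma is hypothetical, and str.partition is used here in place of split(',', 1) so that a line without any comma is handled explicitly.

```python
def split_on_first_comma(line):
    """Split a line into at most two fields at the first comma."""
    # Drop the trailing newline that lines read from a file carry.
    head, sep, tail = line.rstrip('\r\n').partition(',')
    # If there is no comma at all, return the whole line as a single field.
    return [head, tail] if sep else [head]

print(split_on_first_comma('hello,"foo, bar"\n'))
```

Because the line is never parsed as CSV, the double quotes survive untouched in the second field.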

But I doubt the input has such restrictions, which is why things like quotes are needed to resolve ambiguities.

If you're trying to write the row back out as CSV, the quotes will be recreated when you do so. They don't have to be present in the intermediate list:

outfile = StringIO()
writer = csv.writer(outfile)
writer.writerow(row)
print outfile.getvalue()

This prints:

hello,"foo, bar"
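For readers on Python 3, the read-then-write round trip might look like the sketch below. It assumes io.StringIO in place of the Python 2 StringIO module, and the names parsed and round_tripped are illustrative only.

```python
import csv
from io import StringIO

# Parse: the reader strips the quotes and keeps "foo, bar" as one value.
parsed = next(csv.reader(StringIO('hello,"foo, bar"')))

# Write: the writer adds the quotes back because the value contains a comma.
out = StringIO()
csv.writer(out).writerow(parsed)
round_tripped = out.getvalue()

print(parsed)
print(round_tripped, end='')
```

Note that the writer appends its default line terminator (a carriage return and line feed) to the output.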

You can customise the exact CSV output by defining a new dialect.
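As one possible illustration in Python 3, a dialect can force quotes around every field instead of only those that need them. The dialect name always_quoted is made up for this sketch; csv.QUOTE_ALL and lineterminator are standard csv formatting parameters.

```python
import csv
from io import StringIO

# Hypothetical dialect: quote every field and end rows with a bare newline.
csv.register_dialect('always_quoted', quoting=csv.QUOTE_ALL, lineterminator='\n')

out = StringIO()
csv.writer(out, 'always_quoted').writerow(['hello', 'foo, bar'])
all_quoted = out.getvalue()
print(all_quoted, end='')
```

With this dialect even hello is written in quotes, which may or may not be what the consumer of the file expects.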

If you want each individual value in the row with the appropriate quoting rules applied to it, that's possible, but it's a bit of a hack:

# We're going to write individual strings, so we don't want a line terminator
csv.register_dialect('no_line_terminator', lineterminator='')

def maybe_quote_string(s):
    out = StringIO()

    # writerow iterates over its argument, so don't give it a plain string
    # or it'll break it up into characters
    csv.writer(out, 'no_line_terminator').writerow([s])

    return out.getvalue()

print maybe_quote_string('foo, bar')
print map(maybe_quote_string, row)

The output is:

"foo, bar"
['hello', '"foo, bar"']
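A Python 3 port of the same hack could look like this. It is a sketch rather than a definitive version: it passes lineterminator='' directly to csv.writer instead of registering a dialect, and it uses a list comprehension where the Python 2 code used map.

```python
import csv
from io import StringIO

def maybe_quote_string(s):
    """Return s formatted as a single CSV field, quoted only if needed."""
    out = StringIO()
    # Wrap s in a list so writerow treats it as one field, and suppress
    # the line terminator because only the field text is wanted.
    csv.writer(out, lineterminator='').writerow([s])
    return out.getvalue()

quoted = [maybe_quote_string(value) for value in ['hello', 'foo, bar']]
print(quoted)
```

Values containing a double quote are also wrapped in quotes, with the embedded quotes doubled, so the result is not always the original text with quotes around it.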

This is the closest I can come to answering your question. It doesn't really keep the double quotes; it removes them and adds them back, likely following the same rules that put them there in the first place.

I'll say it again: you're probably headed down the wrong path with this question. Others will probably agree, which is why you're struggling to get good answers. What is the bigger problem you're trying to solve? We can better help you achieve that.
