CSV writing strings of text that need a unique delimiter

Problem description

I wrote an HTML parser in Python to extract data so that it looks like this in a CSV file:

    itemA, itemB, itemC, Sentence that might contain commas, or colons: like this,\n

So I used the delimiter ":::::", thinking it would never appear in the data:

    itemA, itemB, itemC, ::::: Sentence that might contain commas, or colons: like this,::::\n

This works for most of the thousands of lines, but apparently a colon (:) threw the columns off when I imported the CSV into Calc.

My question is: what is the best delimiter, or at least a unique one, to use when creating a CSV with many different sentences that need to be separated by some delimiter? And am I understanding delimiters correctly, in that they separate the values within a CSV?
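To make the question concrete, here is a small illustrative sketch (not the asker's parser) based on the sample line above, with the trailing comma dropped. It shows that a delimiter marks every field boundary, so a delimiter that also occurs inside a value shifts the columns. It also shows that Python's csv module only accepts a single-character delimiter, so a multi-character marker such as ":::::" cannot serve as a real field separator there.

```python
import csv
import io

# Sample line modeled on the question, with ", " between values.
line = "itemA, itemB, itemC, Sentence that might contain commas, or colons: like this"

# A naive split treats every ", " as a boundary, including the one inside
# the sentence, so the row comes back with one field too many.
fields = line.split(", ")
print(len(fields))  # 5, although only 4 values were intended

# The csv module requires the delimiter to be a 1-character string;
# anything longer is rejected when the writer is created.
try:
    csv.writer(io.StringIO(), delimiter=":::::")
except TypeError as exc:
    print("rejected:", exc)
```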

Recommended answer

As I suggested informally in a comment, "unique" just means you need to use some character that won't appear in the data; chr(255) might be a good choice. For example:

Note: The code shown is for Python 2.x; see the comments for a Python 3 version.

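    import csv

    # A character that should never occur in the data: in Python 2,
    # chr(255) is the single byte '\xff'.
    DELIMITER = chr(255)
    data = ["itemA", "itemB", "itemC",
            "Sentence that might contain commas, colons: or even \"quotes\"."]

    # Python 2's csv module expects files opened in binary mode.
    with open('data.csv', 'wb') as outfile:
        writer = csv.writer(outfile, delimiter=DELIMITER)
        writer.writerow(data)

    # Read the row back using the same delimiter.
    with open('data.csv', 'rb') as infile:
        reader = csv.reader(infile, delimiter=DELIMITER)
        for row in reader:
            print row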

Output:

['itemA', 'itemB', 'itemC', 'Sentence that might contain commas, colons: or even "quotes".']
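For Python 3, the same round trip might look roughly like the sketch below. It assumes UTF-8 as the file encoding: in Python 3, chr(255) is the text character 'ÿ' rather than a raw byte, so the file needs an encoding that can represent it. It also opens the file with newline='', as the csv module documentation recommends, so the module handles line endings itself.

```python
import csv

# In Python 3 this is the character 'ÿ'; assumed never to occur in the data.
DELIMITER = chr(255)
data = ["itemA", "itemB", "itemC",
        "Sentence that might contain commas, colons: or even \"quotes\"."]

# Text mode with newline='' and an explicit encoding (UTF-8 is an assumption).
with open('data.csv', 'w', newline='', encoding='utf-8') as outfile:
    writer = csv.writer(outfile, delimiter=DELIMITER)
    writer.writerow(data)

with open('data.csv', newline='', encoding='utf-8') as infile:
    reader = csv.reader(infile, delimiter=DELIMITER)
    for row in reader:
        print(row)
```

The printed row should match the Python 2 output above, because the writer still quotes the field containing double quotes and the reader undoes that quoting.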

If you're not using the csv module and instead are writing and/or reading the data manually, then it would go something like this:

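    # Join the values by hand; nothing is quoted, so no value may contain
    # DELIMITER (or a newline).
    with open('data.csv', 'wb') as outfile:
        outfile.write(DELIMITER.join(data) + '\n')

    # Strip only the line ending, so trailing spaces in the last field survive.
    with open('data.csv', 'rb') as infile:
        row = infile.readline().rstrip('\r\n').split(DELIMITER)
        print row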
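Because the manual approach has no quoting, it only works if the delimiter really never appears in a value. The Python 3 sketch below makes that assumption explicit. The helpers write_row and read_rows and the file name data.txt are hypothetical, chosen for illustration: the writer refuses any value that contains the delimiter or a newline instead of silently producing a shifted row.

```python
DELIMITER = chr(255)  # assumed never to occur in the data


def write_row(outfile, values):
    """Hypothetical helper: join values manually, refusing any value that
    contains the delimiter or a newline, since nothing here is quoted."""
    for value in values:
        if DELIMITER in value or '\n' in value:
            raise ValueError("value contains the delimiter or a newline: %r" % (value,))
    outfile.write(DELIMITER.join(values) + '\n')


def read_rows(infile):
    """Hypothetical helper: split each line back into its fields."""
    for line in infile:
        yield line.rstrip('\n').split(DELIMITER)


data = ["itemA", "itemB", "itemC",
        "Sentence that might contain commas, colons: or even \"quotes\"."]

with open('data.txt', 'w', encoding='utf-8') as outfile:
    write_row(outfile, data)

with open('data.txt', encoding='utf-8') as infile:
    for row in read_rows(infile):
        print(row)
```

Failing loudly on a bad value is a design choice. The csv module approach avoids the problem differently, by quoting fields instead of rejecting them.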
