Read data from CSV file and transform from string to correct data-type, including a list-of-integer column

Problem description

When I read data back in from a CSV file, every cell is interpreted as a string.

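  • How can I automatically convert the data I read in to the correct type?
  • Or better: how can I tell the csv reader the correct data type of each column?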

(I wrote a 2-dimensional list, where each column is of a different type (bool, str, int, list of integers), out to a CSV file.)

Sample data (in CSV file):

IsActive,Type,Price,States
True,Cellphone,34,"[1, 2]"
,FlatTv,3.5,[2]
False,Screen,100.23,"[5, 1]"
True,Notebook, 50,[1]

Recommended answer

As the docs explain, the CSV reader doesn't perform automatic data conversion. There is the QUOTE_NONNUMERIC format option, but that only converts all non-quoted fields into floats. Other CSV readers behave very similarly.
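To illustrate the QUOTE_NONNUMERIC behaviour described above, here is a minimal sketch. The in-memory sample line is made up for the illustration and is not the data from the question.

```python
import csv
import io

# Hypothetical one-line sample: one quoted field, two unquoted numeric fields.
sample = io.StringIO('"Cellphone",34,3.5\n')

# With QUOTE_NONNUMERIC the reader turns every unquoted field into a float.
reader = csv.reader(sample, quoting=csv.QUOTE_NONNUMERIC)
row = next(reader)
print(row)  # ['Cellphone', 34.0, 3.5]
```

Note that 34 comes back as the float 34.0 rather than an int, and an unquoted field that is not numeric (such as True or [2]) would raise a ValueError, which is why this option does not cover the mixed columns in the question.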

I don't believe Python's csv module is of any help for this case. As others have already pointed out, ast.literal_eval() is a far better choice.

The following does work and converts:

  • strings
  • integers
  • floats
  • lists
  • dictionaries
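As a quick check of the types listed above, the sketch below runs ast.literal_eval() over a few hypothetical cell values. It assumes each cell is written as a Python literal, so a string cell must carry its own quotes.

```python
import ast

# Hypothetical cell contents, written the way Python literals are written.
cells = ["'Cellphone'", "34", "3.5", "[1, 2]", "{'a': 1}"]

converted = [ast.literal_eval(cell) for cell in cells]
types = [type(value).__name__ for value in converted]
print(types)  # ['str', 'int', 'float', 'list', 'dict']
```

A bare word such as Cellphone (no quotes) is not a literal, so literal_eval() raises a ValueError for it.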

You may also use it for booleans and NoneType, although these have to be formatted accordingly for literal_eval() to accept them. LibreOffice Calc displays booleans in all capital letters (TRUE, FALSE), whereas Python's boolean literals are only capitalized (True, False). Also, you would have to replace empty strings with None (without quotes).
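One possible way to do that formatting before calling literal_eval() is sketched below. The helper name normalize is hypothetical, and the sketch assumes booleans only ever appear as the words true/false in some capitalization.

```python
import ast


def normalize(cell):
    """Rewrite a spreadsheet-style cell into Python literal syntax (hypothetical helper)."""
    text = cell.strip()
    if text == "":
        return "None"             # empty cell becomes the None literal
    if text.upper() in ("TRUE", "FALSE"):
        return text.capitalize()  # TRUE -> True, false -> False
    return text


values = [ast.literal_eval(normalize(c)) for c in ["TRUE", "false", "", " 50"]]
print(values)  # [True, False, None, 50]
```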

I'm writing an importer for mongodb that does all this. The following is part of the code I've written so far.

[NOTE: My csv uses tab as field delimiter. You may want to add some exception handling too]

import ast
from itertools import islice


def getFieldnames(csvFile):
    """
    Read the first row and store values in a tuple
    """
    with open(csvFile) as f:
        firstRow = f.readline()
    # Drop the trailing newline, then split on the tab delimiter
    fieldnames = tuple(firstRow.strip('\n').split('\t'))
    return fieldnames


def writeCursor(csvFile, fieldnames):
    """
    Convert csv rows into an array of dictionaries
    All data types are automatically checked and converted
    """
    cursor = []  # Placeholder for the dictionaries/documents
    with open(csvFile) as f:
        # Skip the header row; every remaining line is one record
        for row in islice(f, 1, None):
            values = row.strip('\n').split('\t')
            for i, value in enumerate(values):
                # literal_eval raises ValueError/SyntaxError for cells
                # that are not valid Python literals
                values[i] = ast.literal_eval(value)
            cursor.append(dict(zip(fieldnames, values)))
    return cursor
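
For the comma-separated sample in the question, a variant with the exception handling mentioned in the note might look like the sketch below. This is not part of the original importer: the names convert and read_typed are hypothetical, the csv module is used only to split the quoted fields, and any cell that is not a valid Python literal is assumed to be a plain string.

```python
import ast
import csv
import io

# The sample data from the question, held in memory for the illustration.
SAMPLE = '''IsActive,Type,Price,States
True,Cellphone,34,"[1, 2]"
,FlatTv,3.5,[2]
False,Screen,100.23,"[5, 1]"
True,Notebook, 50,[1]
'''


def convert(cell):
    """Convert one cell; fall back to the stripped string (hypothetical helper)."""
    text = cell.strip()
    if text == "":
        return None
    try:
        return ast.literal_eval(text)
    except (ValueError, SyntaxError):
        return text  # not a Python literal, keep it as a string


def read_typed(stream):
    """Return one dict per data row, with every cell passed through convert()."""
    reader = csv.DictReader(stream)
    return [{key: convert(value) for key, value in row.items()} for row in reader]


rows = read_typed(io.StringIO(SAMPLE))
print(rows[0])  # {'IsActive': True, 'Type': 'Cellphone', 'Price': 34, 'States': [1, 2]}
```

The fallback keeps the sketch short, but it also means a malformed list or number silently stays a string, so stricter per-column checks may be preferable in a real importer.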
