Changing strings to Floats in an imported .csv

Question

Quick question for an issue I haven't managed to solve quickly:

I'm working with a .csv file and can't seem to find a simple way to convert strings to floats. Here's my code:

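import csv

def readLines():
    # Python 2 code, as posted in the question
    with open('testdata.csv', 'rU') as data:
        reader = csv.reader(data)
        row = list(reader)  # list of rows; each row is a list of strings
        for x in row:
            for y in x:
                # float(y) is only printed; the string stored in x is never replaced
                print type(float(y)),
readLines()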

As you can see, it currently prints the type of every element y in each list x of the variable row; this produces a long list of "<type 'float'>". But this doesn't actually change any element to a float, and neither does making the for loop simply execute float(y) (a type test afterwards still reports a string for each element).

I also tried literal_eval, but that failed as well. The only way I've found to change the list elements to floats is to create a new list, either with a list comprehension or manually, but that loses the original structure of the data (lists of a fixed number of elements inside one larger list).
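For what it's worth, a nested list comprehension does keep the list-of-lists shape: the outer comprehension produces one list per row and the inner one converts each field. A minimal sketch in Python 3, assuming every field is numeric; the sample data is hypothetical and `io.StringIO` stands in for testdata.csv.

```python
import csv
import io

# Hypothetical sample standing in for testdata.csv; every field is numeric.
SAMPLE = "1,2.5,3\n4,5,6.25\n"

def read_floats(f):
    """Return the CSV rows as a list of lists of floats, one inner list per row."""
    reader = csv.reader(f)
    # Outer comprehension iterates over rows, inner one converts each field,
    # so the row structure of the file is preserved.
    return [[float(field) for field in row] for row in reader]

rows = read_floats(io.StringIO(SAMPLE))
print(rows)  # [[1.0, 2.5, 3.0], [4.0, 5.0, 6.25]]
```

With a real file, the same function would be called as `with open('testdata.csv', newline='') as f: rows = read_floats(f)`.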

I suppose the overall question is really just "What's the easiest way to read, organize, and synthesize data in .csv or excel format using Python?"

Thanks in advance to those courteous/knowledgeable enough to help.

Answer

You are correct that the built-in csv module is very primitive at handling mixed data types: it does all its type conversion at import time, and even there it offers a very restrictive menu of options, which will mangle most real-world datasets (inconsistent quoting and escaping; missing or incomplete values in Booleans and factors; mismatched Unicode encodings producing phantom quote or escape characters inside fields; incomplete lines that raise exceptions). Fixing csv import is one of the countless benefits of pandas. So your ultimate answer is indeed to stop using the built-in csv import and start using pandas. But let's start with the literal answer to your question.

First you asked "How to convert strings to floats, on csv import". The answer to that is to open the reader as csv.reader(..., quoting=csv.QUOTE_NONNUMERIC), as per the csv docs:


csv.QUOTE_NONNUMERIC: Instructs the reader to convert all non-quoted fields to type float.
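A minimal Python 3 illustration of that option, using a hypothetical sample in which text fields are quoted and numbers are not. Note that the unquoted integers also come back as floats.

```python
import csv
import io

# Hypothetical sample: text fields are quoted, numeric fields are not.
SAMPLE = '"a",1,2.5\n"b",3,4\n'

# QUOTE_NONNUMERIC makes the reader convert every unquoted field to float.
reader = csv.reader(io.StringIO(SAMPLE), quoting=csv.QUOTE_NONNUMERIC)
rows = list(reader)
print(rows)  # [['a', 1.0, 2.5], ['b', 3.0, 4.0]]
```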

That works if you're OK with all unquoted fields (integer, float, text, Boolean, etc.) being converted to float, which is generally a bad idea for many reasons (missing or NA values in Booleans or factors will get silently squelched). Moreover, it will obviously fail (raise an exception) on unquoted text fields. So it's brittle and needs to be protected with try...except.
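One way to add that protection, sketched in Python 3 under the assumption that rejecting the whole file is acceptable: an unquoted field that is not a number makes the reader raise ValueError, which is caught here. The helper name `read_nonnumeric` and the sample strings are hypothetical.

```python
import csv
import io

def read_nonnumeric(f):
    """Read CSV with QUOTE_NONNUMERIC; return (rows, error) instead of crashing."""
    reader = csv.reader(f, quoting=csv.QUOTE_NONNUMERIC)
    try:
        return list(reader), None
    except ValueError as exc:
        # Raised when an unquoted field cannot be parsed as a float,
        # e.g. an unquoted text value such as: b
        return None, exc

good, err1 = read_nonnumeric(io.StringIO('"a",1\n"b",2\n'))
bad, err2 = read_nonnumeric(io.StringIO('"a",1\nb,2\n'))
print(good, err1)                # [['a', 1.0], ['b', 2.0]] None
print(bad, type(err2).__name__)  # None ValueError
```

This only reports the failure; it cannot recover the offending row, which is part of why the option is brittle.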

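Then you asked: 'I suppose the overall question is really just "What's the easiest way to read, organize, and synthesize data in .csv or excel format using Python?"'
The crude csv.reader solution is to open csv.reader(..., quoting=csv.QUOTE_NONNUMERIC).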

But as @geoffspear correctly replied 'The answer to your "overall question" may be "Pandas", although it's a bit vague.'
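For comparison, a minimal pandas sketch (assuming pandas is installed; the sample data is hypothetical). `read_csv` infers a dtype per column, so numeric columns come back as int64 or float64, text stays text, and an empty field becomes NaN instead of raising.

```python
import io

import pandas as pd  # third-party: pip install pandas

# Hypothetical sample: one text column, two numeric columns, one missing value.
SAMPLE = "name,x,y\na,1,2.5\nb,3,\n"

df = pd.read_csv(io.StringIO(SAMPLE))
print(df.dtypes)       # per-column types inferred at import
print(df["y"].mean())  # NaN is skipped by default, so this is 2.5
```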
