Changing strings to floats in an imported .csv

Problem description

Quick question for an issue I haven't managed to solve quickly:

I'm working with a .csv file and can't seem to find a simple way to convert strings to floats. Here's my code,

import csv

def readLines():
    with open('testdata.csv', 'rU') as data:
        reader = csv.reader(data)
        row = list(reader)
        for x in row:
            for y in x:
                print type(float(y)),
readLines()

As you can see, it currently prints the type of every element y of each list x in the variable row; this produces a long list of "<type 'float'>". But this doesn't actually change each element to a float, and neither does having the for loop simply execute float(y) (a type test afterwards still returns 'string' for each element).

I also tried literal_eval, but that failed as well. The only way to change the list elements to floats is to create a new list, either with list comprehension or manually, but that loses the original formatting of each list (as lists of a set amount of elements within one larger list).

I suppose the overall question is really just "What's the easiest way to read, organize, and synthesize data in .csv or excel format using Python?"

Thanks in advance to those courteous/knowledgeable enough to help.

Solution

You are correct that Python's builtin csv module is very primitive at handling mixed data types: it does all its type conversion at import time, and even then offers a very restrictive menu of options that will mangle most real-world datasets (inconsistent quoting and escaping, missing or incomplete values in Booleans and factors, mismatched Unicode encoding resulting in phantom quote or escape characters inside fields, incomplete lines that cause exceptions). Fixing csv import is one of the countless benefits of pandas. So your ultimate answer is indeed to stop using the builtin csv import and start using pandas. But let's start with the literal answer to your question.

First you asked "How to convert strings to floats, on csv import". The answer to that is to open the reader with csv.reader(..., quoting=csv.QUOTE_NONNUMERIC), as per the csv doc:

csv.QUOTE_NONNUMERIC: Instructs the reader to convert all non-quoted fields to type float.

That works if you're OK with all unquoted fields (integer, float, text, Boolean etc.) being converted to float, which is generally a bad idea for many reasons (missing or NA values in Booleans or factors will get silently squelched). Moreover, it will obviously fail (raise a ValueError) on unquoted text fields. So it's brittle and needs to be protected with try...except.
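
A minimal sketch of the QUOTE_NONNUMERIC approach, written for Python 3 rather than the Python 2 of the question. The sample data and the helper name read_numeric_rows are made up for illustration; the sketch assumes text fields in the file are quoted and numeric fields are not.

```python
import csv
import io

# Hypothetical sample data: quoted fields are text, unquoted fields are numbers.
SAMPLE = '"name","x","y"\n"a",1,2.5\n"b",3,4\n'


def read_numeric_rows(text):
    """Parse CSV text, converting every unquoted field to float."""
    reader = csv.reader(io.StringIO(text), quoting=csv.QUOTE_NONNUMERIC)
    try:
        return list(reader)
    except ValueError as exc:
        # Raised when an unquoted field is not a valid float, e.g. bare text.
        raise ValueError("unquoted non-numeric field in CSV: %s" % exc) from exc


rows = read_numeric_rows(SAMPLE)
print(rows)
```

Each row stays its own list, so the list-of-lists layout from the question is preserved; only the unquoted cells change type.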

Then you asked: 'I suppose the overall question is really just "What's the easiest way to read, organize, and synthesize data in .csv or excel format using Python?"' With the builtin module, the crappy csv.reader answer is the same as above: open the file with csv.reader(..., quoting=csv.QUOTE_NONNUMERIC).
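
If the quoting in the file can't be relied on, one possible fallback while staying with csv.reader is to convert cell by cell. This is only a sketch, not part of the original answer: the helper names to_float_or_keep and read_rows and the sample data are hypothetical. It also shows why calling float(y) inside the loop changed nothing: float() returns a new value, which has to be stored somewhere. A nested list comprehension stores it while keeping one inner list per row.

```python
import csv
import io

# Hypothetical stand-in for testdata.csv; every field arrives as a string.
SAMPLE = "1,2.5,abc\n4,,6e2\n"


def to_float_or_keep(cell):
    """Return float(cell) when it parses, otherwise the original string."""
    try:
        return float(cell)
    except ValueError:
        return cell


def read_rows(text):
    reader = csv.reader(io.StringIO(text))
    # One inner list per CSV line, so the row structure is preserved.
    return [[to_float_or_keep(cell) for cell in row] for row in reader]


rows = read_rows(SAMPLE)
print(rows)
```

Cells that don't parse (text, empty fields) are left as strings here; whether that is the right policy depends on the data.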

But as @geoffspear correctly replied 'The answer to your "overall question" may be "Pandas", although it's a bit vague.'
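
For comparison, a rough sketch of what the pandas route could look like, assuming pandas is installed. The sample data and column names are invented for illustration and are not from the question's testdata.csv.

```python
import io

import pandas as pd

# Hypothetical stand-in for testdata.csv, with one missing value.
SAMPLE = "name,x,y\na,1,2.5\nb,3,\n"

# read_csv infers a dtype per column; the empty cell becomes NaN.
df = pd.read_csv(io.StringIO(SAMPLE))
print(df.dtypes)

# Convert the numeric columns to float explicitly.
values = df[["x", "y"]].astype(float)

# If a plain list of lists is still wanted, one inner list per row:
nested = values.to_numpy().tolist()
print(nested)
```

Missing values show up as NaN instead of raising, which is one of the cases the csv.reader approach handles poorly.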
