Pandas read scientific notation and change

Problem description

I have a dataframe in pandas that I'm reading in from a CSV.

One of my columns has values that include NaN, floats, and scientific notation, e.g. 5.3e-23.

My trouble is that when I read in the CSV, pandas treats these data as object dtype rather than the float32 it should be. I guess this is because it thinks the scientific notation entries are strings.
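
The scientific-notation guess can be checked directly. A minimal sketch, using a hypothetical one-column CSV held in memory with io.StringIO: pandas' CSV parser reads a value like 9.49242000872744e-05 as an ordinary float, and it takes only one non-numeric token (here the made-up string "bad") to force the whole column to object dtype.

```python
import io

import pandas as pd

# Hypothetical column mixing integers, NaN, plain floats and scientific notation.
clean = "speed\n0\nNaN\n0.14\n9.49242000872744e-05\n"
clean_df = pd.read_csv(io.StringIO(clean))
print(clean_df["speed"].dtype)  # float64

# Appending one made-up non-numeric token turns the whole column into object dtype.
dirty = clean + "bad\n"
dirty_df = pd.read_csv(io.StringIO(dirty))
print(dirty_df["speed"].dtype)  # object
```

So an object dtype usually points at some entry that is not a number at all, rather than at the scientific notation itself.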

I've tried to convert the dtype using df['speed'].astype(float) after it's been read in, and tried to specify the dtype as it's being read in using df = pd.read_csv('path/test.csv', dtype={'speed': np.float64}, na_values=['n/a']). This throws the error ValueError: cannot safely convert passed user dtype of <f4 for object dtyped data in column ...

So far neither of these methods has worked. Am I missing something that is an incredibly easy fix?

This question seems to suggest I can specify known numbers that might throw an error, but I'd prefer to convert the scientific notation back to a float if possible.
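
One way to convert the scientific-notation strings back to floats, while also finding out what is blocking the conversion, is pd.to_numeric with errors="coerce". This is a sketch under assumptions: the Series below is a made-up stand-in for the object-dtype speed column, and "bad" is a hypothetical unparseable entry. Anything that cannot be parsed becomes NaN, so comparing against the values that were already missing isolates the culprits.

```python
import numpy as np
import pandas as pd

# Made-up object column, as read_csv might leave it: numeric strings
# (including scientific notation), a genuine NaN, and one non-numeric token.
raw = pd.Series(["0.14", np.nan, "9.49242000872744e-05", "bad"], dtype=object)

# errors="coerce" turns anything unparseable into NaN instead of raising.
speed = pd.to_numeric(raw, errors="coerce")

# Entries that were present in the raw data but became NaN are the culprits.
culprits = raw[speed.isna() & raw.notna()]
print(culprits.tolist())  # ['bad']
print(speed.dtype)        # float64
```

Coercing silently discards the offending values, so it is worth inspecting the culprits before deciding whether NaN is the right replacement.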

EDITED TO SHOW DATA FROM CSV AS REQUESTED IN COMMENTS

7425616,12375,28,2015-08-09 11:07:56,0,-8.18644,118.21463,2,0,2
7425615,12375,28,2015-08-09 11:04:15,0,-8.18644,118.21463,2,NaN,2
7425617,12375,28,2015-08-09 11:09:38,0,-8.18644,118.2145,2,0.14,2
7425592,12375,28,2015-08-09 10:36:34,0,-8.18663,118.2157,2,0.05,2
65999,1021,29,2015-01-30 21:43:26,0,-8.36728,118.29235,1,0.206836151554794,2
204958,1160,30,2015-02-03 17:53:37,2,-8.36247,118.28664,1,9.49242000872744e-05,7
384739,,32,2015-01-14 16:07:02,1,-8.36778,118.29206,2,Infinity,4
275929,1160,30,2015-02-17 03:13:51,1,-8.36248,118.28656,1,113.318511172611,5


Recommended answer

I realised it was the Infinity entry in my data that was causing the issue. Removing it with a find and replace worked.
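
The same clean-up can be sketched at read time instead of editing the file. This assumes that treating Infinity as missing is acceptable (the answer removed it, which has a similar effect) and that the ninth field (index 8) is the speed column, since the excerpt above has no header row. Strings passed in na_values are added to pandas' default NA set, so the existing NaN entries are still recognised. Recent pandas versions may already read Infinity as inf; listing it in na_values makes the choice explicit.

```python
import io

import pandas as pd

# The sample rows from the question (the excerpt has no header row).
data = (
    "7425616,12375,28,2015-08-09 11:07:56,0,-8.18644,118.21463,2,0,2\n"
    "7425615,12375,28,2015-08-09 11:04:15,0,-8.18644,118.21463,2,NaN,2\n"
    "7425617,12375,28,2015-08-09 11:09:38,0,-8.18644,118.2145,2,0.14,2\n"
    "7425592,12375,28,2015-08-09 10:36:34,0,-8.18663,118.2157,2,0.05,2\n"
    "65999,1021,29,2015-01-30 21:43:26,0,-8.36728,118.29235,1,0.206836151554794,2\n"
    "204958,1160,30,2015-02-03 17:53:37,2,-8.36247,118.28664,1,9.49242000872744e-05,7\n"
    "384739,,32,2015-01-14 16:07:02,1,-8.36778,118.29206,2,Infinity,4\n"
    "275929,1160,30,2015-02-17 03:13:51,1,-8.36248,118.28656,1,113.318511172611,5\n"
)

# Treat the literal "Infinity" as missing, on top of pandas' default NA strings.
df = pd.read_csv(io.StringIO(data), header=None, na_values=["Infinity"])
print(df[8].dtype)  # float64
```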

@Anton Protopopov's answer also works, as did @DSM's comment pointing out that I wasn't assigning the result back, i.e. I needed df['speed'] = df['speed'].astype(float).
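
The point in @DSM's comment is that astype does not convert a column in place: it returns a new Series, which has to be assigned back. A minimal sketch with a made-up three-row frame:

```python
import numpy as np
import pandas as pd

# Made-up frame whose speed column holds strings, as after a problematic read.
df = pd.DataFrame({"speed": ["0.14", "5.3e-23", np.nan]}, dtype=object)

# astype returns a new Series; calling it without assignment leaves df unchanged.
df["speed"].astype(float)
print(df["speed"].dtype)  # object

# Assigning the result back is what actually replaces the column.
df["speed"] = df["speed"].astype(float)
print(df["speed"].dtype)  # float64
```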

Thanks for your help.
