When to apply(pd.to_numeric) and when to astype(np.float64) in Python?
Question
I have a pandas DataFrame object named xiv, which has a column of int64 Volume measurements.
In[]: xiv['Volume'].head(5)
Out[]:
0 252000
1 484000
2 62000
3 168000
4 232000
Name: Volume, dtype: int64
I have read other posts (like this and this) that suggest the following solutions. But when I use either approach, it doesn't appear to change the dtype of the underlying data:
In[]: xiv['Volume'] = pd.to_numeric(xiv['Volume'])
In[]: xiv['Volume'].dtypes
Out[]:
dtype('int64')
Or...
In[]: xiv['Volume'] = pd.to_numeric(xiv['Volume'])
Out[]: ###omitted for brevity###
In[]: xiv['Volume'].dtypes
Out[]:
dtype('int64')
In[]: xiv['Volume'] = xiv['Volume'].apply(pd.to_numeric)
In[]: xiv['Volume'].dtypes
Out[]:
dtype('int64')
I've also tried making a separate pandas Series, using the methods listed above on that Series, and reassigning the result to the x['Volume'] object, which is a pandas.core.series.Series object.
I have, however, found a solution to this problem using the numpy package's float64 type. This works, but I don't know why it's different.
In[]: xiv['Volume'] = xiv['Volume'].astype(np.float64)
In[]: xiv['Volume'].dtypes
Out[]:
dtype('float64')
Can someone explain how to accomplish with the pandas library what the numpy library seems to do easily with its float64 class, that is, convert the column in the xiv DataFrame to float64 in place?
Recommended answer
If you already have a numeric dtype (int8|16|32|64, float64, boolean), you can convert it to another "numeric" dtype using the pandas .astype() method.
Demo:
In [90]: df = pd.DataFrame(np.random.randint(10**5,10**7,(5,3)),columns=list('abc'), dtype=np.int64)
In [91]: df
Out[91]:
a b c
0 9059440 9590567 2076918
1 5861102 4566089 1947323
2 6636568 162770 2487991
3 6794572 5236903 5628779
4 470121 4044395 4546794
In [92]: df.dtypes
Out[92]:
a int64
b int64
c int64
dtype: object
In [93]: df['a'] = df['a'].astype(float)
In [94]: df.dtypes
Out[94]:
a float64
b int64
c int64
dtype: object
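To connect the demo back to the question, here is a minimal sketch of why pd.to_numeric appeared to do nothing. The Series below is a hypothetical stand-in for xiv['Volume'], built from the five values shown in the question. Because the data is already int64, pd.to_numeric has nothing to parse and returns the same dtype, while .astype() converts to exactly the dtype requested.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for xiv['Volume'] (values copied from the question).
volume = pd.Series([252000, 484000, 62000, 168000, 232000],
                   name="Volume", dtype="int64")

# to_numeric only guarantees *some* numeric dtype; int64 already qualifies,
# so the dtype is left unchanged.
unchanged = pd.to_numeric(volume)

# astype converts to the specific dtype that was asked for.
as_float = volume.astype(np.float64)

print(unchanged.dtype)
print(as_float.dtype)
```

Note that .astype() returns a new Series, so the result still has to be assigned back to the column, as the question already does.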
This won't work for object (string) dtypes that contain values that can't be converted to numbers:
In [95]: df.loc[1, 'b'] = 'XXXXXX'
In [96]: df
Out[96]:
a b c
0 9059440.0 9590567 2076918
1 5861102.0 XXXXXX 1947323
2 6636568.0 162770 2487991
3 6794572.0 5236903 5628779
4 470121.0 4044395 4546794
In [97]: df.dtypes
Out[97]:
a float64
b object
c int64
dtype: object
In [98]: df['b'].astype(float)
...
skipped
...
ValueError: could not convert string to float: 'XXXXXX'
So here we want to use the pd.to_numeric() method:
In [99]: df['b'] = pd.to_numeric(df['b'], errors='coerce')
In [100]: df
Out[100]:
a b c
0 9059440.0 9590567.0 2076918
1 5861102.0 NaN 1947323
2 6636568.0 162770.0 2487991
3 6794572.0 5236903.0 5628779
4 470121.0 4044395.0 4546794
In [101]: df.dtypes
Out[101]:
a float64
b float64
c int64
dtype: object
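The two behaviours can be compared side by side in a small self-contained sketch. The Series name raw and its values are made up for illustration; the calls themselves (.astype() and pd.to_numeric() with errors='coerce') are the ones used in the answer.

```python
import pandas as pd

# Hypothetical object-dtype column with one value that is not a number.
raw = pd.Series(["9590567", "XXXXXX", "162770"], dtype=object)

# astype(float) is strict: a single unparseable value raises ValueError.
try:
    raw.astype(float)
    astype_failed = False
except ValueError:
    astype_failed = True

# to_numeric with errors='coerce' replaces unparseable values with NaN,
# which forces the result to a float dtype.
cleaned = pd.to_numeric(raw, errors="coerce")

print(astype_failed)
print(cleaned.dtype)
print(int(cleaned.isna().sum()))
```

As a rough rule of thumb drawn from the answer: use .astype() when the column is already numeric and only the dtype needs to change, and pd.to_numeric() when the column holds strings or mixed values that have to be parsed first.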