Convert entire pandas dataframe to integers in pandas (0.17.0)

Question

My question is very similar to this one, but I need to convert my entire dataframe instead of just a series. The to_numeric function only works on one series at a time and is not a good replacement for the deprecated convert_objects command. Is there a way to get similar results to the convert_objects(convert_numeric=True) command in the new pandas release?

Thank you Mike Müller for your example. df.apply(pd.to_numeric) works very well if the values can all be converted to integers. What if in my dataframe I had strings that could not be converted into integers? Example:

df = pd.DataFrame({'ints': ['3', '5'], 'Words': ['Kobe', 'Bryant']})
df.dtypes
Out[59]: 
Words    object
ints     object
dtype: object

Then I could run the deprecated function and get:

df = df.convert_objects(convert_numeric=True)
df.dtypes
Out[60]: 
Words    object
ints      int64
dtype: object

Running the apply command gives me errors, even with try and except handling.

Recommended answer

All columns convertible

You can apply the function to all columns:

df.apply(pd.to_numeric)

Example:

>>> df = pd.DataFrame({'a': ['1', '2'],
...                    'b': ['45.8', '73.9'],
...                    'c': [10.5, 3.7]})

>>> df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 0 to 1
Data columns (total 3 columns):
a    2 non-null object
b    2 non-null object
c    2 non-null float64
dtypes: float64(1), object(2)
memory usage: 64.0+ bytes

>>> df.apply(pd.to_numeric).info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 0 to 1
Data columns (total 3 columns):
a    2 non-null int64
b    2 non-null float64
c    2 non-null float64
dtypes: float64(2), int64(1)
memory usage: 64.0 bytes

Not all columns convertible

pd.to_numeric has the keyword argument errors:

Signature: pd.to_numeric(arg, errors='raise')
Docstring:
Convert argument to a numeric type.

Parameters
----------
arg : list, tuple or array of objects, or Series
errors : {'ignore', 'raise', 'coerce'}, default 'raise'
    - If 'raise', then invalid parsing will raise an exception
    - If 'coerce', then invalid parsing will be set as NaN
    - If 'ignore', then invalid parsing will return the input

Setting errors='ignore' returns the column unchanged if it cannot be converted into a numeric type.
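To make the difference between the other two modes concrete, here is a minimal sketch on single Series. The variable names (`words`, `mixed`, `coerced`, `raised`) are made up for illustration and are not from the original answer.

```python
import pandas as pd

# Hypothetical sample data: one column of pure text, one partly numeric.
words = pd.Series(['Kobe', 'Bryant'])
mixed = pd.Series(['3', 'five'])

# errors='coerce': entries that cannot be parsed become NaN,
# so the result is a float column.
coerced = pd.to_numeric(mixed, errors='coerce')
print(coerced)

# errors='raise' (the default): the first unparseable entry raises ValueError.
try:
    pd.to_numeric(words)
    raised = False
except ValueError:
    raised = True
print(raised)
```

Note that 'coerce' silently replaces text with NaN, so it is probably not what you want for a column such as Words that should stay as strings.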

As pointed out by Anton Protopopov, the most elegant way is to pass errors='ignore' as a keyword argument to apply(), which forwards it to pd.to_numeric:

>>> df = pd.DataFrame({'ints': ['3', '5'], 'Words': ['Kobe', 'Bryant']})
>>> df.apply(pd.to_numeric, errors='ignore').info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 0 to 1
Data columns (total 2 columns):
Words    2 non-null object
ints     2 non-null int64
dtypes: int64(1), object(1)
memory usage: 48.0+ bytes

My previously suggested way, using partial from the module functools, is more verbose:

>>> from functools import partial
>>> df = pd.DataFrame({'ints': ['3', '5'],
...                    'Words': ['Kobe', 'Bryant']})
>>> df.apply(partial(pd.to_numeric, errors='ignore')).info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 0 to 1
Data columns (total 2 columns):
Words    2 non-null object
ints     2 non-null int64
dtypes: int64(1), object(1)
memory usage: 48.0+ bytes
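As far as I know, errors='ignore' has been deprecated in more recent pandas releases (I believe since 2.2), so the calls above may emit a warning there. Below is a rough sketch of one way to get the same per-column behaviour without that option. The helper name `to_numeric_where_possible` is invented for this example and is not part of pandas.

```python
import pandas as pd


def to_numeric_where_possible(frame):
    """Return a copy of `frame` with every fully numeric column converted.

    Hypothetical helper: columns that pd.to_numeric cannot parse
    are returned unchanged.
    """
    def convert(column):
        try:
            # Default errors='raise': fails if any entry is not numeric.
            return pd.to_numeric(column)
        except (ValueError, TypeError):
            # Leave non-numeric columns as they are.
            return column

    # apply() calls convert once per column and reassembles the frame.
    return frame.apply(convert)


df = pd.DataFrame({'ints': ['3', '5'], 'Words': ['Kobe', 'Bryant']})
converted = to_numeric_where_possible(df)
print(converted.dtypes)
```

The try/except sits inside the function that is applied to each column, so a failure in one column does not abort the conversion of the others.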
