Asserting column(s) data type in Pandas


Question


I'm trying to find a better way to assert the column data type in Python/Pandas of a given dataframe.

For example:

import pandas as pd
t = pd.DataFrame({'a':[1,2,3], 'b':[2,6,0.75], 'c':['foo','bar','beer']})


I would like to assert that specific columns in the data frame are numeric. Here's what I have:

numeric_cols = ['a', 'b']  # These will be given
assert all(t[y].dtype in ['int64', 'float64'] for y in numeric_cols)


This last assert line doesn't feel very pythonic. Maybe it is and I'm just cramming it all in one hard to read line. Is there a better way? I would like to write something like:

assert t[numeric_cols].dtype.isnumeric()


I can't seem to find something like that though.

Recommended answer


You could use ptypes.is_numeric_dtype to identify numeric columns, ptypes.is_string_dtype to identify string-like columns, and ptypes.is_datetime64_any_dtype to identify datetime64 columns:

import pandas as pd
import pandas.api.types as ptypes

t = pd.DataFrame({'a':[1,2,3], 'b':[2,6,0.75], 'c':['foo','bar','beer'],
              'd':pd.date_range('2000-1-1', periods=3)})
cols_to_check = ['a', 'b']

# Each assertion below passes silently; a failing check raises AssertionError.
assert all(ptypes.is_numeric_dtype(t[col]) for col in cols_to_check)
assert ptypes.is_string_dtype(t['c'])
assert ptypes.is_datetime64_any_dtype(t['d'])
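A bare assert does not say which column failed. If that matters, the check can be wrapped in a small helper. The following is only a sketch of one way to do it: `non_numeric_columns` is a hypothetical name, not part of pandas. The last line shows a one-liner over `DataFrame.dtypes` that comes close to the syntax wished for in the question.

```python
import pandas as pd
import pandas.api.types as ptypes


def non_numeric_columns(df, cols):
    """Return the names in `cols` whose dtype is not numeric (hypothetical helper)."""
    return [col for col in cols if not ptypes.is_numeric_dtype(df[col])]


t = pd.DataFrame({'a': [1, 2, 3], 'b': [2, 6, 0.75], 'c': ['foo', 'bar', 'beer']})
numeric_cols = ['a', 'b']

# Fail with a message that lists the offending columns, if any.
bad = non_numeric_columns(t, numeric_cols)
assert not bad, f"non-numeric columns: {bad}"

# One-liner: apply the dtype check to each entry of the dtypes Series.
assert t[numeric_cols].dtypes.apply(ptypes.is_numeric_dtype).all()
```

In either form, reduce the per-column results with all() or .all() before asserting. Asserting a list directly passes whenever the list is non-empty, even if every element is False.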


The pandas.api.types module (aliased to ptypes above) provides both is_datetime64_any_dtype and is_datetime64_dtype. They differ in how they treat timezone-aware array-likes: the former accepts them, the latter does not:

In [239]: ptypes.is_datetime64_any_dtype(pd.DatetimeIndex([1, 2, 3], tz="US/Eastern"))
Out[239]: True

In [240]: ptypes.is_datetime64_dtype(pd.DatetimeIndex([1, 2, 3], tz="US/Eastern"))
Out[240]: False
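The same distinction applies to DataFrame columns, not just to a DatetimeIndex. The sketch below builds a naive and a timezone-aware Series and runs both checks on each. The choice of UTC is arbitrary and is an assumption of this example, not something from the answer above.

```python
import pandas as pd
import pandas.api.types as ptypes

# A timezone-naive datetime column and a timezone-aware copy of it.
naive = pd.Series(pd.date_range('2000-01-01', periods=3))
aware = naive.dt.tz_localize('UTC')

# Both functions accept the naive column.
assert ptypes.is_datetime64_any_dtype(naive)
assert ptypes.is_datetime64_dtype(naive)

# Only the "any" variant accepts the timezone-aware column.
assert ptypes.is_datetime64_any_dtype(aware)
assert not ptypes.is_datetime64_dtype(aware)

# To require a timezone-aware column specifically, check the dtype class.
assert isinstance(aware.dtype, pd.DatetimeTZDtype)
```

So is_datetime64_any_dtype is the safer default when a column may or may not carry a timezone.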
