Determining Pandas Column DataType
Question
Sometimes when data is imported into a Pandas DataFrame, it is always imported as type object. This is fine for most operations, but I am trying to create a custom export function, and my question is this:
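- Is there a way to force pandas to infer the data types of the input data?
- If not, is there a way to infer the data types somehow after the data is loaded?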
I know I can tell Pandas that a column is of type int, str, etc., but I don't want to do that. I was hoping pandas could be smart enough to know all the data types when a user imports data or adds a column.
Edit - example import
import pandas as pd

a = ['a']
col = ['somename']
df = pd.DataFrame(a, columns=col)
print(df.dtypes)
>>> somename object
dtype: object
Shouldn't the type be string?
Recommended answer
This is only a partial answer, but you can get frequency counts of the Python types of the elements in each column of the DataFrame as follows:
dtypeCount = [df.iloc[:, i].apply(type).value_counts() for i in range(df.shape[1])]
This returns
dtypeCount
[<class 'numpy.int32'> 4
Name: a, dtype: int64,
<class 'int'> 2
<class 'str'> 2
Name: b, dtype: int64,
<class 'numpy.int32'> 4
Name: c, dtype: int64]
It doesn't print nicely, but you can pull out the information for any column by position:
dtypeCount[1]
<class 'int'> 2
<class 'str'> 2
Name: b, dtype: int64
This should get you started in finding which data types are causing the issue and how many elements of each type there are.
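If the list of Series is awkward to read, the same counts can be keyed by column name instead of position. This is only a sketch built on the sample data from this answer; the names type_counts and mixed_columns are made up for illustration:

```python
import pandas as pd

# Sample data from this answer: column 'b' mixes int and str values.
df = pd.DataFrame({'a': range(4),
                   'b': [6, 'n', 7, 'g'],
                   'c': range(3, 7)})

# Map each column name to the counts of Python types found in that column.
type_counts = {col: df[col].map(type).value_counts() for col in df.columns}

# Columns holding more than one Python type are the likely troublemakers.
mixed_columns = [col for col, counts in type_counts.items() if len(counts) > 1]
print(mixed_columns)
```

With this data, only column b should show up as mixed.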
You can then select the rows where the second column holds a str object using
df[df.iloc[:,1].map(lambda x: type(x) == str)]
a b c
1 1 n 4
3 3 g 6
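As for the original question about inferring types after loading, pandas has two helpers that may be worth trying: pandas.api.types.infer_dtype, which gives a summary label for a column's values, and DataFrame.infer_objects, which tries to find a better dtype for object columns. The sketch below uses the sample data from this answer and assumes that dropping the str rows is acceptable, which may not be true for your data:

```python
import pandas as pd
from pandas.api.types import infer_dtype

df = pd.DataFrame({'a': range(4),
                   'b': [6, 'n', 7, 'g'],
                   'c': range(3, 7)})

# One summary label per column, inferred from the values themselves.
labels = {col: infer_dtype(df[col], skipna=True) for col in df.columns}
print(labels)

# Drop the rows where 'b' holds a str (an assumption about what you want),
# then let pandas re-infer the dtype of the remaining object column.
is_str = df['b'].map(lambda x: isinstance(x, str))
numeric_only = df[~is_str].infer_objects()
print(numeric_only.dtypes)
```

Here column b is labelled 'mixed-integer' before the cleanup and becomes an integer column afterwards. Note that infer_objects does not change a column that still holds mixed values, so it leaves b as object if the str rows are kept.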
Data
import pandas as pd

df = pd.DataFrame({'a': range(4),
                   'b': [6, 'n', 7, 'g'],
                   'c': range(3, 7)})