How do you delete a non-numeric column from an input dataset?
Problem description
For example, suppose my data has a flower species, number of petals, germination time and user ID. The user ID is going to have a hyphen in it, so I don't want to use it in my data analysis. I'm aware that I can hard-code the column names, but I want it to work so that when I input any dataset, it automatically removes the columns with non-numeric values.
Edit: To clarify, I'm reading in the data from a CSV file using pandas.
Example:
   Species  NPetals  GermTime  UserID
1  R. G     5        4         65-78
2  R. F     5        3         65-81
I want to remove the UserID and Species columns from the dataset.
From the docs, you can select just the numeric data by filtering with select_dtypes:
In [5]:
import numpy as np
import pandas as pd

df = pd.DataFrame({'a': np.random.randn(6).astype('f4'), 'b': [True, False] * 3, 'c': [1.0, 2.0] * 3})
df
Out[5]:
          a      b  c
0  0.338710   True  1
1  1.530095  False  2
2 -0.048261   True  1
3 -0.505742  False  2
4  0.729667   True  1
5 -0.634482  False  2
In [15]:
df.select_dtypes(include=[np.number])
Out[15]:
          a  c
0  0.338710  1
1  1.530095  2
2 -0.048261  1
3 -0.505742  2
4  0.729667  1
5 -0.634482  2
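You can pass any valid NumPy dtype hierarchy to include (or to exclude). Note that the boolean column b is dropped here, because bool is not part of np.number.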
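Applied to the question's data, the same call might look like the sketch below. The CSV contents are an assumption reconstructed from the example table in the question, and io.StringIO stands in for the real file, so the path you pass to pd.read_csv will differ.

```python
import io

import numpy as np
import pandas as pd

# Stand-in for the CSV file from the question (contents are illustrative).
csv_text = """Species,NPetals,GermTime,UserID
R. G,5,4,65-78
R. F,5,3,65-81
"""

df = pd.read_csv(io.StringIO(csv_text))

# Keep only the columns with a numeric dtype. Species and UserID are read
# as non-numeric (string) columns, so they are left out of the result.
numeric_df = df.select_dtypes(include=[np.number])

print(list(numeric_df.columns))  # ['NPetals', 'GermTime']
```

One thing to keep in mind: the filter works on each column's dtype, not on individual values, so a mostly numeric column that contains a single stray string is read as non-numeric and dropped as well.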