Remove non-numeric rows in one column with pandas
Question
There is a dataframe like the following, and it has one unclean column 'id', which should be a numeric column:
id, name
1, A
2, B
3, C
tt, D
4, E
5, F
de, G
Is there a concise way to remove the following rows, since tt and de are not numeric values,
tt,D
de,G
to make the dataframe clean?
id, name
1, A
2, B
3, C
4, E
5, F
Recommended answer
You could use the standard string method isnumeric and apply it to each value in your id column:
import pandas as pd
from io import StringIO
data = """
id,name
1,A
2,B
3,C
tt,D
4,E
5,F
de,G
"""
df = pd.read_csv(StringIO(data))
In [55]: df
Out[55]:
id name
0 1 A
1 2 B
2 3 C
3 tt D
4 4 E
5 5 F
6 de G
In [56]: df[df.id.apply(lambda x: x.isnumeric())]
Out[56]:
id name
0 1 A
1 2 B
2 3 C
4 4 E
5 5 F
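One thing the filter above does not change is the type of the surviving values: they are still strings. A minimal sketch of one way to finish the cleanup, assuming every remaining id is a plain non-negative integer string (the variable name clean is only illustrative):

```python
import pandas as pd
from io import StringIO

data = """id,name
1,A
2,B
3,C
tt,D
4,E
5,F
de,G
"""
df = pd.read_csv(StringIO(data))

# Keep only the rows whose id is made up of numeric characters.
# .copy() avoids modifying a view of the original frame later on.
clean = df[df["id"].apply(lambda x: x.isnumeric())].copy()

# The remaining ids are still strings, so convert them explicitly.
# Assumption: all of them are non-negative integer strings.
clean["id"] = clean["id"].astype(int)

print(clean)
print(clean.dtypes)
```

Note that str.isnumeric returns False for values such as "-1" or "1.5", so this sketch only suits ids that are non-negative whole numbers.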
Or, if you want to use id as the index, you could do:
In [61]: df[df.id.apply(lambda x: x.isnumeric())].set_index('id')
Out[61]:
name
id
1 A
2 B
3 C
4 E
5 F
Edit: added timings.
Although the pd.to_numeric approach does not use the apply method, it is almost two times slower than applying the built-in str.isnumeric to a string column. I also added the option of using the pandas str.isnumeric accessor, which is less typing and still faster than using pd.to_numeric. But pd.to_numeric is more general, because it works with any data type (not only strings).
In [3]: df_big = pd.concat([df]*10000)
In [4]: df_big.shape
Out[4]: (70000, 2)
In [5]: %timeit df_big[df_big.id.apply(lambda x: x.isnumeric())]
15.3 ms ± 2.02 ms per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [6]: %timeit df_big[df_big.id.str.isnumeric()]
20.3 ms ± 171 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
In [7]: %timeit df_big[pd.to_numeric(df_big['id'], errors='coerce').notnull()]
29.9 ms ± 682 µs per loop (mean ± std. dev. of 7 runs, 10 loops each)
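To illustrate why pd.to_numeric is the more general option, here is a small sketch on a made-up column that mixes integers, floats, strings and a missing value. The frame mixed and its contents are hypothetical and are not part of the question's data:

```python
import pandas as pd

# Hypothetical mixed-type column: ints, a float, strings and a missing value.
mixed = pd.DataFrame({
    "id": [1, "2", 3.0, "tt", None, "-4", "5.5"],
    "name": list("ABCDEFG"),
})

# errors="coerce" turns anything that cannot be parsed into NaN.
numeric_id = pd.to_numeric(mixed["id"], errors="coerce")

# Keep the rows that parsed and store the converted values in the column.
result = mixed[numeric_id.notna()].assign(id=numeric_id)

print(result)
```

Calling x.isnumeric() on this column would fail on the non-string entries, and it would also reject "-4" and "5.5", which pd.to_numeric parses without trouble.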