How to get all rows with invalid np.datetime64 dates in a pandas DataFrame

Question

I have a pandas DataFrame with a column, "date_col", that contains date strings. I would like to filter the DataFrame for all rows where the date string in this column would raise a ValueError if parsed by numpy.datetime64. I'm looking for something along the lines of:

bad_rows = df[numpy.datetime64(df["date_col"]) is False]

Except that instead of checking for False, I'd like to check whether a ValueError is raised. Is there some way to do this type of filtering in a pandas DataFrame?

I tried the following:

import pandas as pd
df = pd.DataFrame({"date_col":("2015-04-31", "2015-04-30")})
result = pd.to_datetime(df["date_col"], errors='coerce')

But I got:

>>> result
0    2015-04-31
1    2015-04-30

Checking the type of each value reveals that they're still strings.

>>> result[0]
'2015-04-31'

>>> df.info()
<class 'pandas.core.frame.DataFrame'>
Int64Index: 2 entries, 0 to 1
Data columns (total 1 columns):
date_col    2 non-null object
dtypes: object(1)

If I try:

>>> result = pd.to_datetime(df["date_col"], errors='coerce', format='%Y%m%d')

I get:

Traceback (most recent call last):
  File "/Users/lib/python3.4/site-packages/pandas/tseries/tools.py", line 330, in _convert_listlike
    values, tz = tslib.datetime_to_datetime64(arg)
  File "pandas/tslib.pyx", line 1371, in pandas.tslib.datetime_to_datetime64 (pandas/tslib.c:23790)
TypeError: Unrecognized value type: <class 'str'>

During handling of the above exception, another exception occurred:

Traceback (most recent call last):
  File "<stdin>", line 1, in <module>
  File "/Users/lib/python3.4/site-packages/pandas/tseries/tools.py", line 340, in to_datetime
    values = _convert_listlike(arg.values, False, format)
  File "/Users/lib/python3.4/site-packages/pandas/tseries/tools.py", line 333, in _convert_listlike
    raise e
  File "/Users/lib/python3.4/site-packages/pandas/tseries/tools.py", line 307, in _convert_listlike
    arg, format, exact=exact, coerce=coerce
  File "pandas/tslib.pyx", line 2347, in pandas.tslib.array_strptime (pandas/tslib.c:39562)
ValueError: time data '2015-04-31' does not match format '%Y%m%d' (match)

My pandas version is 0.16.1 and my numpy version is 1.9.2.

This works (for pandas 0.16.1):

>>> df = pd.DataFrame({"date_col":("2015-04-31", "2015-04-30")})
>>> pd.to_datetime(df['date_col'], coerce=True)
0          NaT
1   2015-04-30
Name: date_col, dtype: datetime64[ns]
>>> pd.to_datetime(df['date_col'], coerce=True).isnull()
0     True
1    False
Name: date_col, dtype: bool


Recommended answer

Just use pd.to_datetime(df['date_col'], errors='coerce'). This produces NaT wherever a string is not a valid date.

Example:

In [307]:
df = pd.DataFrame({'date':['2015-02-01', 'sausage', '2011-01-33']})
df

Out[307]:
         date
0  2015-02-01
1     sausage
2  2011-01-33

In [308]:
pd.to_datetime(df['date'], errors='coerce')

Out[308]:
0   2015-02-01
1          NaT
2          NaT
Name: date, dtype: datetime64[ns]

A subsequent call to isnull() will produce True where the values are invalid:

In [309]:
pd.to_datetime(df['date'], errors='coerce').isnull()

Out[309]:
0    False
1     True
2     True
Name: date, dtype: bool
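
To get the offending rows themselves, that boolean Series can be used as a mask on the original DataFrame. The following is a minimal sketch, assuming a pandas version that accepts errors='coerce'; the names parsed and bad_rows are illustrative and not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({'date': ['2015-02-01', 'sausage', '2011-01-33']})

# Unparseable strings become NaT instead of raising an error.
parsed = pd.to_datetime(df['date'], errors='coerce')

# Keep only the rows whose date string could not be parsed.
bad_rows = df[parsed.isna()]
print(bad_rows)
```

Note that a value that was already missing (None or NaN) before parsing also ends up as NaT, so such rows would be selected by this mask too.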

Edit

Seeing as you're using 0.16.1, the API is a little different. The following should work:

result= pd.to_datetime(df['date_col'], coerce=True)
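
As for the ValueError in the question's traceback: the format string '%Y%m%d' contains no hyphens, so it cannot match strings such as '2015-04-31'. A hedged sketch of the same call with a hyphenated format follows. It assumes a newer pandas version in which errors='coerce' is accepted (not 0.16.1), and the names parsed and bad_rows are illustrative.

```python
import pandas as pd

df = pd.DataFrame({"date_col": ["2015-04-31", "2015-04-30"]})

# '%Y-%m-%d' matches the hyphenated layout of the strings.
# April has only 30 days, so '2015-04-31' is coerced to NaT.
parsed = pd.to_datetime(df["date_col"], format="%Y-%m-%d", errors="coerce")

# Rows whose date string is not a valid calendar date.
bad_rows = df[parsed.isna()]
print(bad_rows)
```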
