Extracting year using pandas DateTimeIndex but getting error


Question

I am using pandas version 0.16.2. I want to extract the year and month from the date columns.

I read in the data with:

import pandas as pd

df = pd.read_csv('raw_data.csv',
        parse_dates=['EOM_DEFAULT_DATE','RESOLUTION_DATE'], low_memory=False)

'EOM_DEFAULT_DATE' looks like:

    0    31-JAN-07 12.00.00.000000000 AM
    1    31-JAN-07 12.00.00.000000000 AM
    Name: EOM_DEFAULT_DATE, dtype: object

'RESOLUTION_DATE' looks like:

    0   2008-03-31
    1   2008-03-31
    Name: RESOLUTION_DATE, dtype: datetime64[ns]

Specifically, I want to extract the year this way, but I get this warning:

      df['YEAR']=pd.DatetimeIndex(df['RESOLUTION_DATE']).year

      --- 
      A value is trying to be set on a copy of a slice from a DataFrame.

      Try using .loc[row_indexer,col_indexer] = value instead

I also get an error when trying to extract the month:

      df['MNTH']=pd.DatetimeIndex(df['EOM_DEFAULT_DATE']).month

      ---
      File "<ipython-input-61-d7aec9a17a8f>", line 1, in <module>
      File "C:\Continuum\Anaconda\lib\site-packages\pandas\util\decorators.py", line 88, in wrapper
        return func(*args, **kwargs)
      File "C:\Continuum\Anaconda\lib\site-packages\pandas\tseries\index.py", line 292, in __new__
        yearfirst=yearfirst)
      File "C:\Continuum\Anaconda\lib\site-packages\pandas\tseries\index.py", line 1936, in _str_to_dt_array
        data = _algos.arrmap_object(arr, parser)
      File "pandas\src\generated.pyx", line 2295, in pandas.algos.arrmap_object (pandas\algos.c:77984)
      File "C:\Continuum\Anaconda\lib\site-packages\pandas\tseries\index.py", line 1932, in parser
        yearfirst=yearfirst)
      File "C:\Continuum\Anaconda\lib\site-packages\pandas\tseries\tools.py", line 494, in parse_time_string
        raise DateParseError(e)

      DateParseError: unknown string format

I know others can run this exact code without problems and extract the year and month. What am I missing?

Recommended answer

You can use the .dt accessor to get the year and month from a pd.Series whose values are datetime64.

df['YEAR'] = df['RESOLUTION_DATE'].dt.year 
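
As a small illustration of the .dt accessor, here is a sketch on a hypothetical two-row Series (the dates are made up for the example and are not from the asker's file). It only shows that .dt.year and .dt.month return one integer per row once the column is datetime64.

```python
import pandas as pd

# Hypothetical sample data standing in for a datetime64 column.
resolution = pd.Series(pd.to_datetime(['2008-03-31', '2009-12-15']))

# .dt exposes datetime components element-wise on a Series.
years = resolution.dt.year.tolist()
months = resolution.dt.month.tolist()

print(years, months)
```

The accessor only exists on datetime-like Series, so a column that is still dtype object (like 'EOM_DEFAULT_DATE' in the question) has to be converted first.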

To parse the date, you need to supply your datetime format.

dt_str = '31-JAN-07 12.00.00.000000000 AM'

# %I (12-hour clock) is required for %p to take effect; with %H the AM/PM marker is ignored
fmt = '%d-%b-%y %I.%M.%S.%f %p'
pd.to_datetime(dt_str, format=fmt)

#output: Timestamp('2007-01-31 00:00:00')
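
One detail worth checking in a format like this is the hour directive. In strptime-style parsing, the %p (AM/PM) marker only changes the hour when the hour itself is parsed with %I (12-hour clock); paired with %H it is matched but has no effect. The sketch below compares the two on the sample string from the question, so you can confirm which one matches what your data means.

```python
import pandas as pd

dt_str = '31-JAN-07 12.00.00.000000000 AM'

# 24-hour directive: the AM/PM marker is consumed but does not alter the hour.
as_24h = pd.to_datetime(dt_str, format='%d-%b-%y %H.%M.%S.%f %p')

# 12-hour directive: the AM/PM marker is applied, so "12 ... AM" means midnight.
as_12h = pd.to_datetime(dt_str, format='%d-%b-%y %I.%M.%S.%f %p')

print(as_24h.hour, as_12h.hour)
```

If the column really holds end-of-month dates at midnight, the %I variant is the one that keeps them on the intended time of day.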

Maybe try not parsing the dates when reading the CSV, because you have two date columns in different formats. Just read in the raw strings, then convert them to datetime objects in pandas.

df['EOM_DEFAULT_DATE'] = pd.to_datetime(df['EOM_DEFAULT_DATE'], format='%d-%b-%y %I.%M.%S.%f %p')
df['RESOLUTION_DATE'] = pd.to_datetime(df['RESOLUTION_DATE'])
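
Putting the steps together, the following is a minimal end-to-end sketch. It assumes a tiny in-memory CSV (built with io.StringIO, with two made-up rows shaped like the samples in the question) in place of the real raw_data.csv, and it uses %I for the hour so that the AM/PM marker is honoured.

```python
import io

import pandas as pd

# Hypothetical stand-in for raw_data.csv, mirroring the two formats in the question.
raw = io.StringIO(
    "EOM_DEFAULT_DATE,RESOLUTION_DATE\n"
    "31-JAN-07 12.00.00.000000000 AM,2008-03-31\n"
    "31-JAN-07 12.00.00.000000000 AM,2008-03-31\n"
)

# Read both date columns as plain strings (no parse_dates).
df = pd.read_csv(raw)

# Convert each column separately, giving the unusual one an explicit format.
df['EOM_DEFAULT_DATE'] = pd.to_datetime(
    df['EOM_DEFAULT_DATE'], format='%d-%b-%y %I.%M.%S.%f %p')
df['RESOLUTION_DATE'] = pd.to_datetime(df['RESOLUTION_DATE'])

# Both columns are now datetime64, so the .dt accessor works on either.
df['YEAR'] = df['RESOLUTION_DATE'].dt.year
df['MNTH'] = df['EOM_DEFAULT_DATE'].dt.month

print(df[['YEAR', 'MNTH']])
```

Converting after the read keeps each column's format explicit, which avoids relying on the parser to guess a format it does not recognise.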
