Error: code on a pandas numerical column breaks with a string formatting error

Question

I am reading in a table with pandas, and one of the columns has dates in the format YYYYMMDD. In all my attempts so far, it has been read in as a numerical column.

I could parse it correctly (though slowly) with clunky code at first, but the current version fails in a way I don't understand.

So, this works:

treatments['month'] = treatments['INDATUMA'] % 10000
treatments['day'] = treatments['INDATUMA'] % 100
treatments['month'] = (treatments['month']-treatments['day'])/100  

(That code last ran on the smaller data frames, whereas the current version runs on the concatenation of all of them. On smaller test data the current code still runs fine; it breaks only on the entire data set.)
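For reference, the digit arithmetic that both versions rely on can be sketched on a tiny frame. The sample values below are made up for illustration, and the column is assumed to hold plain integers in YYYYMMDD form.

```python
import pandas as pd

# Hypothetical sample values in YYYYMMDD form, stored as integers.
treatments = pd.DataFrame({'INDATUMA': [20150317, 20141231, 20130105]})

# % and // have the same precedence and group left to right, so
# x % 10000 // 100 is (x % 10000) // 100: keep MMDD, then drop DD.
treatments['year'] = treatments['INDATUMA'] // 10000
treatments['month'] = treatments['INDATUMA'] % 10000 // 100
treatments['day'] = treatments['INDATUMA'] % 100

print(treatments)
```

On an integer column this stays in integer arithmetic throughout, unlike the subtract-and-divide version, which may produce floats depending on the Python and pandas versions.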

This breaks:

all_treatments['month'] = all_treatments.INDATUMA % 10000 // 100

This is the error message:

File "treatments2_noiopro.py", line 92, in <module>
   all_treatments['month'] = all_treatments.INDATUMA % 10000 // 100
 File "/home/seidav/anaconda/lib/python2.7/site-packages/pandas/core/ops.py", line 532, in wrapper
   return left._constructor(wrap_results(na_op(lvalues, rvalues)),
 File "/home/seidav/anaconda/lib/python2.7/site-packages/pandas/core/ops.py", line 479, in na_op
   result[mask] = op(x[mask], y)
TypeError: not all arguments converted during string formatting

I am using pandas 0.16.2 np19py26_0 and Python 2.7.10 0 under Linux.
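The wording of the TypeError is the one Python produces when `%` is applied to a `str`, where it means string formatting rather than modulo. That suggests, as an assumption the question does not confirm, that the concatenated INDATUMA column has object dtype and contains at least some strings. A minimal sketch with made-up data:

```python
import pandas as pd

# Hypothetical data: one integer and one string entry, as could happen
# after concatenating frames whose source files were parsed differently.
mixed = pd.Series([20150317, '20141231'], dtype=object)

# On a plain str, % is the string-formatting operator, not modulo.
try:
    '20141231' % 10000
except TypeError as exc:
    message = str(exc)
    print(message)

# The same failure surfaces when % is applied element-wise to the object column.
try:
    mixed % 10000
    series_failed = False
except TypeError:
    series_failed = True
print(series_failed)
```

This would also be consistent with the code running fine on small test frames, where every value may happen to parse as a number.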

Recommended answer

I think the easiest way to do this is to use pandas' native datetime functionality on the final concatenated dataframe, e.g.

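import pandas

# Cast to str and give an explicit format: bare integers would otherwise be
# read as nanoseconds since the epoch. Assumes entries look like 20150317.
treatments['date'] = pandas.to_datetime(treatments['INDATUMA'].astype(str), format='%Y%m%d')

# Now you can split up the date easy as pie
treatments['year'] = treatments['date'].dt.year
treatments['month'] = treatments['date'].dt.month
treatments['day'] = treatments['date'].dt.day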
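If the arithmetic approach is preferred, one way to find out what went wrong is to inspect the column's dtype and coerce it to numbers first. The following is only a sketch: the frame contents are invented, and it assumes a pandas version that provides `pandas.to_numeric` (not available in 0.16.2).

```python
import pandas as pd

# Hypothetical stand-in for the concatenated frame: integers mixed with strings.
all_treatments = pd.DataFrame(
    {'INDATUMA': pd.Series([20150317, '20141231', 'n/a'], dtype=object)}
)

# An object dtype is a hint that not every entry is numeric.
print(all_treatments['INDATUMA'].dtype)

# Coerce to numbers; anything unparseable becomes NaN instead of raising.
numeric = pd.to_numeric(all_treatments['INDATUMA'], errors='coerce')

# Rows that could not be converted are worth inspecting by hand.
bad_rows = all_treatments[numeric.isna()]
print(bad_rows)

# With a numeric column the one-liner works again (NaN rows stay NaN).
all_treatments['month'] = numeric % 10000 // 100
print(all_treatments)
```

Because of the NaN, the coerced column is floating point, so the resulting months are floats such as 3.0 rather than integers.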

