Error when trying to apply log method to pandas data frame column in Python

Problem description

I am very new to Python and pandas (and programming in general), and I am having trouble with a seemingly simple function. I created the following DataFrame using data pulled with a SQL query (if you need to see the SQL query, let me know and I'll paste it):

spydata = pd.DataFrame(row,columns=['date','ticker','close', 'iv1m', 'iv3m'])
tickerlist = unique(spydata[spydata['date'] == '2013-05-31'])

After that, I have written a function to create some new columns in the dataframe using the data already held in it:

def demean(arr):
    arr['retlog'] = log(arr['close']/arr['close'].shift(1))

    arr['10dvol'] = sqrt(252)*sqrt(pd.rolling_std(arr['ret'] , 10 ))  
    arr['60dvol'] = sqrt(252)*sqrt(pd.rolling_std(arr['ret'] , 10 ))  
    arr['90dvol'] = sqrt(252)*sqrt(pd.rolling_std(arr['ret'] , 10 ))  
    arr['1060rat'] = arr['10dvol']/arr['60dvol']
    arr['1090rat'] = arr['10dvol']/arr['90dvol']
    arr['60dis'] = (arr['1060rat'] - arr['1060rat'].mean())/arr['1060rat'].std()
    arr['90dis'] = (arr['1090rat'] - arr['1090rat'].mean())/arr['1090rat'].std()
    return arr

The only part that I'm having a problem with is the first line of the function:

arr['retlog'] = log(arr['close']/arr['close'].shift(1))

When I run it with this command, I get an error:

result = spydata.groupby(['ticker']).apply(demean)

Error:

    ---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-196-4a66225e12ea> in <module>()
----> 1 result = spydata.groupby(['ticker']).apply(demean)
      2 results2 = result[result.date == result.date.max()]
      3 

C:\Python27\lib\site-packages\pandas-0.11.0-py2.7-win32.egg\pandas\core\groupby.pyc in apply(self, func, *args, **kwargs)
    323         func = _intercept_function(func)
    324         f = lambda g: func(g, *args, **kwargs)
--> 325         return self._python_apply_general(f)
    326 
    327     def _python_apply_general(self, f):

C:\Python27\lib\site-packages\pandas-0.11.0-py2.7-win32.egg\pandas\core\groupby.pyc in _python_apply_general(self, f)
    326 
    327     def _python_apply_general(self, f):
--> 328         keys, values, mutated = self.grouper.apply(f, self.obj, self.axis)
    329 
    330         return self._wrap_applied_output(keys, values,

C:\Python27\lib\site-packages\pandas-0.11.0-py2.7-win32.egg\pandas\core\groupby.pyc in apply(self, f, data, axis, keep_internal)
    632             # group might be modified
    633             group_axes = _get_axes(group)
--> 634             res = f(group)
    635             if not _is_indexed_like(res, group_axes):
    636                 mutated = True

C:\Python27\lib\site-packages\pandas-0.11.0-py2.7-win32.egg\pandas\core\groupby.pyc in <lambda>(g)
    322         """
    323         func = _intercept_function(func)
--> 324         f = lambda g: func(g, *args, **kwargs)
    325         return self._python_apply_general(f)
    326 

<ipython-input-195-47b6faa3f43c> in demean(arr)
      1 def demean(arr):
----> 2     arr['retlog'] = log(arr['close']/arr['close'].shift(1))
      3     arr['10dvol'] = sqrt(252)*sqrt(pd.rolling_std(arr['ret'] , 10 ))
      4     arr['60dvol'] = sqrt(252)*sqrt(pd.rolling_std(arr['ret'] , 10 ))
      5     arr['90dvol'] = sqrt(252)*sqrt(pd.rolling_std(arr['ret'] , 10 ))

AttributeError: log

I have tried changing the function to np.log as well as math.log, in which case I get the error:

TypeError: only length-1 arrays can be converted to Python scalars

I've tried looking this up, but haven't found anything directly applicable. Any clues?

Solution

This happens when the datatype of the column is not numeric. Try:

arr['retlog'] = log(arr['close'].astype('float64')/arr['close'].astype('float64').shift(1))

I suspect that the numbers are stored as generic 'object' types, which I know causes log to throw that error. Here is a simple illustration of the problem:

In [15]: np.log(Series([1,2,3,4], dtype='object'))
---------------------------------------------------------------------------
AttributeError                            Traceback (most recent call last)
<ipython-input-15-25deca6462b7> in <module>()
----> 1 np.log(Series([1,2,3,4], dtype='object'))

AttributeError: log

In [16]: np.log(Series([1,2,3,4], dtype='float64'))
Out[16]: 
0    0.000000
1    0.693147
2    1.098612
3    1.386294
dtype: float64
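
To make the diagnosis concrete, here is a minimal sketch of checking the dtype and converting before taking the log. The `close` Series below is made-up sample data standing in for the question's column, and the variable names are hypothetical; the exact error text raised for object columns may differ between pandas and NumPy versions, so the sketch only shows the working path.

```python
import numpy as np
import pandas as pd

# Hypothetical stand-in for the 'close' column: numbers stored with object dtype,
# as can happen when values come back from a SQL driver.
close = pd.Series([100.0, 101.0, 99.5], dtype="object")
print(close.dtype)  # object

# Convert to a numeric dtype before calling any NumPy ufunc.
close_float = close.astype("float64")

# Log return: log(price_t / price_{t-1}); the first row has no predecessor, so it is NaN.
log_ret = np.log(close_float / close_float.shift(1))
print(log_ret)
```

If some values cannot be parsed as numbers, `pd.to_numeric(close, errors="coerce")` is an alternative that turns them into NaN instead of raising.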

Your attempt with math.log did not work because that function is designed for single numbers (scalars) only, not lists or arrays.
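
A small sketch of the difference, using a made-up NumPy array (`values` is a hypothetical name, not from the question): `math.log` works on one number, raises `TypeError` on a multi-element array, and `np.log` handles the whole array element by element.

```python
import math
import numpy as np

values = np.array([1.0, 2.0, 3.0, 4.0])  # hypothetical sample data

# math.log accepts one number at a time.
single = math.log(values[1])

# Passing the whole array makes Python try to convert it to one float, which fails.
try:
    math.log(values)
    raised = False
except TypeError as exc:
    raised = True
    print("math.log failed:", exc)

# np.log is vectorized and returns one result per element.
elementwise = np.log(values)
print(elementwise)
```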

For what it's worth, I think this is a confusing error message; it once stumped me for a while, anyway. I wonder if it can be improved.
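
For readers who want to see the fix in the context of the per-ticker grouping, the following is a rough sketch only. It assumes a tiny made-up `spydata` frame, converts `close` once up front, and swaps in `groupby(...).transform` for the question's `apply(demean)`; that substitution is my own choice for a compact example, not the approach used in the question or the answer above.

```python
import numpy as np
import pandas as pd

# Hypothetical miniature of spydata; 'close' deliberately has object dtype.
spydata = pd.DataFrame({
    "date": ["2013-05-30", "2013-05-31", "2013-05-30", "2013-05-31"],
    "ticker": ["SPY", "SPY", "QQQ", "QQQ"],
    "close": pd.Series([100.0, 102.0, 50.0, 49.0], dtype="object"),
})

# Fix the dtype once, up front, instead of inside the per-group function.
spydata["close"] = spydata["close"].astype("float64")

# Compute log returns within each ticker so prices never mix across tickers.
spydata["retlog"] = spydata.groupby("ticker")["close"].transform(
    lambda s: np.log(s / s.shift(1))
)
print(spydata)
```

Converting the column before grouping means every later calculation on `close` sees floats, so the dtype problem cannot resurface in another line of the function.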
