Find closest row of DataFrame to given time in Pandas


Question

I have a Pandas dataframe which is indexed by a DatetimeIndex:

<class 'pandas.core.frame.DataFrame'>
DatetimeIndex: 53732 entries, 1993-01-07 12:23:58 to 2012-12-02 20:06:23
Data columns:
Date(dd-mm-yy)_Time(hh-mm-ss)       53732  non-null values
Julian_Day                          53732  non-null values
AOT_870                             53732  non-null values
440-870Angstrom                     53732  non-null values
440-675Angstrom                     53732  non-null values
500-870Angstrom                     53732  non-null values
Last_Processing_Date(dd/mm/yyyy)    53732  non-null values
Solar_Zenith_Angle                  53732  non-null values
time                                53732  non-null values
dtypes: datetime64[ns](2), float64(6), object(1)

I want to find the row that is closest to a certain time:

image_time = dateutil.parser.parse('2009-07-28 13:39:02')

and find how close it is. So far, I have tried various things based on the idea of subtracting the time I want from all of the times and finding the smallest absolute value, but none of them quite works.

For example:

aeronet.index - image_time

This gives an error, which I think is due to +/- on a DatetimeIndex shifting things, so I tried putting the index into another column and working on that instead:

aeronet['time'] = aeronet.index
aeronet.time - image_time

This seems to work, but to do what I want, I need to get the ABSOLUTE time difference, not the relative difference. However, just running abs or np.abs on it gives an error:

abs(aeronet.time - image_time)

C:\Python27\lib\site-packages\pandas\core\series.pyc in __repr__(self)
   1061         Yields Bytestring in Py2, Unicode String in py3.
   1062         """
-> 1063         return str(self)
   1064 
   1065     def _tidy_repr(self, max_vals=20):

C:\Python27\lib\site-packages\pandas\core\series.pyc in __str__(self)
   1021         if py3compat.PY3:
   1022             return self.__unicode__()
-> 1023         return self.__bytes__()
   1024 
   1025     def __bytes__(self):

C:\Python27\lib\site-packages\pandas\core\series.pyc in __bytes__(self)
   1031         """
   1032         encoding = com.get_option("display.encoding")
-> 1033         return self.__unicode__().encode(encoding, 'replace')
   1034 
   1035     def __unicode__(self):

C:\Python27\lib\site-packages\pandas\core\series.pyc in __unicode__(self)
   1044                     else get_option("display.max_rows"))
   1045         if len(self.index) > (max_rows or 1000):
-> 1046             result = self._tidy_repr(min(30, max_rows - 4))
   1047         elif len(self.index) > 0:
   1048             result = self._get_repr(print_header=True,

C:\Python27\lib\site-packages\pandas\core\series.pyc in _tidy_repr(self, max_vals)
   1069         """
   1070         num = max_vals // 2
-> 1071         head = self[:num]._get_repr(print_header=True, length=False,
   1072                                     name=False)
   1073         tail = self[-(max_vals - num):]._get_repr(print_header=False,

AttributeError: 'numpy.ndarray' object has no attribute '_get_repr'

Am I approaching this the right way? If so, how should I get abs to work, so that I can then select the minimum absolute time difference and thus get the closest time? If not, what is the best way to do this with a Pandas time series?

Solution

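I think you can try DatetimeIndex.asof to find the most recent label up to and including the input. Then use the returned datetime to select the appropriate row. If you only need values for a particular column, Series.asof exists and combines the two steps above into one. Note that asof only looks backward: it returns the latest label at or before the input, which is not necessarily the nearest label when a later one is closer.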
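
As a rough illustration of the two asof calls mentioned above, here is a minimal sketch. The three-row frame is a made-up stand-in for the real aeronet data (only the AOT_870 column name comes from the question), and it assumes a sorted DatetimeIndex:

```python
import pandas as pd

# Hypothetical stand-in for the aeronet frame: sorted DatetimeIndex, one column.
idx = pd.to_datetime([
    "2009-07-28 12:00:00",
    "2009-07-28 13:30:00",
    "2009-07-28 13:45:00",
])
aeronet = pd.DataFrame({"AOT_870": [0.10, 0.20, 0.30]}, index=idx)
image_time = pd.Timestamp("2009-07-28 13:39:02")

# Step 1: last index label at or before image_time.
label = aeronet.index.asof(image_time)

# Step 2: use that label to select the whole row.
row = aeronet.loc[label]

# One-step variant when only a single column is needed.
value = aeronet["AOT_870"].asof(image_time)

print(label)  # 2009-07-28 13:30:00
print(value)  # 0.2
```

In this toy data the 13:45:00 row is actually nearer to image_time than the 13:30:00 row that asof returns, which shows the backward-looking behaviour.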

This assumes you want to match on the full date and time. If you don't care about the date and just want the rows at the same time every day, use DataFrame.at_time.
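
For the time-of-day case, a small sketch of at_time might look like the following. The data is invented for illustration, and note that at_time matches the clock time exactly rather than looking for the nearest one:

```python
import datetime

import pandas as pd

# Hypothetical data: two days share the 13:39:02 reading, one row does not.
idx = pd.to_datetime([
    "2009-07-27 13:39:02",
    "2009-07-27 18:00:00",
    "2009-07-28 13:39:02",
])
df = pd.DataFrame({"AOT_870": [0.10, 0.20, 0.30]}, index=idx)

# Keep only the rows whose time of day is exactly 13:39:02, on any date.
same_time_each_day = df.at_time(datetime.time(13, 39, 2))

print(same_time_each_day)
```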

Follow up:

Edit: false alarm, I had an older version locally. The latest on master should work with np.abs.

In [10]: np.abs(df.time - image_time)
Out[10]: 
0    27 days, 13:39:02
1    26 days, 13:39:02
2    25 days, 13:39:02
3    24 days, 13:39:02
4    23 days, 13:39:02
5    22 days, 13:39:02
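
Once the absolute difference works, the remaining step from the question (pick the smallest difference and report how far away it is) could be sketched roughly as below. The frame is a made-up stand-in for aeronet, and the sketch assumes unique index labels so that idxmin identifies a single row:

```python
import pandas as pd

# Hypothetical stand-in for the aeronet frame.
idx = pd.to_datetime([
    "2009-07-28 12:00:00",
    "2009-07-28 13:30:00",
    "2009-07-28 13:45:00",
])
aeronet = pd.DataFrame({"AOT_870": [0.10, 0.20, 0.30]}, index=idx)
aeronet["time"] = aeronet.index  # same workaround as in the question
image_time = pd.Timestamp("2009-07-28 13:39:02")

# Absolute time difference of every row from the target time.
abs_diff = (aeronet["time"] - image_time).abs()

# Label of the smallest difference, then the row and its distance.
nearest_label = abs_diff.idxmin()
nearest_row = aeronet.loc[nearest_label]
distance = abs_diff.loc[nearest_label]

print(nearest_label)  # 2009-07-28 13:45:00
print(distance)       # 0 days 00:05:58
```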

Also just to clarify:

aeronet.index - image_time doesn't work because subtraction on Index is a set difference (back in the day Index used to be constrained to be unique).
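
The set-difference behaviour describes the pandas version of that era. In recent pandas releases, subtracting a Timestamp from a DatetimeIndex is element-wise and yields a TimedeltaIndex, and the index can also be asked for the nearest position directly. The sketch below assumes a recent pandas and a sorted, unique index; the data is invented:

```python
import pandas as pd

# Hypothetical sorted, unique index standing in for aeronet.index.
idx = pd.DatetimeIndex([
    "2009-07-28 12:00:00",
    "2009-07-28 13:30:00",
    "2009-07-28 13:45:00",
])
image_time = pd.Timestamp("2009-07-28 13:39:02")

# Element-wise subtraction in recent pandas: the result is a TimedeltaIndex.
deltas = idx - image_time

# Position of the label nearest to image_time, in either direction.
pos = idx.get_indexer([image_time], method="nearest")[0]
nearest = idx[pos]
distance = abs(nearest - image_time)

print(nearest)   # 2009-07-28 13:45:00
print(distance)  # 0 days 00:05:58
```

With a DataFrame, the row at that position could then be taken with iloc.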
