Pandas merge on `datetime` or `datetime` in `datetimeIndex`

Views: 209 Posted: 2020/5/24 0:59:21 python pandas

Question

Currently I have two data frames representing Excel spreadsheets. I want to join the data where the dates are equal. This is a one-to-many join: one spreadsheet has a single date per row, and I need to attach data from the other spreadsheet, which has multiple rows with the same date.

An example:

            A                  B
     date     data       date                 data
0    2015-0-1 ...     0  2015-0-1 to 2015-0-2 ...
1    2015-0-2 ...     1  2015-0-1 to 2015-0-2 ...

In this case both rows from A would receive rows 0 and 1 from B, because both dates fall within that range.

I tried using

df3 = pandas.merge(df2, df1, how='right', validate='1:m', left_on='Travel Date/Range', right_on='End')

to accomplish this, but received this error:

Traceback (most recent call last):
  File "<pyshell#61>", line 1, in <module>
    df3 = pandas.merge(df2, df1, how='right', validate='1:m', left_on='Travel Date/Range', right_on='End')
  File "C:\Users\M199449\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\reshape\merge.py", line 61, in merge
    validate=validate)
  File "C:\Users\M199449\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\reshape\merge.py", line 555, in __init__
    self._maybe_coerce_merge_keys()
  File "C:\Users\M199449\AppData\Local\Programs\Python\Python36\lib\site-packages\pandas\core\reshape\merge.py", line 990, in _maybe_coerce_merge_keys
    raise ValueError(msg)
ValueError: You are trying to merge on object and datetime64[ns] columns. If you wish to proceed you should use pd.concat

I can add more information as needed, of course.

Recommended answer

Here is one option using merge. Suppose you have two data frames:

import pandas as pd
df1 = pd.DataFrame({'date': ['2015-01-01', '2015-01-02', '2015-01-03'], 
                    'data': ['A', 'B', 'C']})
df2 = pd.DataFrame({'date': ['2015-01-01 to 2015-01-02', '2015-01-01 to 2015-01-02', '2015-01-02 to 2015-01-03'], 
                    'data': ['E', 'F', 'G']})

Now do some cleaning to extract all of the dates you need and make sure they are datetime. The error in the question arises because one merge key is an object (string) column and the other is datetime64[ns].

df1['date'] = pd.to_datetime(df1.date)

df2[['start', 'end']] = df2['date'].str.split(' to ', expand=True)
df2['start'] = pd.to_datetime(df2.start)
df2['end'] = pd.to_datetime(df2.end)
# No need for this anymore
df2 = df2.drop(columns='date')
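As a quick sanity check (a sketch, not part of the original answer), you can confirm that the cleaning step really turns the string columns into datetime columns. It uses `pandas.api.types.is_datetime64_any_dtype` on the same toy frames as above; the variable names `before` and `after` are only for illustration.

```python
import pandas as pd
from pandas.api.types import is_datetime64_any_dtype

df1 = pd.DataFrame({'date': ['2015-01-01', '2015-01-02', '2015-01-03'],
                    'data': ['A', 'B', 'C']})
df2 = pd.DataFrame({'date': ['2015-01-01 to 2015-01-02',
                             '2015-01-01 to 2015-01-02',
                             '2015-01-02 to 2015-01-03'],
                    'data': ['E', 'F', 'G']})

# Before cleaning, the date column holds plain strings, not datetimes.
before = is_datetime64_any_dtype(df1['date'])

# Same cleaning steps as in the answer.
df1['date'] = pd.to_datetime(df1['date'])
df2[['start', 'end']] = df2['date'].str.split(' to ', expand=True)
df2['start'] = pd.to_datetime(df2['start'])
df2['end'] = pd.to_datetime(df2['end'])
df2 = df2.drop(columns='date')

# After cleaning, every column used in a date comparison is a datetime column.
after = all(is_datetime64_any_dtype(col)
            for col in (df1['date'], df2['start'], df2['end']))
print(before, after)
```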

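Now merge it all together. This is a cross join (every row of df1 paired with every row of df2), so the result has len(df1) * len(df2) rows; for your data that is 99 x 10K rows.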

df = df1.assign(dummy=1).merge(df2.assign(dummy=1), on='dummy').drop(columns='dummy')
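If your pandas version is 1.2 or newer, `merge` also accepts `how='cross'`, which builds the same row pairings without a temporary dummy column. A minimal sketch on small stand-in frames (`left` and `right` are hypothetical names used only here, holding already-cleaned data):

```python
import pandas as pd

left = pd.DataFrame({'date': pd.to_datetime(['2015-01-01', '2015-01-02', '2015-01-03']),
                     'data': ['A', 'B', 'C']})
right = pd.DataFrame({'data': ['E', 'F', 'G'],
                      'start': pd.to_datetime(['2015-01-01', '2015-01-01', '2015-01-02']),
                      'end': pd.to_datetime(['2015-01-02', '2015-01-02', '2015-01-03'])})

# Dummy-key cross join, as in the answer.
via_dummy = (left.assign(dummy=1)
                 .merge(right.assign(dummy=1), on='dummy')
                 .drop(columns='dummy'))

# Built-in cross join (assumes pandas >= 1.2).
via_cross = left.merge(right, how='cross')

print(len(via_dummy), len(via_cross))
```

Either way the intermediate frame grows with the product of the two lengths, so memory use is worth keeping in mind for large inputs.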

Then subset to the rows whose date falls within the range (inclusive on both ends):

df[(df.date >= df.start) & (df.date <= df.end)]
#        date data_x data_y      start        end
#0 2015-01-01      A      E 2015-01-01 2015-01-02
#1 2015-01-01      A      F 2015-01-01 2015-01-02
#3 2015-01-02      B      E 2015-01-01 2015-01-02
#4 2015-01-02      B      F 2015-01-01 2015-01-02
#5 2015-01-02      B      G 2015-01-02 2015-01-03
#8 2015-01-03      C      G 2015-01-02 2015-01-03
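The merge and filter steps can be wrapped into one helper. This is only a sketch: the function name `join_dates_to_ranges` is hypothetical, and it assumes the inputs already have a datetime `date` column on one side and datetime `start` and `end` columns on the other.

```python
import pandas as pd

def join_dates_to_ranges(dates_df, ranges_df):
    """Pair each row of dates_df with every row of ranges_df whose
    [start, end] range contains its date (inclusive on both ends)."""
    # Cross join through a temporary constant key.
    crossed = (dates_df.assign(_key=1)
                       .merge(ranges_df.assign(_key=1), on='_key')
                       .drop(columns='_key'))
    # Keep only the pairings where the date lies inside the range.
    in_range = (crossed['date'] >= crossed['start']) & (crossed['date'] <= crossed['end'])
    return crossed[in_range].reset_index(drop=True)

dates = pd.DataFrame({'date': pd.to_datetime(['2015-01-01', '2015-01-02', '2015-01-03']),
                      'data': ['A', 'B', 'C']})
ranges = pd.DataFrame({'data': ['E', 'F', 'G'],
                       'start': pd.to_datetime(['2015-01-01', '2015-01-01', '2015-01-02']),
                       'end': pd.to_datetime(['2015-01-02', '2015-01-02', '2015-01-03'])})

result = join_dates_to_ranges(dates, ranges)
print(result)
```

Unlike the snippet above, this version resets the index, so the row labels run 0 through 5 instead of keeping the cross-join positions.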

If, for instance, some entries in df2 are a single date rather than a range, .str.split leaves the second part missing (None) for those rows. Just use .loc to set the end date equal to the start date for them.

df2 = pd.DataFrame({'date': ['2015-01-01 to 2015-01-02', '2015-01-01 to 2015-01-02', '2015-01-02 to 2015-01-03',
                             '2015-01-03'], 
                    'data': ['E', 'F', 'G', 'H']})

df2[['start', 'end']] = df2['date'].str.split(' to ', expand=True)
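# Rows without ' to ' have no second part, so reuse the start date as the end date.
df2.loc[df2.end.isnull(), 'end'] = df2.loc[df2.end.isnull(), 'start']
# After converting to datetime and dropping 'date' as before:
#  data      start        end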
#0    E 2015-01-01 2015-01-02
#1    F 2015-01-01 2015-01-02
#2    G 2015-01-02 2015-01-03
#3    H 2015-01-03 2015-01-03

From here the rest stays the same.
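For completeness, here is a sketch of the single-date case run end to end on the toy data. It swaps the .loc assignment for `fillna`, which has the same effect here; everything else follows the steps above.

```python
import pandas as pd

df1 = pd.DataFrame({'date': ['2015-01-01', '2015-01-02', '2015-01-03'],
                    'data': ['A', 'B', 'C']})
df2 = pd.DataFrame({'date': ['2015-01-01 to 2015-01-02', '2015-01-01 to 2015-01-02',
                             '2015-01-02 to 2015-01-03', '2015-01-03'],
                    'data': ['E', 'F', 'G', 'H']})

df1['date'] = pd.to_datetime(df1['date'])

# Split ranges; single dates get a missing 'end', which is filled from 'start'.
df2[['start', 'end']] = df2['date'].str.split(' to ', expand=True)
df2['end'] = df2['end'].fillna(df2['start'])
df2['start'] = pd.to_datetime(df2['start'])
df2['end'] = pd.to_datetime(df2['end'])
df2 = df2.drop(columns='date')

# Cross join, then keep rows whose date lies inside [start, end].
df = (df1.assign(dummy=1)
         .merge(df2.assign(dummy=1), on='dummy')
         .drop(columns='dummy'))
matched = df[(df['date'] >= df['start']) & (df['date'] <= df['end'])]
print(matched)
```

The single-date row H now behaves like a one-day range and matches only the 2015-01-03 row of df1.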
