Merge values of a dataframe where other columns match

Question

I have a dataframe storing a date, car_brand, color and a city:

 date              car_brand    color     city
 "2020-01-01"      porsche      red       paris
 "2020-01-02"      porsche      red       paris
 "2020-01-03"      porsche      red       london
 "2020-01-04"      porsche      red       paris
 "2020-01-05"      porsche      red       london
 "2020-01-01"      audi         blue      munich
 "2020-01-02"      audi         red       munich
 "2020-01-03"      audi         red       london
 "2020-01-04"      audi         red       london
 "2020-01-05"      audi         red       london

From this I want to build a new dataframe by merging rows in which car_brand, color and city match on consecutive days. For the example above, I want to end up with this dataframe:

 date                             car_brand    color     city
 ["2020-01-01","2020-01-02"]      porsche      red       paris
 ["2020-01-03"]                   porsche      red       london
 ["2020-01-04"]                   porsche      red       paris
 ["2020-01-05"]                   porsche      red       london
 ["2020-01-01"]                   audi         blue      munich
 ["2020-01-02"]                   audi         red       munich
 ["2020-01-03","2020-01-05"]      audi         red       london

How can I achieve that? I tried pd.concat and pd.merge, but nothing has worked so far. Thanks!

Answer

If the days must be consecutive, you can check that inside a list comprehension. This extends the usual technique of collecting a group's values into a list with a lambda passed to agg.
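
For reference, here is a minimal sketch of the plain technique being extended, with no consecutive-day check. The small sample frame is hypothetical (a subset of the question's rows); only standard pandas calls (groupby, agg, reset_index) are used.

```python
import pandas as pd

# Hypothetical sample with the question's columns (a subset of its rows)
df = pd.DataFrame({
    "date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03", "2020-01-04"]),
    "car_brand": ["porsche"] * 4,
    "color": ["red"] * 4,
    "city": ["paris", "paris", "london", "paris"],
})

# Collect every date of each (car_brand, color, city) group into one list;
# there is no consecutive-day check here
grouped = (
    df.groupby(["car_brand", "color", "city"])["date"]
    .agg(lambda x: list(x))
    .reset_index()
)
print(grouped)
```

Here 2020-01-04 lands in the same list as 2020-01-01 and 2020-01-02 even though it is not consecutive, which is why the answer adds a date check inside the lambda.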

import io

import pandas as pd

df = pd.read_csv(io.StringIO(""" date              car_brand    color     city
 "2020-01-01"      porsche      red       paris
 "2020-01-02"      porsche      red       paris
 "2020-01-03"      porsche      red       london
 "2020-01-04"      porsche      red       paris
 "2020-01-05"      porsche      red       london
 "2020-01-01"      audi         blue      munich
 "2020-01-02"      audi         red       munich
 "2020-01-03"      audi         red       london
 "2020-01-04"      audi         red       london
 "2020-01-05"      audi         red       london"""), sep=r"\s+")
df["date"] = pd.to_datetime(df["date"])
df = (
    df
    .groupby([c for c in df.columns if c != "date"])["date"]
    # keep the first date of each group plus every date that is exactly one day
    # after the previous date in that group; other dates are dropped, not split
    # into a new row
    .agg(lambda x: [d for i, d in enumerate(x)
                    if i == 0 or d == x.iloc[i - 1] + pd.DateOffset(days=1)])
    .reset_index()
)
print(df)

Output

car_brand color   city                                                            date
     audi  blue munich                                           [2020-01-01 00:00:00]
     audi   red london [2020-01-03 00:00:00, 2020-01-04 00:00:00, 2020-01-05 00:00:00]
     audi   red munich                                           [2020-01-02 00:00:00]
  porsche   red london                                           [2020-01-03 00:00:00]
  porsche   red  paris                      [2020-01-01 00:00:00, 2020-01-02 00:00:00]
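
Note that this output has no row for porsche/red/london on 2020-01-05 or porsche/red/paris on 2020-01-04: the lambda drops a non-consecutive date instead of starting a new row for it. The sketch below shows one way to split each group into separate runs instead, using a common "flag the gaps, then cumsum" pattern. This is not part of the original answer, and the helper column name run is hypothetical.

```python
import io

import pandas as pd

data = """date car_brand color city
2020-01-01 porsche red paris
2020-01-02 porsche red paris
2020-01-03 porsche red london
2020-01-04 porsche red paris
2020-01-05 porsche red london
2020-01-01 audi blue munich
2020-01-02 audi red munich
2020-01-03 audi red london
2020-01-04 audi red london
2020-01-05 audi red london"""
df = pd.read_csv(io.StringIO(data), sep=r"\s+", parse_dates=["date"])

keys = ["car_brand", "color", "city"]
df = df.sort_values(keys + ["date"])

# A row starts a new run when it is the first row of its key group (diff is NaT)
# or when its date is not exactly one day after the previous date in that group
starts_run = df.groupby(keys)["date"].diff() != pd.Timedelta(days=1)
# Running count of run starts gives every run its own integer id
df["run"] = starts_run.cumsum()

result = (
    df.groupby(keys + ["run"])["date"]
    .agg(list)
    .reset_index()
    .drop(columns="run")
)
print(result)
```

Each run becomes its own row, so porsche/red/paris yields one row for 2020-01-01 and 2020-01-02 and a separate row for 2020-01-04, while audi/red/london keeps all three days from 2020-01-03 to 2020-01-05 together. Grouping on run alone would also work, since a run never spans two key groups; keeping the key columns in the groupby simply carries them into the result.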
