Merge values of a dataframe where other columns match
Question
I have a dataframe storing a date, car_brand, color and a city:
date car_brand color city
"2020-01-01" porsche red paris
"2020-01-02" porsche red paris
"2020-01-03" porsche red london
"2020-01-04" porsche red paris
"2020-01-05" porsche red london
"2020-01-01" audi blue munich
"2020-01-02" audi red munich
"2020-01-03" audi red london
"2020-01-04" audi red london
"2020-01-05" audi red london
I now want to build a new dataframe from it as follows: merge rows where car_brand, color and city match on consecutive days. For the example above, I want to end up with this dataframe:
date car_brand color city
["2020-01-01","2020-01-02"] porsche red paris
["2020-01-03"] porsche red london
["2020-01-04"] porsche red paris
["2020-01-05"] porsche red london
["2020-01-01"] audi blue munich
["2020-01-02"] audi red munich
["2020-01-03","2020-01-05"] audi red london
How can I achieve that? I tried with pd.concat and pd.merge but nothing worked so far. Thanks!
Recommended answer
If being consecutive matters, you can check for it inside a list comprehension. This extends the technique of building a list from a lambda function applied to each group.
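For reference, the base technique mentioned here (collecting each group's values into a list with a lambda) can be sketched as below. The tiny frame and the name dates_per_city are made up for illustration and are not part of the original answer.

```python
import pandas as pd

# Hypothetical three-row frame, only to illustrate the idea.
df = pd.DataFrame({
    "city": ["paris", "paris", "london"],
    "date": pd.to_datetime(["2020-01-01", "2020-01-02", "2020-01-03"]),
})

# For every city, gather all of its dates into one Python list.
dates_per_city = (
    df.groupby("city")["date"]
    .agg(lambda x: list(x))
    .reset_index()
)
print(dates_per_city)
```

The code that follows adds a condition inside the list so that only consecutive dates are kept.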
import io
import pandas as pd

df = pd.read_csv(io.StringIO("""date car_brand color city
"2020-01-01" porsche red paris
"2020-01-02" porsche red paris
"2020-01-03" porsche red london
"2020-01-04" porsche red paris
"2020-01-05" porsche red london
"2020-01-01" audi blue munich
"2020-01-02" audi red munich
"2020-01-03" audi red london
"2020-01-04" audi red london
"2020-01-05" audi red london"""), sep=r"\s+")
df["date"] = pd.to_datetime(df["date"])
df = (
    df
    # group on every column except the date
    .groupby([c for c in df.columns if c != "date"])["date"]
    # keep a date only if it is the group's first date or exactly one day
    # after the previous date; dates that follow a gap are dropped
    .agg(lambda x: [xx for i, xx in enumerate(x) if i == 0 or xx == list(x)[i - 1] + pd.DateOffset(1)])
    .reset_index()
)
Output
car_brand color city date
audi blue munich [2020-01-01 00:00:00]
audi red london [2020-01-03 00:00:00, 2020-01-04 00:00:00, 2020-01-05 00:00:00]
audi red munich [2020-01-02 00:00:00]
porsche red london [2020-01-03 00:00:00]
porsche red paris [2020-01-01 00:00:00, 2020-01-02 00:00:00]
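Note that this output has five rows, while the question asks for seven: dates that come after a gap (porsche/red/paris on 2020-01-04, porsche/red/london on 2020-01-05) are dropped rather than placed in rows of their own. If each consecutive run should become a separate row, one possible approach is to number the runs and group on that number as well. The following is only a sketch of that idea, not part of the original answer; the names keys, gap, run_id and runs are introduced here for illustration, and it lists every date of a run rather than only its first and last day.

```python
import io
import pandas as pd

raw = """date car_brand color city
2020-01-01 porsche red paris
2020-01-02 porsche red paris
2020-01-03 porsche red london
2020-01-04 porsche red paris
2020-01-05 porsche red london
2020-01-01 audi blue munich
2020-01-02 audi red munich
2020-01-03 audi red london
2020-01-04 audi red london
2020-01-05 audi red london"""
df = pd.read_csv(io.StringIO(raw), sep=r"\s+", parse_dates=["date"])

keys = ["car_brand", "color", "city"]
# Sort so that rows of the same group are adjacent and in date order.
df = df.sort_values(keys + ["date"]).reset_index(drop=True)

# Gap to the previous date within the same group (NaT for a group's first row).
gap = df.groupby(keys)["date"].diff()
# A new run starts wherever the gap is not exactly one day; the running
# count of those starts gives every run its own id.
df["run_id"] = (gap != pd.Timedelta(days=1)).cumsum()

runs = (
    df.groupby(keys + ["run_id"], sort=False)["date"]
    .agg(lambda s: s.dt.strftime("%Y-%m-%d").tolist())
    .reset_index()
    .drop(columns="run_id")
)
print(runs.to_string(index=False))
```

On the sample data this yields seven rows: porsche/red/paris appears twice (2020-01-01 to 2020-01-02, then 2020-01-04 alone), and so does porsche/red/london. If only the start and end of each run are wanted, the lambda could be replaced by an aggregation over the first and last date.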