How to format date columns in Python so that they are weekly and based on each other?
Question
I have a dataframe df
that looks similar to this:
identity Start End week
E 6/18/2020 7/2/2020 1
E 6/18/2020 7/2/2020 2
2D 7/18/2020 8/1/2020 1
2D 7/18/2020 8/1/2020 2
A1 9/6/2020 9/20/2020 1
A1 9/6/2020 9/20/2020 2
The problem is that when I extracted the data I only had the Start date and End date for every identity, repeated on each of its rows, but the data is recorded by week. All identities have the same number of weeks; sometimes they all have 5 or 6 weeks, but the count is always the same across identities. I want Start and End to be weekly, so the first week ends 7 days after it starts, and each following week starts where the previous week ended. A representation would be:
identity Start End week
E 6/18/2020 6/25/2020 1
E 6/25/2020 7/2/2020 2
2D 7/18/2020 7/25/2020 1
2D 7/25/2020 8/1/2020 2
A1 9/6/2020 9/13/2020 1
A1 9/13/2020 9/20/2020 2
I tried a simple approach: creating a column of sevens and adding it to Start to get the end of the week, but I get the error "Addition/subtraction of integers and integer-arrays with Timestamp is no longer supported. Instead of adding/subtracting n, use n * obj.freq".
After that I planned to derive Start from End minus seven, but I don't know how to get around this problem. Any help would be appreciated.
Recommended answer
This is similar to your other question.
First, convert the columns to datetime:
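import pandas as pd

# Parse both date columns; assigning with df[[...]] replaces the columns,
# so they end up with a datetime dtype rather than staying as objects
df[["Start", "End"]] = df[["Start", "End"]].apply(pd.to_datetime, format="%m/%d/%Y")
df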
identity Start End week
0 E 2020-06-18 2020-07-02 1
1 E 2020-06-18 2020-07-02 2
2 2D 2020-07-18 2020-08-01 1
3 2D 2020-07-18 2020-08-01 2
4 A1 2020-09-06 2020-09-20 1
5 A1 2020-09-06 2020-09-20 2
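As a side note on the error from the question: once the columns hold real datetimes, pandas refuses to add a plain integer to them, but it accepts a Timedelta. The following is a minimal sketch on a made-up two-row series; the names `start`, `int_add_failed` and `week_end` are illustrative and do not come from the question.

```python
import pandas as pd

# Two sample start dates, parsed the same way as in the answer
start = pd.Series(pd.to_datetime(["6/18/2020", "7/18/2020"], format="%m/%d/%Y"))

# Adding a bare integer to datetimes is rejected with a TypeError
try:
    start + 7
    int_add_failed = False
except TypeError:
    int_add_failed = True

# Adding a Timedelta shifts every date by seven days
week_end = start + pd.Timedelta(days=7)
```

This is why the steps below build the week end with `pd.Timedelta("7 days")` instead of a column of sevens.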
Your identity is in groups of two, so I'll use that when selecting dates from the date_range:
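from itertools import chain

# Keep one row per identity so each date range is built only once
result = df.drop_duplicates(subset="identity")
# For each identity, take the first two weekly dates between Start and End
date_range = (
    pd.date_range(start, end, freq="7D")[:2]
    for start, end in zip(result.Start, result.End)
)
# Flatten the per-identity ranges into one sequence of start dates
date_range = chain.from_iterable(date_range)
# Each week ends seven days after it starts
End = lambda df: df.Start.add(pd.Timedelta("7 days"))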
Create the new dataframe:
df.assign(Start=list(date_range), End=End)
identity Start End week
0 E 2020-06-18 2020-06-25 1
1 E 2020-06-25 2020-07-02 2
2 2D 2020-07-18 2020-07-25 1
3 2D 2020-07-25 2020-08-01 2
4 A1 2020-09-06 2020-09-13 1
5 A1 2020-09-13 2020-09-20 2
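The question mentions that identities can sometimes have 5 or 6 weeks, in which case the hard-coded `[:2]` slice would have to change. One possible alternative, sketched here under the assumption that the week column always counts up from 1 within each identity, is to compute both columns directly from the original Start date and the week number. The names `first_start`, `offset` and `weekly` are illustrative only.

```python
import pandas as pd

# Sample data in the same shape as the question
df = pd.DataFrame({
    "identity": ["E", "E", "2D", "2D", "A1", "A1"],
    "Start": ["6/18/2020", "6/18/2020", "7/18/2020", "7/18/2020", "9/6/2020", "9/6/2020"],
    "End": ["7/2/2020", "7/2/2020", "8/1/2020", "8/1/2020", "9/20/2020", "9/20/2020"],
    "week": [1, 2, 1, 2, 1, 2],
})

# The extracted Start is the first day of week 1 for every row of an identity
first_start = pd.to_datetime(df["Start"], format="%m/%d/%Y")

# Week n begins (n - 1) * 7 days after that first day (assumes week starts at 1)
offset = pd.to_timedelta((df["week"] - 1) * 7, unit="D")

weekly = df.assign(
    Start=first_start + offset,
    End=first_start + offset + pd.Timedelta(days=7),
)
```

Because the offset comes from the week column, the same code should work for any number of weeks per identity, as long as the week numbering assumption holds.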