Faster way of converting Date column to weekday name in Pandas

Problem description

Here is my input CSV file, which I read via pd.read_csv():

ProductCode,Date,Receipt,Total
x1,07/29/15,101790,17.35
x2,07/29/15,103601,8.89
x3,07/29/15,103601,8.58
x4,07/30/15,101425,11.95
x5,07/29/15,101422,1.09
x6,07/29/15,101422,0.99
x7,07/29/15,101422,3
y7,08/05/15,100358,7.29
x8,08/05/15,100358,2.6
z3,08/05/15,100358,2.99


import pandas as pd
df = pd.read_csv('product.csv')

# I have to add some columns to the data:

df['Receipt_Count'] = df.groupby(['Date','Receipt'])['Receipt'].transform('count')
df['Day_of_Week'] = pd.to_datetime(df['Date']).dt.weekday_name

I have around 800K lines in my CSV file. When I run the line of code that converts the date to weekday_name, it takes around 2 minutes. I know that I'm converting my 'Date' column to datetime first, because it is read as a string from the CSV, and only then does it get converted to its weekday equivalent. Is there any way I can shorten the conversion time?

I'm fairly new to Pandas/Python, so I'm not sure if I missed something here.

Recommended answer

Specifying the format of your date strings will speed up the conversion considerably, because pandas no longer has to infer the format of each string:

df['Day_of_Week'] = pd.to_datetime(df['Date'], format='%m/%d/%y').dt.weekday_name
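
On pandas 1.0 and later the `.dt.weekday_name` attribute has been removed, and `.dt.day_name()` is its replacement. A minimal sketch of the same line for a current pandas, assuming a small frame built from three dates in the question's sample (the frame itself is illustrative, not part of the original answer):

```python
import pandas as pd

# Illustrative frame using dates taken from the question's sample data.
df = pd.DataFrame({'Date': ['07/29/15', '07/30/15', '08/05/15']})

# An explicit format tells pandas exactly how to parse each string.
parsed = pd.to_datetime(df['Date'], format='%m/%d/%y')

# .dt.day_name() replaces the removed .dt.weekday_name attribute.
df['Day_of_Week'] = parsed.dt.day_name()
print(df)
```

The only change from the answer's line is the accessor; the `format='%m/%d/%y'` argument is used exactly as above.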

Here are some benchmarks:

import io
import pandas as pd

data = io.StringIO('''\
ProductCode,Date,Receipt,Total
x1,07/29/15,101790,17.35
x2,07/29/15,103601,8.89
x3,07/29/15,103601,8.58
x4,07/30/15,101425,11.95
x5,07/29/15,101422,1.09
x6,07/29/15,101422,0.99
x7,07/29/15,101422,3
y7,08/05/15,100358,7.29
x8,08/05/15,100358,2.6
z3,08/05/15,100358,2.99
''')

df = pd.read_csv(data)
%timeit pd.to_datetime(df['Date']).dt.weekday_name
# => 100 loops, best of 3: 2.48 ms per loop
%timeit pd.to_datetime(df['Date'], format='%m/%d/%y').dt.weekday_name
# => 1000 loops, best of 3: 507 µs per loop

large_df = pd.concat([df] * 1000)
%timeit pd.to_datetime(large_df['Date']).dt.weekday_name
# => 1 loop, best of 3: 1.62 s per loop
%timeit pd.to_datetime(large_df['Date'], format='%m/%d/%y').dt.weekday_name
# => 10 loops, best of 3: 45.9 ms per loop
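
`%timeit` is an IPython magic, so the lines above only run in IPython or Jupyter. Outside of those, a similar comparison can be sketched with the standard-library `timeit` module. This sketch assumes pandas 1.0 or later (hence `day_name()`); the frame size, the repeat count, and the helper names `inferred` and `explicit` are arbitrary choices for illustration. No timings are shown because they depend on the machine and the pandas version, and the gap may differ from the figures above.

```python
import timeit
import pandas as pd

# Illustrative data: three dates from the question, repeated to 9,000 rows.
dates = ['07/29/15', '07/30/15', '08/05/15']
large_df = pd.DataFrame({'Date': dates * 3000})

def inferred():
    # Let pandas work out the date format on its own.
    return pd.to_datetime(large_df['Date']).dt.day_name()

def explicit():
    # Give pandas the format up front.
    return pd.to_datetime(large_df['Date'], format='%m/%d/%y').dt.day_name()

# Take the best of 5 single runs for each variant.
t_inferred = min(timeit.repeat(inferred, number=1, repeat=5))
t_explicit = min(timeit.repeat(explicit, number=1, repeat=5))
print(f'inferred format: {t_inferred * 1e3:.1f} ms')
print(f'explicit format: {t_explicit * 1e3:.1f} ms')
```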

Even for the small sample you provided in the OP, performance improves by a factor of 5 (2.48 ms vs. 507 µs). For a larger dataframe it gets much, much better: roughly 35 times faster on the 10,000-row frame above (1.62 s vs. 45.9 ms).
