Parsing csv in python
Question
I'm trying to parse a csv file in Python and print the sum of order_total for each day. Below is the sample csv file:
order_total created_datetime
24.99 2015-06-01 00:00:12
0 2015-06-01 00:03:15
164.45 2015-06-01 00:04:05
24.99 2015-06-01 00:08:01
0 2015-06-01 00:08:23
46.73 2015-06-01 00:08:51
0 2015-06-01 00:08:58
47.73 2015-06-02 00:00:25
101.74 2015-06-02 00:04:11
119.99 2015-06-02 00:04:35
38.59 2015-06-02 00:05:26
73.47 2015-06-02 00:06:50
34.24 2015-06-02 00:07:36
27.24 2015-06-03 00:01:40
82.2 2015-06-03 00:12:21
23.48 2015-06-03 00:12:35
My objective is to print sum(order_total) for each day. For example, the result should be:
2015-06-01 -> 261.16
2015-06-02 -> 415.76
2015-06-03 -> 132.92
I have written the code below. It does not perform the summing logic yet; for now I'm just printing some sample statements to see whether it parses and loops as required.
import csv
import datetime

def sum_orders_test(self, start_date, end_date):
    initial_date = datetime.date(int(start_date.split('-')[0]), int(start_date.split('-')[1]), int(start_date.split('-')[2]))
    final_date = datetime.date(int(end_date.split('-')[0]), int(end_date.split('-')[1]), int(end_date.split('-')[2]))
    day = datetime.timedelta(days=1)
    with open("file1.csv", 'r') as data_file:
        next(data_file)  # skip the header line
        reader = csv.reader(data_file, delimiter=',')
        if initial_date <= final_date:
            for row in reader:
                if str(initial_date) in row[1]:
                    print('initial_date : ' + str(initial_date))
                    print('Date : ' + row[1])
                else:
                    print('Else')
                    initial_date = initial_date + day
Based on my current logic, I'm running into this issue:
- As you can see, the sample csv has 7 rows for 2015-06-01, 6 rows for 2015-06-02 and 3 rows for 2015-06-03.
- The output of the code above prints 7 values for 2015-06-01, 5 for 2015-06-02 and 2 for 2015-06-03
when called as sum_orders_test('2015-06-01','2015-06-03 ');
I know there is some silly logic error, but being new to programming and Python I'm unable to figure it out.
Recommended answer
I've re-read the question. If your data is really tab-separated, the following code does the job (using pandas):
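import pandas as pd

# header=0 replaces the file's own header row with the names given here
df = pd.read_csv('file.csv', sep='\t', header=0, names=['order_total', 'created_datetime'])
# Keep only the calendar date of each timestamp
df['created_datetime'] = pd.to_datetime(df['created_datetime']).dt.date
# Sum order_total per day
df = df.groupby('created_datetime').sum()
print(df)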
This gives the following result:
                  order_total
created_datetime
2015-06-01             261.16
2015-06-02             415.76
2015-06-03             132.92
Less code, and probably lower algorithmic complexity.
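For comparison, the rows that go missing in the question's loop are the first row of each new day: when the date stops matching, the else branch advances initial_date but never handles the row that triggered it, which is why 6 becomes 5 and 3 becomes 2. A minimal standard-library sketch that avoids tracking a "current day" altogether is shown below. It assumes a tab-separated file with a header row, as in the sample; the function name sum_orders_by_day and its parameters are illustrative and not from the original post.

```python
import csv
import datetime
from collections import defaultdict
from decimal import Decimal


def sum_orders_by_day(data_file, start_date, end_date, delimiter="\t"):
    """Return {date: total} for rows dated within [start_date, end_date].

    data_file is any iterable of lines (an open file, io.StringIO, ...).
    The first line is assumed to be a header and is skipped.
    """
    # strip() tolerates stray whitespace such as '2015-06-03 '
    first = datetime.date.fromisoformat(start_date.strip())
    last = datetime.date.fromisoformat(end_date.strip())
    totals = defaultdict(Decimal)  # missing days start at Decimal('0')

    reader = csv.reader(data_file, delimiter=delimiter)
    next(reader, None)  # skip the header row
    for row in reader:
        if len(row) < 2:
            continue  # ignore blank or malformed lines
        # '2015-06-01 00:00:12' -> date(2015, 6, 1)
        day = datetime.date.fromisoformat(row[1].strip()[:10])
        if first <= day <= last:
            # Decimal keeps currency amounts exact (no float rounding)
            totals[day] += Decimal(row[0])
    return dict(totals)
```

Every row is assigned to its own date, so no row can be skipped at a day boundary, and the input does not need to be sorted. To print the totals in the format from the question, open the file with open("file1.csv", newline="") and loop over sorted(sum_orders_by_day(f, "2015-06-01", "2015-06-03").items()). If the file is actually comma-separated, pass delimiter=",".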