计算 csv 中双参数匹配的出现次数,将结果附加到 csv [英] Calculating number of occurrences of dual parameter match in a csv appending the results to csv

查看:28
本文介绍了计算 csv 中双参数匹配的出现次数,将结果附加到 csv的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要做的是计算以下内容:

某个人在第 10 列中具有相同 t 数据且第 7 列中出现 1 的行中指定日期之前的日期出现在列表(第 8 列)中的次数.

某个人(第 8 列)在该行中指定日期之前的日期出现在列表中的次数,并且第 10 列中的 t 数据相同(请注意,它们是按时间顺序排序的.)

用来自 csv 的原始数据的例子来证明这一点可能更容易.

02/01/2005,Data,Class xpv,4,11yo+,4,1,George Smith,data,15t02/01/2005,Data,Class xpv,4,11yo+,4,2,Ted James,data,22t02/01/2005,Data,Class xpv,4,11yo+,4,3,Emma Lilly,data,22t02/01/2005,Data,Class xpv,4,11yo+,4,5,George Smith,data,25t02/01/2005,Data,Class tn2,4,10yo+,6,4,Tom Phillips,data,15t03/01/2005,Data,Class tn2,4,10yo+,6,2,Tom Phillips,data,25t03/01/2005,Data,Class tn2,4,10yo+,6,5,George Smith,data,22t03/01/2005,Data,Class tn2,4,10yo+,6,3,Tom Phillips,data,25t03/01/2005,Data,Class tn2,4,10yo+,6,1,Emma Lilly,data,25t03/01/2005,Data,Class tn2,4,10yo+,6,6,George Smith,data,15t04/01/2005,Data,Class tn2,4,10yo+,6,6,Ted James,data,25t04/01/2005,Data,Class tn2,4,10yo+,6,3,Tom Phillips,data,22t04/01/2005,Data,Class tn2,4,10yo+,6,2,George Smith,data,22t04/01/2005,Data,Class tn2,4,10yo+,6,4,George Smith,data,25t04/01/2005,Data,Class tn2,4,10yo+,6,1,George Smith,data,15t04/01/2005,Data,Class tn2,4,10yo+,6,5,Tom Phillips,data,25t05/01/2005,Data,Class 22zn,2,10yo+,5,3,Emma Lilly,data,25t05/01/2005,Data,Class 22zn,2,10yo+,5,1,Ted James,data,22t05/01/2005,Data,Class 22zn,2,10yo+,5,2,George Smith,data,22t05/01/2005,Data,Class 22zn,2,10yo+,5,4,Emma Lilly,data,25t05/01/2005,Data,Class 22zn,2,10yo+,5,5,Tom Phillips,data,15t

按照所描述的说明,我需要 csv 的样子:

02/01/2005,Data,Class xpv,4,11yo+,4,1,George Smith,data,15t,0,002/01/2005,Data,Class xpv,4,11yo+,4,2,Ted James,data,22t,0,002/01/2005,Data,Class xpv,4,11yo+,4,3,Emma Lilly,data,22t,0,002/01/2005,Data,Class xpv,4,11yo+,4,5,George Smith,data,25t,0,002/01/2005,Data,Class tn2,4,10yo+,6,4,Tom Phillips,data,15t,0,003/01/2005,Data,Class tn2,4,10yo+,6,2,Tom Phillips,data,25t,0,003/01/2005,Data,Class tn2,4,10yo+,6,5,George Smith,data,22t,0,003/01/2005,Data,Class tn2,4,10yo+,6,3,Tom Phillips,data,25t,0,003/01/2005,Data,Class tn2,4,10yo+,6,1,Emma Lilly,data,25t,0,003/01/2005,Data,Class tn2,4,10yo+,6,6,George Smith,data,15t,1,104/01/2005,Data,Class tn2,4,10yo+,6,6,Ted James,data,25t,0,004/01/2005,Data,Class tn2,4,10yo+,6,3,Tom Phillips,data,22t,0,004/01/2005,Data,Class tn2,4,10yo+,6,2,George Smith,data,22t,0,104/01/2005,Data,Class tn2,4,10yo+,6,4,George Smith,data,25t,0,104/01/2005,Data,Class tn2,4,10yo+,6,1,George Smith,data,15t,1,204/01/2005,Data,Class tn2,4,10yo+,6,5,Tom Phillips,data,25t,0,205/01/2005,Data,Class 22zn,2,10yo+,5,3,Emma Lilly,data,25t,1,105/01/2005,Data,Class 22zn,2,10yo+,5,1,Ted James,data,22t,0,105/01/2005,Data,Class 22zn,2,10yo+,5,2,George Smith,data,22t,0,205/01/2005,Data,Class 22zn,2,10yo+,5,4,Emma Lilly,data,25t,1,105/01/2005,Data,Class 22zn,2,10yo+,5,5,Tom Phillips,data,15t,0,1

因此您可以看到,在最后一行,Tom Phillips 与 15t 在此之前的几天(第 10 列)出现了 1 次,并且在这 1 次出现中,第 7 列是1"的出现次数为零.

我的 csv 数据显然比这大得多,因此也将不胜感激有效的技术和建议.如果需要更多说明,请说出来,很难说这个例子是否可以理解.

亲切的问候 AEA

解决方案

非常小的变化:

导入csv导入日期时间导入副本从集合导入 defaultdict将 open(r"C:\Temp\test2.csv") 作为 i,将 open(r"C:\Temp\results2.csv", "wb") 作为 o:rdr = csv.reader(i)wrt = csv.writer(o)# data 是一个字典,我们将在其中保存当前和以前的数据,例如:# {#(乔治·史密斯,15 吨):[# 第 7 列 = 1 时的前一个日期计数,# 之前的日期计数,# 当第 7 列 = 1 时的当前日期计数,# 当前日期计数#]数据,currdate = defaultdict(lambda:[0, 0, 0, 0]),无对于 rdr 中的行:date = datetime.datetime.strptime(line[0], '%d/%m/%Y')# 字典的键看起来像元组#(乔治·史密斯,15 吨)名称 = (行 [7], 行 [9])# 如果日期改变了,我们必须把当前值放到以前的值中# 通过复制列表的一部分##(乔治·史密斯,15 吨):[# 第 7 列 = 1 时的前一个日期计数,# 之前的日期计数,# 当第 7 列 = 1 时的当前日期计数,# 当前日期计数#]## 变成##(乔治·史密斯,15 吨):[# 当第 7 列 = 1 时的当前日期计数,# 当前日期计数# 当第 7 列 = 1 时的当前日期计数,# 当前日期计数#]# 然后我们将 currdate 变量更改为当前变量如果日期 != currdate 或 not currdate:对于 data.itervalues() 中的 v: v[:2] = v[2:]当前日期 = 日期# 写入当前行 + 列表中的前 2 个元素(之前的计数)wrt.writerow(line + data[name][:2])# 更新当前计数数据[名称][3] += 1如果行[6] == "1": 数据[名称][2] += 1

What I need to do is calculate the following:

The number of times a person appears in the list (column 8) on dates prior to the date specified in the row with the same t data in column 10 and a 1 occurs in column 7.

The number of times a person (column 8) appears in the list on dates prior to the date specified in the row and with the same t data in column 10 (note they are sorted chronologically.)

It might be easier to demonstrate this with an example, raw data from csv.

02/01/2005,Data,Class xpv,4,11yo+,4,1,George Smith,data,15t
02/01/2005,Data,Class xpv,4,11yo+,4,2,Ted James,data,22t
02/01/2005,Data,Class xpv,4,11yo+,4,3,Emma Lilly,data,22t
02/01/2005,Data,Class xpv,4,11yo+,4,5,George Smith,data,25t
02/01/2005,Data,Class tn2,4,10yo+,6,4,Tom Phillips,data,15t
03/01/2005,Data,Class tn2,4,10yo+,6,2,Tom Phillips,data,25t
03/01/2005,Data,Class tn2,4,10yo+,6,5,George Smith,data,22t
03/01/2005,Data,Class tn2,4,10yo+,6,3,Tom Phillips,data,25t
03/01/2005,Data,Class tn2,4,10yo+,6,1,Emma Lilly,data,25t
03/01/2005,Data,Class tn2,4,10yo+,6,6,George Smith,data,15t
04/01/2005,Data,Class tn2,4,10yo+,6,6,Ted James,data,25t
04/01/2005,Data,Class tn2,4,10yo+,6,3,Tom Phillips,data,22t
04/01/2005,Data,Class tn2,4,10yo+,6,2,George Smith,data,22t
04/01/2005,Data,Class tn2,4,10yo+,6,4,George Smith,data,25t
04/01/2005,Data,Class tn2,4,10yo+,6,1,George Smith,data,15t
04/01/2005,Data,Class tn2,4,10yo+,6,5,Tom Phillips,data,25t
05/01/2005,Data,Class 22zn,2,10yo+,5,3,Emma Lilly,data,25t
05/01/2005,Data,Class 22zn,2,10yo+,5,1,Ted James,data,22t
05/01/2005,Data,Class 22zn,2,10yo+,5,2,George Smith,data,22t
05/01/2005,Data,Class 22zn,2,10yo+,5,4,Emma Lilly,data,25t
05/01/2005,Data,Class 22zn,2,10yo+,5,5,Tom Phillips,data,15t

What I need the csv to look like as a result of following the described instructions:

02/01/2005,Data,Class xpv,4,11yo+,4,1,George Smith,data,15t,0,0
02/01/2005,Data,Class xpv,4,11yo+,4,2,Ted James,data,22t,0,0
02/01/2005,Data,Class xpv,4,11yo+,4,3,Emma Lilly,data,22t,0,0
02/01/2005,Data,Class xpv,4,11yo+,4,5,George Smith,data,25t,0,0
02/01/2005,Data,Class tn2,4,10yo+,6,4,Tom Phillips,data,15t,0,0
03/01/2005,Data,Class tn2,4,10yo+,6,2,Tom Phillips,data,25t,0,0
03/01/2005,Data,Class tn2,4,10yo+,6,5,George Smith,data,22t,0,0
03/01/2005,Data,Class tn2,4,10yo+,6,3,Tom Phillips,data,25t,0,0
03/01/2005,Data,Class tn2,4,10yo+,6,1,Emma Lilly,data,25t,0,0
03/01/2005,Data,Class tn2,4,10yo+,6,6,George Smith,data,15t,1,1
04/01/2005,Data,Class tn2,4,10yo+,6,6,Ted James,data,25t,0,0
04/01/2005,Data,Class tn2,4,10yo+,6,3,Tom Phillips,data,22t,0,0
04/01/2005,Data,Class tn2,4,10yo+,6,2,George Smith,data,22t,0,1
04/01/2005,Data,Class tn2,4,10yo+,6,4,George Smith,data,25t,0,1
04/01/2005,Data,Class tn2,4,10yo+,6,1,George Smith,data,15t,1,2
04/01/2005,Data,Class tn2,4,10yo+,6,5,Tom Phillips,data,25t,0,2
05/01/2005,Data,Class 22zn,2,10yo+,5,3,Emma Lilly,data,25t,1,1
05/01/2005,Data,Class 22zn,2,10yo+,5,1,Ted James,data,22t,0,1
05/01/2005,Data,Class 22zn,2,10yo+,5,2,George Smith,data,22t,0,2
05/01/2005,Data,Class 22zn,2,10yo+,5,4,Emma Lilly,data,25t,1,1
05/01/2005,Data,Class 22zn,2,10yo+,5,5,Tom Phillips,data,15t,0,1

So you can see that on the last row Tom Phillips with 15t had occurred 1 times on days previous to this one (column 10) and of those 1 occurrences there had been zero occurrences of column 7 being "1".

My csv data is obviously much larger than this, so efficient techniques and suggestions would also be appreciated. If more clarification is required please say so, its hard to tell if this example is understandable.

Kind regards AEA

解决方案

very minor change:

import csv
import datetime
import copy
from collections import defaultdict

with open(r"C:\Temp\test2.csv") as i, open(r"C:\Temp\results2.csv", "wb") as o:
    rdr = csv.reader(i)
    wrt = csv.writer(o)

    # data is a dictionary where we will save current and previous data like:
    # {
    #    (George Smith, 15t): [
    #         previous date count when column 7 = 1,
    #         previous date count,
    #         current date count when column 7 = 1,
    #         current date count
    #    ]
    data, currdate = defaultdict(lambda:[0, 0, 0, 0]), None
    for line in rdr:
        date = datetime.datetime.strptime(line[0], '%d/%m/%Y')

        # key of dictionary would be tuple looking like
        # (George Smith, 15t)
        name = (line[7], line[9])

        # if date is changed, we have to put current values into previous
        # by copying part of the list
        #
        #    (George Smith, 15t): [
        #         previous date count when column 7 = 1,
        #         previous date count,
        #         current date count when column 7 = 1,
        #         current date count
        #    ]
        #
        # becomes
        #
        #    (George Smith, 15t): [
        #         current date count when column 7 = 1,
        #         current date count
        #         current date count when column 7 = 1,
        #         current date count
        #    ]
        # and then we change currdate variable to current one

        if date != currdate or not currdate:
            for v in data.itervalues(): v[:2] = v[2:]
            currdate = date

        # writing current line + first 2 elements from list (previous counts)
        wrt.writerow(line + data[name][:2])

        # updating current counts
        data[name][3] += 1
        if line[6] == "1": data[name][2] += 1

这篇关于计算 csv 中双参数匹配的出现次数,将结果附加到 csv的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆