How to compare two CSV files in Python

Question

I have two csv files. One is called 'Standard reg.csv', the other is 'Driver Details.csv'

In 'Standard reg.csv' the first two lines are:

['Day', 'Month', 'Year', 'Reg Plate', 'Hour', 'Minute', 'Second', 'Speed over limit']
['1', '1', '2016', 'NU16REG', '1', '1', '1', '5816.1667859699355']

The first two lines in Driver Details.csv are:

['FirstName', 'LastName', 'StreetAddress', 'City', 'Region', 'Country', 'PostCode', 'Registration']
['Violet', 'Kirby', '585-4073 Convallis Street', 'Balfour', 'Orkney', 'United Kingdom', 'OC1X 6QE', 'NU16REG']

My code looks like this:

import csv
file_1 = csv.reader(open('Standard Reg.csv', 'r'), delimiter=',')
file_2 = csv.reader(open('Driver Details.csv', 'r'), delimiter=',')
for row in file_1:
    reg = row[3]
    avgspeed = row[7]
    for row in file_2:
        firstname = row[0]
        lastname = row[1]
        address = row[2]
        city = row[3]
        region = row[4]
        reg2 = row[7]
if reg  == reg2:
    print('Match found')
else:
    print('No match found')

It's a work-in-progress, but I can't seem to get the code to compare more than just the last line.

With print(reg) after the line reg2 = row[7], it shows it has read that whole column. The entire column is also printed when I do print(reg2) after reg2 = row[7].

But at if reg == reg2: it only compares the last lines of both columns, and I'm not sure how to fix this.

Thanks.

Recommended answer

I suggest you first load all of the details from the Driver Details.csv into a dictionary, using the registration number as the key. This would then allow you to easily look up a given entry without having to keep reading all of the lines from the file again:

import csv

driver_details = {}

with open('Driver Details.csv') as f_driver_details:
    csv_driver_details = csv.reader(f_driver_details)
    header = next(csv_driver_details)       # skip the header

    for row in csv_driver_details:
        driver_details[row[7]] = row

with open('Standard Reg.csv') as f_standard_reg:
    csv_standard_reg = csv.reader(f_standard_reg)
    header = next(csv_standard_reg)     # skip the header

    for row in csv_standard_reg:
        try:
            driver = driver_details[row[3]]
            print('Match found - {} {}'.format(driver[0], driver[1]))
        except KeyError:
            print('No match found')
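
As a variation on the lookup above, here is a minimal sketch that uses csv.DictReader so columns are addressed by header name instead of by index. It runs on in-memory data: io.StringIO stands in for the real files, the helper names index_drivers and match_drivers are made up for illustration, and the second speeding record (XX00XXX) is a hypothetical row added only to show the no-match case.

```python
import csv
import io

# First rows copied from the question; io.StringIO stands in for the real files.
DRIVER_DETAILS_CSV = """FirstName,LastName,StreetAddress,City,Region,Country,PostCode,Registration
Violet,Kirby,585-4073 Convallis Street,Balfour,Orkney,United Kingdom,OC1X 6QE,NU16REG
"""

# The XX00XXX row is hypothetical, added to exercise the no-match branch.
STANDARD_REG_CSV = """Day,Month,Year,Reg Plate,Hour,Minute,Second,Speed over limit
1,1,2016,NU16REG,1,1,1,5816.1667859699355
2,1,2016,XX00XXX,1,1,1,12.5
"""


def index_drivers(f_driver_details):
    """Build a dict mapping registration -> row (each row is a dict keyed by header)."""
    return {row['Registration']: row for row in csv.DictReader(f_driver_details)}


def match_drivers(f_standard_reg, drivers):
    """Return one (reg plate, driver row or None) pair per speeding record."""
    results = []
    for row in csv.DictReader(f_standard_reg):
        reg = row['Reg Plate']
        results.append((reg, drivers.get(reg)))   # None when there is no match
    return results


drivers = index_drivers(io.StringIO(DRIVER_DETAILS_CSV))
matches = match_drivers(io.StringIO(STANDARD_REG_CSV), drivers)

for reg, driver in matches:
    if driver is None:
        print('No match found for {}'.format(reg))
    else:
        print('Match found - {} {}'.format(driver['FirstName'], driver['LastName']))
```

Using dict.get instead of try/except is only a stylistic choice here; both avoid rereading the driver file for every speeding record.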

The code as posted iterates through file_2 completely during the first pass of the outer loop and leaves the file pointer at the end, so every later pass of the outer loop finds nothing left to read. The if test also sits outside both loops, so it runs only once, comparing the last reg with the last reg2. For your approach to work you would have to move the comparison inside the inner loop and start reading the file from the start on each pass of the outer loop, which would be very slow.
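
The behaviour described above can be reproduced without any files. The sketch below uses io.StringIO with a hypothetical two-row registration column (the AB12CDE value is made up) to show that a csv.reader yields nothing on a second pass until the underlying file is rewound.

```python
import csv
import io

# A hypothetical stand-in for 'Driver Details.csv' (registration column only).
f = io.StringIO("Registration\nNU16REG\nAB12CDE\n")
reader = csv.reader(f)

first_pass = [row[0] for row in reader]    # consumes every line
second_pass = [row[0] for row in reader]   # nothing left to read

print(first_pass)
print(second_pass)

# Rewinding the underlying file makes the rows available again,
# but doing this once per outer row rereads the whole file each time.
f.seek(0)
third_pass = [row[0] for row in csv.reader(f)]
print(third_pass)
```

Rewinding with seek(0) inside the outer loop would make the nested-loop approach work, at the cost of one full read of the driver file per speeding record, which is why the dictionary lookup is preferable.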

To add an output csv and display the full address you could do something like the following:

import csv

speed = 74.3
fine = 35

driver_details = {}

with open('Driver Details.csv') as f_driver_details:
    csv_driver_details = csv.reader(f_driver_details)
    header = next(csv_driver_details)       # skip the header

    for row in csv_driver_details:
        driver_details[row[7]] = row

with open('Standard Reg.csv') as f_standard_reg, open('Output log.csv', 'w', newline='') as f_output:
    csv_standard_reg = csv.reader(f_standard_reg)
    header = next(csv_standard_reg)     # skip the header
    csv_output = csv.writer(f_output)

    for row in csv_standard_reg:
        try:
            driver = driver_details[row[3]]
            print('Match found - Fine {}, Speed {}\n{} {}\n{}'.format(fine, speed, driver[0], driver[1], '\n'.join(driver[2:7])))
            csv_output.writerow(driver[0:7] + [speed, fine])
        except KeyError:
            print('No match found')

This would print the following:

Match found - Fine 35, Speed 74.3
Violet Kirby
585-4073 Convallis Street
Balfour
Orkney
United Kingdom
OC1X 6QE

And produce an output file containing:

Violet,Kirby,585-4073 Convallis Street,Balfour,Orkney,United Kingdom,OC1X 6QE,74.3,35
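
The output file above has no header row. If one is wanted, a possible sketch is to write the driver header columns plus two extra names before the data rows. The column names 'Speed' and 'Fine' are assumptions, not part of the original answer, and io.StringIO stands in for the real output file so the example runs without touching disk.

```python
import csv
import io

# Header and row copied from the question's 'Driver Details.csv'.
header = ['FirstName', 'LastName', 'StreetAddress', 'City', 'Region',
          'Country', 'PostCode', 'Registration']
driver = ['Violet', 'Kirby', '585-4073 Convallis Street', 'Balfour', 'Orkney',
          'United Kingdom', 'OC1X 6QE', 'NU16REG']

speed = 74.3   # placeholder values, as in the answer
fine = 35

f_output = io.StringIO()   # stands in for open('Output log.csv', 'w', newline='')
csv_output = csv.writer(f_output)

# 'Speed' and 'Fine' are assumed column names, not taken from the original.
csv_output.writerow(header[0:7] + ['Speed', 'Fine'])
csv_output.writerow(driver[0:7] + [speed, fine])

output_lines = f_output.getvalue().splitlines()
print('\n'.join(output_lines))
```

In the real script the header row would be written once, before the loop over csv_standard_reg.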
