Python: Comparing columns of 2 csv files and writing to a new csv

Question

I have 2 csv files, each with only 1 column, as below:

csv file 1: adam3us.csv

created_at
6/7/2018 19:00
6/6/2018 12:00
6/6/2018 9:00
6/6/2018 9:00
6/6/2018 5:00
6/5/2018 16:00
6/5/2018 7:00
6/4/2018 16:00

csv file 2: Bitcoin Prices Hourly Based.csv

created_at
1/8/2017 0:00
1/8/2017 1:00
1/8/2017 2:00
1/8/2017 3:00
1/8/2017 4:00
1/8/2017 5:00
1/8/2017 6:00
6/7/2018 19:00

I am trying to write a Python script that uses a loop to compare each value in csv file 2 with every entry in csv file 1. If the entries match, it should increment a variable called count and then write to a new csv file with two columns: created_at, containing the time that appears in both csv files, and count, containing the value of count.

For example, one iteration takes the row 6/7/2018 19:00 of csv file 2 (its last row) and compares its value with every row in csv file 1. If a row of csv file 2 matches any row of csv file 1, the count variable should be incremented. In this case, 6/7/2018 19:00 matches the first data row of csv file 1, so count goes from 0 to 1, and the created_at value and the count value are written to a new, separate csv file called output. The output file for this example should look like this:

output.csv

created_at        count
6/7/2018 19:00      1

The count variable should be reset to 0 at the start of every iteration, and the process should repeat for every row of csv file 2.
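The loop described above can be sketched in plain Python on two lists of timestamp strings, with the header rows already skipped and file reading left out. This is a minimal illustration, not the asker's code; the function name count_matches is hypothetical. It follows the stated rules: count is reset for every row of csv file 2, and only times found in both files are kept.

```python
def count_matches(times2, times1):
    """Return (timestamp, count) pairs for timestamps of times2 found in times1.

    times2 plays the role of csv file 2 (outer loop), times1 of csv file 1.
    """
    results = []
    for t2 in times2:
        count = 0  # reset for every row of csv file 2
        for t1 in times1:
            if t1 == t2:
                count += 1
        if count > 0:  # only times present in both files go to the output
            results.append((t2, count))
    return results


# Sample values from the question (header rows omitted)
file1 = ["6/7/2018 19:00", "6/6/2018 12:00", "6/6/2018 9:00", "6/6/2018 9:00",
         "6/6/2018 5:00", "6/5/2018 16:00", "6/5/2018 7:00", "6/4/2018 16:00"]
file2 = ["1/8/2017 0:00", "1/8/2017 1:00", "1/8/2017 2:00", "1/8/2017 3:00",
         "1/8/2017 4:00", "1/8/2017 5:00", "1/8/2017 6:00", "6/7/2018 19:00"]
print(count_matches(file2, file1))  # [('6/7/2018 19:00', 1)]
```

Because count is a local variable inside the outer loop, each row of csv file 2 starts from 0, and a time that appears twice in csv file 1 (such as 6/6/2018 9:00) would be reported with a count of 2.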

My code is as below:

import csv

count=0

path1 = r'C:\Users\Ahmed Ismail Khalid\Desktop\Bullcrap Testing Delete Later\Bitcoin Prices Hourly Based.csv'
path2 = r'C:\Users\Ahmed Ismail Khalid\Desktop\Bullcrap Testing Delete Later\adam3us.csv'
path3 = r'C:\Users\Ahmed Ismail Khalid\Desktop\output.csv'

with open(path1,'rt',encoding='utf-8') as csvin:
    reader1 = csv.reader(csvin)
    for row in reader1:
        b=row[0]
        with open(path2,'rt',encoding='utf-8') as csvinpu:
            with open(path3, 'w', newline='',encoding='utf-8') as csvoutput:
                writer = csv.writer(csvoutput, lineterminator='\n')
                reader2 = csv.reader(csvinpu)
                all = []
                row = next(reader2)
                row.append('count')
                all.append(row)
                for row in reader2:
                    d=row[0]
                    if(b==d) :
                        count+=1
                        row.append(count)
                        all.append(row)
                    else:
                        row.append(count)
                        all.append(row)
                        writer.writerows(all)
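The loop above has a few problems: output.csv is opened in 'w' mode inside the outer loop, so it is truncated again for every row of csv file 2; count is set to 0 only once, at the top, so it is never reset; writer.writerows(all) sits inside the else branch, so the growing list is written again on every non-match; and every row of csv file 1 is written, not just the matching times. A minimal corrected sketch using only the standard library might look like this, assuming the files look like the samples above (a created_at header and one value per row). The helper names read_times and compare_csv_files are hypothetical, and a collections.Counter replaces the inner loop so csv file 1 is read only once.

```python
import csv
from collections import Counter


def read_times(path):
    """Return the single column of a csv file as strings, skipping the header."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.reader(f)
        next(reader, None)  # skip the "created_at" header row
        return [row[0].strip() for row in reader if row]  # ignore blank lines


def compare_csv_files(path1, path2, path3):
    """Write created_at/count rows for times of path1 that also occur in path2.

    Naming follows the question's code: path1 = csv file 2 (hourly),
    path2 = csv file 1 (adam3us), path3 = output. Returns the data rows written.
    """
    counts = Counter(read_times(path2))  # occurrences of each time in csv file 1
    # Counter returns 0 for missing keys, so unmatched times are filtered out
    rows = [(t, counts[t]) for t in read_times(path1) if counts[t] > 0]
    # Open the output once, after all the counting is done
    with open(path3, "w", newline="", encoding="utf-8") as out:
        writer = csv.writer(out)
        writer.writerow(["created_at", "count"])
        writer.writerows(rows)
    return rows
```

Calling compare_csv_files(path1, path2, path3) with the three paths from the question writes output.csv. Counting csv file 1 once gives the same counts as the nested loop without rescanning csv file 1 for every hourly row.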



Any and all help would be appreciated.

Thanks

Recommended answer

Use pandas for this kind of operation.

Load both csv files into two DataFrames using pandas and take the intersection of the two columns. pandas has a built-in function for this: pd.merge.

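import pandas as pd

file1 = r'adam3us.csv'  # csv file 1 (replace with your full path)
file2 = r'Bitcoin Prices Hourly Based.csv'  # csv file 2 (replace with your full path)
df1 = pd.read_csv(file1)
df2 = pd.read_csv(file2)
# "created_at" is the column common to both DataFrames;
# how="inner" keeps only rows whose created_at appears in both (an INNER JOIN)
output = pd.merge(df1, df2, how="inner", on="created_at")
output['count'] = output.groupby('created_at')['created_at'].transform('size')  # rows per timestamp
final_output = output.drop_duplicates()  # remove duplicate rows
final_output.to_csv('output.csv', index=False)  # write created_at and count to a new csv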
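To check the merge/groupby/drop_duplicates approach against the question's sample data, the sketch below runs the same steps on in-memory copies of the two files. io.StringIO stands in for the real file paths; this is an illustration added here, not part of the original answer.

```python
import io

import pandas as pd

# The question's sample rows, read from strings instead of files
csv_file1 = io.StringIO(
    "created_at\n6/7/2018 19:00\n6/6/2018 12:00\n6/6/2018 9:00\n6/6/2018 9:00\n"
    "6/6/2018 5:00\n6/5/2018 16:00\n6/5/2018 7:00\n6/4/2018 16:00\n"
)
csv_file2 = io.StringIO(
    "created_at\n1/8/2017 0:00\n1/8/2017 1:00\n1/8/2017 2:00\n1/8/2017 3:00\n"
    "1/8/2017 4:00\n1/8/2017 5:00\n1/8/2017 6:00\n6/7/2018 19:00\n"
)
df1 = pd.read_csv(csv_file1)
df2 = pd.read_csv(csv_file2)

# Inner join on created_at, count rows per timestamp, then collapse duplicates
output = pd.merge(df1, df2, how="inner", on="created_at")
output["count"] = output.groupby("created_at")["created_at"].transform("size")
final_output = output.drop_duplicates().reset_index(drop=True)
print(final_output.to_csv(index=False))
```

It prints a created_at,count header followed by the single row 6/7/2018 19:00,1, matching the expected output.csv. One caveat: if a timestamp is repeated in both files, the inner join pairs every copy, so the group size is the product of the two counts rather than the count in csv file 1. If csv file 2 lists each hour only once, as its hourly samples suggest, this does not arise.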

Hope it helps.
