Merging two CSV files using Python
OK, I have read several threads here on Stack Overflow. I thought this would be fairly easy for me to do, but I find that I still do not have a very good grasp of Python. I tried the example at "How to combine 2 csv files with common column value, but both files have different number of lines", which was helpful, but I still do not get the results I was hoping for.
Essentially, I have 2 csv files with a common first column, and I would like to merge the two, i.e.:
filea.csv
title,stage,jan,feb
darn,3.001,0.421,0.532
ok,2.829,1.036,0.751
three,1.115,1.146,2.921
fileb.csv
title,mar,apr,may,jun,
darn,0.631,1.321,0.951,1.751
ok,1.001,0.247,2.456,0.3216
three,0.285,1.283,0.924,956
output.csv (not the one I am getting but what I want)
title,stage,jan,feb,mar,apr,may,jun
darn,3.001,0.421,0.532,0.631,1.321,0.951,1.751
ok,2.829,1.036,0.751,1.001,0.247,2.456,0.3216
three,1.115,1.146,2.921,0.285,1.283,0.924,956
output.csv (the output that I actually got)
title,feb,may
ok,0.751,2.456
three,2.921,0.924
darn,0.532,0.951
The code I was trying:
'''
testing merging of 2 csv files
'''
import csv
import array
import os
with open('Z:\Desktop\test\filea.csv') as f:
    r = csv.reader(f, delimiter=',')
    dict1 = {row[0]: row[3] for row in r}

with open('Z:\Desktop\test\fileb.csv') as f:
    r = csv.reader(f, delimiter=',')
    #dict2 = {row[0]: row[3] for row in r}
    dict2 = {row[0:3] for row in r}

print str(dict1)
print str(dict2)

keys = set(dict1.keys() + dict2.keys())
with open('Z:\Desktop\test\output.csv', 'wb') as f:
    w = csv.writer(f, delimiter=',')
    w.writerows([[key, dict1.get(key, "''"), dict2.get(key, "''")] for key in keys])
Any help is greatly appreciated.
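A note on the attempt above: each dict comprehension keeps only row[3], which is why the output contains just the feb and may columns, and the uncommented dict2 line builds a set of lists, which raises a TypeError because lists are unhashable. What follows is a minimal sketch of one possible fix using only the standard csv module, assuming Python 3. The inline strings stand in for filea.csv and fileb.csv, and read_rows and merge_rows are hypothetical helper names introduced for illustration.

```python
import csv
import io

# Inline stand-ins for filea.csv and fileb.csv so the sketch is self-contained.
FILE_A = (
    "title,stage,jan,feb\n"
    "darn,3.001,0.421,0.532\n"
    "ok,2.829,1.036,0.751\n"
    "three,1.115,1.146,2.921\n"
)
FILE_B = (
    "title,mar,apr,may,jun,\n"
    "darn,0.631,1.321,0.951,1.751\n"
    "ok,1.001,0.247,2.456,0.3216\n"
    "three,0.285,1.283,0.924,956\n"
)


def read_rows(text):
    """Map each row's first field to the list of its remaining fields."""
    rows = {}
    for row in csv.reader(io.StringIO(text)):
        # Drop empty trailing fields such as the one left by "jun,".
        while row and row[-1] == "":
            row.pop()
        if row:
            rows[row[0]] = row[1:]
    return rows


def merge_rows(left, right):
    """Join two row mappings on their keys, keeping the order of `left`."""
    return [[key] + left[key] + right[key] for key in left if key in right]


merged = merge_rows(read_rows(FILE_A), read_rows(FILE_B))

out = io.StringIO()
csv.writer(out, lineterminator="\n").writerows(merged)
print(out.getvalue(), end="")
```

Because the header row is treated like any other row keyed on "title", it is merged along with the data. Values stay as text throughout, so 956 is written back unchanged. Rows whose key is missing from either file are dropped in this sketch.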
When I'm working with csv files, I often use the pandas library. It makes things like this very easy. For example:
import pandas as pd
a = pd.read_csv("filea.csv")
b = pd.read_csv("fileb.csv")
b = b.dropna(axis=1)
merged = a.merge(b, on='title')
merged.to_csv("output.csv", index=False)
Some explanation follows. First, we read in the csv files:
>>> a = pd.read_csv("filea.csv")
>>> b = pd.read_csv("fileb.csv")
>>> a
   title  stage    jan    feb
0   darn  3.001  0.421  0.532
1     ok  2.829  1.036  0.751
2  three  1.115  1.146  2.921
>>> b
   title    mar    apr    may       jun  Unnamed: 5
0   darn  0.631  1.321  0.951    1.7510         NaN
1     ok  1.001  0.247  2.456    0.3216         NaN
2  three  0.285  1.283  0.924  956.0000         NaN
and we see there's an extra column of data (note that the first line of fileb.csv -- title,mar,apr,may,jun, -- has an extra comma at the end). We can get rid of that easily enough:
>>> b = b.dropna(axis=1)
>>> b
   title    mar    apr    may       jun
0   darn  0.631  1.321  0.951    1.7510
1     ok  1.001  0.247  2.456    0.3216
2  three  0.285  1.283  0.924  956.0000
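One caveat worth illustrating: with its default how="any", dropna(axis=1) removes every column that contains at least one missing value, not only the empty one created by the trailing comma. The sketch below uses hypothetical sample data (a variant of fileb.csv with one genuinely missing apr value) to compare it with how="all", which drops only columns that are entirely empty.

```python
import io

import pandas as pd

# Hypothetical sample data: a trailing comma in the header, plus one
# genuinely missing value in the "apr" column.
raw = io.StringIO(
    "title,mar,apr,may,jun,\n"
    "darn,0.631,1.321,0.951,1.751\n"
    "ok,1.001,,2.456,0.3216\n"
    "three,0.285,1.283,0.924,956\n"
)
b = pd.read_csv(raw)

# Default how="any": drops "apr" as well as the empty "Unnamed: 5" column.
any_nan = b.dropna(axis=1)

# how="all": drops only the column in which every value is missing.
all_nan = b.dropna(axis=1, how="all")

print(list(any_nan.columns))
print(list(all_nan.columns))
```

For the files in the question both calls give the same result, since no real data is missing there.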
Now we can merge a and b on the title column:
>>> merged = a.merge(b, on='title')
>>> merged
   title  stage    jan    feb    mar    apr    may       jun
0   darn  3.001  0.421  0.532  0.631  1.321  0.951    1.7510
1     ok  2.829  1.036  0.751  1.001  0.247  2.456    0.3216
2  three  1.115  1.146  2.921  0.285  1.283  0.924  956.0000
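Note that merge performs an inner join by default, so a title present in only one file would be dropped. If the two files might not contain the same rows, passing how="outer" keeps every title and fills the gaps with NaN. A small sketch with hypothetical data, where "three" appears only in the first frame and "four" only in the second:

```python
import io

import pandas as pd

# Hypothetical data: "three" is only in the first frame, "four" only in the second.
a = pd.read_csv(io.StringIO("title,stage,jan\ndarn,3.001,0.421\nthree,1.115,1.146\n"))
b = pd.read_csv(io.StringIO("title,mar\ndarn,0.631\nfour,0.5\n"))

# Default how="inner": only titles found in both frames survive.
inner = a.merge(b, on="title")

# how="outer": every title is kept; missing cells become NaN.
outer = a.merge(b, on="title", how="outer")

print(inner)
print(outer)
```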
and finally write this out:
>>> merged.to_csv("output.csv", index=False)
producing:
title,stage,jan,feb,mar,apr,may,jun
darn,3.001,0.421,0.532,0.631,1.321,0.951,1.751
ok,2.829,1.036,0.751,1.001,0.247,2.456,0.3216
three,1.115,1.146,2.921,0.285,1.283,0.924,956.0
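The last value comes out as 956.0 rather than the 956 in the desired output because pandas parsed the jun column as floating point. If the values only need to be copied through unchanged, one option is to read everything as text with dtype=str. A minimal sketch, using a hypothetical two-column excerpt of fileb.csv held in a string:

```python
import io

import pandas as pd

# Hypothetical two-column excerpt of fileb.csv.
raw = "title,jun\nthree,956\ndarn,1.751\n"

# Default parsing infers float64 for "jun", so 956 is written back as 956.0.
as_float = pd.read_csv(io.StringIO(raw))
float_csv = as_float.to_csv(index=False)

# dtype=str keeps every field as text, so values round-trip unchanged.
as_text = pd.read_csv(io.StringIO(raw), dtype=str)
text_csv = as_text.to_csv(index=False)

print(float_csv)
print(text_csv)
```

The trade-off is that the columns are then strings, so any numeric work would need an explicit conversion first.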