希望通过ID将两个Excel文件合并为一个使用Python 2.7的Excel文件 [英] Looking to merge two Excel files by ID into one Excel file using Python 2.7

查看:469
本文介绍了希望通过ID将两个Excel文件合并为一个使用Python 2.7的Excel文件的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我是Python系列的新手,一直在努力解决合并两个Excel文件的日子。我已经研究了无休止的合并,并试图调整我的代码来适应我的需要,但它并没有奏效。我想知道我是否可以得到任何帮助,为什么我的代码不工作。我觉得这对于使用Python的人来说可能是一个常见的问题,所以希望这样也能帮助别人。我感谢任何意见!



我有两个excel文件,中文Scores3.csv和中文Scores4.csv,我试图通过ID合并,是每个公司独有的。除了公司ID,每个excel文件没有匹配的列。另外,并不是所有的公司都列在这两个文件上。有些列出了这两个,但是其他的列在任何一个或另一个。我想将一个公司ID的所有信息一起附加在excel表上。即第一个excel文件列是ID,JanSales,FebSales等,第二个excel文件列是ID,CreditScore,EMMAScore等。我想创建的excel文件有:ID,JanSales,FebSales,CreditScore,EMMAScore
全部根据公司ID。



这是否有意义?这就像在excel中使用VLOOKUP,但是我想使用Python来做到这一点。无论如何,这里是我的编码,这是不工作的。我尝试操纵它,但它不工作。我希望得到反馈!

  import sys 
import csv

def main )
headers = []

for中文Scores3.csv中的arg:
with open(arg)as f:
curr ='Chinese Scores3。 csv'.reader(f).next()
headers.append(curr)
try:
keys = list(set(keys)& set(curr))
除了NameError:
键= curr


标题=列表(键)
标题中的h:
标题+ = [k表示k在h如果k不在键中]

data = {}
for中文Scores4.csv中的arg:
with open(arg)as f:
reader = '$ s
data_key = tuple([line [k] for k in keys])
如果不是data_key在数据中:data [data_key] = {}
标题中的k:
try:
data [data_key] [k] = line [k]
除了KeyError:
pass

for data.keys()中的
标题中的col:
如果键入数据而不是数据[key]:
del(data [key])

print','。join(header)
对于排序(数据)中的键:
row = [data [key] [col] for col in header]
print','。join(row)

如果__name__ =='__main__':
sys.exit(main(sys.argv [1:]))


解决方案

虽然我们可以修复你的代码,但我强烈建议您查看 pandas 图书馆,如果你要做这样的工作。例如,如果我们有两个csv文件(虽然我们可以直接从Excel文件开始,如果我们想要的话),这样可以使生活变得更加容易,而且常常是微不足道的。



< :

 >>> !cat scores3.csv 
ID,JanSales,FebSales
1,100,200
2,200,500
3,300,400
>>> !cat scores4.csv
ID,CreditScore,EMMAScore
2,好,沃森
3,好的,汤普森
4,不太好,NA

我们可以将它们读入名为DataFrames的对象(想像它们像Excel表格):

 >>>将大熊猫导入为pd 
>>> s3 = pd.read_csv(scores3.csv)
>>> s4 = pd.read_csv(scores4.csv)
>>> s3
ID JanSales FebSales
0 1 100 200
1 2 200 500
2 3 300 400
>>> s4
ID CreditScore EMMAScore
0 2好的沃森
1 3好的汤普森
2 4不太好NaN

然后我们可以在ID列中合并它们:

  >>> merged = s3.merge(s4,on =ID,how =outer)
>>>合并
ID JanSales FebSales CreditScore EMMAScore
0 1 100 200 NaN NaN
1 2 200 500好沃森
2 3 300 400好的汤普森
3 4 NaN NaN not-好的NaN

之后我们可以将其保存到csv文件或Excel文件中: / p>

 >>> merged.to_csv(merged.csv)
>>> merged.to_excel(merged.xlsx)


I am new to the Python family and have been trying to solve merge two Excel files for days. I have researched merging endlessly and tried to adapt my code to fit my needs, but it hasn't been working. I was wondering if I could get any help of why my code isn't working. I feel that this could be a common problem to others using Python, so hopefully this will help out others as well. I appreciate any comments!

I have two excel files, 'Chinese Scores3.csv' and 'Chinese Scores4.csv' which I am trying to merge by an ID, which is unique to each company. Other than the company ID, there are no matching columns for each excel file. Also, not all companies are listed on both files. Some are listed both, but others are listed on either one or the other. I would like to attach all the information for a company ID together in one row on an excel sheet. i.e. the first excel file columns are ID, JanSales, FebSales, etc. and the second excel file columns are ID, CreditScore, EMMAScore, etc. The excel file I would like to create has columns: ID, JanSales, FebSales, CreditScore, EMMAScore all according to company ID.

Is this making sense? It's like using VLOOKUP in excel, but I would like to do this using Python. Anyway, here is my coding, which isn't working. I try manipulating it, but it isn't working. I hope to get feedback!

import sys
import csv

def main(arg):
    headers= []

    for arg in 'Chinese Scores3.csv':
        with open(arg) as f:
            curr = 'Chinese Scores3.csv'.reader(f).next()
            headers.append(curr)
            try:
                keys=list( set(keys) & set (curr))
            except NameError:
                keys = curr


    header = list(keys)
    for h in headers:
        header += [ k for k in h if k not in keys ]

    data = {}
    for arg in 'Chinese Scores4.csv':
        with open(arg) as f:
            reader = 'Chinese Scores4.csv'.DictReader(f)
            for line in reader:
                data_key = tuple([ line[k] for k in keys ])
                if not data_key in data: data[data_key] = {}
                for k in header:
                    try:
                        data[data_key][k] = line[k]
                    except KeyError:
                        pass

    for key in data.keys():
        for col in header:
            if key in data and not col in data[key]:
                del( data[key] )

    print ','.join(header)
    for key in sorted(data):
        row = [ data[key][col] for col in header ]
        print ','.join(row)

if __name__ == '__main__':
    sys.exit( main( sys.argv[1:]) )

解决方案

While we could fix your code, I'd strongly recommend looking into the pandas library if you're going to be doing this sort of work instead. It makes life a lot easier, and often borderline trivial.

For example, if we had two csv files (although we could have started straight from Excel files if we wanted):

>>> !cat scores3.csv
ID,JanSales,FebSales
1,100,200
2,200,500
3,300,400
>>> !cat scores4.csv
ID,CreditScore,EMMAScore
2,good,Watson
3,okay,Thompson
4,not-so-good,NA

We could read these into objects called DataFrames (think of them sort of like Excel sheets):

>>> import pandas as pd
>>> s3 = pd.read_csv("scores3.csv")
>>> s4 = pd.read_csv("scores4.csv")
>>> s3
   ID  JanSales  FebSales
0   1       100       200
1   2       200       500
2   3       300       400
>>> s4
   ID  CreditScore EMMAScore
0   2         good    Watson
1   3         okay  Thompson
2   4  not-so-good       NaN

And then we can merge them on the ID column:

>>> merged = s3.merge(s4, on="ID", how="outer")
>>> merged
   ID  JanSales  FebSales  CreditScore EMMAScore
0   1       100       200          NaN       NaN
1   2       200       500         good    Watson
2   3       300       400         okay  Thompson
3   4       NaN       NaN  not-so-good       NaN

After which we could save it to a csv file or to an Excel file:

>>> merged.to_csv("merged.csv")
>>> merged.to_excel("merged.xlsx")

这篇关于希望通过ID将两个Excel文件合并为一个使用Python 2.7的Excel文件的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆