Pass openpyxl data to pandas

Question

I am splitting "full name" fields into "first name", "middle name" and "last name" fields, using data from an Excel file. I couldn't figure out how to do that in pandas, so I turned to openpyxl and got the variables split the way I wanted. But since adding columns for the new fields is not easy in openpyxl, I thought I would pass the values to pandas.

I'm generating the dataframe that I need when I run the code, but once I send the df to ExcelWriter, only the last row is added to the Excel file. The data is in the right places, though.

Here is the code:

for cellObj in range(2, sheet.max_row+1):
    #print cellObj
    id = sheet['A' + str(cellObj)].value
    fullname = sheet['B' + str(cellObj)].value.strip()
    namelist = fullname.split(' ')  
    for i in namelist:
        firstname = namelist[0]
        if len(namelist) == 2:
            lastname = namelist[1]
            middlename = ''
        elif len(namelist) == 3:
            middlename = namelist[1]
            lastname = namelist[2]
        elif len(namelist) == 4:
            middlename = namelist[1]
            lastname = namelist[2] + " " + namelist[3]
        if (namelist[1] == 'Del') | (namelist[1] == 'El') | (namelist[1] == 'Van'):
            middlename = ''
            lastname = namelist[1] + " " + namelist[2]
    df = pd.DataFrame({'personID':id,'lastName':lastname,'firstName':firstname,'middleName':middlename}, index=[id])

    writer = pd.ExcelWriter('output.xlsx')
    df.to_excel(writer,'Sheet1', columns=['ID','lastName','firstName','middleName'])
    writer.save()

Any ideas?

Thanks.

Recommended answer

A couple of things. First, your code is only ever going to get you one line, because you overwrite the values every time it passes an if test. For example,

if len(namelist) == 2:
    lastname = namelist[1]

This assigns a string to the variable lastname. You are not appending to a list, you are just assigning a string. Then, when you make your dataframe with df = pd.DataFrame({'personID':id,'lastName':lastname,..., you're using this value, so the dataframe will only ever hold that string. Make sense? If you must do this using openpyxl, try something like:

lastname = []  # create an empty list once, before the loop over the rows
if len(namelist) == 2:
    lastname.append(namelist[1])  # add the name to the list
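
The same idea can be extended to whole rows. The following is only a minimal sketch, not the asker's exact code: the helper name split_name, the PARTICLES set and the sample (id, name) pairs are made up for illustration, and the handling of one-word names and of names longer than four words is an assumption, since the question does not cover those cases. Each row is appended to a list instead of overwriting a single variable.

```python
# Surname particles taken from the question's if test.
PARTICLES = {"Del", "El", "Van"}


def split_name(fullname):
    """Return (first, middle, last) following the rules in the question."""
    parts = fullname.strip().split()
    if not parts:
        return "", "", ""
    first, rest = parts[0], parts[1:]
    # Zero or one remaining word, or a particle such as "Van":
    # there is no middle name and everything left is the last name.
    if len(rest) <= 1 or rest[0] in PARTICLES:
        return first, "", " ".join(rest)
    # Otherwise the second word is the middle name, the rest the last name.
    return first, rest[0], " ".join(rest[1:])


# Stand-in for the (id, full name) pairs read from the sheet with openpyxl.
sheet_rows = [(1, "Ann Lee"), (2, "John Paul Jones"), (3, "Maria Del Rio")]

rows = []  # created once, before the loop
for person_id, fullname in sheet_rows:
    first, middle, last = split_name(fullname)
    rows.append({"personID": person_id, "lastName": last,
                 "firstName": first, "middleName": middle})
```

With every row collected, pd.DataFrame(rows) can be built once after the loop and written with a single to_excel call. The question's loop instead creates a new one-row dataframe on each pass, which is why only the last row reaches the output file.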

However, I think your life will ultimately be much easier if you just figure out how to do this with pandas. It is in fact quite easy. Try something like this:

import pandas as pd
# read the Excel file into a dataframe
# (read_excel takes no encoding argument in current pandas versions)
df = pd.read_excel('myInputFilename.xlsx')
# write the dataframe back to Excel
df.to_excel('MyOutputFile.xlsx')
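
The snippet above only reads and writes the file; the name splitting itself can also be done in pandas. What follows is a rough sketch under assumptions: the column name fullName, the helper split_name and the sample data are hypothetical, and the in-memory dataframe stands in for the result of pd.read_excel so that the example runs without an input file.

```python
import pandas as pd

# Surname particles taken from the question's if test.
PARTICLES = {"Del", "El", "Van"}


def split_name(fullname):
    """Return (first, middle, last) following the rules in the question."""
    parts = str(fullname).strip().split()
    if not parts:
        return "", "", ""
    first, rest = parts[0], parts[1:]
    if len(rest) <= 1 or rest[0] in PARTICLES:
        return first, "", " ".join(rest)
    return first, rest[0], " ".join(rest[1:])


# Stand-in for: df = pd.read_excel('myInputFilename.xlsx')
df = pd.DataFrame({
    "personID": [101, 102, 103],
    "fullName": ["Ann Lee", "John Paul Jones", "Dick Van Dyke"],
})

# Build the three new columns in one step, aligned on the original index.
name_parts = pd.DataFrame(
    [split_name(name) for name in df["fullName"]],
    columns=["firstName", "middleName", "lastName"],
    index=df.index,
)
result = df[["personID"]].join(name_parts)

# Writing every row at once (needs an Excel engine such as openpyxl):
# result.to_excel('MyOutputFile.xlsx', index=False)
```

Because the whole dataframe is built before anything is written, all rows end up in the output file rather than only the last one.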
