Dropping NaN rows and certain columns in specific Excel files using glob/merge


Problem description

I am loading Excel files in a for loop and merging them into one DataFrame. I would like to drop the NaN rows from the final file, and to drop the Company, Emails and Created columns, which end up duplicated, from all but the last Excel file loaded.

Here is my for loop (and the subsequent merge into a single DataFrame) as it currently stands:

import glob
import re
from functools import reduce
import pandas as pd

all_users_sheets_hosts = []
for f in glob.glob("./gowall-users-export-*.xlsx"):
    df = pd.read_excel(f)
    all_users_sheets_hosts.append(df)
    j = re.search(r'(\d+)', f)
    df.columns = df.columns.str.replace('.*Hosted Meetings.*', 'Hosted Meetings' + ' ' + j.group(1), regex=True)

all_users_sheets_hosts = reduce(lambda left,right: pd.merge(left,right,on=['First Name', 'Last Name'], how='outer'), all_users_sheets_hosts)

Here are the first few rows of the resulting DataFrame:

Company_x   First Name  Last Name   Emails_x    Created_x   Hosted Meetings 03112016    Facilitated Meetings_x  Attended Meetings_x Company_y   Emails_y    ... Created_x   Hosted Meetings 04122016    Facilitated Meetings_x  Attended Meetings_x Company_y   Emails_y    Created_y   Hosted Meetings 04212016    Facilitated Meetings_y  Attended Meetings_y
0   TS  X Y X@Y.com 03/10/2016  0.0 0.0 0.0 TS  X@Y.com ... 03/10/2016  0.0 0.0 2.0 NaN NaN NaN NaN NaN NaN
1   TS  X Y X@Y.com 03/10/2016  0.0 0.0 0.0 TS  X@Y.com ... 01/25/2016  0.0 0.0 0.0 NaN NaN NaN NaN NaN NaN
2   TS  X Y X@Y.com 03/10/2016  0.0 0.0 0.0 TS  X@Y.com ... 04/06/2015  9.0 10.0    17.0    NaN NaN NaN NaN NaN NaN

Recommended answer

To prevent multiple Company, Emails, Created, Facilitated Meetings and Attended Meetings columns, drop them from the right DataFrame before each merge. To remove rows in which every value is NaN, use result.dropna(how='all', axis=0):

import functools
import glob
import re

import pandas as pd

all_users_sheets_hosts = []
for f in glob.glob("./gowall-users-export-*.xlsx"):
    df = pd.read_excel(f)
    all_users_sheets_hosts.append(df)
    # The digits in the file name become the suffix of the column name
    j = re.search(r'(\d+)', f)
    df.columns = df.columns.str.replace('.*Hosted Meetings.*',
                                        'Hosted Meetings' + ' ' + j.group(1),
                                        regex=True)

# Drop rows of all NaNs from the final DataFrame in `all_users_sheets_hosts`
all_users_sheets_hosts[-1] = all_users_sheets_hosts[-1].dropna(how='all', axis=0)

def mergefunc(left, right):
    # Remove the shared descriptive columns from `right` so they are not duplicated
    cols = ['Company', 'Emails', 'Created', 'Facilitated Meetings', 'Attended Meetings']
    right = right.drop(cols, axis=1)
    result = pd.merge(left, right, on=['First Name', 'Last Name'], how='outer')
    return result

all_users_sheets_hosts = functools.reduce(mergefunc, all_users_sheets_hosts)

Since the Company et al. columns exist only in the left DataFrame, those columns do not proliferate. Note, however, that if the left and right DataFrames have different values in those columns, only the values from the first DataFrame in all_users_sheets_hosts are kept.
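To make the behaviour described above concrete, here is a minimal sketch on two small, made-up DataFrames standing in for the loaded Excel sheets. The names `first`, `last` and `frames`, and all of the values, are assumptions for illustration only; only the Company column is dropped to keep the example short.

```python
import functools

import numpy as np
import pandas as pd

# Hypothetical stand-ins for the first and last loaded sheets.
first = pd.DataFrame({
    "First Name": ["Ann", "Bob"],
    "Last Name": ["Lee", "Ray"],
    "Company": ["TS", "TS"],
    "Hosted Meetings 03112016": [0.0, 9.0],
})
last = pd.DataFrame({
    "First Name": ["Ann", "Bob", np.nan],
    "Last Name": ["Lee", "Ray", np.nan],
    "Company": ["Other", "TS", np.nan],  # Ann's company differs from `first`
    "Hosted Meetings 04212016": [2.0, 1.0, np.nan],
})

frames = [first, last]
# Remove rows that are entirely NaN from the last frame only.
frames[-1] = frames[-1].dropna(how="all", axis=0)

def mergefunc(left, right):
    # Drop the shared column from `right` so no _x/_y copies are created.
    right = right.drop(["Company"], axis=1)
    return pd.merge(left, right, on=["First Name", "Last Name"], how="outer")

merged = functools.reduce(mergefunc, frames)
print(merged)
```

In this sketch there is a single Company column in the result, and Ann's value is the one from `first` ("TS"); the differing value in `last` is discarded along with the dropped column.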

Alternatively, if the left and right DataFrames have the same values in the Company et al. columns, another option is to simply merge on those columns too:

def mergefunc(left, right):
    cols = ['First Name', 'Last Name', 'Company', 'Emails', 'Created', 
            'Facilitated Meetings', 'Attended Meetings']
    result = pd.merge(left, right, on=cols, how='outer')
    return result
all_users_sheets_hosts = functools.reduce(mergefunc, all_users_sheets_hosts)
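This alternative depends on the shared columns really holding identical values. The sketch below, again on made-up data with hypothetical names (`left`, `right`), shows what an outer merge on the shared columns does when one value differs: the mismatched person appears in two rows, each with NaN in the other file's column.

```python
import pandas as pd

left = pd.DataFrame({
    "First Name": ["Ann", "Bob"],
    "Last Name": ["Lee", "Ray"],
    "Company": ["TS", "TS"],
    "Hosted Meetings 03112016": [0.0, 9.0],
})
right = pd.DataFrame({
    "First Name": ["Ann", "Bob"],
    "Last Name": ["Lee", "Ray"],
    "Company": ["TS", "Other"],  # Bob's company differs from `left`
    "Hosted Meetings 04212016": [2.0, 1.0],
})

# Merge on every shared column, as in the alternative mergefunc.
cols = ["First Name", "Last Name", "Company"]
merged = pd.merge(left, right, on=cols, how="outer")
print(merged)

# Ann matches on all keys and gets a single complete row;
# Bob does not match on Company, so he ends up with two partial rows.
bob_rows = merged[merged["First Name"] == "Bob"]
```

If such split rows show up in real data, that is a sign the shared columns are not consistent across files, and dropping them from the right DataFrame may be the safer choice.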
