Pandas, concatenate certain columns if other columns are empty

Question

I've got a CSV file that is supposed to look like this:

ID, years_active, issues
-------------------------------
'Truck1', 8, 'In dire need of a paintjob'
'Car 5', 3,  'To small for large groups'

However, the CSV is somewhat malformed and currently looks like this:

ID, years_active, issues
------------------------
'Truck1', 8, 'In dire need'
'','', 'of a'
'','', 'paintjob'
'Car 5', 3, 'To small for'
'', '', 'large groups'

I can identify the faulty rows by their missing 'ID' and 'years_active' values, and I would like to append the 'issues' value of each such row to the last preceding row that has 'ID' and 'years_active' values.

I am not very experienced with pandas, but came up with the following code:

for index, row in df.iterrows():
        if row['years_active'] == None:
            df.loc[index-1]['issues'] += row['issues']

However, the if condition never triggers. Is what I am trying to do possible? And if so, does anyone have an idea what I am doing wrong?

Recommended answer

Given the example input:

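import pandas as pd

# Sample frame mirroring the malformed CSV: continuation rows have blank ID and years_active
df = pd.DataFrame({
    'ID': ['Truck1', '', '', 'Car 5', ''],
    'years_active': [8, '', '', 3, ''],
    'issues': ['In dire need', 'of a', 'paintjob', 'To small for', 'large groups']
})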
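
As for why the condition in the question never triggers, here is a minimal sketch, assuming the blank cells hold empty strings as in the sample frame above. If the file was loaded with pd.read_csv, they would more likely be NaN, which is not equal to None either. The names none_hits and is_blank are illustrative only.

```python
import pandas as pd

# Same sample frame as above; blank cells are empty strings here.
df = pd.DataFrame({
    'ID': ['Truck1', '', '', 'Car 5', ''],
    'years_active': [8, '', '', 3, ''],
    'issues': ['In dire need', 'of a', 'paintjob', 'To small for', 'large groups'],
})

# An empty string is not None, so the comparison from the question is False on every row.
none_hits = [row['years_active'] == None for _, row in df.iterrows()]  # noqa: E711

# Checking for an empty string or a missing value (NaN) does flag the continuation rows.
is_blank = df['years_active'].eq('') | df['years_active'].isna()
```

Even with a working condition, the loop in the question has two further problems: df.loc[index-1]['issues'] += ... is chained indexing and is unlikely to write back to df, and index-1 only reaches the immediately preceding row, which may itself be a continuation row.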

You can use:

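# replace(..., method='ffill') is deprecated in recent pandas, so blank the empty IDs and forward-fill explicitly
new_df = df.groupby(df['ID'].replace('', pd.NA).ffill()).agg({'years_active': 'first', 'issues': ' '.join})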

Which gives you:

        years_active                      issues
ID                                              
Car 5              3   To small for large groups
Truck1             8  In dire need of a paintjob

What we're doing here is forward-filling the non-blank IDs into the subsequent blank IDs and using the result to group the related rows. We then aggregate, taking the first occurrence of years_active and joining the issues values in the order they appear, to create a single result per ID.
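
The same idea can be broken into explicit steps, which makes the intermediate grouping key visible. This is only a sketch under the same assumption as the sample input (blanks are empty strings). The names group_key and merged are illustrative, and sort=False and reset_index() are optional additions that are not part of the one-liner above.

```python
import pandas as pd

df = pd.DataFrame({
    'ID': ['Truck1', '', '', 'Car 5', ''],
    'years_active': [8, '', '', 3, ''],
    'issues': ['In dire need', 'of a', 'paintjob', 'To small for', 'large groups'],
})

# Step 1: turn blank IDs into missing values, then carry the last real ID forward.
group_key = df['ID'].replace('', pd.NA).ffill()

# Step 2: group on that key and collapse each group into a single row.
merged = (
    df.groupby(group_key, sort=False)   # sort=False keeps groups in order of first appearance
      .agg({'years_active': 'first', 'issues': ' '.join})
      .reset_index()                    # bring ID back as a regular column
)
```

If the blanks are NaN rather than empty strings, for instance after pd.read_csv, the replace step can likely be dropped and df['ID'].ffill() used directly as the key.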
