pandas shape issues when applying a function that returns multiple new columns


Question

I need to return multiple calculated columns for each row of a pandas dataframe.

The following snippet raises ValueError: Shape of passed values is (4, 2), indices imply (4, 3) when the apply call on its last line is executed:

import pandas as pd

my_df = pd.DataFrame({
  'datetime_stuff': ['2012-01-20', '2012-02-16', '2012-06-19', '2012-12-15'],
  'url': ['http://www.something', 'http://www.somethingelse', 'http://www.foo', 'http://www.bar' ],
  'categories': [['foo', 'bar'], ['x', 'y', 'z'], ['xxx'], ['a123', 'a456']],   
})

my_df['datetime_stuff'] = pd.to_datetime(my_df['datetime_stuff'])
my_df.sort_values(['datetime_stuff'], inplace=True)

print(my_df.head())

def calculate_stuff(row):
  if row['url'].startswith('http'):
    categories = row['categories'] if type(row['categories']) == list else []
    calculated_column_x = row['url'] + '_other_stuff_'
  else:
    calculated_column_x = None
  another_column = 'deduction_from_fields'
  return calculated_column_x, another_column

print(my_df.shape)

my_df['calculated_column_x'], my_df['another_column'] = zip(*my_df.apply(calculate_stuff, axis=1))

Each row of the dataframe I am actually working on is more complicated than in the example above, and the function calculate_stuff that I apply reads many different columns of each row before returning multiple new columns.

Even so, this simplified example raises the same ValueError about the shape of the dataframe, and I do not understand how to fix it.

How can I create multiple new columns (for each row) that are calculated from the existing columns?

Recommended answer

When the applied function returns a list or tuple, pandas attempts to shoehorn the result back into the shape of the dataframe you ran apply over. Here each row returns two values while the frame has three columns, which is the (4, 2) versus (4, 3) mismatch reported in the error. Instead, return a Series.

Reworked code

import pandas as pd

my_df = pd.DataFrame({
  'datetime_stuff': ['2012-01-20', '2012-02-16', '2012-06-19', '2012-12-15'],
  'url': ['http://www.something', 'http://www.somethingelse', 'http://www.foo', 'http://www.bar' ],
  'categories': [['foo', 'bar'], ['x', 'y', 'z'], ['xxx'], ['a123', 'a456']],   
})

my_df['datetime_stuff'] = pd.to_datetime(my_df['datetime_stuff'])
my_df.sort_values(['datetime_stuff'], inplace=True)

def calculate_stuff(row):
  if row['url'].startswith('http'):
    categories = row['categories'] if type(row['categories']) == list else []
    calculated_column_x = row['url'] + '_other_stuff_'
  else:
    calculated_column_x = None
  another_column = 'deduction_from_fields'

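  # The only change: return a Series whose index labels become the new column names.
  return pd.Series((calculated_column_x, another_column), index=['calculated_column_x', 'another_column'])

# apply now yields a DataFrame with one column per label; join it back onto the original frame.
my_df.join(my_df.apply(calculate_stuff, axis=1))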

     categories datetime_stuff                       url                    calculated_column_x         another_column
0    [foo, bar]     2012-01-20      http://www.something      http://www.something_other_stuff_  deduction_from_fields
1     [x, y, z]     2012-02-16  http://www.somethingelse  http://www.somethingelse_other_stuff_  deduction_from_fields
2         [xxx]     2012-06-19            http://www.foo            http://www.foo_other_stuff_  deduction_from_fields
3  [a123, a456]     2012-12-15            http://www.bar            http://www.bar_other_stuff_  deduction_from_fields
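
As a possible alternative to building a Series for every row, the sketch below keeps the tuple return from the question and asks apply to expand it into columns. It assumes pandas 0.23 or newer, where DataFrame.apply accepts result_type='expand'. The shortened two-row frame, the ftp URL used to exercise the else branch, and the names new_column_names, expanded and result are illustrative only and do not come from the original post.

```python
import pandas as pd

# Shortened, illustrative version of the question's frame (assumed data).
my_df = pd.DataFrame({
    'url': ['http://www.something', 'ftp://www.foo'],
    'categories': [['foo', 'bar'], ['xxx']],
})

def calculate_stuff(row):
    # Same idea as in the question: return a plain tuple for each row.
    if row['url'].startswith('http'):
        calculated_column_x = row['url'] + '_other_stuff_'
    else:
        calculated_column_x = None
    another_column = 'deduction_from_fields'
    return calculated_column_x, another_column

# Hypothetical helper name holding the labels for the new columns.
new_column_names = ['calculated_column_x', 'another_column']

# result_type='expand' spreads each returned tuple across columns 0 and 1.
expanded = my_df.apply(calculate_stuff, axis=1, result_type='expand')
expanded.columns = new_column_names

# Join on the shared row index, as the answer above does.
result = my_df.join(expanded)
print(result)
```

Whether this is preferable to returning a Series is a judgment call: it leaves the applied function free of pandas-specific return types, at the cost of naming the new columns outside the function.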
