How to move a column in a pandas dataframe

Problem description

I want to take a column indexed 'length' and make it my second column. It currently exists as the 5th column. I have tried:

colnames = big_df.columns.tolist()

# make index "length" the second column in the big_df
colnames = colnames[0] + colnames[4] + colnames[:-1] 

big_df = big_df[colnames]

I see the following error:

TypeError: must be str, not list

I'm not sure how to interpret this error because it actually should be a list, right?

Also, is there a general method to move any column by label to a specified position? My columns only have one level, i.e. no MultiIndex involved.

Recommended answer

Correcting your error

I'm not sure how to interpret this error because it actually should be a list, right?

No: colnames[0] and colnames[4] are scalars, not lists. You can't concatenate a scalar with a list. To make them lists, use square brackets:

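# wrap the scalars in lists, then append the remaining labels once each
# (appending colnames[:-1] would repeat columns 0 and 4 and drop the last one)
colnames = [colnames[0]] + [colnames[4]] + colnames[1:4] + colnames[5:]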
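
The error can be reproduced without pandas at all. The following is a minimal sketch, assuming a made-up list of six column names in place of big_df.columns.tolist(); it shows the failing expression and a bracketed version that also keeps every remaining label exactly once.

```python
# Hypothetical column names standing in for big_df.columns.tolist()
colnames = ["id", "a", "b", "c", "length", "d"]

error = None
try:
    # str + str works, but adding the list to the resulting str fails
    colnames[0] + colnames[4] + colnames[:-1]
except TypeError as exc:
    error = exc
    print(type(exc).__name__)  # TypeError

# Wrap the scalars in lists, then add the untouched labels around position 4
reordered = [colnames[0]] + [colnames[4]] + colnames[1:4] + colnames[5:]
print(reordered)  # ['id', 'length', 'a', 'b', 'c', 'd']
```

The slices 1:4 and 5: are specific to moving the 5th column into 2nd place; a general helper needs the positions as parameters.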

You can then use either df[colnames] or df.reindex(columns=colnames): both necessarily trigger a copy operation, as this transformation cannot be processed in place.
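
As a rough illustration of that point, the sketch below applies both forms to a small made-up frame (the data and column names are assumptions, not taken from the question). Each call returns a new DataFrame and leaves the original column order untouched.

```python
import pandas as pd

# Hypothetical data; only the column labels matter here
df = pd.DataFrame({"id": [1, 2], "a": [3, 4], "length": [5, 6]})
colnames = ["id", "length", "a"]

by_selection = df[colnames]                # select with a list of labels
by_reindex = df.reindex(columns=colnames)  # same order via reindex

print(list(by_selection.columns))  # ['id', 'length', 'a']
print(list(df.columns))            # ['id', 'a', 'length']
```

To keep the new order, assign the result back, for example df = df[colnames].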

But converting arrays to lists and then concatenating lists manually is not only expensive, but prone to error. A related answer has many list-based solutions, but a NumPy-based solution is worthwhile since pd.Index objects are stored as NumPy arrays.

The key here is to modify the NumPy array via slicing rather than concatenation. There are only 2 cases to handle: when the desired position exists after the current position, and vice versa.

import pandas as pd
from string import ascii_uppercase

df = pd.DataFrame(columns=list(ascii_uppercase))

def shifter(df, col_to_shift, pos_to_move):
    # work on a copy so the frame's own column Index is never mutated
    arr = df.columns.to_numpy().copy()
    idx = df.columns.get_loc(col_to_shift)
    if idx == pos_to_move:
        pass
    elif idx > pos_to_move:
        # moving left: shift the labels in between one slot to the right
        arr[pos_to_move+1: idx+1] = arr[pos_to_move: idx]
    else:
        # moving right: shift the labels in between one slot to the left
        arr[idx: pos_to_move] = arr[idx+1: pos_to_move+1]
    arr[pos_to_move] = col_to_shift
    df = df.reindex(columns=arr)
    return df

df = df.pipe(shifter, 'J', 1)

print(df.columns)

Index(['A', 'J', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'K', 'L', 'M', 'N',
       'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z'],
      dtype='object')
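
The frame above has no rows, so it only shows the labels moving. The sketch below is a hedged variant for checking that the values travel with their labels: move_column is a hypothetical name, the one-row data is made up, and the function works on a copy of the label array so the frame's own Index is left untouched.

```python
import pandas as pd

def move_column(df, col_to_shift, pos_to_move):
    # Copy the labels so the original Index is not modified
    arr = df.columns.to_numpy().copy()
    idx = df.columns.get_loc(col_to_shift)
    if idx > pos_to_move:
        # moving left: shift the labels in between one slot right
        arr[pos_to_move + 1: idx + 1] = arr[pos_to_move: idx]
    elif idx < pos_to_move:
        # moving right: shift the labels in between one slot left
        arr[idx: pos_to_move] = arr[idx + 1: pos_to_move + 1]
    arr[pos_to_move] = col_to_shift
    return df.reindex(columns=arr)

df = pd.DataFrame({"A": [1], "B": [2], "C": [3], "D": [4]})

moved_left = move_column(df, "D", 1)
moved_right = move_column(df, "A", 2)

print(list(moved_left.columns), moved_left.iloc[0].tolist())
# ['A', 'D', 'B', 'C'] [1, 4, 2, 3]
print(list(moved_right.columns), moved_right.iloc[0].tolist())
# ['B', 'C', 'A', 'D'] [2, 3, 1, 4]
```

Both directions are exercised here because they take different branches of the slicing logic.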

Performance benchmarking

Using NumPy slicing is more efficient with a large number of columns versus a list-based method:

n = 10000
df = pd.DataFrame(columns=list(range(n)))

def shifter2(df, col_to_shift, pos_to_move):
    cols = df.columns.tolist()
    cols.insert(pos_to_move, cols.pop(df.columns.get_loc(col_to_shift)))
    df = df.reindex(columns=cols)
    return df

%timeit df.pipe(shifter, 590, 5)   # 381 µs
%timeit df.pipe(shifter2, 590, 5)  # 1.92 ms
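
%timeit only works inside IPython or Jupyter. Outside of those, a comparison along the same lines can be sketched with the standard-library timeit module. The function names move_numpy and move_list are hypothetical stand-ins for the two approaches, and the NumPy version copies the label array before editing it; the timings will depend on the machine and the pandas version, so none are quoted here.

```python
import timeit

import pandas as pd

def move_numpy(df, col, pos):
    # NumPy slicing on a copy of the column labels
    arr = df.columns.to_numpy().copy()
    idx = df.columns.get_loc(col)
    if idx > pos:
        arr[pos + 1: idx + 1] = arr[pos: idx]
    elif idx < pos:
        arr[idx: pos] = arr[idx + 1: pos + 1]
    arr[pos] = col
    return df.reindex(columns=arr)

def move_list(df, col, pos):
    # List-based pop/insert on the column labels
    cols = df.columns.tolist()
    cols.insert(pos, cols.pop(df.columns.get_loc(col)))
    return df.reindex(columns=cols)

df = pd.DataFrame(columns=list(range(10_000)))

# Both approaches should produce the same column order
same_order = move_numpy(df, 590, 5).columns.equals(move_list(df, 590, 5).columns)
print(same_order)  # True

for func in (move_numpy, move_list):
    seconds = timeit.timeit(lambda: func(df, 590, 5), number=100)
    print(f"{func.__name__}: {seconds / 100 * 1e6:.0f} µs per call")
```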
