How to move a column in a pandas dataframe
Question
I want to take a column labelled 'length' and make it my second column. It currently exists as the 5th column. I have tried:
colnames = big_df.columns.tolist()
# make index "length" the second column in the big_df
colnames = colnames[0] + colnames[4] + colnames[:-1]
big_df = big_df[colnames]
I see the following error:
TypeError: must be str, not list
I'm not sure how to interpret this error because it actually should be a list, right?
Also, is there a general method to move any column by label to a specified position? My columns only have one level, i.e. no MultiIndex is involved.
Recommended answer
Correcting your error
I'm not sure how to interpret this error because it actually should be a list, right?
No: colnames[0] and colnames[4] are scalars (here, strings), not lists. Adding the two strings together works, but the resulting string cannot then be concatenated with the list colnames[:-1], which is what raises the TypeError. To make them lists, use square brackets:
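# wrap the scalars in lists, and slice around position 4 so no column is duplicated or dropped
colnames = [colnames[0]] + [colnames[4]] + colnames[1:4] + colnames[5:]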
You can then use either df[colnames] or df.reindex(columns=colnames): both necessarily trigger a copy operation, as this transformation cannot be performed in place.
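As a quick check of the corrected concatenation, here is a minimal sketch on a made-up single-row frame. The column names other than 'length' are hypothetical, and the sketch assumes the goal is to keep every column exactly once, so the remaining names are taken by slicing around position 4.

```python
import pandas as pd

# Hypothetical frame: 'length' sits at position 4 (the 5th column).
big_df = pd.DataFrame([[1, 2, 3, 4, 5, 6]],
                      columns=["id", "a", "b", "c", "length", "d"])

colnames = big_df.columns.tolist()

# The original attempt: str + str works, then str + list raises TypeError.
raised = False
try:
    colnames[0] + colnames[4] + colnames[:-1]
except TypeError:
    raised = True

# Wrapping the scalars in lists makes every operand a list.
new_order = [colnames[0]] + [colnames[4]] + colnames[1:4] + colnames[5:]

# Both selection styles return a new, reordered frame.
by_getitem = big_df[new_order]
by_reindex = big_df.reindex(columns=new_order)

print(by_getitem.columns.tolist())
# ['id', 'length', 'a', 'b', 'c', 'd']
```

Both results are new frames; big_df itself keeps its original column order.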
However, converting arrays to lists and then concatenating the lists manually is not only expensive but also error-prone. A related answer has many list-based solutions, but a NumPy-based solution is worthwhile since pd.Index objects are stored as NumPy arrays.
The key here is to modify the NumPy array via slicing rather than concatenation. There are only 2 cases to handle: when the desired position comes after the current position, and vice versa.
import pandas as pd
from string import ascii_uppercase

df = pd.DataFrame(columns=list(ascii_uppercase))

def shifter(df, col_to_shift, pos_to_move):
    # work on a copy so the labels of the original Index are not overwritten
    arr = df.columns.values.copy()
    idx = df.columns.get_loc(col_to_shift)
    if idx == pos_to_move:
        pass
    elif idx > pos_to_move:
        # moving left: shift the labels in between one place to the right
        arr[pos_to_move+1: idx+1] = arr[pos_to_move: idx]
    else:
        # moving right: shift the labels in between one place to the left
        arr[idx: pos_to_move] = arr[idx+1: pos_to_move+1]
    arr[pos_to_move] = col_to_shift
    df = df.reindex(columns=arr)
    return df

df = df.pipe(shifter, 'J', 1)
print(df.columns)
Index(['A', 'J', 'B', 'C', 'D', 'E', 'F', 'G', 'H', 'I', 'K', 'L', 'M', 'N',
'O', 'P', 'Q', 'R', 'S', 'T', 'U', 'V', 'W', 'X', 'Y', 'Z'],
dtype='object')
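The example above uses an empty frame, so it only shows the labels moving. The following sketch restates the same slicing idea on a small frame that holds data and exercises both cases (moving a column left and moving it right). The frame and its values are made up, and copying the label array before modifying it is an assumption of this sketch, made so that the original frame's column labels are left untouched.

```python
import pandas as pd

def shifter(df, col_to_shift, pos_to_move):
    # Copy the labels so the frame's own Index is never modified (sketch assumption).
    arr = df.columns.to_numpy().copy()
    idx = df.columns.get_loc(col_to_shift)
    if idx > pos_to_move:
        # Moving left: shift the labels in between one place to the right.
        arr[pos_to_move + 1: idx + 1] = arr[pos_to_move: idx]
    elif idx < pos_to_move:
        # Moving right: shift the labels in between one place to the left.
        arr[idx: pos_to_move] = arr[idx + 1: pos_to_move + 1]
    arr[pos_to_move] = col_to_shift
    return df.reindex(columns=arr)

# Hypothetical data: each column holds one distinct value.
df = pd.DataFrame({"A": [1], "B": [2], "C": [3], "D": [4], "E": [5]})

moved_left = shifter(df, "D", 1)    # current position 3 -> position 1
moved_right = shifter(df, "B", 3)   # current position 1 -> position 3

print(moved_left.columns.tolist())   # ['A', 'D', 'B', 'C', 'E']
print(moved_right.columns.tolist())  # ['A', 'C', 'D', 'B', 'E']
```

Because reindex selects columns by label, the values travel with their labels: in moved_left the value 4 still sits under 'D'.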
Performance benchmarking
Using NumPy slicing is more efficient than a list-based method when there are a large number of columns:
n = 10000
df = pd.DataFrame(columns=list(range(n)))

def shifter2(df, col_to_shift, pos_to_move):
    # list-based alternative: pop the label and insert it at the new position
    cols = df.columns.tolist()
    cols.insert(pos_to_move, cols.pop(df.columns.get_loc(col_to_shift)))
    df = df.reindex(columns=cols)
    return df

# IPython magics; timings as reported by the answer's author
%timeit df.pipe(shifter, 590, 5)   # 381 µs
%timeit df.pipe(shifter2, 590, 5)  # 1.92 ms
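The %timeit lines only run inside IPython or Jupyter. For a plain Python script, a rough equivalent can be sketched with the standard-library timeit module. Both functions are restated here so the block runs on its own; the repeat count of 100 is an arbitrary choice, and the absolute numbers will differ by machine and by pandas/NumPy version, so no timings are claimed.

```python
import timeit

import pandas as pd

def shifter(df, col_to_shift, pos_to_move):
    # NumPy-slicing version; the copy is an assumption of this sketch.
    arr = df.columns.to_numpy().copy()
    idx = df.columns.get_loc(col_to_shift)
    if idx > pos_to_move:
        arr[pos_to_move + 1: idx + 1] = arr[pos_to_move: idx]
    elif idx < pos_to_move:
        arr[idx: pos_to_move] = arr[idx + 1: pos_to_move + 1]
    arr[pos_to_move] = col_to_shift
    return df.reindex(columns=arr)

def shifter2(df, col_to_shift, pos_to_move):
    # List-based version: pop the label, then insert it at the new position.
    cols = df.columns.tolist()
    cols.insert(pos_to_move, cols.pop(df.columns.get_loc(col_to_shift)))
    return df.reindex(columns=cols)

n = 10000
df = pd.DataFrame(columns=list(range(n)))

# Sanity check: both approaches must produce the same column order.
same_order = shifter(df, 590, 5).columns.equals(shifter2(df, 590, 5).columns)

repeats = 100
t_numpy = timeit.timeit(lambda: shifter(df, 590, 5), number=repeats) / repeats
t_list = timeit.timeit(lambda: shifter2(df, 590, 5), number=repeats) / repeats

print(f"same order: {same_order}")
print(f"NumPy slicing: {t_numpy * 1e6:.0f} µs per call")
print(f"list-based:    {t_list * 1e6:.0f} µs per call")
```

Checking that the two functions agree before timing them guards against benchmarking a fast but incorrect implementation.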