自然排序 Pandas DataFrame [英] Naturally sorting Pandas DataFrame

查看:26
本文介绍了自然排序 Pandas DataFrame的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个带有我想要自然排序的索引的 Pandas DataFrame.Natsort 似乎不起作用.在构建 DataFrame 之前对索引进行排序似乎没有帮助,因为我对 DataFrame 所做的操作似乎在过程中弄乱了排序.关于如何自然地使用索引的任何想法?

I have a pandas DataFrame with indices I want to sort naturally. Natsort doesn't seem to work. Sorting the indices prior to building the DataFrame doesn't seem to help because the manipulations I do to the DataFrame seem to mess up the sorting in the process. Any thoughts on how I can resort the indices naturally?

from natsort import natsorted
import pandas as pd

# An unsorted list of strings
a = ['0hr', '128hr', '72hr', '48hr', '96hr']
# Sorted incorrectly
b = sorted(a)
# Naturally Sorted 
c = natsorted(a)

# Use a as the index for a DataFrame
df = pd.DataFrame(index=a)
# Sorted Incorrectly
df2 = df.sort()
# Natsort doesn't seem to work
df3 = natsorted(df)

print(a)
print(b)
print(c)
print(df.index)
print(df2.index)
print(df3.index)

推荐答案

如果要对 df 进行排序,只需对索引或数据进行排序并直接分配给 df 的索引,而不是尝试将 df 作为arg 因为它产生一个空列表:

If you want to sort the df, just sort the index or the data and assign directly to the index of the df rather than trying to pass the df as an arg as that yields an empty list:

In [7]:

df.index = natsorted(a)
df.index
Out[7]:
Index(['0hr', '48hr', '72hr', '96hr', '128hr'], dtype='object')

注意 df.index = natsorted(df.index) 也有效

如果您将 df 作为 arg 传递,它会产生一个空列表,在这种情况下,因为 df 是空的(没有列),否则它将返回已排序的列,这不是您想要的:

if you pass the df as an arg it yields an empty list, in this case because the df is empty (has no columns), otherwise it will return the columns sorted which is not what you want:

In [10]:

natsorted(df)
Out[10]:
[]

编辑

如果您想对索引进行排序以便数据与索引一起重新排序,请使用 reindex:

If you want to sort the index so that the data is reordered along with the index then use reindex:

In [13]:

df=pd.DataFrame(index=a, data=np.arange(5))
df
Out[13]:
       0
0hr    0
128hr  1
72hr   2
48hr   3
96hr   4
In [14]:

df = df*2
df
Out[14]:
       0
0hr    0
128hr  2
72hr   4
48hr   6
96hr   8
In [15]:

df.reindex(index=natsorted(df.index))
Out[15]:
       0
0hr    0
48hr   6
72hr   4
96hr   8
128hr  2

请注意,您必须将 reindex 的结果分配给新的 df 或它本身,它不接受 inplace 参数.

Note that you have to assign the result of reindex to either a new df or to itself, it does not accept the inplace param.

这篇关于自然排序 Pandas DataFrame的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆