自然排序Pandas DataFrame [英] Naturally sorting Pandas DataFrame

查看:96
本文介绍了自然排序Pandas DataFrame的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个熊猫DataFrame,它的索引要自然排序. Natsort似乎不起作用.在构建DataFrame之前对索引进行排序似乎无济于事,因为我对DataFrame所做的操作似乎使过程中的排序混乱了.关于如何自然使用索引的任何想法?

I have a pandas DataFrame with indices I want to sort naturally. Natsort doesn't seem to work. Sorting the indices prior to building the DataFrame doesn't seem to help because the manipulations I do to the DataFrame seem to mess up the sorting in the process. Any thoughts on how I can resort the indices naturally?

from natsort import natsorted
import pandas as pd

# An unsorted list of strings
a = ['0hr', '128hr', '72hr', '48hr', '96hr']
# Sorted incorrectly
b = sorted(a)
# Naturally Sorted 
c = natsorted(a)

# Use a as the index for a DataFrame
df = pd.DataFrame(index=a)
# Sorted Incorrectly
df2 = df.sort()
# Natsort doesn't seem to work
df3 = natsorted(df)

print(a)
print(b)
print(c)
print(df.index)
print(df2.index)
print(df3.index)

推荐答案

如果要对df进行排序,只需对索引或数据进行排序,然后直接将其分配给df的索引,而不是尝试将df作为arg,因为它会产生一个空列表:

If you want to sort the df, just sort the index or the data and assign directly to the index of the df rather than trying to pass the df as an arg as that yields an empty list:

In [7]:

df.index = natsorted(a)
df.index
Out[7]:
Index(['0hr', '48hr', '72hr', '96hr', '128hr'], dtype='object')

请注意,df.index = natsorted(df.index)也可以使用

如果将df作为arg传递,则会产生一个空列表,在这种情况下,因为df为空(没有列),否则它将返回排序后的列,而不是您想要的:

if you pass the df as an arg it yields an empty list, in this case because the df is empty (has no columns), otherwise it will return the columns sorted which is not what you want:

In [10]:

natsorted(df)
Out[10]:
[]

编辑

如果要对索引进行排序,以便数据与索引一起重新排序,请使用

If you want to sort the index so that the data is reordered along with the index then use reindex:

In [13]:

df=pd.DataFrame(index=a, data=np.arange(5))
df
Out[13]:
       0
0hr    0
128hr  1
72hr   2
48hr   3
96hr   4
In [14]:

df = df*2
df
Out[14]:
       0
0hr    0
128hr  2
72hr   4
48hr   6
96hr   8
In [15]:

df.reindex(index=natsorted(df.index))
Out[15]:
       0
0hr    0
48hr   6
72hr   4
96hr   8
128hr  2

请注意,您必须将reindex的结果分配给新的df或它本身,它不接受inplace参数.

Note that you have to assign the result of reindex to either a new df or to itself, it does not accept the inplace param.

这篇关于自然排序Pandas DataFrame的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆