Sort the rows of a data frame


Problem description

I have the following data frame (adjusted_RFC_df):

     Node               Feature Indicator  Scaled     Class    Direction True_False
0       0                   km        <=   0.181   class_4      0 -> 1         NA
125   125                  gini         =   0.000   class_2    0 -> 126       FALSE
1       1                   WPS        <=   0.074   class_5      1 -> 2        TRUE
52     52                  gini         =   0.000   class_2     1 -> 53       FALSE
105   105                  gini         =   0.492   class_3  102 -> 106       FALSE
102   102           weird_words        <=   0.042   class_4  102 -> 103        TRUE
104   104                  gini         =   0.488   class_4  103 -> 105       FALSE
103   103              funktion        <=   0.290   class_4  103 -> 104        TRUE
107   107                  gini         =   0.000   class_5  106 -> 108       FALSE
106   106           Nb_of_verbs        <=   0.094   class_5  106 -> 107        TRUE
110   110                  gini         =   0.000   class_4  109 -> 111       FALSE
109   109                signal        <=   0.320   class_4  109 -> 110        TRUE
112   112          Flesch_Index        <=   0.627   class_1  112 -> 113        TRUE
115   115                  gini         =   0.000   class_3  112 -> 116       FALSE
114   114                  gini         =   0.000   class_1  113 -> 115       FALSE
113   113       Nb_of_auxiliary        <=   0.714   class_1  113 -> 114        TRUE
..    ...                   ...       ...     ...       ...          ...        ... 

I am trying to sort the rows based on the value in the 'Direction' column (for '0 -> 1', I want to sort on the first number, 0). I am trying to do this with:

   ## Sort rows based on first int of Direction column ##
   # create a column['key'] to sort df
   adjusted_RFC_df['key'] = adjusted_RFC_df['Direction'].apply(lambda x: x.split()[0])

   # Create new Dataframe with sorted values based on first number of 'Direction' col
   class_determiner_df = adjusted_RFC_df.sort_values('key')

This sorts by the first value before the '->' (the left-hand side), but I also need rows that share the same left-hand number to stay in order of the number on the right-hand side of the '->'.

So it should look like this:

     Node               Feature Indicator  Scaled     Class    Direction True_False
0       0                   km        <=   0.181   class_4      0 -> 1         NA
125   125                  gini         =   0.000   class_2    0 -> 126       FALSE
1       1                   WPS        <=   0.074   class_5      1 -> 2        TRUE
52     52                  gini         =   0.000   class_2     1 -> 53       FALSE
105   105           weird_words         =   0.492   class_3  102 -> 103       FALSE
102   102                  gini        <=   0.042   class_4  102 -> 103        TRUE
104   104              funktion         =   0.488   class_4  103 -> 104       FALSE
103   103                  gini        <=   0.290   class_4  103 -> 105        TRUE
107   107           Nb_of_verbs         =   0.000   class_5  106 -> 107       FALSE
106   106                  gini        <=   0.094   class_5  106 -> 108        TRUE
110   110                signal         =   0.000   class_4  109 -> 110       FALSE
109   109                  gini        <=   0.320   class_4  109 -> 111        TRUE
112   112          Flesch_Index        <=   0.627   class_1  112 -> 113        TRUE
115   115                  gini         =   0.000   class_3  112 -> 116       FALSE
114   114       Nb_of_auxiliary         =   0.000   class_1  113 -> 114       FALSE
113   113                  gini        <=   0.714   class_1  113 -> 115        TRUE
..    ...                   ...       ...     ...       ...          ...        ... 

This confuses me because sometimes the right-hand numbers do end up in order, but most of the time they don't.

I thought the problem might be that the Direction column is of type string, so the keys are being sorted as strings. So I tried the following:

adjusted_RFC_df['key'] = adjusted_RFC_df['key'].astype(np.int64)

But this results in the following error:

ValueError: invalid literal for int() with base 10: 'NA'

So it seems like it is trying to convert the ['TRUE/FALSE'] column to int as well, rather than just the ['key'] column.

Is the problem likely to be that the Direction column is of type string?

Or is there a way to sort on the first number before the '->' while ensuring that the second number is also in order (sorted from smallest to largest)?

Recommended answer

If Direction is always a string in the format int, space, '->', space, int (for example 1 -> 2), then you can extract a second sort key:

df['key1'] = df['Direction'].apply(lambda x: x.split()[0])
df['key2'] = df['Direction'].apply(lambda x: x.split()[2])

and then sort on both keys:

df.sort_values(['key1', 'key2'])
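
One caveat worth checking: `x.split()` returns strings, so `sort_values` compares these keys as text, where '102' sorts before '2' and '10' before '9'. The sketch below compares string keys with integer keys. The Direction values are made up for illustration (they are not from the question's data), chosen so that the two orderings differ.

```python
import pandas as pd

# Made-up Direction values, chosen so that text order and numeric order differ.
df = pd.DataFrame({"Direction": ["2 -> 3", "102 -> 103", "1 -> 10", "1 -> 9"]})

# Keys kept as strings: compared character by character.
df["key1"] = df["Direction"].apply(lambda x: x.split()[0])
df["key2"] = df["Direction"].apply(lambda x: x.split()[2])
string_sorted = df.sort_values(["key1", "key2"])["Direction"].tolist()

# The same keys converted to integers before sorting: compared as numbers.
df["key1"] = df["key1"].astype(int)
df["key2"] = df["key2"].astype(int)
numeric_sorted = df.sort_values(["key1", "key2"])["Direction"].tolist()

print(string_sorted)
print(numeric_sorted)
```

Note also that `sort_values` returns a new data frame rather than sorting in place, so the result needs to be assigned to a variable.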

Edit: here is another way to get key1 and key2, converting them to int:

df['key1'] = df['Direction'].apply(lambda x: int(x.split('->')[0]))
df['key2'] = df['Direction'].apply(lambda x: int(x.split('->')[1]))
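
The `int(...)` calls above will raise a ValueError on any row whose Direction is not a well-formed 'a -> b' string, and the error in the question suggests a literal 'NA' may be present somewhere. A minimal sketch of a more tolerant variant follows, assuming it is acceptable to push unparsable rows to the end. The helper name `sort_by_direction`, the temporary column names, and the sample row with Node 999 are hypothetical; the other sample rows are taken from the question's table.

```python
import pandas as pd

def sort_by_direction(df, col="Direction"):
    """Return a copy of df sorted by the two integers in col ('a -> b').

    Rows whose value does not match the pattern are placed last.
    """
    # Capture both integers; non-matching rows yield NaN in both columns.
    keys = df[col].astype(str).str.extract(r"^\s*(\d+)\s*->\s*(\d+)\s*$")
    # Convert the captured text to numbers so the comparison is numeric.
    keys = keys.apply(pd.to_numeric, errors="coerce")
    out = df.assign(_k1=keys[0], _k2=keys[1])
    out = out.sort_values(["_k1", "_k2"], na_position="last")
    return out.drop(columns=["_k1", "_k2"])

# A few rows from the question, shuffled, plus one hypothetical 'NA' row.
sample = pd.DataFrame({
    "Node": [105, 52, 0, 102, 125, 1, 999],
    "Direction": ["102 -> 106", "1 -> 53", "0 -> 1", "102 -> 103",
                  "0 -> 126", "1 -> 2", "NA"],
})

result = sort_by_direction(sample)
print(result)
```

Using temporary key columns that are dropped afterwards keeps the original frame's columns unchanged, which is a design choice of this sketch and not something the answer prescribes.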
