Sort the rows of a data frame
Problem description
I have the following data frame (adjusted_RFC_df):
Node Feature Indicator Scaled Class Direction True_False
0 0 km <= 0.181 class_4 0 -> 1 NA
125 125 gini = 0.000 class_2 0 -> 126 FALSE
1 1 WPS <= 0.074 class_5 1 -> 2 TRUE
52 52 gini = 0.000 class_2 1 -> 53 FALSE
105 105 gini = 0.492 class_3 102 -> 106 FALSE
102 102 weird_words <= 0.042 class_4 102 -> 103 TRUE
104 104 gini = 0.488 class_4 103 -> 105 FALSE
103 103 funktion <= 0.290 class_4 103 -> 104 TRUE
107 107 gini = 0.000 class_5 106 -> 108 FALSE
106 106 Nb_of_verbs <= 0.094 class_5 106 -> 107 TRUE
110 110 gini = 0.000 class_4 109 -> 111 FALSE
109 109 signal <= 0.320 class_4 109 -> 110 TRUE
112 112 Flesch_Index <= 0.627 class_1 112 -> 113 TRUE
115 115 gini = 0.000 class_3 112 -> 116 FALSE
114 114 gini = 0.000 class_1 113 -> 115 FALSE
113 113 Nb_of_auxiliary <= 0.714 class_1 113 -> 114 TRUE
.. ... ... ... ... ... ... ...
I am trying to sort the rows by the value in the 'Direction' column (for example, for 0 -> 1 I want to sort on the first number, 0). I am doing this with:
# Sort rows based on the first integer of the 'Direction' column
# Create a 'key' column to sort the data frame by
adjusted_RFC_df['key'] = adjusted_RFC_df['Direction'].apply(lambda x: x.split()[0])
# Create a new data frame sorted on the first number of the 'Direction' column
class_determiner_df = adjusted_RFC_df.sort_values('key')
This sorts correctly by the first value before the '->' (the left-hand side). However, I also need rows with the same left-hand value to stay in order of the number on the right-hand side of the '->'.
So it should look like this:
Node Feature Indicator Scaled Class Direction True_False
0 0 km <= 0.181 class_4 0 -> 1 NA
125 125 gini = 0.000 class_2 0 -> 126 FALSE
1 1 WPS <= 0.074 class_5 1 -> 2 TRUE
52 52 gini = 0.000 class_2 1 -> 53 FALSE
105 105 weird_words = 0.492 class_3 102 -> 103 FALSE
102 102 gini <= 0.042 class_4 102 -> 103 TRUE
104 104 funktion = 0.488 class_4 103 -> 104 FALSE
103 103 gini <= 0.290 class_4 103 -> 105 TRUE
107 107 Nb_of_verbs = 0.000 class_5 106 -> 107 FALSE
106 106 gini <= 0.094 class_5 106 -> 108 TRUE
110 110 signal = 0.000 class_4 109 -> 110 FALSE
109 109 gini <= 0.320 class_4 109 -> 111 TRUE
112 112 Flesch_Index <= 0.627 class_1 112 -> 113 TRUE
115 115 gini = 0.000 class_3 112 -> 116 FALSE
114 114 Nb_of_auxiliary = 0.000 class_1 113 -> 114 FALSE
113 113 gini <= 0.714 class_1 113 -> 115 TRUE
.. ... ... ... ... ... ... ...
This confuses me: sometimes the right-hand numbers do stay in order, but most of the time they don't.
I thought the problem might be that I am sorting strings, since the Direction column is of type string. So I tried the following:
adjusted_RFC_df['key'] = adjusted_RFC_df['key'].astype(np.int64)
But this results in the following error:
ValueError: invalid literal for int() with base 10: 'NA'
So it seems like it is trying to convert the True_False column to int as well, rather than just the ['key'] column.
Is the problem likely that the Direction column is of type string?
Or is there a way to sort on the first number before the '->' while ensuring that the second number is also in order (sorted from smallest to largest)?
Recommended answer
If Direction is always a string in the format int, space, '->', space, int (for example 1 -> 2), then you can build a second sorting key:
df['key1'] = df['Direction'].apply(lambda x: x.split()[0])
df['key2'] = df['Direction'].apply(lambda x: x.split()[2])
and then sort on both keys:
df.sort_values(['key1', 'key2'])
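One caveat with the keys above: split() returns strings, so sort_values compares them character by character instead of numerically. A minimal sketch of the difference, using a made-up three-row frame (the name demo and its values are illustrative assumptions, not data from the question):

```python
import pandas as pd

# Hypothetical toy data, only to illustrate string vs. integer ordering.
demo = pd.DataFrame({"Direction": ["102 -> 103", "2 -> 3", "1 -> 53"]})

# String key: compared character by character, so '102' sorts before '2'.
demo["key_str"] = demo["Direction"].apply(lambda x: x.split()[0])
order_as_str = demo.sort_values("key_str")["Direction"].tolist()

# Integer key: compared numerically, so 2 sorts before 102.
demo["key_int"] = demo["key_str"].astype(int)
order_as_int = demo.sort_values("key_int")["Direction"].tolist()

print(order_as_str)
print(order_as_int)
```

This may be why the right-hand numbers in the question only sometimes appeared in order.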
Edit:
This is how to get key1 and key2 as integers, so that the keys sort numerically rather than as strings:
df['key1'] = df['Direction'].apply(lambda x: int(x.split('->')[0]))
df['key2'] = df['Direction'].apply(lambda x: int(x.split('->')[1]))
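Regarding the ValueError in the question: astype on a single column only converts that column, so the 'NA' presumably comes from a Direction value that is itself 'NA', not from the True_False column. The int(...) calls above would raise on such a row too. A possible sketch that tolerates those rows, assuming a small made-up frame (the sample values and the regular expression are illustrative assumptions):

```python
import pandas as pd

# Hypothetical sample; the 'NA' row stands in for whatever value triggered the ValueError.
df = pd.DataFrame({
    "Node": [125, 1, 52, 0, 7],
    "Direction": ["0 -> 126", "1 -> 2", "1 -> 53", "0 -> 1", "NA"],
})

# Capture the two integers around '->'; rows that do not match yield missing values.
parts = df["Direction"].str.extract(r"^\s*(\d+)\s*->\s*(\d+)\s*$")

# Convert to numbers; anything unparsable becomes NaN instead of raising.
df["key1"] = pd.to_numeric(parts[0], errors="coerce")
df["key2"] = pd.to_numeric(parts[1], errors="coerce")

# Sort numerically on both keys; rows without a parsable Direction go last.
sorted_df = df.sort_values(["key1", "key2"], na_position="last")
sorted_directions = sorted_df["Direction"].tolist()
print(sorted_directions)
```

The keys end up as floating-point columns because NaN marks the missing rows; this does not change the sort order.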