Pandas sort_values does not sort numbers correctly
I'm new to pandas and to working with tabular data in a programming environment. I have sorted a dataframe by a specific column, but the result pandas returns is not quite correct.
Here is the code I have used:
league_dataframe.sort_values('overall_league_position')
In the result, the values in the 'overall_league_position' column are not sorted in ascending order, which is the default for the method.
What am I doing wrong? Thanks for your patience!
For whatever reason, you seem to be working with a column of strings, so sort_values is returning a lexicographically sorted result: the values are compared character by character, which puts '10' before '2'.
Here's an example.
df = pd.DataFrame({"Col": ['1', '2', '3', '10', '20', '19']})
df
Col
0 1
1 2
2 3
3 10
4 20
5 19
df.sort_values('Col')
Col
0 1
3 10
5 19
1 2
4 20
2 3
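To confirm that this is what is happening, you can inspect the column before sorting. The snippet below is a minimal sketch that reuses the example frame from above; the variable names col_is_numeric and first_value_type are only illustrative.

```python
import pandas as pd
from pandas.api.types import is_numeric_dtype

df = pd.DataFrame({"Col": ['1', '2', '3', '10', '20', '19']})

# A string column is reported with a non-numeric dtype
print(df.dtypes)

# False here means sort_values will compare the values as text
col_is_numeric = is_numeric_dtype(df["Col"])

# Looking at a single element shows the underlying Python type
first_value_type = type(df["Col"].iloc[0])

print(col_is_numeric, first_value_type)
```

If the check reports a non-numeric column, the values were most likely read in as text (for example from a CSV with stray characters), and converting them is the fix.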
The remedy is to convert the column to a numeric type, using either .astype or pd.to_numeric.
df.Col = df.Col.astype(float)
Or,
df.Col = pd.to_numeric(df.Col, errors='coerce')
df.sort_values('Col')
Col
0 1
1 2
2 3
3 10
5 19
4 20
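If you would rather leave the column as strings and only sort numerically, one possible alternative is the key argument of sort_values. This is a sketch that assumes pandas 1.1 or newer, where key was added; it is not part of the original answer.

```python
import pandas as pd

df = pd.DataFrame({"Col": ['1', '2', '3', '10', '20', '19']})

# key receives the column as a Series and must return a Series-like of the
# same length; the rows are ordered by the converted values, while the
# stored values stay as strings
sorted_df = df.sort_values('Col', key=pd.to_numeric)

print(sorted_df)
```

Converting the column once is usually the cleaner choice when later steps also need numbers; key is convenient for a one-off sort.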
The main differences between astype and pd.to_numeric are that the latter is more robust at handling non-numeric strings (with errors='coerce' they are converted to NaN, whereas astype raises an error), and that it will preserve integers when a conversion to float is not necessary (as is seen in this case).
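The difference can be seen side by side in a small sketch. The series clean and dirty are made-up data for illustration, with 'n/a' standing in for any entry that cannot be parsed as a number.

```python
import pandas as pd

clean = pd.Series(['1', '2', '10'])
dirty = pd.Series(['1', 'n/a', '10'])  # hypothetical column with a bad entry

# astype(float) always produces floats, even for whole numbers
as_float = clean.astype(float)

# to_numeric keeps an integer dtype when every value is a whole number
as_numeric = pd.to_numeric(clean)

# astype(float) fails as soon as it meets a string it cannot parse
try:
    dirty.astype(float)
    astype_failed = False
except ValueError:
    astype_failed = True

# with errors='coerce', the unparseable entry becomes NaN instead
coerced = pd.to_numeric(dirty, errors='coerce')

print(as_float.dtype, as_numeric.dtype, astype_failed)
print(coerced)
```

Note that once a NaN is present, the coerced result is stored as floats, so integers are only preserved for columns that convert cleanly.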