pandas: sort each column individually


Problem description

My dataframe looks something like this, only much larger.

import pandas as pd

d = {'Col_1' : pd.Series(['A', 'B']),
 'Col_2' : pd.Series(['B', 'A', 'C']),
 'Col_3' : pd.Series(['B', 'A']),
 'Col_4' : pd.Series(['C', 'A', 'B', 'D']),
 'Col_5' : pd.Series(['A', 'C']),}
df = pd.DataFrame(d)

Col_1  Col_2  Col_3  Col_4  Col_5
  A      B      B      C      A
  B      A      A      A      C
  NaN    C      NaN    B      NaN
  NaN    NaN    NaN    D      NaN

First, I'm trying to sort each column individually. I've tried playing around with something like df.sort([lambda x: x in df.columns], axis=1, ascending=True, inplace=True), but I have only ended up with errors. How do I sort each column individually to end up with something like:

Col_1  Col_2  Col_3  Col_4  Col_5
  A      A      A      A      A
  B      B      B      B      C
  NaN    C      NaN    C      NaN
  NaN    NaN    NaN    D      NaN

Second, I'm looking to concatenate the rows within the columns

 df = pd.concat([df,pd.DataFrame(df.sum(axis=0),columns=['Concatenation']).T])

I can combine everything with the line above after replacing np.nan with '', but the result comes out smashed ('AB') together and would require an additional step to clean (into something like 'A:B').

Solution

Here is one way:

>>> # sort each column on its own, then reset the index so the sorted values line up from row 0
>>> pd.concat([df[col].sort_values().reset_index(drop=True) for col in df], axis=1, ignore_index=True)
     0    1    2  3    4
0    A    A    A  A    A
1    B    B    B  B    C
2  NaN    C  NaN  C  NaN
3  NaN  NaN  NaN  D  NaN

[4 rows x 5 columns]

However, what you're doing is somewhat strange. DataFrames aren't just collections of unrelated columns. In a DataFrame, each row represents a record, so the value in one column is semantically linked to the values in other columns in that same row. By sorting the columns independently, you're discarding this information, so the rows are now meaningless. That's why the reset_index is needed in my example. Also, because of this, there's no way to do this in-place, which your example suggests you want.
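The same idea can be sketched so that the column names are kept, together with one possible way to get the 'A:B'-style concatenation row from the second part of the question. This is only a sketch under assumptions: the helper name sort_columns_independently and the ':' separator are illustrative choices, not something from the original answer, and Series.sort_values is used as the sorting call.

```python
import pandas as pd

df = pd.DataFrame({
    "Col_1": pd.Series(["A", "B"]),
    "Col_2": pd.Series(["B", "A", "C"]),
    "Col_3": pd.Series(["B", "A"]),
    "Col_4": pd.Series(["C", "A", "B", "D"]),
    "Col_5": pd.Series(["A", "C"]),
})


def sort_columns_independently(frame):
    """Hypothetical helper: sort every column on its own.

    sort_values() puts NaN last by default; reset_index(drop=True)
    discards the old row labels so the sorted values start at row 0
    instead of being realigned to their original rows.
    """
    return frame.apply(lambda col: col.sort_values().reset_index(drop=True))


sorted_df = sort_columns_independently(df)

# Join the non-missing values of each column with ':' (assumed separator).
joined = sorted_df.apply(lambda col: ":".join(col.dropna()))

# Append the joined strings as an extra row labelled 'Concatenation'.
result = pd.concat([sorted_df, joined.to_frame("Concatenation").T])
print(result)
```

Dropping the NaN values before joining avoids both the need to replace np.nan with '' and the cleanup step for strings that would otherwise come out smashed together. Like the one-liner above, this builds a new DataFrame rather than sorting in place.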
