Get unique values and their occurrence out of one dataframe into a new dataframe using Pandas DataFrame


Question

I want to turn my dataframe, which has non-distinct values under each column header, into a dataframe that has the distinct values under each column header and, next to them, how often each value occurs in its column. An example:

My initial dataframe is shown below:

A       B       C       D
0       CEN     T2      56
2       DECEN   T2      45
3       ONBEK   T2      84
NaN     CEN     T1      59
3       NaN     T1      87
NaN     NaN     T2      NaN
0       NaN     NaN     98
NaN     CEN     NaN     23
NaN     CEN     T1      65

where A, B, C and D are the column headers, each with 9 values underneath it (blanks included).
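For anyone who wants to reproduce the problem, the table above can be rebuilt roughly as follows. This is only a sketch: it assumes the blanks are missing values (`np.nan`), and the variable name `df` follows the code later in the post. The original data came from a .csv file that is not shown.

```python
import numpy as np
import pandas as pd

# Sample data transcribed from the table in the question.
# Assumption: every blank cell is a missing value (NaN).
df = pd.DataFrame({
    "A": [0, 2, 3, np.nan, 3, np.nan, 0, np.nan, np.nan],
    "B": ["CEN", "DECEN", "ONBEK", "CEN", np.nan, np.nan, np.nan, "CEN", "CEN"],
    "C": ["T2", "T2", "T2", "T1", "T1", "T2", np.nan, np.nan, "T1"],
    "D": [56, 45, 84, 59, 87, np.nan, 98, 23, 65],
})

print(df)
```

Because columns A and D contain NaN, pandas stores them as floats, which is why values such as 0 may be displayed as 0.0.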

My preferred output dataframe should look like this (first the unique values of each column of the original dataframe, then their occurrence in that particular column):

A       B       C       D       A       B       C       D
0       CEN     T2      56      2       4       4       1
2       DECEN   T1      45      1       1       3       1
3       ONBEK   NaN     84      2       1       NaN     1
NaN     NaN     NaN     59      NaN     NaN     NaN     1
NaN     NaN     NaN     87      NaN     NaN     NaN     1
NaN     NaN     NaN     98      NaN     NaN     NaN     1
NaN     NaN     NaN     23      NaN     NaN     NaN     1
NaN     NaN     NaN     65      NaN     NaN     NaN     1

where A, B, C and D are the column headers. The first four columns hold the distinct values of each column from the original .csv file, and the last four hold the occurrence of each of those values in its particular column.

Does anyone have an idea?

The code below gets the unique values of each column into a new dataframe. I tried to use .value_counts to get the occurrences in each column, but I could not combine the counts with the unique values in a single dataframe.

df
new_df=pd.concat([pd.Series(df[i].unique()) for i in df.columns], axis=1)
new_df.columns=df.columns
new_df

Recommended answer

The difficult part is keeping the values of the columns aligned in each row. To do this, build a new dataframe from each column's unique values, then use pd.concat to append, for each column, the result of mapping that column of the new dataframe through the original column's value_counts.

new_df = (pd.DataFrame([df[c].unique() for c in df], index=df.columns).T
            .dropna(how='all'))

df_final = pd.concat([new_df, *[new_df[c].map(df[c].value_counts()).rename(f'{c}_Count') 
                                   for c in  df]], axis=1).reset_index(drop=True)

Out[1580]:
     A      B    C   D  A_Count  B_Count  C_Count  D_Count
0    0    CEN   T2  56      2.0      4.0      4.0        1
1    2  DECEN   T1  45      1.0      1.0      3.0        1
2    3  ONBEK  NaN  84      2.0      1.0      NaN        1
3  NaN    NaN  NaN  59      NaN      NaN      NaN        1
4  NaN    NaN  NaN  87      NaN      NaN      NaN        1
5  NaN    NaN  NaN  98      NaN      NaN      NaN        1
6  NaN    NaN  NaN  23      NaN      NaN      NaN        1
7  NaN    NaN  NaN  65      NaN      NaN      NaN        1
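The same idea can also be written as a small helper, shown here as a sketch rather than as the answerer's exact code. The function name `unique_with_counts` is hypothetical, the sample `df` is transcribed from the question, and the unique values are collected with `pd.concat` over per-column Series (as in the question's own attempt) instead of the `pd.DataFrame(...).T` construction above.

```python
import numpy as np
import pandas as pd

# Sample data transcribed from the question (blanks assumed to be NaN).
df = pd.DataFrame({
    "A": [0, 2, 3, np.nan, 3, np.nan, 0, np.nan, np.nan],
    "B": ["CEN", "DECEN", "ONBEK", "CEN", np.nan, np.nan, np.nan, "CEN", "CEN"],
    "C": ["T2", "T2", "T2", "T1", "T1", "T2", np.nan, np.nan, "T1"],
    "D": [56, 45, 84, 59, 87, np.nan, 98, 23, 65],
})


def unique_with_counts(frame):
    """Hypothetical helper: unique values per column plus their counts."""
    # One Series of non-missing unique values per column; concat pads the
    # shorter ones with NaN because they are aligned on a 0..n-1 index.
    uniques = pd.concat(
        [pd.Series(frame[c].dropna().unique(), name=c) for c in frame.columns],
        axis=1,
    )
    # Look up each unique value in the original column's value_counts.
    # Padding NaN has no entry there, so it maps to NaN.
    counts = pd.concat(
        [uniques[c].map(frame[c].value_counts()).rename(f"{c}_Count")
         for c in frame.columns],
        axis=1,
    )
    return pd.concat([uniques, counts], axis=1)


result = unique_with_counts(df)
print(result)
```

Dropping missing values before calling `unique` means NaN is never listed as a value of its own, so no separate `dropna(how='all')` step is needed in this variant.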


If you only need each column to stay aligned with its own count (A with A_Count, B with B_Count, ...), you can simply use value_counts with reset_index, plus a couple of calls to rename the axis and the count column.

new_cols = df.columns.tolist() + (df.columns + '_Count').tolist()
new_df = pd.concat([df[col].value_counts(sort=False).rename_axis(col).reset_index(name=f'{col}_Count') 
                        for col in df], axis=1).reindex(new_cols, axis=1)

Out[1501]:
     A      B    C     D  A_Count  B_Count  C_Count  D_Count
0  0.0  ONBEK   T2  56.0      2.0      1.0      4.0        1
1  2.0    CEN   T1  45.0      1.0      4.0      3.0        1
2  3.0  DECEN  NaN  84.0      2.0      1.0      NaN        1
3  NaN    NaN  NaN  59.0      NaN      NaN      NaN        1
4  NaN    NaN  NaN  87.0      NaN      NaN      NaN        1
5  NaN    NaN  NaN  98.0      NaN      NaN      NaN        1
6  NaN    NaN  NaN  23.0      NaN      NaN      NaN        1
7  NaN    NaN  NaN  65.0      NaN      NaN      NaN        1
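A self-contained sketch of this second approach is given below. The helper name `paired_counts` is hypothetical and the sample `df` is transcribed from the question. It uses the default `value_counts` ordering (most frequent first) rather than `sort=False`, since the row order produced by `sort=False` can differ between pandas versions.

```python
import numpy as np
import pandas as pd

# Sample data transcribed from the question (blanks assumed to be NaN).
df = pd.DataFrame({
    "A": [0, 2, 3, np.nan, 3, np.nan, 0, np.nan, np.nan],
    "B": ["CEN", "DECEN", "ONBEK", "CEN", np.nan, np.nan, np.nan, "CEN", "CEN"],
    "C": ["T2", "T2", "T2", "T1", "T1", "T2", np.nan, np.nan, "T1"],
    "D": [56, 45, 84, 59, 87, np.nan, 98, 23, 65],
})


def paired_counts(frame):
    """Hypothetical helper: each column next to its own value counts."""
    # value_counts puts the values in the index; rename_axis names that
    # index after the column, and reset_index turns it into a two-column
    # frame (<col>, <col>_Count) with a fresh 0..n-1 index.
    pieces = [
        frame[c].value_counts().rename_axis(c).reset_index(name=f"{c}_Count")
        for c in frame.columns
    ]
    # Put all value columns first, then all count columns.
    ordered = list(frame.columns) + [f"{c}_Count" for c in frame.columns]
    return pd.concat(pieces, axis=1).reindex(columns=ordered)


paired = paired_counts(df)
print(paired)
```

Each value stays on the same row as its own count, but rows are not comparable across different column pairs, which is the trade-off described above.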
