Get unique values and their occurrences out of one dataframe into a new dataframe using Pandas DataFrame
Question
I want to turn my dataframe, which has non-distinct values under each column header, into a dataframe that lists the distinct values under each column header, with the number of times each value occurs in that particular column next to it. An example:
My initial dataframe is shown below:
A B C D
0 CEN T2 56
2 DECEN T2 45
3 ONBEK T2 84
NaN CEN T1 59
3 NaN T1 87
NaN NaN T2 NaN
0 NaN NaN 98
NaN CEN NaN 23
NaN CEN T1 65
where A, B, C and D are the column headers, each with 9 values underneath it (blanks included).
My preferred output dataframe should look like this (first a column of unique values for each column in the original dataframe, and next to it their occurrence counts in that particular column):
A B C D A B C D
0 CEN T2 56 2 4 4 1
2 DECEN T1 45 1 1 3 1
3 ONBEK NaN 84 2 1 NaN 1
NaN NaN NaN 59 NaN NaN NaN 1
NaN NaN NaN 87 NaN NaN NaN 1
NaN NaN NaN 98 NaN NaN NaN 1
NaN NaN NaN 23 NaN NaN NaN 1
NaN NaN NaN 65 NaN NaN NaN 1
where A, B, C and D are the column headers, with underneath them first the distinct values for each column from the original .csv file and next to them the occurrence count of each element in its particular column.
Does anyone have an idea?
The code below gets the unique values out of each column into a new dataframe. I tried to use .value_counts to get the occurrences in each column, but I could not combine the result with the unique values into a single dataframe again.
import pandas as pd

# df is the initial dataframe shown above
new_df = pd.concat([pd.Series(df[i].unique()) for i in df.columns], axis=1)
new_df.columns = df.columns
new_df
Recommended answer
The difficult part is keeping the values of the columns in each row aligned. To do this, construct a new dataframe from unique, then use pd.concat to append, for each column of this new dataframe, the result of mapping that column through the original column's value_counts.
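# one row of unique values per original column, transposed back into columns
new_df = (pd.DataFrame([df[c].unique() for c in df], index=df.columns).T
            .dropna(how='all'))
# look up each unique value's count in its own column and append it as <column>_Count
df_final = pd.concat([new_df, *[new_df[c].map(df[c].value_counts()).rename(f'{c}_Count')
                                for c in df]], axis=1).reset_index(drop=True)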
Out[1580]:
A B C D A_Count B_Count C_Count D_Count
0 0 CEN T2 56 2.0 4.0 4.0 1
1 2 DECEN T1 45 1.0 1.0 3.0 1
2 3 ONBEK NaN 84 2.0 1.0 NaN 1
3 NaN NaN NaN 59 NaN NaN NaN 1
4 NaN NaN NaN 87 NaN NaN NaN 1
5 NaN NaN NaN 98 NaN NaN NaN 1
6 NaN NaN NaN 23 NaN NaN NaN 1
7 NaN NaN NaN 65 NaN NaN NaN 1
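As a self-contained check of the idea above, here is a minimal sketch that rebuilds the sample data from the question and produces the same kind of table. It is a variation, not the exact code above: it drops missing values before calling unique and assembles the unique values with pd.concat on Series (as in the question) instead of transposing. The names uniques and counts are only illustrative.

```python
import numpy as np
import pandas as pd

# Sample data reconstructed from the question (NaN marks a blank cell).
df = pd.DataFrame({
    "A": [0, 2, 3, np.nan, 3, np.nan, 0, np.nan, np.nan],
    "B": ["CEN", "DECEN", "ONBEK", "CEN", np.nan, np.nan, np.nan, "CEN", "CEN"],
    "C": ["T2", "T2", "T2", "T1", "T1", "T2", np.nan, np.nan, "T1"],
    "D": [56, 45, 84, 59, 87, np.nan, 98, 23, 65],
})

# Unique non-missing values per column, in order of first appearance.
# Shorter columns are padded with NaN when the Series are concatenated.
uniques = pd.concat(
    [pd.Series(df[c].dropna().unique(), name=c) for c in df.columns], axis=1
)

# Map each unique value to its frequency in the original column.
# Padding NaNs have no entry in value_counts, so they stay NaN.
counts = pd.concat(
    [uniques[c].map(df[c].value_counts()).rename(f"{c}_Count") for c in df.columns],
    axis=1,
)

df_final = pd.concat([uniques, counts], axis=1)
print(df_final)
```

Because the numeric columns contain NaN, their values come out as floats here (0.0 rather than 0); the counts are unaffected.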
If you only need to keep each column aligned with its own count (A with A_Count, B with B_Count, ...), you can simply use value_counts with reset_index, plus a couple of calls to rename the axis and the count column:
# value columns first, then the matching count columns
new_cols = df.columns.tolist() + (df.columns + '_Count').tolist()
new_df = pd.concat([df[col].value_counts(sort=False).rename_axis(col).reset_index(name=f'{col}_Count')
                    for col in df], axis=1).reindex(new_cols, axis=1)
Out[1501]:
A B C D A_Count B_Count C_Count D_Count
0 0.0 ONBEK T2 56.0 2.0 1.0 4.0 1
1 2.0 CEN T1 45.0 1.0 4.0 3.0 1
2 3.0 DECEN NaN 84.0 2.0 1.0 NaN 1
3 NaN NaN NaN 59.0 NaN NaN NaN 1
4 NaN NaN NaN 87.0 NaN NaN NaN 1
5 NaN NaN NaN 98.0 NaN NaN NaN 1
6 NaN NaN NaN 23.0 NaN NaN NaN 1
7 NaN NaN NaN 65.0 NaN NaN NaN 1
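The pairwise approach can also be wrapped in a small function and checked against the source data. The sketch below is only an illustration: value_count_pairs is a hypothetical helper name, the data is just the B and C columns from the question, and it uses the default value_counts ordering (most frequent first) rather than sort=False.

```python
import numpy as np
import pandas as pd

# Columns B and C from the question's sample (NaN marks a blank cell).
df = pd.DataFrame({
    "B": ["CEN", "DECEN", "ONBEK", "CEN", np.nan, np.nan, np.nan, "CEN", "CEN"],
    "C": ["T2", "T2", "T2", "T1", "T1", "T2", np.nan, np.nan, "T1"],
})


def value_count_pairs(frame):
    """Hypothetical helper: one (value, count) column pair per input column."""
    pairs = [
        frame[col].value_counts()          # counts non-missing values only
        .rename_axis(col)                  # name the index after the column
        .reset_index(name=f"{col}_Count")  # index becomes `col`, counts become `col_Count`
        for col in frame.columns
    ]
    # Value columns first, then the count columns, as in the answer above.
    ordered = list(frame.columns) + [f"{col}_Count" for col in frame.columns]
    return pd.concat(pairs, axis=1).reindex(ordered, axis=1)


pairs_df = value_count_pairs(df)
print(pairs_df)

# Each value must sit next to its own frequency in the source column.
for col in df.columns:
    pair = pairs_df[[col, f"{col}_Count"]].dropna()
    for value, count in pair.itertuples(index=False):
        assert count == (df[col] == value).sum()
```

Rows are only meaningful within each value/count pair; values that share a row across different pairs are unrelated.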