Group by one column and show the availability of specific values from another column
Problem description
I have this dataframe:
df1:
drug_id illness
lexapro.1 HD
lexapro.1 MS
lexapro.2 HDED
lexapro.2 MS
lexapro.2 MS
lexapro.3 CD
lexapro.3 Sweat
lexapro.4 HD
lexapro.5 WD
lexapro.5 FN
I am going to first group the data based on drug_id, and search for availability of HD, MS, and FN in the illness column. Then fill in the second data frame like this:
df2:
drug_id HD MS FN
lexapro.1 1 1 0
lexapro.2 0 1 0
lexapro.3 0 0 0
lexapro.4 1 0 0
lexapro.5 0 0 1
This is my code for grouping.
df1.groupby('drug_id', sort=False).isin('HD')
but I do not know how to assign 1 to df2['HD'] for each drug_id if 'HD' was available for that drug_id in df1.
Thank you.
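Staying closer to the groupby idea in the question, each column of df2 can be filled by testing, per drug_id, whether any row's illness equals the target. This is a minimal sketch, assuming the data lives in df1 with the columns drug_id and illness shown above; targets is a name introduced here for the list of illnesses to flag.

```python
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({
    "drug_id": ["lexapro.1", "lexapro.1", "lexapro.2", "lexapro.2", "lexapro.2",
                "lexapro.3", "lexapro.3", "lexapro.4", "lexapro.5", "lexapro.5"],
    "illness": ["HD", "MS", "HDED", "MS", "MS", "CD", "Sweat", "HD", "WD", "FN"],
})

targets = ["HD", "MS", "FN"]  # illnesses to flag (name chosen for this sketch)

# For each target, mark matching rows with True, then ask per drug_id
# whether any row matched; cast the booleans to 1/0.
df2 = pd.DataFrame({
    t: df1["illness"].eq(t).groupby(df1["drug_id"], sort=False).any().astype(int)
    for t in targets
})
print(df2)
```

eq is an exact comparison, so HDED does not count as HD, which matches the expected row for lexapro.2.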
Option 1: crosstab
pd.crosstab(df1.drug_id, df1.illness)[['HD', 'MS', 'FN']].ge(1).astype(int)
illness HD MS FN
drug_id
lexapro.1 1 1 0
lexapro.2 0 1 0
lexapro.3 0 0 0
lexapro.4 1 0 0
lexapro.5 0 0 1
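Selecting the columns with [['HD', 'MS', 'FN']] raises a KeyError if one of the requested illnesses never appears in the data. If that can happen, reindex is a safer way to pick the columns, since it fills absent labels with a default value. A sketch, where XX is a hypothetical illness added only to show the missing-label case:

```python
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({
    "drug_id": ["lexapro.1", "lexapro.1", "lexapro.2", "lexapro.2", "lexapro.2",
                "lexapro.3", "lexapro.3", "lexapro.4", "lexapro.5", "lexapro.5"],
    "illness": ["HD", "MS", "HDED", "MS", "MS", "CD", "Sweat", "HD", "WD", "FN"],
})

# "XX" is hypothetical: it does not occur in df1["illness"]
targets = ["HD", "MS", "FN", "XX"]

# Count occurrences of each illness per drug_id
counts = pd.crosstab(df1["drug_id"], df1["illness"])

# reindex keeps only the target columns, in order, and fills absent ones with 0
df2 = counts.reindex(columns=targets, fill_value=0).ge(1).astype(int)
print(df2)
```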
Option 2: groupby + value_counts + unstack
df1.groupby('drug_id').illness.value_counts()\
.unstack()[['HD', 'MS', 'FN']].ge(1).astype(int)
illness HD MS FN
drug_id
lexapro.1 1 1 0
lexapro.2 0 1 0
lexapro.3 0 0 0
lexapro.4 1 0 0
lexapro.5 0 0 1
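In this chain, unstack leaves NaN wherever a drug_id/illness pair never occurs, and the code relies on NaN >= 1 evaluating to False. Passing fill_value=0 to unstack turns the missing pairs into explicit zeros instead. A sketch of that variant, assuming the same df1:

```python
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({
    "drug_id": ["lexapro.1", "lexapro.1", "lexapro.2", "lexapro.2", "lexapro.2",
                "lexapro.3", "lexapro.3", "lexapro.4", "lexapro.5", "lexapro.5"],
    "illness": ["HD", "MS", "HDED", "MS", "MS", "CD", "Sweat", "HD", "WD", "FN"],
})
targets = ["HD", "MS", "FN"]

# Count each (drug_id, illness) pair, then pivot illness into columns;
# pairs that never occur become 0 rather than NaN.
counts = df1.groupby("drug_id")["illness"].value_counts().unstack(fill_value=0)

# Any count of at least 1 means the illness is present for that drug_id
df2 = counts[targets].ge(1).astype(int)
print(df2)
```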
Option 3: get_dummies + sum
df1.set_index('drug_id').illness.str.get_dummies()\
.groupby(level=0).sum()[['HD', 'MS', 'FN']].ge(1).astype(int)
HD MS FN
drug_id
lexapro.1 1 1 0
lexapro.2 0 1 0
lexapro.3 0 0 0
lexapro.4 1 0 0
lexapro.5 0 0 1
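The level argument of DataFrame.sum was deprecated in pandas 1.3 and removed in pandas 2.0; groupby(level=0).sum() is the replacement. As a quick sanity check, the sketch below (assuming pandas 2.x and the sample df1 from the question) runs all three options and confirms they produce the same 0/1 values. Only the axis names differ: the get_dummies result, for example, carries no illness label on its columns.

```python
import pandas as pd

# Sample data from the question
df1 = pd.DataFrame({
    "drug_id": ["lexapro.1", "lexapro.1", "lexapro.2", "lexapro.2", "lexapro.2",
                "lexapro.3", "lexapro.3", "lexapro.4", "lexapro.5", "lexapro.5"],
    "illness": ["HD", "MS", "HDED", "MS", "MS", "CD", "Sweat", "HD", "WD", "FN"],
})
targets = ["HD", "MS", "FN"]

# Option 1: crosstab
opt1 = pd.crosstab(df1["drug_id"], df1["illness"])[targets].ge(1).astype(int)

# Option 2: groupby + value_counts + unstack
opt2 = (df1.groupby("drug_id")["illness"].value_counts()
        .unstack(fill_value=0)[targets].ge(1).astype(int))

# Option 3: get_dummies + groupby(level=0).sum() (replaces sum(level=0))
opt3 = (df1.set_index("drug_id")["illness"].str.get_dummies()
        .groupby(level=0).sum()[targets].ge(1).astype(int))

# Compare row labels, column labels and cell values; axis names are ignored
for other in (opt2, opt3):
    assert list(other.index) == list(opt1.index)
    assert list(other.columns) == targets
    assert (other.to_numpy() == opt1.to_numpy()).all()
print(opt3)
```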
Thanks to Scott Boston for the improvement!