Grouped Feature Matrix in Python #2 - Follow Up


Problem Description

It's not too different from my previous question. We can start with the sample data:

DataFrame1:

Name         No.        Comment    
Bob        2123320     Doesn't Matter   
Joe        2832883     Whatever           
John       2139300     Irrelevant        
Bob        2123320     Something          
John       2234903     Regardless

DataFrame2:

Name          No.          Report    
Bob        2123320         Great 
Joe        2832883         Solid           
John       2139300        Awesome        
Bob        2123320         Good          
John       2234903        Perfect
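To reproduce the problem, the two sample tables can be typed in as pandas DataFrames. This is a minimal sketch: the names `df1` and `df2` are illustrative, and storing No. as an integer is an assumption (the question does not give the column types).

```python
import pandas as pd

# DataFrame1 from the question (No. assumed to be an integer)
df1 = pd.DataFrame({
    "Name": ["Bob", "Joe", "John", "Bob", "John"],
    "No.": [2123320, 2832883, 2139300, 2123320, 2234903],
    "Comment": ["Doesn't Matter", "Whatever", "Irrelevant", "Something", "Regardless"],
})

# DataFrame2 from the question: same Names and No.'s, with a Report column
df2 = pd.DataFrame({
    "Name": ["Bob", "Joe", "John", "Bob", "John"],
    "No.": [2123320, 2832883, 2139300, 2123320, 2234903],
    "Report": ["Great", "Solid", "Awesome", "Good", "Perfect"],
})

print(df1)
print(df2)
```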

I am looking for a way to make a new Excel file that looks like this (expected outcome):

      ----------------------2139300-----------------------  ----------------------2234903-----------------------
Name  Irrelevant Whatever Regardless Awesome Solid Perfect  Irrelevant Whatever Regardless Awesome Solid Perfect
John  1          0        0          1       0     0        0          0        1          0       0     1

(Note: the output does not need the row of No. headers; I added it only for clarity and for the explanation below.)

Basically, what I have done is very similar to the previous question: for each name, it counts how many distinct No.'s that name has, and then it keeps only the people who have a certain number of distinct No.'s. Now I have a set of "Comments" and "Reports" I want to look for ({Irrelevant, Whatever, Regardless} and {Awesome, Solid, Perfect}, respectively; note that these are only a subset of all Comments/Reports), and for each of them I want a 1 or 0 depending on whether it appears, computed separately for each No. Put another way, for each No. I want a "group" of columns titled {Irrelevant, Whatever, Regardless} and {Awesome, Solid, Perfect}, and each value should be 1 if it appeared for that person under that specific No. and 0 if it did not.
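The first step described above, keeping only the people who have a certain number of distinct No.'s, might look like this with `groupby` and `nunique`. This is a sketch on the sample DataFrame1; `names_with_distinct_nos` and its `min_distinct` threshold are hypothetical names, with the default of 2 matching the "more than one distinct No." rule used in the example.

```python
import pandas as pd

# DataFrame1 from the question (No. assumed to be an integer)
df1 = pd.DataFrame({
    "Name": ["Bob", "Joe", "John", "Bob", "John"],
    "No.": [2123320, 2832883, 2139300, 2123320, 2234903],
    "Comment": ["Doesn't Matter", "Whatever", "Irrelevant", "Something", "Regardless"],
})

def names_with_distinct_nos(df, min_distinct=2):
    """Return the names that have at least `min_distinct` distinct No. values."""
    counts = df.groupby("Name")["No."].nunique()  # distinct No.'s per name
    return counts[counts >= min_distinct].index.tolist()

# Row-level variant: keep every row whose Name passes the same threshold
mask = df1.groupby("Name")["No."].transform("nunique") >= 2
filtered = df1[mask]

print(names_with_distinct_nos(df1))  # ['John']
print(filtered)
```

The `transform` variant keeps the original rows, which is what later steps need; the `nunique` count on its own is handy for checking who passes the threshold.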

In this matrix, for example, we only see John, because he is the only person with more than one distinct No. In the first group of columns (No. 2139300), only Irrelevant and Awesome are 1 and the rest are 0; in the second group (No. 2234903), only Regardless and Perfect are 1. In other words, the matrix lists all of my desired Comments/Reports ({Irrelevant, Whatever, Regardless} and {Awesome, Solid, Perfect}) for one No. and marks whether each appeared (1) or not (0), then repeats the same columns as a new group for the next No. and marks which Comments/Reports appear for that No.
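One way to build this matrix directly, sketched under assumptions: Comments and Reports are stacked into one long `Value` column (an assumed way of combining the two frames), `pd.crosstab` counts each (No., Value) pair per name, and `reindex` adds the desired columns that never occurred, so every No. gets the same group of six columns. `crosstab` is just one choice here; `get_dummies` or `pivot_table` would also work.

```python
import pandas as pd

# Sample data from the question (No. assumed to be an integer)
nos = [2123320, 2832883, 2139300, 2123320, 2234903]
names = ["Bob", "Joe", "John", "Bob", "John"]
df1 = pd.DataFrame({"Name": names, "No.": nos,
                    "Comment": ["Doesn't Matter", "Whatever", "Irrelevant", "Something", "Regardless"]})
df2 = pd.DataFrame({"Name": names, "No.": nos,
                    "Report": ["Great", "Solid", "Awesome", "Good", "Perfect"]})

# The Comments and Reports of interest, in the order the question lists them
desired = ["Irrelevant", "Whatever", "Regardless", "Awesome", "Solid", "Perfect"]

# Assumption: stack Comments and Reports into one long 'Value' column
stacked = pd.concat([df1.rename(columns={"Comment": "Value"}),
                     df2.rename(columns={"Report": "Value"})], ignore_index=True)

# Keep only people with more than one distinct No.
stacked = stacked[stacked.groupby("Name")["No."].transform("nunique") > 1]

# Count each Name x (No., Value) pair and cap the counts at 1 to get 0/1 flags
matrix = pd.crosstab(stacked["Name"], [stacked["No."], stacked["Value"]]).clip(upper=1)

# Give every No. the same group of desired columns; absent ones are filled with 0
cols = pd.MultiIndex.from_product([sorted(stacked["No."].unique()), desired],
                                  names=["No.", "Value"])
matrix = matrix.reindex(columns=cols, fill_value=0)
print(matrix)
```

Reindexing against the full product of No.'s and desired values is what guarantees identical column groups per No., and it also drops any Comment/Report outside the desired subset.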

Let me know if anything is unclear. I truly appreciate your help.

Thanks.

Recommended Answer

Try:

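import pandas as pd

# Assumption: df1/df2 are DataFrame1/DataFrame2, stacked into one 'Comment' column
df_out = pd.concat([df1, df2.rename(columns={'Report': 'Comment'})], ignore_index=True)

df_out = (df_out[df_out.groupby('Name')['No.'].transform('nunique') > 1]  # >1 distinct No.
          .set_index(['Name', 'No.'])['Comment'].str.get_dummies()       # 0/1 column per value
          .reindex(columns=df_out['Comment'].drop_duplicates(), fill_value=0)  # all values
          .groupby(level=[0, 1]).sum()   # .sum(level=...) was removed in pandas 2.0
          .unstack()                     # move No. into the columns
          .swaplevel(0, 1, axis=1)       # put No. on the outer column level
          .sort_index(axis=1))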

print(df_out)

Output:

No.     2139300                                                                \
Comment Awesome Doesn't Matter Good Great Irrelevant Perfect Regardless Solid   
Name                                                                            
John          1              0    0     0          1       0          0     0   

No.                        2234903                                       \
Comment Something Whatever Awesome Doesn't Matter Good Great Irrelevant   
Name                                                                      
John            0        0       0              0    0     0          0   

No.                                                  
Comment Perfect Regardless Solid Something Whatever  
Name                                                 
John          1          1     0         0        0  
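The answer's output lists every Comment/Report value, whereas the question asks only for the subset {Irrelevant, Whatever, Regardless} and {Awesome, Solid, Perfect}, and its two-row column header does not map neatly onto a flat Excel sheet. Below is a sketch of one adjustment, assuming `df_out` is DataFrame1 and DataFrame2 stacked into a single `Comment` column (which is what the output implies): pass the desired list to `reindex`, order each No.'s group explicitly, and join the two header levels into strings such as `2139300_Irrelevant` (a hypothetical naming scheme).

```python
import pandas as pd

# Sample data from the question (No. assumed to be an integer)
nos = [2123320, 2832883, 2139300, 2123320, 2234903]
names = ["Bob", "Joe", "John", "Bob", "John"]
df1 = pd.DataFrame({"Name": names, "No.": nos,
                    "Comment": ["Doesn't Matter", "Whatever", "Irrelevant", "Something", "Regardless"]})
df2 = pd.DataFrame({"Name": names, "No.": nos,
                    "Report": ["Great", "Solid", "Awesome", "Good", "Perfect"]})

# Assumption: df_out stacks both frames, with Report renamed to Comment
df_out = pd.concat([df1, df2.rename(columns={"Report": "Comment"})], ignore_index=True)

desired = ["Irrelevant", "Whatever", "Regardless", "Awesome", "Solid", "Perfect"]

wide = (df_out[df_out.groupby("Name")["No."].transform("nunique") > 1]
        .set_index(["Name", "No."])["Comment"].str.get_dummies()
        .reindex(columns=desired, fill_value=0)  # keep only the desired values
        .groupby(level=[0, 1]).sum()             # one row per (Name, No.)
        .unstack()                               # columns become (Comment, No.)
        .swaplevel(0, 1, axis=1))                # columns become (No., Comment)

# Order the columns as one group per No., each group in the `desired` order
group_nos = sorted(set(wide.columns.get_level_values(0)))
wide = wide.loc[:, [(no, c) for no in group_nos for c in desired]]

# Flatten the two header levels into one, e.g. "2139300_Irrelevant"
wide.columns = [f"{no}_{c}" for no, c in wide.columns]
print(wide)
```

The flattened frame can then be passed to `DataFrame.to_excel`, which needs an Excel writer engine such as openpyxl to be installed.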
