groupby不识别数字列大 pandas python的问题 [英] groupby issues of not recognizing numeric column pandas python

查看:213
本文介绍了groupby不识别数字列大 pandas python的问题的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有一个excel数据,我读入 pd.read_excel

I have an excel data that i read in by pd.read_excel:

Block   Concentration       Name            Replicate
  1                      Array Marker   
  1                      Array Marker   
  1       100.0        Man5GlcNAc2  
  1       33.0         Man5GlcNAc2  
  1       10.0         Man5GlcNAc2  
  1       100.0        Man6GlcNAc2  
  1       33.0         Man6GlcNAc2  
  1        10.0        Man6GlcNAc2  
  1        100.0      Man7GlcNAc2 D1    
  1        33.0       Man7GlcNAc2 D1    
  1        10.0       Man7GlcNAc2 D1    
  1        100.0     Man7GlcNAc2 D3 
  1         33.0    Man7GlcNAc2 D3  
  1         10.0    Man7GlcNAc2 D3  
...
...
  2        100.0    Man8GlcNAc2 D1D3    
  2         33.0    Man8GlcNAc2 D1D3    
  2         10.0    Man8GlcNAc2 D1D3    
  2         100.0   Man9GlcNAc2 
  2        33.0     Man9GlcNAc2 
  2        10.0     Man9GlcNAc2 
...

所需的输出是:

Block   Concentration       Name            Replicate
  1                      Array Marker         1
  1                      Array Marker         2
  1       100.0        Man5GlcNAc2            1
  1       33.0         Man5GlcNAc2            2
  1       10.0         Man5GlcNAc2            3
  1       100.0        Man6GlcNAc2            1
  1       33.0         Man6GlcNAc2            2
  1        10.0        Man6GlcNAc2            3
  1        100.0      Man7GlcNAc2 D1          1
  1        33.0       Man7GlcNAc2 D1          2
  1        10.0       Man7GlcNAc2 D1          3
  1        100.0     Man7GlcNAc2 D3           1
  1         33.0    Man7GlcNAc2 D3            2
  1         10.0    Man7GlcNAc2 D3            3
...
...
  2        100.0    Man8GlcNAc2 D1D3          1
  2         33.0    Man8GlcNAc2 D1D3          2
  2         10.0    Man8GlcNAc2 D1D3          3
  2         100.0   Man9GlcNAc2               1
  2        33.0     Man9GlcNAc2               2
  2        10.0     Man9GlcNAc2               3
...

我的代码是

data["Replicate"] = data.groupby(["Block", "Name", "Concentration"]).cumcount()+1 

我认为有道理,但我得到的输出不是所需的输出,它在下面:

which i think makes sense, but the output i got is not the desired output, it is below:

Block   Concentration       Name            Replicate
  1                      Array Marker         1
  1                      Array Marker         2
  1       100.0        Man5GlcNAc2            1
  1       33.0         Man5GlcNAc2            1
  1       10.0         Man5GlcNAc2            1
  1       100.0        Man6GlcNAc2            1
  1       33.0         Man6GlcNAc2            1
  1        10.0        Man6GlcNAc2            1
  1        100.0      Man7GlcNAc2 D1          1
  1        33.0       Man7GlcNAc2 D1          1
  1        10.0       Man7GlcNAc2 D1          1
  1        100.0     Man7GlcNAc2 D3           1
  1         33.0    Man7GlcNAc2 D3            1
  1         10.0    Man7GlcNAc2 D3            1
...
...
  1        100.0    Man8GlcNAc2 D1D3          1
  1         33.0    Man8GlcNAc2 D1D3          1
  1         10.0    Man8GlcNAc2 D1D3          1
  1         100.0   Man9GlcNAc2               1
  1        33.0     Man9GlcNAc2               1
  1        10.0     Man9GlcNAc2               1
...
  1         100.0   Man5GlcNAc2               2
  1        33.0     Man5GlcNAc2               2
  1        10.0     Man5GlcNAc2               2
 ....

复制列是' 1'直到下一行,我不知道ho当它选择哪些行来分配数字。应该共有3个块|集合|名称组合是相同的,所以我需要分配'1,2,3'到每个分离它们以后,当我使用数据透视表。我已经将'浓度'列列为字符串类型,因此数字不应该是一个问题。

the replicate columns is '1' until later down the rows, and I am not sure how it's picking which rows to assign the numbers to. There should be a total of 3 block|concentration|name combos that are the same, so I need to assign '1,2,3' to each to separate them for later when I use pivot table. I have made the 'concentration' column as string type so being numeric should not be a problem.

推荐答案

而不是函数 cumcount()+ 1 可以使用滚动计数移动窗口= 3

Instead of function cumcount()+1 can be used rolling count with moving window=3:

#groupby and set rolling count from column Block
data["Replicate"] = data.groupby(["Block", "Name"])["Block"].transform(pd.rolling_count, window=3) 

格式化很奇怪。如果复制数据不成问题,可以通过调用列浓度来修复它,以便在列中标记空格, / code>从文本的开始和结束。

Formatting is very strange. If it isn't problem with copy data to question, you can repair it by casting column Concentration to float and striping white-spaces in column Name from start and end of text.

Block   Concentration       Name            Replicate
  1                      Array Marker   
  1                      Array Marker   
  1       100.0        Man5GlcNAc2  
  1       33.0         Man5GlcNAc2  
  1       10.0         Man5GlcNAc2  
  1       100.0        Man6GlcNAc2  
  1       33.0         Man6GlcNAc2  
  1        10.0        Man6GlcNAc2  
  1        100.0      Man7GlcNAc2 D1    
  1        33.0       Man7GlcNAc2 D1    
  1        10.0       Man7GlcNAc2 D1    
  1        100.0     Man7GlcNAc2 D3 
  1         33.0    Man7GlcNAc2 D3  
  1         10.0    Man7GlcNAc2 D3  





#convert column Concentration to float
data['Concentration'] = data['Concentration'].astype(float)
#strip first and last whitespaces
data['Name'] = data['Name'].str.strip()

#groupby and set rolling count from column Block
data["Replicate"] = data.groupby(["Block", "Name"])["Block"].transform(pd.rolling_count, window=3) 





    Block Concentration            Name  Replicate
0       1                  Array Marker          1
1       1                  Array Marker          2
2       1           100     Man5GlcNAc2          1
3       1            33     Man5GlcNAc2          2
4       1            10     Man5GlcNAc2          3
5       1           100     Man6GlcNAc2          1
6       1            33     Man6GlcNAc2          2
7       1            10     Man6GlcNAc2          3
8       1           100  Man7GlcNAc2 D1          1
9       1            33  Man7GlcNAc2 D1          2
10      1            10  Man7GlcNAc2 D1          3
11      1           100  Man7GlcNAc2 D3          1
12      1            33  Man7GlcNAc2 D3          2
13      1            10  Man7GlcNAc2 D3          3

这篇关于groupby不识别数字列大 pandas python的问题的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆