groupby不识别数字列大 pandas python的问题 [英] groupby issues of not recognizing numeric column pandas python
问题描述
我有一个excel数据,我读入 pd.read_excel
:
I have an excel data that i read in by pd.read_excel
:
Block Concentration Name Replicate
1 Array Marker
1 Array Marker
1 100.0 Man5GlcNAc2
1 33.0 Man5GlcNAc2
1 10.0 Man5GlcNAc2
1 100.0 Man6GlcNAc2
1 33.0 Man6GlcNAc2
1 10.0 Man6GlcNAc2
1 100.0 Man7GlcNAc2 D1
1 33.0 Man7GlcNAc2 D1
1 10.0 Man7GlcNAc2 D1
1 100.0 Man7GlcNAc2 D3
1 33.0 Man7GlcNAc2 D3
1 10.0 Man7GlcNAc2 D3
...
...
2 100.0 Man8GlcNAc2 D1D3
2 33.0 Man8GlcNAc2 D1D3
2 10.0 Man8GlcNAc2 D1D3
2 100.0 Man9GlcNAc2
2 33.0 Man9GlcNAc2
2 10.0 Man9GlcNAc2
...
所需的输出是:
Block Concentration Name Replicate
1 Array Marker 1
1 Array Marker 2
1 100.0 Man5GlcNAc2 1
1 33.0 Man5GlcNAc2 2
1 10.0 Man5GlcNAc2 3
1 100.0 Man6GlcNAc2 1
1 33.0 Man6GlcNAc2 2
1 10.0 Man6GlcNAc2 3
1 100.0 Man7GlcNAc2 D1 1
1 33.0 Man7GlcNAc2 D1 2
1 10.0 Man7GlcNAc2 D1 3
1 100.0 Man7GlcNAc2 D3 1
1 33.0 Man7GlcNAc2 D3 2
1 10.0 Man7GlcNAc2 D3 3
...
...
2 100.0 Man8GlcNAc2 D1D3 1
2 33.0 Man8GlcNAc2 D1D3 2
2 10.0 Man8GlcNAc2 D1D3 3
2 100.0 Man9GlcNAc2 1
2 33.0 Man9GlcNAc2 2
2 10.0 Man9GlcNAc2 3
...
我的代码是
data["Replicate"] = data.groupby(["Block", "Name", "Concentration"]).cumcount()+1
我认为有道理,但我得到的输出不是所需的输出,它在下面:
which i think makes sense, but the output i got is not the desired output, it is below:
Block Concentration Name Replicate
1 Array Marker 1
1 Array Marker 2
1 100.0 Man5GlcNAc2 1
1 33.0 Man5GlcNAc2 1
1 10.0 Man5GlcNAc2 1
1 100.0 Man6GlcNAc2 1
1 33.0 Man6GlcNAc2 1
1 10.0 Man6GlcNAc2 1
1 100.0 Man7GlcNAc2 D1 1
1 33.0 Man7GlcNAc2 D1 1
1 10.0 Man7GlcNAc2 D1 1
1 100.0 Man7GlcNAc2 D3 1
1 33.0 Man7GlcNAc2 D3 1
1 10.0 Man7GlcNAc2 D3 1
...
...
1 100.0 Man8GlcNAc2 D1D3 1
1 33.0 Man8GlcNAc2 D1D3 1
1 10.0 Man8GlcNAc2 D1D3 1
1 100.0 Man9GlcNAc2 1
1 33.0 Man9GlcNAc2 1
1 10.0 Man9GlcNAc2 1
...
1 100.0 Man5GlcNAc2 2
1 33.0 Man5GlcNAc2 2
1 10.0 Man5GlcNAc2 2
....
复制列是' 1'直到下一行,我不知道ho当它选择哪些行来分配数字。应该共有3个块|集合|名称组合是相同的,所以我需要分配'1,2,3'到每个分离它们以后,当我使用数据透视表。我已经将'浓度'列列为字符串类型,因此数字不应该是一个问题。
the replicate columns is '1' until later down the rows, and I am not sure how it's picking which rows to assign the numbers to. There should be a total of 3 block|concentration|name combos that are the same, so I need to assign '1,2,3' to each to separate them for later when I use pivot table. I have made the 'concentration' column as string type so being numeric should not be a problem.
推荐答案
而不是函数 cumcount()+ 1
可以使用滚动计数与移动窗口= 3
:
Instead of function cumcount()+1
can be used rolling count with moving window=3
:
#groupby and set rolling count from column Block
data["Replicate"] = data.groupby(["Block", "Name"])["Block"].transform(pd.rolling_count, window=3)
格式化很奇怪。如果复制数据不成问题,可以通过调用列浓度
来修复它,以便在列中标记空格, / code>从文本的开始和结束。
Formatting is very strange. If it isn't problem with copy data to question, you can repair it by casting column Concentration
to float and striping white-spaces in column Name
from start and end of text.
Block Concentration Name Replicate
1 Array Marker
1 Array Marker
1 100.0 Man5GlcNAc2
1 33.0 Man5GlcNAc2
1 10.0 Man5GlcNAc2
1 100.0 Man6GlcNAc2
1 33.0 Man6GlcNAc2
1 10.0 Man6GlcNAc2
1 100.0 Man7GlcNAc2 D1
1 33.0 Man7GlcNAc2 D1
1 10.0 Man7GlcNAc2 D1
1 100.0 Man7GlcNAc2 D3
1 33.0 Man7GlcNAc2 D3
1 10.0 Man7GlcNAc2 D3
#convert column Concentration to float
data['Concentration'] = data['Concentration'].astype(float)
#strip first and last whitespaces
data['Name'] = data['Name'].str.strip()
#groupby and set rolling count from column Block
data["Replicate"] = data.groupby(["Block", "Name"])["Block"].transform(pd.rolling_count, window=3)
Block Concentration Name Replicate
0 1 Array Marker 1
1 1 Array Marker 2
2 1 100 Man5GlcNAc2 1
3 1 33 Man5GlcNAc2 2
4 1 10 Man5GlcNAc2 3
5 1 100 Man6GlcNAc2 1
6 1 33 Man6GlcNAc2 2
7 1 10 Man6GlcNAc2 3
8 1 100 Man7GlcNAc2 D1 1
9 1 33 Man7GlcNAc2 D1 2
10 1 10 Man7GlcNAc2 D1 3
11 1 100 Man7GlcNAc2 D3 1
12 1 33 Man7GlcNAc2 D3 2
13 1 10 Man7GlcNAc2 D3 3
这篇关于groupby不识别数字列大 pandas python的问题的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!