Changing multiple column values to binary values
Question
I've asked this question before, but the answer I got didn't work out the way I thought it had, so here I am again.
Previous question: I am trying to define a function that takes a dataframe and changes the values in one column to create multiple new dataframes.
As an example, starting from df1, which looks like this:
df1:
class colB colC
0 1 1b 1c
1 2 2b 2c
2 3 3b 3c
3 1 4b 4c
4 2 5b 5c
I am trying to create multiple binary classes to implement one-vs-all classification, so the function would create:
df2:
class colB colC
0 1 1b 1c
1 -1 2b 2c
2 -1 3b 3c
3 1 4b 4c
4 -1 5b 5c
df3:
class colB colC
0 -1 1b 1c
1 1 2b 2c
2 -1 3b 3c
3 -1 4b 4c
4 1 5b 5c
df4:
class colB colC
0 -1 1b 1c
1 -1 2b 2c
2 1 3b 3c
3 -1 4b 4c
4 -1 5b 5c
and so on. The unique values in the class column are consecutive integers ranging from 1 to 120.
The problem with the previous answer (np.identity) was that it created dataframes that treated every single row as its own 1 or -1 entry, instead of assigning identical values to the same class.
Thanks.
Recommended answer
A similar idea using np.where and unique (again renaming your class column to class_, so it doesn't clash with Python's reserved word class):
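import numpy as np

# Assumes the 'class' column of df1 has already been renamed to 'class_'.
# Build one dataframe per unique class value: 1 where the row matches, -1 otherwise.
dfs = [
    df1.assign(class_=np.where(df1['class_'].eq(i), 1, -1))
    for i in df1['class_'].unique()
]
for d in dfs:
    print(d, end='\n\n')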
class_ colB colC
0 1 1b 1c
1 -1 2b 2c
2 -1 3b 3c
3 1 4b 4c
4 -1 5b 5c
class_ colB colC
0 -1 1b 1c
1 1 2b 2c
2 -1 3b 3c
3 -1 4b 4c
4 1 5b 5c
class_ colB colC
0 -1 1b 1c
1 -1 2b 2c
2 1 3b 3c
3 -1 4b 4c
4 -1 5b 5c
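Since the question asks for a function, the same idea could be wrapped up roughly as below. This is only a sketch: the function name one_vs_all_frames, its col parameter, and the choice to return a dict keyed by class value are illustrative assumptions, not part of the original answer. The sample df1 is rebuilt from the table in the question, with the column already named class_.

```python
import numpy as np
import pandas as pd


def one_vs_all_frames(df, col="class_"):
    """Return {class value: copy of df with `col` recoded to 1 / -1}.

    Hypothetical helper for illustration; the input dataframe is not modified.
    """
    return {
        # 1 where the row belongs to `value`, -1 for every other class
        value: df.assign(**{col: np.where(df[col].eq(value), 1, -1)})
        for value in df[col].unique().tolist()
    }


# Sample data matching df1 from the question (column renamed to class_).
df1 = pd.DataFrame({
    "class_": [1, 2, 3, 1, 2],
    "colB": ["1b", "2b", "3b", "4b", "5b"],
    "colC": ["1c", "2c", "3c", "4c", "5c"],
})

frames = one_vs_all_frames(df1)
print(frames[2])  # the dataframe in which class 2 is the positive class
```

Keying the result by class value makes it easy to look up the dataframe for a given class later on. With 120 classes, a list or dict of dataframes is also likely easier to manage than separate variables such as df2, df3 and so on.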