Most efficient way to un-dummy variables in Pandas DF
Question

So in the screenshot below, we have 3 different energy sites: ID01, ID18, and ID31. They're stored in a dummy-variable (one-hot) format, and for visualization purposes I just want to create a single column named 'Sites' that I can use. You'll see the loop I quickly wrote to do this, but it seems very inefficient. Any pointers on how to achieve this in the fastest way possible?
Recommended answer

Setup
import pandas as pd

# Three dummy columns (one per site) plus two unrelated columns, A and B
data = pd.DataFrame([
    [1, 0, 0],
    [0, 1, 0],
    [0, 0, 1],
    [1, 0, 0],
    [0, 1, 0]
], columns=['ID01', 'ID18', 'ID31']).assign(A=1, B=2)

data
   ID01  ID18  ID31  A  B
0     1     0     0  1  2
1     0     1     0  1  2
2     0     0     1  1  2
3     1     0     0  1  2
4     0     1     0  1  2
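dot product with strings and objects

This works if these are truly dummy values, 0 or 1. Multiplying a column name by 1 leaves the string unchanged and multiplying it by 0 yields an empty string, so the dot product of each row with the column names concatenates down to the one name whose flag is set.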
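def undummy(d):
    # Row-wise sum of (flag * column name): 1 * 'ID01' + 0 * 'ID18' + ... -> 'ID01'
    return d.dot(d.columns)

# Apply only to the columns whose names start with 'ID'
data.assign(Site=data.filter(regex='^ID').pipe(undummy))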
   ID01  ID18  ID31  A  B  Site
0     1     0     0  1  2  ID01
1     0     1     0  1  2  ID18
2     0     0     1  1  2  ID31
3     1     0     0  1  2  ID01
4     0     1     0  1  2  ID18
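To see why the "truly dummy" caveat matters, here is a minimal sketch of what the dot approach returns when a row is not exactly one-hot. The frame `messy` and the mask `is_one_hot` are hypothetical names invented for this illustration; they are not part of the original answer.

```python
import pandas as pd

# Hypothetical data: row 0 is one-hot, row 1 has two flags set, row 2 has none
messy = pd.DataFrame(
    [[1, 0, 0],
     [1, 1, 0],
     [0, 0, 0]],
    columns=['ID01', 'ID18', 'ID31'],
)

def undummy(d):
    # Same function as in the answer: flag * column name, summed per row
    return d.dot(d.columns)

labels = undummy(messy)
print(labels.tolist())      # ['ID01', 'ID01ID18', '']

# Flag the rows that are exactly one-hot (only 0/1 values, summing to 1)
is_one_hot = messy.isin([0, 1]).all(axis=1) & (messy.sum(axis=1) == 1)
print(is_one_hot.tolist())  # [True, False, False]
```

A row with several flags set comes back as concatenated names and an all-zero row comes back as an empty string, so a check like `is_one_hot` may be worth running before trusting the result.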
argmax slicing

This works, but it can produce unexpected results if the data is not as represented in the question. For example, argmax returns 0 for a row of all zeros, so that row would silently be labeled with the first column.
def undummy(d):
    # argmax(1) gives the position of the first maximum in each row
    return d.columns[d.values.argmax(1)]

data.assign(Site=data.filter(regex='^ID').pipe(undummy))
   ID01  ID18  ID31  A  B  Site
0     1     0     0  1  2  ID01
1     0     1     0  1  2  ID18
2     0     0     1  1  2  ID31
3     1     0     0  1  2  ID01
4     0     1     0  1  2  ID18
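The "unexpected results" mentioned for argmax can be shown with a small sketch. The frame `messy` and the helper `undummy_checked` are hypothetical and not part of the original answer. The helper is one possible guard, under the assumption that any row that is not exactly one-hot should become missing rather than get a guessed label.

```python
import pandas as pd

# Hypothetical data: row 0 is one-hot, row 1 is all zeros, row 2 has two flags set
messy = pd.DataFrame(
    [[0, 1, 0],
     [0, 0, 0],
     [1, 1, 0]],
    columns=['ID01', 'ID18', 'ID31'],
)

def undummy(d):
    # Same function as in the answer
    return d.columns[d.values.argmax(1)]

naive = undummy(messy)
print(list(naive))  # ['ID18', 'ID01', 'ID01'] (rows 1 and 2 are mislabeled)

def undummy_checked(d):
    # Hypothetical guard: keep the argmax label only where the row sums to 1
    vals = d.to_numpy()
    labels = pd.Series(d.columns[vals.argmax(axis=1)], index=d.index)
    return labels.where(vals.sum(axis=1) == 1)

checked = undummy_checked(messy)
print(checked.isna().tolist())  # [False, True, True]

# DataFrame.idxmax(axis=1) is the pandas-level equivalent of the argmax lookup
# and has the same caveat on rows that are not one-hot
via_idxmax = messy.idxmax(axis=1)
```

Whether missing values are the right fallback depends on the data. The point is only that argmax always returns some column, even when no flag or several flags are set.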