Creating dummy variables in pandas for python
Question
I'm trying to create a series of dummy variables from a categorical variable using pandas in python. I've come across the get_dummies function, but whenever I try to call it I receive an error that the name is not defined.
Any thoughts or other ways to create the dummy variables would be appreciated.
EDIT: Since others seem to be coming across this, the get_dummies function in pandas now works perfectly fine. This means the following should work:
import pandas as pd
dummies = pd.get_dummies(df['Category'])
See http://blog.yhathq.com/posts/logistic-regression-and-python.html for more information.
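For readers who want to try the call above end to end, here is a minimal sketch. The toy data, the prefix argument, and the variable name df_with_dummies are assumptions made for illustration and are not part of the original question. One possible cause of a "name is not defined" error is calling get_dummies without the pd. prefix, since the function lives in the pandas namespace.

```python
import pandas as pd

# Hypothetical toy data; only the column name 'Category' comes from the question.
df = pd.DataFrame({'Category': [1, 2, 3, 2, 1]})

# get_dummies is accessed through the pandas namespace (pd.get_dummies).
# dtype=int requests 0/1 columns regardless of the pandas default.
dummies = pd.get_dummies(df['Category'], prefix='Category', dtype=int)

# Attach the indicator columns to the original frame.
df_with_dummies = pd.concat([df, dummies], axis=1)
print(df_with_dummies)
```

Each distinct value in the column gets its own indicator column, named from the prefix and the value.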
Recommended answer
It's hard to infer what you're looking for from the question, but my best guess is as follows.
If we assume you have a DataFrame where some column is 'Category' and contains integers (or otherwise unique identifiers) for categories, then we can do the following.
Call the DataFrame dfrm, and assume that for each row, dfrm['Category'] is some value in the set of integers from 1 to N. Then,
for elem in dfrm['Category'].unique():
    dfrm[str(elem)] = dfrm['Category'] == elem
Now there will be a new indicator column for each category that is True/False depending on whether the data in that row are in that category.
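As a quick check of the loop approach, the sketch below runs it on a small made-up DataFrame. The sample values and the final conversion to 0/1 integers are assumptions added for illustration.

```python
import pandas as pd

# Hypothetical sample data with categories encoded as integers 1..3.
dfrm = pd.DataFrame({'Category': [1, 2, 3, 2, 1]})

# One boolean indicator column per distinct category value.
for elem in dfrm['Category'].unique():
    dfrm[str(elem)] = dfrm['Category'] == elem

# Optional: convert the True/False indicators to 1/0 for modelling code
# that expects numeric input.
indicator_cols = [str(elem) for elem in dfrm['Category'].unique()]
dfrm_numeric = dfrm.copy()
dfrm_numeric[indicator_cols] = dfrm_numeric[indicator_cols].astype(int)
print(dfrm_numeric)
```

The boolean columns are often usable as they are. The integer conversion is only a convenience.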
If you want to control the category names, you could make a dictionary, such as
cat_names = {1: 'Some_Treatment', 2: 'Full_Treatment', 3: 'Control'}
for elem in dfrm['Category'].unique():
    dfrm[cat_names[elem]] = dfrm['Category'] == elem
This results in columns with the specified names, rather than just the string conversion of the category values. In fact, for some types, str() may not produce anything useful for you.
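The same naming idea can also be combined with get_dummies by mapping the codes to labels first. This is an alternative sketch, not the method described in the answer above; the sample data and the variable names labels and named_dummies are assumptions.

```python
import pandas as pd

# Hypothetical sample data; cat_names is the dictionary from the answer.
dfrm = pd.DataFrame({'Category': [1, 2, 3, 2, 1]})
cat_names = {1: 'Some_Treatment', 2: 'Full_Treatment', 3: 'Control'}

# Translate the integer codes to readable labels, then build the indicators.
# Codes missing from cat_names become NaN and get all-zero indicator rows.
labels = dfrm['Category'].map(cat_names)
named_dummies = pd.get_dummies(labels, dtype=int)

dfrm = pd.concat([dfrm, named_dummies], axis=1)
print(dfrm)
```

With this route the column names come straight from the dictionary values, so no str() conversion of the raw category values is involved.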