Creating dummy variables in pandas for Python

Question

I'm trying to create a set of dummy variables from a categorical variable using pandas in Python. I came across the get_dummies function, but whenever I try to call it I get an error saying the name is not defined.

Any thoughts, or other ways to create the dummy variables, would be appreciated.

EDIT: Since others seem to be running into this, note that the get_dummies function in pandas now works fine. This means the following should work:

import pandas as pd

dummies = pd.get_dummies(df['Category'])

See http://blog.yhathq.com/posts/logistic-regression-and-python.html for more information.
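As a minimal, self-contained sketch of the call above: the sample data here is made up for illustration, and only the 'Category' column name comes from the question. Note that get_dummies is reached through the pandas module (pd.get_dummies); calling it as a bare name without importing it is one plausible cause of a "name is not defined" error, though the question does not confirm that.

```python
import pandas as pd

# Hypothetical sample data; only the 'Category' column name is from the question.
df = pd.DataFrame({"Category": ["a", "b", "a", "c"]})

# One indicator column per distinct value in 'Category'.
dummies = pd.get_dummies(df["Category"])

# Attach the indicator columns next to the original column.
df_with_dummies = pd.concat([df, dummies], axis=1)
print(df_with_dummies)
```

The indicator columns are boolean or small integers depending on the pandas version; call .astype(int) on them if 0/1 values are needed.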

Recommended answer

It's hard to infer exactly what you're looking for from the question, but my best guess is as follows.

Assume you have a DataFrame with a column named 'Category' that contains integers (or other unique identifiers) for the categories. Then we can do the following.

Call the DataFrame dfrm, and assume that for each row, dfrm['Category'] is an integer from 1 to N. Then:

for elem in dfrm['Category'].unique():
    dfrm[str(elem)] = dfrm['Category'] == elem

Now there is a new indicator column for each category, holding True or False depending on whether the data in that row belong to that category.
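To make the loop concrete, here is a small runnable sketch. The five-row DataFrame is invented for illustration, and the final conversion to 0/1 integers is an optional extra that the answer itself does not include.

```python
import pandas as pd

# Hypothetical data: 'Category' holds integer codes 1..3.
dfrm = pd.DataFrame({"Category": [1, 2, 3, 2, 1]})

# Add one True/False indicator column per distinct category value.
for elem in dfrm["Category"].unique():
    dfrm[str(elem)] = dfrm["Category"] == elem

# Optional: a 0/1 integer copy of the indicator columns.
indicator_cols = [c for c in dfrm.columns if c != "Category"]
as_int = dfrm[indicator_cols].astype(int)
print(as_int)
```

The new columns appear in the order in which the category values first occur in the data, because unique() preserves order of appearance.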

If you want to control the category names, you could make a dictionary that maps each category value to a name, such as

cat_names = {1:'Some_Treatment', 2:'Full_Treatment', 3:'Control'}
for elem in dfrm['Category'].unique():
    dfrm[cat_names[elem]] = dfrm['Category'] == elem

This yields columns with the names you specify, rather than a plain string conversion of the category values. For some types, str() may not produce anything useful as a column name.
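The same naming idea can be combined with get_dummies by mapping the codes to labels first. This is an alternative sketch rather than the answer's own method. The sample data is made up, and cat_names is the dictionary from the answer.

```python
import pandas as pd

# Hypothetical data: 'Category' holds integer codes 1..3.
dfrm = pd.DataFrame({"Category": [1, 2, 3, 2, 1]})
cat_names = {1: "Some_Treatment", 2: "Full_Treatment", 3: "Control"}

# Replace each code with its label; codes missing from cat_names become NaN.
labels = dfrm["Category"].map(cat_names)

# Build one indicator column per label (NaN rows get no indicator set).
dummies = pd.get_dummies(labels)
print(dummies)
```

One difference from the explicit loop is that a code missing from cat_names does not raise a KeyError here. It becomes NaN and, with the default settings of get_dummies, that row simply has no indicator set.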
