Python Pandas - Changing some column types to categories

Problem description

I have read the following CSV file into an IPython Notebook:

public = pd.read_csv("categories.csv")
public

I've also imported pandas as pd, numpy as np and matplotlib.pyplot as plt. The following data types are present (this is a summary; there are about 100 columns):

In [36]:   public.dtypes
Out[36]:   parks          object
           playgrounds    object
           sports         object
           roading        object
           resident       int64
           children       int64

I want to change 'parks', 'playgrounds', 'sports' and 'roading' to categories, leaving the remainder as int64. These columns hold Likert scale responses, though each column uses a different scale (e.g. one has "strongly agree", "agree" etc., another has "very important", "important" etc.).

I was able to create a separate dataframe - public1 - and change one of the columns to a category type using the following code:

public1 = {'parks': public.parks}
public1 = public1['parks'].astype('category')

However, when I tried to change several columns at once using this code, I was unsuccessful:

public1 = {'parks': public.parks,
           'playgrounds': public.parks}
public1 = public1['parks', 'playgrounds'].astype('category')

Notwithstanding this, I don't want to create a separate dataframe with just the categories columns. I would like them changed in the original dataframe.

I tried numerous ways to achieve this, then tried the code here: Pandas: change data type of columns...

public[['parks', 'playgrounds', 'sports', 'roading']] = public[['parks', 'playgrounds', 'sports', 'roading']].astype('category')

and got the following error:

 NotImplementedError: > 1 ndim Categorical are not supported at this time

Is there a way to change 'parks', 'playgrounds', 'sports' and 'roading' to categories (so the Likert scale responses can then be analysed), leaving 'resident' and 'children' (and the 94 other columns, which are strings, ints and floats) untouched? Or is there a better way to do this? If anyone has any suggestions and/or feedback I would be most grateful... I am slowly going bald ripping my hair out!

Many thanks in advance.

Edited to add: I am using Python 2.7.

Solution

Sometimes, you just have to use a for-loop:

for col in ['parks', 'playgrounds', 'sports', 'roading']:
    public[col] = public[col].astype('category')
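
As a minimal, self-contained sketch of the loop above: the small DataFrame here is a hypothetical stand-in for categories.csv, and its values are invented for illustration. It also shows one possible alternative, passing a column-to-dtype mapping to astype, which should work on more recent pandas releases but may not on the version that raised the error in the question.

```python
import pandas as pd

# Hypothetical stand-in for categories.csv; the values are invented for illustration.
public = pd.DataFrame({
    "parks": ["agree", "strongly agree", "agree"],
    "playgrounds": ["important", "very important", "important"],
    "resident": [1, 0, 1],
})

likert_cols = ["parks", "playgrounds"]

# Alternative (assumes a reasonably recent pandas): map each column to a dtype.
# This returns a new DataFrame and leaves `public` unchanged.
converted = public.astype({col: "category" for col in likert_cols})

# The loop from the answer: convert one column at a time, in place.
for col in likert_cols:
    public[col] = public[col].astype("category")

# Only the listed columns change; 'resident' keeps its integer dtype.
print(public.dtypes)
```

Either way, columns that are not listed in likert_cols are left untouched, which is what the question asks for.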
