How to map numeric data into categories / bins in Pandas dataframe

Published: 2020/5/18 18:34:03   Tags: python python-2.7 pandas numpy dataframe

Question

I've just started coding in Python, and my general coding skills are fairly rusty :( so please be a bit patient.

I have a pandas dataframe:

It has around 3m rows. There are 3 kinds of Age_units: Y, D and W, for years, days and weeks. Any individual over 1 year old has an age unit of Y, and the first grouping I want is <2 years old, so all I have to test for in Age_units is Y...

I want to create a new column AgeRange and populate it with the following ranges:

< 2
2-18
18-35
35-65
65+

So I wrote a function:

def agerange(values):
    for i in values:
        if complete.Age_units == 'Y':
            if complete.Age > 1 AND < 18 return '2-18'
            elif complete.Age > 17 AND < 35 return '18-35'
            elif complete.Age > 34 AND < 65 return '35-65'
            elif complete.Age > 64 return '65+'
        else return '< 2'

I thought that if I passed in the dataframe as a whole, I would get back what I needed and could then create the column I wanted, something like this:

agedetails['age_range'] = ageRange(agedetails)

But when I try to run the first block of code to define the function, I get:

  File "<ipython-input-124-cf39c7ce66d9>", line 4
    if complete.Age > 1 AND complete.Age < 18 return '2-18'
                          ^
SyntaxError: invalid syntax

Clearly it is not accepting the AND, but I thought I heard in class that I could use AND like this? I must be mistaken, but then what would be the right way to do this?
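For reference, Python's boolean operator is a lowercase `and`; uppercase `AND` is not a keyword, hence the SyntaxError. Each side of `and` must also be a complete comparison (`age > 1 and age < 18`, not `age > 1 and < 18`), each `if`/`elif` needs a colon before its body, and Python allows chained comparisons such as `18 <= age < 35`. The sketch below rewrites the function for a single person's values; the name `age_range_for` and its two parameters are hypothetical, and treating a `Y` age of 1 as `<2` is an assumption, since the original branches never return a label for that case.

```python
def age_range_for(age, age_units):
    """Hypothetical helper: return the range label for one Age / Age_units pair."""
    # Units D (days) and W (weeks) always mean under 1 year old
    if age_units != 'Y':
        return '<2'
    # Lowercase `and` joins two complete comparisons...
    if age >= 2 and age < 18:
        return '2-18'
    # ...or, more idiomatically, use chained comparisons
    elif 18 <= age < 35:
        return '18-35'
    elif 35 <= age < 65:
        return '35-65'
    elif age >= 65:
        return '65+'
    # Remaining case (assumption): an age in years below 2, i.e. 1 year old
    return '<2'


print(age_range_for(40, 'Y'))  # 35-65
print(age_range_for(3, 'W'))   # <2
```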

So after getting that error, I'm not even sure whether passing in the dataframe will work either; I am guessing it will probably throw an error too. In which case, how would I make that work as well?
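Passing the whole dataframe would indeed fail: a comparison such as `complete.Age_units == 'Y'` produces a boolean Series, not a single True/False, so using it in an `if` raises a ValueError ("The truth value of a Series is ambiguous"). One simple way to keep the row-by-row idea is `DataFrame.apply` with `axis=1`, which calls a function once per row. A minimal sketch, assuming the question's column names `Age` and `Age_units`, hypothetical sample data, and a hypothetical helper `age_range_for`; it is easy to read but likely slow on ~3m rows compared with the vectorized answer below.

```python
import pandas as pd


def age_range_for(age, age_units):
    """Hypothetical helper: label one person's age (D/W units are always <2)."""
    if age_units != 'Y' or age < 2:
        return '<2'
    elif age < 18:
        return '2-18'
    elif age < 35:
        return '18-35'
    elif age < 65:
        return '35-65'
    return '65+'


# Hypothetical sample data using the question's column names
agedetails = pd.DataFrame({'Age': [99, 53, 17, 5, 1],
                           'Age_units': ['Y', 'Y', 'Y', 'D', 'Y']})

# axis=1 passes each row (as a Series) to the lambda
agedetails['AgeRange'] = agedetails.apply(
    lambda row: age_range_for(row['Age'], row['Age_units']), axis=1)
print(agedetails)
```

A list comprehension over `zip(agedetails['Age'], agedetails['Age_units'])` calling the same helper would do the same job and is usually faster than `apply`.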

I am looking to learn the best method, but part of the best method for me is keeping it simple even if that means doing things in a couple of steps...
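In the spirit of keeping it simple, pandas also has a built-in binning function, `pd.cut`, which can express the five ranges listed above as bin edges plus labels in one call. This is a different technique from the `np.digitize` answer below, swapped in here for comparison. The sketch assumes all ages are already in years and that the ranges are half-open, e.g. [2, 18), which matches the `> 1 ... < 18` style checks in the question's function; the sample ages are hypothetical.

```python
import pandas as pd

# Hypothetical sample ages, all with Age_units == 'Y'
ages = pd.Series([1, 2, 17, 18, 34, 35, 64, 65, 99])

# Bin edges for the ranges above; the last edge is open-ended
edges = [0, 2, 18, 35, 65, float('inf')]
labels = ['<2', '2-18', '18-35', '35-65', '65+']

# right=False: each bin includes its lower edge and excludes its upper edge
age_range = pd.cut(ages, bins=edges, labels=labels, right=False)
print(age_range.tolist())
```

The result is a categorical Series, so the label order is kept, which can be handy for sorting or grouping by AgeRange later.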

Recommended answer

NumPy: `np.digitize`

np.digitize provides a clean solution. The idea is to define your bin boundaries and names, create a dictionary that maps each bin number to its name, then apply np.digitize to your Age column. Finally, use your dictionary to map the bin numbers to category names.

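Note that for boundary cases the lower bound is used for mapping to a bin: an age exactly equal to an edge goes into the bin that starts at that edge, so 18 maps to 18-35, not 2-18.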
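The boundary rule can be checked directly by digitizing ages that sit exactly on each edge. This sketch reuses the answer's `bins` and `names`; the `edge_ages` array is illustrative only. With the default `right=False`, `np.digitize` returns the index `i` such that `bins[i-1] <= x < bins[i]`, and any value at or above the last edge gets `len(bins)`.

```python
import numpy as np

bins = [0, 2, 18, 35, 65]
names = ['<2', '2-18', '18-35', '35-65', '65+']
d = dict(enumerate(names, 1))

# Ages sitting exactly on each boundary
edge_ages = np.array([0, 2, 18, 35, 65])

# Default right=False: bins[i-1] <= x < bins[i]
idx = np.digitize(edge_ages, bins)
labels = [d[i] for i in idx.tolist()]

print(idx.tolist())  # [1, 2, 3, 4, 5]
print(labels)        # ['<2', '2-18', '18-35', '35-65', '65+']
```

Passing `right=True` flips the rule so that each bin includes its upper edge instead.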

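import numpy as np
import pandas as pd

df = pd.DataFrame({'Age': [99, 53, 71, 84, 84],
                   'Age_units': ['Y', 'Y', 'Y', 'Y', 'Y']})

# Lower edge of each bin; ages of 65 and above fall past the last edge
bins = [0, 2, 18, 35, 65]
names = ['<2', '2-18', '18-35', '35-65', '65+']

# np.digitize numbers the bins from 1 for ages >= 0, so number the names from 1
d = dict(enumerate(names, 1))

# Bin each age, then translate each bin number into its label
df['AgeRange'] = np.vectorize(d.get)(np.digitize(df['Age'], bins))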

Result

   Age Age_units AgeRange
0   99         Y      65+
1   53         Y    35-65
2   71         Y      65+
3   84         Y      65+
4   84         Y      65+
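The example above only contains `Y` ages. Since the question notes that any `D` or `W` age is under a year, one way to cover all three units is to bin every Age as if it were in years and then overwrite the non-`Y` rows with `<2`. A minimal sketch under that assumption, with hypothetical sample data; it uses `Series.map` with the same dictionary instead of `np.vectorize(d.get)`, which should give the same labels.

```python
import numpy as np
import pandas as pd

# Hypothetical sample mixing units, unlike the all-'Y' example above
df = pd.DataFrame({'Age': [99, 53, 10, 3, 1],
                   'Age_units': ['Y', 'Y', 'D', 'W', 'Y']})

bins = [0, 2, 18, 35, 65]
names = ['<2', '2-18', '18-35', '35-65', '65+']
d = dict(enumerate(names, 1))

# Bin every Age as if it were in years, then map bin numbers to labels
binned = pd.Series(np.digitize(df['Age'], bins), index=df.index).map(d)

# Rows measured in days or weeks are always under 2, whatever their number
df['AgeRange'] = np.where(df['Age_units'] == 'Y', binned, '<2')
print(df)
```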
