How to split the whole dataset into 4 ranges based on one column using Python


Problem description

I have a telecom dataset with 7k records.

I want to split that dataset into 4 ranges based on one particular column, "tenure", which contains values from 1 to 72.

I need to split the whole dataset on the tenure column like this:

1 to 18 [dataset 1], 19 to 36 [dataset 2], 37 to 54 [dataset 3], 55 to 72 [dataset 4]

My sample dataset with head(5):

out.head(5)
Out[51]: 
   customerID      Date  gender  age  region  SeniorCitizen  Partner  \
0  9796-BPKIW  1/2/2008       1   57       1              1        0   
1  4298-OYIFC  1/4/2008       1   50       2              0        1   
2  9606-PBKBQ  1/6/2008       1   85       0              1        1   
3  1704-NRWYE  1/9/2008       0   55       0              1        0   
4  9758-MFWGD  1/6/2008       0   52       1              1        1   

   Dependents  tenure  PhoneService  ...    DeviceProtection  TechSupport  \
0           0       8             1  ...                   0            0   
1           0      15             1  ...                   1            1   
2           0      32             1  ...                   0            0   
3           0       9             1  ...                   0            0   
4           1      48             0  ...                   0            0   

   StreamingTV  StreamingMovies  Contract  PaperlessBilling  PaymentMethod  \
0            0                0         0                 1              1   
1            1                1         0                 1              2   
2            0                1         0                 1              2   
3            1                0         0                 1              2   
4            0                0         1                 0              0   

   MonthlyCharges  TotalCharges  Churn  
0           69.95        562.70      0  
1          103.45       1539.80      0  
2           85.00       2642.05      1  
3           80.85        751.65      1  
4           29.90       1388.75      0  


Recommended answer

You can do this easily with pandas.

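import pandas as pd

df = pd.read_csv('your_dataset_file.csv', sep=',', header=0)
# Sort by tenure (optional: pd.cut does not need sorted input)
df.sort_values(by=['tenure'], inplace=True)
# Create bin edges. With a maximum tenure of 72 this gives [0, 18, 36, 54, 72].
# Note: this assumes the maximum is divisible by 4; otherwise range() yields
# a fifth bin and the four labels below no longer match.
step_size = int(df['tenure'].max() / 4)
bin_edges = list(range(0, df['tenure'].max() + step_size, step_size))
lbls = ['a', 'b', 'c', 'd']
# Intervals are right-closed by default: (0, 18], (18, 36], (36, 54], (54, 72]
df['bin'] = pd.cut(df['tenure'], bins=bin_edges, labels=lbls)
# Create a separate DataFrame for each range
df1 = df[df['bin'] == 'a']
df2 = df[df['bin'] == 'b']
df3 = df[df['bin'] == 'c']
df4 = df[df['bin'] == 'd']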
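
As a self-contained sketch of the same idea, the snippet below writes the bin edges out explicitly and collects the four pieces in a dictionary instead of four separate variables. The small DataFrame is a hypothetical stand-in for the telecom data, and the names `tenure_range` and `parts` are illustrative only; none of them come from the original question.

```python
import pandas as pd

# Hypothetical stand-in for the telecom data: one row per tenure value 1..72.
df = pd.DataFrame({
    "customerID": [f"id-{i}" for i in range(1, 73)],
    "tenure": list(range(1, 73)),
})

# Explicit edges for the ranges 1-18, 19-36, 37-54 and 55-72.
# pd.cut builds right-closed intervals: (0, 18], (18, 36], (36, 54], (54, 72].
bin_edges = [0, 18, 36, 54, 72]
labels = ["1-18", "19-36", "37-54", "55-72"]
df["tenure_range"] = pd.cut(df["tenure"], bins=bin_edges, labels=labels)

# Build one DataFrame per range, keyed by its label.
parts = {
    label: group.drop(columns="tenure_range")
    for label, group in df.groupby("tenure_range", observed=True)
}

for label, part in parts.items():
    print(label, len(part), part["tenure"].min(), part["tenure"].max())
```

With these edges a tenure of 0 would fall outside the first interval and be labelled NaN. If the real data contains zeros, passing `include_lowest=True` to `pd.cut` is one way to keep them in the first range.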
