How to split a whole dataset into 4 ranges based on one column using Python
Question
I have a dataset with about 7k records (a telecom dataset).
I want to split that dataset into 4 ranges based on one particular column, the "tenure" column, which contains numbers from 1 to 72.
I need to split the whole dataset on this tenure column like this:
1 to 18 [dataset 1], 19 to 36 [dataset 2], 37 to 54 [dataset 3], 55 to 72 [dataset 4]
My sample dataset, shown with head(5):
out.head(5)
Out[51]:
customerID Date gender age region SeniorCitizen Partner \
0 9796-BPKIW 1/2/2008 1 57 1 1 0
1 4298-OYIFC 1/4/2008 1 50 2 0 1
2 9606-PBKBQ 1/6/2008 1 85 0 1 1
3 1704-NRWYE 1/9/2008 0 55 0 1 0
4 9758-MFWGD 1/6/2008 0 52 1 1 1
Dependents tenure PhoneService ... DeviceProtection TechSupport \
0 0 8 1 ... 0 0
1 0 15 1 ... 1 1
2 0 32 1 ... 0 0
3 0 9 1 ... 0 0
4 1 48 0 ... 0 0
StreamingTV StreamingMovies Contract PaperlessBilling PaymentMethod \
0 0 0 0 1 1
1 1 1 0 1 2
2 0 1 0 1 2
3 1 0 0 1 2
4 0 0 1 0 0
MonthlyCharges TotalCharges Churn
0 69.95 562.70 0
1 103.45 1539.80 0
2 85.00 2642.05 1
3 80.85 751.65 1
4 29.90 1388.75 0
Recommended answer
This is easy to do with pandas, using pd.cut to assign each row to a tenure bin.
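import pandas as pd

df = pd.read_csv('your_dataset_file.csv', sep=',', header=0)

# Sort by tenure (optional; pd.cut does not need sorted input)
df.sort_values(by=['tenure'], inplace=True)

# Create bin edges: with a maximum tenure of 72 this gives [0, 18, 36, 54, 72].
# Note: this assumes the maximum is divisible by 4; otherwise the number of
# bins will not match the 4 labels below.
step_size = int(df.tenure.max() / 4)
bin_edges = list(range(0, df.tenure.max() + step_size, step_size))
lbls = ['a', 'b', 'c', 'd']

# Intervals are right-closed by default: (0, 18], (18, 36], (36, 54], (54, 72]
df['bin'] = pd.cut(df.tenure, bin_edges, labels=lbls)

# Create separate dataframes from it
df1 = df[df.bin == 'a']
df2 = df[df.bin == 'b']
df3 = df[df.bin == 'c']
df4 = df[df.bin == 'd']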
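
If the ranges are known in advance, as they are in the question (1 to 18, 19 to 36, 37 to 54, 55 to 72), the bin edges can also be written out explicitly instead of being derived from the maximum. The following is only a sketch under that assumption. The small DataFrame is made-up stand-in data, and the names tenure_range and parts are hypothetical, not from the original answer.

```python
import pandas as pd

# Made-up stand-in for the telecom data; only the tenure column matters here.
df = pd.DataFrame({"tenure": [1, 8, 18, 19, 36, 37, 54, 55, 72]})

# Explicit edges for 1-18, 19-36, 37-54, 55-72. pd.cut uses right-closed
# intervals by default, so 18 lands in the first bin and 19 in the second.
bin_edges = [0, 18, 36, 54, 72]
labels = [1, 2, 3, 4]
df["tenure_range"] = pd.cut(df["tenure"], bins=bin_edges, labels=labels)

# One DataFrame per range, keyed by its label (1 to 4).
parts = {label: df[df["tenure_range"] == label] for label in labels}
```

Keeping the pieces in a dictionary avoids four separately named variables, and parts[1] through parts[4] correspond to the four datasets in the question. Rows whose tenure falls outside the edges (for example 0) get a missing value in tenure_range and end up in none of the pieces.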