Pandas .loc or .iloc to select columns from a dataset

Question

I have been trying to select a particular set of columns from a dataset, keeping all the rows. I tried something like the following:

train_features = train_df.loc[,[0,4,5,6,7,8,9,10,11,12,13,14,15,16,17,18]]

To be clear, I want every row, but only the columns at the listed positions. Is there a better way to approach this?

Sample data:

age  job        marital   education    default   housing   loan   equities   contact     duration   campaign   pdays   previous   poutcome   emp.var.rate   cons.price.idx   cons.conf.idx   euribor3m     nr.employed   y
56   housemaid  married   basic.4y     1         1         1      1          0           261        1          999     0          2          1.1            93.994           -36.4           3.299552287   5191          1
37   services   married   high.school  1         0         1      1          0           226        1          999     0          2          1.1            93.994           -36.4           0.743751247   5191          1
56   services   married   high.school  1         1         0      1          0           307        1          999     0          2          1.1            93.994           -36.4           1.28265179    5191          1

I'm trying to leave out the job, marital, education and y columns of my dataset. The y column is the target variable.

Recommended answer

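The attempt in the question fails for two reasons. The row selector cannot be left empty in Python, so `[, ...]` is a syntax error; use `:` to mean "all rows". And .loc selects by label, while the list in the question holds integer positions and the columns here have string names.

If you need to select by position, use iloc: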

train_features = train_df.iloc[:, [0, 4, 5, 6, 7, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18]]
print(train_features)
   age  default  housing  loan  equities  contact  duration  campaign  pdays  \
0   56        1        1     1         1        0       261         1    999   
1   37        1        0     1         1        0       226         1    999   
2   56        1        1     0         1        0       307         1    999   

   previous  poutcome  emp.var.rate  cons.price.idx  cons.conf.idx  euribor3m  \
0         0         2           1.1          93.994          -36.4   3.299552   
1         0         2           1.1          93.994          -36.4   0.743751   
2         0         2           1.1          93.994          -36.4   1.282652   

   nr.employed  
0         5191  
1         5191  
2         5191  
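
As a minimal sketch of the positional approach, the snippet below rebuilds the three sample rows from the question and selects the same columns two ways. The `np.r_[0, 4:19]` shorthand and the label-based `.loc` variant are not part of the original answer; they are shown here only as assumed alternatives for building the same selection, and the names `positions`, `by_position` and `by_label` are illustrative.

```python
import numpy as np
import pandas as pd

# Column names and rows copied from the sample data in the question.
columns = ["age", "job", "marital", "education", "default", "housing", "loan",
           "equities", "contact", "duration", "campaign", "pdays", "previous",
           "poutcome", "emp.var.rate", "cons.price.idx", "cons.conf.idx",
           "euribor3m", "nr.employed", "y"]
rows = [
    [56, "housemaid", "married", "basic.4y", 1, 1, 1, 1, 0, 261, 1, 999, 0, 2,
     1.1, 93.994, -36.4, 3.299552287, 5191, 1],
    [37, "services", "married", "high.school", 1, 0, 1, 1, 0, 226, 1, 999, 0, 2,
     1.1, 93.994, -36.4, 0.743751247, 5191, 1],
    [56, "services", "married", "high.school", 1, 1, 0, 1, 0, 307, 1, 999, 0, 2,
     1.1, 93.994, -36.4, 1.28265179, 5191, 1],
]
train_df = pd.DataFrame(rows, columns=columns)

# np.r_ concatenates scalars and slices into one integer array,
# so this is position 0 followed by positions 4 through 18.
positions = np.r_[0, 4:19]
by_position = train_df.iloc[:, positions]

# The same selection by label: .loc takes column names, not positions.
by_label = train_df.loc[:, ["age"] + columns[4:19]]
```

Both results keep all three rows and drop job, marital, education and y. Selecting by label is usually more robust if the column order of the file might change.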

Another solution is to drop the unnecessary columns:

cols = ['job', 'marital', 'education', 'y']
train_features = train_df.drop(cols, axis=1)
print(train_features)
   age  default  housing  loan  equities  contact  duration  campaign  pdays  \
0   56        1        1     1         1        0       261         1    999   
1   37        1        0     1         1        0       226         1    999   
2   56        1        1     0         1        0       307         1    999   

   previous  poutcome  emp.var.rate  cons.price.idx  cons.conf.idx  euribor3m  \
0         0         2           1.1          93.994          -36.4   3.299552   
1         0         2           1.1          93.994          -36.4   0.743751   
2         0         2           1.1          93.994          -36.4   1.282652   

   nr.employed  
0         5191  
1         5191  
2         5191  
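
Since y is the target variable, it may be convenient to keep it in its own Series while dropping it from the features. The sketch below assumes a reduced version of the sample data (only a few of the columns, for brevity); the name `train_target` is hypothetical, and `drop(columns=...)` is used as an equivalent spelling of `drop(cols, axis=1)`.

```python
import pandas as pd

# A reduced stand-in for the question's data: a few columns only.
train_df = pd.DataFrame({
    "age": [56, 37, 56],
    "job": ["housemaid", "services", "services"],
    "marital": ["married", "married", "married"],
    "education": ["basic.4y", "high.school", "high.school"],
    "housing": [1, 0, 1],
    "y": [1, 1, 1],
})

cols = ["job", "marital", "education", "y"]

# drop returns a new DataFrame; train_df itself is left unchanged.
train_features = train_df.drop(columns=cols)

# Keep the target separately (hypothetical name).
train_target = train_df["y"]
```

Because `drop` does not modify `train_df` in place, the y column is still available afterwards for building the target.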
