Slicing multiple ranges of columns in Pandas, by list of names

Question

I am trying to select multiple columns in a Pandas DataFrame using two different approaches:

1) via column numbers, for example, columns 1-3 and columns 6 onwards.

2) via a list of column names, for instance:

years = list(range(2000, 2017))
months = list(range(1, 13))
# Start with the non-date columns, then append one "year-month" label per month
years_month = ["A", "B", "C"]
for y in years:
    for m in months:
        y_m = str(y) + "-" + str(m)
        years_month.append(y_m)

Then years_month would begin as follows (the full list continues through '2016-12'):

['A',
 'B',
 'C',
 '2000-1',
 '2000-2',
 '2000-3',
 '2000-4',
 '2000-5',
 '2000-6',
 '2000-7',
 '2000-8',
 '2000-9',
 '2000-10',
 '2000-11',
 '2000-12',
 '2001-1',
 '2001-2',
 '2001-3',
 '2001-4',
 '2001-5',
 '2001-6',
 '2001-7',
 '2001-8',
 '2001-9',
 '2001-10',
 '2001-11',
 '2001-12',
 ...]

That said, what is the best (or correct) way to load only the columns whose names are in the list years_month, using each of the two approaches?

Recommended answer

I think you need numpy.r_ to concatenate the column positions, then use iloc for selecting. Positions are zero-based and the end of each slice is exclusive, so 1:3 selects positions 1 and 2:

print (df.iloc[:, np.r_[1:3, 6:len(df.columns)]])

For the second approach, subset by the list:

print (df[years_month])
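
Selecting with a plain list requires every name in the list to exist as a column; in recent pandas versions a missing name raises a KeyError. The following is a minimal sketch of two ways to keep only the names that are actually present. The small frame and the list `wanted` are made up for illustration and are not the asker's data.

```python
import pandas as pd

# Illustrative frame; 'D' is a column we do not want
df = pd.DataFrame({'2000-1': [1, 3, 5],
                   '2000-2': [5, 3, 6],
                   'A': [1, 2, 3],
                   'D': [0, 0, 0]})

# Names we would like to keep; 'B' and '2000-3' are absent from df
wanted = ['A', 'B', '2000-1', '2000-2', '2000-3']

# Boolean mask over the columns: keeps the frame's own column order
by_mask = df.loc[:, df.columns.isin(wanted)]

# List comprehension: keeps the order given in `wanted`
present = [c for c in wanted if c in df.columns]
by_list = df[present]

print(by_mask.columns.tolist())  # ['2000-1', '2000-2', 'A']
print(by_list.columns.tolist())  # ['A', '2000-1', '2000-2']
```

Which variant fits depends on whether the output should follow the frame's column order or the order of the list.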

Sample:

import numpy as np
import pandas as pd

df = pd.DataFrame({'2000-1':[1,3,5],
                   '2000-2':[5,3,6],
                   '2000-3':[7,8,9],
                   '2000-4':[1,3,5],
                   '2000-5':[5,3,6],
                   '2000-6':[7,8,9],
                   '2000-7':[1,3,5],
                   '2000-8':[5,3,6],
                   '2000-9':[7,4,3],
                   'A':[1,2,3],
                   'B':[4,5,6],
                   'C':[7,8,9]})

print (df)
   2000-1  2000-2  2000-3  2000-4  2000-5  2000-6  2000-7  2000-8  2000-9  A  \
0       1       5       7       1       5       7       1       5       7  1   
1       3       3       8       3       3       8       3       3       4  2   
2       5       6       9       5       6       9       5       6       3  3   

   B  C  
0  4  7  
1  5  8  
2  6  9  

print (df.iloc[:, np.r_[1:3, 6:len(df.columns)]])
   2000-2  2000-3  2000-7  2000-8  2000-9  A  B  C
0       5       7       1       5       7  1  4  7
1       3       8       3       3       4  2  5  8
2       6       9       5       6       3  3  6  9
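
If the ranges are known by column name rather than by position, one possible approach is to translate names into positions first and then reuse the same np.r_ and iloc pattern. This is only a sketch: `name_range` is a hypothetical helper written for this example, and it assumes the column labels are unique so that `Index.get_loc` returns a single integer.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'2000-1': [1, 3, 5],
                   '2000-2': [5, 3, 6],
                   '2000-3': [7, 8, 9],
                   '2000-4': [1, 3, 5],
                   'A': [1, 2, 3],
                   'B': [4, 5, 6]})

def name_range(columns, first, last):
    """Hypothetical helper: positions from `first` through `last`, inclusive.

    Assumes unique column labels, so get_loc returns an integer.
    """
    return np.arange(columns.get_loc(first), columns.get_loc(last) + 1)

# Concatenate the two position ranges, then select by position
pos = np.r_[name_range(df.columns, '2000-2', '2000-3'),
            name_range(df.columns, 'A', 'B')]
subset = df.iloc[:, pos]

print(pos.tolist())             # [1, 2, 4, 5]
print(subset.columns.tolist())  # ['2000-2', '2000-3', 'A', 'B']
```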


You can also concatenate the ranges with + (in Python 3, each range must first be cast to a list):

rng = list(range(1,3)) + list(range(6, len(df.columns)))
print (rng)
[1, 2, 6, 7, 8, 9, 10, 11]

print (df.iloc[:, rng])
   2000-2  2000-3  2000-7  2000-8  2000-9  A  B  C
0       5       7       1       5       7  1  4  7
1       3       8       3       3       4  2  5  8
2       6       9       5       6       3  3  6  9
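
Another option, not part of the original answer, is to slice by label with loc and join the pieces with pd.concat. The sketch below assumes unique column labels that are all present in the frame. Unlike positional slices, label slices include both endpoints.

```python
import pandas as pd

df = pd.DataFrame({'2000-1': [1, 3, 5],
                   '2000-2': [5, 3, 6],
                   '2000-3': [7, 8, 9],
                   '2000-4': [1, 3, 5],
                   'A': [1, 2, 3],
                   'B': [4, 5, 6]})

# Label-based slices: both the start and the end label are included
left = df.loc[:, '2000-2':'2000-3']
right = df.loc[:, 'A':]

# Join the two column blocks side by side
subset = pd.concat([left, right], axis=1)

print(subset.columns.tolist())  # ['2000-2', '2000-3', 'A', 'B']
```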
