How to select range of columns in a dataframe based on their name and not their indexes?
In a pandas dataframe created like this:
import pandas as pd
import numpy as np
df = pd.DataFrame(np.random.randint(10, size=(6, 6)),
columns=['c' + str(i) for i in range(6)],
index=["r" + str(i) for i in range(6)])
which could look as follows:
c0 c1 c2 c3 c4 c5
r0 2 7 3 3 2 8
r1 6 9 6 7 9 1
r2 4 0 9 8 4 2
r3 9 0 4 3 5 4
r4 7 6 8 8 0 8
r5 0 6 1 8 2 2
I can easily select certain rows and/or a range of columns using .loc:
print(df.loc[['r1', 'r5'], 'c1':'c4'])
That would return:
c1 c2 c3 c4
r1 9 6 7 9
r5 6 1 8 2
So I can select particular rows/columns with a list, and a range of rows/columns with a colon.
How would one do this in R? Here and here, the desired range of columns is always specified by index, but one cannot (or at least I did not find a way to) access the columns by name. To give an example:
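To make the pandas behaviour described above concrete, here is a minimal sketch. It assumes fixed cell values (copied from the sample output above rather than drawn at random) so that the result is reproducible. The variable names by_name, trimmed and by_name_after_drop are only illustrative. It shows that a label slice in .loc includes both endpoints and keeps working after a column is dropped.

```python
import pandas as pd

# Fixed values (taken from the sample output above) instead of random ones,
# so the result is reproducible. This is an assumption made for illustration.
df = pd.DataFrame(
    [[2, 7, 3, 3, 2, 8],
     [6, 9, 6, 7, 9, 1],
     [4, 0, 9, 8, 4, 2],
     [9, 0, 4, 3, 5, 4],
     [7, 6, 8, 8, 0, 8],
     [0, 6, 1, 8, 2, 2]],
    columns=["c" + str(i) for i in range(6)],
    index=["r" + str(i) for i in range(6)],
)

# Rows picked from a list, columns picked by a label slice.
# Label slices in .loc include BOTH endpoints (c1 and c4).
by_name = df.loc[["r1", "r5"], "c1":"c4"]
print(by_name)

# After dropping a column, the same label slice still runs from c1 to c4;
# a positional slice would silently shift instead.
trimmed = df.drop(columns=["c2"])
by_name_after_drop = trimmed.loc[["r1", "r5"], "c1":"c4"]
print(by_name_after_drop)
```

This robustness to dropped columns is what the question is looking for on the R side.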
df <- data.frame(c1=1:6, c2=2:7, c3=3:8, c4=4:9, c5=5:10, c6=6:11)
rownames(df) <- c('r1', 'r2', 'r3', 'r4', 'r5', 'r6')
The command
df[c('r1', 'r5'),'c1':'c4']
does not work and throws an error. The only thing that worked for me is
df[c('r1', 'r5'), 1:4]
which returns
c1 c2 c3 c4
r1 1 2 3 4
r5 5 6 7 8
But how would I select the columns by their name and not by their index (which might be important when I drop certain columns throughout the analysis)? In this particular case I could of course use grep, but how about columns that have arbitrary names?
So I don't want to use
df[c('r1', 'r5'),c('c1','c2', 'c3', 'c4')]
but an actual slice.
EDIT:
A follow-up question can be found here.
It looks like you can accomplish this with subset:
> df <- data.frame(c1=1:6, c2=2:7, c3=3:8, c4=4:9, c5=5:10, c6=6:11)
> rownames(df) <- c('r1', 'r2', 'r3', 'r4', 'r5', 'r6')
> subset(df, select=c1:c4)
c1 c2 c3 c4
r1 1 2 3 4
r2 2 3 4 5
r3 3 4 5 6
r4 4 5 6 7
r5 5 6 7 8
r6 6 7 8 9
> subset(df, select=c1:c2)
c1 c2
r1 1 2
r2 2 3
r3 3 4
r4 4 5
r5 5 6
r6 6 7
If you want to subset by row name range, this hack would do:
> gRI <- function(df, rName) {which(match(rownames(df), rName) == 1)}
> df[gRI(df,"r2"):gRI(df,"r4"),]
c1 c2 c3 c4 c5 c6
r2 2 3 4 5 6 7
r3 3 4 5 6 7 8
r4 4 5 6 7 8 9
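The row-name hack above boils down to looking up the position of each name and then slicing by position. For comparison with the pandas side of the question, the same idea can be sketched in Python. The helper position_of is hypothetical and exists only for this illustration, and the frame is rebuilt here with the values of the R example. In pandas itself a plain label slice, df.loc["r2":"r4"], gives the same rows directly.

```python
import pandas as pd

# Same values as the R example: c1 = 1:6, c2 = 2:7, ..., c6 = 6:11.
df = pd.DataFrame(
    {f"c{j}": range(j, j + 6) for j in range(1, 7)},
    index=[f"r{i}" for i in range(1, 7)],
)


def position_of(index, name):
    """Hypothetical helper: 0-based position of a unique label in an index."""
    return index.get_loc(name)


# Translate the two row names into positions, then slice by position.
start = position_of(df.index, "r2")
stop = position_of(df.index, "r4")
rows = df.iloc[start:stop + 1]  # +1 because positional slices exclude the end
print(rows)
```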