How to select range of columns in a dataframe based on their name and not their indexes?

Views: 229  Published: 2017/3/26 4:48:09  Tags: r, dataframe, subset


Question

In a pandas dataframe created like this:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(10, size=(6, 6)),
                  columns=['c' + str(i) for i in range(6)],
                  index=["r" + str(i) for i in range(6)])

which could look as follows:

    c0  c1  c2  c3  c4  c5
r0   2   7   3   3   2   8
r1   6   9   6   7   9   1
r2   4   0   9   8   4   2
r3   9   0   4   3   5   4
r4   7   6   8   8   0   8
r5   0   6   1   8   2   2

I can easily select certain rows and/or a range of columns using .loc:

print(df.loc[['r1', 'r5'], 'c1':'c4'])

That would return:

    c1  c2  c3  c4
r1   9   6   7   9
r5   6   1   8   2

So I can select particular rows/columns with a list, and a range of rows/columns with a colon.

How would one do this in R? Here and here, the desired range of columns always has to be specified by index; one cannot (or at least I did not find how to) access the columns by name. To give an example:

df <- data.frame(c1=1:6, c2=2:7, c3=3:8, c4=4:9, c5=5:10, c6=6:11)
rownames(df) <- c('r1', 'r2', 'r3', 'r4', 'r5', 'r6')

The command

df[c('r1', 'r5'), 'c1':'c4']

does not work and throws an error. The only thing that worked for me is

df[c('r1', 'r5'), 1:4]

which returns

   c1 c2 c3 c4
r1  1  2  3  4
r5  5  6  7  8
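A note on why the first command fails, and one possible workaround. In R, `:` only builds numeric sequences, so `'c1':'c4'` tries to coerce the two strings to numbers, gets `NA`, and stops with an error. A minimal base-R sketch is to look up the positions of the two boundary names with `match()` and build the positional range from them. The helper `col_range()` is hypothetical (invented for this illustration, not part of base R) and assumes unique column names.

```r
# Example data frame from the question
df <- data.frame(c1 = 1:6, c2 = 2:7, c3 = 3:8, c4 = 4:9, c5 = 5:10, c6 = 6:11)
rownames(df) <- c("r1", "r2", "r3", "r4", "r5", "r6")

# ':' builds a numeric sequence; "c1" and "c4" coerce to NA, so this errors
err <- tryCatch(suppressWarnings("c1":"c4"), error = function(e) conditionMessage(e))

# Hypothetical helper: names of all columns from `from` to `to`, inclusive,
# taken in their current order in the data frame
col_range <- function(df, from, to) {
  pos <- match(c(from, to), names(df))
  if (anyNA(pos)) {
    stop("unknown column name(s): ", paste(c(from, to)[is.na(pos)], collapse = ", "))
  }
  names(df)[pos[1]:pos[2]]
}

# Rows by name, column range by name
res <- df[c("r1", "r5"), col_range(df, "c1", "c4")]
print(res)
```

Because the positions are looked up at the moment of indexing, the range follows the columns even if their positions change later.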

But how would I select the columns by their name and not by their index (which might be important when I drop certain columns during the analysis)? In this particular case I could of course use grep, but what about columns that have arbitrary names?
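To illustrate the concern about dropped columns: a positional range silently shifts when a column is removed, while a range computed from names keeps pointing at the intended columns. A minimal base-R sketch; dropping `c2` is an arbitrary choice made only for illustration.

```r
df <- data.frame(c1 = 1:6, c2 = 2:7, c3 = 3:8, c4 = 4:9, c5 = 5:10, c6 = 6:11)
rownames(df) <- c("r1", "r2", "r3", "r4", "r5", "r6")

# Drop a column partway through the analysis; later columns shift left
df$c2 <- NULL

# Positional range: still 1:4, but it now covers c1, c3, c4, c5
by_position <- df[, 1:4]

# Name-based range: recomputed from the current names, so it still ends at c4
by_name <- df[, match("c1", names(df)):match("c4", names(df))]

print(names(by_position))
print(names(by_name))
```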

So I don't want to use

df[c('r1', 'r5'), c('c1', 'c2', 'c3', 'c4')]

but an actual slice.

EDIT:

A follow-up question can be found here.

Solution

It looks like you can accomplish this with subset():

> df <- data.frame(c1=1:6, c2=2:7, c3=3:8, c4=4:9, c5=5:10, c6=6:11)
> rownames(df) <- c('r1', 'r2', 'r3', 'r4', 'r5', 'r6')
> subset(df, select=c1:c4)
   c1 c2 c3 c4
r1  1  2  3  4
r2  2  3  4  5
r3  3  4  5  6
r4  4  5  6  7
r5  5  6  7  8
r6  6  7  8  9
> subset(df, select=c1:c2)
   c1 c2
r1  1  2
r2  2  3
r3  3  4
r4  4  5
r5  5  6
r6  6  7
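The row selection from the question can be combined with this, giving a close analogue of the pandas call `df.loc[['r1', 'r5'], 'c1':'c4']`. A minimal sketch, selecting rows by name with `[` first and then the column range with `subset()`; the second call shows that `select` also accepts a negated range. One caveat: R's own help page for `subset()` describes it as a convenience function intended for interactive use and recommends standard `[` indexing when programming, because its arguments are evaluated non-standardly.

```r
df <- data.frame(c1 = 1:6, c2 = 2:7, c3 = 3:8, c4 = 4:9, c5 = 5:10, c6 = 6:11)
rownames(df) <- c("r1", "r2", "r3", "r4", "r5", "r6")

# Rows by name with [, then an inclusive column range by name with subset()
res <- subset(df[c("r1", "r5"), ], select = c1:c4)
print(res)

# A negated range drops c2 through c3 and keeps everything else
res_drop <- subset(df, select = -(c2:c3))
print(names(res_drop))
```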

If you want to subset by a range of row names, this hack would work:

> gRI <- function(df, rName) {which(match(rownames(df), rName) == 1)}
> df[gRI(df, "r2"):gRI(df, "r4"), ]
   c1 c2 c3 c4 c5 c6
r2  2  3  4  5  6  7
r3  3  4  5  6  7  8
r4  4  5  6  7  8  9
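Combining the row and column ideas, a hypothetical helper `loc_range()` (the name is invented here; it is not an R or pandas function) can mimic pandas-style inclusive label slicing on both axes by translating row and column names to positions with `match()`. A minimal sketch, assuming unique row and column names:

```r
df <- data.frame(c1 = 1:6, c2 = 2:7, c3 = 3:8, c4 = 4:9, c5 = 5:10, c6 = 6:11)
rownames(df) <- c("r1", "r2", "r3", "r4", "r5", "r6")

# Hypothetical helper: inclusive label ranges on rows and columns,
# similar in spirit to df.loc['r2':'r4', 'c1':'c4'] in pandas
loc_range <- function(df, row_from, row_to, col_from, col_to) {
  rows <- match(c(row_from, row_to), rownames(df))
  cols <- match(c(col_from, col_to), names(df))
  if (anyNA(c(rows, cols))) stop("row or column name not found")
  # drop = FALSE keeps a data frame even if the range is a single column
  df[rows[1]:rows[2], cols[1]:cols[2], drop = FALSE]
}

res <- loc_range(df, "r2", "r4", "c1", "c4")
print(res)
```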
