How to slice a dataframe by selecting a range of columns and rows based on names and not indexes?


Problem description

This is a follow-up to a question I asked here. There I learned (a) how to do this for columns (see below) and (b) that R handles the selection of rows and columns quite differently, which means I cannot use the same approach for rows.

So suppose I have a pandas dataframe like this:

import pandas as pd
import numpy as np

df = pd.DataFrame(np.random.randint(10, size=(6, 6)),
                  columns=['c' + str(i) for i in range(6)],
                  index=["r" + str(i) for i in range(6)])

    c0  c1  c2  c3  c4  c5
r0   4   2   3   9   9   0
r1   9   0   8   1   7   5
r2   2   6   7   5   4   7
r3   6   9   9   1   3   4
r4   1   1   1   3   0   3
r5   0   8   5   8   2   9

then I can easily select rows and columns by their names like this:

print(df.loc['r3':'r5', 'c1':'c4'])

which returns

    c1  c2  c3  c4
r3   9   9   1   3
r4   1   1   3   0
r5   8   5   8   2
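
As a minimal sketch of why selecting by name is attractive here: a `.loc` label slice includes both endpoints and keeps returning the same block after unrelated rows or columns are dropped. The fixed values below replace the random ones from the question so the result is reproducible, and the names `by_label`, `trimmed` and `by_label_after_drop` are illustrative only.

```python
import pandas as pd

# Fixed values instead of np.random.randint so the result is reproducible;
# these are the numbers shown in the table above.
df = pd.DataFrame(
    [[4, 2, 3, 9, 9, 0],
     [9, 0, 8, 1, 7, 5],
     [2, 6, 7, 5, 4, 7],
     [6, 9, 9, 1, 3, 4],
     [1, 1, 1, 3, 0, 3],
     [0, 8, 5, 8, 2, 9]],
    columns=['c' + str(i) for i in range(6)],
    index=['r' + str(i) for i in range(6)],
)

# Label slices with .loc include BOTH endpoints ('r5' and 'c4' are kept).
by_label = df.loc['r3':'r5', 'c1':'c4']

# After dropping a row and a column, the same label slice still selects
# the same block, whereas fixed positions would now point at other data.
trimmed = df.drop(index='r0', columns='c0')
by_label_after_drop = trimmed.loc['r3':'r5', 'c1':'c4']

print(by_label)
print(by_label.equals(by_label_after_drop))
```

This is the behaviour the question asks to reproduce in R.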

How would I do this in R? Given a dataframe like this

df <- data.frame(c1=1:6, c2=2:7, c3=3:8, c4=4:9, c5=5:10, c6=6:11)
rownames(df) <- c('r1', 'r2', 'r3', 'r4', 'r5', 'r6')

   c1 c2 c3 c4 c5 c6
r1  1  2  3  4  5  6
r2  2  3  4  5  6  7
r3  3  4  5  6  7  8
r4  4  5  6  7  8  9
r5  5  6  7  8  9 10
r6  6  7  8  9 10 11

Of course, if I know the indexes of my desired rows/columns, I can simply do:

df[3:5, 1:4]

but I might delete rows or columns during my analysis, so I would rather select by name than by index. From the link above I learned that for columns the following would work:

subset(df, select=c1:c4)

which returns

   c1 c2 c3 c4
r1  1  2  3  4
r2  2  3  4  5
r3  3  4  5  6
r4  4  5  6  7
r5  5  6  7  8
r6  6  7  8  9

but how could I also select a range of rows by name at the same time?

In this particular case I could of course use grep, but how about columns that have arbitrary names?

And I don't want to use

df[c('r3', 'r4', 'r5'), c('c1', 'c2', 'c3', 'c4')]

but an actual slice.

Solution

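You can use which() with rownames. Each which(rownames(df) == name) call returns the position of that row, so the two calls give the start and end of an ordinary positional range, and subset() then picks the columns by name: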

subset(df[which(rownames(df)=='r3'):which(rownames(df)=='r5'),], select=c1:c4)


   c1 c2 c3 c4
r3  3  4  5  6
r4  4  5  6  7
r5  5  6  7  8
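
For readers coming from the pandas side of the question, the same name-to-position idea might be sketched as follows. This is only an illustrative parallel, not part of the original answer: `Index.get_loc` stands in for `which(rownames(df) == ...)`, the data frame is rebuilt with the values from the R example, and the variable names are made up for this sketch.

```python
import pandas as pd

# Same data as the R data frame in the question: c1 = 1:6, ..., c6 = 6:11.
df = pd.DataFrame(
    {'c%d' % (j + 1): list(range(j + 1, j + 7)) for j in range(6)},
    index=['r%d' % (i + 1) for i in range(6)],
)

# Index.get_loc turns a (unique) label into its integer position,
# much like which(rownames(df) == 'r3') does in R.
first = df.index.get_loc('r3')
last = df.index.get_loc('r5')
col_first = df.columns.get_loc('c1')
col_last = df.columns.get_loc('c4')

# Positional slices exclude the end, so add 1 to keep the last label.
result = df.iloc[first:last + 1, col_first:col_last + 1]
print(result)
```

Because the positions are looked up at the time of slicing, the selection follows the names even if earlier rows or columns have been removed.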
