Mapping rows of a Pandas dataframe to a numpy array

Problem description

Sorry, I know there are many questions about indexing, and the answer is probably staring me in the face, but I'm having a little trouble with this. I am familiar with .loc, .iloc, and .index, and with slicing in general. The method .reset_index may not have been (and may not be able to be) called on our dataframe, so the index labels may not be in order. The numpy array(s) are actually subsets of the dataframe with a different length, but for this example I'll keep them the same size (I can handle the offset once I have an example).

Here is a picture that shows what I'm looking for (image not included here):

I can pull selected columns of the matching rows from the dataframe based on some search criteria.

idxlbls = df.index[df['timestamp'] == dt]
stuff = df.loc[idxlbls, 'col3':'col5']

But how do I map that to row numbers (array indices, not label indices) that can be used as an array index in numpy (assuming the same row length)?

stuffprime = array[?, ?]

I need this because the dataframe is much larger and more complete and contains the columns used as search criteria, whereas the numpy arrays are subsets that were extracted and modified earlier in the pipeline (and do not contain those search columns). I need to search the dataframe and pull the equivalent data from the numpy arrays. Basically, I need to correlate specific rows of a dataframe with the corresponding rows of a numpy array.

Recommended answer

I believe you need get_indexer to get positions from the filtered column names. For the index you can use the same approach, or numpy.where to get positions from a boolean mask:

import numpy as np
import pandas as pd

df = pd.DataFrame({'timestamp':list('abadef'),
                   'B':[4,5,4,5,5,4],
                   'C':[7,8,9,4,2,3],
                   'D':[1,3,5,7,1,0],
                   'E':[5,3,6,9,2,4]}, index=list('ABCDEF'))

print (df)
  timestamp  B  C  D  E
A         a  4  7  1  5
B         b  5  8  3  3
C         a  4  9  5  6
D         d  5  4  7  9
E         e  5  2  1  2
F         f  4  3  0  4

idxlbls = df.index[df['timestamp'] == 'a']
stuff = df.loc[idxlbls, 'C':'E']
print (stuff)
   C  D  E
A  7  1  5
C  9  5  6

a = df.index.get_indexer(stuff.index)

Or get the positions from a boolean mask:

a = np.where(df['timestamp'] == 'a')[0]

print (a)
[0 2]
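The boolean-mask route depends only on row order, not on index labels, which may matter when .reset_index has not been called. Below is a minimal sketch of that point, assuming a hypothetical frame whose labels are unordered and contain a duplicate (a case in which get_indexer cannot be used, since it requires unique labels). The names mask, positions, and arr are illustrative and not from the original post.

```python
import numpy as np
import pandas as pd

# Hypothetical frame: index labels are out of order and contain a duplicate (7).
df = pd.DataFrame({'timestamp': list('abadef'),
                   'C': [7, 8, 9, 4, 2, 3]},
                  index=[10, 3, 7, 7, 1, 5])

# Assumption: the array holds the same rows, in the same order, as df.
arr = df[['C']].to_numpy()

# The mask is positional, so the labels play no role here.
mask = (df['timestamp'] == 'a').to_numpy()
positions = np.flatnonzero(mask)   # same result as np.where(mask)[0]

rows_by_mask = arr[mask]           # boolean indexing
rows_by_pos = arr[positions]       # integer indexing

print(positions)
print(rows_by_pos)
```

Both forms select the same rows of the array; the integer positions are the more convenient form if a row offset has to be subtracted later.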


b = df.columns.get_indexer(stuff.columns)
print (b)
[2 3 4]
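To fill in the array[?, ?] from the question, the row positions a and column positions b can be combined with np.ix_. The sketch below assumes the array has exactly the same rows and columns, in the same order, as the dataframe (here it is simply built with df.to_numpy() for illustration). In the real pipeline the array is a subset, so the positions would need to be shifted by the appropriate offsets.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'timestamp': list('abadef'),
                   'B': [4, 5, 4, 5, 5, 4],
                   'C': [7, 8, 9, 4, 2, 3],
                   'D': [1, 3, 5, 7, 1, 0],
                   'E': [5, 3, 6, 9, 2, 4]}, index=list('ABCDEF'))

# Assumption for illustration: the array mirrors df row for row, column for column.
array = df.to_numpy()

stuff = df.loc[df['timestamp'] == 'a', 'C':'E']

a = df.index.get_indexer(stuff.index)      # row positions
b = df.columns.get_indexer(stuff.columns)  # column positions

# np.ix_ builds an open mesh so that every row in a is paired with every column in b.
stuffprime = array[np.ix_(a, b)]
print(stuffprime)
```

Plain array[a, b] would not work here, because numpy would try to pair the two position arrays element by element and their lengths differ. Note also that get_indexer returns -1 for a label it cannot find, so it may be worth checking for negative positions before indexing.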
