Mapping rows of a Pandas dataframe to a numpy array
Question
Sorry, I know there are many questions relating to indexing, and it's probably staring me in the face, but I'm having a little trouble with this. I am familiar with the .loc, .iloc, and .index methods and slicing in general. The method .reset_index may not have been (and may not be able to be) called on our dataframe, and therefore index labels may not be in order. The dataframe and numpy array(s) are actually different-length subsets of the dataframe, but for this example I'll keep them the same size (I can handle offsetting once I have an example).
Here is a picture that shows what I'm looking for:
I can pull columns of rows from the dataframe based on some search criteria.
idxlbls = df.index[df['timestamp'] == dt]
stuff = df.loc[idxlbls, 'col3':'col5']
But how do I map that to row numbers (array indices, not label indices) to be used as an array index in numpy (assuming the same row length)?
stuffprime = array[?, ?]
The reason I need this is that the dataframe is much larger and more complete and contains the columns used as search criteria, whereas the numpy arrays are subsets that were extracted and modified earlier in the pipeline (and do not contain those search columns). I need to search the dataframe and pull the equivalent data from the numpy arrays. Basically, I need to correlate specific rows of a dataframe with the corresponding rows of a numpy array.
Recommended answer
I believe you need get_indexer to get positions from the filtered column names. For the index you can use it the same way, or use numpy.where to get positions from a boolean mask:
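import numpy as np
import pandas as pd

# Sample dataframe with non-default index labels
df = pd.DataFrame({'timestamp': list('abadef'),
                   'B': [4, 5, 4, 5, 5, 4],
                   'C': [7, 8, 9, 4, 2, 3],
                   'D': [1, 3, 5, 7, 1, 0],
                   'E': [5, 3, 6, 9, 2, 4]}, index=list('ABCDEF'))
print (df)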
  timestamp  B  C  D  E
A         a  4  7  1  5
B         b  5  8  3  3
C         a  4  9  5  6
D         d  5  4  7  9
E         e  5  2  1  2
F         f  4  3  0  4
idxlbls = df.index[df['timestamp'] == 'a']
stuff = df.loc[idxlbls, 'C':'E']
print (stuff)
   C  D  E
A  7  1  5
C  9  5  6
a = df.index.get_indexer(stuff.index)
Or get positions by boolean mask:
a = np.where(df['timestamp'] == 'a')[0]
print (a)
[0 2]
b = df.columns.get_indexer(stuff.columns)
print (b)
[2 3 4]
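The question's `stuffprime = array[?, ?]` can then be filled in with these positions. The following is only a minimal sketch: it assumes the numpy array has the same row and column layout as the dataframe, and `arr` is a hypothetical stand-in built here with `df.to_numpy()`. It uses `np.ix_` so that every row position in `a` is combined with every column position in `b`.

```python
import numpy as np
import pandas as pd

df = pd.DataFrame({'timestamp': list('abadef'),
                   'B': [4, 5, 4, 5, 5, 4],
                   'C': [7, 8, 9, 4, 2, 3],
                   'D': [1, 3, 5, 7, 1, 0],
                   'E': [5, 3, 6, 9, 2, 4]}, index=list('ABCDEF'))

# Assumption: the array shares the dataframe's row and column order.
arr = df.to_numpy()

# Label-based selection on the dataframe, as in the question
stuff = df.loc[df['timestamp'] == 'a', 'C':'E']

# Translate labels into integer positions
a = df.index.get_indexer(stuff.index)      # row positions
b = df.columns.get_indexer(stuff.columns)  # column positions

# np.ix_ builds an open mesh, selecting the block of rows a by columns b
stuffprime = arr[np.ix_(a, b)]
print(stuffprime)
```

If the array is an offset or shorter subset of the dataframe, the positions would need to be shifted accordingly before indexing. Note also that `get_indexer` returns -1 for labels it cannot find, which numpy would silently treat as the last row or column, so it may be worth checking for negative values first.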