Finding an array element's location in a pandas frame column (a.k.a. pd.Series)
Problem description
I have a pandas frame similar to this one:
import pandas as pd
import numpy as np
data = {'Col1' : [4,5,6,7], 'Col2' : [10,20,30,40], 'Col3' : [100,50,-30,-50], 'Col4' : ['AAA', 'BBB', 'AAA', 'CCC']}
df = pd.DataFrame(data=data, index = ['R1','R2','R3','R4'])
    Col1  Col2  Col3  Col4
R1     4    10   100   AAA
R2     5    20    50   BBB
R3     6    30   -30   AAA
R4     7    40   -50   CCC
Given an array of targets:
target_array = np.array(['AAA', 'CCC', 'EEE'])
I would like to find the indices of the cells in Col4 whose elements also appear in target_array.
I have tried to find a documented answer, but it seems beyond my skill... Does anyone have any advice?
P.S. Incidentally, for this particular case I can input a target array whose elements are the data frame index names, array(['R1', 'R3', 'R5']). Would it be easier that way?
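Regarding the P.S.: when the targets are row labels rather than Col4 values, a minimal sketch (not part of the accepted answer) can test the index itself with pandas' Index.isin. A label that does not exist, such as 'R5', simply produces no match, whereas passing the list straight to df.loc raises a KeyError in pandas 1.0 and later. The name label_targets is only chosen here for illustration.

```python
import numpy as np
import pandas as pd

# Rebuild the frame from the question.
data = {'Col1': [4, 5, 6, 7], 'Col2': [10, 20, 30, 40],
        'Col3': [100, 50, -30, -50], 'Col4': ['AAA', 'BBB', 'AAA', 'CCC']}
df = pd.DataFrame(data=data, index=['R1', 'R2', 'R3', 'R4'])

# Hypothetical name for the label array from the P.S.; 'R5' is not in df.index.
label_targets = np.array(['R1', 'R3', 'R5'])

# Boolean mask over the index: True where the row label is one of the targets.
label_mask = df.index.isin(label_targets)

# Labels that are present (the missing 'R5' is silently ignored).
present = df.index[label_mask]
print(list(present))        # ['R1', 'R3']

# Integer (0-based) positions of those labels.
positions = np.flatnonzero(label_mask)
print(positions.tolist())   # [0, 2]

# Select the rows via the mask instead of df.loc[label_targets],
# which would raise KeyError because of the missing 'R5'.
print(df[label_mask])
```

So yes, with labels the problem reduces to a membership test on the index itself.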
Edit 1:
Thank you very much for all the great replies. Sadly, I can only accept one, but everyone seems to point to @Divakar's as the best. Still, you should look at piRSquared's and MaxU's speed comparisons of all the available options.
Recommended answer
You can use NumPy's in1d:
df.index[np.in1d(df['Col4'],target_array)]
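For reference, here is a self-contained version of that one-liner that rebuilds the question's frame. It swaps in np.isin, which NumPy recommends over np.in1d (deprecated as of NumPy 2.0) and which returns the same boolean mask for 1-D input. That substitution is an editorial assumption, not part of the original answer.

```python
import numpy as np
import pandas as pd

# Frame and targets from the question.
data = {'Col1': [4, 5, 6, 7], 'Col2': [10, 20, 30, 40],
        'Col3': [100, 50, -30, -50], 'Col4': ['AAA', 'BBB', 'AAA', 'CCC']}
df = pd.DataFrame(data=data, index=['R1', 'R2', 'R3', 'R4'])
target_array = np.array(['AAA', 'CCC', 'EEE'])

# np.isin plays the role of np.in1d here; 'EEE' matches nothing and is ignored.
out = df.index[np.isin(df['Col4'], target_array)]
print(out.tolist())  # ['R1', 'R3', 'R4']
```

On older NumPy versions the original np.in1d call gives the same result.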
Explanation
1) Create a 1D mask, with one entry per row, telling us whether that row's Col4 element matches any element in target_array:
mask = np.in1d(df['Col4'],target_array)
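To see what this mask holds for the sample data, the sketch below prints it, alongside pandas' own Series.isin, which builds the same mask as a boolean Series aligned on df.index. It uses np.isin in place of np.in1d; for 1-D input both return the same array.

```python
import numpy as np
import pandas as pd

data = {'Col1': [4, 5, 6, 7], 'Col2': [10, 20, 30, 40],
        'Col3': [100, 50, -30, -50], 'Col4': ['AAA', 'BBB', 'AAA', 'CCC']}
df = pd.DataFrame(data=data, index=['R1', 'R2', 'R3', 'R4'])
target_array = np.array(['AAA', 'CCC', 'EEE'])

# NumPy mask: a plain boolean ndarray, one entry per row of df.
mask = np.isin(df['Col4'], target_array)
print(mask.tolist())     # [True, False, True, True]

# pandas-native equivalent: a boolean Series that keeps df's row labels.
mask_pd = df['Col4'].isin(target_array)
print(mask_pd.tolist())  # [True, False, True, True]
```

Only row R2 ('BBB') is False, because 'BBB' is not among the targets.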
2) Use the mask to select the valid indices from the dataframe as the final output:
out = df.index[mask]
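If "location" is meant as integer positions rather than index labels (an assumption about the intended reading), a minimal sketch converts the same mask with np.flatnonzero. Because every label in out exists in df.index, out can also be passed back to df.loc to pull the matching rows.

```python
import numpy as np
import pandas as pd

data = {'Col1': [4, 5, 6, 7], 'Col2': [10, 20, 30, 40],
        'Col3': [100, 50, -30, -50], 'Col4': ['AAA', 'BBB', 'AAA', 'CCC']}
df = pd.DataFrame(data=data, index=['R1', 'R2', 'R3', 'R4'])
target_array = np.array(['AAA', 'CCC', 'EEE'])

mask = np.isin(df['Col4'], target_array)
out = df.index[mask]        # row labels of the matching rows

# Integer (0-based) positions where the mask is True.
positions = np.flatnonzero(mask)
print(positions.tolist())   # [0, 2, 3]

# All labels in `out` exist in df.index, so .loc selects those rows safely.
matching_rows = df.loc[out]
print(matching_rows)
```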