Search for an optimized selection in a pandas.DataFrame
Question
What is the most efficient way to select some rows of a pandas.DataFrame containing N columns (strings, integers and floats), according to the following rule:
- go through all combinations of two columns (integers).
- for each distinct combination, keep only one row (with all its columns): the one with the minimum value in a third column (float).
For instance, for combinations of (titi, tutu), with tete as the third column:
  toto  tata  titi  tutu  tete
0    a    18   600   700   4.5
1    b    18   600   800  10.1
2    c    18   600   700  12.6
3    d     3   300   400   3.4
4    a    16   900  1000   6.0
5    a    18   600   800  10.1
6    c     3   300   400   3.0
7    a    16   900  1000   6.0
must give:
  toto  tata  titi  tutu  tete
0    a    18   600   700   4.5
1    b    18   600   800  10.1
4    a    16   900  1000   6.0
6    c     3   300   400   3.0
For the moment, I began with the following code:
import pandas

indicesToKeep = []
indicesToRemove = []

reader = pandas.read_csv('/Users/steph/work/perso/sof/test.csv')
columns = reader.columns

for i in reader['titi'].unique():
    #temp = reader[[:]].query('titi == i')#does not work !
    temp = reader.loc[(reader.titi == i),columns]
    for j in temp['tutu'].unique():
        temp2 = temp.loc[(temp.tutu == j),columns]
        minimum = min(temp2.tete)
        indicesToKeep.append(min(
            temp2[temp2.tete==minimum].index.tolist()))

################
# compute the complement of indicesToKeep
#but I don't remember the pythonic syntax
for i in range(len(reader)):
    if i not in indicesToKeep:
        indicesToRemove.append(i)

############################
reader = reader.drop(indicesToRemove)
Notes:
- I am sure this is not optimized.
- I use the old "loc" method because I do not know how to use "query".
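On the two side points raised in the question (the `query` call that does not work, and the complement of `indicesToKeep`), a minimal sketch follows. It assumes a small in-memory frame in place of the CSV file, and the values in `indices_to_keep` are hypothetical, chosen only for illustration. Inside a `query` string, a local Python variable is referenced with an `@` prefix, and `Index.difference` yields the labels that are not in a given list.

```python
import pandas as pd

# Small stand-in for the CSV from the question (assumed data, for illustration only).
reader = pd.DataFrame({
    "titi": [600, 600, 300],
    "tutu": [700, 800, 400],
    "tete": [4.5, 10.1, 3.4],
})

i = 600
# In a query string, "@i" refers to the local Python variable i;
# a bare "i" would be looked up as a column name instead.
temp = reader.query("titi == @i")

# Hypothetical list of row labels to keep.
indices_to_keep = [0, 2]

# Complement of indices_to_keep with respect to the frame's index.
indices_to_remove = reader.index.difference(indices_to_keep)
dropped = reader.drop(indices_to_remove)

# Selecting the kept labels directly avoids computing the complement at all.
kept = reader.loc[indices_to_keep]
```

Selecting with `reader.loc[indices_to_keep]` is usually the simpler choice, since it skips building the list of rows to remove.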
Recommended answer
IIUC, use sort_values + drop_duplicates. If you are using pandas, try not to use a for loop; most of the time it is slower than the vectorized method.
df.sort_values('tete').drop_duplicates(['titi','tutu']).sort_index()
Out[583]:
  toto  tata  titi  tutu  tete
0    a    18   600   700   4.5
1    b    18   600   800  10.1
4    a    16   900  1000   6.0
6    c     3   300   400   3.0
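To try the one-liner without the CSV file, here is a self-contained sketch that rebuilds the sample frame from the question. Two things in it are assumptions rather than part of the answer: `kind="mergesort"` is added so that rows with tied minima (for example indices 1 and 5) keep their original order and the first one is retained, and the `groupby` + `idxmin` variant is shown only as a possible alternative for comparison.

```python
import pandas as pd

# Sample data copied from the question.
df = pd.DataFrame({
    "toto": ["a", "b", "c", "d", "a", "a", "c", "a"],
    "tata": [18, 18, 18, 3, 16, 18, 3, 16],
    "titi": [600, 600, 600, 300, 900, 600, 300, 900],
    "tutu": [700, 800, 700, 400, 1000, 800, 400, 1000],
    "tete": [4.5, 10.1, 12.6, 3.4, 6.0, 10.1, 3.0, 6.0],
})

# Sort by the float column, keep the first row of each (titi, tutu) pair,
# then restore the original row order. The stable sort is an added assumption
# so that ties are resolved in favour of the earliest row.
by_sort = (
    df.sort_values("tete", kind="mergesort")
      .drop_duplicates(["titi", "tutu"])
      .sort_index()
)

# Alternative sketch: idxmin returns, per group, the label of the first row
# holding the minimum of "tete"; .loc then fetches those full rows.
by_group = df.loc[df.groupby(["titi", "tutu"])["tete"].idxmin()].sort_index()

print(by_sort)
```

Both variants avoid explicit Python loops; on this sample they select the same four rows.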