在两列python数据框之间的范围内搜索特定值 [英] Searching a particular value in a range among two columns python dataframe
问题描述
我有两个csv文件,根据csv文件1中单元格的值,我应该能够在csv文件2的列中搜索该值并从csv文件2的另一列中获取相应的值.很抱歉,这很令人困惑.通过插图可能会弄清楚
I have two csv files.Depending upon the value of a cell in csv file 1 I should be able to search that value in a column of csv file 2 and get he corresponding value from other column in csv file 2. I am sorry if this very confusing.It will probably get clear by illustration
CSV文件1
Car Mileage
A 8
B 6
C 10
CSV文件2
Score Mileage(Min) Mileage(Max)
1 1 3
2 4 6
3 7 9
4 10 12
5 13 15
我想要的输出CSV文件就是这样
And my desired output CSV file is something like this
Car Mileage Score
A 8 3
B 6 2
C 10 4
根据A的行驶里程8,给汽车A得分3,然后在csv文件2中查看该行驶里程在哪个范围内,然后获得该范围的相应得分值.任何帮助将不胜感激预先感谢
Car A is given a score of 3 depending upon its mileage 8 and then looking that mileage in csv file 2 in what range it falls and then getting corresponding score value for that range. Any help will be appreciated Thanks in advance
推荐答案
撰写本文时,当前的稳定版本是v0.21.
要读取文件,请使用 pd.read_csv
-
To read your files, use pd.read_csv
-
df0 = pd.read_csv('file1.csv')
df1 = pd.read_csv('file2.csv')
df0
Car Mileage
0 A 8
1 B 6
2 C 10
df1
Score Mileage(Min) Mileage(Max)
0 1 1 3
1 2 4 6
2 3 7 9
3 4 10 12
4 5 13 15
要找到分数,请使用 pd.通过调用
.这应该真的很快- IntervalIndex.from_tuples
来访问IntervalIndex
To find the Score, use pd.IntervalIndex
by calling IntervalIndex.from_tuples
. This should be really fast -
v = df1.loc[:, 'Mileage(Min)':'Mileage(Max)'].apply(tuple, 1).tolist()
idx = pd.IntervalIndex.from_tuples(v, closed='both') # you can also use `from_arrays`
df0['Score'] = df1.iloc[idx.get_indexer(df0.Mileage.values), 'Score'].values
df0
Car Mileage Score
0 A 8 3
1 B 6 2
2 C 10 4
Other methods of creating an IntervalIndex
are outlined here.
要编写结果,请使用 pd.DataFrame.to_csv
-
To write your result, use pd.DataFrame.to_csv
-
df0.to_csv('file3.csv')
这是我在这里所做的概述.
Here's a high level outline of what I've done here.
- 首先,读入CSV文件
- 使用
pd.IntervalIndex
构建间隔索引树.因此,搜索现在的复杂度是对数的. - 使用
idx.get_indexer
来查找树中每个值的索引 - 使用索引在
df1
中找到Score
值,并将其分配回df0
.请注意,我叫.values
,否则,分配回来时值将不对齐. - 将结果写回CSV
- First, read in your CSV files
- Use
pd.IntervalIndex
to build an interval index tree. So, searching is now logarithmic in complexity. - Use
idx.get_indexer
to find the index of each value in the tree - Use the index to locate the
Score
value indf1
, and assign this back todf0
. Note that I call.values
, otherwise, the values will be misaligned when assigning back. - Write your result back to CSV
有关 Intervalindex
的更多信息,请查看此SO Q/A-
For more information on Intervalindex
, take a look at this SO Q/A - Finding matching interval(s) in pandas Intervalindex
请注意, IntervalIndex
是 v0.20
中的新增功能,因此,如果您使用的是旧版本,请确保使用
Note that IntervalIndex
is new in v0.20
, so if you have an older version, make sure you update your version with
pip install --upgrade pandas
这篇关于在两列python数据框之间的范围内搜索特定值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!