在两列python数据框之间的范围内搜索特定值 [英] Searching a particular value in a range among two columns python dataframe

查看:44
本文介绍了在两列python数据框之间的范围内搜索特定值的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我有两个csv文件,根据csv文件1中单元格的值,我应该能够在csv文件2的列中搜索该值并从csv文件2的另一列中获取相应的值.很抱歉,这很令人困惑.通过插图可能会弄清楚

I have two csv files.Depending upon the value of a cell in csv file 1 I should be able to search that value in a column of csv file 2 and get he corresponding value from other column in csv file 2. I am sorry if this very confusing.It will probably get clear by illustration

CSV文件1

Car   Mileage
 A       8
 B       6
 C       10

CSV文件2

Score  Mileage(Min)    Mileage(Max)
 1       1                 3
 2       4                 6
 3       7                 9
 4       10                12 
 5       13                15 

我想要的输出CSV文件就是这样

And my desired output CSV file is something like this

Car    Mileage     Score
 A       8           3
 B       6           2
 C       10          4

根据A的行驶里程8,给汽车A得分3,然后在csv文件2中查看该行驶里程在哪个范围内,然后获得该范围的相应得分值.任何帮助将不胜感激预先感谢

Car A is given a score of 3 depending upon its mileage 8 and then looking that mileage in csv file 2 in what range it falls and then getting corresponding score value for that range. Any help will be appreciated Thanks in advance

推荐答案

撰写本文时,当前的稳定版本是v0.21.

要读取文件,请使用 pd.read_csv -

To read your files, use pd.read_csv -

df0 = pd.read_csv('file1.csv')
df1 = pd.read_csv('file2.csv')

df0

  Car  Mileage
0   A        8
1   B        6
2   C       10

df1

   Score  Mileage(Min)  Mileage(Max)
0      1             1             3
1      2             4             6
2      3             7             9
3      4            10            12
4      5            13            15

要找到分数,请使用 pd.通过调用 IntervalIndex.from_tuples 来访问IntervalIndex .这应该真的很快-

To find the Score, use pd.IntervalIndex by calling IntervalIndex.from_tuples. This should be really fast -

v = df1.loc[:, 'Mileage(Min)':'Mileage(Max)'].apply(tuple, 1).tolist()
idx = pd.IntervalIndex.from_tuples(v, closed='both') # you can also use `from_arrays`


df0['Score'] = df1.iloc[idx.get_indexer(df0.Mileage.values), 'Score'].values
df0

  Car  Mileage  Score
0   A        8      3
1   B        6      2
2   C       10      4

其他创建 IntervalIndex 的方法概述如下

Other methods of creating an IntervalIndex are outlined here.

要编写结果,请使用 pd.DataFrame.to_csv -

To write your result, use pd.DataFrame.to_csv -

df0.to_csv('file3.csv')


这是我在这里所做的概述.


Here's a high level outline of what I've done here.

  1. 首先,读入CSV文件
  2. 使用 pd.IntervalIndex 构建间隔索引.因此,搜索现在的复杂度是对数的.
  3. 使用 idx.get_indexer 来查找树中每个值的索引
  4. 使用索引在 df1 中找到 Score 值,并将其分配回 df0 .请注意,我叫 .values ,否则,分配回来时值将不对齐.
  5. 将结果写回CSV
  1. First, read in your CSV files
  2. Use pd.IntervalIndex to build an interval index tree. So, searching is now logarithmic in complexity.
  3. Use idx.get_indexer to find the index of each value in the tree
  4. Use the index to locate the Score value in df1, and assign this back to df0. Note that I call .values, otherwise, the values will be misaligned when assigning back.
  5. Write your result back to CSV

有关 Intervalindex 的更多信息,请查看此SO Q/A-

For more information on Intervalindex, take a look at this SO Q/A - Finding matching interval(s) in pandas Intervalindex

请注意, IntervalIndex v0.20 中的新增功能,因此,如果您使用的是旧版本,请确保使用

Note that IntervalIndex is new in v0.20, so if you have an older version, make sure you update your version with

pip install --upgrade pandas

这篇关于在两列python数据框之间的范围内搜索特定值的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆