Pandas: add column with index of matching row from other dataframe

Views: 59 | Posted: 2020/10/17 2:38:44 | Tags: python pandas dataframe


Problem description


I am cleaning up a SharePoint list so that it can be uploaded to MSSQL with proper table relationships.

Basically, there are two dataframes (data and config) that share some common columns (country, business). I want to insert a new column into datadf that contains, for each row, the index of the matching row in configdf, where rows match on the values in the country and business columns.

dataframe data:

-----|---------|----------|-----
 ... | Country | Business | ...
-----|---------|----------|-----
     |    A    |     1    |
-----|---------|----------|-----
     |    A    |     1    |
-----|---------|----------|-----
     |    A    |     2    |
-----|---------|----------|-----
     |    A    |     2    |
-----|---------|----------|-----
     |    B    |     1    |
-----|---------|----------|-----
     |    B    |     1    |
-----|---------|----------|-----
     |    B    |     2    |
-----|---------|----------|-----
     |    C    |     1    |
-----|---------|----------|-----
     |    C    |     2    |
-----|---------|----------|-----

dataframe config (ID = index):

----|---------|----------|-----
 ID | Country | Business | ...
----|---------|----------|-----
  1 |    A    |     1    |
----|---------|----------|-----
  2 |    A    |     2    |
----|---------|----------|-----
  3 |    B    |     1    |
----|---------|----------|-----
  4 |    B    |     2    |
----|---------|----------|-----
  5 |    C    |     1    |
----|---------|----------|-----
  6 |    C    |     2    |
----|---------|----------|-----

What I want to add to dataframe data:

-----|---------|----------|-----------|-----
 ... | Country | Business | config_ID | ... 
-----|---------|----------|-----------|-----
     |    A    |     1    |     1     |
-----|---------|----------|-----------|-----
     |    A    |     1    |     1     |
-----|---------|----------|-----------|-----
     |    A    |     2    |     2     |
-----|---------|----------|-----------|-----
     |    A    |     2    |     2     |
-----|---------|----------|-----------|-----
     |    B    |     1    |     3     |
-----|---------|----------|-----------|-----
     |    B    |     1    |     3     |
-----|---------|----------|-----------|-----
     |    B    |     2    |     4     |
-----|---------|----------|-----------|-----
     |    C    |     1    |     5     |
-----|---------|----------|-----------|-----
     |    C    |     2    |     6     |
-----|---------|----------|-----------|-----

----Found something that works----

datadf['config_ID'] = datadf.apply(lambda x: configdf[(configdf.country == x.country) & (configdf.business_unit == x.business_unit)].index[0], axis=1)

It gets the job done, although I am open to other suggestions, especially if they could work with df.insert().
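For the df.insert() variant mentioned above, one possible sketch is to build a lookup keyed on the two columns and then insert the result at a chosen position. The column names country and business_unit are taken from the snippet above; the sample values, the 1 to 6 index on configdf, and the helper names id_by_key and row_keys are assumptions made only for illustration.

```python
import pandas as pd

# Sample frames (assumed values, mirroring the tables in the question)
datadf = pd.DataFrame({'country': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'],
                       'business_unit': [1, 1, 2, 2, 1, 1, 2, 1, 2]})
configdf = pd.DataFrame({'country': ['A', 'A', 'B', 'B', 'C', 'C'],
                         'business_unit': [1, 2, 1, 2, 1, 2]},
                        index=range(1, 7))  # assumed: the index holds the IDs

key_cols = ['country', 'business_unit']

# Series mapping each (country, business_unit) pair to its configdf index label
id_by_key = pd.Series(configdf.index.to_numpy(),
                      index=pd.MultiIndex.from_frame(configdf[key_cols]))

# Look up every row of datadf; pairs missing from configdf would become NaN
row_keys = pd.MultiIndex.from_frame(datadf[key_cols])
config_ids = id_by_key.reindex(row_keys).to_numpy()

# insert() places the new column at an explicit position (here: after the keys)
datadf.insert(2, 'config_ID', config_ids)
```

This relies on each (country, business_unit) pair appearing only once in configdf. Unlike the apply version, a pair with no match does not raise an IndexError; it yields NaN instead.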

Solution

Here is a solution using pandas merge.

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.merge.html#pandas.DataFrame.merge

import pandas as pd

# make the two dataframes
data = pd.DataFrame({'Country':['A','A','A','A','B','B','B','C','C'],
                     'Business':[1,1,2,2,1,1,2,1,2]})

configdf = pd.DataFrame({'Country':['A','A','B','B','C','C'],
                         'Business':[1,2,1,2,1,2]})

# make a column with the index values
configdf.reset_index(inplace=True)

# merge the two dataframes based on the selected columns.
newdf = data.merge(configdf, on=['Country', 'Business'])
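Note that reset_index() names the new column "index" and, with a default index, numbers it from 0. If the goal is a column called config_ID that holds config's own index labels, the merge can be adapted roughly as follows. This is only a sketch: it assumes configdf's index carries the IDs 1 to 6 shown in the question, and it uses a left merge so that every row of data is kept. The name lookup is illustrative.

```python
import pandas as pd

data = pd.DataFrame({'Country': ['A', 'A', 'A', 'A', 'B', 'B', 'B', 'C', 'C'],
                     'Business': [1, 1, 2, 2, 1, 1, 2, 1, 2]})

# Assumption: the index of configdf holds the IDs 1..6 from the question
configdf = pd.DataFrame({'Country': ['A', 'A', 'B', 'B', 'C', 'C'],
                         'Business': [1, 2, 1, 2, 1, 2]},
                        index=range(1, 7))

# Name the index, then turn it into a regular column called config_ID
lookup = configdf.rename_axis('config_ID').reset_index()

# Left merge: keeps all rows of data in their original order
newdf = data.merge(lookup[['config_ID', 'Country', 'Business']],
                   on=['Country', 'Business'], how='left')
```

With the sample data, newdf['config_ID'] comes out as 1, 1, 2, 2, 3, 3, 4, 5, 6. Rows of data without a match in configdf would get NaN rather than being dropped, which is the main difference from the default inner merge.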
