How to merge multiple data frame objs into a single data frame obj with given conditions in pandas/python

Problem description

The POST request I'm sending to my Python backend service is as follows:

{
    "updated_by": "969823826",
    "relation_on": "ID",
    "join_type": "inner",
    "sources": [
    {
        "json_obj": "path/demo8.json",
        "columns": [
            "ID",
            "FIRST_NAME",
            "LAST_NAME"
        ]
    },
    {
        "json_obj": "path/demo1.json",
        "columns": [
            "ID",
            "CITY",
            "SSN"
        ]
    }
  ]
}

So I'm trying to INNER JOIN the two objects in sources on the ID column.

I'm merging ID, FIRST_NAME, LAST_NAME from FILE1 with ID, CITY, SSN from FILE2.

With a static (hard-coded) approach I'm able to do this.

Here's my code sample for the static approach:

import json
import pandas as pd

file1 = "path\\demo1.json"
file2 = "path\\demo3.json"

df1 = pd.read_json(file1)
df2 = pd.read_json(file2)

#merge with specific columns and conditions
new_df = pd.merge(df1[['ID', 'FIRST_NAME', 'LAST_NAME']], df2[['ID', 'CITY', 'SSN']], on='ID', how="inner")   

#merging without any common column
df1['tmp'] = 1
df2['tmp'] = 1     

new_df = pd.merge(df1, df2, on=['tmp'])
new_df = new_df.drop('tmp', axis=1)

new_df.to_json("path\\merge-json.json", orient='records')
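
The tmp-column trick above builds a Cartesian product of the two frames. As a minimal sketch, assuming pandas 1.2 or later, the same result should be obtainable with how="cross", without adding and dropping a helper column. The frames and their values here are small hypothetical stand-ins, not the contents of the JSON files.

```python
import pandas as pd

# Hypothetical stand-ins for df1 and df2 with no column in common.
df1 = pd.DataFrame({"FIRST_NAME": ["Albertine", "Maryetta"]})
df2 = pd.DataFrame({"CITY": ["Waymill", "Spellbridge", "Stoneland"]})

# Cross join: every row of df1 paired with every row of df2 (pandas >= 1.2).
crossed = pd.merge(df1, df2, how="cross")

# The tmp-column approach, written without mutating the inputs, for comparison.
via_tmp = pd.merge(df1.assign(tmp=1), df2.assign(tmp=1), on="tmp").drop("tmp", axis=1)

print(len(crossed))  # number of rows in the Cartesian product
```

On older pandas versions how="cross" is not available, so the tmp-column version remains the fallback.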

Now, when I try to merge the data frames dynamically using a for loop, I run into trouble.

I've tried several options, but I don't think I'm heading in the right direction.

Here's the code for the dynamic approach:

# Parse the request body once; get_json() already returns Python dicts and lists.
payload = request.get_json()
updated_by = payload['updated_by']
relation_on = payload['relation_on']
join_type = payload['join_type']

# 'sources' is already a list of dicts, so the str()/replace()/json.loads()
# round trip is unnecessary (and breaks on values that contain an apostrophe).
sources = payload['sources']

for sources_key, sources_value in enumerate(sources):
    print(sources_key, sources_value)

Up to this point the code above executes, and I can see the objects as below:

0 {'ctl_key': '969823826demo8txt', 'json_obj': 'path/demo8.json', 'columns': ['ID', 'FIRST_NAME', 'LAST_NAME']}
1 {'ctl_key': '969823826demo1csv', 'json_obj': 'path/demo1.json', 'columns': ['ID', 'CITY', 'SSN']}

My initial approach was to create a new data frame from each input file, then merge those data frames into the final one.
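
The approach just described (one data frame per source, then a merge) can be sketched with functools.reduce, which folds any number of frames into one. This is only an illustration under stated assumptions: merge_frames is a hypothetical helper, and the "frame" key holds in-memory stand-ins so the sketch runs without the JSON files. In the service, each frame would presumably come from pd.read_json(source["json_obj"]) instead.

```python
from functools import reduce

import pandas as pd


def merge_frames(frames, relation_on, join_type):
    """Fold a list of DataFrames into one by merging them pairwise on a shared column."""
    return reduce(
        lambda left, right: pd.merge(left, right, on=relation_on, how=join_type),
        frames,
    )


# Hypothetical in-memory stand-ins for the two JSON files.
people = pd.DataFrame({
    "ID": [1, 2, 3],
    "FIRST_NAME": ["Albertine", "Maryetta", "Dustin"],
    "LAST_NAME": ["Jan", "Hoyt", "Divina"],
})
places = pd.DataFrame({
    "ID": [2, 3, 4],
    "CITY": ["Spellbridge", "Stoneland", "Fayview"],
    "SSN": ["515-72-7354", "515-72-7355", "515-72-7356"],
})

# Same shape as the request's "sources" list, with "frame" replacing "json_obj".
sources = [
    {"frame": people, "columns": ["ID", "FIRST_NAME", "LAST_NAME"]},
    {"frame": places, "columns": ["ID", "CITY", "SSN"]},
]

# Keep only the requested columns of each source, then merge them all.
frames = [source["frame"][source["columns"]] for source in sources]
merged = merge_frames(frames, relation_on="ID", join_type="inner")
records = merged.to_dict(orient="records")
```

Because reduce merges pairwise, the same code handles three or more sources, provided every frame contains the relation_on column.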

The required JSON output is as follows:

[
  {
    "ID": 1,
    "FIRST_NAME": "Albertine",
    "LAST_NAME": "Jan",
    "CITY": "Waymill",
    "SSN": "515-72-7353"
  },
  {
    "ID": 2,
    "FIRST_NAME": "Maryetta",
    "LAST_NAME": "Hoyt",
    "CITY": "Spellbridge",
    "SSN": "515-72-7354"
  },
  {
    "ID": 3,
    "FIRST_NAME": "Dustin",
    "LAST_NAME": "Divina",
    "CITY": "Stoneland",
    "SSN": "515-72-7355"
  },
  {
    "ID": 4,
    "FIRST_NAME": "Jenna",
    "LAST_NAME": "Sofia",
    "CITY": "Fayview",
    "SSN": "515-72-7356"
  }
]

Any guidance, please...

Recommended answer

When I outer join dataframes, I like to call set_index on the column I want to join on (it is a DataFrame method, df.set_index, not pd.set_index) and then use pd.concat([df1, df2], axis=1). I think that should work for this case. Note that pd.concat defaults to join='outer'; for the inner join requested here, pass join='inner'.
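
A minimal sketch of the set_index plus pd.concat idea, assuming ID values are unique within each frame (column-wise concat aligns rows on the index and does not handle duplicate keys the way a merge does). The two frames are hypothetical stand-ins for the data read from the JSON files.

```python
import pandas as pd

# Hypothetical stand-ins for the two sources.
df1 = pd.DataFrame({
    "ID": [1, 2, 3],
    "FIRST_NAME": ["Albertine", "Maryetta", "Dustin"],
    "LAST_NAME": ["Jan", "Hoyt", "Divina"],
})
df2 = pd.DataFrame({
    "ID": [2, 3, 4],
    "CITY": ["Spellbridge", "Stoneland", "Fayview"],
    "SSN": ["515-72-7354", "515-72-7355", "515-72-7356"],
})

# Move the join column into the index so concat can align rows on it.
indexed = [df.set_index("ID") for df in (df1, df2)]

# join="outer" (the default) keeps every ID and fills gaps with NaN;
# join="inner" keeps only IDs present in every frame.
outer = pd.concat(indexed, axis=1, join="outer")
inner = pd.concat(indexed, axis=1, join="inner")

# Restore ID as a regular column before exporting records.
inner_records = inner.rename_axis("ID").reset_index().to_dict(orient="records")
```

Since pd.concat accepts a list of any length, this extends to more than two sources; the join argument could be taken from the request's join_type when it is "inner" or "outer".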
