Inserting new fields (columns) into MongoDB with pandas

Question

I have existing data in MongoDB where the primary key is set on 'date', with a few other fields in each document.

I want to insert a new pandas dataframe with new fields (columns) into the existing data in MongoDB, joining on the 'date' field, which exists in both dataframes.

For example, let's say this is dataframe A, which I have in MongoDB (I set the index to the 'date' field when loading the data from MongoDB).

And this is the new dataframe B that I want to insert into MongoDB.

And this is the final dataframe C, with the new fields ('std_50_3000window', 'std_50_300window', 'std_50_500window') added on the 'date' index, which is what I want to end up with in MongoDB.

Is there any way to do this? (Maybe with the insert_many method?)

Recommended answer

The method you need is update_one() with upsert=True, called in a loop. insert_many() won't work here for two reasons. First, you're not always inserting; sometimes you're updating an existing document. Second, update_many() applies one filter and one update to every document it matches (and insert_many() takes no filter at all), whereas in your case each update needs its own filter, because each row relates to a different time.
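The single-filter limitation can be illustrated with a small pure-Python model. The helper `update_many_model` is hypothetical: it only imitates the shape of update_many() (one filter, one update document) on a list of dicts and is not pymongo code.

```python
import datetime


def update_many_model(docs, matches, new_fields):
    """Apply the SAME new_fields to every document accepted by the single filter.

    Hypothetical stand-in for the shape of update_many(); not a pymongo API.
    """
    count = 0
    for doc in docs:
        if matches(doc):
            doc.update(new_fields)  # identical values for every matched document
            count += 1
    return count


docs = [
    {'date': datetime.datetime(2017, 5, 19, 21, 20), 'std_50_100window': 8},
    {'date': datetime.datetime(2017, 5, 19, 21, 21), 'std_50_100window': 8},
]

# One filter and one update: both documents receive identical values.
modified = update_many_model(docs, lambda d: True, {'std_50_300window': 9})
print(modified, [d['std_50_300window'] for d in docs])
```

Because every matched document gets the same values, this shape cannot write a different row of the new dataframe to each date; that requires one filter per row.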

This is a generic solution that combines dataframes (df_a and df_b in this case; you can have as many as you like) in the way you need. It uses iterrows() to get each row of the dataframe, filters on the date, and sets the document's fields to the values in that row. The $set operator overwrites fields that already exist and creates those that don't. upsert=True performs an insert if no document matches the date.

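# Upsert every row of every dataframe, matching documents on 'date'
for df in [df_a, df_b]:
    for _, row in df.iterrows():
        # $set merges this row's fields into the matching document;
        # upsert=True inserts a new document when no date matches
        db.mycollection.update_one({'date': row.get('date')}, {'$set': row.to_dict()}, upsert=True)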
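To make the merge semantics concrete, here is a minimal pure-Python sketch that mimics what the loop does, with a plain dict standing in for the collection. The names `upsert_set` and `store` are hypothetical, and the sketch models the described behavior of $set with upsert=True rather than pymongo's implementation.

```python
import datetime


def upsert_set(store, key_field, row):
    """Model of update_one({key_field: row[key_field]}, {'$set': row}, upsert=True).

    `store` is a dict keyed by the value of `key_field`; it stands in for
    the collection (an assumption made for illustration only).
    """
    key = row[key_field]
    # upsert: start from an empty document when nothing matches the key
    doc = store.setdefault(key, {})
    # $set: overwrite fields that exist, add fields that do not
    doc.update(row)
    return doc


store = {}
t = datetime.datetime(2017, 5, 19, 21, 20)
rows_a = [{'date': t, 'std_50_100window': 8}]
rows_b = [{'date': t, 'std_50_300window': 9}]

# First pass inserts the document, second pass adds the new field to it.
for rows in (rows_a, rows_b):
    for row in rows:
        upsert_set(store, 'date', row)

print(store[t])
```

After both passes the store holds a single document for that date, carrying the fields from both inputs, which is the shape the question asks for in dataframe C.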

Full example:

from pymongo import MongoClient
from pprint import pprint
import datetime
import pandas as pd

# Sample data setup

db = MongoClient()['mydatabase']

data_a = [[datetime.datetime(2017, 5, 19, 21, 20), 96, 8, 98],
          [datetime.datetime(2017, 5, 19, 21, 21), 95, 8, 97],
          [datetime.datetime(2017, 5, 19, 21, 22), 95, 8, 97]]

df_a = pd.DataFrame(data_a, columns=['date', 'std_500_1000window', 'std_50_100window', 'std_50_2000window'])

data_b = [[datetime.datetime(2017, 5, 19, 21, 20), 98, 9, 10],
          [datetime.datetime(2017, 5, 19, 21, 21), 98, 9, 10],
          [datetime.datetime(2017, 5, 19, 21, 22), 98, 9, 10]]

df_b = pd.DataFrame(data_b, columns=['date', 'std_50_3000window', 'std_50_300window', 'std_50_500window'])

# Perform the upserts

for df in [df_a, df_b]:
    for _, row in df.iterrows():
        db.mycollection.update_one({'date': row.get('date')}, {'$set': row.to_dict()}, upsert=True)

# Print the results

for record in db.mycollection.find():
    pprint(record)

Result:

{'_id': ObjectId('5f0ae909df5531ac655ce528'),
 'date': datetime.datetime(2017, 5, 19, 21, 20),
 'std_500_1000window': 96,
 'std_50_100window': 8,
 'std_50_2000window': 98,
 'std_50_3000window': 98,
 'std_50_300window': 9,
 'std_50_500window': 10}
{'_id': ObjectId('5f0ae909df5531ac655ce52a'),
 'date': datetime.datetime(2017, 5, 19, 21, 21),
 'std_500_1000window': 95,
 'std_50_100window': 8,
 'std_50_2000window': 97,
 'std_50_3000window': 98,
 'std_50_300window': 9,
 'std_50_500window': 10}
{'_id': ObjectId('5f0ae909df5531ac655ce52c'),
 'date': datetime.datetime(2017, 5, 19, 21, 22),
 'std_500_1000window': 95,
 'std_50_100window': 8,
 'std_50_2000window': 97,
 'std_50_3000window': 98,
 'std_50_300window': 9,
 'std_50_500window': 10}
