How to upload data in bulk to the appengine datastore? Older methods do not work


Question

This should be a fairly common requirement, and a simple process: upload data in bulk to the appengine datastore.

However, none of the older solutions mentioned on stackoverflow (links below*) seem to work anymore. The bulkloader method, which was the most reasonable solution when uploading to the datastore using the DB API, doesn't work with the NDB API.

And now the bulkloader method seems to have been deprecated, and the old links, which are still present in the docs, lead to the wrong page. Here's an example:

https://developers.google.com/appengine/docs/python/tools/uploadingdata

The above link is still present on this page: https://developers.google.com/appengine/docs/python/tools/uploadinganapp

What is the recommended method for bulk-loading data now?

The two feasible alternatives seem to be 1) using the remote_api or 2) writing a CSV file to a GCS bucket and reading from that. Does anybody have experience successfully using either method?

Any pointers will be greatly appreciated. Thanks!

[*The solutions offered at the links below are no longer valid]

[1] how does one upload data in bulk to a google appengine datastore? (https://stackoverflow.com/questions/741599/how-does-one-upload-data-in-bulk-to-a-google-appengine-datastore)

[2] How to insert bulk data in Google App Engine Datastore?

Solution

Method 1: Use remote_api

How to: write a bulkloader.yaml file and run it directly with the "appcfg.py upload_data" command from the terminal. I don't recommend this method for a couple of reasons: 1. huge latency, and 2. no support for NDB.
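For reference, a minimal, untested sketch of what this might look like; the kind and property names (DataStoreModel, attr1, link) are assumptions borrowed from the code further down, and the exact flags should be double-checked against the appcfg.py documentation:

    # bulkloader.yaml -- minimal sketch; kind/properties are assumed
    python_preamble:
    - import: google.appengine.ext.bulkload.transform

    transformers:
    - kind: DataStoreModel
      connector: csv
      property_map:
        - property: attr1
          external_name: attr1
        - property: link
          external_name: link

Then run it from the terminal, roughly:

    appcfg.py upload_data --config_file=bulkloader.yaml --filename=data.csv \
        --kind=DataStoreModel --url=http://<your-app-id>.appspot.com/_ah/remote_api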

Method 2: GCS and mapreduce

Uploading the data file to GCS:

Use the "storage-file-transfer-json-python" github project (chunked_transfer.py) to upload files to GCS from your local system. Make sure to generate a proper "client-secrets.json" file from the app engine admin console.

Mapreduce:

Use the "appengine-mapreduce" github project. Copy the "mapreduce" folder into your project's top-level folder.

Add the following lines to your app.yaml file:

    includes:
      - mapreduce/include.yaml

Below is your main.py file:

    import cgi
    import webapp2
    import logging
    import os, csv
    from models import DataStoreModel
    import StringIO
    from google.appengine.api import app_identity
    from mapreduce import base_handler
    from mapreduce import mapreduce_pipeline
    from mapreduce import operation as op
    from mapreduce.input_readers import InputReader

    def testmapperFunc(newRequest):
        # Each input record is one line of the CSV file; parse it and
        # yield a datastore Put operation for the mapreduce framework.
        f = StringIO.StringIO(newRequest)
        reader = csv.reader(f, delimiter=',')
        for row in reader:
            newEntry = DataStoreModel(attr1=row[0], link=row[1])
            yield op.db.Put(newEntry)

    class TestGCSReaderPipeline(base_handler.PipelineBase):
        def run(self, filename):
            # The mapper and input reader are given as dotted-path strings;
            # "testgcs.testmapperFunc" must match the module that actually
            # contains testmapperFunc.
            yield mapreduce_pipeline.MapreducePipeline(
                    "test_gcs",
                    "testgcs.testmapperFunc",
                    "mapreduce.input_readers.FileInputReader",
                    mapper_params={
                        "files": [filename],
                        "format": 'lines'
                    },
                    shards=1)

    class tempTestRequestGCSUpload(webapp2.RequestHandler):
        def get(self):
            bucket_name = os.environ.get('BUCKET_NAME',
                                         app_identity.get_default_gcs_bucket_name())

            # FileInputReader expects the /gs/<bucket>/<object> path form
            bucket = '/gs/' + bucket_name
            filename = bucket + '/' + 'tempfile.csv'

            pipeline = TestGCSReaderPipeline(filename)
            pipeline.with_params(target="mapreducetestmodtest")
            pipeline.start()
            self.response.out.write('done')

    application = webapp2.WSGIApplication([
        ('/gcsupload', tempTestRequestGCSUpload),
    ], debug=True)
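
Both main.py samples import DataStoreModel from a models module that the answer never shows. A minimal sketch of what it could look like (property names taken from the mappers; note that op.db.Put in Method 2 works on old-style db entities, while Method 3 below uses ndb.put_multi, which needs ndb entities):

    # models.py -- hypothetical model matching the attr1/link fields above
    from google.appengine.ext import db, ndb

    # For Method 2 (mapreduce's op.db.Put expects a db entity):
    class DataStoreModel(db.Model):
        attr1 = db.StringProperty()
        link = db.StringProperty()

    # For Method 3 (ndb.put_multi expects ndb entities), use this instead:
    # class DataStoreModel(ndb.Model):
    #     attr1 = ndb.StringProperty()
    #     link = ndb.StringProperty()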
    

To remember:

1. The mapreduce project uses the now-deprecated "Google Cloud Storage Files API", so future support is not guaranteed.
2. Mapreduce adds a small overhead to datastore reads and writes.

Method 3: GCS and the GCS Client Library

1. Upload the csv/text file to GCS using the file-transfer method above.
2. Use the GCS client library (copy the 'cloudstorage' folder into your application's top-level folder).

Add the below code to the application's main.py file.

    import cgi
    import webapp2
    import logging
    import jinja2
    import os, csv
    import cloudstorage as gcs
    from google.appengine.ext import ndb
    from google.appengine.api import app_identity
    from models import DataStoreModel

    class UploadGCSData(webapp2.RequestHandler):
        def get(self):
            bucket_name = os.environ.get('BUCKET_NAME',
                                         app_identity.get_default_gcs_bucket_name())
            bucket = '/' + bucket_name
            filename = bucket + '/tempfile.csv'
            self.upload_file(filename)

        def upload_file(self, filename):
            # The GCS client library exposes the object as a readable
            # file-like handle, so csv.reader can consume it directly.
            gcs_file = gcs.open(filename)
            datareader = csv.reader(gcs_file)
            count = 0
            entities = []
            for row in datareader:
                count += 1
                newProd = DataStoreModel(attr1=row[0], link=row[1])
                entities.append(newProd)

                # Flush in batches of 50 to keep each put_multi call small
                if count % 50 == 0 and entities:
                    ndb.put_multi(entities)
                    entities = []

            # Write whatever remains from the final partial batch
            if entities:
                ndb.put_multi(entities)

    application = webapp2.WSGIApplication([
        ('/gcsupload', UploadGCSData),
    ], debug=True)
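
One caveat with Method 3: the whole import runs inside a single request handler, which is subject to App Engine's 60-second request deadline. For larger files it may be safer to push the work onto the task queue. A rough sketch using the deferred library (it assumes upload_file has been refactored into a module-level function, and that the deferred builtin is enabled in app.yaml):

    import webapp2
    from google.appengine.api import app_identity
    from google.appengine.ext import deferred

    class UploadGCSDataDeferred(webapp2.RequestHandler):
        def get(self):
            bucket = '/' + app_identity.get_default_gcs_bucket_name()
            # Queue the import as a task; tasks get a much longer
            # (10-minute) deadline than user-facing requests.
            deferred.defer(upload_file, bucket + '/tempfile.csv')
            self.response.out.write('import queued')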
    
