App Engine Bulk Loader Performance
I am using the App Engine bulk loader (Python runtime) to bulk upload entities to the datastore. The data I am uploading is stored in a proprietary format, so I have implemented my own connector (registered in bulkload_config.py) to convert it into an intermediate Python dictionary.
from google.appengine.ext.bulkload import connector_interface

class MyCustomConnector(connector_interface.ConnectorInterface):
    ....
    # Overridden method
    def generate_import_record(self, filename, bulkload_state=None):
        ....
        yield my_custom_dict
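The proprietary source format is not shown in the question, so as a stand-in, here is a minimal sketch of the conversion step assuming a hypothetical pipe-delimited line format (the field names are illustrative, not from the original):

```python
# Stand-in sketch: the real connector subclasses
# connector_interface.ConnectorInterface from the App Engine SDK.
# A hypothetical pipe-delimited input format is assumed here.

def generate_import_records(lines):
    """Yield one intermediate dict per input record."""
    for line in lines:
        name, value = line.strip().split("|")
        yield {"name": name, "value": float(value)}

records = list(generate_import_records(["alpha|1.5", "beta|2.0"]))
print(records)  # [{'name': 'alpha', 'value': 1.5}, {'name': 'beta', 'value': 2.0}]
```

The key point is that the connector only needs to yield plain dictionaries; the bulk loader handles batching and upload.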
To convert this neutral Python dictionary to a datastore entity, I use a custom post-import function that I have defined in my YAML.
def feature_post_import(input_dict, entity_instance, bulkload_state):
    ....
    return [all_entities_to_put]
Note: I am not using entity_instance or bulkload_state in my feature_post_import function. I am just creating new datastore entities (based on my input_dict) and returning them.
Now, everything works great. However, the process of bulk loading data seems to take way too much time. For example, a GB of data (~1,000,000 entities) takes ~20 hours. How can I improve the performance of the bulk load process? Am I missing something?
Some of the parameters I use with appcfg.py are 10 threads with a batch size of 10 entities per thread.
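For reference, the corresponding upload command looks roughly like this (the kind, filenames, and app URL are placeholders; the thread and batch-size flags match the settings described above):

```shell
# Placeholder paths and URL; flag names are appcfg.py upload_data options.
appcfg.py upload_data \
  --config_file=bulkload_config.py \
  --filename=data.custom \
  --kind=Feature \
  --num_threads=10 \
  --batch_size=10 \
  --url=http://myapp.appspot.com/_ah/remote_api
```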
Link to a Google App Engine Python group post: http://groups.google.com/group/google-appengine-python/browse_thread/thread/4c8def071a86c840
Update:
To test the performance of the bulk load process, I loaded entities of a 'Test' Kind. Even though this entity has a very simple FloatProperty, it still took me the same amount of time to bulk load those entities.
I am still going to try varying the bulk loader parameters rps_limit, bandwidth_limit and http_limit, to see if I can get any more throughput.
Solution:
There is a parameter called rps_limit that determines the number of entities to upload per second. This was the major bottleneck. The default value is 20.
Also increase the bandwidth_limit to something reasonable.
I increased rps_limit to 500 and everything improved. I achieved 5.5-6 seconds per 1,000 entities, which is a major improvement over 50 seconds per 1,000 entities.
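These numbers are consistent with rps_limit being the bottleneck: at the default of 20 entities per second, 1,000 entities take 50 seconds and ~1,000,000 entities take roughly 14 hours, in the ballpark of the ~20 hours observed. The quick check below works that out (the 5.75 s figure is just the midpoint of the reported 5.5-6 s range):

```python
# Throughput arithmetic for the default and raised rps_limit.
default_rps = 20
secs_per_1000_default = 1000 / default_rps          # 50.0 seconds
hours_for_million = 1_000_000 / default_rps / 3600  # ~13.9 hours

observed_secs_per_1000 = 5.75  # midpoint of the 5.5-6 s range
speedup = secs_per_1000_default / observed_secs_per_1000  # ~8.7x

print(secs_per_1000_default, round(hours_for_million, 1), round(speedup, 1))
```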