ndb Models are not saved in memcache when using MapReduce


Problem description


I've created two MapReduce pipelines for uploading CSV files to create Categories and Products in bulk. Each product gets tied to a Category through a KeyProperty. The Category and Product models are built on ndb.Model, so based on the documentation, I would think they'd be automatically cached in Memcache when retrieved from the Datastore.

I've run these scripts on the server to upload 30 categories and, afterward, 3000 products. All the data appears in the Datastore as expected.

However, it doesn't seem like the Product upload is using Memcache to get the Categories. When I check the Memcache viewer in the portal, the hit count is around 180 and the miss count around 60. If I'm uploading 3000 products and retrieving the category each time, shouldn't I have around 3000 hits + misses from fetching the category (i.e., Category.get_by_id(category_id))? And likely 3000 more misses from attempting to retrieve the existing product before creating a new one (the algorithm handles both entity creation and updates).

Here's the relevant product mapping function, which takes in a line from the CSV file in order to create or update the product:

def product_bulk_import_map(data):
    """Product Bulk Import map function."""

    result = {"status" : "CREATED"}
    product_data = data

    try:
        # parse input parameter tuple
        byteoffset, line_data = data

        # parse base product data
        product_data = [x for x in csv.reader([line_data])][0]
        (p_id, c_id, p_type, p_description) = product_data

        # process category
        category = Category.get_by_id(c_id)
        if category is None:
            raise Exception(product_import_error_messages["category"] % c_id)

        # store in datastore
        product = Product.get_by_id(p_id)
        if product is not None:
            result["status"] = "UPDATED"
            product.category = category.key
            product.product_type = p_type
            product.description = p_description
        else:
            product = Product(
                id = p_id,
                category = category.key,
                product_type = p_type,
                description = p_description
            )
        product.put()
        result["entity"] = product.to_dict()
    except Exception as e:
        # catch any exceptions, and note failure in output
        result["status"] = "FAILED"
        result["entity"] = str(e)

    # return results
    yield (str(product_data), result)

Solution

MapReduce intentionally disables memcache for NDB.

See mapreduce/util.py ln 373, _set_ndb_cache_policy() (as of 2015-05-01):

def _set_ndb_cache_policy():
  """Tell NDB to never cache anything in memcache or in-process.

  This ensures that entities fetched from Datastore input_readers via NDB
  will not bloat up the request memory size and Datastore Puts will avoid
  doing calls to memcache. Without this you get soft memory limit exits,
  which hurts overall throughput.
  """
  ndb_ctx = ndb.get_context()
  ndb_ctx.set_cache_policy(lambda key: False)
  ndb_ctx.set_memcache_policy(lambda key: False)
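
You can confirm this from inside a mapper by inspecting the context's policy directly (a quick diagnostic sketch, not part of the original answer; it assumes get_memcache_policy(), the getter paired with set_memcache_policy(), and a hypothetical Category id):

import logging

from google.appengine.ext import ndb

ctx = ndb.get_context()
policy = ctx.get_memcache_policy()

# Under MapReduce this is the `lambda key: False` installed above, so the
# policy rejects every key -- consistent with the near-zero hit counts.
sample_key = ndb.Key("Category", "some-category-id")  # hypothetical key
logging.info("memcache allowed for %s: %s", sample_key, policy(sample_key))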

You can force get_by_id() and put() to use memcache, e.g.:

product = Product.get_by_id(p_id, use_memcache=True)
...
product.put(use_memcache=True)
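
use_memcache is one of NDB's standard per-call context options, and use_cache (the in-process cache) can be passed the same way; keeping it off preserves the memory protection the MapReduce policy was aiming at. A minimal sketch, assuming the Category and Product models from the question:

# Sketch: re-enable memcache per call while keeping the in-process
# cache off, matching MapReduce's memory rationale.
category = Category.get_by_id(c_id, use_memcache=True, use_cache=False)
product = Product.get_by_id(p_id, use_memcache=True, use_cache=False)
product.put(use_memcache=True, use_cache=False)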

Alternatively, you can modify the NDB context if you are batching puts together with mapreduce.operation. However, I don't know enough to say whether this has other undesired effects:

ndb_ctx = ndb.get_context()
ndb_ctx.set_memcache_policy(lambda key: True)
...
yield operation.db.Put(product)

As for the docstring's mention of "soft memory limit exits", I don't understand why those would occur if only memcache were enabled (i.e., no in-context cache).

It actually seems like you want memcache to be enabled for puts; otherwise, your app ends up reading stale data from NDB's memcache after your mapper has modified the underlying data.
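
A middle ground (my own sketch, not from the original answer): NDB policy functions are called with the entity's Key, so the policy can be gated on key.kind(). That caches the 30 Categories that every one of the 3000 map calls reads, while Products keep MapReduce's default; since Products are then never memcached, the stale-read concern above doesn't apply to them:

ndb_ctx = ndb.get_context()
# Cache Category entities only; Product gets and puts bypass memcache,
# so no stale Product entry can be read back later.
ndb_ctx.set_memcache_policy(lambda key: key.kind() == "Category")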
