How to filter datastore data before mapping to cloud storage using the MapReduce API?


Question

Regarding the code lab here, how can we filter datastore data within the MapReduce job instead of fetching every object of a given entity kind?

In the mapper pipeline definition below, the only input reader parameter is the entity kind to process, and I can't see any filter-type parameter in the InputReader class that could help.

output = yield mapreduce_pipeline.MapperPipeline(
    "Datastore Mapper %s" % entity_type,
    "main.datastore_map",
    "mapreduce.input_readers.DatastoreInputReader",
    output_writer_spec="mapreduce.output_writers.FileOutputWriter",
    params={
        "input_reader": {
            "entity_kind": entity_type,
        },
        "output_writer": {
            "filesystem": "gs",
            "gs_bucket_name": GS_BUCKET,
            "output_sharding": "none",
        },
    },
    shards=100)

Since Google BigQuery works better with a denormalized data model, it would be nice to build one table from several datastore entity kinds (JOINs), but I can't see how to do that either.

Recommended answer

Depending on your application, you might be able to solve this by passing a filters parameter, which is "an optional list of filters to apply to the query. Each filter is a tuple: (<property_name_as_str>, <query_operation_as_str>, <value>)."

So, in your input reader parameters:

"input_reader": {
    "entity_kind": entity_type,
    "filters": [("datastore_property", "=", 12345),
                ("another_datastore_property", ">", 200)],
}
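
A malformed entry in the filters list may only surface once the job starts, so it can help to check the tuple shape before building the pipeline parameters. The following is a minimal sketch under stated assumptions: `build_input_reader_params` is a hypothetical helper, not part of the MapReduce library, and the entity kind `"Greeting"` is a made-up example. It only checks that each filter is a (property name, operator, value) triple with string name and operator; it does not check which operators the library accepts.

```python
# Hypothetical helper (not part of the MapReduce library): builds the
# "input_reader" dict and checks each filter has the documented tuple shape.
def build_input_reader_params(entity_kind, filters=None):
    params = {"entity_kind": entity_kind}
    checked = []
    for f in filters or []:
        # Each filter should be (property_name, operation, value).
        if len(f) != 3:
            raise ValueError(
                "filter must be (property, operator, value): %r" % (f,))
        prop, op, value = f
        if not isinstance(prop, str) or not isinstance(op, str):
            raise ValueError(
                "property name and operator must be strings: %r" % (f,))
        checked.append((prop, op, value))
    if checked:
        params["filters"] = checked
    return params


# Example: "Greeting" is an assumed entity kind used for illustration only.
input_reader_params = build_input_reader_params(
    "Greeting",
    [("datastore_property", "=", 12345),
     ("another_datastore_property", ">", 200)],
)
```

The resulting dict could then be passed as the "input_reader" entry of the pipeline's params, in place of the hand-written literal.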
