How to filter datastore data before mapping to cloud storage using the MapReduce API?


Problem description

Regarding the code lab here, how can we filter datastore data within the mapreduce jobs rather than fetching all objects for a certain entity kind?

In the mapper pipeline definition below, the only input reader parameter is the entity kind to process, and I can't see any other filter-type parameter in the InputReader class that could help.

output = yield mapreduce_pipeline.MapperPipeline(
    "Datastore Mapper %s" % entity_type,
    "main.datastore_map",
    "mapreduce.input_readers.DatastoreInputReader",
    output_writer_spec="mapreduce.output_writers.FileOutputWriter",
    params={
        "input_reader": {
            "entity_kind": entity_type,
        },
        "output_writer": {
            "filesystem": "gs",
            "gs_bucket_name": GS_BUCKET,
            "output_sharding": "none",
        },
    },
    shards=100)
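For context, the "main.datastore_map" string above names the map handler, which is called once per entity and yields output lines for FileOutputWriter to write. A minimal sketch of such a handler (the property name is hypothetical, and the exact shape of the entity object depends on the input reader used):

    import json

    def datastore_map(entity):
        # With DatastoreInputReader, each call receives one entity of the
        # configured kind. "datastore_property" is a hypothetical field;
        # substitute the properties of your own entity kind.
        record = {"datastore_property": getattr(entity, "datastore_property", None)}
        # FileOutputWriter writes each yielded string, so emit one
        # JSON record per line.
        yield json.dumps(record) + "\n"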

Since Google BigQuery plays better with a denormalized data model, it would be nice to be able to build one table from several datastore entity kinds (JOINs), but I can't see how to do that either.

Solution

Depending on your application, you might be able to solve this by passing a "filters" parameter, which is "an optional list of filters to apply to the query. Each filter is a tuple: (<property_name_as_str>, <query_operation_as_str>, <value>)."

So, in your input reader parameters:

"input_reader":{
          "entity_kind": entity_type,
          "filters": [("datastore_property", "=", 12345),
                      ("another_datastore_property", ">", 200)]
}
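Putting it together, the pipeline definition from the question only needs the extra "filters" entry in its input reader params (the property names and values below are placeholders):

    output = yield mapreduce_pipeline.MapperPipeline(
        "Datastore Mapper %s" % entity_type,
        "main.datastore_map",
        "mapreduce.input_readers.DatastoreInputReader",
        output_writer_spec="mapreduce.output_writers.FileOutputWriter",
        params={
            "input_reader": {
                "entity_kind": entity_type,
                # Only entities matching every filter tuple are mapped.
                "filters": [("datastore_property", "=", 12345),
                            ("another_datastore_property", ">", 200)],
            },
            "output_writer": {
                "filesystem": "gs",
                "gs_bucket_name": GS_BUCKET,
                "output_sharding": "none",
            },
        },
        shards=100)

As with ordinary datastore queries, filters can only be applied to indexed properties.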

