Apache Beam Google Datastore ReadFromDatastore entity protobuf


Problem description


I am trying to use Apache Beam's Google Datastore API to read entities with ReadFromDatastore:

p = beam.Pipeline(options=options)
(p
 | 'Read from Datastore' >> ReadFromDatastore(gcloud_options.project, query)
 | 'reformat'            >> beam.Map(reformat)
 | 'Write To Datastore'  >> WriteToDatastore(gcloud_options.project))

The object that gets passed to my reformat function is of type

google.cloud.proto.datastore.v1.entity_pb2.Entity

It is in protobuf format, which is hard to modify or read.

I think I can convert an entity_pb2.Entity to a dict with

entity = dict(google.cloud.datastore.helpers._property_tuples(entity_pb))

But for some reason, importing the following two libraries together raises an error:

import google.cloud.datastore.helpers  
from apache_beam.io.gcp.datastore.v1.datastoreio import ReadFromDatastore 

Error:

Traceback (most recent call last):
  File "/home/nburn42/MotoGarage/MotoGarage/MotoGarageBackgroundJobs/format_data.py", line 16, in <module>
    import google.cloud.datastore.helpers
  File "/usr/local/lib/python2.7/dist-packages/google/cloud/datastore/__init__.py", line 57, in <module>
    from google.cloud.datastore.batch import Batch
  File "/usr/local/lib/python2.7/dist-packages/google/cloud/datastore/batch.py", line 24, in <module>
    from google.cloud.datastore import helpers
  File "/usr/local/lib/python2.7/dist-packages/google/cloud/datastore/helpers.py", line 29, in <module>
    from google.cloud.grpc.datastore.v1 import entity_pb2 as _entity_pb2
  File "/usr/local/lib/python2.7/dist-packages/google/cloud/grpc/datastore/v1/entity_pb2.py", line 28, in <module>
    dependencies=[google_dot_api_dot_annotations__pb2.DESCRIPTOR,google_dot_protobuf_dot_struct__pb2.DESCRIPTOR,google_dot_protobuf_dot_timestamp__pb2.DESCRIPTOR,google_dot_type_dot_latlng__pb2.DESCRIPTOR,])
  File "/usr/local/lib/python2.7/dist-packages/google/protobuf/descriptor.py", line 824, in __new__
    return _message.default_pool.AddSerializedFile(serialized_pb)
TypeError: Couldn't build proto file into descriptor pool!
Invalid proto descriptor for file "google/cloud/grpc/datastore/v1/entity.proto":
  google.datastore.v1.PartitionId.project_id: "google.datastore.v1.PartitionId.project_id" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.PartitionId.namespace_id: "google.datastore.v1.PartitionId.namespace_id" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.PartitionId: "google.datastore.v1.PartitionId" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Key.partition_id: "google.datastore.v1.Key.partition_id" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Key.path: "google.datastore.v1.Key.path" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Key.PathElement.id_type: "google.datastore.v1.Key.PathElement.id_type" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Key.PathElement.kind: "google.datastore.v1.Key.PathElement.kind" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Key.PathElement.id: "google.datastore.v1.Key.PathElement.id" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Key.PathElement.name: "google.datastore.v1.Key.PathElement.name" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Key.PathElement: "google.datastore.v1.Key.PathElement" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Key: "google.datastore.v1.Key" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.ArrayValue.values: "google.datastore.v1.ArrayValue.values" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.ArrayValue: "google.datastore.v1.ArrayValue" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Value.value_type: "google.datastore.v1.Value.value_type" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Value.null_value: "google.datastore.v1.Value.null_value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Value.boolean_value: "google.datastore.v1.Value.boolean_value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Value.integer_value: "google.datastore.v1.Value.integer_value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Value.double_value: "google.datastore.v1.Value.double_value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Value.timestamp_value: "google.datastore.v1.Value.timestamp_value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Value.key_value: "google.datastore.v1.Value.key_value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Value.string_value: "google.datastore.v1.Value.string_value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Value.blob_value: "google.datastore.v1.Value.blob_value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Value.geo_point_value: "google.datastore.v1.Value.geo_point_value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Value.entity_value: "google.datastore.v1.Value.entity_value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Value.array_value: "google.datastore.v1.Value.array_value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Value.meaning: "google.datastore.v1.Value.meaning" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Value.exclude_from_indexes: "google.datastore.v1.Value.exclude_from_indexes" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Value: "google.datastore.v1.Value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Entity.key: "google.datastore.v1.Entity.key" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Entity.properties: "google.datastore.v1.Entity.properties" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Entity.PropertiesEntry.key: "google.datastore.v1.Entity.PropertiesEntry.key" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Entity.PropertiesEntry.value: "google.datastore.v1.Entity.PropertiesEntry.value" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Entity.PropertiesEntry: "google.datastore.v1.Entity.PropertiesEntry" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Entity: "google.datastore.v1.Entity" is already defined in file "google/cloud/proto/datastore/v1/entity.proto".
  google.datastore.v1.Key.partition_id: "google.datastore.v1.PartitionId" seems to be defined in "google/cloud/proto/datastore/v1/entity.proto", which is not imported by "google/cloud/grpc/datastore/v1/entity.proto".  To use it here, please add the necessary import.
  google.datastore.v1.Key.path: "google.datastore.v1.Key.PathElement" seems to be defined in "google/cloud/proto/datastore/v1/entity.proto", which is not imported by "google/cloud/grpc/datastore/v1/entity.proto".  To use it here, please add the necessary import.
  google.datastore.v1.ArrayValue.values: "google.datastore.v1.Value" seems to be defined in "google/cloud/proto/datastore/v1/entity.proto", which is not imported by "google/cloud/grpc/datastore/v1/entity.proto".  To use it here, please add the necessary import.
  google.datastore.v1.Value.key_value: "google.datastore.v1.Key" seems to be defined in "google/cloud/proto/datastore/v1/entity.proto", which is not imported by "google/cloud/grpc/datastore/v1/entity.proto".  To use it here, please add the necessary import.
  google.datastore.v1.Value.entity_value: "google.datastore.v1.Entity" seems to be defined in "google/cloud/proto/datastore/v1/entity.proto", which is not imported by "google/cloud/grpc/datastore/v1/entity.proto".  To use it here, please add the necessary import.
  google.datastore.v1.Value.array_value: "google.datastore.v1.ArrayValue" seems to be defined in "google/cloud/proto/datastore/v1/entity.proto", which is not imported by "google/cloud/grpc/datastore/v1/entity.proto".  To use it here, please add the necessary import.
  google.datastore.v1.Entity.PropertiesEntry.value: "google.datastore.v1.Value" seems to be defined in "google/cloud/proto/datastore/v1/entity.proto", which is not imported by "google/cloud/grpc/datastore/v1/entity.proto".  To use it here, please add the necessary import.
  google.datastore.v1.Entity.key: "google.datastore.v1.Key" seems to be defined in "google/cloud/proto/datastore/v1/entity.proto", which is not imported by "google/cloud/grpc/datastore/v1/entity.proto".  To use it here, please add the necessary import.
  google.datastore.v1.Entity.properties: "google.datastore.v1.Entity.PropertiesEntry" seems to be defined in "google/cloud/proto/datastore/v1/entity.proto", which is not imported by "google/cloud/grpc/datastore/v1/entity.proto".  To use it here, please add the necessary import.

Is there something I can do to convert an entity_pb2.Entity to something usable?
Is ReadFromDatastore just too new for real use right now?
Is there another approach I should be using?

Thanks,
Nathan

Solution

An alternative (and easier) way to specify the query is the following:

import apache_beam as beam
from google.cloud import datastore
from google.cloud.datastore import query as datastore_query
from apache_beam.io.gcp.datastore.v1.datastoreio import ReadFromDatastore

# pipeline_options, project and kind are placeholders you define yourself.
p = beam.Pipeline(options=pipeline_options)
ds_client = datastore.Client(project=project)
query = ds_client.query(kind=kind)
# possible filter: query.add_filter('column', 'operator', criteria)
# query.add_filter('age', '>', 18)
# query.add_filter('name', '=', "John")
# _pb_from_query is a private helper that converts the client query
# into the protobuf query that ReadFromDatastore expects.
query = datastore_query._pb_from_query(query)

p | 'ReadFromDatastore' >> ReadFromDatastore(project=project, query=query)
p.run().wait_until_finish()
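
On the conversion question: as described in the question, ReadFromDatastore emits entity_pb2.Entity messages. Below is a minimal sketch of flattening one into a dict. It does not import google.cloud.datastore.helpers and relies only on the standard protobuf WhichOneof method, together with the value_type oneof and the field names that appear in the traceback above. The function names value_to_python and entity_to_dict are made up for this illustration, and timestamp, key and geo-point values are deliberately left as protobuf messages.

```python
def value_to_python(value_pb):
    """Convert one Datastore Value message to a plain Python object."""
    # WhichOneof returns the name of the populated field in the
    # `value_type` oneof, or None when no field is set.
    field = value_pb.WhichOneof('value_type')
    if field is None or field == 'null_value':
        return None
    raw = getattr(value_pb, field)
    if field == 'array_value':
        # ArrayValue holds a repeated `values` field of Value messages.
        return [value_to_python(item) for item in raw.values]
    if field == 'entity_value':
        # Nested entities are flattened recursively.
        return entity_to_dict(raw)
    # Scalars come back as Python values; timestamp_value, key_value and
    # geo_point_value are returned as protobuf messages, unconverted.
    return raw


def entity_to_dict(entity_pb):
    """Flatten an Entity message's `properties` map into a dict."""
    return {name: value_to_python(value)
            for name, value in entity_pb.properties.items()}
```

A step such as beam.Map(entity_to_dict) could then follow the read. This is a read-side convenience only: the key is dropped, and the resulting dicts would have to be turned back into protobuf entities before being handed to WriteToDatastore.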

When submitting the job to the DataflowRunner (in the cloud), make sure your local requirements match the setup.py file you send to Google Cloud. In my experience, you must install Apache Beam 2.1.0 on your local machine and then specify the same version in your setup.py file for the job to work on the cloud workers.
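
One way to see which versions are installed locally before pinning them in setup.py is sketched below. It assumes Python 3.8 or newer for importlib.metadata (the traceback in the question comes from Python 2.7, where `pip freeze` is the closer equivalent). The helper name installed_versions and the list of distribution names are illustrative assumptions, not part of the original answer.

```python
from importlib import metadata  # standard library, Python 3.8+


def installed_versions(distributions):
    """Map each distribution name to its installed version, or None."""
    versions = {}
    for name in distributions:
        try:
            versions[name] = metadata.version(name)
        except metadata.PackageNotFoundError:
            # The distribution is not installed in this environment.
            versions[name] = None
    return versions


if __name__ == '__main__':
    # Assumed distribution names; adjust them to your environment.
    names = ['apache-beam', 'google-cloud-datastore', 'protobuf']
    for name, version in installed_versions(names).items():
        if version is None:
            print('{} (not installed)'.format(name))
        else:
            print('{}=={}'.format(name, version))
```

The printed name==version lines are in the form accepted by install_requires in setup.py, so the local and worker environments can be kept on the same versions.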
