How can I JSON serialize an object from Google's Natural Language API? (No __dict__ attribute)

Problem description

I'm using the Google Natural Language API for a project that tags text with sentiment analysis. I want to store the NL results as JSON. If I make a direct HTTP request to Google, the response comes back as JSON.

However, when I use the provided Python client library, an object is returned instead, and that object is not directly JSON serializable.
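For comparison, the direct HTTP route can be sketched with only the Python 3 standard library. This is a minimal sketch, not what the code below does: it assumes the v1 REST endpoint documents:analyzeSentiment (the client library in the question targets v1beta2) and an OAuth 2.0 access token supplied by the caller. The names build_sentiment_request and analyze_sentiment_json are hypothetical helpers.

```python
import json
import urllib.request

# Assumed: the v1 REST endpoint (the question's client library uses v1beta2).
ANALYZE_SENTIMENT_URL = "https://language.googleapis.com/v1/documents:analyzeSentiment"


def build_sentiment_request(text, access_token):
    """Build (but do not send) a POST request for documents:analyzeSentiment."""
    body = {
        "document": {"type": "PLAIN_TEXT", "content": text},
        "encodingType": "UTF32",
    }
    return urllib.request.Request(
        ANALYZE_SENTIMENT_URL,
        data=json.dumps(body).encode("utf-8"),
        headers={
            "Content-Type": "application/json",
            # Assumed: an OAuth 2.0 access token obtained elsewhere by the caller.
            "Authorization": "Bearer " + access_token,
        },
        method="POST",
    )


def analyze_sentiment_json(text, access_token):
    """Send the request; the body is already JSON, so json.load() returns a dict."""
    with urllib.request.urlopen(build_sentiment_request(text, access_token)) as resp:
        return json.load(resp)
```

Only analyze_sentiment_json() touches the network; the dict it returns can be stored as-is, which is the behaviour the client library does not give you directly.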

Here is a sample of my code:

import os
import sys
import oauth2client.client
from google.cloud.gapic.language.v1beta2 import enums, language_service_client
from google.cloud.proto.language.v1beta2 import language_service_pb2

class LanguageReader:
    # class that parses, stores and reports language data from text

    def __init__(self, content=None):

        try:
            # attempts to authenticate credentials from the env variable
            oauth2client.client.GoogleCredentials.get_application_default()
        except oauth2client.client.ApplicationDefaultCredentialsError:
            print("=== ERROR: Google credentials could not be authenticated! ===")
            # .get() avoids a KeyError when the variable is not set at all
            print("Current environment variable for this process is: {}".format(os.environ.get('GOOGLE_APPLICATION_CREDENTIALS')))
            print("Run:")
            print("   $ export GOOGLE_APPLICATION_CREDENTIALS=/YOUR_PATH_HERE/YOUR_JSON_KEY_HERE.json")
            print("to set the authentication credentials manually")
            sys.exit(1)  # non-zero status signals the failure

        self.language_client = language_service_client.LanguageServiceClient()
        self.document = language_service_pb2.Document()
        self.document.type = enums.Document.Type.PLAIN_TEXT
        self.encoding = enums.EncodingType.UTF32

        self.results = None

        if content is not None:
            self.read_content(content)

    def read_content(self, content):
        self.document.content = content
        # call the API once and keep the AnalyzeSentimentResponse message
        self.results = self.language_client.analyze_sentiment(self.document, self.encoding)

Now if you were to run:

sample_text = "I love R&B music. Marvin Gaye is the best. 'What's Going On' is one of my favorite songs. It was so sad when Marvin Gaye died."
resp = LanguageReader(sample_text).results
print(resp)

You would get:

document_sentiment {
  magnitude: 2.40000009537
  score: 0.40000000596
}
language: "en"
sentences {
  text {
    content: "I love R&B music."
  }
  sentiment {
    magnitude: 0.800000011921
    score: 0.800000011921
  }
}
sentences {
  text {
    content: "Marvin Gaye is the best."
    begin_offset: 18
  }
  sentiment {
    magnitude: 0.800000011921
    score: 0.800000011921
  }
}
sentences {
  text {
    content: "\'What\'s Going On\' is one of my favorite songs."
    begin_offset: 43
  }
  sentiment {
    magnitude: 0.40000000596
    score: 0.40000000596
  }
}
sentences {
  text {
    content: "It was so sad when Marvin Gaye died."
    begin_offset: 90
  }
  sentiment {
    magnitude: 0.20000000298
    score: -0.20000000298
  }
}

This is not JSON: it is the protobuf text format that print() produces for an instance of google.cloud.proto.language.v1beta2.language_service_pb2.AnalyzeSentimentResponse. The object has no __dict__ attribute, and json.dumps() cannot serialize it (it raises a TypeError).
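The failure is easy to reproduce with any protobuf message. The sketch below assumes the protobuf package is installed and uses google.protobuf.struct_pb2.Struct as a stand-in, since building a real AnalyzeSentimentResponse requires the Language client library; is_json_serializable is a hypothetical helper.

```python
import json

from google.protobuf.struct_pb2 import Struct

# Stand-in message (assumption): any protobuf message behaves the same way here.
message = Struct()
message.update({"language": "en", "document_sentiment": {"magnitude": 2.4, "score": 0.4}})


def is_json_serializable(obj):
    """Return True if json.dumps() accepts obj without a custom default hook."""
    try:
        json.dumps(obj)
    except TypeError:
        # By default json.dumps() only encodes built-in types such as
        # dict, list, tuple, str, int, float, bool and None.
        return False
    return True


print(is_json_serializable(message))
```

A plain dict with the same content passes the check, which is why converting the message to a dict first is the usual fix.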

How can I either specify that the response should be in JSON or serialize the object to JSON?

Solution

Edit: @Zach pointed me to Google's protobuf data interchange format (the response object is a protobuf message). The preferred option seems to be the helpers in the google.protobuf.json_format module:

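from google.protobuf.json_format import MessageToDict, MessageToJson

# inside LanguageReader, once self.results holds the response message
self.dict = MessageToDict(self.results)  # plain Python dict, usable with json.dumps()
self.json = MessageToJson(self.results)  # JSON-formatted string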
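A minimal sketch of both helpers plus the reverse direction, again using a Struct as a stand-in for self.results (an assumption; with the real response the field names come out in lowerCamelCase, e.g. documentSentiment). The proto_default hook is a hypothetical helper that lets json.dumps() handle a message nested inside an ordinary dict.

```python
import json

from google.protobuf import json_format
from google.protobuf.message import Message
from google.protobuf.struct_pb2 import Struct


def proto_default(obj):
    """json.dumps() hook: convert any protobuf message it meets into a dict."""
    if isinstance(obj, Message):
        return json_format.MessageToDict(obj)
    raise TypeError("Object of type %s is not JSON serializable" % type(obj).__name__)


# Stand-in for the AnalyzeSentimentResponse held in self.results.
stand_in = Struct()
stand_in.update({"language": "en", "document_sentiment": {"magnitude": 2.4, "score": 0.4}})

as_dict = json_format.MessageToDict(stand_in)  # plain dict
as_json = json_format.MessageToJson(stand_in)  # JSON string (indented by default)

# A message nested in a larger record, serialized via the default hook.
record = json.dumps({"text": "I love R&B music.", "nl": stand_in}, default=proto_default)

# Reverse direction: Parse() fills an empty message of the expected type.
restored = json_format.Parse(as_json, Struct())
```

To reload stored results of the real API you would pass a fresh AnalyzeSentimentResponse() to json_format.Parse() (or json_format.ParseDict() for a dict) instead of Struct().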

From the docstring:

MessageToJson(message, including_default_value_fields=False, preserving_proto_field_name=False)
    Converts protobuf message to JSON format.

    Args:
      message: The protocol buffers message instance to serialize.
      including_default_value_fields: If True, singular primitive fields,
          repeated fields, and map fields will always be serialized.  If
          False, only serialize non-empty fields.  Singular message fields
          and oneof fields are not affected by this option.
      preserving_proto_field_name: If True, use the original proto field
          names as defined in the .proto file. If False, convert the field
          names to lowerCamelCase.

    Returns:
      A string containing the JSON formatted protocol buffer message.
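To see what preserving_proto_field_name changes, the sketch below uses descriptor_pb2.FieldDescriptorProto purely because it ships with protobuf and has a snake_case field (type_name); the field values are made up. It sticks to preserving_proto_field_name because newer protobuf releases replace including_default_value_fields with always_print_fields_with_no_presence, so check the docstring of the version you have installed.

```python
from google.protobuf import json_format
from google.protobuf.descriptor_pb2 import FieldDescriptorProto

# Stand-in message with a snake_case field name; values are illustrative only.
msg = FieldDescriptorProto(name="begin_offset", type_name="Sentence")

# Default: field names are converted to lowerCamelCase (type_name -> typeName).
camel = json_format.MessageToDict(msg)

# preserving_proto_field_name=True keeps the names from the .proto file.
snake = json_format.MessageToDict(msg, preserving_proto_field_name=True)

print(camel)
print(snake)
```

The same keyword is accepted by MessageToJson, so stored JSON can keep begin_offset and document_sentiment spelled as in the proto definition.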
