Patentsview API Python 3.4


Question

I am a beginner in Python and am currently working on a small project. I want to build a dynamic patent-research script for patentsview.org.

Here is my code:

import urllib.parse
import urllib.request


# http://www.patentsview.org/api/patents/query?q={"_and":
# [{"inventor_last_name":author},{"_text_any":{"patent_title":[title]}}]}&o=
# {"matched_subentities_only": "true"}
author = "Jobs"
andreq = "_and"
invln = "inventor_last_name"
text = "_text_any"
patent = "patent_title"
match = "matched_subentities_only"
true = "true"
title = "computer"
urlbasic = "http://www.patentsview.org/api/patents/query"
patentall = {patent:title}
textall = {text:patentall}
invall = {invln:author}
andall = invall.copy()
andall.update(textall)
valuesq = {andreq:andall}
valuesqand = {andreq:andall}
valuesq = {andreq:valuesqand}
valueso = {match:true}

#########
url = "http://www.patentsview.org/api/patents/query"
values = {"q":valuesq,
          "o":valueso}
print(values)


data = urllib.parse.urlencode(values)
print(data)
############
data = data.encode("UTF-8")
print(data)
req = urllib.request.Request(url,data)
resp = urllib.request.urlopen(req)
respData = resp.read()
saveFile = open("patents.txt", "w")
saveFile.write(str(respData))
saveFile.close()

I think I have the right start for the dynamic URL, but the encoding seems to give me an HTTP Error 400: Bad Request. If I don't encode, the URL looks like www.somethingsomething.org/o:{....}, which obviously produces an error. Here is the error:

Traceback (most recent call last):
  File "C:/Users/Max/PycharmProjects/KlayerValter/testen.py", line 38, in <module>
    resp = urllib.request.urlopen(req)
  File "C:\Python34\lib\urllib\request.py", line 161, in urlopen
    return opener.open(url, data, timeout)
  File "C:\Python34\lib\urllib\request.py", line 469, in open
    response = meth(req, response)
  File "C:\Python34\lib\urllib\request.py", line 579, in http_response
    'http', request, response, code, msg, hdrs)
  File "C:\Python34\lib\urllib\request.py", line 507, in error
    return self._call_chain(*args)
  File "C:\Python34\lib\urllib\request.py", line 441, in _call_chain
    result = func(*args)
  File "C:\Python34\lib\urllib\request.py", line 587, in http_error_default
    raise HTTPError(req.full_url, code, msg, hdrs, fp)
urllib.error.HTTPError: HTTP Error 400: Bad Request

Process finished with exit code 1

If I encode, I get the same error, since all the brackets get converted. The patentsview API works as follows:

http://www.patentsview.org/api/patents/query?q={"_or":[{"_and":[{"inventor_last_name":"Whitney"},{"_text_phrase":{"patent_title":"cotton gin"}}]},{"_and":[{"inventor_last_name":"Hopper"},{"_text_all":{"patent_title":"COBOL"}}]}]}

To build the query dynamically, I had to come up with all those names. If there is a better solution, please help.

Best regards.

Recommended answer

The API accepts and returns JSON data, so you should use json.dumps to encode your POST data. Then use json.loads on the response if you want a dictionary, or just write it to a file.

from urllib.request import Request, urlopen
import json

url = "http://www.patentsview.org/api/patents/query"
author = "Jobs"
title = "computer"
data = {
    'q':{
        "_and":[
            {"inventor_last_name":author},
            {"_text_any":{"patent_title":title}}
        ]
    }, 
    'o':{"matched_subentities_only": "true"}
}
resp = urlopen(Request(url, json.dumps(data).encode()))
data = resp.read()
#data = json.loads(data)
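
To see why the original attempt produced a 400, here is a minimal offline sketch (no request is sent). It assumes the GET form shown in the question, where q and o each carry a JSON string; the helper is_json is a hypothetical name introduced only for this illustration. urlencode calls str() on each value, so a nested dict goes out as its Python repr with single quotes, which is not valid JSON. Note that the original query also nested a second "_and" around a dict, whereas the API examples put a list under "_and".

```python
import json
from urllib.parse import urlencode, parse_qs

query = {"_and": [{"inventor_last_name": "Jobs"},
                  {"_text_any": {"patent_title": "computer"}}]}
options = {"matched_subentities_only": "true"}


def is_json(text):
    """Hypothetical helper: report whether text parses as JSON."""
    try:
        json.loads(text)
        return True
    except ValueError:
        return False


# urlencode() applies str() to each value, so the nested dict is sent as
# its Python repr (single quotes) instead of JSON.
broken = urlencode({"q": query, "o": options})
broken_q = parse_qs(broken)["q"][0]

# Serializing each value with json.dumps() first keeps it valid JSON.
fixed = urlencode({"q": json.dumps(query), "o": json.dumps(options)})
fixed_q = parse_qs(fixed)["q"][0]

print(is_json(broken_q), is_json(fixed_q))

# Assumed GET form of the request, following the URL pattern in the question.
get_url = "http://www.patentsview.org/api/patents/query?" + fixed
print(get_url)
```

Decoding the query string again with parse_qs is only there to show what the server would receive for q in each case.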

As suggested by Christian, you could simply use requests; it's much better than urllib.

data = requests.post(url, json=data).json()


As for all those variables in your code, they compose a dictionary like the one below:

values = {"q":{andreq:{andreq:{invln:author, text:{patent:title}}}}, "o":{match:true}}

I don't see why you would go through all that trouble to build a dictionary, but I could be wrong. However, you could wrap your code in a function with author and title as arguments.
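
A minimal sketch of such a function, assuming the same query shape as the code above. The name build_patent_query is hypothetical, and the function only builds the payload; it sends nothing.

```python
import json


def build_patent_query(author, title):
    """Build the request payload for an inventor last name and a title keyword."""
    return {
        "q": {
            "_and": [
                {"inventor_last_name": author},
                {"_text_any": {"patent_title": title}},
            ]
        },
        "o": {"matched_subentities_only": "true"},
    }


payload = build_patent_query("Jobs", "computer")
# Pretty-print the payload to check its structure before sending it.
print(json.dumps(payload, indent=2))
```

The returned dictionary can be passed to json.dumps for urllib, or given directly to the json parameter of requests.post.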


With requests you don't have to call json.dumps on your data; just pass it through the json parameter. If you want to save the response content to a file, use the content (bytes) or text (str) attribute.

import requests

title = "computer" 
author = "Jobs" 
url = "http://www.patentsview.org/api/patents/query" 
data = { 
    "q":{ "_and":[ {"inventor_last_name":author}, {"_text_any":{"patent_title":title}}] }, 
    "o":{"matched_subentities_only":"true"} 
} 
resp = requests.post(url, json=data) 
with open("patents.txt", "w") as f:
    f.write(resp.text)
