Getting AttributeError error 'str' object has no attribute 'get'


Problem Description

I am getting an error while working with a JSON response:

Error: AttributeError: 'str' object has no attribute 'get'

What could be the issue?

I am also getting the following errors for the rest of the values:

TypeError: 'builtin_function_or_method' object is not subscriptable

'Phone': value['_source']['primaryPhone'], KeyError: 'primaryPhone'

# -*- coding: utf-8 -*-
import scrapy
import json


class MainSpider(scrapy.Spider):
    name = 'main'
    start_urls = ['https://experts.expcloud.com/api4/std?searchterms=AB&size=216&from=0']

    def parse(self, response):

        resp = json.loads(response.body)
        values = resp['hits']['hits']

        for value in values:

            yield {
                'Full Name': value['_source']['fullName'],
                'Phone': value['_source']['primaryPhone'],
                "Email": value['_source']['primaryEmail'],
                "City": value.get['_source']['city'],
                "Zip Code": value.get['_source']['zipcode'],
                "Website": value['_source']['websiteURL'],
                "Facebook": value['_source']['facebookURL'],
                "LinkedIn": value['_source']['LinkedIn_URL'],
                "Twitter": value['_source']['Twitter'],
                "BIO": value['_source']['Bio']
            }

Recommended Answer

It's nested deeper than what you think it is. That's why you're getting an error.

import scrapy
import json


class MainSpider(scrapy.Spider):
    name = 'test'
    start_urls = ['https://experts.expcloud.com/api4/std?searchterms=AB&size=216&from=0']

    def parse(self, response):
        resp = json.loads(response.body)
        # resp['hits']['hits'] is a list; each element keeps the record under its '_source' key
        values = resp['hits']['hits']

        for value in values:
            yield {
                'Full Name': value['_source']['fullName'],
                'Primary Phone': value['_source']['primaryPhone']
            }

Explanation

The resp variable creates a Python dictionary, but there is no resp['hits']['hits']['fullName'] within this JSON data. The data you're looking for, for fullName, is actually resp['hits']['hits'][i]['_source']['fullName'], i being a number, because resp['hits']['hits'] is a list.

resp['hits'] is a dictionary, and therefore the values variable is fine. But resp['hits']['hits'] is a list, so you can't call .get() on it, and it only accepts numbers within [], not strings. Hence the error.
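
As a minimal sketch of the shape involved (the key names come from the real response, but the values here are made up for illustration):

resp = {
    'hits': {
        'hits': [                                   # a list, so it is indexed by integers
            {'_source': {'fullName': 'Jane Doe', 'primaryPhone': '555-0100'}},
            {'_source': {'fullName': 'John Roe'}},  # some records are missing keys
        ]
    }
}

print(resp['hits']['hits'][0]['_source']['fullName'])  # Jane Doe
# resp['hits']['hits'].get('fullName')  # would fail: a list has no .get() method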

  1. Use response.json() instead of json.loads(response.body); since Scrapy v2.2, Scrapy has built-in support for JSON. Behind the scenes it already imports json. See the sketch below.
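
For example, the parse method can then be trimmed to something like this (a minimal sketch assuming Scrapy >= 2.2):

def parse(self, response):
    resp = response.json()  # same result as json.loads(response.body), no json import needed
    values = resp['hits']['hits']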

Also check the JSON data itself; I used requests for ease and just kept following the nesting down until I got to the data you needed.
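
Something along these lines works for poking at the structure (a rough sketch, assuming the endpoint answers a plain GET request with the same JSON):

import requests

url = 'https://experts.expcloud.com/api4/std?searchterms=AB&size=216&from=0'
resp = requests.get(url).json()

print(type(resp['hits']['hits']))                 # <class 'list'>
print(resp['hits']['hits'][0]['_source'].keys())  # shows which fields a record actually has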

Yielding a dictionary is fine for this type of data because it's well structured, but for any other data that needs modifying or changing, or that is wrong in places, use either an Items dictionary or an ItemLoader. Those two ways of yielding an output are far more flexible than yielding a plain dictionary. I almost never yield a dictionary; the only time is when the data is highly structured.

Updated Code

Looking at the JSON data, there is quite a lot of missing data. That's part of web scraping: you will run into errors like this. Here we use a try/except block for when we get a KeyError, which means Python couldn't find the key associated with a value. We have to handle that exception, which we do here by yielding a placeholder string such as 'No XXX'.
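
For a single field the pattern looks like this (taken from the spider below; for a plain dict, value['_source'].get('primaryPhone', 'No Phone number') would do the same job):

try:
    item['Phone'] = value['_source']['primaryPhone']
except KeyError:
    item['Phone'] = 'No Phone number'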

Once you start getting gaps like this, it's better to consider an Items dictionary or ItemLoaders.

Now it's worth looking at the Scrapy docs about Items. Essentially Scrapy does two things: it extracts data from websites, and it provides a mechanism for storing that data. The way it does this is by storing it in a dictionary-like object called an Item. The code isn't much different from yielding a dictionary, but an Item lets you manipulate the extracted data more easily with the extra things Scrapy can do. You first need to edit your items.py with the fields you want. We create a class called TestItem and define each field using scrapy.Field(). We can then import this class in our spider script.

import scrapy


class TestItem(scrapy.Item):
    # define the fields for your item here like:
    # name = scrapy.Field()
    full_name = scrapy.Field()
    Phone = scrapy.Field()
    Email = scrapy.Field()
    City = scrapy.Field()
    Zip_code = scrapy.Field()
    Website = scrapy.Field()
    Facebook = scrapy.Field()
    Linkedin = scrapy.Field()
    Twitter = scrapy.Field()
    Bio = scrapy.Field()

Here we're specifying what we want the fields to be. Unfortunately you can't use a string with spaces, which is why the full name field is full_name. scrapy.Field() creates each field of the item for us.
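
As a quick sketch of how the resulting item behaves (assuming the TestItem class above is in scope):

item = TestItem()
item['full_name'] = 'Jane Doe'  # keys must match the declared fields
print(dict(item))               # {'full_name': 'Jane Doe'}
# item['middle_name'] = 'X'     # would raise KeyError, because that field was never declared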

We import this item class into our spider script with from ..items import TestItem. The ..items part means we're taking items.py from the parent folder of the spider script, and from it we import the class TestItem. That way our spider can populate the item with our JSON data.

Note that just before the for loop we instantiate the class TestItem with item = TestItem(). Instantiating means calling the class; in this case it gives us a dictionary-like object. We create the item first and then populate it with keys and values; you have to do this before you add your keys and values, as you can see within the for loop.

import scrapy
from ..items import TestItem


class MainSpider(scrapy.Spider):
    name = 'test'
    start_urls = ['https://experts.expcloud.com/api4/std?searchterms=AB&size=216&from=0']

    def parse(self, response):
        # response.json() parses the body for us (Scrapy >= 2.2), so json.loads is not needed
        values = response.json()['hits']['hits']
        item = TestItem()
        for value in values:
            try:
                item['full_name'] = value['_source']['fullName']
            except KeyError:
                item['full_name'] = 'No Name'
            try:
                item['Phone'] = value['_source']['primaryPhone']
            except KeyError:
                item['Phone'] = 'No Phone number'
            try:
                item['Email'] = value['_source']['primaryEmail']
            except KeyError:
                item['Email'] = 'No Email'
            try:
                item['City'] = value['_source']['activeLocations'][0]['city']
            except KeyError:
                item['City'] = 'No City'
            try:
                item['Zip_code'] = value['_source']['activeLocations'][0]['zipcode']
            except KeyError:
                item['Zip_code'] = 'No Zip code'
            try:
                item['Website'] = value['_source']['AgentMarketingCenter'][0]['Website']
            except KeyError:
                item['Website'] = 'No Website'
            try:
                item['Facebook'] = value['_source']['AgentMarketingCenter'][0]['Facebook_URL']
            except KeyError:
                item['Facebook'] = 'No Facebook'
            try:
                item['Linkedin'] = value['_source']['AgentMarketingCenter'][0]['LinkedIn_URL']
            except KeyError:
                item['Linkedin'] = 'No Linkedin'
            try:
                item['Twitter'] = value['_source']['AgentMarketingCenter'][0]['Twitter']
            except KeyError:
                item['Twitter'] = 'No Twitter'
            try:
                item['Bio'] = value['_source']['AgentMarketingCenter'][0]['Bio']
            except KeyError:
                item['Bio'] = 'No Bio'

            yield item
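
With the fields declared and the spider in place, the items can be collected from the command line in the usual Scrapy way, for example:

scrapy crawl test -o results.json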
                    
