How can I get some attributes from a website for scraping

Question

I am having trouble scraping a website. I have tried a couple of methods to get the restaurant name, cuisine, address and star ratings, but I keep getting the error 'NoneType' object has no attribute 'text', which shows that tr.find("a", class_="sc-dakcWe sc-liNYZW cPIBpC") returns None on every iteration.

I am scraping Zomato restaurant listings; an example URL is https://www.zomato.com/kanpur/top-restaurants

Python code

import requests
from bs4 import BeautifulSoup

city = input("Enter your city: ")
url = "https://www.zomato.com/" + city + "/top-restaurants"
header = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36"}
response = requests.get(url, headers=header)

html = response.text
soup = BeautifulSoup(html, "html.parser")
print(soup.title.text)

top_rest = soup.find("div", class_="bke1zw-0 cMipmx")
list_tr = top_rest.find_all("div", class_="bke1zw-1")

restaurants = []
address = []
cuisine = []
ratings = []

for tr in list_tr:
    restaurant_name = tr.find("a", class_="sc-dakcWe sc-liNYZW cPIBpC").text.replace("\n", " ")
    print(restaurant_name)
    print("\n")
    address_name = tr.find("a", class_="sc-hwNDZK sc-fAUOfn iNshJR").text.replace("\n", " ")
    cuisine_name = tr.find("a", class_="sc-hwNDZK sc-cbKXXB fBpaBs").text.replace("\n", " ")
    ratings_name = tr.find("p", class_="sc-1hez2tp-0 sc-jjgyjb bQGuFm").text.replace("\n", " ")
    
    restaurants.append(restaurant_name.strip())
    address.append(address_name.strip())
    cuisine.append(cuisine_name.strip())
    ratings.append(ratings_name.strip())

print(restaurants, address, cuisine, ratings)

Recommended answer

Here's how I was able to do it. I'm not sure it is the most efficient approach, but it works:

import requests
from bs4 import BeautifulSoup
import re

city = input("Enter your city: ")
url = "https://www.zomato.com/" + city + "/top-restaurants"
header = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/91.0.4472.77 Safari/537.36"}
response = requests.get(url, headers=header)

html = response.text
soup = BeautifulSoup(html, "html.parser")
print(soup.title.text)

top_rest = soup.find("div", class_="bke1zw-0 cMipmx")

restaurants = []

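# Select by page structure rather than by class name: each restaurant is a div inside a section.
restaurant_divs = top_rest.select("div > section > div")

for rdiv in restaurant_divs:
    # The name is the link that is a direct child of the restaurant div.
    name = rdiv.find("a", recursive=False).text.strip()
    # The three direct child divs hold the ratings, the address and the cuisine, in that order.
    rating_div, address_div, cuisine_div = rdiv.find_all("div", recursive=False)
    # Capture (score, votes) pairs written as score(votes); a restaurant may have one or two.
    ratings = re.findall(r"([\d\.]+)\(([\d,]+)\)", rating_div.text)
    black_rating = (float(ratings[0][0]), int(ratings[0][1].replace(',', ''))) if ratings else (None, None)
    red_rating = (float(ratings[1][0]), int(ratings[1][1].replace(',', ''))) if len(ratings) > 1 else (None, None)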
    restaurants.append({"name": name,
        "ratings": {
            "black": {
                "score": black_rating[0],
                "votes": black_rating[1]
                },
            "red": {
                "score": red_rating[0],
                "votes": red_rating[1]
                }
            },
        "location": address_div.text,
        "cuisine": cuisine_div.text
        })
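
The rating-parsing step in the loop above can be tried on its own, without fetching the page. The following is a minimal sketch, assuming the text of the rating div looks like 3.4(77)4.1(1,668). That sample string and the helper name parse_ratings are made up for illustration and are not taken from the page:

```python
import re

# Same pattern as in the answer: a score followed by a vote count in parentheses.
RATING_RE = re.compile(r"([\d\.]+)\(([\d,]+)\)")


def parse_ratings(text):
    """Return ((black_score, black_votes), (red_score, red_votes)).

    Hypothetical helper: a rating that is missing comes back as (None, None).
    """
    pairs = [
        (float(score), int(votes.replace(",", "")))
        for score, votes in RATING_RE.findall(text)
    ]
    black = pairs[0] if pairs else (None, None)
    red = pairs[1] if len(pairs) > 1 else (None, None)
    return black, red


# Assumed sample inputs, not real page content.
print(parse_ratings("3.4(77)4.1(1,668)"))
print(parse_ratings("3.9(77)"))
```

When only one score(votes) pair is present, the second rating stays (None, None), which corresponds to the null "red" entries in the output further down.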

Note that some Unicode characters aren't displayed correctly in the output (for example, non-breaking spaces appear as \u00a0), but that can be fixed later.
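
One possible way to do that cleanup after scraping is sketched below, under the assumption that replacing non-breaking spaces with ordinary spaces is acceptable. The helper name clean_text is hypothetical; ensure_ascii=False is a standard json.dumps option that writes characters such as é directly instead of escaping them:

```python
import json


def clean_text(value):
    """Replace non-breaking spaces (U+00A0) with plain spaces and trim the ends."""
    return value.replace("\u00a0", " ").strip()


# Sample value copied from the output in this answer.
cuisine = "Caf\u00e9,\u00a0Casual Dining\u00a0-\u00a0Cafe"

print(clean_text(cuisine))
# ensure_ascii=False keeps non-ASCII characters as-is rather than as \uXXXX escapes.
print(json.dumps({"cuisine": clean_text(cuisine)}, ensure_ascii=False))
```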

I also took the liberty of reformatting the response. Take a look:

Top Restaurants in Kanpur | Zomato
[
  {
    "name": "Grill Inn",
    "ratings": {
      "black": {
        "score": 3.4,
        "votes": 77
      },
      "red": {
        "score": 4.1,
        "votes": 1668
      }
    },
    "location": "Shyam Nagar",
    "cuisine": "Quick Bites\u00a0-\u00a0Italian"
  },
  {
    "name": "Shri Bhojnalaya Restaurant & Sweets",
    "ratings": {
      "black": {
        "score": 4.2,
        "votes": 492
      },
      "red": {
        "score": 3.8,
        "votes": 8151
      }
    },
    "location": "Vijay Nagar",
    "cuisine": "Quick Bites\u00a0-\u00a0North Indian,\u00a0Chinese,\u00a0Fast Food,\u00a0Desserts"
  },
  {
    "name": "Barbeque Nation",
    "ratings": {
      "black": {
        "score": 4.9,
        "votes": 716
      },
      "red": {
        "score": 4.1,
        "votes": 275
      }
    },
    "location": "Mall Road",
    "cuisine": "Casual Dining\u00a0-\u00a0North Indian,\u00a0Mughlai,\u00a0Lebanese,\u00a0Arabian,\u00a0Mediterranean"
  },
  {
    "name": "Kukkad at Nukkad",
    "ratings": {
      "black": {
        "score": 3.3,
        "votes": 272
      },
      "red": {
        "score": 4.2,
        "votes": 500
      }
    },
    "location": "Swaroop Nagar",
    "cuisine": "Casual Dining\u00a0-\u00a0Mughlai,\u00a0North Indian"
  },
  {
    "name": "Tadka The Food Hub",
    "ratings": {
      "black": {
        "score": 3.8,
        "votes": 181
      },
      "red": {
        "score": 3.9,
        "votes": 6594
      }
    },
    "location": "Kidwai Nagar Market",
    "cuisine": "Quick Bites\u00a0-\u00a0Chinese,\u00a0North Indian,\u00a0South Indian"
  },
  {
    "name": "Smile Pizza",
    "ratings": {
      "black": {
        "score": 3.8,
        "votes": 94
      },
      "red": {
        "score": 4.0,
        "votes": 3142
      }
    },
    "location": "Kidwai Nagar",
    "cuisine": "Quick Bites\u00a0-\u00a0Pizza,\u00a0Fast Food"
  },
  {
    "name": "Arabian Broost Chicken",
    "ratings": {
      "black": {
        "score": 4.3,
        "votes": 420
      },
      "red": {
        "score": 4.2,
        "votes": 7051
      }
    },
    "location": "Chamanganj",
    "cuisine": "Quick Bites\u00a0-\u00a0Arabian"
  },
  {
    "name": "Chachi Vaishno Dhaba",
    "ratings": {
      "black": {
        "score": 4.2,
        "votes": 346
      },
      "red": {
        "score": 3.9,
        "votes": 6354
      }
    },
    "location": "Nandlal Chawraha",
    "cuisine": "Dhaba\u00a0-\u00a0North Indian"
  },
  {
    "name": "Barra House",
    "ratings": {
      "black": {
        "score": 4.2,
        "votes": 274
      },
      "red": {
        "score": 3.8,
        "votes": 6233
      }
    },
    "location": "Kanpur Cantt",
    "cuisine": "Quick Bites\u00a0-\u00a0North Indian,\u00a0Mughlai"
  },
  {
    "name": "Pashtun's",
    "ratings": {
      "black": {
        "score": 3.7,
        "votes": 28
      },
      "red": {
        "score": 3.9,
        "votes": 254
      }
    },
    "location": "Swaroop Nagar",
    "cuisine": "Quick Bites\u00a0-\u00a0Kebab,\u00a0North Indian"
  },
  {
    "name": "Agra Vala Sweets",
    "ratings": {
      "black": {
        "score": 4.0,
        "votes": 174
      },
      "red": {
        "score": 4.4,
        "votes": 5495
      }
    },
    "location": "Ashok Nagar",
    "cuisine": "Quick Bites,\u00a0Sweet Shop\u00a0-\u00a0Street Food,\u00a0Mithai"
  },
  {
    "name": "Al-Baik.Com",
    "ratings": {
      "black": {
        "score": 3.9,
        "votes": 77
      },
      "red": {
        "score": null,
        "votes": null
      }
    },
    "location": "Shyam Nagar",
    "cuisine": "Quick Bites\u00a0-\u00a0Fast Food"
  },
  {
    "name": "The Imperial Cord",
    "ratings": {
      "black": {
        "score": 3.8,
        "votes": 2519
      },
      "red": {
        "score": null,
        "votes": null
      }
    },
    "location": "Kakadeo",
    "cuisine": "Quick Bites\u00a0-\u00a0Fast Food,\u00a0North Indian"
  },
  {
    "name": "Google Fast Food",
    "ratings": {
      "black": {
        "score": 4.1,
        "votes": 144
      },
      "red": {
        "score": 4.2,
        "votes": 2006
      }
    },
    "location": "Nandlal Chawraha",
    "cuisine": "Quick Bites\u00a0-\u00a0Fast Food,\u00a0Chinese,\u00a0North Indian"
  },
  {
    "name": "Baba Foods",
    "ratings": {
      "black": {
        "score": 4.5,
        "votes": 1229
      },
      "red": {
        "score": 4.2,
        "votes": 2668
      }
    },
    "location": "Swaroop Nagar",
    "cuisine": "Quick Bites\u00a0-\u00a0North Indian,\u00a0Biryani,\u00a0Beverages,\u00a0Desserts"
  },
  {
    "name": "R S Bhojnalaya",
    "ratings": {
      "black": {
        "score": 4.2,
        "votes": 530
      },
      "red": {
        "score": null,
        "votes": null
      }
    },
    "location": "Kakadeo",
    "cuisine": "Bhojanalya\u00a0-\u00a0North Indian"
  },
  {
    "name": "Kerela Cafe",
    "ratings": {
      "black": {
        "score": 4.1,
        "votes": 299
      },
      "red": {
        "score": 3.9,
        "votes": 4191
      }
    },
    "location": "IIT Kanpur",
    "cuisine": "Quick Bites\u00a0-\u00a0South Indian"
  },
  {
    "name": "Mama Hotel",
    "ratings": {
      "black": {
        "score": 4.1,
        "votes": 364
      },
      "red": {
        "score": null,
        "votes": null
      }
    },
    "location": "Swaroop Nagar",
    "cuisine": "Quick Bites\u00a0-\u00a0North Indian,\u00a0Beverages"
  },
  {
    "name": "Gyan Vaishnav",
    "ratings": {
      "black": {
        "score": 4.6,
        "votes": 927
      },
      "red": {
        "score": null,
        "votes": null
      }
    },
    "location": "Ashok Nagar",
    "cuisine": "Casual Dining\u00a0-\u00a0North Indian"
  },
  {
    "name": "New Pizza Yum",
    "ratings": {
      "black": {
        "score": 3.8,
        "votes": 201
      },
      "red": {
        "score": 4.1,
        "votes": 2624
      }
    },
    "location": "Kakadeo",
    "cuisine": "Quick Bites\u00a0-\u00a0Pizza"
  },
  {
    "name": "Offline Cafe",
    "ratings": {
      "black": {
        "score": 4.1,
        "votes": 624
      },
      "red": {
        "score": 3.7,
        "votes": 1995
      }
    },
    "location": "Tilak Nagar",
    "cuisine": "Caf\u00e9\u00a0-\u00a0Cafe,\u00a0North Indian,\u00a0Fast Food"
  },
  {
    "name": "The Chocolate Room",
    "ratings": {
      "black": {
        "score": 4.0,
        "votes": 463
      },
      "red": {
        "score": 4.0,
        "votes": 5868
      }
    },
    "location": "Swaroop Nagar",
    "cuisine": "Caf\u00e9,\u00a0Dessert Parlor\u00a0-\u00a0Cafe,\u00a0Desserts"
  },
  {
    "name": "Mocha",
    "ratings": {
      "black": {
        "score": 4.7,
        "votes": 1146
      },
      "red": {
        "score": 4.0,
        "votes": 1138
      }
    },
    "location": "Mall Road",
    "cuisine": "Caf\u00e9,\u00a0Casual Dining\u00a0-\u00a0Cafe"
  }
]
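
The question collected its results in four parallel lists. If that shape is preferred, the dictionaries can be flattened afterwards. This is only a sketch: the two sample entries are copied from the output above (with plain spaces in the cuisine text), the name restaurants_data is made up here, and using the "black" score as the single rating is an assumption, not something the answer specifies:

```python
# Two entries copied from the output above (cuisine shown with plain spaces).
restaurants_data = [
    {
        "name": "Grill Inn",
        "ratings": {"black": {"score": 3.4, "votes": 77},
                    "red": {"score": 4.1, "votes": 1668}},
        "location": "Shyam Nagar",
        "cuisine": "Quick Bites - Italian",
    },
    {
        "name": "Al-Baik.Com",
        "ratings": {"black": {"score": 3.9, "votes": 77},
                    "red": {"score": None, "votes": None}},
        "location": "Shyam Nagar",
        "cuisine": "Quick Bites - Fast Food",
    },
]

# Build the same four lists the question's code was trying to fill.
restaurants = [r["name"] for r in restaurants_data]
address = [r["location"] for r in restaurants_data]
cuisine = [r["cuisine"] for r in restaurants_data]
# Assumption: keep only the "black" score; it is None when no rating was parsed.
ratings = [r["ratings"]["black"]["score"] for r in restaurants_data]

print(restaurants, address, cuisine, ratings)
```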
