Python web scraping HTML with same class


Problem description

I would like to ask how I can extract the events' fees from this website using a Python library (BeautifulSoup) for web scraping.

However, the event's fee shares the same class with other properties. Are there any suggestions for extracting only the fees? I have tried find_next, find_next_sibling and find next_parent, but none of them worked. Below is the raw HTML where the price's class is located:

<div class="eds-event-card-content__sub eds-text-bm eds-text-color--ui-600 eds-l-mar-top-1 eds-event-card-content__sub--cropped">Free</div>

I would appreciate any help.

Below is the code that I have tried. I only get a list of tags in my array.

import pandas as pd
import requests
from bs4 import BeautifulSoup

url = 'https://www.eventbrite.com/d/malaysia--kuala-lumpur--85675181/all-events/?page=1'

response = requests.get(url)

soup = BeautifulSoup(response.text, "html.parser")

#Finding common container for each event
containers = soup.find_all('article', class_ = 'eds-l-pad-all-4 eds-event-card-content eds-event-card-content--list eds-event-card-content--standard eds-event-card-content--fixed eds-l-pad-vert-3')

event_fees = []

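for container in containers:
    # Search inside this container only; find_all() accepts class_, select() does not
    fees = container.find_all('div', class_='eds-event-card-content__sub eds-text-bm eds-text-color--ui-600 eds-l-mar-top-1 eds-event-card-content__sub--cropped')
    # Each match is a Tag, so collect its text rather than the tag itself
    event_fees.extend(fee.get_text(strip=True) for fee in fees)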

Recommended answer

The price data is loaded from an external URL. You can use the requests and json modules to get it:

import re
import json
import requests


url = "https://www.eventbrite.com/d/malaysia--kuala-lumpur--85675181/all-events/?page=1"
events_url = 'https://www.eventbrite.com/api/v3/destination/events/?event_ids={event_ids}&expand=event_sales_status,primary_venue,image,saves,my_collections,ticket_availability&page_size=99999'
html_text = requests.get(url).text

data1 = json.loads( re.search(r'window\.__SERVER_DATA__ = ({.*});', html_text).group(1) )

# uncomment this to print all data:
# print(json.dumps(data1, indent=4))

event_ids = ','.join(r['id'] for r in data1['search_data']['events']['results'])
data2 = requests.get(events_url.format(event_ids=event_ids)).json()

# uncomment this to print all data:
# print(json.dumps(data2, indent=4))

for e in data2['events']:
    print(e['name'])
    print(e['ticket_availability']['minimum_ticket_price']['display'],'-',e['ticket_availability']['maximum_ticket_price']['display'])
    print('-' * 80)

Prints:

Mega Career Fair & Post Graduate Education Fair 2020 - Mid Valley KL
0.00 MYR - 0.00 MYR
--------------------------------------------------------------------------------
Post Graduate Education Fair 2020 - Mid Valley KL
0.00 MYR - 0.00 MYR
--------------------------------------------------------------------------------
Traders Fair 2021 - Malaysia (Financial Education Event)
0.00 USD - 199.00 USD
--------------------------------------------------------------------------------
THE FIT Malaysia
0.00 MYR - 0.00 MYR
--------------------------------------------------------------------------------
Walk-In Interview with Career Partners of HRDF
0.00 USD - 0.00 USD
--------------------------------------------------------------------------------
Entrepreneurship for Beginners - Startup | Entrepreneur Hackathon Webinar
0.00 EUR - 0.00 EUR
--------------------------------------------------------------------------------
Good Shepherd Catholic Church  English Mass Registration- Scroll Down  pls
0.00 USD - 0.00 USD
--------------------------------------------------------------------------------
CGH 10:00am Assumption Mass Registration
0.00 USD - 0.00 USD
--------------------------------------------------------------------------------
Kuala Lumpu Video Speed Dating - Filter Off
0.00 USD - 0.00 USD
--------------------------------------------------------------------------------
Wiki Finance EXPO Kuala Lumpur 2021
0.00 USD - 0.00 USD
--------------------------------------------------------------------------------
English Sunday Service - 16 AUGUST
0.00 USD - 0.00 USD
--------------------------------------------------------------------------------
Good Shepherd Catholic  Bahasa Malaysia Mass Registration. Pls scroll down
0.00 USD - 0.00 USD
--------------------------------------------------------------------------------
How To Improve Your Focus and Limit Distractions - Kuala Lumpur
0.00 USD - 0.00 USD
--------------------------------------------------------------------------------
ANNUAL GENERAL MEETING
0.00 USD - 0.00 USD
--------------------------------------------------------------------------------
ITS ALL ABOUT PORTRAIT
0.00 USD - 0.00 USD
--------------------------------------------------------------------------------
First service (English)
0.00 USD - 0.00 USD
--------------------------------------------------------------------------------
KL International Flea Market 2020 / Bazaar Antarabangsa Kuala Lumpur
0.00 USD - 0.00 USD
--------------------------------------------------------------------------------
Branding Strategies For Startups
10.50 MYR - 31.50 MYR
--------------------------------------------------------------------------------
SHC 9.15am Sunday Mass Registration
0.00 USD - 0.00 USD
--------------------------------------------------------------------------------
SHC 9.15am Sunday Mass (Tamil) திருஇருதய ஆண்டவர் ஆலயத்தில்  காலை  9.15க்கு
0.00 USD - 0.00 USD
--------------------------------------------------------------------------------
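
The two parsing steps in the answer (pulling the JSON out of the `window.__SERVER_DATA__` assignment, then reading the price fields) can be tried offline without hitting the site. The sketch below is only an illustration under stated assumptions: `SAMPLE_HTML` and `SAMPLE_EVENTS` are made-up stand-ins shaped like the structures the answer reads, not real Eventbrite output, and the helper names `extract_server_data`, `collect_event_ids` and `price_range` are hypothetical. The `None` handling is an added precaution, since it is not confirmed here whether the real responses can omit these fields.

```python
import json
import re

# Made-up page fragment standing in for the real HTML (assumption, not real site output).
SAMPLE_HTML = """
<html><body>
<script>
window.__SERVER_DATA__ = {"search_data": {"events": {"results": [{"id": "101"}, {"id": "102"}]}}};
</script>
</body></html>
"""

# Made-up events shaped like the fields the answer reads from the second response.
SAMPLE_EVENTS = [
    {"name": "Sample paid event",
     "ticket_availability": {
         "minimum_ticket_price": {"display": "10.50 MYR"},
         "maximum_ticket_price": {"display": "31.50 MYR"}}},
    {"name": "Sample event without prices", "ticket_availability": None},
]


def extract_server_data(html_text):
    """Return the JSON object assigned to window.__SERVER_DATA__, or None if absent."""
    match = re.search(r'window\.__SERVER_DATA__ = ({.*});', html_text)
    if match is None:
        return None
    return json.loads(match.group(1))


def collect_event_ids(server_data):
    """Join the event ids into the comma-separated form used in the events URL."""
    results = server_data['search_data']['events']['results']
    return ','.join(r['id'] for r in results)


def price_range(event):
    """Format 'min - max' for one event, or return None when price fields are missing."""
    availability = event.get('ticket_availability') or {}
    low = availability.get('minimum_ticket_price') or {}
    high = availability.get('maximum_ticket_price') or {}
    if 'display' not in low or 'display' not in high:
        return None
    return f"{low['display']} - {high['display']}"


server_data = extract_server_data(SAMPLE_HTML)
event_ids = collect_event_ids(server_data)
print(event_ids)

for event in SAMPLE_EVENTS:
    print(event['name'], '|', price_range(event))
```

Returning `None` instead of raising lets a caller skip events with no price data. If the page layout changes and the regular expression no longer matches, `extract_server_data` also returns `None` rather than failing on `.group(1)`.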
