Python web scraping HTML with same class
Question
I would like to ask how I can extract the event fees from this website using a Python web scraping library (BeautifulSoup).
However, the event's fee shares the same class with other properties. Are there any suggestions for extracting only the fees? I have tried find_next, find_next_sibling and find next_parent, but none of them worked. Below is the raw HTML where the price's class is located:
<div class="eds-event-card-content__sub eds-text-bm eds-text-color--ui-600 eds-l-mar-top-1 eds-event-card-content__sub--cropped">Free</div>
I would appreciate any help.
Below is the code that I have tried. I only get a list of tags in my array.
import pandas as pd
import requests
from bs4 import BeautifulSoup
url = 'https://www.eventbrite.com/d/malaysia--kuala-lumpur--85675181/all-events/?page=1'
response = requests.get(url)
soup = BeautifulSoup(response.text, "html.parser")
#Finding common container for each event
containers = soup.find_all('article', class_ = 'eds-l-pad-all-4 eds-event-card-content eds-event-card-content--list eds-event-card-content--standard eds-event-card-content--fixed eds-l-pad-vert-3')
event_fees = []
for container in containers:
    fees = soup.select('div', class_ ='eds-event-card-content__sub eds-text-bm eds-text-color--ui-600 eds-l-mar-top-1 eds-event-card-content__sub--cropped')
    event_fees.append(fees.txt)
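A note on the snippet above: select expects a CSS selector string rather than a class_ keyword, the lookup runs on soup instead of on each container, and the text of a tag is read with get_text() (or .text), not .txt. The sketch below shows a container-scoped lookup on a small, hypothetical HTML sample in which two divs share one class. The sample markup, the FEE_PATTERN heuristic and the extract_fees helper are assumptions made for illustration; it only applies when the fee text is actually present in the fetched HTML, which may not be the case on the live page.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical, simplified markup: two <div>s share the class from the
# question, one holding the venue and one holding the fee.
SAMPLE_HTML = """
<article class="eds-event-card-content">
  <div class="eds-event-card-content__sub eds-text-bm">Mid Valley, Kuala Lumpur</div>
  <div class="eds-event-card-content__sub eds-text-bm">Free</div>
</article>
<article class="eds-event-card-content">
  <div class="eds-event-card-content__sub eds-text-bm">Online</div>
  <div class="eds-event-card-content__sub eds-text-bm">Starts at RM10.50</div>
</article>
"""

# Assumed heuristic: a fee is either "Free" or text that contains a digit.
# A venue such as "Hall 3" would also match, so adjust it for real data.
FEE_PATTERN = re.compile(r"free|\d", re.IGNORECASE)


def extract_fees(html):
    soup = BeautifulSoup(html, "html.parser")
    fees = []
    # CSS selector: any <article> whose class list includes this one class.
    for container in soup.select("article.eds-event-card-content"):
        # Search inside the container, not the whole document.
        subs = container.select("div.eds-event-card-content__sub")
        texts = [div.get_text(strip=True) for div in subs]
        # Keep the first text that looks like a fee, or None if there is none.
        fee = next((t for t in texts if FEE_PATTERN.search(t)), None)
        fees.append(fee)
    return fees


print(extract_fees(SAMPLE_HTML))
```

Matching on one class name in the selector, instead of the full space-separated class string, keeps the lookup working if the utility classes around it change.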
Recommended answer
The data about prices is loaded from an external URL. You can use the requests and json modules to get it:
import re
import json
import requests
url = "https://www.eventbrite.com/d/malaysia--kuala-lumpur--85675181/all-events/?page=1"
events_url = 'https://www.eventbrite.com/api/v3/destination/events/?event_ids={event_ids}&expand=event_sales_status,primary_venue,image,saves,my_collections,ticket_availability&page_size=99999'
html_text = requests.get(url).text
data1 = json.loads( re.search(r'window\.__SERVER_DATA__ = ({.*});', html_text).group(1) )
# uncomment this to print all data:
# print(json.dumps(data1, indent=4))
event_ids = ','.join(r['id'] for r in data1['search_data']['events']['results'])
data2 = requests.get(events_url.format(event_ids=event_ids)).json()
# uncomment this to print all data:
# print(json.dumps(data2, indent=4))
for e in data2['events']:
    print(e['name'])
    print(e['ticket_availability']['minimum_ticket_price']['display'],'-',e['ticket_availability']['maximum_ticket_price']['display'])
    print('-' * 80)
Prints:
Mega Career Fair & Post Graduate Education Fair 2020 - Mid Valley KL
0.00 MYR - 0.00 MYR
--------------------------------------------------------------------------------
Post Graduate Education Fair 2020 - Mid Valley KL
0.00 MYR - 0.00 MYR
--------------------------------------------------------------------------------
Traders Fair 2021 - Malaysia (Financial Education Event)
0.00 USD - 199.00 USD
--------------------------------------------------------------------------------
THE FIT Malaysia
0.00 MYR - 0.00 MYR
--------------------------------------------------------------------------------
Walk-In Interview with Career Partners of HRDF
0.00 USD - 0.00 USD
--------------------------------------------------------------------------------
Entrepreneurship for Beginners - Startup | Entrepreneur Hackathon Webinar
0.00 EUR - 0.00 EUR
--------------------------------------------------------------------------------
Good Shepherd Catholic Church English Mass Registration- Scroll Down pls
0.00 USD - 0.00 USD
--------------------------------------------------------------------------------
CGH 10:00am Assumption Mass Registration
0.00 USD - 0.00 USD
--------------------------------------------------------------------------------
Kuala Lumpu Video Speed Dating - Filter Off
0.00 USD - 0.00 USD
--------------------------------------------------------------------------------
Wiki Finance EXPO Kuala Lumpur 2021
0.00 USD - 0.00 USD
--------------------------------------------------------------------------------
English Sunday Service - 16 AUGUST
0.00 USD - 0.00 USD
--------------------------------------------------------------------------------
Good Shepherd Catholic Bahasa Malaysia Mass Registration. Pls scroll down
0.00 USD - 0.00 USD
--------------------------------------------------------------------------------
How To Improve Your Focus and Limit Distractions - Kuala Lumpur
0.00 USD - 0.00 USD
--------------------------------------------------------------------------------
ANNUAL GENERAL MEETING
0.00 USD - 0.00 USD
--------------------------------------------------------------------------------
ITS ALL ABOUT PORTRAIT
0.00 USD - 0.00 USD
--------------------------------------------------------------------------------
First service (English)
0.00 USD - 0.00 USD
--------------------------------------------------------------------------------
KL International Flea Market 2020 / Bazaar Antarabangsa Kuala Lumpur
0.00 USD - 0.00 USD
--------------------------------------------------------------------------------
Branding Strategies For Startups
10.50 MYR - 31.50 MYR
--------------------------------------------------------------------------------
SHC 9.15am Sunday Mass Registration
0.00 USD - 0.00 USD
--------------------------------------------------------------------------------
SHC 9.15am Sunday Mass (Tamil) திருஇருதய ஆண்டவர் ஆலயத்தில் காலை 9.15க்கு
0.00 USD - 0.00 USD
--------------------------------------------------------------------------------
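The two parsing steps in the answer (pulling the JSON assigned to window.__SERVER_DATA__ out of the page, then reading the price fields of each event) can be tried offline. The following is a minimal sketch under stated assumptions: SAMPLE_PAGE is a made-up fragment, and extract_event_ids and format_price_range are hypothetical helper names. The key names are the ones used in the answer's code; the guards for missing data are an added precaution, not something the answer states is needed.

```python
import json
import re

# Hypothetical page fragment; the real payload is far larger.
SAMPLE_PAGE = """
<script>
window.__SERVER_DATA__ = {"search_data": {"events": {"results": [{"id": "111"}, {"id": "222"}]}}};
</script>
"""


def extract_event_ids(html_text):
    # Same pattern as in the answer, with the braces escaped explicitly.
    match = re.search(r"window\.__SERVER_DATA__ = (\{.*\});", html_text)
    if match is None:
        # The page layout may have changed, or the request may have been blocked.
        raise ValueError("window.__SERVER_DATA__ not found in page")
    data = json.loads(match.group(1))
    results = data["search_data"]["events"]["results"]
    return [r["id"] for r in results]


def format_price_range(event):
    """Return 'min - max', or None when the price fields are missing."""
    availability = event.get("ticket_availability") or {}
    low = availability.get("minimum_ticket_price") or {}
    high = availability.get("maximum_ticket_price") or {}
    if "display" not in low or "display" not in high:
        return None
    return f"{low['display']} - {high['display']}"


print(extract_event_ids(SAMPLE_PAGE))
```

Raising a clear error when the pattern is not found makes a changed page layout easier to diagnose than an AttributeError on None.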