BeautifulSoup4 - Concatenating multiple HTML elements between two different tags


Problem description

I am scraping a page using Python and bs4.

The HTML source that I get from bs4 is as follows (cleaned up a bit for readability):

<p style="text-align:justify;font-size:12.0px;font-family:Arial, Helvetica, sans-serif">
<span style="font-size:14.0px"><span style="font-family:Arial, Helvetica, sans-serif">

<strong>COMPANY DESCRIPTION</strong><br>
Here goes the first para of company description</span></span></p>

<p style="text-align:justify;font-size:12.0px;font-family:Arial, Helvetica, sans-serif">
<span style="font-size:14.0px"><span style="font-family:Arial, Helvetica, sans-serif">
Here goes the second para of company description</span></span></p>

<p><strong>PURPOSE AND OBJECTIVES</strong></p>
<p>To address requirements in the area of Supply Chain Management Extended Warehouse Management solutions, Build competencies at Solution Delivery Center to deliver solutions<br>

<strong>EXPECTATIONS AND TASKS&nbsp;</strong></p>
<ul>
    <li>Independently handle large implementation projects with focus on Warehouse Management processes such as inbound, outbound and internal processes. RF Device functions and Barcode support experience is desirable</li>
    <li>Able to lead EWM discussions, assessments and detail requirement studies with customers</li>
</ul>

<strong>KEY PERFORMANCE INDICATORS</strong></p>
<ul>
    <li>Customer Feedback/customer satisfaction scores</li>
    <li>Productive days/utilization as defined by the organization for projects/assessments/etc.</li>
    <li>Knowledge Management and creation of effective reusable components</li>
</ul>

<strong>EXPERIENCE REQUIREMENTS</strong></p>
<ul>
    <li>Minimum of 4+ years industry experience and a minimum of 5 to 6 years of SAP EWM experience</li>
    <li>Domain knowledge in Supply Chain Management in the areas of Planning, Manufacturing &amp; warehousing processes is a must</li>
</ul>

<p><strong>EDUCATION AND QUALIFICATIONS/SKILLS AND COMPETENCIES</strong></p>
<ul>
    <li>Degree in Engineering or IT</li>
    <li>SAP Certification in Extended Warehouse Management (EWM) desirable</li>
</ul>

<p><span style="font-family:Arial,Helvetica,sans-serif"><span style="font-size:14.0px"><strong>WHAT YOU GET FROM US </strong></span></span></p>

Observation:

In the code above, all the section headings are inside <strong></strong> tags. The headings can vary across pages.

My requirements:

  • Combine all the HTML text and tags starting from the second <strong> tag after COMPANY DESCRIPTION, i.e. from PURPOSE AND OBJECTIVES, and ending before the tag that contains WHAT YOU GET FROM US.
  • I am not looking for a solution that uses Selenium, as it would be comparatively slow.

The page I am scraping is the one at the URL in the code below.

Here is my piece of Python code:

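import requests
from bs4 import BeautifulSoup

def scrape_url(url, method='bs4'):
    # Fetch the page and parse it with Python's built-in HTML parser.
    session = requests.session()
    page = session.get(url)
    soup = BeautifulSoup(page.text, 'html.parser')
    return soup

url = 'https://jobs.sap.com/job/Mumbai-Senior-Account-Executive-Job-MH/539212101/'
soup = scrape_url(url)
# Select the <div class="job"> container from the page body.
job_page = soup.body.find('div', attrs={'class': 'job'})
print(job_page)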

Recommended answer

First locate the tag that contains the starting heading text with a regular expression, then call find_next_siblings() to get all of its following siblings, and stop as soon as a sibling contains the text WHAT YOU GET FROM US.

Code:

import re
import requests
from bs4 import BeautifulSoup

def scrape_url(url, method='bs4'):
    session = requests.session()
    page = session.get(url)
    soup = BeautifulSoup(page.text, 'html.parser')
    return soup

url = 'https://jobs.sap.com/job/Kuala-Lumpur-Business-Processes-Consultant-%28FICO%29-Job-14/541909901/'
soup = scrape_url(url)
# Find the <p> whose text matches the starting heading.
# string= is the current name of the older text= argument (bs4 4.4.0+).
findtag = soup.find('p', string=re.compile("PURPOSE AND OBJECTIVES"))
print(findtag.text)
# Walk the following siblings and stop at the closing heading.
for item in findtag.find_next_siblings():
    if 'WHAT YOU GET FROM US' in item.text:
        break
    print(item.text.strip())

Output on the console:

PURPOSE AND OBJECTIVES

To address requirements in the area of Supply Chain Management Extended Warehouse Management solutions, Build competencies at Solution Delivery Center to deliver solutions especially in areas relating to SAP EWM

EXPECTATIONS AND TASKS

Independently handle large implementation projects with focus on Warehouse Management processes such as inbound, outbound and internal processes. RF Device functions and Barcode support experience is desirable
Able to lead EWM discussions, assessments and detail requirement studies with customers
Leading the team that are assigned to, in functional capacity, adding value to the project and to the final deliverables
Be actively involved in the preparation, conception, realization and Go Live of customer implementation projects
Demonstrate the ability to plan, run, and manage blueprint workshops / meetings with internal and external clients
Responsible for defining the scope of a project / opportunities, estimating efforts and project timelines
Participating in RFP discussions and estimating under guidance from a Bid Manager
Providing a creative source of ideas/solutions to address problems
Delivering billable components that meets a customer’s needs
KEY PERFORMANCE INDICATORS

Customer Feedback/customer satisfaction scores
Productive days/utilization as defined by the organization for projects/assessments/etc.
Knowledge Management and creation of effective reusable components
EXPERIENCE REQUIREMENTS

Minimum of 4+ years industry experience and a minimum of 5 to 6 years of SAP EWM experience
Domain knowledge in Supply Chain Management in the areas of Planning, Manufacturing & warehousing processes is a must
Must have strong ERP implementation experience
Experience in SAP Material Flow Systems (MFS) or any other third party automation tools will be desirable
Experience in EWM technical knowledge will be an added advantage
Knowledge on SAP S/4HANA Public Cloud solution and SAP IOT/Leonardo portfolio will be preferred but not mandatory
Good understanding of S/4HANA Order to Cash and Procure to Pay business processes
Good understanding of SAP ACTIVATE implementation methodology
Use of Solution Manager as a part of implementation life cycle is desirable
Good Communication skill in English.

EDUCATION AND QUALIFICATIONS/SKILLS AND COMPETENCIES

Degree in Engineering or IT
SAP Certification in Extended Warehouse Management (EWM) desirable
Minimum 4 to 5 full life cycle SAP EWM implementations
Strong knowledge in SAP SCM Extended Warehouse Management Solutions and S/4HANA Embedded EWM Solution
Good integration knowledge with other components with SAP S/4HANA (WM, SD, MM, PP) and other SAP or Non-SAP legacy applications
Knowledge of SCOR, APICS certification preferable
Strong client-facing experience and well-developed customer focus
Solid oral and written communication skills, with the demonstrated ability to communicate complex technical topics to management and non-technical audiences
Mobility is must – candidate must be ready to travel to project locations (short term and long term)
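
The answer above prints only the text of each sibling, while the question asks for the HTML text and tags combined. A minimal sketch of that variant follows, assuming the headings sit in sibling <p> elements as in the posted source. SAMPLE_HTML is a hypothetical, trimmed-down stand-in for the real page so the example runs offline, and html_between is an illustrative helper name, not part of bs4.

```python
import re
from bs4 import BeautifulSoup

# Hypothetical, trimmed-down stand-in for the job page markup.
SAMPLE_HTML = """
<div class="job">
<p><strong>COMPANY DESCRIPTION</strong></p>
<p>About the company</p>
<p><strong>PURPOSE AND OBJECTIVES</strong></p>
<p>To address requirements</p>
<ul><li>Degree in Engineering or IT</li></ul>
<p><span><strong>WHAT YOU GET FROM US </strong></span></p>
<p>Benefits</p>
</div>
"""

def html_between(soup, start_text, end_text):
    """Return the HTML from the <p> heading matching start_text up to,
    but not including, the first following sibling containing end_text."""
    start = soup.find('p', string=re.compile(re.escape(start_text)))
    if start is None:
        # The start heading is missing on this page.
        return ''
    parts = [str(start)]  # str(tag) serializes the tag with its markup
    for sibling in start.find_next_siblings():
        if end_text in sibling.get_text():
            break
        parts.append(str(sibling))
    return '\n'.join(parts)

soup = BeautifulSoup(SAMPLE_HTML, 'html.parser')
section_html = html_between(soup, 'PURPOSE AND OBJECTIVES', 'WHAT YOU GET FROM US')
print(section_html)
```

Using str() on each sibling keeps the tags, whereas .text drops them. The None check matters because the headings can vary across pages, so the start heading may not be present everywhere.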
