Issue on web scraping


Problem description


I am having a problem with web scraping using Beautiful Soup. This is the URL http://desiopt.com/company/4316/VST-CONSULTING-INC/ from which I'm trying to scrape the company info details.

from selenium import webdriver
import bs4
import pandas as pd
from bs4 import BeautifulSoup
import re
driver =  webdriver.Chrome(executable_path=r"C:/Users/Chandra Sekhar/Desktop/chrome-driver/chromedriver.exe")
titles=[]
driver.get("http://desiopt.com/company/4316/VST-CONSULTING-INC/")
content = driver.page_source
soup = BeautifulSoup(content)
for a in soup.findAll('div',href=True, attrs={'class':'headerBgBlock'}):
    title=a.find('div', attrs={'class':'userInfo'})
    print(title.text)
    df = pd.DataFrame({'Product Title':titles})
    df['Price'] = df['Price'].map(lambda x: re.sub(r'\W+', '', x))
    df.to_csv('products1.csv', index=False)

Answer

import requests
from bs4 import BeautifulSoup

r = requests.get('http://desiopt.com/company/4316/VST-CONSULTING-INC/')
soup = BeautifulSoup(r.text, 'html.parser')


for item in soup.findAll('div', attrs={'class': 'compProfileInfo'}):
    for a in item.findAll('span'):
        print(a.text.strip())

Output:

VST CONSULTING INC
Phone
732-491-8681
Email
bindu@vstconsulting.com
Web Site
www.vstconsulting.com
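Since the original question also tried to save the scraped fields to a CSV with pandas, here is a minimal sketch of how the label/value spans printed above could be paired into a dict and written out. The HTML snippet below is a hypothetical stand-in for the page's `compProfileInfo` block (the live page's markup may differ, and it also prints the company name as a first, unpaired span):

```python
from bs4 import BeautifulSoup
import csv

# Hardcoded snippet mirroring the compProfileInfo structure
# (hypothetical markup; the live desiopt.com page may differ).
html = """
<div class="compProfileInfo">
  <span>Phone</span><span>732-491-8681</span>
  <span>Email</span><span>bindu@vstconsulting.com</span>
  <span>Web Site</span><span>www.vstconsulting.com</span>
</div>
"""

soup = BeautifulSoup(html, 'html.parser')
div = soup.find('div', attrs={'class': 'compProfileInfo'})
spans = [s.text.strip() for s in div.findAll('span')]

# Pair each label with the value that follows it:
# [a, b, c, d] -> {a: b, c: d}
info = dict(zip(spans[::2], spans[1::2]))
print(info)

# Write a one-row CSV with the labels as the header
with open('company_info.csv', 'w', newline='') as f:
    writer = csv.DictWriter(f, fieldnames=list(info.keys()))
    writer.writeheader()
    writer.writerow(info)
```

On the real page you would build `spans` from `requests.get(...).text` as in the answer above; if the company name appears as an extra first span, drop it (e.g. `spans = spans[1:]`) before pairing.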

