Scraping a table from a page using beautifulsoup, table is not found

Views: 71 | Published: 2021/4/15 19:09:37 | Tags: python web-scraping beautifulsoup

Question

I've been trying to scrape the table from here, but it seems that BeautifulSoup doesn't find any table.

I wrote:

import requests
import pandas as pd
from bs4 import BeautifulSoup
import csv

url = "http://www.payscale.com/college-salary-report/bachelors?page=65" 
r=requests.get(url)
data=r.text

soup=BeautifulSoup(data,'xml')
table=soup.find_all('table')
print table   #prints nothing..

Based on other similar questions, I assume that the HTML is broken in some way, but I'm not an expert. I couldn't find an answer in these: (Beautiful soup missing some html table tags), (Extracting a table from a website), (Scraping a table using BeautifulSoup), or even (Python+BeautifulSoup: scraping a particular table from a webpage)

Thanks a bunch!

Recommended answer

You are parsing HTML, but you used the XML parser.
You should use soup=BeautifulSoup(data,"html.parser")
The data you need is in a script tag; there is actually no table tag in the page. So you need to find the text inside the script.
N.B.: "html.parser" is the parser name that bs4 expects, not a module path, so the same string is used on Python 2.x and 3.x. "HTMLParser" is only the name of the corresponding standard-library module in Python 2.x and should not be passed to BeautifulSoup.
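To make the "no table tag, data in a script tag" point concrete, here is a minimal sketch that uses only Python's standard html.parser module (the module that BeautifulSoup's "html.parser" option is built on). The SAMPLE_HTML string and the TagAudit class are hypothetical stand-ins, not the real page or any bs4 API; the real page's markup may look different.

```python
from html.parser import HTMLParser

# Hypothetical stand-in for the downloaded page: no <table>, data inside <script>.
SAMPLE_HTML = """
<html><body>
<div id="report"></div>
<script>var rows = [{"School Name":"Example College","Rank":"1"}];</script>
</body></html>
"""


class TagAudit(HTMLParser):
    """Counts start tags and collects the text found inside <script> elements."""

    def __init__(self):
        super().__init__()
        self.tag_counts = {}
        self.script_texts = []
        self._in_script = False

    def handle_starttag(self, tag, attrs):
        self.tag_counts[tag] = self.tag_counts.get(tag, 0) + 1
        if tag == "script":
            self._in_script = True
            self.script_texts.append("")

    def handle_endtag(self, tag):
        if tag == "script":
            self._in_script = False

    def handle_data(self, data):
        # Script content may arrive in several chunks, so append to the last entry.
        if self._in_script:
            self.script_texts[-1] += data


audit = TagAudit()
audit.feed(SAMPLE_HTML)
audit.close()

print("table tags:", audit.tag_counts.get("table", 0))
print("script holds the data:", '"School Name"' in audit.script_texts[0])
```

If the table count is zero and one of the script blocks contains the field names you see on the rendered page, the table is most likely built by JavaScript from that embedded data, which is why find_all('table') comes back empty.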

Here is the code.

import csv
import requests
from bs4 import BeautifulSoup

url = "http://www.payscale.com/college-salary-report/bachelors?page=65"
r = requests.get(url)
data = r.text

soup = BeautifulSoup(data, "html.parser")
scripts = soup.find_all("script")

# Field names in the order they appear inside each record of the script text.
keys_in_page_order = ["School Name", "Early Career Median Pay", "Mid-Career Median Pay",
                      "Rank", "% High Job Meaning", "School Type", "% STEM"]
# Column order for the CSV file.
header = ["Rank", "School Name", "School Type", "Early Career Median Pay",
          "Mid-Career Median Pay", "% High Job Meaning", "% STEM"]

list_to_write = [header]

for script in scripts:
    text = script.text
    # Only the large script block holds the table data.
    if len(text) <= 10000:
        continue
    start = 0
    # Every record begins with its "School Name" field; stop when none is left.
    while text.find('"School Name":"', start) != -1:
        record = {}
        for key in keys_in_page_order:
            marker = '"' + key + '":"'
            # Move to just past the marker, then read up to the closing quote.
            start = text.find(marker, start) + len(marker)
            end = text.find('"', start)
            record[key] = text[start:end]
        list_to_write.append([record[column] for column in header])

# The with-block closes the file automatically once the rows are written.
with open("table.csv", "w", newline="") as csv_file:
    csv.writer(csv_file).writerows(list_to_write)

This will generate the table you need in table.csv. Make sure the file is closed when you are done, so that all rows are flushed to disk.
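As a possible alternative to searching for each field by hand, here is a rough sketch that lets the json module do the parsing. It assumes each record in the script text is a flat JSON object with no nested braces, which may not hold for the real page. SAMPLE_SCRIPT, extract_records and records_to_csv are hypothetical names made up for this illustration, and the sample values are invented.

```python
import csv
import io
import json
import re

# Hypothetical script text; the real page's layout and values may differ.
SAMPLE_SCRIPT = (
    'var data = [{"School Name":"Example College",'
    '"Early Career Median Pay":"$50,000","Mid-Career Median Pay":"$90,000",'
    '"Rank":"1","% High Job Meaning":"55%","School Type":"Private","% STEM":"20%"},'
    '{"School Name":"Sample University",'
    '"Early Career Median Pay":"$48,000","Mid-Career Median Pay":"$85,000",'
    '"Rank":"2","% High Job Meaning":"60%","School Type":"State","% STEM":"35%"}];'
)

COLUMNS = ["Rank", "School Name", "School Type", "Early Career Median Pay",
           "Mid-Career Median Pay", "% High Job Meaning", "% STEM"]


def extract_records(script_text):
    """Return a dict for every flat {...} object that parses as JSON
    and contains a "School Name" key."""
    records = []
    for match in re.finditer(r"\{[^{}]*\}", script_text):
        try:
            obj = json.loads(match.group(0))
        except ValueError:
            # Not valid JSON (for example a JavaScript function body); skip it.
            continue
        if isinstance(obj, dict) and "School Name" in obj:
            records.append(obj)
    return records


def records_to_csv(records):
    """Render the records as CSV text, using COLUMNS as the header row."""
    buffer = io.StringIO(newline="")
    writer = csv.writer(buffer)
    writer.writerow(COLUMNS)
    for record in records:
        writer.writerow([record.get(column, "") for column in COLUMNS])
    return buffer.getvalue()


records = extract_records(SAMPLE_SCRIPT)
csv_text = records_to_csv(records)
print(csv_text)
```

Because the fields are looked up by key, this version does not depend on the order in which they appear in the script, and a missing field becomes an empty cell instead of shifting the search position. To use it with the scraper, you would pass the text of the large script block to extract_records and write the result to table.csv.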
