只用"div"刮擦桌子 [英] Scrape of table with only 'div's

查看:105
本文介绍了只用"div"刮擦桌子的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

当尝试抓取网页时,该表没有<tr>标记,而是所有<div>标记.

When trying to scrape a web page, this table has no <tr> tags, and is all <div> tags.

我要抓取的站点检查器如下所示: 检查器屏幕截图

The site inspector that I'm trying to scrape looks as follows: inspector screenshot

我希望能够从table-row类中获取信息,但是抓取文件永远不会返回任何内容.使用下面的代码,当我抓取.table-header或只是.practiceDataTable时,就可以从中获取数据.

I'd like to be able to grab the info from the table-row class, but the scrape never returns anything. With the code below, when I scrape the .table-header, or just .practiceDataTable, I'm able to get the data from that.

import bs4
import requests

res = requests.get('https://www.nascar.com/results/race_center/2018/monster-energy-nascar-cup-series/auto-club-400/stn/race/')

soup = bs4.BeautifulSoup(res.text, 'lxml')

soup.select('.nrwgt-lbh .practiceDataTable')

for i in soup.select('.nrwgt-lbh .practiceDataTable .table-row'):
    print(i.text)

我还注意到,在检查器中,类"practiceDataTable"在其后有一个空格,然后是"dataTable",但是当我在代码中的任何地方使用该空格时,该代码将无法正常工作.

I also noticed that in the inspector, the class "practiceDataTable" has a space after it and then "dataTable", but when I use that anywhere in the code, the code doesn't work.

推荐答案

urllib.urlopen对象的源进行的检查显示该站点是动态的,因为找不到具有类table-row的更新的div对象.因此,您需要使用浏览器操作工具,例如selenium:

An inspection of the source from a urllib.urlopen object shows that the site is dynamic, as no updated div object with class table-row can be found. Thus, you need to use a browser manipulation tool such as selenium:

from bs4 import BeautifulSoup as soup
import re
import urllib
from selenium import webdriver
d = webdriver.Chrome()
classes = ['position', 'chase', 'car-number', 'driver', 'manufacturer', 'start-position not-mobile', 'laps not-mobile', 'laps-led not-mobile', 'final-status', 'points not-mobile', 'bonus not-mobile']
d.get('https://www.nascar.com/results/race_center/2018/monster-energy-nascar-cup-series/auto-club-400/stn/race/')
new_data = [filter(None, [b.text for b in i.find_all('div', {'class':re.compile('|'.join(classes))})]) for i in soup(d.page_source, 'lxml').find_all('div', {'class':'table-row'})]

输出:

[[u'00', u'JeffreyEarnhardt'], [u'1', u'JamieMcMurray'], [u'2', u'BradKeselowski'], [u'3', u'AustinDillon'], [u'4', u'KevinHarvick'], [u'6', u'TrevorBayne'], [u'9', u'ChaseElliott'], [u'10', u'AricAlmirola'], [u'11', u'DennyHamlin'], [u'12', u'RyanBlaney'], [u'13', u'TyDillon'], [u'14', u'ClintBowyer'], [u'15', u'RossChastain'], [u'17', u'RickyStenhouse Jr.'], [u'18', u'KyleBusch'], [u'19', u'DanielSuarez'], [u'20', u'ErikJones'], [u'21', u'PaulMenard'], [u'22', u'JoeyLogano'], [u'23', u'GrayGaulding'], [u'24', u'WilliamByron'], [u'31', u'RyanNewman'], [u'32', u'MattDiBenedetto'], [u'34', u'MichaelMcDowell'], [u'37', u'ChrisBuescher'], [u'38', u'DavidRagan'], [u'41', u'KurtBusch'], [u'42', u'KyleLarson'], [u'43', u'DarrellWallace Jr.'], [u'47', u'AJAllmendinger'], [u'48', u'JimmieJohnson'], [u'51', u'TimmyHill'], [u'55', u'ReedSorenson'], [u'72', u'ColeWhitt'], [u'78', u'MartinTruex Jr.'], [u'88', u'AlexBowman'], [u'95', u'KaseyKahne'], [u'1', u'4', u'KevinHarvick'], [u'2', u'14', u'ClintBowyer'], [u'3', u'10', u'AricAlmirola'], [u'4', u'31', u'RyanNewman'], [u'5', u'42', u'KyleLarson'], [u'6', u'11', u'DennyHamlin'], [u'7', u'78', u'MartinTruex Jr.'], [u'8', u'20', u'ErikJones'], [u'9', u'3', u'AustinDillon'], [u'10', u'88', u'AlexBowman'], [u'11', u'1', u'JamieMcMurray'], [u'12', u'18', u'KyleBusch'], [u'13', u'41', u'KurtBusch'], [u'14', u'48', u'JimmieJohnson'], [u'15', u'9', u'ChaseElliott'], [u'16', u'37', u'ChrisBuescher'], [u'17', u'22', u'JoeyLogano'], [u'18', u'43', u'DarrellWallace Jr.'], [u'19', u'21', u'PaulMenard'], [u'20', u'2', u'BradKeselowski'], [u'21', u'19', u'DanielSuarez'], [u'22', u'32', u'MattDiBenedetto'], [u'23', u'12', u'RyanBlaney'], [u'24', u'13', u'TyDillon'], [u'25', u'17', u'RickyStenhouse Jr.'], [u'26', u'24', u'WilliamByron'], [u'27', u'47', u'AJAllmendinger'], [u'28', u'6', u'TrevorBayne'], [u'29', u'34', u'MichaelMcDowell'], [u'30', u'38', u'DavidRagan'], [u'31', u'95', u'KaseyKahne'], [u'32', u'15', u'RossChastain'], [u'33', u'72', u'ColeWhitt'], [u'34', u'00', u'JeffreyEarnhardt'], [u'35', u'51', u'TimmyHill'], [u'36', u'*55', u'ReedSorenson'], [u'37', u'23', u'GrayGaulding'], [u'1', u'78', u'MartinTruex Jr.'], [u'2', u'18', u'KyleBusch'], [u'3', u'42', u'KyleLarson'], [u'4', u'20', u'ErikJones'], [u'5', u'3', u'AustinDillon'], [u'6', u'22', u'JoeyLogano'], [u'7', u'41', u'KurtBusch'], [u'8', u'12', u'RyanBlaney'], [u'9', u'31', u'RyanNewman'], [u'10', u'4', u'KevinHarvick'], [u'11', u'2', u'BradKeselowski'], [u'12', u'37', u'ChrisBuescher'], [u'13', u'6', u'TrevorBayne'], [u'14', u'21', u'PaulMenard'], [u'15', u'1', u'JamieMcMurray'], [u'16', u'17', u'RickyStenhouse Jr.'], [u'17', u'13', u'TyDillon'], [u'18', u'32', u'MattDiBenedetto'], [u'19', u'43', u'DarrellWallace Jr.'], [u'20', u'23', u'GrayGaulding'], [u'21', u'38', u'DavidRagan'], [u'22', u'34', u'MichaelMcDowell'], [u'23', u'00', u'JeffreyEarnhardt'], [u'24', u'55', u'ReedSorenson'], [u'25', u'11', u'DennyHamlin'], [u'26', u'14', u'ClintBowyer'], [u'27', u'10', u'AricAlmirola'], [u'28', u'88', u'AlexBowman'], [u'29', u'24', u'WilliamByron'], [u'30', u'19', u'DanielSuarez'], [u'31', u'9', u'ChaseElliott'], [u'32', u'47', u'AJAllmendinger'], [u'33', u'48', u'JimmieJohnson'], [u'34', u'95', u'KaseyKahne'], [u'35', u'51', u'TimmyHill'], [u'36', u'15', u'RossChastain'], [u'37', u'72', u'ColeWhitt'], [u'1', u'78', u'MartinTruex Jr.', u'1', u'200', u'125', u'Running', u'60', u'7'], [u'2', u'42', u'KyleLarson', u'3', u'200', u'0', u'Running', u'43', u'0'], [u'3', u'18', u'KyleBusch', u'2', u'200', u'62', u'Running', u'51', u'0'], [u'4', u'2', u'BradKeselowski', u'11', u'200', u'0', u'Running', u'49', u'0'], [u'5', u'22', u'JoeyLogano', u'6', u'200', u'9', u'Running', u'45', u'0'], [u'6', u'11', u'DennyHamlin', u'25', u'200', u'1', u'Running', u'39', u'0'], [u'7', u'20', u'ErikJones', u'4', u'200', u'0', u'Running', u'39', u'0'], [u'8', u'12', u'RyanBlaney', u'8', u'200', u'0', u'Running', u'29', u'0'], [u'9', u'48', u'JimmieJohnson', u'33', u'200', u'0', u'Running', u'38', u'0'], [u'10', u'3', u'AustinDillon', u'5', u'200', u'0', u'Running', u'27', u'0'], [u'11', u'14', u'ClintBowyer', u'26', u'199', u'0', u'Running', u'30', u'0'], [u'12', u'10', u'AricAlmirola', u'27', u'199', u'0', u'Running', u'25', u'0'], [u'13', u'88', u'AlexBowman', u'28', u'199', u'0', u'Running', u'24', u'0'], [u'14', u'41', u'KurtBusch', u'7', u'199', u'0', u'Running', u'27', u'0'], [u'15', u'24', u'WilliamByron', u'29', u'199', u'1', u'Running', u'23', u'0'], [u'16', u'9', u'ChaseElliott', u'31', u'199', u'0', u'Running', u'21', u'0'], [u'17', u'1', u'JamieMcMurray', u'15', u'199', u'1', u'Running', u'20', u'0'], [u'18', u'17', u'RickyStenhouse Jr.', u'16', u'199', u'0', u'Running', u'19', u'0'], [u'19', u'21', u'PaulMenard', u'14', u'199', u'0', u'Running', u'18', u'0'], [u'20', u'43', u'DarrellWallace Jr.', u'19', u'199', u'0', u'Running', u'17', u'0'], [u'21', u'31', u'RyanNewman', u'9', u'199', u'0', u'Running', u'16', u'0'], [u'22', u'47', u'AJAllmendinger', u'32', u'199', u'0', u'Running', u'15', u'0'], [u'23', u'19', u'DanielSuarez', u'30', u'199', u'0', u'Running', u'14', u'0'], [u'24', u'95', u'KaseyKahne', u'34', u'199', u'1', u'Running', u'13', u'0'], [u'25', u'38', u'DavidRagan', u'21', u'199', u'0', u'Running', u'12', u'0'], [u'26', u'34', u'MichaelMcDowell', u'22', u'199', u'0', u'Running', u'11', u'0'], [u'27', u'13', u'TyDillon', u'17', u'198', u'0', u'Running', u'10', u'0'], [u'28', u'72', u'ColeWhitt', u'37', u'198', u'0', u'Running', u'9', u'0'], [u'29', u'15', u'RossChastain', u'36', u'198', u'0', u'Running', u'0', u'0'], [u'30', u'37', u'ChrisBuescher', u'12', u'197', u'0', u'Running', u'7', u'0'], [u'31', u'32', u'MattDiBenedetto', u'18', u'196', u'0', u'Running', u'6', u'0'], [u'32', u'23', u'GrayGaulding', u'20', u'194', u'0', u'Running', u'5', u'0'], [u'33', u'51', u'TimmyHill', u'35', u'193', u'0', u'Running', u'0', u'0'], [u'34', u'55', u'ReedSorenson', u'24', u'193', u'0', u'Running', u'3', u'0'], [u'35', u'4', u'KevinHarvick', u'10', u'191', u'0', u'Running', u'2', u'0'], [u'36', u'00', u'JeffreyEarnhardt', u'23', u'189', u'0', u'Running', u'1', u'0'], [u'37', u'6', u'TrevorBayne', u'13', u'108', u'0', u'Accident', u'1', u'0'], [u'1', u'78', u'MartinTruex Jr.', u'60'], [u'2', u'18', u'KyleBusch', u'60'], [u'3', u'22', u'JoeyLogano', u'60'], [u'4', u'2', u'BradKeselowski', u'60'], [u'5', u'48', u'JimmieJohnson', u'60'], [u'6', u'42', u'KyleLarson', u'60'], [u'7', u'41', u'KurtBusch', u'60'], [u'8', u'20', u'ErikJones', u'60'], [u'9', u'14', u'ClintBowyer', u'60'], [u'10', u'11', u'DennyHamlin', u'60'], [u'11', u'3', u'AustinDillon', u'60'], [u'12', u'1', u'JamieMcMurray', u'60'], [u'13', u'10', u'AricAlmirola', u'60'], [u'14', u'9', u'ChaseElliott', u'60'], [u'15', u'24', u'WilliamByron', u'60'], [u'16', u'19', u'DanielSuarez', u'60'], [u'17', u'21', u'PaulMenard', u'60'], [u'18', u'88', u'AlexBowman', u'60'], [u'19', u'6', u'TrevorBayne', u'60'], [u'20', u'37', u'ChrisBuescher', u'60'], [u'21', u'31', u'RyanNewman', u'60'], [u'22', u'17', u'RickyStenhouse Jr.', u'60'], [u'23', u'95', u'KaseyKahne', u'60'], [u'24', u'38', u'DavidRagan', u'59'], [u'25', u'34', u'MichaelMcDowell', u'59'], [u'26', u'43', u'DarrellWallace Jr.', u'59'], [u'27', u'32', u'MattDiBenedetto', u'59'], [u'28', u'47', u'AJAllmendinger', u'59'], [u'29', u'15', u'RossChastain', u'59'], [u'30', u'72', u'ColeWhitt', u'59'], [u'31', u'13', u'TyDillon', u'59'], [u'32', u'12', u'RyanBlaney', u'59'], [u'33', u'23', u'GrayGaulding', u'58'], [u'34', u'55', u'ReedSorenson', u'58'], [u'35', u'51', u'TimmyHill', u'58'], [u'36', u'4', u'KevinHarvick', u'57'], [u'37', u'00', u'JeffreyEarnhardt', u'56'], [u'1', u'78', u'MartinTruex Jr.', u'120'], [u'2', u'2', u'BradKeselowski', u'120'], [u'3', u'18', u'KyleBusch', u'120'], [u'4', u'11', u'DennyHamlin', u'120'], [u'5', u'20', u'ErikJones', u'120'], [u'6', u'22', u'JoeyLogano', u'120'], [u'7', u'48', u'JimmieJohnson', u'120'], [u'8', u'42', u'KyleLarson', u'120'], [u'9', u'14', u'ClintBowyer', u'120'], [u'10', u'24', u'WilliamByron', u'120'], [u'11', u'41', u'KurtBusch', u'120'], [u'12', u'10', u'AricAlmirola', u'120'], [u'13', u'31', u'RyanNewman', u'120'], [u'14', u'9', u'ChaseElliott', u'120'], [u'15', u'88', u'AlexBowman', u'120'], [u'16', u'1', u'JamieMcMurray', u'120'], [u'17', u'19', u'DanielSuarez', u'120'], [u'18', u'3', u'AustinDillon', u'120'], [u'19', u'12', u'RyanBlaney', u'120'], [u'20', u'17', u'RickyStenhouse Jr.', u'120'], [u'21', u'37', u'ChrisBuescher', u'120'], [u'22', u'95', u'KaseyKahne', u'120'], [u'23', u'38', u'DavidRagan', u'120'], [u'24', u'47', u'AJAllmendinger', u'120'], [u'25', u'43', u'DarrellWallace Jr.', u'120'], [u'26', u'34', u'MichaelMcDowell', u'120'], [u'27', u'32', u'MattDiBenedetto', u'120'], [u'28', u'15', u'RossChastain', u'119'], [u'29', u'21', u'PaulMenard', u'119'], [u'30', u'72', u'ColeWhitt', u'119'], [u'31', u'13', u'TyDillon', u'118'], [u'32', u'23', u'GrayGaulding', u'117'], [u'33', u'55', u'ReedSorenson', u'116'], [u'34', u'51', u'TimmyHill', u'115'], [u'35', u'00', u'JeffreyEarnhardt', u'114'], [u'36', u'4', u'KevinHarvick', u'113'], [u'37', u'6', u'TrevorBayne', u'108']]

要安装硒,请运行pip install selenium,然后为您的浏览器安装适当的绑定:

to install selenium, run pip install selenium, and then install the appropriate bindings for your browser:

Chrome驱动程序: https://sites.google.com/a/chrome.org/chromedriver/downloads

Chrome driver: https://sites.google.com/a/chromium.org/chromedriver/downloads

Firefox驱动程序: https://github.com/mozilla/geckodriver/releases

Firefox driver: https://github.com/mozilla/geckodriver/releases

然后,要运行代码,请创建具有与您选择的浏览器相对应的类名的驱动程序对象,并将路径传递给驱动程序:

Then, to run the code, create a driver object with the classname corresponding to your browser of choice, passing the path to the driver:

d = webdriver.Firefox("/path/to/driver")

d = webdriver.Chrome("/path/to/driver")

修改

将数据写入CSV文件:

Writing data to csv file:

import csv
write = csv.writer(open('nascarDrivers.csv', 'w'))
write.writerows(new_data) #new_data is the list of lists containing the table data

这篇关于只用"div"刮擦桌子的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆