Can't Scrape a Specific Table using BeautifulSoup4 (Python 3)

Problem description

I would like to scrape a table from the Ligue 1 football website, specifically the table that contains information on cards and referees.

http://www.ligue1.com/LFPStats/stats_arbitre?competition=D1

I am using the following code:

import requests
from bs4 import BeautifulSoup
import csv

r = requests.get("http://www.ligue1.com/LFPStats/stats_arbitre?competition=D1")

soup = BeautifulSoup(r.content, "html.parser")
table = soup.find_all('table')

This returns a different table from elsewhere in the HTML. I have tried to work around this by indexing the result of find_all with [0], [1], etc., but that returns nothing. I have also searched for tr and td, with similar results. I have no idea why Beautiful Soup ignores this table.

The table I am looking for is in the HTML below:

<table>
<thead>
  <tr>
    <th class="{sorter: false} hide position">Position</th>
    <th class="{sorter: false} joueur">Referees</th>
    <th class="chiffre header"><span class="icon icon_carton_jaune">Yellow card</span></th>
    <th class="chiffre header"><span class="icon icon_carton_rouge">Red card</span></th>
    <th class="chiffre header">Matches</th>
  </tr>
</thead>
    <tbody><tr>
  <td class="position"></td>
  <td class="joueur">Benoît BASTIEN</td>
  <td class="chiffre"><a href="/stats_arbitre_details/245">25</a></td>
  <td class="chiffre"><a href="/stats_arbitre_details/245">4</a></td>
  <td class="chiffre">8</td>
</tr>
    <tr class="odd">
  <td class="position"></td>
  <td class="joueur">Hakim BEN EL HADJ</td>
  <td class="chiffre"><a href="/stats_arbitre_details/259">55</a></td>
  <td class="chiffre"><a href="/stats_arbitre_details/259">4</a></td>
  <td class="chiffre">10</td>
</tr>
    <tr>
  <td class="position"></td>
  <td class="joueur">Wilfried BIEN</td>
  <td class="chiffre"><a href="/stats_arbitre_details/162">44</a></td>
  <td class="chiffre"><a href="/stats_arbitre_details/162">3</a></td>
  <td class="chiffre">9</td>
</tr>
    <tr class="odd">
  <td class="position"></td>
  <td class="joueur">Ruddy BUQUET</td>
  <td class="chiffre"><a href="/stats_arbitre_details/269">33</a></td>
  <td class="chiffre"><a href="/stats_arbitre_details/269">2</a></td>
  <td class="chiffre">7</td>
</tr>
    <tr>
  <td class="position"></td>
  <td class="joueur">Tony CHAPRON</td>
  <td class="chiffre"><a href="/stats_arbitre_details/102">43</a></td>
  <td class="chiffre"><a href="/stats_arbitre_details/102">1</a></td>
  <td class="chiffre">8</td>
</tr>
    <tr class="odd">
  <td class="position"></td>
  <td class="joueur">Amaury DELERUE</td>
  <td class="chiffre"><a href="/stats_arbitre_details/343">30</a></td>
  <td class="chiffre"><a href="/stats_arbitre_details/343">0</a></td>
  <td class="chiffre">6</td>
</tr>
    <tr>
  <td class="position"></td>
  <td class="joueur">Saïd ENNJIMI</td>
  <td class="chiffre"><a href="/stats_arbitre_details/113">27</a></td>
  <td class="chiffre"><a href="/stats_arbitre_details/113">1</a></td>
  <td class="chiffre">6</td>
</tr>
    <tr class="odd">
  <td class="position"></td>
  <td class="joueur">Fredy FAUTREL</td>
  <td class="chiffre"><a href="/stats_arbitre_details/338">25</a></td>
  <td class="chiffre"><a href="/stats_arbitre_details/338">2</a></td>
  <td class="chiffre">8</td>
</tr>
    <tr>
  <td class="position"></td>
  <td class="joueur">Antony GAUTIER</td>
  <td class="chiffre"><a href="/stats_arbitre_details/331">31</a></td>
  <td class="chiffre"><a href="/stats_arbitre_details/331">8</a></td>
  <td class="chiffre">9</td>
</tr>
    <tr class="odd">
  <td class="position"></td>
  <td class="joueur">Johan HAMEL</td>
  <td class="chiffre"><a href="/stats_arbitre_details/334">43</a></td>
  <td class="chiffre"><a href="/stats_arbitre_details/334">7</a></td>
  <td class="chiffre">9</td>
</tr>
    <tr>
  <td class="position"></td>
  <td class="joueur">Lionel JAFFREDO</td>
  <td class="chiffre"><a href="/stats_arbitre_details/124">40</a></td>
  <td class="chiffre"><a href="/stats_arbitre_details/124">2</a></td>
  <td class="chiffre">9</td>
</tr>
    <tr class="odd">
  <td class="position"></td>
  <td class="joueur">Stéphane JOCHEM</td>
  <td class="chiffre"><a href="/stats_arbitre_details/294">33</a></td>
  <td class="chiffre"><a href="/stats_arbitre_details/294">4</a></td>
  <td class="chiffre">8</td>
</tr>
    <tr>
  <td class="position"></td>
  <td class="joueur">Stéphane LANNOY</td>
  <td class="chiffre"><a href="/stats_arbitre_details/127">24</a></td>
  <td class="chiffre"><a href="/stats_arbitre_details/127">0</a></td>
  <td class="chiffre">6</td>
</tr>
    <tr class="odd">
  <td class="position"></td>
  <td class="joueur">Mikael LESAGE</td>
  <td class="chiffre"><a href="/stats_arbitre_details/286">38</a></td>
  <td class="chiffre"><a href="/stats_arbitre_details/286">3</a></td>
  <td class="chiffre">9</td>
</tr>
    <tr>
  <td class="position"></td>
  <td class="joueur">Jérôme MIGUELGORRY</td>
  <td class="chiffre"><a href="/stats_arbitre_details/239">32</a></td>
  <td class="chiffre"><a href="/stats_arbitre_details/239">1</a></td>
  <td class="chiffre">10</td>
</tr>
    <tr class="odd">
  <td class="position"></td>
  <td class="joueur">Benoît MILLOT</td>
  <td class="chiffre"><a href="/stats_arbitre_details/287">43</a></td>
  <td class="chiffre"><a href="/stats_arbitre_details/287">0</a></td>
  <td class="chiffre">11</td>
</tr>
    <tr>
  <td class="position"></td>
  <td class="joueur">Sébastien MOREIRA</td>
  <td class="chiffre"><a href="/stats_arbitre_details/148">38</a></td>
  <td class="chiffre"><a href="/stats_arbitre_details/148">5</a></td>
  <td class="chiffre">10</td>
</tr>
    <tr class="odd">
  <td class="position"></td>
  <td class="joueur">Nicolas RAINVILLE</td>
  <td class="chiffre"><a href="/stats_arbitre_details/188">40</a></td>
  <td class="chiffre"><a href="/stats_arbitre_details/188">7</a></td>
  <td class="chiffre">10</td>
</tr>
    <tr>
  <td class="position"></td>
  <td class="joueur">Frank SCHNEIDER</td>
  <td class="chiffre"><a href="/stats_arbitre_details/247">33</a></td>
  <td class="chiffre"><a href="/stats_arbitre_details/247">4</a></td>
  <td class="chiffre">10</td>
</tr>
    <tr class="odd">
  <td class="position"></td>
  <td class="joueur">Clément TURPIN</td>
  <td class="chiffre"><a href="/stats_arbitre_details/333">26</a></td>
  <td class="chiffre"><a href="/stats_arbitre_details/333">3</a></td>
  <td class="chiffre">8</td>
</tr>
    <tr>
  <td class="position"></td>
  <td class="joueur">Bartolomeu VARELA</td>
  <td class="chiffre"><a href="/stats_arbitre_details/288">35</a></td>
  <td class="chiffre"><a href="/stats_arbitre_details/288">3</a></td>
  <td class="chiffre">9</td>
</tr>
</tbody></table>

I have also tried searching for td elements with a specific class, which should work, but Beautiful Soup cannot find the table in the first place.

Solution

The problem, I assume, is that you are looking at the HTML generated by the browser. What you are missing is that the table is appended to the page by JavaScript.

You can confirm this in Chrome (or any other browser): instead of "Inspect", choose "View Page Source", and you will see that there is no such table in the server response.
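
The same check can be scripted. The following is a minimal sketch, assuming the rows you want can be recognized by the `joueur` cell class shown in the question's HTML. The names `CellCounter` and `count_cells_with_class` are hypothetical, and the two HTML strings are stand-ins rather than real responses from the site. It uses only the standard-library `html.parser`, so the result does not depend on how Beautiful Soup handles the page.

```python
from html.parser import HTMLParser


class CellCounter(HTMLParser):
    """Counts <td> elements that carry a given CSS class."""

    def __init__(self, css_class):
        super().__init__()
        self.css_class = css_class
        self.count = 0

    def handle_starttag(self, tag, attrs):
        if tag != "td":
            return
        # The class attribute may be missing or hold several class names.
        classes = (dict(attrs).get("class") or "").split()
        if self.css_class in classes:
            self.count += 1


def count_cells_with_class(html, css_class):
    """Return how many <td> cells in html have css_class."""
    parser = CellCounter(css_class)
    parser.feed(html)
    parser.close()
    return parser.count


# Stand-in for what "Inspect" shows: the table is already injected.
rendered = '<table><tbody><tr><td class="joueur">Tony CHAPRON</td></tr></tbody></table>'
# Stand-in for the raw server response: the table is absent.
raw = '<div id="stats"></div><script src="stats.js"></script>'

print(count_cells_with_class(rendered, "joueur"))  # prints 1
print(count_cells_with_class(raw, "joueur"))       # prints 0
```

Passing the text of the response for the original page URL to `count_cells_with_class` should give 0 if the table really is added by JavaScript.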

The URL the page calls is "http://www.ligue1.com/stats_arbitre?competition=D1", but there is a catch: you must indicate via an HTTP header that the request is an XHR. If you open this URL directly in the browser, you will get a 500 response.

Try this curl command to check that the response contains the table you want:

curl --header "X-Requested-With: XMLHttpRequest" "http://www.ligue1.com/stats_arbitre?competition=D1"

In your code, do this:

import requests
from bs4 import BeautifulSoup
import csv

headers = {'X-Requested-With': 'XMLHttpRequest'}
r = requests.get('http://www.ligue1.com/stats_arbitre?competition=D1', headers=headers)

...
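
What follows the request is up to you. As one possible continuation, here is a rough sketch that assumes the XHR response contains rows shaped like the HTML in the question (a `joueur` cell followed by three `chiffre` cells). The function name `extract_referee_rows`, the CSV column names, and the one-row `sample` string are made up for illustration; they are not part of the site or of the original answer.

```python
import csv
import io

from bs4 import BeautifulSoup


def extract_referee_rows(html):
    """Return one [referee, yellow, red, matches] list per matching row."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for tr in soup.find_all("tr"):
        name_cell = tr.find("td", class_="joueur")
        number_cells = tr.find_all("td", class_="chiffre")
        if name_cell is None or len(number_cells) != 3:
            continue  # skip header rows and rows that do not fit the assumed layout
        rows.append(
            [name_cell.get_text(strip=True)]
            + [int(cell.get_text(strip=True)) for cell in number_cells]
        )
    return rows


# Hypothetical one-row sample in the same shape as the question's table.
sample = """
<table><tbody>
<tr><td class="position"></td><td class="joueur">Tony CHAPRON</td>
<td class="chiffre"><a href="/stats_arbitre_details/102">43</a></td>
<td class="chiffre"><a href="/stats_arbitre_details/102">1</a></td>
<td class="chiffre">8</td></tr>
</tbody></table>
"""

rows = extract_referee_rows(sample)

# Write to an in-memory buffer here; a real script would open a file instead.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["referee", "yellow_cards", "red_cards", "matches"])
writer.writerows(rows)
print(buffer.getvalue())
```

In the real script you would call `extract_referee_rows(r.content)` on the response obtained with the `X-Requested-With` header, and adjust the class names if the live markup differs from the snippet in the question.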

Hope it helps
