BeautifulSoup not fetching the Data


Problem Description

I am trying to fetch data from the website, but I am not getting any of the information for fields like Name, Nature of Business, Telephone, Email, etc. in the variable soup. What should I add to the code below to get this data?

import requests
import pandas as pd
from bs4 import BeautifulSoup

# Fetch the members directory and parse the response with BeautifulSoup
page = "http://www.pmas.sg/page/members-directory"
pages = requests.get(page)
soup = BeautifulSoup(pages.content, 'html.parser')
print(soup)

The output I am getting using the above code is:

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 3.2 Final//EN">

<html>
<head>
<title>WebKnight Application Firewall Alert</title>
<meta content="NOINDEX" name="ROBOTS"/>
</head>
<body bgcolor="#ffffff" link="#FF3300" text="#000000" vlink="#FF3300">
<table cellpadding="3" cellspacing="5" width="410">
<tr>
<td align="left">
<font face="Verdana,Arial,Helvetica" size="2">
<font size="3"><b>WebKnight Application Firewall Alert</b></font><br/><br/><br/>
Your request triggered an alert! If you feel that you have received this page in error, please contact the administrator of this web site.
<br/>
<hr/>
<br/><b>What is WebKnight?</b><br/>
AQTRONIX WebKnight is an application firewall for web servers and is released under the GNU General Public License. It is an ISAPI filter for securing web servers by blocking certain requests. If an alert is triggered WebKnight will take over and protect the web server.<br/><br/>
<hr/>
<br/>For more information on WebKnight: <a href="http://www.aqtronix.com/webknight/">http://www.aqtronix.com/WebKnight/</a><br/><br/>
<b><font color="#FF3300">AQTRONIX</font> WebKnight</b></font>
</td>
</tr>
</table>
</body>
</html>

Solution
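
The alert page in the question is generated by the WebKnight application firewall: the request is blocked before the members directory is ever returned, so soup never contains any member data. The most likely trigger is the default python-requests User-Agent, and the answer below simply sends a browser-like User-Agent header instead. As a quick check that the header is what makes the difference, here is a minimal sketch (the Firefox User-Agent string is only an example; any common browser string should behave the same):

import requests
from bs4 import BeautifulSoup

url = "http://www.pmas.sg/page/members-directory"

# Default python-requests User-Agent -- expected to hit the firewall alert page
blocked = BeautifulSoup(requests.get(url).content, 'html.parser')
print(blocked.title.string if blocked.title else "no <title>")

# Browser-like User-Agent -- expected to return the real directory page
headers = {"User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:71.0) Gecko/20100101 Firefox/71.0"}
allowed = BeautifulSoup(requests.get(url, headers=headers).content, 'html.parser')
print(allowed.title.string if allowed.title else "no <title>")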

import requests
from bs4 import BeautifulSoup
import csv
import regex  # third-party 'regex' package (supports \K and POSIX classes), not the stdlib 're'

# Browser-like User-Agent so the WebKnight firewall does not block the request
headers = {
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:71.0) Gecko/20100101 Firefox/71.0"
}
r = requests.get('http://www.pmas.sg/page/members-directory', headers=headers)

soup = BeautifulSoup(r.text, 'html.parser')

data = []
# Each member entry sits in a <div class="col-md-4"> block
for item in soup.findAll('div', {'class': 'col-md-4'}):
    l = []
    for p in item.findAll('p'):
        # Drop the "Label:" prefix (e.g. "Telephone: 1234") and keep only the value
        matches = regex.findall(
            r"^(?:.*?:[[:blank:]]+\K)?.*", p.text, regex.MULTILINE)
        b = next(iter(matches))
        l.append(b)
    if l:
        print(l)
        data.append(l)


# Write the collected rows to a CSV file with a header row
with open('data.csv', 'w', newline='') as f:
    writer = csv.writer(f)
    writer.writerow(['Name', 'Nature of Business',
                     'Address', 'Contact', 'Phone#', 'Fax', 'Website', 'Email'])
    writer.writerows(data)
    print("Done")

