Python BeautifulSoup scraping; how to combine two different fields, or pair them based on location in site?
Problem description
Ok guys, so I'm very much a beginner here. The purpose of what I'm trying to do is to scrape a website for company names and corresponding phone numbers. The end goal would be to write these to a CSV that can be opened with Excel.
Currently I'm able to retrieve the company names and the phone numbers, but only separately. I am thinking that I could merge the two lists somehow, but I'm concerned that a single outlier (for example, a company with no phone number) would offset the whole merge and mismatch the numbers and names.
What is the best way to accomplish this?
from urllib import request
from bs4 import BeautifulSoup

url = 'https://www.iqsdirectory.com/bolts/bolts-2/'
html = request.urlopen(url)
soup = BeautifulSoup(html, 'html.parser')

# Names and phone numbers are collected in two independent passes
data1 = soup.find_all('span', {'itemprop': 'name'})
data2 = soup.find_all('a', {'itemprop': 'telephone'})

datalist1 = []
datalist2 = []
for i in data1:
    datalist1.append(i.string)
for i in data2:
    datalist2.append(i.string)

# Pair the two lists purely by position
x = zip(datalist1, datalist2)
print(list(x))
Is it possible to pull name and phone in the same soup function in order to preserve their connection?
Any help would be appreciated!
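The mismatch risk raised in the question can be reproduced without any scraping. The sketch below uses made-up company names and phone numbers (none of them come from the site) and assumes the second company simply has no phone number on the page:

```python
# Hypothetical data: the second company has no phone number on the page.
names = ["Acme Bolts", "Bolt Depot", "Cascade Fasteners"]
phones = ["555-0100", "555-0199"]  # only two numbers were found

# zip() pairs items by position and stops at the shorter list, so every
# pair after the gap is shifted and the last name is dropped silently.
pairs = list(zip(names, phones))
print(pairs)

# A length check can at least detect that something is missing,
# but it cannot tell which entry the gap belongs to.
if len(names) != len(phones):
    print("warning: name and phone counts differ")
```

Here "Bolt Depot" ends up with the number that belongs to "Cascade Fasteners", which is why pairing by position alone is fragile.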
import requests
from bs4 import BeautifulSoup
import csv


def main(url):
    r = requests.get(url)
    soup = BeautifulSoup(r.content, 'html.parser')
    # Each company sits in its own <h3 class="cname"> block
    target = soup.select("h3.cname")
    with open("data.csv", 'w', newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Name", "Phone"])
        for tar in target:
            # Look up both fields inside the same block so they stay paired
            name = tar.find("span", itemprop="name").text
            phone = tar.find("a", itemprop="telephone").text
            writer.writerow([name, phone])


main("https://www.iqsdirectory.com/bolts/bolts-2/")
Output: view-online
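One caveat with the code above: `tar.find(...)` returns `None` when a tag is missing, so calling `.text` on it raises `AttributeError` for any company that lacks a name or a phone link. A minimal sketch of a more defensive variant follows. The `SAMPLE_HTML` markup, the `extract_rows` helper, and the company data are all made up for illustration and only mimic the `h3.cname` structure the answer relies on; the real page may differ.

```python
import csv
import io

from bs4 import BeautifulSoup

# Made-up markup that mimics the structure the answer relies on:
# one h3.cname per company; the second entry has no phone link.
SAMPLE_HTML = """
<h3 class="cname"><span itemprop="name">Acme Bolts</span>
  <a itemprop="telephone">555-0100</a></h3>
<h3 class="cname"><span itemprop="name">Bolt Depot</span></h3>
<h3 class="cname"><span itemprop="name">Cascade Fasteners</span>
  <a itemprop="telephone">555-0199</a></h3>
"""


def extract_rows(html):
    """Return one (name, phone) tuple per company block (hypothetical helper)."""
    soup = BeautifulSoup(html, "html.parser")
    rows = []
    for block in soup.select("h3.cname"):
        name_tag = block.find("span", itemprop="name")
        phone_tag = block.find("a", itemprop="telephone")
        # find() returns None when the tag is absent, so fall back to "".
        name = name_tag.get_text(strip=True) if name_tag else ""
        phone = phone_tag.get_text(strip=True) if phone_tag else ""
        rows.append((name, phone))
    return rows


rows = extract_rows(SAMPLE_HTML)

# Written to an in-memory buffer here; for a real file, use
# open("data.csv", "w", newline="") as in the answer.
buffer = io.StringIO()
writer = csv.writer(buffer)
writer.writerow(["Name", "Phone"])
writer.writerows(rows)
print(buffer.getvalue())
```

Because both lookups happen inside the same block, a missing phone number leaves an empty cell in that company's row instead of shifting every later row.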