Python BeautifulSoup scraping; how to combine two different fields, or pair them based on location in site?


Problem description


Ok guys, so I'm very much a beginner here. The purpose of what I'm trying to do is to scrape a website for company names and corresponding phone numbers. The end goal would be to write these to a CSV that can be opened with Excel.

Currently I'm able to retrieve the company names and the phone numbers, but only separately. I am thinking that I could merge the two lists somehow, but I'm concerned that a single outlier entry could offset the whole merge and mismatch the numbers to the names.

What is the best way to accomplish this?

from urllib import request
from bs4 import BeautifulSoup

url = 'https://www.iqsdirectory.com/bolts/bolts-2/'
html = request.urlopen(url)
soup = BeautifulSoup(html, 'html.parser')

data1 = soup.find_all('span', {'itemprop':'name'})
data2 = soup.find_all('a', {'itemprop':'telephone'})

datalist1 = []
datalist2 = []

for i in data1:
    datalist1.append(i.string)

for i in data2:
    datalist2.append(i.string)

x = zip(datalist1, datalist2)

print(list(x))

Is it possible to pull name and phone in the same soup function in order to preserve their connection?
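You can keep the connection by looping over each company's container element and looking up both fields inside it. Searching the whole page once for names and again for phones is what breaks the link. The offline sketch below shows the idea. The markup is hypothetical and mirrors the `h3.cname` container that the solution's code selects; the real page may differ. `extract_rows` is likewise a hypothetical helper name.

```python
from bs4 import BeautifulSoup

# Hypothetical markup: each company sits in its own <h3 class="cname"> block,
# and the second company has no phone link.
html = """
<h3 class="cname"><span itemprop="name">Acme Bolt</span>
  <a itemprop="telephone" href="tel:5550100">555-0100</a></h3>
<h3 class="cname"><span itemprop="name">Bolt Co</span></h3>
<h3 class="cname"><span itemprop="name">Fastener Inc</span>
  <a itemprop="telephone" href="tel:5550300">555-0300</a></h3>
"""


def extract_rows(soup):
    """Return one (name, phone) tuple per company block; phone is "" if missing."""
    rows = []
    for block in soup.select("h3.cname"):
        # Both lookups are scoped to the same block, so they stay paired.
        name_tag = block.find("span", itemprop="name")
        phone_tag = block.find("a", itemprop="telephone")
        rows.append((
            name_tag.get_text(strip=True) if name_tag else "",
            phone_tag.get_text(strip=True) if phone_tag else "",
        ))
    return rows


rows = extract_rows(BeautifulSoup(html, "html.parser"))
print(rows)
# -> [('Acme Bolt', '555-0100'), ('Bolt Co', ''), ('Fastener Inc', '555-0300')]
```

Each row is built from a single block. A missing phone number therefore leaves an empty cell for that one company instead of shifting every later row.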

Any help would be appreciated!

Solution

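import csv

import requests
from bs4 import BeautifulSoup


def main(url):
    r = requests.get(url)
    r.raise_for_status()  # stop on HTTP errors instead of parsing an error page
    soup = BeautifulSoup(r.content, 'html.parser')
    # Assumes each <h3 class="cname"> block holds one company's name and phone
    # link, so reading both fields from the same block keeps them paired.
    target = soup.select("h3.cname")
    with open("data.csv", 'w', newline="") as f:
        writer = csv.writer(f)
        writer.writerow(["Name", "Phone"])
        for tar in target:
            name_tag = tar.find("span", itemprop="name")
            phone_tag = tar.find("a", itemprop="telephone")
            # If either tag is missing, .text would raise AttributeError;
            # write an empty cell instead so the row is still complete.
            name = name_tag.get_text(strip=True) if name_tag else ""
            phone = phone_tag.get_text(strip=True) if phone_tag else ""
            writer.writerow([name, phone])


if __name__ == "__main__":
    main("https://www.iqsdirectory.com/bolts/bolts-2/")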
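The goal is a CSV that opens cleanly in Excel. One option worth considering, as a suggestion that is not part of the original answer, is to write the file with `encoding="utf-8-sig"`. The byte-order mark this prepends helps Excel recognise the file as UTF-8, so non-ASCII company names are less likely to appear garbled. The sketch below assumes the rows are already paired; `write_rows` and the sample rows are hypothetical.

```python
import csv


def write_rows(path, rows):
    """Write (name, phone) pairs to a CSV file with a header row.

    "utf-8-sig" prepends a byte-order mark so Excel detects UTF-8;
    newline="" lets the csv module control line endings, as its docs recommend.
    """
    with open(path, "w", newline="", encoding="utf-8-sig") as f:
        writer = csv.writer(f)
        writer.writerow(["Name", "Phone"])
        writer.writerows(rows)


# Hypothetical sample rows; in practice these would come from the scraper.
write_rows("data.csv", [("Acme Bolt", "555-0100"), ("Bolt Co", "")])

# Reading back with "utf-8-sig" strips the byte-order mark again.
with open("data.csv", newline="", encoding="utf-8-sig") as f:
    read_back = list(csv.reader(f))
print(read_back)
# -> [['Name', 'Phone'], ['Acme Bolt', '555-0100'], ['Bolt Co', '']]
```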

