Fetching lawyer details from different links of a website using bs4 in python

Problem Description

I am an absolute beginner to web scraping using Python, with very little knowledge of Python programming. I am just trying to extract the information of the lawyers in the Tennessee location. On the webpage there are multiple links, within which there are further links to the categories of lawyers, and within those are the lawyers' details.

I have already extracted the links of the various cities into a list and have also extracted the various categories of lawyers available in each of the city links. Furthermore, I have extracted each lawyer's URL per category and stored them in a separate set. Now I have iterated over those URLs to fetch the details of the lawyers and finally write them to an .xls file. But when I run my code, the process keeps executing for an indefinite amount of time. I have to force-stop the process, and no .xls file is created at the location specified in the program. What can be done? Please suggest if possible.

import requests
from bs4 import BeautifulSoup as bs
import pandas as pd

records = []
with requests.Session() as s:
    res = s.get('https://attorneys.superlawyers.com/tennessee/', headers={'User-agent': 'Super Bot 9000'})
    soup = bs(res.content, 'lxml')

    # Collect the link for every city in Tennessee
    cities = [item['href'] for item in soup.select('#browse_view a')]
    for c in cities:
        r = s.get(c)
        s1 = bs(r.content, 'lxml')
        # Collect the practice-area category links listed on each city page
        categories = [item['href'] for item in s1.select('.three_browse_columns:nth-of-type(2) a')]
        for c1 in categories:
            r1 = s.get(c1)
            s2 = bs(r1.content, 'lxml')
            # Some profile hrefs embed the real URL after a '*'; extract it when present
            lawyers = [item['href'].split('*')[1] if '*' in item['href'] else item['href']
                       for item in s2.select('.indigo_text .directory_profile')]

            for i in lawyers:
                r2 = s.get(i)
                s3 = bs(r2.content, 'lxml')
                name = s3.select_one('#lawyer_name').text
                category = s3.select_one('#attorney_profile_heading').text
                firm = s3.select_one('#firm_profile_page').text
                address = ' '.join(list(s3.select_one('#poap_postal_addr_block').stripped_strings)[1:])
                practices = ' '.join([item.text for item in s3.select('#pa_list li')])
                records.append({'Names': name, 'Category': category, 'Address': address,
                                'Firm Name': firm, 'Practice Area': practices})

# The column names must match the keys used in records above
df = pd.DataFrame(records, columns=['Names', 'Category', 'Address', 'Firm Name', 'Practice Area'])
df = df.drop_duplicates()
df.to_excel(r'C:\Users\laptop\Desktop\lawyers.xls', sheet_name='MyData2', index=False, header=True)

"I expected the program to complete its execution and create an .xls file, but it is going on executing and even I have no idea for how long will it require to complete it's execution. Is there any possibility that an infinite loop has occoured? If possible suggest."

Recommended Answer

The data you're trying to scrape is a lot, and using BeautifulSoup will take a long time. I tried scraping this website, and even after dividing the search into four different program files it took about 12 hours to complete the execution. I also tried your code for only the city of Brentwood, and it took about an hour. I see no infinite loop in your code! Let it run and have patience.
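
If you want to be sure the loops are advancing rather than stuck, a few progress prints go a long way. Below is a minimal sketch of that idea, reusing the session, headers, and selectors from the question; the fetch helper and its messages are purely illustrative:

import time
import requests
from bs4 import BeautifulSoup as bs

def fetch(session, url):
    # Illustrative helper: fetch one page and report how long it took, so a
    # long-running scrape shows visible progress instead of appearing hung.
    start = time.monotonic()
    res = session.get(url, headers={'User-agent': 'Super Bot 9000'})
    print(f'{res.status_code} {url} ({time.monotonic() - start:.1f}s)')
    return bs(res.content, 'lxml')

with requests.Session() as s:
    soup = fetch(s, 'https://attorneys.superlawyers.com/tennessee/')
    cities = [item['href'] for item in soup.select('#browse_view a')]
    print(f'Found {len(cities)} city links')
    for n, c in enumerate(cities, start=1):
        print(f'[{n}/{len(cities)}] {c}')
        s1 = fetch(s, c)
        # ...continue with the category and lawyer loops as in the question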

Also, your program has not created any .xls file because it has not yet reached that part of the code.
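
Until it does, one way to avoid losing everything on a force-stop is to write each record out as soon as it is scraped, instead of building the whole DataFrame at the end. Below is a minimal sketch of that idea using a CSV file; the field names mirror the question's code, and scrape_lawyers is a hypothetical stand-in for its nested loops:

import csv

FIELDS = ['Names', 'Category', 'Address', 'Firm Name', 'Practice Area']

def scrape_lawyers():
    # Hypothetical stand-in for the question's city/category/lawyer loops;
    # the real script would yield one dict per parsed profile here.
    yield {'Names': 'Jane Doe', 'Category': 'Example Attorney', 'Address': '123 Main St',
           'Firm Name': 'Doe Law', 'Practice Area': 'Family Law'}

with open(r'C:\Users\laptop\Desktop\lawyers.csv', 'w', newline='', encoding='utf-8') as f:
    writer = csv.DictWriter(f, fieldnames=FIELDS)
    writer.writeheader()
    for record in scrape_lawyers():
        writer.writerow(record)  # one row per lawyer, written immediately
        f.flush()                # flush so the file grows while the scrape runs

A CSV opens in Excel just like the .xls the question targets, and you can still deduplicate it afterwards with pd.read_csv(...).drop_duplicates().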

PS: I know this should be a comment, but I currently don't have enough reputation to do so. Hope this answer helps you, and also helps me gain enough reputation to be able to comment next time.
