Unable to use two Threads to execute two functions within a script

Question

I've created a scraper in Python that uses Thread to make execution faster. The scraper is supposed to parse all the links available on the pages whose URLs end with different letters of the alphabet. It does parse them all.

However, I would like to parse all the names and phone numbers from those individual links using Thread again. I managed to run the first part using Thread, but I have no idea how to create another Thread to execute the latter part of the script.

I could have wrapped them in a single Thread, but my intention is to learn how to use two Threads to execute two functions.

For the first part, I tried the following and it worked:

import requests
import threading
from lxml import html

main_url = "https://www.houzz.com/proListings/letter/{}"

def alphabetical_links(mainurl):
    response = requests.get(mainurl).text  # use the parameter, not the global loop variable
    tree = html.fromstring(response)
    return [container.attrib['href'] for container in tree.cssselect(".proSitemapLink a")]

if __name__ == '__main__':
    linklist = []
    for link in [main_url.format(chr(page)) for page in range(97,123)]:
        thread = threading.Thread(target=alphabetical_links, args=(link,))
        thread.start()
        linklist+=[thread]

    for thread in linklist:
        thread.join()

My question is: how can I run the sub_links function (below) in another Thread?

import requests
import threading
from lxml import html

main_url = "https://www.houzz.com/proListings/letter/{}"

def alphabetical_links(mainurl):
    response = requests.get(mainurl).text  # use the parameter, not the global loop variable
    tree = html.fromstring(response)
    return [container.attrib['href'] for container in tree.cssselect(".proSitemapLink a")]

def sub_links(process_links):
    response = requests.get(process_links).text
    root = html.fromstring(response)

    for container in root.cssselect(".proListing"):
        try:
            name = container.cssselect("h2 a")[0].text
        except Exception: name = ""
        try:
            phone = container.cssselect(".proListingPhone")[0].text
        except Exception: phone = ""
        print(name, phone)

if __name__ == '__main__':
    linklist = []
    for link in [main_url.format(chr(page)) for page in range(97,123)]:
        thread = threading.Thread(target=alphabetical_links, args=(link,))
        thread.start()
        linklist+=[thread]

    for thread in linklist:
        thread.join()

Recommended answer

Try updating alphabetical_links so that it starts its own Threads:

import requests
import threading
from lxml import html

main_url = "https://www.houzz.com/proListings/letter/{}"


def alphabetical_links(mainurl):
    response = requests.get(mainurl).text
    tree = html.fromstring(response)
    links_on_page = [container.attrib['href'] for container in tree.cssselect(".proSitemapLink a")]
    threads = []
    for link in links_on_page:
        thread = threading.Thread(target=sub_links, args=(link,))
        thread.start()
        threads.append(thread)
    for thread in threads:
        thread.join()


def sub_links(process_links):
    response = requests.get(process_links).text
    root = html.fromstring(response)

    for container in root.cssselect(".proListing"):
        try:
            name = container.cssselect("h2 a")[0].text
        except Exception: name = ""
        try:
            phone = container.cssselect(".proListingPhone")[0].text
        except Exception: phone = ""
        print(name, phone)

if __name__ == '__main__':
    linklist = []
    for link in [main_url.format(chr(page)) for page in range(97,123)]:
        thread = threading.Thread(target=alphabetical_links, args=(link,))
        thread.start()
        linklist+=[thread]


    for thread in linklist:
        thread.join()
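
A different layout, closer to the "two Threads, two functions" wording of the question, is to let one thread collect links and hand them to a second thread through a queue.Queue. The sketch below is only an illustration under stated assumptions: produce_links, consume_links, run_pipeline and the two fake_* helpers are hypothetical names that do not appear in the original code, and the fake_* helpers stand in for the real requests/lxml calls so the example runs without network access.

```python
import queue
import threading

SENTINEL = None  # marks the end of the link stream


def produce_links(letters, link_queue, fetch_links):
    """Stage 1: collect the links for every letter and pass them on."""
    for letter in letters:
        for link in fetch_links(letter):
            link_queue.put(link)
    link_queue.put(SENTINEL)  # tell the consumer that no more links are coming


def consume_links(link_queue, parse_page, results):
    """Stage 2: parse each link as soon as it arrives."""
    while True:
        link = link_queue.get()  # blocks until an item is available
        if link is SENTINEL:
            break
        results.append(parse_page(link))


def run_pipeline(letters, fetch_links, parse_page):
    """Run both stages concurrently, one thread per function."""
    link_queue = queue.Queue()
    results = []
    producer = threading.Thread(
        target=produce_links, args=(letters, link_queue, fetch_links))
    consumer = threading.Thread(
        target=consume_links, args=(link_queue, parse_page, results))
    producer.start()
    consumer.start()
    producer.join()
    consumer.join()
    return results


# Hypothetical stand-ins (assumptions) replacing the real HTTP requests.
def fake_fetch_links(letter):
    return ["/pro/{}{}".format(letter, n) for n in range(2)]


def fake_parse_page(link):
    return (link, "555-0100")


if __name__ == "__main__":
    for row in run_pipeline("ab", fake_fetch_links, fake_parse_page):
        print(row)
```

In a real scraper, fetch_links and parse_page would wrap the request and parsing logic of alphabetical_links and sub_links. With a single producer and a single consumer the results come out in the order the links were queued.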

Note that nesting threads inside alphabetical_links is just an example of how to manage "inner Threads". Because many threads start at the same time, your system may fail to start some of them for lack of resources, and you will get a "RuntimeError: can't start new thread" exception. In that case, you should try using a thread pool instead.
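
One way to cap the number of threads is the standard library's concurrent.futures.ThreadPoolExecutor. The following is a minimal sketch, not the answerer's implementation: run_in_pool, scrape and the demo_* helpers are hypothetical names, max_workers=8 is an arbitrary assumption, and the demo_* helpers replace the real HTTP requests so the example runs offline.

```python
from concurrent.futures import ThreadPoolExecutor


def run_in_pool(func, items, max_workers=8):
    """Apply func to every item using at most max_workers threads."""
    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # map() yields results in input order, even if tasks finish out of order.
        return list(pool.map(func, items))


def scrape(letters, fetch_links, parse_page, max_workers=8):
    # Stage 1: one task per letter page; each task returns a list of links.
    link_lists = run_in_pool(fetch_links, letters, max_workers)
    all_links = [link for links in link_lists for link in links]
    # Stage 2: one task per link; each task returns the parsed data.
    return run_in_pool(parse_page, all_links, max_workers)


# Hypothetical stand-ins (assumptions) replacing the real HTTP requests.
def demo_fetch_links(letter):
    return ["/pro/{}{}".format(letter, n) for n in range(2)]


def demo_parse_page(link):
    return (link, "555-0100")


if __name__ == "__main__":
    for row in scrape("abc", demo_fetch_links, demo_parse_page):
        print(row)
```

Because the pool reuses a fixed number of worker threads, the script never tries to start hundreds of threads at once, which is what triggers the RuntimeError described above.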
