Please see my problem, believe me it is easy to solve


Problem description

I tried to implement async and await inside a spawned child process, but it didn't work. Please see this:

Expected output

 *************
http://www.stevecostellolaw.com/
 *************
http://www.stevecostellolaw.com/personal-injury.html
http://www.stevecostellolaw.com/personal-injury.html
 *************
http://www.stevecostellolaw.com/#
http://www.stevecostellolaw.com/#
 *************
http://www.stevecostellolaw.com/home.html
http://www.stevecostellolaw.com/home.html
 *************
http://www.stevecostellolaw.com/about-us.html
http://www.stevecostellolaw.com/about-us.html
 *************
http://www.stevecostellolaw.com/
http://www.stevecostellolaw.com/

 *************

Because each time the spawned child's handler reaches await, I expect it to go back to the Python script, print ************* and then print the next URL. (Ignore that the same URL is printed twice here.)

Output I get

C:\Users\ASUS\Desktop\searchermc>node app.js
server running on port 3000

DevTools listening on ws://127.0.0.1:52966/devtools/browser/933c20c7-e295-4d84-a4b8-eeb5888ecbbf
[3020:120:0402/105304.190:ERROR:device_event_log_impl.cc(214)] [10:53:04.188] USB: usb_device_handle_win.cc:1056 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F)
[3020:120:0402/105304.190:ERROR:device_event_log_impl.cc(214)] [10:53:04.189] USB: usb_device_handle_win.cc:1056 Failed to read descriptor from node connection: A device attached to the system is not functioning. (0x1F)

 *************
http://www.stevecostellolaw.com/
http://www.stevecostellolaw.com/personal-injury.html
http://www.stevecostellolaw.com/personal-injury.html
http://www.stevecostellolaw.com/#
http://www.stevecostellolaw.com/#
http://www.stevecostellolaw.com/home.html
http://www.stevecostellolaw.com/home.html
http://www.stevecostellolaw.com/about-us.html
http://www.stevecostellolaw.com/about-us.html
http://www.stevecostellolaw.com/
http://www.stevecostellolaw.com/

 *************

Please see the app.js code below:

// form submit request
app.post('/formsubmit', function(req, res){

    csvData = req.files.csvfile.data.toString('utf8');
    filteredArray = cleanArray(csvData.split(/\r?\n/))
    csvData = get_array_string(filteredArray)
    csvData = csvData.trim()
    
    var keywords = req.body.keywords
    keywords = keywords.trim()

    // Send request to python script
    var spawn = require('child_process').spawn;
    var process = spawn('python', ["./webextraction.py", csvData, keywords, req.body.full_search])

    var outarr = []

    // process.stdout.on('data', (data) => {
    //   console.log(`stdout: ${data}`);
    // });

    process.stdout.on('data', async function(data){

      console.log("\n ************* ")
      console.log(data.toString().trim())
      await outarr.push(data.toString().trim())
      console.log("\n ************* ")

    });

});

The Python function that prints the URLs when the if condition matches:

# Function for searching keyword start
def search_keyword(href, search_key):
    extension_list = ['mp3', 'jpg', 'exe', 'jpeg', 'png', 'pdf', 'vcf']
    if(href.split('.')[-1] not in extension_list):
        try:    
            content = selenium_calling(href)
            soup = BeautifulSoup(content,'html.parser')
            search_string = re.sub(r"\s+", " ", soup.body.text)
            search_string = search_string.lower()
            res = [ele for ele in search_key if(ele.lower() in search_string)]
            outstr = getstring(res)
            outstr = outstr.lstrip(", ")
            if(len(res) > 0):
                print(href)
                found_results.append(href)
                href_key_dict[href] = outstr
                return 1
            else:
                notfound_results.append(href)
        except Exception as err:
            pass

I want to do all this because the Python script takes a long time to execute and therefore gives a timeout error every time, so I am thinking of getting the intermediate output of the Python script in my Node.js script. You can see the error I'm getting in the image below.

Recommended answer

I'm not sure I completely understand what you're trying to do, but I'll give it a shot, since you seem to have asked this question several times already (which usually isn't a good idea). I think your question lacks clarity; it would help a lot if you could clarify your end goal (i.e. how do you want this to behave?).

I think you mentioned two separate problems here. The first is that you expect a new line of '*************' to be printed before each separate piece of data returned from your script. This can't be relied on; see the answer to the question "Order of process.stdout.on( 'data', ... ) and process.stderr.on( 'data', ... )" for more detail. The data is passed to your stdout handler in chunks, not line by line, and a single chunk can contain any amount of data, depending on how much is currently in the pipe.
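Because chunks don't line up with lines, one way to get the one-URL-per-separator output the question expects is to buffer the chunks yourself and split them into complete lines. The following is a minimal sketch and not part of the original answer. `createLineSplitter` is a hypothetical helper name. It keeps any unfinished trailing line until the next chunk arrives.

```javascript
// Hypothetical helper (not from the original answer): turns arbitrary stdout
// chunks into complete lines. A chunk may contain several lines or stop
// part-way through one, so the unfinished tail is kept in `pending` until
// more data arrives.
function createLineSplitter(onLine) {
  let pending = "";
  return {
    // Call with every chunk from the 'data' event.
    push(chunk) {
      pending += chunk.toString();
      const parts = pending.split(/\r?\n/);
      pending = parts.pop(); // last piece is incomplete (or "" after a newline)
      for (const line of parts) onLine(line);
    },
    // Call once on 'end' to emit a final line that had no trailing newline.
    flush() {
      if (pending !== "") onLine(pending);
      pending = "";
    },
  };
}
```

To use it with the spawned child (named `process` in the question's code), you could call `setEncoding('utf8')` on the child's stdout so that multi-byte characters split across chunks are decoded correctly. Then pass `splitter.push` as the `'data'` handler and `splitter.flush` as the `'end'` handler, and print the `*************` separator inside `onLine`.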

The part I'm most confused about is your phrase "to get intermediate output of the python script in my nodejs script". There isn't necessarily any intermediate data. You can't rely on data arriving at any particular time in your process's stdout handler, which will hand you data at a pace determined by the Python script itself and the process it's running in. That said, it sounds like your main problem is the timeout on your POST request. You never end the request, and that's why you're getting a timeout. I'll assume you want to wait for the first chunk of data (regardless of how many lines it contains) before sending a response back. In that case, you need to add a res.send call, like this:

// form submit request
app.post('/formsubmit', function (req, res) {

    // Build the CSV argument (cleanArray and get_array_string are the asker's own helpers)
    let csvData = req.files.csvfile.data.toString('utf8');
    const filteredArray = cleanArray(csvData.split(/\r?\n/));
    csvData = get_array_string(filteredArray).trim();

    const keywords = req.body.keywords.trim();

    // Send request to python script.
    // Named pyProcess so it doesn't shadow Node's global `process` object.
    const { spawn } = require('child_process');
    const pyProcess = spawn('python', ['./webextraction.py', csvData, keywords, req.body.full_search]);

    const outarr = [];

    // Keep track of whether we've already ended the request
    let responseSent = false;

    pyProcess.stdout.on('data', function (data) {
        const text = data.toString().trim();
        console.log('\n ************* ');
        console.log(text);
        outarr.push(text);
        console.log('\n ************* ');

        // If the request hasn't already been ended, send back the current output from the script
        // and end the request
        if (!responseSent) {
            responseSent = true;
            res.send(outarr);
        }
    });

    // If the script exits without printing anything, still end the request
    pyProcess.on('close', function () {
        if (!responseSent) {
            responseSent = true;
            res.send(outarr);
        }
    });

    // Without an 'error' listener, a failed spawn (e.g. python not on PATH) crashes the server
    pyProcess.on('error', function (err) {
        if (!responseSent) {
            responseSent = true;
            res.status(500).send(String(err));
        }
    });
});
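If the real goal is to show the script's progress while it runs, rather than only its first chunk, another option is to stream each line to the client with `res.write` and call `res.end` when the script exits. This is a sketch under assumptions and not the answerer's code. `streamChildLines` is a hypothetical name, and the sketch relies on Node's built-in `readline` module to split the child's stdout into lines. Whether streaming actually prevents the timeout depends on what is timing out (the browser, a proxy, or the server), so treat it as something to try rather than a guaranteed fix.

```javascript
const readline = require("node:readline");

// Hypothetical helper (not from the original answer): forwards each complete,
// non-empty line of a child process's stdout to an HTTP-style response.
// `res` only needs write() and end(), which node:http and Express responses both provide.
// Resolves with the exit code once the child has closed.
function streamChildLines(child, res) {
  return new Promise((resolve, reject) => {
    let finished = false;
    // End the response exactly once, whichever event arrives first.
    const finish = () => {
      if (!finished) {
        finished = true;
        res.end();
      }
    };

    // readline buffers partial chunks and emits one 'line' event per line.
    const lines = readline.createInterface({ input: child.stdout, crlfDelay: Infinity });
    lines.on("line", (line) => {
      const trimmed = line.trim();
      if (trimmed !== "" && !finished) res.write(trimmed + "\n");
    });

    // 'error' fires if the process could not be spawned (e.g. python not on PATH).
    child.on("error", (err) => {
      finish();
      reject(err);
    });

    // 'close' fires only after the child's stdio streams have closed,
    // so readline should already have emitted the last line by then.
    child.on("close", (code) => {
      finish();
      resolve(code);
    });
  });
}
```

In the route, that could look like `streamChildLines(spawn('python', ['-u', './webextraction.py', csvData, keywords, req.body.full_search]), res).catch(console.error)`. The `-u` flag matters here. When Python's stdout is a pipe rather than a terminal, it is block-buffered. Without `-u` (or `print(href, flush=True)` in the script), the URLs may all arrive in one chunk when the script finishes, which could explain the single chunk in the question's output.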
