Need to get HTML source as a string with CEFPython

Question

I am trying to get the HTML source of a web URL as a string using CEFPython. I want to crawl the main frame's source content and get it as a string in:

def save_screenshot(browser):    
    # Browser object provides GetUserData/SetUserData methods
    # for storing custom data associated with browser. The
    # "OnPaint.buffer_string" data is set in RenderHandler.OnPaint.
    buffer_string = browser.GetUserData("OnPaint.buffer_string")
    if not buffer_string:
        raise Exception("buffer_string is empty, OnPaint never called?")
    mainFrame = browser.GetMainFrame()
    print("Main frame is ", mainFrame)
    # print("buffer string" ,buffer_string)

    # visitor object
    visitorObj = cef_string()
    temp = mainFrame.GetSource(visitorObj).GetString()
    print("temp : ", temp)

    visitorText = mainFrame.GetText(temp)
    siteHTML = mainFrame.GetSource(visitorText)
    print("siteHTML is ", siteHTML)

Problem: The code returns nothing for siteHTML.

Recommended answer

Your mainFrame.GetSource(visitor) call is asynchronous. It returns before the source is available, so you cannot call GetString() on its result. The source is delivered later to the visitor's Visit() method.

This is the way to do it. Unfortunately, you need to think asynchronously:

class Visitor(object):
    def Visit(self, value):
        # Called later by CEF, once the frame source is available.
        print("This is the HTML source:")
        print(value)
myvisitor = Visitor()
mainFrame = browser.GetMainFrame()
mainFrame.GetSource(myvisitor)

One more thing to beware of: the visitor object myvisitor in the above example is passed to GetSource() as a weak reference. In other words, you must keep that object alive until the source is passed back. If you put the three statements that create the visitor and call GetSource() in a function, you have to make sure the function does not return until the job is done, or store the visitor somewhere that outlives the function.
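
The lifetime issue described above can be illustrated without CEF. The following is a minimal sketch, not cefpython code: FakeFrame is a hypothetical stand-in that only imitates the behaviour the answer describes (GetSource() returns at once and keeps just a weak reference to the visitor), and SourceCollector and request_and_forget are illustrative names. It shows a visitor created inside a function being lost, while a visitor that the caller keeps referenced receives the source.

```python
import gc
import weakref


class SourceCollector(object):
    """Visitor that stores the HTML it receives instead of printing it."""

    def __init__(self):
        self.html = None

    def Visit(self, value):
        self.html = value


class FakeFrame(object):
    """Hypothetical stand-in for a frame; NOT part of cefpython.

    Assumption taken from the answer: the visitor is held weakly and
    the source is delivered some time after GetSource() returns.
    """

    def __init__(self, html):
        self._html = html
        self._pending = []

    def GetSource(self, visitor):
        # Return immediately, remembering only a weak reference.
        self._pending.append(weakref.ref(visitor))

    def deliver(self):
        """Simulate the later callback; return how many visitors were reached."""
        reached = 0
        for ref in self._pending:
            visitor = ref()
            if visitor is not None:  # the visitor is still alive
                visitor.Visit(self._html)
                reached += 1
        self._pending = []
        return reached


def request_and_forget(frame):
    # The visitor has no other reference, so it can be freed
    # as soon as this function returns.
    frame.GetSource(SourceCollector())


frame = FakeFrame("<html><body>hello</body></html>")

request_and_forget(frame)
gc.collect()  # make the collection explicit on any Python implementation
lost = frame.deliver()

collector = SourceCollector()  # kept alive by this module-level name
frame.GetSource(collector)
kept = frame.deliver()

print(lost, kept, collector.html)
```

In a real application the same idea would mean storing the visitor on a long-lived object, such as a module-level variable or an attribute of a handler instance, rather than in a local variable of a short function.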
