Is it possible to hook up a more robust HTML parser to Python mechanize?
Problem description
I am trying to parse and submit a form on a website using mechanize, but the built-in form parser does not seem to detect the form or its elements. I suspect it is choking on poorly formed HTML, so I'd like to pre-parse the page with a parser better suited to bad HTML (say lxml or BeautifulSoup) and then feed the cleaned-up output to the form parser. I need mechanize not only to submit the form but also to maintain the session (I'm working with this form from within a login session).
If this is indeed possible, I'm not sure how to go about it. I'm not that familiar with the details of the HTTP protocol or how to get the various parts to work together. Any pointers?
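To make the idea concrete, here is a rough sketch of what I have in mind, not something I have confirmed works on the site in question. It assumes mechanize's `Browser.response()`, `get_data()`, `set_data()` and `Browser.set_response()` behave the way I understand them, and that BeautifulSoup is installed. The helper names `clean_with_bs4`, `reparse_current_page` and `list_forms` are made up for illustration, and re-encoding the page as UTF-8 is an assumption that may not suit every site.

```python
def clean_with_bs4(raw_html):
    """Re-serialize possibly malformed HTML through BeautifulSoup.

    Hypothetical helper; bs4 is imported lazily so the rest of the
    sketch does not depend on it.
    """
    from bs4 import BeautifulSoup
    soup = BeautifulSoup(raw_html, "html.parser")
    # Assumption: UTF-8 output is acceptable for the target page.
    return soup.encode("utf-8")


def reparse_current_page(browser, clean=clean_with_bs4):
    """Swap the browser's current response body for a cleaned copy.

    The same browser object is reused, so its cookies (and therefore
    the login session) are left untouched.
    """
    response = browser.response()    # the page the browser is currently on
    raw = response.get_data()        # original, possibly broken markup
    response.set_data(clean(raw))    # replace the body with tidied markup
    browser.set_response(response)   # forms should now come from the new body
    return browser


def list_forms(url):
    """Hypothetical usage: open a page, clean it, then list its forms."""
    import mechanize
    br = mechanize.Browser()
    br.open(url)
    reparse_current_page(br)
    return list(br.forms())
```

The cleaning step is passed in as a function so that lxml or any other tidy-up routine could be substituted for BeautifulSoup without touching the rest.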