How to get a specific frame in a web page and retrieve its content

Problem description

I want to access the translation results at the following URL:

http://translate.google.com/translate?hl=en&sl=en&tl=ar&u=http%3A%2F%2Fwww.saltycrane.com%2Fblog%2F2008%2F10%2Fhow-escape-percent-encode-url-python%2F

The page has two frames, and the translation is displayed in the bottom content frame. I am interested in retrieving only that bottom frame to get the translation.

Selenium for Python allows us to fetch page contents via web automation:

browser.get('http://translate.google.com/#en/ar/'+hurl)

The required frame is an iframe:

<div id="contentframe" style="top:160px"><iframe src="/translate_p?hl=en&am... name=c frameborder="0" style="height:100%;width:100%;position:absolute;top:0px;bottom:0px;"></iframe></div>

But how do I get the bottom content frame element so that I can retrieve the translation using web automation?

I also learned that PyQuery lets us browse the contents using jQuery-style syntax.

Update:

An answer mentioned that Selenium provides a way to do this:

frame = browser.find_element_by_tag_name('iframe')
browser.switch_to_frame(frame)
# get page source
browser.page_source

But it does not work in the above example; it returns an empty page.

Solution

You can use driver.switchTo().frame(1); here. The number 1 passed to frame() is the index of the frame within the web page. Since your requirement is to switch to the second frame and the index starts at 0, you should use driver.switchTo().frame(1);

But the above code is in Java. In Python, you can use the line below. (Selenium 4 removed switch_to_frame; there the equivalent call is driver.switch_to.frame(1).)

driver.switch_to_frame(1)
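For the original goal of reading the frame's content, the switch can be wrapped in a small helper. This is only a sketch, assuming the Selenium 4 Python bindings, where the calls are driver.switch_to.frame(...) and driver.switch_to.default_content(). The name frame_source is a hypothetical helper, not part of Selenium.

```python
def frame_source(driver, frame_ref):
    """Return the HTML source of one frame, then go back to the top-level page.

    driver    -- a Selenium WebDriver (hypothetical caller-supplied object)
    frame_ref -- a 0-based frame index, a frame name/id, or an iframe WebElement
    """
    # After this call, page_source and element lookups refer to the frame.
    driver.switch_to.frame(frame_ref)
    try:
        return driver.page_source
    finally:
        # Always return to the main document so later lookups are not
        # silently scoped to the frame.
        driver.switch_to.default_content()
```

With a real browser session, something like frame_source(driver, 1) would ask for the second frame, and frame_source(driver, "c") would address the iframe from the question by its name attribute.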

UPDATE

 driver.get("http://translate.google.com/translate?hl=en&sl=en&tl=ar&u=http://www.saltycrane.com/blog/2008/10/how-escape-percent-encode-url-python/");
 driver.switchTo().frame(0);
 System.out.println(driver.findElement(By.xpath("/html/body/div/div/div[3]/h1/span/a")).getText());

Output: SaltyCrane ???????

I just tried printing the title, SaltyCrane, which is inside the iframe. It worked for me except for the ? symbols after SaltyCrane: that part of the title is Arabic, and it could not be decoded.
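Question marks like these are what one would typically see when text passes through an encoding that cannot represent Arabic characters. A small Python sketch of that effect, assuming an ASCII-only output encoding; the sample string is made up for illustration and is not the actual page title:

```python
# Hypothetical sample: a Latin word followed by five Arabic letters.
title = "SaltyCrane \u0645\u062f\u0648\u0646\u0629"

# errors="replace" substitutes "?" for every character the target
# encoding cannot represent, one "?" per character.
as_ascii = title.encode("ascii", errors="replace").decode("ascii")

# UTF-8 can represent every character, so the round trip is lossless.
as_utf8 = title.encode("utf-8").decode("utf-8")

print(as_ascii)  # SaltyCrane ?????
```

If this is the cause, the text retrieved from the frame is intact and only the display step loses it, so writing the result to a UTF-8 file or using a UTF-8 console should show the Arabic characters.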

The above code is in Java. The same logic should also work in Python.
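A rough Python counterpart of the Java snippet might look like the following. It is a sketch under assumptions: it targets the Selenium 4 Python bindings, frame_element_text is a hypothetical helper name, and the locator is passed as the plain string "xpath", which is the value Selenium's By.XPATH constant holds.

```python
# XPath copied from the Java example above.
TITLE_XPATH = "/html/body/div/div/div[3]/h1/span/a"


def frame_element_text(driver, url, frame_index, xpath):
    """Load url, switch into the frame at frame_index, return one element's text."""
    driver.get(url)
    driver.switch_to.frame(frame_index)
    try:
        # Equivalent to driver.find_element(By.XPATH, xpath) with
        # `from selenium.webdriver.common.by import By`.
        return driver.find_element("xpath", xpath).text
    finally:
        # Leave the driver pointing at the top-level document again.
        driver.switch_to.default_content()
```

With a live session this would be called as frame_element_text(driver, url, 0, TITLE_XPATH), mirroring the frame(0) used in the Java code. Whether the page still has that structure is not something this sketch can guarantee.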
