Finding Ads on a web page
Problem description
I'm writing an application that tries to determine whether there are ads on a page. It currently drives a browser through Selenium WebDriver using Python.
I figured that a good number of ads live inside iframes, so I wrote a loop to look inside each frame:
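from selenium import webdriver
from selenium.webdriver.common.by import By

browser = webdriver.Chrome()
browser.get("http://cnn.com")
# Selenium 4 replaced find_elements_by_tag_name and switch_to_frame
all_iframes = browser.find_elements(By.TAG_NAME, "iframe")
for iframe in all_iframes:
    # Enter the frame, dump its HTML, then return to the top-level document
    browser.switch_to.frame(iframe)
    print(browser.page_source)
    browser.switch_to.default_content()
browser.quit()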
I'm wondering whether there are any consistently used tags or tag attributes that I can rely on across multiple pages to determine whether ads are present (both inside and outside iframes). Would I have to look for strings such as doubleclick, adtech, or adblade inside each frame?
Or would I have to write different rules for each page?
Does anyone know how ads are typically displayed on pages? Thanks.
You can search the page for known ad-server hostnames. This list provides them in Adblock Plus format:
http://pgl.yoyo.org/as/serverlist.php?hostformat=adblockplus
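As a rough illustration of searching by ad server, the sketch below matches URLs against a set of ad hostnames. It assumes the list uses Adblock Plus host rules of the form `||example.com^`; the helper names `parse_adblock_hosts` and `is_ad_url`, the sample list, and the sample URLs are all made up for this example and are not taken from the real list.

```python
from urllib.parse import urlparse


def parse_adblock_hosts(text):
    """Extract hostnames from lines shaped like '||example.com^' (assumed format)."""
    hosts = set()
    for line in text.splitlines():
        line = line.strip()
        # Skip the header, comments, and anything that is not a plain host rule
        if line.startswith("||") and line.endswith("^"):
            hosts.add(line[2:-1].lower())
    return hosts


def is_ad_url(url, ad_hosts):
    """Return True if the URL's host is a listed ad host or a subdomain of one."""
    host = (urlparse(url).hostname or "").lower()
    parts = host.split(".")
    # Check the full host and each parent domain: a.b.c, then b.c, then c
    return any(".".join(parts[i:]) in ad_hosts for i in range(len(parts)))


# Hypothetical excerpt standing in for the downloaded server list
sample_list = """[Adblock Plus 2.0]
! comment line
||doubleclick.net^
||adtech.example^
"""
ad_hosts = parse_adblock_hosts(sample_list)

# Hypothetical URLs, e.g. collected from iframe or script src attributes
urls = [
    "http://ad.doubleclick.net/adi/some/path",
    "http://cnn.com/index.html",
]
flagged = [u for u in urls if is_ad_url(u, ad_hosts)]
print(flagged)
```

With Selenium, the URLs could come from something like `iframe.get_attribute("src")` on each iframe element. Matching on hostnames rather than raw substrings avoids flagging a page just because the word "doubleclick" appears in its text.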
It would also help to look at other projects and see how they handle the same task:
http://adblockplus.org/en/source