How to find the final destination (URL) of an ad (programmatically)


Problem description

This may be trivial, or not, but I'm working on a piece of software that will verify the "end of the line" domain for ads displayed through my web application. I have a list of domains I do not want to serve ads from (let's say Norton.com is one of them), but most ad networks serve ads via shortened, cryptic URLs (adsrv.com) that eventually redirect to Norton.com. So the question is: has anyone built, or does anyone have an idea of how to build, a scraper-like tool that will return the final destination URL of an ad?

Initial discovery: Some ads are in Flash, JavaScript, or plain HTML. Emulating a browser is perfectly viable, and would handle the different ad formats. Not all Flash or JS ads have a noflash or noscript alternative. (A browser may be necessary, but as stated this is perfectly fine, using something like WatiN, Watir, Watij, or Selenium.)

Prefer open source so that I could rebuild one myself. Really appreciate help!

EDIT: This script needs to click on the ad, since it might be Flash, JS, or just plain HTML. So curl is less likely to be an option, unless curl can click?

Recommended answer

Sample PHP implementation:

$k = curl_init('http://goo.gl');
curl_setopt($k, CURLOPT_FOLLOWLOCATION, true); // follow redirects
curl_setopt($k, CURLOPT_USERAGENT, 
'Mozilla/5.0 (Windows; U; Windows NT 6.1; en-US) AppleWebKit/534.7 ' .
'(KHTML, like Gecko) Chrome/7.0.517.41 Safari/534.7'); // imitate chrome
curl_setopt($k, CURLOPT_NOBODY, true); // HEAD request only (faster)
curl_setopt($k, CURLOPT_RETURNTRANSFER, true); // don't echo results
curl_exec($k);
$final_url = curl_getinfo($k, CURLINFO_EFFECTIVE_URL); // get last URL followed
curl_close($k);
echo $final_url;

This should return something like:
https://www.google.com/accounts/ServiceLogin?service=urlshortener&continue=http://goo.gl/?authed%3D1&followup=http://goo.gl/?authed%3D1&passive=true&go=true

Note: You might need to use curl_setopt() to turn off CURLOPT_SSL_VERIFYHOST and CURLOPT_SSL_VERIFYPEER if you want to reliably follow redirects across HTTPS/SSL.
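
Once a final URL is in hand, the remaining step from the question is comparing its domain against the list of unwanted domains. The following is only a minimal sketch of that check, not part of the original answer: the function name is_blocked_destination() and the sample blocklist are hypothetical, and it assumes that a subdomain of a blocked domain (such as www.norton.com) should also count as blocked.

```php
<?php
// Hypothetical helper (not from the original answer): returns true when the
// host of $url equals a blocked domain or is a subdomain of one.
function is_blocked_destination(string $url, array $blockedDomains): bool
{
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false; // no parsable host; the caller decides how to treat this
    }
    $host = strtolower(rtrim($host, '.'));

    foreach ($blockedDomains as $domain) {
        $domain = strtolower(trim($domain));
        if ($domain === '') {
            continue;
        }
        // Exact match, or the host ends with ".<domain>" (a subdomain).
        if ($host === $domain
            || substr($host, -strlen($domain) - 1) === '.' . $domain) {
            return true;
        }
    }
    return false;
}

// Illustrative blocklist, using the example domain from the question.
$blocked = ['norton.com'];
var_dump(is_blocked_destination('https://www.norton.com/offers?id=1', $blocked)); // bool(true)
var_dump(is_blocked_destination('http://adsrv.com/r/abc', $blocked));             // bool(false)
```

In practice the first argument would be the $final_url value produced by the cURL snippet above. Matching on the ".<domain>" suffix rather than a plain substring keeps an unrelated host such as notnorton.com from being flagged.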
