cUrl a domain without http://www


Problem description


Hi, I have a page I'd like to parse with cURL. Here is the situation:

When I go to http://register.metsad.ee/avalik/info_teatis.php?too_id=2942704201

it redirects me to [ register.metsad.ee/avalik/info_teatis.php?too_id=2942704201 ]

which is the same address without http:// www. The code I use to parse it is:

function get_data($url) {
    $ch = curl_init();
    $timeout = 5;
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);
    curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
$src = 'http://register.metsad.ee/avalik/info_teatis.php?too_id=2942704201';

Then I call $c = get_data($src); echo $c; and as a result I get a blank white page. I also tried the Simple_Html_Dom parser, like this:

echo file_get_html($src)->plaintext;

But I still get a blank white page. When I try to parse the URL without http://, I get this error:

Warning: file_get_contents(register.metsad.ee/avalik/info_teatis.php?too_id=2942704201) [function.file-get-contents]: failed to open stream: Result too large in C:\xampp\htdocs\Trash\metsakontroll\system\c_simple_html_dom.php on line 70

cURL still gives a white screen, with no effect. When I tried to parse it as a folder, like this:

http://www.metsad.ee/register/avalik/info_teatis.php?too_id=2942704201 then the server says Not Found

I have searched the whole internet. Any ideas how to read that page via cURL or Simple_html_dom?

Solution

There is some kind of protection on the register.metsad.ee side. The server returns an empty response unless a User-Agent header is set.

Failed call (empty response):

feedbee@server:~$ telnet register.metsad.ee 80
Trying 213.184.43.115...
Connected to register.metsad.ee.
Escape character is '^]'.
GET /avalik/info_teatis.php?too_id=2942704201 HTTP/1.1
Host: register.metsad.ee

HTTP/1.1 200 OK
Date: Thu, 13 Dec 2012 20:07:11 GMT
Server: Apache
Content-Length: 0
Content-Type: text/html; charset=UTF-8

Successful call (HTML page returned):

feedbee@server:~$ telnet register.metsad.ee 80
GET http://register.metsad.ee/avalik/info_teatis.php?too_id=2942704201 HTTP/1.1
Host: register.metsad.ee
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0

HTTP/1.1 200 OK
Date: Thu, 13 Dec 2012 20:13:07 GMT
Server: Apache
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Set-Cookie: SNS=a0e425c2aec17c38be3716b366f75749; path=/
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8

762
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
...

So you need to add the following line to your get_data() function, before the curl_exec() call:

curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0");

This string is only an example; any other user agent string should work as well.
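Putting the fix together, here is a minimal sketch of the question's get_data() with a User-Agent set, plus a variant that uses file_get_contents() with a stream context for the non-cURL route. The names DEFAULT_USER_AGENT, build_curl_options(), http_context_options() and get_data_via_stream() are hypothetical helpers introduced for illustration; they are not part of cURL or Simple_html_dom. The sketch assumes the server only checks that a User-Agent header is present, which is what the telnet sessions above suggest.

```php
<?php
// Hypothetical default; the telnet test suggests any user agent string is accepted.
const DEFAULT_USER_AGENT = 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0';

// Build the cURL options separately so they can be inspected without a network call.
function build_curl_options(string $url, string $userAgent = DEFAULT_USER_AGENT, int $timeout = 5): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,
        CURLOPT_FOLLOWLOCATION => true,
        CURLOPT_MAXREDIRS      => 10,
        CURLOPT_CONNECTTIMEOUT => $timeout,
        CURLOPT_USERAGENT      => $userAgent, // the option missing from the question's code
    ];
}

// Same behaviour as the question's get_data(), plus the User-Agent and an error check.
function get_data(string $url, string $userAgent = DEFAULT_USER_AGENT): string
{
    $ch = curl_init();
    curl_setopt_array($ch, build_curl_options($url, $userAgent));
    $data = curl_exec($ch);
    if ($data === false) {
        // Surface transport errors instead of silently printing a blank page.
        throw new RuntimeException('cURL request failed: ' . curl_error($ch));
    }
    return $data; // the handle is released when $ch goes out of scope
}

// Options for PHP's http:// stream wrapper, for use with file_get_contents().
function http_context_options(string $userAgent = DEFAULT_USER_AGENT, float $timeout = 5.0): array
{
    return [
        'http' => [
            'user_agent'      => $userAgent,
            'follow_location' => 1,
            'max_redirects'   => 10,
            'timeout'         => $timeout,
        ],
    ];
}

// Fetch a page without cURL; the URL must include the http:// scheme,
// otherwise PHP treats the string as a local file path.
function get_data_via_stream(string $url, string $userAgent = DEFAULT_USER_AGENT): string
{
    $context = stream_context_create(http_context_options($userAgent));
    $data = file_get_contents($url, false, $context);
    if ($data === false) {
        throw new RuntimeException("Could not read $url");
    }
    return $data;
}
```

With these in place, echo get_data($src); should print the page instead of a blank screen. For Simple_html_dom, one option is to pass the result of get_data_via_stream($src) to str_get_html() rather than letting the library fetch the URL itself.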
