cURL a domain without http://www

Views: 169, published: 2017/3/6 5:28:10, curl subdomain


Problem description

Hi, I have a domain I'd like to parse with cURL, and here is the case:

When I go to http://register.metsad.ee/avalik/info_teatis.php?too_id=2942704201

it redirects me to [ register.metsad.ee/avalik/info_teatis.php?too_id=2942704201 ]

It's the same address, just without http:// www.
The code I use to parse it is:
function get_data($url) {
    $ch = curl_init();
    $timeout = 5; // connection timeout in seconds
    curl_setopt($ch, CURLOPT_URL, $url);
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1); // return the body instead of printing it
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1); // follow redirects
    curl_setopt($ch, CURLOPT_MAXREDIRS, 10);
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, $timeout);
    $data = curl_exec($ch);
    curl_close($ch);
    return $data;
}
$src = 'http://register.metsad.ee/avalik/info_teatis.php?too_id=2942704201';

And then:

$c = get_data($src);
echo $c;

As a result I get a blank white page. I also tried the Simple_Html_Dom parser like this:

echo file_get_html($src)->plaintext;

But I still get a blank white page. When I try to parse the address without http://, there is this error:

Warning: file_get_contents(register.metsad.ee/avalik/info_teatis.php?too_id=2942704201) [function.file-get-contents]: failed to open stream: Result too large in C:\xampp\htdocs\Trash\metsakontroll\system\c_simple_html_dom.php on line 70

cURL still gives a white screen, no effect. When I tried to parse it like a folder, like this:

http://www.metsad.ee/register/avalik/info_teatis.php?too_id=2942704201 then the server says Not Found.

I searched the whole internet =/ Any ideas how to read that page via cURL or Simple_html_dom?

Solution

There is some kind of protection on the register.metsad.ee side. They return an empty response until the User-Agent header is set.

Failed call (empty response):
feedbee@server:~$ telnet register.metsad.ee 80
Trying 213.184.43.115...
Connected to register.metsad.ee.
Escape character is '^]'.
GET /avalik/info_teatis.php?too_id=2942704201 HTTP/1.1
Host: register.metsad.ee

HTTP/1.1 200 OK
Date: Thu, 13 Dec 2012 20:07:11 GMT
Server: Apache
Content-Length: 0
Content-Type: text/html; charset=UTF-8

Successful call (HTML page returned):
feedbee@server:~$ telnet register.metsad.ee 80
GET http://register.metsad.ee/avalik/info_teatis.php?too_id=2942704201 HTTP/1.1
Host: register.metsad.ee
User-Agent: Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0

HTTP/1.1 200 OK
Date: Thu, 13 Dec 2012 20:13:07 GMT
Server: Apache
Expires: Thu, 19 Nov 1981 08:52:00 GMT
Cache-Control: no-store, no-cache, must-revalidate, post-check=0, pre-check=0
Pragma: no-cache
Set-Cookie: SNS=a0e425c2aec17c38be3716b366f75749; path=/
Transfer-Encoding: chunked
Content-Type: text/html; charset=UTF-8

762
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<html xmlns="http://www.w3.org/1999/xhtml">
...
So you need to add the next line to your code:
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0");
for example (or any other user agent string).
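
For completeness, here is a minimal sketch of how the question's get_data() might look with the User-Agent option applied, assuming PHP 7 or later with the cURL extension loaded. The build_curl_options() helper and the DEFAULT_USER_AGENT constant are hypothetical names introduced only for this illustration; the option values are the ones from the question, plus CURLOPT_USERAGENT. The error check on curl_exec() is an addition as well, not something the original code did.

```php
<?php
// Hypothetical default; per the answer, any other user agent string should work too.
const DEFAULT_USER_AGENT = 'Mozilla/5.0 (Windows NT 6.1; WOW64; rv:12.0) Gecko/20100101 Firefox/12.0';

/**
 * Build the cURL option array (hypothetical helper, not part of the original code).
 * Keeping the options in one place makes it easy to see that a User-Agent is sent.
 */
function build_curl_options(string $url, string $userAgent = DEFAULT_USER_AGENT, int $timeout = 5): array
{
    return [
        CURLOPT_URL            => $url,
        CURLOPT_RETURNTRANSFER => true,     // return the body instead of printing it
        CURLOPT_FOLLOWLOCATION => true,     // follow redirects
        CURLOPT_MAXREDIRS      => 10,
        CURLOPT_CONNECTTIMEOUT => $timeout, // seconds
        CURLOPT_USERAGENT      => $userAgent,
    ];
}

/**
 * Fetch a URL and return the response body, sending a browser-like User-Agent.
 */
function get_data(string $url): string
{
    $ch = curl_init();
    curl_setopt_array($ch, build_curl_options($url));
    $data = curl_exec($ch);
    if ($data === false) {
        // Surface transport errors instead of silently echoing an empty page.
        $error = curl_error($ch);
        curl_close($ch);
        throw new RuntimeException("cURL request failed: $error");
    }
    curl_close($ch);
    return $data;
}
```

Usage stays the same as in the question: $c = get_data($src); echo $c;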


                        