file_get_contents() gives me 403 Forbidden

Problem description

I have a partner that has created some content for me to scrape.
I can access the page with my browser, but when trying to use file_get_contents(), I get a 403 Forbidden.

I've tried using stream_context_create(), but that isn't helping; it might be because I don't know what should go in there.

1) Is there any way for me to scrape the data?
2) If not, and if the partner is not allowed to configure the server to allow me access, what can I do then?

The code I've tried using:

$opts = array(
  'http'=>array(
    'user_agent' => 'My company name',
    'method'=>"GET",
    'header'=> implode("\r\n", array(
      'Content-type: text/plain;'
    ))
  )
);

$context = stream_context_create($opts);

//Get header content
$_header = file_get_contents($partner_url,false, $context);

Solution

This is not a problem in your script; it's a feature of your partner's web server security.

It's hard to say exactly what's blocking you; most likely it's some sort of block against scraping. If your partner has access to his web server's setup, that might help pinpoint it.
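To give your partner something concrete to look for, it helps to record exactly what your script gets back. A minimal sketch, assuming plain PHP streams as in the question: http_status_from_headers() and fetch_with_status() are hypothetical helper names, and the status code alone won't say which server rule fired, but the status line, headers and error-page body are what your partner can compare against his server configuration.

```php
<?php
// Hypothetical helper: extract the status code of the final response from
// raw header lines, such as the $http_response_header array PHP fills in
// after file_get_contents() on an http:// URL. The last "HTTP/..." line wins,
// because each redirect adds its own status line.
function http_status_from_headers(array $headers): ?int
{
    $status = null;
    foreach ($headers as $line) {
        if (preg_match('#^HTTP/\S+\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status;
}

// Hypothetical wrapper: fetch $url and return status code, headers and body.
// 'ignore_errors' makes file_get_contents() return the body of a 403 page
// instead of false, so any message from the server is kept.
function fetch_with_status(string $url): array
{
    $context = stream_context_create(['http' => ['ignore_errors' => true]]);
    $body = file_get_contents($url, false, $context);
    // $http_response_header only exists if an HTTP response was received.
    $headers = $http_response_header ?? [];
    return [
        'status'  => http_status_from_headers($headers),
        'headers' => $headers,
        'body'    => $body,
    ];
}
```

Calling fetch_with_status($partner_url) and passing the result on gives your partner more to work with than "it says 403".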

What you could do is "fake a web browser" by setting the User-Agent header so that your request imitates a standard web browser.

I would recommend cURL for this; good documentation for it is easy to find.

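    // create curl resource
    $ch = curl_init();

    // set url
    curl_setopt($ch, CURLOPT_URL, "http://example.com");

    // return the transfer as a string instead of printing it
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
    // send a browser-like User-Agent (this Firefox string is just an example)
    curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');

    // $output contains the output string, or false if the request failed
    $output = curl_exec($ch);
    if ($output === false) {
        echo 'cURL error: ' . curl_error($ch);
    }

    // HTTP status code of the response, e.g. 403 if the server still refuses
    $status = curl_getinfo($ch, CURLINFO_HTTP_CODE);

    // close curl resource to free up system resources
    curl_close($ch);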
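If you would rather keep file_get_contents(), the same User-Agent trick can go through the stream context from the question. A minimal sketch, not tested against your partner's server: browser_context_options() is a hypothetical helper, the Accept value is only an assumption about what a browser-like request looks like, and 'ignore_errors' changes what you get back on a 403, not whether the server lets you in.

```php
<?php
// Hypothetical helper: stream-context options that make file_get_contents()
// send a browser-like User-Agent, as an alternative to the cURL code.
function browser_context_options(string $userAgent): array
{
    return [
        'http' => [
            'method'        => 'GET',
            'user_agent'    => $userAgent,
            // Assumption: a browser-style Accept header; whether the
            // partner's filter looks at it at all is unknown.
            'header'        => 'Accept: text/html,application/xhtml+xml,*/*;q=0.8',
            // Return the response body even for 4xx/5xx status codes.
            'ignore_errors' => true,
        ],
    ];
}

// Usage ($partner_url as in the question):
// $context = stream_context_create(browser_context_options(
//     'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13'
// ));
// $html = file_get_contents($partner_url, false, $context);
```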
