file_get_contents() gives me 403 Forbidden
Problem description
I have a partner who has created some content for me to scrape.
I can access the page with my browser, but when I try to use file_get_contents(), I get a 403 Forbidden.
I've tried using stream_context_create(), but that's not helping; it might be because I don't know what should go in there.
1) Is there any way for me to scrape the data?
2) If not, and if the partner is not allowed to configure the server to give me access, what can I do then?
The code I've tried using:
$opts = array(
    'http' => array(
        'user_agent' => 'My company name',
        'method'     => "GET",
        'header'     => implode("\r\n", array(
            'Content-type: text/plain;'
        ))
    )
);
$context = stream_context_create($opts);
// Get header content
$_header = file_get_contents($partner_url, false, $context);
This is not a problem in your script; it's a feature of your partner's web server security.
It's hard to say exactly what's blocking you, but most likely it's some sort of block against scraping. If your partner has access to the web server's configuration, that might help pinpoint the cause.
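One way to narrow this down from the client side is to look at the raw response headers instead of only the failed call. The following is a minimal sketch, not a definitive diagnostic: extract_status_code() is a hypothetical helper, and the header lines are canned sample data. In a real request made through the HTTP stream wrapper with 'ignore_errors' => true in the context, lines in this format are typically available in $http_response_header after file_get_contents() returns.

```php
<?php
// Hypothetical helper: pull the numeric status code out of the raw header
// lines that PHP collects for an HTTP stream request.
function extract_status_code(array $headerLines): ?int
{
    $status = null;
    foreach ($headerLines as $line) {
        // A status line looks like "HTTP/1.1 403 Forbidden". After redirects
        // there can be several of them, so keep the last one seen.
        if (preg_match('#^HTTP/\d+(?:\.\d+)?\s+(\d{3})#', $line, $m)) {
            $status = (int) $m[1];
        }
    }
    return $status; // null if no status line was found
}

// Canned sample data (an assumption, not a real server response).
$sample = [
    'HTTP/1.1 403 Forbidden',
    'Server: nginx',
    'Content-Type: text/html',
];
echo extract_status_code($sample), "\n";
```

Comparing the status code and headers you get with different User-Agent values can hint at whether the block is based on the client identity or on something else, such as the IP address.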
What you could do is "fake a web browser" by setting the User-Agent header so that your request imitates a standard web browser.
I would recommend cURL for this; good documentation for it is easy to find.
// Create a cURL handle
$ch = curl_init();
// Set the URL to fetch
curl_setopt($ch, CURLOPT_URL, "example.com");
// Return the transfer as a string instead of printing it
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
// Send a browser-like User-Agent header
curl_setopt($ch, CURLOPT_USERAGENT, 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13');
// $output contains the response body, or false on failure
$output = curl_exec($ch);
if ($output === false) {
    echo 'cURL error: ' . curl_error($ch);
}
// Close the cURL handle to free up system resources
curl_close($ch);
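If you would rather stay with file_get_contents(), the same idea can probably be expressed through the stream context. This is only a sketch under assumptions: build_browser_context_options() is a hypothetical helper, the Accept headers and the 10-second timeout are illustrative values, and nothing here guarantees the partner's server will accept the request.

```php
<?php
// Hypothetical helper: build HTTP stream-context options that send a
// browser-like User-Agent and keep the response body on HTTP errors.
function build_browser_context_options(string $userAgent): array
{
    return [
        'http' => [
            'method'        => 'GET',
            'user_agent'    => $userAgent,
            // Illustrative request headers; a GET request has no body,
            // so no Content-Type header is needed.
            'header'        => implode("\r\n", [
                'Accept: text/html,application/xhtml+xml;q=0.9,*/*;q=0.8',
                'Accept-Language: en-US,en;q=0.5',
            ]),
            // Return the body even for 4xx/5xx responses so it can be inspected.
            'ignore_errors' => true,
            // Give up after 10 seconds (an assumed value).
            'timeout'       => 10,
        ],
    ];
}

$opts = build_browser_context_options(
    'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.13) Gecko/20080311 Firefox/2.0.0.13'
);
$context = stream_context_create($opts);

// With $partner_url defined as in the question, the request would be:
// $html = file_get_contents($partner_url, false, $context);
```

The 'user_agent' context option sets the User-Agent header directly, which is why it is not repeated inside 'header'.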