cURL: get URL from redirect


Question

I'm currently using cURL to try to get the URL from a redirect for a website scraper. I only need the URL that the site redirects to. I've searched Stack Overflow and other sites for the past couple of days without success. The code I'm currently using is from this website:

  $url = "http://www.someredirect.com";
  $ch = curl_init($url);
  curl_setopt($ch, CURLOPT_USERAGENT,'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-US; rv:1.8.1.1) Gecko/20061204 Firefox/2.0.0.1');         
  curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);
  curl_setopt($ch, CURLOPT_HEADER, true);
  curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false);
  curl_setopt($ch, CURLOPT_NOBODY, true);
  $response = curl_exec($ch);
  preg_match_all('/^Location:(.*)$/mi', $response, $matches);
  curl_close($ch);
  echo !empty($matches[1]) ? trim($matches[1][0]) : 'No redirect found';

Any help would be greatly appreciated!

Recommended answer

In your particular case, the server is checking for certain user-agent strings.

When a server checks the user-agent string, it responds with a 302 redirect status code only when it sees a "valid" (according to the server) user-agent. Requests with an "invalid" user-agent receive neither the 302 status code nor the Location: header.

In your particular case, when the server receives a request from an "invalid" user-agent, it responds with a 200 OK status code and an empty response body.
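To tell the two cases apart, it can help to look at the status line and the Location header together. The following is a minimal sketch, assuming the raw headers were already fetched with CURLOPT_HEADER enabled; the function name describeResponse and the two sample responses are made up for illustration.

```php
<?php
// Hypothetical helper: summarize a raw HTTP header block (as returned by
// curl_exec() with CURLOPT_HEADER enabled) as a status code plus Location value.
function describeResponse(string $rawHeaders): array
{
    $status = null;
    $location = null;

    // Status line, e.g. "HTTP/1.1 302 Found" or "HTTP/2 200".
    if (preg_match('/^HTTP\/\S+\s+(\d{3})/m', $rawHeaders, $m)) {
        $status = (int) $m[1];
    }

    // Header names are case-insensitive, hence the "i" flag.
    // [ \t]* keeps the match on the Location line itself.
    if (preg_match('/^Location:[ \t]*(.*)$/mi', $rawHeaders, $m)) {
        $location = trim($m[1]);
    }

    return ['status' => $status, 'location' => $location];
}

// Sample responses (invented for illustration, not captured from a real server).
$accepted = "HTTP/1.1 302 Found\r\nLocation: http://www.example.org/abc\r\nContent-Length: 0\r\n\r\n";
$rejected = "HTTP/1.1 200 OK\r\nContent-Length: 0\r\n\r\n";

print_r(describeResponse($accepted)); // status 302 with a location
print_r(describeResponse($rejected)); // status 200, location is null
```

A 200 status with a null location is the signature of the "invalid" user-agent case described above.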

(Note: in the code below, the actual URLs provided have been replaced with examples.)

Let's say that the server at http://www.example.com checks the User-Agent string and that http://www.example.com/product/123/ redirects to http://www.example.org/abc.

In PHP, your solution would be:

<?php

$url = 'http://www.example.com/product/123/';

$ch = curl_init($url);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); // Do not follow the redirect; we only want its target.
curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // Return the response instead of printing it.
curl_setopt($ch, CURLOPT_HEADER, true);          // Include the response headers in the returned string.
curl_setopt($ch, CURLOPT_USERAGENT, "Mozilla/5.0 (X11; Linux x86_64; rv:21.0) Gecko/20100101 Firefox/21.0"); // Necessary. The server checks for a valid User-Agent.

// Send the request once and keep the raw response (headers included).
$response = curl_exec($ch);
preg_match_all('/^Location:(.*)$/mi', $response, $matches);
curl_close($ch);

// Print the first Location header, if any.
echo !empty($matches[1]) ? trim($matches[1][0]) : 'No redirect found';

The output of this script would be: http://www.example.org/abc
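As an alternative to parsing the Location header with a regular expression, PHP's curl_getinfo() can report the redirect target through CURLINFO_REDIRECT_URL when CURLOPT_FOLLOWLOCATION is disabled. The sketch below is not part of the original answer; the function name getRedirectUrl and the timeout values are assumptions chosen for illustration.

```php
<?php
// Hypothetical helper: return the redirect target for $url, or null if the
// request failed or the server did not answer with a redirect.
function getRedirectUrl(string $url, string $userAgent): ?string
{
    $ch = curl_init($url);
    curl_setopt($ch, CURLOPT_FOLLOWLOCATION, false); // do not follow; we only want the target
    curl_setopt($ch, CURLOPT_RETURNTRANSFER, true);  // keep the body out of the output
    curl_setopt($ch, CURLOPT_USERAGENT, $userAgent); // the server may require a "valid" one
    curl_setopt($ch, CURLOPT_CONNECTTIMEOUT, 5);     // assumed limits, adjust as needed
    curl_setopt($ch, CURLOPT_TIMEOUT, 10);

    $ok = curl_exec($ch) !== false;

    // With FOLLOWLOCATION off, this is the URL that would have been requested next.
    $target = $ok ? curl_getinfo($ch, CURLINFO_REDIRECT_URL) : false;

    // The handle is released when it goes out of scope at the end of the function.
    return (is_string($target) && $target !== '') ? $target : null;
}
```

It could be called as getRedirectUrl('http://www.example.com/product/123/', 'Mozilla/5.0 (X11; Linux x86_64; rv:21.0) Gecko/20100101 Firefox/21.0'), printing 'No redirect found' when the result is null. One reason to consider this variant is that cURL, as far as I know, reports the target as an absolute URL even if the server sent a relative Location value.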
