How may I bypass LWP's URL encoding for a GET request?
Question
I'm talking to what seems to be a broken HTTP daemon, and I need to make a GET request that includes a pipe character (|) in the URL.
LWP::UserAgent escapes the pipe character before the request is sent.
For example, a URL passed in as:
https://hostname/url/doSomethingScript?ss=1234&activities=Lec1|01
is passed to the HTTP daemon as
https://hostname/url/doSomethingScript?ss=1234&activities=Lec1%7C01
This is correct, but doesn't work with this broken server.
How can I override or bypass the encoding that LWP and its friends are doing?
Note
I've seen and tried other answers here on StackOverflow addressing similar problems. The difference seems to be that those answers deal with POST requests, where the form-field parts of the URL can be passed as an array of key/value pairs or as a 'Content' => $content parameter. Those approaches aren't working for me with an LWP GET request.
I've also tried constructing an HTTP::Request object and passing that to LWP, and passing the full URL directly to LWP->get(). No dice with either approach.
In response to Borodin's request, this is a sanitised version of the code I'm using:
#!/usr/local/bin/perl -w
use HTTP::Cookies;
use LWP;
my $debug = 1;
# make a 'browser' object
my $browser = LWP::UserAgent->new();
# cookie handling...
$browser->cookie_jar(HTTP::Cookies->new(
    'file' => '.cookie_jar.txt',
    'autosave' => 1,
    'ignore_discard' => 1,
));
# proxy, so we can watch...
if ($debug == 1) {
    $browser->proxy(['http', 'ftp', 'https'], 'http://localhost:8080/');
}
# user agent string (pretend to be Firefox)
$agent = 'Mozilla/5.0 (Windows; U; Windows NT 5.1; en-GB; rv:1.7.12) Gecko/20050919 Firefox/1.0.7';
# set the user agent
$browser->agent($agent);
# do some things here to log in to the web site, accept session cookies, etc.
# These are basic POSTs of filled forms. Works fine.
# [...]
my $baseURL = 'https://hostname/url/doSomethingScript?ss=1234&activities=VALUEA|VALUEB';
my @values = ('Lec1', '01', 'Lec1', '02');
while (1) {
    if (scalar(@values) < 2) { last; }
    my $vala = shift(@values);
    my $valb = shift(@values);
    my $url = $baseURL;
    $url =~ s/VALUEA/$vala/g;
    $url =~ s/VALUEB/$valb/g;
    # simplified. Would usually check request for '200' response, etc...
    my $content = $browser->get($url)->content();
    # do something here with the content
    # [...]
    # fails because the '|' character in the url is escaped after it's handed
    # to LWP
}
# end
Recommended answer
As @bchgys mentions in his comment, this is (almost) answered in the linked thread. Here are two solutions:
The first, and arguably cleanest, is to locally override the escape map in URI::Escape so that the pipe character is left unmodified:
use URI;
use LWP::UserAgent;
my $ua = LWP::UserAgent->new();
my $res;
{
    # Violate RFC 2396 by forcing broken query string
    # local makes the override take effect only in the current code block
    local $URI::Escape::escapes{'|'} = '|';
    $res = $ua->get('http://server/script?q=a|b');
}
print $res->request->as_string, "\n";
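To see what the local override does without contacting any server, a minimal sketch along these lines may help. It assumes the URI distribution is installed and that URI consults %URI::Escape::escapes when it builds a URL, which is what the override above relies on. The URL is the same placeholder as above, and the variable names are only illustrative.

```perl
use strict;
use warnings;
use URI;    # loads URI::Escape, which owns the %escapes map

my $raw = 'http://server/script?q=a|b';
my ($inside, $outside);

{
    # Assumption: URI looks up each unsafe character in %URI::Escape::escapes.
    # local limits the override to this block.
    local $URI::Escape::escapes{'|'} = '|';
    $inside = URI->new($raw)->as_string;
}

# The default mapping is restored once the block has ended.
$outside = URI->new($raw)->as_string;

print "inside:  $inside\n";
print "outside: $outside\n";
```

If the installed URI version behaves as assumed, the first line should keep the literal pipe and the second should show it escaped as %7C, which confirms that the override does not leak into the rest of the program.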
Alternatively, you can simply undo the escaping by modifying the URI directly in the request object after it has been created:
use HTTP::Request;
use LWP::UserAgent;
my $ua = LWP::UserAgent->new();
my $req = HTTP::Request->new(GET => 'http://server/script?q=a|b');
# Violate RFC 2396 by forcing broken query string
${$req->uri} =~ s/%7C/|/g;
my $res = $ua->request($req);
print $res->request->as_string, "\n";
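The rewritten request can also be inspected before anything is sent. The sketch below assumes HTTP::Request and URI are installed and, like the snippet above, leans on the implementation detail that a URI object is a blessed scalar reference holding the URL string. The query with two pipes is made up for illustration; the /g flag undoes every escaped pipe rather than only the first.

```perl
use strict;
use warnings;
use HTTP::Request;

# Hypothetical URL with more than one pipe in the query string
my $req = HTTP::Request->new(GET => 'http://server/script?q=a|b|c');

# What would normally go on the wire (pipes escaped by URI)
my $before = $req->uri->as_string;

# Assumption: the URI object is a scalar reference to the URL string,
# so dereferencing it lets us edit the stored string in place.
${ $req->uri } =~ s/%7C/|/g;

my $after = $req->uri->as_string;

print "before: $before\n";
print "after:  $after\n";
```

Printing both forms makes it easy to confirm that only the pipe characters changed and that the rest of the URL is untouched.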
The first solution is almost certainly preferable because it at least relies on the %URI::Escape::escapes package variable, which is exported and documented, so that's probably as close as you're going to get to doing this with a supported API.
Note that in either case you are in violation of RFC 2396, but, as mentioned, you may have no choice when talking to a broken server that you have no control over.