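What happens if <base href...> is set with a double slash?
Problem description
I would like to understand how to use a <base href="" /> value for my web crawler, so I tested several combinations in the major browsers and finally found something with double slashes that I don't understand.
If you don't want to read everything, jump to the test results of D and E. Demonstration of all tests:
http://gutt.it/basehref.php
Step by step, my test results on calling http://example.com/images.html: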
A - Multiple base href
<html>
<head>
<base target="_blank" />
<base href="http://example.com/images/" />
<base href="http://example.com/" />
</head>
<body>
<img src="/images/image.jpg">
<img src="image.jpg">
<img src="./image.jpg">
<img src="images/image.jpg"> not found
<img src="/image.jpg"> not found
<img src="../image.jpg"> not found
</body>
</html>
Conclusion
- only the first <base> with href counts
- a source starting with / targets the root
- ../ goes one folder up
B - Without trailing slash
<html>
<head>
<base href="http://example.com/images" />
</head>
<body>
<img src="/images/image.jpg">
<img src="image.jpg"> not found
<img src="./image.jpg"> not found
<img src="images/image.jpg">
<img src="/image.jpg"> not found
<img src="../image.jpg"> not found
</body>
</html>
Conclusion
- <base href> ignores everything after the last slash, so http://example.com/images becomes http://example.com/
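The trailing-slash effect seen in test B can be reproduced outside a page with the standard URL constructor, which follows the same URL-spec resolution. This is only a minimal sketch, assuming a runtime where the global URL class exists (modern browsers, Node.js); the variable names are made up for illustration.

```javascript
// The same relative reference, resolved against a base with and without
// a trailing slash. Without the slash, "images" is the last path segment
// and is dropped before the relative part is appended.
const withSlash = new URL("image.jpg", "http://example.com/images/").href;
const withoutSlash = new URL("image.jpg", "http://example.com/images").href;

console.log(withSlash);    // http://example.com/images/image.jpg
console.log(withoutSlash); // http://example.com/image.jpg
```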
C - How it should be
<html>
<head>
<base href="http://example.com/" />
</head>
<body>
<img src="/images/image.jpg">
<img src="image.jpg"> not found
<img src="./image.jpg"> not found
<img src="images/image.jpg">
<img src="/image.jpg"> not found
<img src="../image.jpg"> not found
</body>
</html>
Conclusion
- same result as in Test B
D - Double Slash
<html>
<head>
<base href="http://example.com/images//" />
</head>
<body>
<img src="/images/image.jpg">
<img src="image.jpg">
<img src="./image.jpg">
<img src="images/image.jpg"> not found
<img src="/image.jpg"> not found
<img src="../image.jpg">
</body>
</html>
E - Double Slash with whitespace
<html>
<head>
<base href="http://example.com/images/ /" />
</head>
<body>
<img src="/images/image.jpg">
<img src="image.jpg"> not found
<img src="./image.jpg"> not found
<img src="images/image.jpg"> not found
<img src="/image.jpg"> not found
<img src="../image.jpg">
</body>
</html>
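Neither is a "valid" URL, but both are real results from my web crawler. Please explain what happens in D and E that lets ../image.jpg be found, and why the whitespace makes a difference.
Only for your interest:
- <base href="http://example.com//" /> is the same as Test C
- <base href="http://example.com/ /" /> is completely different: only ../image.jpg is found
- <base href="a/" /> finds only /images/image.jpg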
Answer
The behavior of base is explained in the HTML spec:
"The base element allows authors to specify the document base URL for the purposes of resolving relative URLs."
As shown in your test A, if there are multiple base elements with href, the document base URL will be the first one.
Resolving relative URLs is done this way:
"Apply the URL parser to url, with base as the base URL, with encoding as the encoding."
The URL parsing algorithm is defined in the URL spec.
It's too complex to explain here in detail, but basically this is what happens:
- A relative URL starting with / is calculated with respect to the base URL's host.
- Otherwise, the relative URL is calculated with respect to the base URL's last directory.
- Be aware that if the base path doesn't end with /, the last part will be a file, not a directory.
- ./ is the current directory
- ../ goes one directory up
(Probably, "directory" and "file" are not the proper terminology in URLs.)
Some examples:
- http://example.com/images/a/./ is http://example.com/images/a/
- http://example.com/images/a/../ is http://example.com/images/
- http://example.com/images//./ is http://example.com/images//
- http://example.com/images//../ is http://example.com/images/
- http://example.com/images/./ is http://example.com/images/
- http://example.com/images/../ is http://example.com/
Note that, in most cases, // will behave like /. As said by @poncha:
"Unless you're using some kind of URL rewriting (in which case the rewriting rules may be affected by the number of slashes), the uri maps to a path on disk, but in (most?) modern operating systems (Linux/Unix, Windows), multiple path separators in a row do not have any special meaning, so /path/to/foo and /path//to////foo would eventually map to the same file."
However, in general / / won't become //.
In tests D and E, ../ removes the extra last segment (empty in D, a space in E), so ../image.jpg resolves to http://example.com/images/image.jpg in both.
The two snippets at the end of this answer let you check this in a browser: the first resolves your list of relative URLs to absolute ones for each base, and the second is a small form you can play with.
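The rules above can also be checked without a browser. The sketch below assumes a runtime with the WHATWG URL class (modern browsers, Node.js) and resolves the six test sources against the bases from tests D and E. resolveAll is a hypothetical helper name, not part of the original snippets.

```javascript
// Relative sources used in the question's tests.
const sources = [
  "/images/image.jpg",
  "image.jpg",
  "./image.jpg",
  "images/image.jpg",
  "/image.jpg",
  "../image.jpg",
];

// Hypothetical helper: resolve every source against one base URL.
function resolveAll(base) {
  return sources.map((src) => new URL(src, base).href);
}

const testD = resolveAll("http://example.com/images//");
const testE = resolveAll("http://example.com/images/ /");

// In D the extra segment is empty, so image.jpg ends up behind a double slash.
console.log(testD[1]); // http://example.com/images//image.jpg
// In E the extra segment is a space, which the parser percent-encodes.
console.log(testE[1]); // http://example.com/images/%20/image.jpg
// In both cases ../ removes the extra segment.
console.log(testD[5]); // http://example.com/images/image.jpg
console.log(testE[5]); // http://example.com/images/image.jpg
```

A server that treats // like / will still serve testD[1], while /images/%20/ is a different directory. That would match image.jpg being found in D but not in E.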
var bases = [
  "http://example.com/images/",
  "http://example.com/images",
  "http://example.com/",
  "http://example.com/images//",
  "http://example.com/images/ /"
];
var urls = [
  "/images/image.jpg",
  "image.jpg",
  "./image.jpg",
  "images/image.jpg",
  "/image.jpg",
  "../image.jpg"
];
function newEl(type, contents) {
  var el = document.createElement(type);
  if(!contents) return el;
  if(!(contents instanceof Array))
    contents = [contents];
  for(var i=0; i<contents.length; ++i)
    if(typeof contents[i] == 'string')
      el.appendChild(document.createTextNode(contents[i]))
    else if(typeof contents[i] == 'object') // contents[i] instanceof Node
      el.appendChild(contents[i])
  return el;
}
function emoticon(str) {
  return {
    'http://example.com/images/image.jpg': 'good',
    'http://example.com/images//image.jpg': 'neutral'
  }[str] || 'bad';
}
var base = document.createElement('base'),
    a = document.createElement('a'),
    output = document.createElement('ul'),
    head = document.getElementsByTagName('head')[0];
head.insertBefore(base, head.firstChild);
for(var i=0; i<bases.length; ++i) {
  base.href = bases[i];
  var test = newEl('li', [
    'Test ' + (i+1) + ': ',
    newEl('span', bases[i])
  ]);
  test.className = 'test';
  var testItems = newEl('ul');
  testItems.className = 'test-items';
  for(var j=0; j<urls.length; ++j) {
    a.href = urls[j];
    var absURL = a.cloneNode(false).href;
    /* Stupid old IE requires cloning
       https://stackoverflow.com/a/24437713/1529630 */
    var testItem = newEl('li', [
      newEl('span', urls[j]),
      ' → ',
      newEl('span', absURL)
    ]);
    testItem.className = 'test-item ' + emoticon(absURL);
    testItems.appendChild(testItem);
  }
  test.appendChild(testItems);
  output.appendChild(test);
}
document.body.appendChild(output);
span {
  background: #eef;
}
.test-items {
  display: table;
  border-spacing: .13em;
  padding-left: 1.1em;
  margin-bottom: .3em;
}
.test-item {
  display: table-row;
  position: relative;
  list-style: none;
}
.test-item > span {
  display: table-cell;
}
.test-item:before {
  display: inline-block;
  width: 1.1em;
  height: 1.1em;
  line-height: 1em;
  text-align: center;
  border-radius: 50%;
  margin-right: .4em;
  position: absolute;
  left: -1.1em;
  top: 0;
}
.good:before {
  content: ':)';
  background: #0f0;
}
.neutral:before {
  content: ':|';
  background: #ff0;
}
.bad:before {
  content: ':(';
  background: #f00;
}
var resolveURL = (function() {
  var base = document.createElement('base'),
      a = document.createElement('a'),
      head = document.getElementsByTagName('head')[0];
  return function(url, baseurl) {
    if(base) {
      base.href = baseurl;
      head.insertBefore(base, head.firstChild);
    }
    a.href = url;
    var abs = a.cloneNode(false).href;
    /* Stupid old IE requires cloning
       https://stackoverflow.com/a/24437713/1529630 */
    if(base)
      head.removeChild(base);
    return abs;
  };
})();
var base = document.getElementById('base'),
    url = document.getElementById('url'),
    abs = document.getElementById('absolute');
base.onpropertychange = url.onpropertychange = function() {
  if (event.propertyName == "value")
    update()
};
(base.oninput = url.oninput = update)();
function update() {
  abs.value = resolveURL(url.value, base.value);
}
label {
  display: block;
  margin: 1em 0;
}
input {
  width: 100%;
}
<label>
  Base url:
  <input id="base" value="http://example.com/images//foo////bar/baz"
         placeholder="Enter your base url here" />
</label>
<label>
  URL to be resolved:
  <input id="url" value="./a/b/../c"
         placeholder="Enter your URL here">
</label>
<label>
  Resulting url:
  <input id="absolute" readonly>
</label>
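For the crawler itself, two points from this thread can be combined: the first base with an href wins, and resolution follows the URL spec. The following is only a rough sketch under stated assumptions: documentBaseURL and its regular expression are made up for illustration, handle only quoted href values, and are no substitute for a real HTML parser. It also assumes a runtime with the global URL class (modern browsers, Node.js).

```javascript
// Hypothetical helper: return the document base URL for a page's HTML.
// Only the first <base> that carries an href counts; a <base> with only
// a target attribute is skipped.
function documentBaseURL(html, pageUrl) {
  const baseTag = /<base\b[^>]*?\bhref\s*=\s*(["'])(.*?)\1[^>]*>/i;
  const match = baseTag.exec(html);
  if (!match) return pageUrl; // no usable <base>: fall back to the page URL
  // The href value may itself be relative, so resolve it against the page URL.
  return new URL(match[2], pageUrl).href;
}

const page = "http://example.com/images.html";
const html =
  '<head><base target="_blank" />' +
  '<base href="http://example.com/images/" />' +
  '<base href="http://example.com/" /></head>';

const baseUrl = documentBaseURL(html, page);
console.log(baseUrl);                            // http://example.com/images/
console.log(new URL("image.jpg", baseUrl).href); // http://example.com/images/image.jpg
```

The sample markup mirrors test A, so the result agrees with the browsers there: the second href is ignored.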