Getting Absolute Path of External Web Page Images
Problem description
I am working on a bookmarklet that fetches all the photos from any external page using an HTML DOM parser (as suggested in an earlier SO answer). The photos are fetched correctly and displayed in my bookmarklet pop-up, but I am having a problem with photos that use relative paths.
For example, say the external page is http://www.example.com/dir/index.php and it contains these photo sources:
Photo source 1: img source='hostname/photos/photo.jpg' - fetched correctly, as it is absolute.
Photo source 2: img source='/photos/photo.jpg' - not fetched, as it is not absolute.
I tried working from the current URL, using dirname or pathinfo to get its directory. But the results are inconsistent: for host/dir/ it gives host as the parent directory, whereas for host/dir/index.php it gives host/dir, which is correct.
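The inconsistency described above can be reproduced in a few lines. The sketch below is only an illustration: url_directory() is a hypothetical helper name, not something from the question, and it assumes the input is the path part of a URL that always uses forward slashes.

```php
<?php
// dirname() treats "/dir/" as the entry "dir" and returns its parent,
// so the trailing slash does not mark "dir" as the current directory.
var_dump(dirname('/dir/'));
var_dump(dirname('/dir/index.php'));

// Hypothetical helper (an assumption, not from the question):
// return the directory part of a URL path, honouring a trailing slash.
function url_directory(string $path): string
{
    // A trailing slash means the path already names a directory.
    if (substr($path, -1) === '/') {
        return rtrim($path, '/');
    }
    // Otherwise drop the final segment (the file name).
    $pos = strrpos($path, '/');
    return $pos === false ? '' : substr($path, 0, $pos);
}

var_dump(url_directory('/dir/'));
var_dump(url_directory('/dir/index.php'));
```

With this helper both inputs yield the same directory, which is the behaviour the question is after.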
How can I resolve these relative photo paths?
Recommended answer
FIXED (added support for query-string-only image paths)
function make_absolute_path ($baseUrl, $relativePath) {

    // Parse URLs, return FALSE on failure
    if ((!$baseParts = parse_url($baseUrl)) || (!$pathParts = parse_url($relativePath))) {
        return FALSE;
    }

    // Work-around for pre-5.4.7 bug in parse_url() for relative protocols
    if (empty($baseParts['host']) && !empty($baseParts['path']) && substr($baseParts['path'], 0, 2) === '//') {
        $parts = explode('/', ltrim($baseParts['path'], '/'));
        $baseParts['host'] = array_shift($parts);
        $baseParts['path'] = '/'.implode('/', $parts);
    }
    if (empty($pathParts['host']) && !empty($pathParts['path']) && substr($pathParts['path'], 0, 2) === '//') {
        $parts = explode('/', ltrim($pathParts['path'], '/'));
        $pathParts['host'] = array_shift($parts);
        $pathParts['path'] = '/'.implode('/', $parts);
    }

    // Relative path has a host component, just return it
    if (!empty($pathParts['host'])) {
        return $relativePath;
    }

    // Normalise base URL (fill in missing info)
    // If base URL doesn't have a host component return error
    if (empty($baseParts['host'])) {
        return FALSE;
    }
    if (empty($baseParts['path'])) {
        $baseParts['path'] = '/';
    }
    if (empty($baseParts['scheme'])) {
        $baseParts['scheme'] = 'http';
    }

    // Start constructing return value
    $result = $baseParts['scheme'].'://';

    // Add username/password if any
    if (!empty($baseParts['user'])) {
        $result .= $baseParts['user'];
        if (!empty($baseParts['pass'])) {
            $result .= ":{$baseParts['pass']}";
        }
        $result .= '@';
    }

    // Add host/port
    $result .= !empty($baseParts['port']) ? "{$baseParts['host']}:{$baseParts['port']}" : $baseParts['host'];

    // Inspect relative path
    if ($relativePath[0] === '/') {

        // Leading / means from root
        $result .= $relativePath;

    } else if ($relativePath[0] === '?') {

        // Leading ? means query the existing URL
        $result .= $baseParts['path'].$relativePath;

    } else {

        // Get the current working directory
        $resultPath = rtrim(substr($baseParts['path'], -1) === '/' ? trim($baseParts['path']) : str_replace('\\', '/', dirname(trim($baseParts['path']))), '/');

        // Split the image path into components and loop them
        foreach (explode('/', $relativePath) as $pathComponent) {
            switch ($pathComponent) {
                case '': case '.':
                    // a single dot means "this directory" and can be skipped
                    // an empty component is a mistake on somebody's part, and can also be skipped
                    break;
                case '..':
                    // a double dot means "up a directory"
                    $resultPath = rtrim(str_replace('\\', '/', dirname($resultPath)), '/');
                    break;
                default:
                    // anything else can be added to the path
                    $resultPath .= "/$pathComponent";
                    break;
            }
        }

        // Add path to result
        $result .= $resultPath;

    }

    return $result;

}
Tests:
echo make_absolute_path('http://www.example.com/dir/index.php','/photos/photo.jpg')."\n";
// Outputs: http://www.example.com/photos/photo.jpg
echo make_absolute_path('http://www.example.com/dir/index.php','photos/photo.jpg')."\n";
// Outputs: http://www.example.com/dir/photos/photo.jpg
echo make_absolute_path('http://www.example.com/dir/index.php','./photos/photo.jpg')."\n";
// Outputs: http://www.example.com/dir/photos/photo.jpg
echo make_absolute_path('http://www.example.com/dir/index.php','../photos/photo.jpg')."\n";
// Outputs: http://www.example.com/photos/photo.jpg
echo make_absolute_path('http://www.example.com/dir/index.php','http://www.yyy.com/photos/photo.jpg')."\n";
// Outputs: http://www.yyy.com/photos/photo.jpg
echo make_absolute_path('http://www.example.com/dir/index.php','?query=something')."\n";
// Outputs: http://www.example.com/dir/index.php?query=something
I think that should correctly deal with just about everything you're likely to encounter, and should roughly match the logic used by a browser. It should also correct any oddities you might get on Windows with stray backslashes from using dirname().
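For comparison, the "." and ".." handling can also be sketched without dirname(), by treating the path segments as a stack. This is a simplified illustration rather than the answer's own code: collapse_dot_segments() is a hypothetical name, and like the function above it silently drops empty segments, so a trailing slash is not preserved.

```php
<?php
// Hypothetical helper (not from the original answer):
// collapse "." and ".." segments of an absolute URL path.
function collapse_dot_segments(string $path): string
{
    $stack = [];
    foreach (explode('/', $path) as $segment) {
        if ($segment === '' || $segment === '.') {
            continue;            // empty or "this directory": nothing to add
        }
        if ($segment === '..') {
            array_pop($stack);   // "up a directory"; a no-op once at the root
            continue;
        }
        $stack[] = $segment;     // ordinary segment: descend into it
    }
    return '/' . implode('/', $stack);
}

echo collapse_dot_segments('/dir/../photos/./photo.jpg'), "\n";
// Outputs: /photos/photo.jpg
```

Because no filesystem function is involved, the result does not depend on the directory separator of the operating system.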
The first argument is the full URL of the page where you found the <img> (or <a>, or whatever), and the second argument is the contents of the src/href etc. attribute.
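As a rough illustration of where the second argument comes from, the sketch below collects the raw src values from a page. It uses PHP's built-in DOMDocument, which is an assumption: the question does not say which DOM parser it uses. collect_image_sources() and the sample HTML are hypothetical.

```php
<?php
// Hypothetical helper (not from the original post):
// return the raw src value of every <img> that has one.
function collect_image_sources(string $html): array
{
    // Real-world HTML is rarely valid, so keep libxml warnings quiet.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $sources = [];
    foreach ($doc->getElementsByTagName('img') as $img) {
        if ($img->hasAttribute('src')) {
            $sources[] = $img->getAttribute('src');
        }
    }
    return $sources;
}

$html = '<html><body><img src="/photos/photo.jpg"><img src="photos/a.jpg"><img alt="no src"></body></html>';
foreach (collect_image_sources($html) as $src) {
    // Each value is what would be passed as the second argument, e.g.
    // make_absolute_path('http://www.example.com/dir/index.php', $src)
    echo $src, "\n";
}
```

The values are returned exactly as written in the markup, so they still need to be resolved against the page URL before they can be fetched.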
If anyone finds something that doesn't work (because I know you'll all be trying to break it :-D), let me know and I'll try to fix it.