Fixing PHP image scraper code to be more flexible in different situations

Question

I was able to put together some code that grabs an image from the website below, where the image link is random every time, and mirrors it on another site. It's great that this works, but I'm unable to apply the same approach to any other site. I see that the image is being grabbed with getElementById("file"), but the original page source contains many references to "file", so I'm a bit stuck. I'm still very new to PHP.

What I'm trying to do is replicate the result below, but on any site with a particular image.

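<?php
// Fetch a random file page from Wikimedia Commons.
$html = file_get_contents("http://commons.wikimedia.org/wiki/Special:Random/File");
$dom = new DOMDocument();
// Real-world HTML is rarely valid; @ silences the resulting parser warnings.
@$dom->loadHTML($html);
// Take the first attribute of the first child of the element with id="file".
$remoteImage = $dom->getElementById("file")->firstChild->attributes[0]->textContent;
// filesize() does not work on http:// URLs, so download first and measure the bytes.
$image = file_get_contents($remoteImage);
header("Content-type: image/png");
header('Content-Length: ' . strlen($image));
echo $image;
?>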

I'm trying to figure out how I could reproduce that on this site, for example: https://pokemondb.net/pokedex/wartortle

There, I'm trying to pull wartortle.jpg.

Since I want this to work under random conditions, I won't know exactly what the image is named. My initial idea was to identify the image by its containing tag, <div class="colset">.

Alas, plugging in "colset" instead of "file" didn't do the trick.

Any thoughts? Thanks so much. -Wilson

Recommended answer

Using XPath is generally a lot more flexible (although probably slower than other solutions). For the example page above, you could use the following to get the file name and output the image:

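<?php
// Buffer output so parser warnings from loadHTMLFile() never reach the client.
ob_start();
$doc = new DOMDocument;

$doc->loadHTMLFile('https://pokemondb.net/pokedex/wartortle');

$xpath = new DOMXPath($doc);

// Select the src attribute of every <img> inside the <li> with this id.
$query = "//li[@id='svtabs_basic_8']//img/@src";
// Discard anything buffered so far (the warnings) before sending headers.
ob_end_clean();
header('content-type: image/jpeg');
$entries = $xpath->query($query);
foreach ($entries as $entry) {
    // Stream the remote image straight to the response.
    readfile((string)$entry->value);
}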

I've added the ob_start and ob_end_clean calls to discard the warnings the parser emits for invalid markup.
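
To reuse the same idea on other pages, the lookup can be wrapped in a small function that takes the HTML and an XPath query. The following is only a sketch under stated assumptions: `findImageSources` is a hypothetical helper name, the sample markup and example.com URLs are made up so the snippet runs without network access, and `libxml_use_internal_errors` is swapped in for the output buffering used above. It also shows one way to select by class, which `getElementById` cannot do because it matches only `id` attributes (this is why plugging in "colset" failed).

```php
<?php
// Hypothetical helper (not part of the original answer): returns the value
// of every attribute node matched by an XPath query against an HTML string.
function findImageSources(string $html, string $query): array
{
    // Collect libxml parse warnings internally instead of printing them.
    $previous = libxml_use_internal_errors(true);
    $doc = new DOMDocument();
    $doc->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($doc);
    $sources = [];
    foreach ($xpath->query($query) as $attr) {
        $sources[] = $attr->value;
    }
    return $sources;
}

// Made-up markup standing in for a real page.
$sample = '<html><body>'
        . '<div class="colset wide"><img src="https://example.com/wartortle.jpg"></div>'
        . '<div id="other"><img src="https://example.com/other.jpg"></div>'
        . '</body></html>';

// Match any <div> whose class list contains the token "colset".
$query = "//div[contains(concat(' ', normalize-space(@class), ' '), ' colset ')]//img/@src";

$sources = findImageSources($sample, $query);
print_r($sources);
```

To serve the image as in the answer, the first returned URL could be passed to readfile() after sending a matching Content-Type header. A relative src value would first need to be resolved against the page URL, which this sketch does not attempt.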
