How do you find the "main" picture of a website, given the URL?


Question

Let's say you're given the URL http://nytimes.com. How would you pull out the "main" image?

The reason I'm asking is because Flipboard is able to grab the main image from a website, just using the URL.

You could parse out all the image tags. But then what?

Solution

There really isn't anything that is considered the "main" image of a web page; nothing in the HTML itself marks one image as more important than the others. Not to mention you'd probably also have to look at the images referenced from CSS (background images and so on). But if I had to do this, here is what I would do:

  1. First, I would decide on a suitable minimum image size, let's say 400x400. (I don't want to pick just any old image; something really small would likely scale horribly.)
  2. I would then iterate through each image on the page.
  3. For each image I encountered, I would check its size. If it was 400x400 (my predefined size) or larger, I would use this image. If it wasn't, I would check whether it's the largest image I've found so far and, if so, keep its information stored off to the side.
  4. Once I had checked a predefined number of images (for argument's sake let's say 10, though you'd probably go much higher), I'd use the largest image I've found (stored off to the side), because I wouldn't want to scan the page indefinitely looking for images!
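The steps above could be sketched roughly as follows in Python. This is only an illustration under some loud assumptions: the names `ImgCollector` and `pick_main_image` are hypothetical, image sizes are read from the `width`/`height` attributes of each `<img>` tag (a real crawler would likely have to download each image to measure it, and would also need to look at CSS backgrounds), and "largest" is taken to mean largest pixel area.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin


class ImgCollector(HTMLParser):
    """Collect (url, width, height) for each <img> tag that declares a size."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.images = []

    def handle_starttag(self, tag, attrs):
        if tag != "img":
            return
        attr = dict(attrs)
        src = attr.get("src")
        if not src:
            return
        try:
            width = int(attr.get("width"))
            height = int(attr.get("height"))
        except (TypeError, ValueError):
            # Assumption: skip images with no usable declared size.
            # A fuller version would fetch the image and measure it.
            return
        # Resolve relative paths against the page URL.
        self.images.append((urljoin(self.base_url, src), width, height))


def pick_main_image(images, min_width=400, min_height=400, max_checked=10):
    """Return the first image meeting the minimum size, else the largest
    (by area) among the first `max_checked` images, else None."""
    largest = None
    for checked, (url, width, height) in enumerate(images, start=1):
        # Step 3: an image at or above the threshold wins immediately.
        if width >= min_width and height >= min_height:
            return url
        # Otherwise remember the largest image seen so far.
        if largest is None or width * height > largest[1] * largest[2]:
            largest = (url, width, height)
        # Step 4: stop after a fixed number of images.
        if checked >= max_checked:
            break
    return largest[0] if largest else None
```

To use it, feed the page's HTML (fetched however you like) into `ImgCollector(page_url).feed(html)` and pass the resulting `images` list to `pick_main_image`. The 400x400 threshold and the limit of 10 are the answer's example values, not tuned numbers.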
