Get information from a web page (title, pictures, heads, etc.)


Question


In Facebook, when you add a link to your wall, it gets the title, pictures and part of the text. I've seen this behavior on other websites where you can add links. How does it work? Does it have a name? Is there any JavaScript/jQuery extension that implements it?


And how is it possible that Facebook goes to another website and gets the HTML when making a cross-site ajax call is, supposedly, forbidden?

Thanks.

Answer


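You can use a PHP server-side script to fetch the contents of any web page (look up "web scraping"). The cross-site restriction applies only to requests the browser makes, not to code running on a server. So what Facebook does is send an ajax call to its own PHP server-side script, which fetches the remote page with a PHP function such as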

file_get_contents('http://somesite.com.au'); 
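A minimal sketch of such a server-side script, assuming a hypothetical endpoint (something like preview.php?url=...) that your page calls via ajax. The function names and the url query parameter are illustrative assumptions, not Facebook's actual implementation. Because this code runs on the server, the browser's same-origin restriction does not apply to the file_get_contents() call.

```php
<?php
// Hypothetical endpoint (e.g. preview.php?url=...) called from the browser via ajax.
// It runs on the server, so the browser's same-origin policy does not restrict it.

// Accept only absolute http(s) URLs, so the script cannot be pointed at local
// files such as file:///etc/passwd. (This does not block internal hostnames.)
function is_fetchable_url(string $url): bool
{
    if (filter_var($url, FILTER_VALIDATE_URL) === false) {
        return false;
    }
    $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
    return $scheme === 'http' || $scheme === 'https';
}

// Download the raw HTML of the remote page; returns null on failure.
function fetch_page(string $url): ?string
{
    // A short timeout so a slow remote site does not hang the request.
    $context = stream_context_create(['http' => ['timeout' => 5]]);
    $html = @file_get_contents($url, false, $context);
    return $html === false ? null : $html;
}

// Build the JSON response for a query array such as $_GET.
function handle_preview_request(array $query): string
{
    $url = $query['url'] ?? '';
    if (!is_string($url) || !is_fetchable_url($url)) {
        return json_encode(['error' => 'invalid url']);
    }
    $html = fetch_page($url);
    if ($html === null) {
        return json_encode(['error' => 'fetch failed']);
    }
    // Remote pages are not always valid UTF-8; substitute bad bytes instead of failing.
    return json_encode(['html' => $html], JSON_INVALID_UTF8_SUBSTITUTE);
}

// In the endpoint file itself you would then do:
// header('Content-Type: application/json');
// echo handle_preview_request($_GET);
```

Returning the raw HTML keeps the sketch short; in practice you would extract just the fields you need on the server before responding.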


Once the file or web page has been pulled into your server-side script, you can filter the contents for anything in particular. For example, Facebook's link preview looks for the title, img and meta description tags of the file or web page using regular expressions.

For example, PHP's preg_match() function.


The extracted data can then be collected and returned to your web page.
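As a rough illustration of the regex approach, the hypothetical extract_preview() function below pulls the title, the meta description (including the Open Graph og:description variant) and image URLs out of an HTML string with preg_match() and preg_match_all(). It assumes reasonably well-formed markup in which the name/property attribute comes before content; regular expressions are brittle on arbitrary HTML, so treat this as a sketch rather than a robust parser.

```php
<?php
// Hypothetical helper: extract preview fields from raw HTML with regular expressions.
// Assumes common, well-formed markup; it will miss unusual attribute orders.
function extract_preview(string $html): array
{
    $preview = ['title' => null, 'description' => null, 'images' => []];

    // <title>...</title>; 'i' = case-insensitive, 's' = '.' also matches newlines.
    if (preg_match('/<title[^>]*>(.*?)<\/title>/is', $html, $m)) {
        $preview['title'] = trim(html_entity_decode($m[1], ENT_QUOTES | ENT_HTML5));
    }

    // <meta name="description" content="..."> or <meta property="og:description" content="...">.
    // Group 1 and group 2 capture the opening quotes so each value ends at its matching quote.
    $metaPattern = '/<meta[^>]+(?:name|property)=(["\'])(?:og:)?description\1[^>]*content=(["\'])(.*?)\2/is';
    if (preg_match($metaPattern, $html, $m)) {
        $preview['description'] = html_entity_decode($m[3], ENT_QUOTES | ENT_HTML5);
    }

    // The src attribute of every <img> tag, in document order.
    if (preg_match_all('/<img[^>]+src=(["\'])(.*?)\1/is', $html, $m)) {
        $preview['images'] = $m[2];
    }

    return $preview;
}
```

For serious use, PHP's DOMDocument copes with messy markup far better than hand-written patterns.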


You may also want to add extra functions that return only the data you want, because scraping some pages can take longer than expected. For example, filter out irrelevant content such as JavaScript, CSS, irrelevant tags and huge images to make it run faster.
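One way to apply this, sketched with hypothetical helper names: read only the first part of the page (the title and meta tags normally sit in the head), and strip script, style and comment blocks before running any regexes. The 64 KB cap is an arbitrary assumption; a truncated download can cut the document mid-tag, which is usually acceptable when only head data is wanted.

```php
<?php
// Hypothetical helper: download at most $maxBytes of the page.
function fetch_page_prefix(string $url, int $maxBytes = 65536): ?string
{
    $context = stream_context_create(['http' => ['timeout' => 5]]);
    // The 5th argument of file_get_contents() caps how many bytes are read.
    $html = @file_get_contents($url, false, $context, 0, $maxBytes);
    return $html === false ? null : $html;
}

// Hypothetical helper: remove <script>...</script>, <style>...</style> and
// HTML comments so later regexes scan less text and cannot match inside scripts.
function strip_noise(string $html): string
{
    $clean = preg_replace(
        ['/<(script|style)\b[^>]*>.*?<\/\1>/is', '/<!--.*?-->/s'],
        '',
        $html
    );
    // preg_replace() returns null on error (e.g. backtrack limit); keep the input then.
    return $clean ?? $html;
}
```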


If you get this down pat, you could potentially be on your way to building a web search engine or, better yet, collecting data from sites like Yellow Pages (e.g. phone numbers, mailing addresses).


Also, you may want to look further into:

get_meta_tags('http://somesite.com.au');
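get_meta_tags() returns a page's meta tags as an array keyed by the lower-cased name attribute, and it accepts a local file path as well as a URL. A minimal sketch with a hypothetical meta_description() wrapper; note that, according to the PHP manual, only meta tags with a name attribute are parsed, so Open Graph property="og:..." tags are not returned.

```php
<?php
// Hypothetical wrapper: return the page's <meta name="description"> content, or null.
function meta_description(string $source): ?string
{
    // get_meta_tags() returns false if the file or URL cannot be opened.
    $tags = @get_meta_tags($source);
    if ($tags === false) {
        return null;
    }
    // Keys are the name attributes, lower-cased by get_meta_tags().
    return $tags['description'] ?? null;
}

// Usage: meta_description('http://somesite.com.au');
```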

:-)
