How to extract links and titles from a .html page?

Question

For my website, I'd like to add a new feature.

I would like users to be able to upload their bookmarks backup file (from any browser, if possible) so I can import the bookmarks into their profile and they don't have to add all of them manually.
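For the upload step itself, a minimal sketch follows. It assumes a form submitted with `enctype="multipart/form-data"` and a file field named `bookmarks`; that field name and the helper `read_uploaded_bookmarks()` are hypothetical. It uses only PHP's standard upload handling (`$_FILES`, `UPLOAD_ERR_OK`, `is_uploaded_file()`).

```php
<?php
// Hypothetical helper: returns the contents of an uploaded bookmarks file,
// or null if the upload is missing, failed, or is not a real HTTP upload.
function read_uploaded_bookmarks(array $file): ?string
{
    // $file is one entry of $_FILES, e.g. $_FILES['bookmarks'].
    if (!isset($file['error'], $file['tmp_name']) || $file['error'] !== UPLOAD_ERR_OK) {
        return null;
    }
    // Reject paths that were not created by PHP's upload handling.
    if (!is_uploaded_file($file['tmp_name'])) {
        return null;
    }
    $contents = file_get_contents($file['tmp_name']);
    return ($contents === false || $contents === '') ? null : $contents;
}

// Usage in the script that receives the form (field name is an assumption):
// $html = read_uploaded_bookmarks($_FILES['bookmarks'] ?? []);
```

Returning `null` instead of exiting lets the calling page show its own error message.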

The only part I'm missing is extracting the title and URL of each bookmark from the uploaded file. Can anyone give me a clue where to start or what to read?
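One place to start is the export format itself. Browsers such as Chrome and Firefox usually export bookmarks as a "Netscape bookmark file": an HTML document that begins with `<!DOCTYPE NETSCAPE-Bookmark-file-1>` and stores each bookmark as an `<A HREF="...">title</A>` element. The following sketch checks whether an upload looks like such an export. The function name is hypothetical, and treating the doctype as the marker is an assumption about that format.

```php
<?php
// Returns true if $html looks like a browser bookmark export.
// Assumption: the file uses the Netscape bookmark format, whose first
// line is <!DOCTYPE NETSCAPE-Bookmark-file-1>.
function looks_like_bookmark_export(string $html): bool
{
    // Only inspect the beginning of the file; the doctype comes first.
    $head = substr(ltrim($html), 0, 200);
    // Case-insensitive, since the doctype may be written in either case.
    return stripos($head, '<!DOCTYPE NETSCAPE-Bookmark-file-1>') !== false;
}
```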

I used the search option, and "how to extract data from a raw html file" is the question most closely related to mine, but it doesn't cover this.

I don't mind whether the solution uses jQuery or PHP.

Thank you very much.

Solution

Thank you, everyone. I got it!

The final code below prints the anchor text and the href of every link in a .html file:

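<?php
$html = file_get_contents('bookmarks.html');
if ($html === false || $html === '') {
    exit('Could not read bookmarks.html');
}

// Create a new DOM document.
$dom = new DOMDocument();

// Parse the HTML. Bookmark exports are rarely valid (X)HTML, so loadHTML()
// emits warnings for them. Collect those warnings internally instead of
// hiding them with the @ operator.
libxml_use_internal_errors(true);
$dom->loadHTML($html);
libxml_clear_errors();

// Get all links. You could also use any other tag name here,
// like 'img' or 'table', to extract other tags.
$links = $dom->getElementsByTagName('a');

// Iterate over the extracted links and display each link's text
// (the bookmark title) and its "href" attribute (the URL).
foreach ($links as $link) {
    // htmlspecialchars() keeps titles and URLs from injecting markup.
    echo htmlspecialchars($link->nodeValue), ' - ';
    echo htmlspecialchars($link->getAttribute('href')), '<br>';
}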
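To save the bookmarks to a profile instead of echoing them, the same parsing step can return an array. The sketch below uses `DOMXPath` with `//a[@href]` so that anchors without an href are skipped. It keeps only http(s) URLs and reads the optional `ADD_DATE` attribute. The function name and the output keys are hypothetical. It also assumes that `ADD_DATE` holds a Unix timestamp in seconds, as in typical Netscape-format exports. libxml's HTML parser lowercases tag and attribute names, so `HREF` and `ADD_DATE` are read as `href` and `add_date`. Non-ASCII titles decode correctly only if the file declares its charset, which browser exports normally do with a `<META ... charset=UTF-8>` tag.

```php
<?php
// Parse a bookmarks export into a list of
// ['title' => string, 'url' => string, 'added' => int|null].
// Assumptions: only http(s) URLs are kept, and ADD_DATE (if present)
// is a Unix timestamp in seconds.
function extract_bookmarks(string $html): array
{
    if ($html === '') {
        return []; // loadHTML() rejects empty input
    }
    $dom = new DOMDocument();
    $previous = libxml_use_internal_errors(true);
    $dom->loadHTML($html);
    libxml_clear_errors();
    libxml_use_internal_errors($previous);

    $xpath = new DOMXPath($dom);
    $bookmarks = [];
    // Only anchors that actually carry an href attribute.
    foreach ($xpath->query('//a[@href]') as $link) {
        $url = trim($link->getAttribute('href'));
        $scheme = strtolower((string) parse_url($url, PHP_URL_SCHEME));
        if ($scheme !== 'http' && $scheme !== 'https') {
            continue; // skip javascript: bookmarklets and other non-web links
        }
        $added = $link->getAttribute('add_date');
        $bookmarks[] = [
            'title' => trim($link->textContent),
            'url'   => $url,
            'added' => preg_match('/^\d+$/', $added) ? (int) $added : null,
        ];
    }
    return $bookmarks;
}
```

The resulting array can then be inserted row by row into whatever table backs the user's profile.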

Again, thanks a lot.
