Extract elements from an HTML page
Problem description
I downloaded some YouTube comment pages and I want to extract the username (or user display name) and the link from code blocks like the following:
<p class="metadata">
<span class="author ">
<a href="/channel/UCuoJ_C5xNTrdnc4motXPHIA" class="yt-uix-sessionlink yt-user-name " data-sessionlink="ei=CKG174zFqbQCFZmaIQodtmyE0A%3D%3D" dir="ltr">Sabil Muhammad</a>
</span>
<span class="time" dir="ltr">
<a dir="ltr" href="http://www.youtube.com/comment?lc=S2ZH2gSPYaef43vTRkLDxUzo2fYicVUc3SFvmYq2jrs">
il y a 1 jour
</a>
</span>
</p>
I want to extract /channel/UCuoJ_C5xNTrdnc4motXPHIA and Sabil Muhammad.
There are of course many lines in the HTML page, but I only want to focus on blocks like the one above, extract all usernames and their corresponding links, and put them into a log file.
Are there any good scripts for this? I know bash and C/C++.
Thanks!
You could use jQuery to accomplish something like this by iterating through all of the elements with the 'metadata' class and pulling out the contents that you need:
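// After including jQuery within your page
$(document).ready(function ()
{
    // Iterates through each element with the "metadata" class
    $('.metadata').each(function ()
    {
        // The author anchor holds both the display name and the channel link
        var author = $('.yt-user-name', this);
        // Pulls the username
        var username = author.text().trim();
        // Pulls the channel link (use $('.time a', this) instead for the comment permalink)
        var link = author.attr('href');
        // Process each accordingly
        alert(username + ':' + link);
    });
});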
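The jQuery snippet runs inside a browser page, while the question asks for a log file built from downloaded pages. As a rough sketch only, the same extraction could be done in plain JavaScript outside the browser. The function names `extractAuthors` and `toLogLines`, and the file names in the usage comment, are hypothetical. The sketch assumes every author link is an `<a>` tag with double-quoted attributes and the class `yt-user-name`, as in the sample. Regular expressions are fragile on arbitrary HTML, so a real HTML parser would be more robust.

```javascript
// Sketch: extract (name, link) pairs from saved comment HTML.
// Assumption: author anchors look like the sample in the question, i.e.
// <a href="..." class="... yt-user-name ...">Display Name</a>
function extractAuthors(html) {
  const results = [];
  // Match each <a ...>text</a> pair, capturing the attribute string and the text.
  const anchorPattern = /<a\b([^>]*)>([\s\S]*?)<\/a>/gi;
  let match;
  while ((match = anchorPattern.exec(html)) !== null) {
    const attributes = match[1];
    const classMatch = /\bclass\s*=\s*"([^"]*)"/i.exec(attributes);
    const hrefMatch = /\bhref\s*=\s*"([^"]*)"/i.exec(attributes);
    // Skip anchors without both attributes (e.g. the timestamp link has no class).
    if (!classMatch || !hrefMatch) continue;
    const classes = classMatch[1].split(/\s+/);
    if (!classes.includes('yt-user-name')) continue;
    results.push({ name: match[2].trim(), link: hrefMatch[1] });
  }
  return results;
}

// Format one "name<TAB>link" line per author, ready to append to a log file.
function toLogLines(authors) {
  return authors.map((a) => a.name + '\t' + a.link).join('\n');
}

// Possible usage under Node.js (file names are placeholders):
//   const fs = require('fs');
//   const html = fs.readFileSync('comments.html', 'utf8');
//   fs.appendFileSync('authors.log', toLogLines(extractAuthors(html)) + '\n');
```

Keeping the extraction and the formatting in separate functions means the log format can change without touching the matching logic.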