Extract elements from an HTML page

Views: 144 | Published: 2018/6/25 14:05:54 | Tags: html, xml, phantomjs, scrape

Question

I downloaded some YouTube comment pages and want to extract the username (or user display name) and the link from code blocks like the following:

 <p class="metadata">
      <span class="author ">
        <a href="/channel/UCuoJ_C5xNTrdnc4motXPHIA" class="yt-uix-sessionlink yt-user-name " data-sessionlink="ei=CKG174zFqbQCFZmaIQodtmyE0A%3D%3D" dir="ltr">Sabil Muhammad</a>
      </span>
        <span class="time" dir="ltr">
          <a dir="ltr" href="http://www.youtube.com/comment?lc=S2ZH2gSPYaef43vTRkLDxUzo2fYicVUc3SFvmYq2jrs">
            il y a 1 jour
          </a>
        </span>
    </p>

From this block I want to extract /channel/UCuoJ_C5xNTrdnc4motXPHIA and Sabil Muhammad.

There are of course many other lines in the HTML page, but I only want to focus on blocks like the one above: extract every username with its corresponding link and write them to a log file.

Are there any good scripts for this? I know bash and C/C++.

Thanks!

Solution

You could use jQuery to do this by iterating through every element with the 'metadata' class and pulling out the content you need:

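//After including jQuery within your page
$(document).ready(function()
{
    //Iterates through each of the metadata tags
    $('.metadata').each(function()
    {
          //The author link holds both the display name and the channel path
          var author = $('.yt-user-name', this);
          //Pulls the username, without surrounding whitespace
          var username = author.text().trim();
          //Pulls the channel link (e.g. /channel/UCuoJ_C5xNTrdnc4motXPHIA);
          //'.time a' would return the comment permalink instead
          var link = author.attr('href');
          //Process each accordingly, e.g. print one "name: link" line per comment
          console.log(username + ': ' + link);
    });
});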
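
If jQuery is not loaded on the saved page, the same loop can be written with the standard DOM API and pasted into the browser console. This is a minimal sketch, assuming the markup shown in the question; `collectAuthors` is a hypothetical helper name. It reads both the display name and the `/channel/...` path from the `.yt-user-name` anchor, because that anchor is where the link the question asks for lives (the `.time a` anchor holds the comment permalink).

```javascript
// Hypothetical helper: collect { name, link } pairs from every
// <p class="metadata"> block under `root` (pass `document` in the browser).
function collectAuthors(root) {
  const results = [];
  for (const meta of root.querySelectorAll('.metadata')) {
    // The author anchor carries both the display name and the channel link.
    const author = meta.querySelector('.yt-user-name');
    if (!author) continue; // skip blocks without an author link
    results.push({
      name: author.textContent.trim(),
      link: author.getAttribute('href'),
    });
  }
  return results;
}

// Usage in the browser console: one "name<TAB>link" line per comment,
// which can be copied into a log file.
// console.log(collectAuthors(document).map((a) => `${a.name}\t${a.link}`).join('\n'));
```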

