Parsing the result obtained from mochiweb_html

Question

I would like to parse some content from an HTML file (not XML).

At the moment I retrieve the structure to parse using mochiweb_html:

1> inets:start().
2> {ok, {Status, Headers, Body}} = httpc:request("http://www.google.com").
3> {String, Attributes, Other} = mochiweb_html:parse(Body).

The result looks like this:

{<<"html">>,
 [{<<"itemscope">>,<<"itemscope">>},
  {<<"itemtype">>,<<"http://schema.org/WebPage">>}],
 [{<<"head">>,[],
   [{<<"meta">>,
     [{<<"itemprop">>,<<"image">>},
      {<<"content">>,<<"/images/google_favicon_128.png">>}],
     []},
    {<<"title">>,[],[<<"Google">>]},
....

What is the best way to retrieve, from the structure returned by mochiweb_html, all the elements in the page that have a specific tag with a specific class (e.g., <span id="footer">)?

Answer

You can use mochiweb_xpath, which runs XPath queries against the tree returned by mochiweb_html:parse/1:

> mochiweb_xpath:execute("//span[@id='footer']",
    mochiweb_html:parse(
      "<html><body><span>not this one</span><span id='footer'>but this one</span></body></html>")).
[{<<"span">>,
  [{<<"id">>,<<"footer">>}],
  [<<"but this one">>]}]
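
If you would rather not add mochiweb_xpath as a dependency, the same search can be done with a plain recursive walk over the tuple tree. This is a minimal sketch, assuming element nodes have the {Tag, Attributes, Children} shape shown above, with tags and attribute values as binaries. The module html_find and its functions are hypothetical names, not part of mochiweb. Because an element can carry several classes, tag_with_class/2 treats the class attribute as a whitespace-separated token list instead of requiring an exact value match.

```erlang
-module(html_find).
-export([find_all/2, tag_with_attr/3, tag_with_class/2]).

%% find_all(Pred, Tree) returns every element node in Tree for which
%% Pred(Node) is true, in document order. Element nodes are assumed to
%% have the {Tag, Attributes, Children} shape produced by
%% mochiweb_html:parse/1; text (binaries) and other node shapes such as
%% {comment, Text} are skipped.
find_all(Pred, Tree) ->
    %% walk/3 prepends matches, so reverse to restore document order.
    lists:reverse(walk(Pred, Tree, [])).

walk(Pred, {_Tag, Attrs, Children} = Node, Acc0)
  when is_list(Attrs), is_list(Children) ->
    Acc1 = case Pred(Node) of
               true -> [Node | Acc0];
               false -> Acc0
           end,
    %% Visit the children left to right (pre-order traversal).
    lists:foldl(fun(Child, Acc) -> walk(Pred, Child, Acc) end,
                Acc1, Children);
walk(_Pred, _Other, Acc) ->
    %% Text nodes, comments and anything else: nothing to collect.
    Acc.

%% Predicate: tag equals Tag and attribute Name has exactly Value,
%% e.g. tag_with_attr(<<"span">>, <<"id">>, <<"footer">>).
tag_with_attr(Tag, Name, Value) ->
    fun({T, Attrs, _}) ->
            T =:= Tag andalso lists:member({Name, Value}, Attrs)
    end.

%% Predicate: tag equals Tag and Class is one of the whitespace-separated
%% tokens of the class attribute, so <<"nav footer">> matches <<"footer">>.
tag_with_class(Tag, Class) ->
    fun({T, Attrs, _}) ->
            T =:= Tag andalso
                case lists:keyfind(<<"class">>, 1, Attrs) of
                    {_, Value} ->
                        Tokens = binary:split(Value,
                                              [<<" ">>, <<"\t">>, <<"\n">>],
                                              [global, trim_all]),
                        lists:member(Class, Tokens);
                    false ->
                        false
                end
    end.
```

With Body fetched as in the question, a call such as `html_find:find_all(html_find:tag_with_class(<<"span">>, <<"footer">>), mochiweb_html:parse(Body))` would collect every matching span. Passing the match condition as a fun keeps the traversal separate from the matching rule, so other conditions (any tag, several attributes) only need a different predicate.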
