How to get text from <a href> in nested HTML elements using Jericho?


Question

I have some HTML code like this:

<div class="itm hasOverlay lastrow">
<a id="3:LE343SPABGLIANID" class="itm-link itm-drk trackingOnClick" title="League Sepatu Casual Geof S/L LO - Hitam/Biru" href="league-sepatu-casual-geof-sl-lo-hitambiru-68166.html" rel="-standard|">
</a>
<div class="itm-overlay itm-group-mainbox-with-group"></div>
</div>

How can I get the text league-sepatu-casual-geof-sl-lo-hitambiru-68166.html from the href attribute of

<a href="league-sepatu-casual-geof-sl-lo-hitambiru-68166.html">?

Solution

That should be rather simple...

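import java.io.StringReader;
import net.htmlparser.jericho.*;

// Parse the HTML, take the first <a> element, and read its href attribute.
Source source = new Source(new StringReader(inputString));
Element aElement = source.getFirstElement(HTMLElementName.A);
String href = aElement.getAttributeValue("href");
System.out.println(href);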

... although this makes some assumptions, of course: namely, that inputString contains only the markup you posted (that is, this part is not enclosed in other tags), and that it contains only a single link (a).

(If these assumptions do not hold, you have to identify this particular div and the correct a tag some other way. For example, you could search for a div with the attribute class="itm hasOverlay lastrow" and, inside it, for an a with class="itm-link itm-drk trackingOnClick". In any case, you need to know more about the actual structure of the document from which this information is to be extracted.)
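The class-based search described above could be sketched roughly as follows. This is only an illustration under assumptions: the class name HrefExtractor and the helpers hasClass and extractHref are hypothetical, and it is assumed that the class names itm and itm-link are enough to identify the right div and link in the real page. It relies on Jericho's Source(CharSequence) constructor, getAllElements(String) and getAttributeValue(String).

```java
import java.util.Arrays;
import net.htmlparser.jericho.Element;
import net.htmlparser.jericho.HTMLElementName;
import net.htmlparser.jericho.Source;

public class HrefExtractor {

    // Hypothetical helper: true if the element's class attribute
    // contains the given class name as a whitespace-separated token.
    static boolean hasClass(Element element, String className) {
        String classes = element.getAttributeValue("class");
        return classes != null
                && Arrays.asList(classes.trim().split("\\s+")).contains(className);
    }

    // Returns the href of the first <a class="... itm-link ..."> found inside
    // a <div class="... itm ...">, or null if no such link exists.
    // Assumption: these two class names identify the wanted elements.
    static String extractHref(String html) {
        Source source = new Source(html);
        for (Element div : source.getAllElements(HTMLElementName.DIV)) {
            if (!hasClass(div, "itm")) {
                continue;
            }
            for (Element a : div.getAllElements(HTMLElementName.A)) {
                if (hasClass(a, "itm-link")) {
                    return a.getAttributeValue("href");
                }
            }
        }
        return null;
    }

    public static void main(String[] args) {
        // The markup from the question, wrapped in extra tags to show that
        // the search does not depend on the link being the first one.
        String html = "<html><body><a href=\"other.html\">other</a>"
                + "<div class=\"itm hasOverlay lastrow\">"
                + "<a class=\"itm-link itm-drk trackingOnClick\""
                + " href=\"league-sepatu-casual-geof-sl-lo-hitambiru-68166.html\"></a>"
                + "<div class=\"itm-overlay itm-group-mainbox-with-group\"></div>"
                + "</div></body></html>";
        System.out.println(extractHref(html));
    }
}
```

Matching on individual class tokens rather than the full attribute string is a design choice here: it keeps the lookup working if the page adds or reorders classes such as lastrow.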
