Gather info from XHTML in Java: parser + visitor?

Question

I have to write a piece of code that loads a remote web page, searches for the links, visits those pages and gathers some info from certain tags...

How would you do this? Is the visitor pattern of any help here? If so, how could I use it?

Thanks

Recommended answer

Some comments/suggestions

  • I'm not sure the visitor pattern is a good fit here. The typical scenario for the visitor pattern is when the algorithm of an operation differs depending on the type of object it is applied to.
  • The crude way to solve that is to embed the algorithm in the concerned object itself, but this amounts to mixing data and operations (against the spirit of Separation of Concerns).
  • The visitor pattern helps by separating the algorithm from the data it is applied to.
  • Please check out an example to better understand the visitor pattern.
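As a rough illustration of the separation described above, here is a minimal Java sketch of the visitor pattern. The element types `Heading` and `Link` and the `SummaryVisitor` are hypothetical, invented only for this example. Each `accept` call routes to the `visit` overload for the concrete type (double dispatch), so the per-type algorithm lives in the visitor rather than in the data classes.

```java
import java.util.List;

// Hypothetical element types and visitor, used only to illustrate double dispatch.
interface NodeVisitor {
    void visit(Heading heading);
    void visit(Link link);
}

interface Node {
    void accept(NodeVisitor visitor);
}

class Heading implements Node {
    final String text;
    Heading(String text) { this.text = text; }
    // "this" is statically a Heading, so visit(Heading) is chosen
    public void accept(NodeVisitor visitor) { visitor.visit(this); }
}

class Link implements Node {
    final String href;
    Link(String href) { this.href = href; }
    // "this" is statically a Link, so visit(Link) is chosen
    public void accept(NodeVisitor visitor) { visitor.visit(this); }
}

// The algorithm lives here, separate from the data; each node type has its own branch.
class SummaryVisitor implements NodeVisitor {
    final StringBuilder out = new StringBuilder();
    public void visit(Heading heading) { out.append("H:").append(heading.text).append(';'); }
    public void visit(Link link) { out.append("A:").append(link.href).append(';'); }
}

class VisitorDemo {
    public static void main(String[] args) {
        List<Node> nodes = List.of(new Heading("Intro"), new Link("/next"));
        SummaryVisitor visitor = new SummaryVisitor();
        for (Node node : nodes) {
            node.accept(visitor);
        }
        System.out.println(visitor.out); // prints H:Intro;A:/next;
    }
}
```

Adding another operation means writing another `NodeVisitor` implementation without touching `Heading` or `Link`; that is the point of the pattern, and it only pays off when the operation really differs per type.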

In your case

  • The objects are the web pages and links; the operations are visit, parse and extract information.
  • The same set of operations is applied to all web pages and links.
  • So the algorithm does not change from one web page or link to another, and hence the visitor pattern is not suitable.
  • Technically you can still use the visitor pattern, but that's not what it is for.

As for your problem:

  • I don't think it's a very complicated design problem. Some patterns might seem to solve it, like the Command pattern (commands: extractLinkFromPage, visitLinkAndParseTags), but IMO that would be overkill for this simple problem.
  • I would suggest the simple approach of hosting the logic in a utility class and using it from your calling program:
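 import java.util.ArrayList;
 import java.util.List;

 class WebUtility {
     public List<String> parseLinks(String remotePageAddress) {
         List<String> links = new ArrayList<>();
         // Load the page at remotePageAddress, parse it and add each link found
         return links;
     }
     public TagInfo extractTageInfo(String pageURL) {
         TagInfo info = new TagInfo(); // TagInfo: a POJO holding the collected data
         // Load the page at pageURL and fill info from the tags of interest
         return info;
     }
 }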

Here TagInfo is a POJO whose fields depend on your requirements.
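A minimal sketch of such a POJO, assuming you only need the page address and a list of collected tag values. The field names `pageUrl` and `values` are hypothetical placeholders; replace them with whatever the tags you scrape actually contain.

```java
import java.util.ArrayList;
import java.util.Collections;
import java.util.List;

// Hypothetical POJO: pageUrl and values are placeholder fields, not taken from the question.
class TagInfo {
    private String pageUrl;
    private final List<String> values = new ArrayList<>();

    public String getPageUrl() { return pageUrl; }
    public void setPageUrl(String pageUrl) { this.pageUrl = pageUrl; }

    // Read-only view so callers cannot modify the list behind the POJO's back
    public List<String> getValues() { return Collections.unmodifiableList(values); }
    public void addValue(String value) { values.add(value); }

    @Override
    public String toString() {
        return "TagInfo{pageUrl=" + pageUrl + ", values=" + values + "}";
    }
}
```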

This class is stateless and can be used as a singleton. Optionally, you can make the constructor private and the methods static.
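Here is a sketch of that static variant, assuming the pages really are well-formed XHTML (as the question's title suggests) so that the JDK's built-in DOM parser can read them; real-world HTML is often not well-formed and would need a lenient HTML parser instead. Two deliberate deviations from the class above: `parseLinks` takes the already-downloaded page text rather than its address, and `extractTagText` is a hypothetical helper that returns plain strings instead of a `TagInfo`. Downloading the page is left out.

```java
import java.io.IOException;
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilderFactory;
import javax.xml.parsers.ParserConfigurationException;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;
import org.xml.sax.SAXException;

// Static, stateless utility: private constructor, static methods only.
// Assumes the input is well-formed XHTML.
final class WebUtility {
    private WebUtility() {} // prevent instantiation

    // Returns the href of every <a> element that has one, in document order.
    static List<String> parseLinks(String xhtml)
            throws ParserConfigurationException, SAXException, IOException {
        List<String> links = new ArrayList<>();
        NodeList anchors = parse(xhtml).getElementsByTagName("a");
        for (int i = 0; i < anchors.getLength(); i++) {
            Element anchor = (Element) anchors.item(i);
            if (anchor.hasAttribute("href")) {
                links.add(anchor.getAttribute("href"));
            }
        }
        return links;
    }

    // Hypothetical helper: trimmed text content of every element with the given tag name.
    static List<String> extractTagText(String xhtml, String tagName)
            throws ParserConfigurationException, SAXException, IOException {
        List<String> texts = new ArrayList<>();
        NodeList elements = parse(xhtml).getElementsByTagName(tagName);
        for (int i = 0; i < elements.getLength(); i++) {
            texts.add(elements.item(i).getTextContent().trim());
        }
        return texts;
    }

    private static Document parse(String xhtml)
            throws ParserConfigurationException, SAXException, IOException {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Do not download the external XHTML DTD named in a DOCTYPE (JDK Xerces feature).
        factory.setFeature("http://apache.org/xml/features/nonvalidating/load-external-dtd", false);
        return factory.newDocumentBuilder().parse(new InputSource(new StringReader(xhtml)));
    }
}
```

The factory is left non-namespace-aware (the default), so `getElementsByTagName("a")` matches on the plain tag name even when the page declares the XHTML default namespace.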

Once you have this, you can call parseLinks to get the links, then loop through the list and call extractTageInfo on each link to get its tag information.
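The calling loop might look roughly like the sketch below. `LinkCrawler` and `crawl` are hypothetical names, and the two steps are passed in as functions standing in for `parseLinks` and `extractTageInfo`, so the loop can be shown without network access. It also handles two details the loop usually needs: relative hrefs are resolved against the start page with `java.net.URI.resolve`, and duplicate or non-HTTP links (such as `mailto:`) are skipped.

```java
import java.net.URI;
import java.util.LinkedHashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;
import java.util.function.Function;

// Hypothetical helper: collect the links of one page, then visit each distinct link once.
final class LinkCrawler {
    private LinkCrawler() {} // stateless utility, no instances

    static <T> Map<String, T> crawl(String startUrl,
                                    Function<String, List<String>> parseLinks,
                                    Function<String, T> extractInfo) {
        URI base = URI.create(startUrl);
        Set<String> targets = new LinkedHashSet<>(); // keeps first-seen order, drops duplicates
        for (String href : parseLinks.apply(startUrl)) {
            try {
                URI target = base.resolve(href.trim()); // relative href -> absolute URL
                String scheme = target.getScheme();
                if ("http".equals(scheme) || "https".equals(scheme)) { // skip mailto:, javascript:, ...
                    targets.add(target.toString());
                }
            } catch (IllegalArgumentException malformedHref) {
                // ignore hrefs that are not valid URIs
            }
        }
        Map<String, T> results = new LinkedHashMap<>();
        for (String url : targets) {
            results.put(url, extractInfo.apply(url)); // one extraction per distinct link
        }
        return results;
    }
}
```

In a real program, the first function would download the start page (for example with `java.net.http.HttpClient`, available since Java 11) and parse its links, and the second would download each linked page and build a `TagInfo` from it.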
