How much work should the constructor for an HTML parsing class do?


Question


    How much work is it reasonable for an object constructor to do? Should it simply initialize fields and not actually perform any operations on data, or is it okay to have it perform some analysis?

    Background: I was writing a class which is responsible for parsing an HTML page and returning various information based on the parsed information. The design of the class is such that the class's constructor does the parsing, throwing an exception if an error occurs. Once the instance is initialized, the parsed values are available without further processing via accessors. Something like:

    public class Parser {

        private int numOfWhatevers;
        private String otherValue;

        public Parser(final String html) throws ParsingException {
            /* Parsing logic that sets the private fields */
            /* and throws an exception if something is erroneous. */
        }

        public int getNumOfWhatevers() { return numOfWhatevers; }
        public String getOtherValue()  { return otherValue; }
    }
    

    After designing the class I started to wonder if this was correct OO practice. Should the parsing code be placed within a void parseHtml() method and the accessors only return valid values once this method is called? I feel as though my implementation is correct, but I can't help but feel that some OO purists might find it incorrect for some reason and that an implementation such as the following would be better:

    public class Parser {

        private final String html;
        private int numOfWhatevers;
        private String otherValue;

        public Parser(final String html) {
            /* Remember html for later parsing. */
            this.html = html;
        }

        public void parseHtml() throws ParsingException {
            /* Parsing logic that sets the private fields */
            /* and throws an exception if something is erroneous. */
        }

        public int getNumOfWhatevers() { return numOfWhatevers; }
        public String getOtherValue()  { return otherValue; }
    }
    

    Are there instances where initialization code, such as parsing information, should not occur within the constructor, or am I just being silly and second-guessing myself?

    What are the benefits/drawbacks of splitting the parsing from the constructor?

    Thoughts? Insights?

    Solution

    I normally follow one easy principle:

    Everything that is mandatory for the correct existence and behavior of the class instance should be passed to, and done in, the constructor.

    Every other activity is done by other methods.

    The constructor should never:

    • use other methods of the class with the purpose of using overriding behavior
    • act on its private attributes via methods

    This is because I learned the hard way that while you are in the constructor, the object is in an incoherent, intermediate state that is too dangerous to handle. Some of the resulting unexpected behavior comes from your own code, and some comes from the language architecture and compiler decisions. Never guess, stay safe, be minimal.
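A minimal Java sketch of that intermediate state, assuming two hypothetical classes (`BaseParser` and `StrictParser`, neither of which appears in the question). The base constructor calls an overridable method, so the override runs before the subclass's field initializers have executed:

```java
// Hypothetical classes, for illustration only.
class BaseParser {
    BaseParser() {
        // Risky: this dispatches to the subclass override while the
        // subclass part of the object is still uninitialized.
        init();
    }

    void init() { }
}

class StrictParser extends BaseParser {
    // Assigned only AFTER BaseParser's constructor has returned.
    private String mode = "strict";

    // No initializer, so the value written during construction survives.
    String modeSeenDuringConstruction;

    @Override
    void init() {
        // Runs inside BaseParser's constructor: 'mode' is still null here.
        modeSeenDuringConstruction = String.valueOf(mode);
    }

    String getMode() { return mode; }
}
```

After `new StrictParser()`, `getMode()` returns `"strict"`, but `modeSeenDuringConstruction` holds the text `"null"`: the override observed a field that had not been set yet.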

    In your case, I would use a Parser::parseHtml(file) method. The instantiation of the parser and the parsing are two different operations. When you instantiate a parser, the constructor puts it in a condition to perform its job (parsing). Then you use its method to perform the parsing. You then have two choices:

    1. Either you let the parser hold the results of the parsing and give clients an interface to retrieve the parsed information (e.g. Parser::getFooValue()). These methods return null if you haven't performed parsing yet, or if the parsing failed.
    2. Or your Parser::parseHtml() returns a ParsingResult instance containing what the Parser found.

    The second strategy grants you better granularity, as the Parser is now stateless, and the client needs to interact with the methods of the ParsingResult interface. The Parser interface remains sleek and simple. The internals of the Parser class will tend to follow the Builder pattern.
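The second strategy could look roughly like the sketch below. It assumes a hypothetical `ParsingException` class and placeholder parsing logic (counting `<li>` tags and extracting the `<title>` text); only the names `Parser`, `ParsingResult`, `parseHtml`, `getNumOfWhatevers` and `getOtherValue` come from the discussion above.

```java
// Hypothetical checked exception, named after the one in the question.
class ParsingException extends Exception {
    ParsingException(String message) { super(message); }
}

// Immutable value object holding what the parser found.
final class ParsingResult {
    private final int numOfWhatevers;
    private final String otherValue;

    ParsingResult(int numOfWhatevers, String otherValue) {
        this.numOfWhatevers = numOfWhatevers;
        this.otherValue = otherValue;
    }

    int getNumOfWhatevers() { return numOfWhatevers; }
    String getOtherValue() { return otherValue; }
}

// Stateless parser: construction is trivial, all work happens in parseHtml().
class Parser {
    ParsingResult parseHtml(String html) throws ParsingException {
        if (html == null || html.trim().isEmpty()) {
            throw new ParsingException("empty document");
        }
        // Placeholder logic: count <li> tags.
        int count = 0;
        for (int i = html.indexOf("<li>"); i >= 0; i = html.indexOf("<li>", i + 1)) {
            count++;
        }
        // Placeholder logic: take the text between <title> and </title>.
        int start = html.indexOf("<title>");
        int end = html.indexOf("</title>");
        String title = (start >= 0 && end > start)
                ? html.substring(start + "<title>".length(), end)
                : "";
        return new ParsingResult(count, title);
    }
}
```

Because the result lives in its own object, there is no "not parsed yet" state for the accessors to worry about: a client either holds a complete `ParsingResult` or has caught a `ParsingException`.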

    You commented: "I feel as though returning an instance of a parser that hasn't parsed anything (as you suggest) is a constructor that's lost its purpose. There's no use in initializing a parser without the intent of actually parsing the information. So if parsing is going to happen for sure, should we parse as early as possible and report any errors early, such as during the construction of the parser? I feel as though initializing a parser with invalid data should result in an error being thrown."

    Not really. If you return an instance of a Parser, of course it's going to parse. In Qt, when you instantiate a button, of course it's going to be shown. However, you still have to call the method QWidget::show() manually before anything is visible to the user.

    Any object in OOP has two concerns: initialization and operation (ignore finalization, it's not under discussion right now). If you keep these two operations together, you both risk trouble (having an incomplete object operating) and lose flexibility. There are plenty of reasons why you would perform intermediate setup of your object before calling parseHtml(). Example: suppose you want to configure your Parser to be strict (so that it fails if a given column in a table contains a string instead of an integer) or permissive. Or you want to register a listener object that is notified every time a parse starts or ends (think of a GUI progress bar). This is optional information, and if your architecture makes the constructor the übermethod that does everything, you end up with a huge list of optional parameters and conditions to handle in a method that is inherently a minefield.
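A minimal sketch of such intermediate setup, under stated assumptions: `TableParser`, `ParseListener`, `setStrict` and `addListener` are hypothetical names, and a comma-separated row of integers stands in for a real HTML table.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical checked exception, named after the one in the question.
class ParsingException extends Exception {
    ParsingException(String message) { super(message); }
}

// Hypothetical callback, notified when a parse has finished.
interface ParseListener {
    void parsingFinished(int cellsParsed);
}

class TableParser {
    private boolean strict = false; // permissive by default
    private final List<ParseListener> listeners = new ArrayList<>();

    // Optional configuration lives in small methods, not in constructor parameters.
    TableParser setStrict(boolean strict) { this.strict = strict; return this; }
    TableParser addListener(ParseListener listener) { listeners.add(listener); return this; }

    // Placeholder for real table parsing: reads comma-separated integer cells.
    List<Integer> parse(String row) throws ParsingException {
        List<Integer> values = new ArrayList<>();
        for (String cell : row.split(",")) {
            try {
                values.add(Integer.parseInt(cell.trim()));
            } catch (NumberFormatException e) {
                if (strict) {
                    throw new ParsingException("not an integer: " + cell.trim());
                }
                // Permissive mode: silently skip the offending cell.
            }
        }
        for (ParseListener listener : listeners) {
            listener.parsingFinished(values.size());
        }
        return values;
    }
}
```

A client that wants the defaults writes `new TableParser().parse(row)`; a client that needs strictness or progress reporting chains `setStrict(true)` or `addListener(...)` first. Neither has to pass a long list of optional constructor arguments.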

    "Caching should not be the responsibility of a parser. If data is to be cached, a separate cache class should be created to provide that functionality."

    On the contrary. If you know that you are going to use the parsing functionality on a lot of files, and there's a significant chance that the files are going to be accessed and parsed again later on, it is the internal responsibility of the Parser to perform smart caching of what it has already seen. The client is totally oblivious to whether this caching is performed or not. It still calls the parsing method and still obtains a result object, but it gets the answer much faster. I think there's no better demonstration of separation of concerns than this. You boost performance with absolutely no change in the contract interface or the whole software architecture.
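As a rough illustration, here is a hypothetical `CachingParser` that memoizes results by input text. The `countListItems` method and the `getActualParses` counter are assumptions made for this sketch, and the counter exists only so the effect of the cache can be observed.

```java
import java.util.HashMap;
import java.util.Map;

class CachingParser {
    // Internal detail: clients never see or manage this map.
    private final Map<String, Integer> cache = new HashMap<>();
    private int actualParses = 0; // demonstration only

    // Same contract with or without the cache: text in, result out.
    int countListItems(String html) {
        Integer cached = cache.get(html);
        if (cached != null) {
            return cached; // already seen: skip the expensive work
        }
        actualParses++;
        int count = 0;
        for (int i = html.indexOf("<li>"); i >= 0; i = html.indexOf("<li>", i + 1)) {
            count++;
        }
        cache.put(html, count);
        return count;
    }

    int getActualParses() { return actualParses; }
}
```

Calling `countListItems` twice with the same text returns the same value both times, while the underlying parsing runs only once. Removing the cache would change nothing in the calling code.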

    However, note that I am not advocating that you should never use a constructor call to perform parsing. I am just claiming that it's potentially dangerous and that you lose flexibility. There are plenty of examples out there where the constructor is at the center of the actual activity of the object, but there are also plenty of examples of the opposite. Example (although biased, since it arises from C style): in Python, I would consider something like this very weird

    f = file()
    f.setReadOnly()
    f.open(filename)
    

    instead of the actual

    f = file(filename,"r")
    

    But I am sure there are IO access libraries that use the first approach (with the second offered as syntactic sugar).

    Edit: finally, remember that while it's easy and backward-compatible to add a constructor "shortcut" in the future, it is not possible to remove this functionality later if you find it dangerous or problematic. Additions to the interface are much easier than removals, for obvious reasons. Sugary behavior must be weighed against the future support you will have to provide for that behavior.
