Scala HTML parser object usage
Problem description
I am using the HTML parser to parse an HTML string:
import nu.validator.htmlparser.{sax, common}
import sax.HtmlParser
import common.XmlViolationPolicy
val source = Source.fromString(response)
val html = new models.HTML5Parser
val htmlObject = html.loadXML(source)
How do I pull values for specific elements in the object? I can get the child and the label using this:
val child = htmlObject.child(1).label
But I don't know how to get the content of the child. Also, I don't know how to iterate through the child objects.
Solution
It's unclear where your HTML5Parser class comes from, but I'm going to assume it's the one in this example (or something similar). In that case your htmlObject is just a scala.xml.Node. First, some setup:
val source = Source.fromString(
  "<html><head/><body><div class='main'><span>test</span></div></body></html>"
)
val htmlObject = html.loadXML(source)
Now you can do the following, for example:
scala> htmlObject.child(1).label
res0: String = body

scala> htmlObject.child(1).child(0).child(0).text
res1: String = test

scala> (htmlObject \\ "span").text
res2: String = test

scala> (htmlObject \ "body" \ "div" \ "span").text
res3: String = test

scala> (htmlObject \\ "div").head.attributes.asAttrMap
res4: Map[String,String] = Map(class -> main)
Etcetera.
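The question also asks how to iterate over the child objects, so here is a minimal sketch of one way to do it with plain scala.xml. It assumes, as above, that htmlObject is a scala.xml.Node. To stay self-contained it builds the tree with scala.xml.XML.loadString instead of the HTML5Parser class, which only works because the sample markup happens to be well-formed XML. The names IterateChildren, childElements and elementPaths are made up for this illustration.

```scala
import scala.xml.{Elem, Node, XML}

object IterateChildren {
  // Stand-in for html.loadXML(source). Assumption: the sample markup is
  // well-formed XML, so the stock scala-xml loader can read it directly.
  val htmlObject: Elem = XML.loadString(
    "<html><head/><body><div class='main'><span>test</span></div></body></html>"
  )

  // Direct children that are elements. Text nodes, comments and the like
  // are skipped, which matters when the markup contains whitespace.
  def childElements(node: Node): List[Elem] =
    node.child.collect { case e: Elem => e }.toList

  // Depth-first walk that returns a slash-separated path for every element.
  def elementPaths(node: Node, prefix: String = ""): List[String] = node match {
    case e: Elem =>
      val path = s"$prefix/${e.label}"
      path :: childElements(e).flatMap(c => elementPaths(c, path))
    case _ => Nil
  }

  def main(args: Array[String]): Unit = {
    // Iterate over the immediate children and print label plus text content.
    for (child <- childElements(htmlObject))
      println(s"${child.label}: '${child.text}'")

    // Visit every element in the tree.
    elementPaths(htmlObject).foreach(println)

    // Attribute lookup that yields None instead of throwing when the
    // element or the attribute is missing.
    val cls: Option[String] =
      (htmlObject \\ "div").headOption.flatMap(_.attributes.asAttrMap.get("class"))
    println(cls)
  }
}
```

With a real HTML parser the tree may contain extra whitespace text nodes or elements the parser inserted on its own, so filtering on Elem as in childElements tends to be more robust than indexing with child(1).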