How to parse HTML with C++/Qt?

Problem description

How can I parse the following HTML?

<body>
<span style="font-size:11px">12345</span>
<a>Hello</a>
</body>

I would like to retrieve the data "12345" from the "span" with style="font-size:11px" on www.testtest.com, but I want only that data and nothing else.

How can I do this?

Recommended answer

EDIT: Regarding Qt 5.6:

With 5.6, Qt WebKit and Qt Quick 1 will no longer be supported and are dropped from the release. The source code for these modules will still be available.

So, as of Qt 5.6, QtWebKit is no longer available unless you are willing to compile it from source. If you are using a Qt release older than 5.6, or are willing to compile QtWebKit yourself, this might be helpful; otherwise this answer is no longer valid.

It is hard to say exactly what needs to be done, because your description of the use case is incomplete. However, there are two ways to proceed.

The first is to use QtWebKit. If you already need any other functionality from that module, using it will not introduce any further dependencies, and it will be the most convenient option for you.

You need to get a QWebElement: https://doc.qt.io/archives/qt-5.5/qwebelement.html

That comes from finding the first "span" element in your HTML:

https://doc.qt.io/archives/qt-5.5/qwebframe.html#findFirstElement

Then you can get the text of that element with the corresponding QWebElement method(s). For instance, you can use this one to get an attribute value:

https://doc.qt.io/archives/qt-5.5/qwebelement.html#attribute

... but you can also request the attribute names, among other things, as you can see in the documentation.

This is how you get the 12345 value:

https://doc.qt.io/archives/qt-5.5/qwebelement.html#toPlainText

The second is to use an XML parser. If your software does not otherwise need WebKit, and the HTML data reaches you by some route other than being loaded directly from the web (for which you would need QtWebKit), then you are better off using the XML parser available in QtCore. That said, even if you have no other dependency on QtWebKit, the additional dependency may not cause any problems in your use case; it is hard to tell from your description. The XML parser will certainly be less convenient than the WebKit-based solution, which is designed for HTML, though not by much.

What you should avoid is QtXmlPatterns. It is unmaintained as of this writing, and it would add a dependency to your code either way.
