HXT: Surprising behavior when reading and writing HTML to String in pure code


Problem description

I want to read HTML from a String, process it and return the changed document as a String using HXT. As this operation does not require IO, I would rather execute the Arrow with runLA than with runX.

The code looks like this (omitting the processing for simplicity):

runLA (hread >>> writeDocumentToString [withOutputHTML, withIndent yes]) html

However, the surrounding html tag is missing in the result:

["\n  <head>\n    <title>Bogus</title>\n  </head>\n  <body>\n        Some trivial bogus text.\n    </body>\n",""]

When I use runX instead like this:

runX (readString [] html >>> writeDocumentToString [withOutputHTML, withIndent yes])

I get the expected result:

["<html>\n  <head>\n    <title>Bogus</title>\n  </head>\n  <body>\n        Some trivial bogus text.\n    </body>\n</html>\n"]

Why is that, and how can I fix it?

Solution

If you look at the XmlTrees for both, you'll see that readString adds a top-level "/" element. For the non-IO runLA version:

> putStr . formatTree show . head $ runLA xread html
---XTag "html" []
   |
   +---XText "\n  "
   |
   +---XTag "head" []
   ...

And with runX:

> putStr . formatTree show . head =<< runX (readString [] html)
---XTag "/" [NTree (XAttr "transfer-Status") [NTree (XText "200")...
   |
   +---XTag "html" []
       |
       +---XText "\n  "
       |
       +---XTag "head" []
       ...

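writeDocumentToString uses getChildren to strip off this root element. The tree produced in the runLA version has no "/" root, so the html element itself is treated as the root and stripped, and only its children reach the output.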
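
The effect can be reproduced in isolation. The following is only an illustrative sketch: it uses xshow as a stand-in serialiser and does not reproduce the internals of writeDocumentToString. The sample string and the names topName, whole and childrenOnly are made up for this example.

```haskell
module Main (main) where

import Text.XML.HXT.Core

-- Hypothetical, well-formed sample input (not the document from the question).
sample :: String
sample = "<html><body>hi</body></html>"

-- Name of the top node that xread hands to the rest of the pipeline.
topName :: [String]
topName = runLA (xread >>> getName) sample

-- Serialise the parsed node itself.
whole :: [String]
whole = runLA (xread >>> xshow this) sample

-- Serialise only the children of the parsed node. Because the parsed node
-- is the html element, this is where the outer tag gets lost.
childrenOnly :: [String]
childrenOnly = runLA (xread >>> xshow getChildren) sample

main :: IO ()
main = do
  print topName
  mapM_ putStrLn whole
  mapM_ putStrLn childrenOnly
```

Comparing the last two outputs shows the same loss of the outer tag as in the question, which is consistent with the explanation that the serialiser expects an extra root node above html.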

One easy way around this is to use something like selem to wrap the output of xread in a similar root element, in order to make it look like the kind of input writeDocumentToString expects:

> runLA (selem "/" [xread] >>> writeDocumentToString [withOutputHTML, withIndent yes]) html
["<html>\n  <head>\n    <title>Bogus</title>\n  </head>\n  <body>\n        Some trivial bogus text.\n    </body>\n</html>\n"]

This produces the desired output.
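
Put together as a small module, the workaround might look like the sketch below. The processing step shout, the function name transformHtml and the sample input are hypothetical and only stand in for whatever transformation is actually needed; the parse, wrap and write steps are the ones from the answer.

```haskell
module Main (main) where

import Data.Char (toUpper)
import Text.XML.HXT.Core

-- Hypothetical processing step: upper-case every text node.
-- Replace this with the real transformation.
shout :: LA XmlTree XmlTree
shout = processTopDown (changeText (map toUpper) `when` isText)

-- Pure String -> [String] pipeline, run with runLA instead of runX.
transformHtml :: String -> [String]
transformHtml =
  runLA $
    selem "/" [xread]   -- synthetic "/" root, imitating the tree readString builds
      >>> shout
      >>> writeDocumentToString [withOutputHTML, withIndent yes]

-- Hypothetical sample document.
sampleHtml :: String
sampleHtml =
  "<html><head><title>Bogus</title></head>"
    ++ "<body>Some trivial bogus text.</body></html>"

main :: IO ()
main = mapM_ putStr (transformHtml sampleHtml)
```

HXT also provides a root arrow for building a document root node, so root [] [xread] may be usable in place of selem "/" [xread]; that variant was not part of the original answer and should be checked against the installed HXT version.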
