Are there JavaScript or Ruby versions of "HTML tidy"?


Question

Is there a library similar to HTML Tidy (http://tidy.sourceforge.net/) that is not OS-specific, that is, one that does not need to be compiled on each host? Basically, I just want to validate and clean up the HTML that users send me.

<p>hello</p></p><br>

should become

<p>hello</p>
<br/>

Something in JavaScript or Ruby would work for me. Thanks!

Recommended answer

In Ruby, you can parse the HTML with Nokogiri, which lets you check for parse errors, and then have it output the HTML again. Serializing the parsed document cleans up missing closing tags and similar problems. Notice that in the following HTML the title and p tags are not closed correctly, yet Nokogiri adds the closing tags in its output.

require 'nokogiri'

html = '<html><head><title>the title</head><body><p>a paragraph</body></html>'
doc = Nokogiri::HTML(html)
puts "Errors found" if doc.errors.any?
puts doc.to_html
# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN" "http://www.w3.org/TR/REC-html40/loose.dtd">
# >> <html>
# >> <head>
# >> <meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
# >> <title>the title</title>
# >> </head>
# >> <body><p>a paragraph</p></body>
# >> </html>

Alternatively, you can pipe the HTML to /usr/bin/tidy and let it do the dirty work:

require 'open3'

html = '<html><head><title>the title</head><body><p>a paragraph</body></html>'

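# The block form closes the pipes and waits for tidy to exit.
Open3.popen3('/usr/bin/tidy -qi') do |stdin, stdout, _stderr, _wait_thr|
  stdin.puts html
  stdin.close # signal end of input so tidy starts writing its result
  puts stdout.read
end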
# >> <!DOCTYPE html PUBLIC "-//W3C//DTD HTML 3.2//EN">
# >> 
# >> <html>
# >> <head>
# >>   <meta name="generator" content=
# >>   "HTML Tidy for Mac OS X (vers 31 October 2006 - Apple Inc. build 15.3.6), see www.w3.org">
# >> 
# >>   <title>the title</title>
# >> </head>
# >> 
# >> <body>
# >>   <p>a paragraph</p>
# >> </body>
# >> </html>
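
If you also want tidy's diagnostics for validation, one possible sketch uses Open3.capture3 to collect standard output, standard error, and the exit status in a single call. The helper name run_tidy and its command: parameter are assumptions made for illustration; the default path simply mirrors the /usr/bin/tidy used above and may differ on your system.

```ruby
require 'open3'

# Hypothetical helper, not from the original answer: feeds HTML to an external
# tidy binary and returns the cleaned markup together with its diagnostics.
# The command is a parameter because the location of tidy varies by host.
def run_tidy(html, command: ['/usr/bin/tidy', '-q', '-i'])
  stdout, stderr, status = Open3.capture3(*command, stdin_data: html)
  {
    html: stdout,                          # cleaned document
    messages: stderr.lines.map(&:chomp),   # warnings and errors, one per line
    exit_code: status.exitstatus           # tidy: 0 = OK, 1 = warnings, 2 = errors
  }
end

# Usage (requires tidy to be installed at the given path):
#   result = run_tidy('<html><head><title>the title</head><body><p>a paragraph</body></html>')
#   puts result[:html]
#   warn result[:messages].join("\n") unless result[:exit_code].zero?
```

Reading standard error in the same call also avoids the risk of the child process blocking on a full, unread stderr pipe.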
