Is there something like a "CSS selector" or XPath grep?
Question
I need to find all places in a bunch of HTML files that lie in the following structure (CSS):
div.a ul.b
or XPath:
//div[@class="a"]//ul[@class="b"]
grep doesn't help me here. Is there a command-line tool that returns all files (and optionally all places therein) that match this criterion? I.e., one that returns file names if the file matches a certain HTML or XML structure.
Answer
Try this:
- Install HTML-XML-utils (http://www.w3.org/Tools/HTML-XML-utils/):
  - Ubuntu: aptitude install html-xml-utils
  - macOS: brew install html-xml-utils
- Run:
hxnormalize -l 240 -x filename.html | hxselect -s $'\n' -c "label.black"
where "label.black" is the CSS selector that identifies the HTML elements to extract. To avoid retyping this pipeline, write a helper script named cssgrep:
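#!/bin/bash
# cssgrep FILE SELECTOR: print the content of every element in FILE that
# matches the CSS SELECTOR, one match per line.
# hxnormalize -x turns the HTML into well-formed XML for hxselect;
# parser warnings are discarded.
hxnormalize -l 240 -x "$1" 2>/dev/null | hxselect -s $'\n' -c "$2"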
Then you can run:
cssgrep filename.html "label.black"
This prints the content of every HTML label element with the class black, one match per line.
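cssgrep works on one file at a time, while the question asks for the names of all matching files. A minimal sketch, assuming bash and html-xml-utils are installed: the hypothetical function cssgrep_files below walks a directory with find and prints each *.html file that contains at least one element matching the selector, similar to grep -rl. It omits -c so that a match with empty content (for example an empty ul.b) still counts. The function name and the *.html filter are illustrative assumptions.

```shell
#!/bin/bash
# cssgrep_files (hypothetical helper): list HTML files that contain at least
# one element matching a CSS selector, similar to `grep -rl`.
# Usage: cssgrep_files SELECTOR [DIRECTORY]   (DIRECTORY defaults to .)
# Assumes hxnormalize and hxselect (html-xml-utils) are on the PATH.
cssgrep_files() {
  local selector=$1 dir=${2:-.} file matches
  if [ -z "$selector" ]; then
    echo "usage: cssgrep_files SELECTOR [DIRECTORY]" >&2
    return 2
  fi
  # -print0 with read -d '' keeps file names containing spaces intact.
  find "$dir" -type f -name '*.html' -print0 |
    while IFS= read -r -d '' file; do
      # Same pipeline as cssgrep, but without -c: whole elements are
      # printed, so even a match with empty content yields output.
      matches=$(hxnormalize -l 240 -x "$file" 2>/dev/null |
        hxselect -s $'\n' "$selector" 2>/dev/null)
      if [ -n "$matches" ]; then
        printf '%s\n' "$file"
      fi
    done
}
```

For example, after sourcing the file that defines it: cssgrep_files 'div.a ul.b' ./site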
The -l 240 argument is important to avoid line breaks inside the extracted text. hxnormalize re-wraps its output at a maximum line length (72 columns by default), and -l raises that limit. For example, if
<label class="black">Text to
extract</label>
is the input, then -l 240 reformats the HTML to <label class="black">Text to extract</label>, wrapping only lines longer than 240 columns, which simplifies parsing. Larger values such as 1024 also work.
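Because -l 240 keeps the text of a simple inline match on a single line (as long as it fits within 240 columns), the output can be post-processed line by line. As a sketch of grep-style output, assuming bash and html-xml-utils, the hypothetical function cssgrep_lines below prefixes every match with its file name, which also covers the "all places therein" part of the question. The name and the FILE:CONTENT output format are assumptions, not part of html-xml-utils.

```shell
#!/bin/bash
# cssgrep_lines (hypothetical helper): grep-style output for CSS selectors.
# Prints "FILE:CONTENT" for every element in each FILE that matches SELECTOR.
# Usage: cssgrep_lines SELECTOR FILE...
# Assumes hxnormalize and hxselect (html-xml-utils) are installed.
cssgrep_lines() {
  local selector=$1 file line
  shift
  for file in "$@"; do
    # -l 240 keeps short match text on one line, so each output line
    # normally corresponds to one match and can be prefixed safely.
    hxnormalize -l 240 -x "$file" 2>/dev/null |
      hxselect -s $'\n' -c "$selector" 2>/dev/null |
      while IFS= read -r line || [ -n "$line" ]; do
        printf '%s:%s\n' "$file" "$line"
      done
  done
}
```

For example: cssgrep_lines 'label.black' *.html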
See also:
- https://superuser.com/a/529024/9067 - similar question
- https://gist.github.com/Boldewyn/4473790 - wrapper script