Use SED to extract value of all input elements with a certain name
Problem description
How do I get the value attribute based on a search of some other attribute?
For example:
<body>
<input name="dummy" value="foo">
<input name="alpha" value="bar">
</body>
How do I get the value of the input element with the name "dummy"?
Since you're looking for a solution using bash and sed, I'm assuming you're looking for a Linux command line option.
Use the hxselect HTML parsing tool to extract the element; use sed to extract the value from the element
I did a Google search for "linux bash parse html tool" and came across this: https://unix.stackexchange.com/questions/6389/how-to-parse-hundred-html-source-code-files-in-shell
The accepted answer suggests using the hxselect tool from the html-xml-utils package, which extracts elements based on a CSS selector.
So after installing it (download, unzip, ./configure, make, make install), you can run this command with a CSS selector that matches the element you want:
hxselect "input[name='dummy']" < example.html
(Given that example.html contains your example html from the question.) This will return:
<input name="dummy" value="foo"/>
Almost there. We need to extract the value from that line:
hxselect "input[name='dummy']" < example.html | sed -n -e "s/^.*value=['\"]\(.*\)['\"].*/\1/p"
Which returns "foo".
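The question asks about all inputs with a given name, and as far as I know hxselect writes multiple matches back to back, so the line-oriented sed expression above would treat them as a single line. Here is a minimal sketch of one way around that, assuming the `-s` (separator) option of hxselect and the `hxnormalize -x` tool from the same html-xml-utils package are installed. The temporary file and the second "dummy" input are made up for illustration, and the sed pattern is a slightly stricter variant of the one above (it stops at the first closing quote).

```shell
#!/bin/sh
# Hypothetical sample file: the question's HTML plus a second "dummy" input.
tmp=$(mktemp)
cat > "$tmp" <<'EOF'
<body>
<input name="dummy" value="foo">
<input name="alpha" value="bar">
<input name="dummy" value="baz">
</body>
EOF

if command -v hxselect >/dev/null 2>&1 && command -v hxnormalize >/dev/null 2>&1; then
  # hxnormalize -x rewrites the input as well-formed XML (closing the <input> tags);
  # -s '\n' asks hxselect to end each matched element with a newline, one per line for sed.
  values=$(hxnormalize -x < "$tmp" \
    | hxselect -s '\n' "input[name='dummy']" \
    | sed -n "s/.*[[:space:]]value=['\"]\([^'\"]*\)['\"].*/\1/p")
  printf '%s\n' "$values"
else
  # Assumption: the tools may be missing; the sketch then does nothing.
  echo "html-xml-utils not installed; skipping" >&2
fi
rm -f "$tmp"
```

Each matching element should then produce one line of output, which is easy to feed into a `while read` loop or similar.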
Why you would / would not want to use this approach
- Using regex to parse out the attributes is complicated, and often the wrong way to go.
- The hxselect tool (in my other answer) is a pain to install.
- BUT, this approach accepts malformed HTML, which is what is argued for in this answer to the question linked above. By the way, that question has a very thorough discussion of the regex+HTML debate.
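For comparison, the regex-only route weighed in the list above could look something like the following. This is only a sketch under some loud assumptions: every `<input>` sits on a single line, attribute values are quoted, and a value never contains a quote character. The function name `extract_value` is made up for this example.

```shell
#!/bin/sh
# Hypothetical sample input, mirroring the question's example.
html='<body>
<input name="dummy" value="foo">
<input name="alpha" value="bar">
</body>'

# extract_value NAME: print the value attribute of each <input> line whose
# name attribute equals NAME. Works with either attribute order.
# Assumes one <input> per line and quoted attribute values.
extract_value() {
  sed -n "/<input[^>]*name=['\"]$1['\"]/ s/.*[[:space:]]value=['\"]\([^'\"]*\)['\"].*/\1/p"
}

result=$(printf '%s\n' "$html" | extract_value dummy)
echo "$result"
```

The address part (`/<input...name=.../`) picks the lines to work on, and the substitution keeps only what sits between the quotes after `value=`. No extra tools are needed, but it breaks as soon as an element spans several lines, which is exactly the kind of trade-off listed above.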