Get JSON string from within JavaScript on an HTML page using a shell script
Question
There's valid JSON inside a JavaScript block on an HTML page that I want to parse with a shell script.
First of all, I would like to get the entire JSON string, from { to }, and then I can parse it with jq, for example.
This is basically what my HTML looks like:
<!DOCTYPE html>
<html>
<head>
<title>foobar</title>
</head>
<body>
<script type="text/javascript" src="resources/script.js" charset="UTF-8"></script>
<script type="text/javascript" src="resources/resources.js" charset="UTF-8"></script>
<script type="text/javascript">
if( foo.foobar.getInstance().isbar() )
{
foo.bar.Processor.message( {"head":{"url":"anotherfoo;barid=347EDAFA2B136D7825745B0A490DE32"},...});
}
else
{....}
</script>
</body>
</html>
In the end I want to get the ID that's at "barid=...".
I was playing around with grep foo.bar.Processor.message followed by sed or cut, but I think there are better ways to do it.
If you could point me in the right direction, that'd be great!
Thank you!
Recommended answer
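Usually it is not recommended to use Unix command-line tools for parsing HTML. But if you know your marker string, foo.bar.Processor.message, then you may use this sed + jq solution. The sed step prints only the line containing the marker and keeps the text between the opening parenthesis and the first closing parenthesis, so it assumes the call sits on one line and the JSON itself contains no ")":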
sed -n 's/foo\.bar\.Processor\.message(\([^)]*\).*/\1/p' file.html |
jq -r '.head.url | split(";")[1] | split("=")[1]'
347EDAFA2B136D7825745B0A490DE32
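To make the sed capture step easier to follow, here is a minimal, self-contained sketch that runs it against a temporary sample page. The helper name extract_message_json and the sample file are hypothetical, and the JSON payload is shortened to a complete object because the original snippet elides part of it. Unlike the command above, this variant also consumes any text before the marker and the space after the parenthesis, so the output is the bare JSON text.

```shell
#!/bin/sh
# Hypothetical sample page; the payload is cut down to a complete JSON object
# (the page in the question elides the rest of it with "...").
sample_html=$(mktemp)
cat > "$sample_html" <<'EOF'
<script type="text/javascript">
if( foo.foobar.getInstance().isbar() )
{
foo.bar.Processor.message( {"head":{"url":"anotherfoo;barid=347EDAFA2B136D7825745B0A490DE32"}});
}
</script>
EOF

# extract_message_json FILE (hypothetical helper):
# print the argument passed to foo.bar.Processor.message(...).
# Assumes the call is on a single line and the JSON contains no ")".
extract_message_json() {
    sed -n 's/.*foo\.bar\.Processor\.message( *\([^)]*\)).*/\1/p' "$1"
}

json=$(extract_message_json "$sample_html")
printf '%s\n' "$json"
rm -f "$sample_html"
```

The printed string can then be piped to jq exactly as in the answer's command.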
In the absence of jq, you may use this sed + GNU grep solution (the -P option and \K require GNU grep built with PCRE support):
sed -n 's/foo\.bar\.Processor\.message(\([^)]*\).*/\1/p' file.html |
grep -oP ';barid=\K\w+'
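If neither jq nor GNU grep is available, the ID could also be cut out with POSIX parameter expansion alone. This is only a sketch under assumptions: the input string stands in for the output of the sed step, shortened to a complete JSON object, and extract_barid is a hypothetical helper name.

```shell
#!/bin/sh
# Assumed input: the JSON text already extracted by the sed step
# (shortened here to a complete object for illustration).
json='{"head":{"url":"anotherfoo;barid=347EDAFA2B136D7825745B0A490DE32"}}'

# extract_barid STRING (hypothetical helper):
# print the run of word characters that follows ";barid=".
extract_barid() {
    rest=${1#*';barid='}                 # drop everything up to and including ";barid="
    [ "$rest" != "$1" ] || return 1      # marker not found: nothing was removed
    printf '%s\n' "${rest%%[!A-Za-z0-9_]*}"   # cut at the first non-word character
}

barid=$(extract_barid "$json")
printf '%s\n' "$barid"
```

This treats the JSON as plain text, just like the grep variant, so it only suits input whose shape is known in advance.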