Getting the jpg images from an HTML file
Question
I'm trying to use grep to get the full url addresses of jpg images in an HTML file. One problem is that there aren't many newlines in it, so when I use grep it gets the path, but also a lot of other stuff I'm not interested in. How can I just get the urls for the jpg images?
Recommended answer
One single sed command
sed -n '/<img/s/.*src="\([^"]*\)".*/\1/p' yourfile.html
Or, using ERE (extended regular expressions) to avoid the backslashes in the expression above:
sed -E -n '/<img/s/.*src="([^"]*)".*/\1/p' yourfile.html
One basic grep command (prints each img tag up to and including its src attribute, not the bare URL)
grep -o '<img[^>]*src="[^"]*"' yourfile.html
Two successive basic grep commands (the second keeps only the quoted attribute values, quotes included)
grep -o '<img[^>]*src="[^"]*"' yourfile.html | grep -o '"[^"]*"'
One single grep command using Perl-compatible regular expressions (PCRE)
grep -Po '<img[^>]*src="\K[^"]*(?=")' yourfile.html
Using ack as a grep-like replacement
sudo apt install ack
ack -o '<img[^>]*src="\K[^"]*(?=")' yourfile.html
Downloading the web page first, as suggested by s-hunter
curl -s example.com/a.html | sed -En '/<img/s/.*src="([^"]*)".*/\1/p'
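The commands above return every img src, while the question asks only for jpg images in a file with few newlines. A minimal sketch of one way to narrow the result, assuming the URLs end in .jpg or .jpeg (no query string) and that src values are double-quoted; sample.html and its contents are made up for illustration. It uses grep -o to split the tags onto separate lines first, because the sed commands print at most one URL per input line.

```shell
# Hypothetical one-line test file containing three img tags.
printf '%s' '<p><img class="a" src="http://example.com/a.jpg"><img src="/img/b.png"><img src="c.JPG" alt="x"></p>' > sample.html

# 1. grep -o puts each '<img ... src="..."' match on its own line.
# 2. sed strips everything except the src value.
# 3. grep -Ei keeps only values ending in .jpg or .jpeg, ignoring case.
urls=$(grep -o '<img[^>]*src="[^"]*"' sample.html \
  | sed 's/.*src="\([^"]*\)"$/\1/' \
  | grep -Ei '\.jpe?g$')

printf '%s\n' "$urls"
```

With this sample the pipeline prints http://example.com/a.jpg and c.JPG and skips the png. URLs that carry a query string after the extension would need a looser final pattern.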