Extract all URLs that start with http or https and end with html from a text file
Problem description
I would like to use the grep command to extract from a text file every link that starts with http:// (I am not sure whether the file also contains https:// links) and ends with .html.
The problem is that the file is very large and contains a lot of links.
I tried:
grep "/http:\/\/.*?\.html/" filename.txt > newFile.txt
but I get an empty file, and the same happens with this:
grep -Eo "(http|https)://[a-zA-Z0-9]./(html)" filename.txt > newFile.txt
Can someone help me?
Just to be sure we are on the same page: I want to extract all the links into a new file, one per line.
Thanks.
Best regards
Recommended answer
You can use:
grep -Eo "https?://\S+?\.html" filename.txt > newFile.txt
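This matches text that starts with http:// or https://, continues with non-whitespace characters, and ends with .html. The -o option prints only the matched part, so each link lands on its own line.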
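To check the behaviour on a small input before running it on the large file, a minimal sketch follows. The file names sample.txt and links.txt and the example URLs are made up for illustration. It also spells the non-whitespace class as [^[:space:]]+ rather than \S+?, on the assumption that grep -E does not treat +? as a lazy quantifier, so the match is greedy in either spelling.

```shell
# Hypothetical sample input; file names and URLs are invented for this sketch.
printf '%s\n' \
  'See http://example.com/index.html for details.' \
  'Secure: https://example.org/docs/page.html and http://example.com/image.png' \
  'no links here' > sample.txt

# -E: extended regular expressions; -o: print only the matched text, one match per line.
# [^[:space:]]+ means "one or more non-whitespace characters".
grep -Eo 'https?://[^[:space:]]+\.html' sample.txt > links.txt

# Show the extracted links.
cat links.txt
```

With this input, links.txt should hold the two .html links and skip the .png one. Because the match is greedy, two URLs glued together with no whitespace between them would come out as a single match. If that matters and your grep was built with PCRE support, grep -oP 'https?://\S+?\.html' is one option for lazy matching.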