Extract all URLs that start with http or https and end with html from a text file

Problem description

I would like to use grep to extract every link that starts with http:// (I'm not sure whether the file also contains https:// links) and ends with .html from a text file.

The problem is that the file is very large and contains a lot of links...

I tried:

grep "/http:\/\/.*?\.html/"  filename.txt > newFile.txt

but I get an empty file, just as I do with this:

grep -Eo "(http|https)://[a-zA-Z0-9]./(html)" filename.txt > newFile.txt

Can someone help me?

Just to be sure we are on the same page: I want to extract all the links into a new file, one per line.

Thanks.

Best regards

Recommended answer

You can use:

grep -Eo "https?://\S+?\.html" filename.txt > newFile.txt

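This matches http:// or https://, then a run of non-whitespace characters, up to and including .html. The -o option prints only the matched text, one match per line.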
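To try the approach on something small first, here is a minimal sketch. The file names sample.txt and links.txt and the example URLs are made up for illustration; they are not from the question. It also spells "non-whitespace" as the POSIX class [^[:space:]] instead of \S, on the assumption that this is more portable across grep implementations.

```shell
# Build a tiny sample file (hypothetical contents, for illustration only).
printf '%s\n' \
  'See http://example.com/a.html and https://example.org/dir/b.html for details.' \
  'Not a match: ftp://example.com/c.html or http://example.com/page.php' \
  > sample.txt

# -E: extended regular expressions; -o: print only the matched text,
# one match per output line.
# [^[:space:]]+ means "one or more non-whitespace characters".
grep -Eo 'https?://[^[:space:]]+\.html' sample.txt > links.txt

# Show the extracted links, one per line.
cat links.txt
```

One caveat: POSIX extended regular expressions have no lazy quantifiers, so the ? after \S+ in the command above most likely does not make the match non-greedy. That is harmless when every link is delimited by whitespace, but two links glued together with no space between them would come out as a single match. If your grep is GNU grep built with PCRE support, grep -Po 'https?://\S+?\.html' should give a genuinely lazy match.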
