Extract email addresses from a website using scripts
Question
Given a website, I wonder what is the best procedure, programmatically and/or using scripts, to extract all email addresses that are present in plain text on each page, in the form XXXX@YYYYY.ZZZZ, from that link and all sites underneath it, recursively or down to some fixed depth.
Answer
Using shell programming, you can achieve your goal with two programs piped together:
- wget: fetches all the pages
- grep: filters the output and gives you only the emails
An example:
wget -q -r -l 5 -O - http://somesite.com/ | grep -E -o "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b"
wget, in quiet mode (-q), gets all pages recursively (-r) with a maximum depth level of 5 (-l 5) from somesite.com and prints everything to stdout (-O -).
grep uses an extended regular expression (-E) and prints only the matching email addresses (-o).
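You can try the grep stage on its own, without any network access. This is a minimal sketch using made-up sample text; the addresses shown are placeholders, not from the original answer:

```shell
# Feed sample text to the same regex used in the pipeline above;
# -E enables extended regexes, -o prints only the matched parts.
printf 'Contact: alice@example.com or bob.smith@mail.example.org\nNo address here\n' \
  | grep -E -o "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b"
```

Only the two addresses are printed, one per line; the line without an address produces no output.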
All emails will be printed to standard output, and you can write them to a file by appending > somefile.txt to the command.
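Crawled pages often repeat the same address many times, so a practical variant (a sketch, not part of the original answer) pipes through sort -u to deduplicate before writing the file; somesite.com and emails.txt are placeholder names:

```shell
# Same pipeline as above, with duplicates removed by sort -u
# before the results are written to emails.txt.
wget -q -r -l 5 -O - http://somesite.com/ \
  | grep -E -o "\b[a-zA-Z0-9.-]+@[a-zA-Z0-9.-]+\.[a-zA-Z0-9.-]+\b" \
  | sort -u > emails.txt
```

sort -u sorts the lines and keeps one copy of each, so emails.txt ends up with each address exactly once.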
Read the man pages of wget and grep for more documentation.
This example was tested with GNU bash version 4.2.37(1)-release, GNU grep 2.12 and GNU Wget 1.13.4.