Find and replace URLs in Postfix files - Linux/Ubuntu

Question

I want to monitor a specific folder. Every new file in this folder should be scanned for URLs, and each URL should be edited if its domain is not in a defined whitelist.

Example:

blabla http://www.black.com/green/yellow.html blabla
sdfsdfsdfsdf http://www.white.com/red.html

Whitelist:

http://www.white.com

Result:

blabla httx://www.black.com/green/yellow.html blabla
sdfsdfsdfsdf http://www.white.com/red.html

What I have tried so far is iwatch with this XML configuration:

<?xml version="1.0" ?>
<!DOCTYPE config SYSTEM "/etc/iwatch/iwatch.dtd" >
<config>
  <guard email="root@localhost" name="IWatch"/>
  <watchlist>
    <title>URL_Filter</title>
    <contactpoint email="admin@test.com" name="Administrator"/>
    <path type="single" syslog="on" alert="off" events="create" exec="sed -i 's/http/httx/g' %f">/var/test</path>
  </watchlist>
</config>

So with iwatch I can watch the folder "/var/test" for new files, and with the sed command I can replace every "http" with "httx". But I have no idea how to add a whitelist so that some URLs are not replaced...
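A minimal way to see the current behaviour outside iwatch is to wrap the sed expression from the config in a helper (the name `defang_all` is hypothetical) and feed it the example above. Every URL is rewritten, the whitelisted one included, which is exactly the missing piece.

```shell
#!/usr/bin/env bash
# defang_all (hypothetical name): reads text on stdin and replaces every
# "http" with "httx", with no notion of a whitelist.
defang_all() {
  sed 's/http/httx/g'
}

# With the example from the question, both URLs are rewritten,
# including the whitelisted www.white.com:
#   blabla httx://www.black.com/green/yellow.html blabla
#   sdfsdfsdfsdf httx://www.white.com/red.html
printf '%s\n' \
  'blabla http://www.black.com/green/yellow.html blabla' \
  'sdfsdfsdfsdf http://www.white.com/red.html' | defang_all
```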

--- edit ---
Additional information: I want to edit all incoming Postfix mails so that they contain no clickable links, except for links to domains on the whitelist. The reason is to protect against phishing mails.

Return-Path: <example@gmail.com>
X-Original-To: example@test.de
Delivered-To: example@test.de
Received: from mail-lf0-x236.google.com (mail-lf0-x236.google.com [IPv6:2a00:1450:4010:c07::236])
        by xxxxxxx.hosteurope.de (Postfix) with ESMTPS id D255223CB59
        for <example@test.de>; Mon, 11 Apr 2016 14:44:10 +0200 (CEST)
Received: by mail-lf0-x236.google.com with SMTP id c126so154788483lfb.2
        for <example@test.de>; Mon, 11 Apr 2016 05:39:20 -0700 (PDT)
DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=gmail.com; s=20120113;
        h=mime-version:date:message-id:subject:from:to;
        bh=WwH+NIkCWDEoIkwbeCI4pf0jP0ya/ctbQ81pUsA4G7s=;
        b=ZS3Uo/cpVGNw3k38Js2+/DxVda0y2136oy4D4hsR0G25x2UjhyVU/yUcPl6qEdxt8i
         CQXZHQbaf8pzCdDaSq4VL9RC/sIgZy3PQzj6Cyrp3WTi6SMmQ65NwNBWLVGnpPcuzNW1
         IGC5N3rjj96ndYUAxia/tTcBX7ajS3Tw9Mc8yIaO13hSXMUCrTDIFZNzHR1ib7tLDpmX
         6EVyFhquhIfJVOhcuPgWUUxHly/FmZ++ucoHR0Yozj+dc1GJ6/ZYzUAPdGICelDY7ieG
         nvA7KH6+v6/zoWlbfkO9BmGzAPs6M4LGHilOjpMf/09Z2oMiV/WRDxe0WrCebQptpm2c
         xHPg==
X-Google-DKIM-Signature: v=1; a=rsa-sha256; c=relaxed/relaxed;
        d=1e100.net; s=20130820;
        h=x-gm-message-state:mime-version:date:message-id:subject:from:to;
        bh=WwH+NIkCWDEoIkwbeCI4pf0jP0ya/ctbQ81pUsA4G7s=;
        b=hAOSzKjertcsQIT/PHoZKsiKxLba8gaKOCmyNg7nmiPJjCWqobNvM5nf3sZP1Xhysi
         gGdvk9mmMugII8dsjc7mRhDkbCT1QKVz/0UBQ+CaP6sK7kGdWfdarphGgzUGA6Il5JZi
         lP4DpEQHUpG1wJ1r+dN2f+UT8tyfIwapXwo3g7FnkPLxmCq9CeqJeRlagL6vAacon8z7
         CjdTHB7fzEtYToSp+cDi3+yK4zS9p4rwF4H4Ds3bJqwM/PrcFJW0YYncDHdra5TwYf6U
         K6VRX19iUhQT4kTVFCtoNW9SU8Ri+Rc5VfvVTKRh4KwZ2uW5x8y07ucB0vZcAQdEnms4
         AWnQ==
X-Gm-Message-State: AD7BkJJEDmk9P+Kzcn1MT4lQxpU1aYU6x8uABSpohCbT7EeOFAXjT1y6n3sFcRj7tcfWc6eBAOL6bJ78jvVOlQ==
MIME-Version: 1.0
X-Received: by 10.112.63.196 with SMTP id i4mr8426739lbs.93.1460378359811;
 Mon, 11 Apr 2016 05:39:19 -0700 (PDT)
Received: by 10.114.66.51 with HTTP; Mon, 11 Apr 2016 05:39:19 -0700 (PDT)
Date: Mon, 11 Apr 2016 14:39:19 +0200
Message-ID: <CADF5gVU+C4BZCSFSiWeiBipBnDu5jTU+FVmLJbSQSbtMM9JZcQ@mail.gmail.com>
Subject: test
From: Example <example@gmail.com>
To: example@test.de
Content-Type: multipart/alternative; boundary=001a1133d4405fd878053034d55a
X-Scanned-By: MIMEDefang 2.71 on 5.38.258.144

--001a1133d4405fd878053034d55a
Content-Type: text/plain; charset=UTF-8

http://www.example.com
http://www.white.com

--001a1133d4405fd878053034d55a
Content-Type: text/html; charset=UTF-8

<div dir="ltr"><div><a href="http://www.example.com">http://www.example.com</a><br></div><a href="http://www.white.com">http://www.white.com</a><br></div>

--001a1133d4405fd878053034d55a--
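As a starting point for the whitelist check, the following sketch lists the distinct links in a mail file like the one above that do not contain any whitelist entry. It assumes GNU grep (`-o` is not POSIX) and a whitelist file with one URL prefix per line. The helper name `non_whitelisted_urls` is hypothetical, and the URL pattern is a rough approximation that stops at whitespace, double quotes and angle brackets. `grep -F` matches substrings, so an entry also "covers" longer host names. Because the sketch scans the raw file, it only sees URLs that appear literally: base64- or quoted-printable-encoded body parts would have to be decoded first.

```shell
#!/usr/bin/env bash
# non_whitelisted_urls MAILFILE WHITELIST  (hypothetical helper)
# Prints each distinct http(s) URL in MAILFILE that does not contain any
# line of WHITELIST as a fixed substring.
non_whitelisted_urls() {
  local mail=$1 whitelist=$2
  # -o prints only the matched text; the character class ends a URL at
  # whitespace, a double quote or an angle bracket (as in the HTML part).
  grep -oE 'https?://[^[:space:]"<>]+' "$mail" |
    sort -u |
    grep -vF -f "$whitelist"
}
```

For the example mail and a whitelist containing only `http://www.white.com`, this should print `http://www.example.com`.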

Solution

Just realized the bash script is unnecessary; we can do it with the following one-liner, although it is really cryptic to read:

Input data:

$ cat data
sdfsdfsdfsdf http://www.whitedomain.com/red.html
bla http://www.black.com/green/yellow.html blabla
sdfsdfsdfsdf http://www.white.com/red.html
$ cat whitelist 
http://www.white.com
http://www.whitedomain.com
$

Final Output:

$ sed -r '/'"$(sed -r 's/\\/\\\\/g;s/\//\\\//g;s/\^/\\^/g;s/\[/\\[/g;s/'\''/'\'"\\\\"\'\''/g;s/\]/\\]/g;s/\*/\\*/g;s/\$/\\$/g;s/\./\\./g' whitelist | paste -s -d '|')"'/! s/http/httx/g' data
sdfsdfsdfsdf http://www.whitedomain.com/red.html
bla httx://www.black.com/green/yellow.html blabla
sdfsdfsdfsdf http://www.white.com/red.html
$
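If the nested escaping feels too fragile, the same line-level rule can be sketched in awk instead. Here `index()` is a plain substring test, so no regex escaping is needed at all. This swaps in a different tool than the answer uses. The helper name `defang_lines_awk` is hypothetical, and the whitelist is assumed to be a file with one entry per line.

```shell
#!/usr/bin/env bash
# defang_lines_awk WHITELIST < input  (hypothetical helper)
# Same rule as the sed one-liner: a line containing any whitelist entry
# (as a literal substring) is printed unchanged; on every other line each
# "http" becomes "httx".
defang_lines_awk() {
  awk -v wlfile="$1" '
    BEGIN {
      # Load the whitelist once, skipping blank lines.
      n = 0
      while ((getline line < wlfile) > 0)
        if (line != "") wl[++n] = line
      close(wlfile)
    }
    {
      keep = 0
      for (i = 1; i <= n; i++)
        if (index($0, wl[i]) > 0) { keep = 1; break }
      if (!keep) gsub(/http/, "httx")
      print
    }
  '
}
```

Usage: `defang_lines_awk whitelist < data` should print the same three lines as the sed one-liner for the sample files above.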

Explanation:

The output of the inner command substitution is a regex, which the outer sed uses to select the lines that are excluded from the substitution:

$ sed -r 's/\\/\\\\/g;s/\//\\\//g;s/\^/\\^/g;s/\[/\\[/g;s/'\''/'\'"\\\\"\'\''/g;s/\]/\\]/g;s/\*/\\*/g;s/\$/\\$/g;s/\./\\./g' whitelist | paste -s -d '|'
http:\/\/www\.white\.com|http:\/\/www\.whitedomain\.com
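The escaping chain above handles `\`, `/`, `^`, `[`, `]`, `*`, `$` and `.`, but not the other characters that `sed -r` treats specially in an extended regex (`|`, `+`, `?`, `(`, `)`, `{`, `}`). A more compact sketch, assuming GNU sed, escapes all of them with a single bracket expression. `#` is used as the `s` delimiter so that `/` can sit inside the brackets, and blank whitelist lines are dropped so they cannot become an empty alternative. The name `whitelist_regex` is hypothetical.

```shell
#!/usr/bin/env bash
# whitelist_regex WHITELIST  (hypothetical helper)
# Prints one ERE that matches any whitelist entry literally, with entries
# joined by | (alternation). Each ERE metacharacter and the "/" address
# delimiter gets a backslash in front of it.
whitelist_regex() {
  # In the bracket expression "]" must come first; "#" delimits the s command.
  sed -E '/^$/d; s#[][\\/.*^$|+?(){}]#\\&#g' "$1" | paste -s -d '|' -
}
```

For the sample whitelist this should print the same `http:\/\/www\.white\.com|http:\/\/www\.whitedomain\.com` as above, and an entry such as `http://a+b.example` becomes `http:\/\/a\+b\.example`.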

Flow:

  1. Form the regex dynamically: the inner command substitution escapes the characters that are special to sed (such as \, /, [, ], *, $ and .) in each whitelist entry and pipes the result to paste, which joins the entries with | (alternation).
  2. Use that regex as an address in the outer sed command, negated with !, so that only lines containing none of the whitelisted URLs get http replaced with httx.
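The two steps above can be written as a small function with a named intermediate result, which is easier to read and debug than the one-liner. This is a sketch assuming GNU sed. `defang_lines` is a hypothetical name, and it uses a single bracket-expression escape instead of the answer's chain of `s` commands, plus `/^$/d` so that blank whitelist lines cannot produce an empty alternative that matches every line.

```shell
#!/usr/bin/env bash
# defang_lines WHITELIST < input  (hypothetical helper)
# Step 1: build the alternation regex from the whitelist, escaping every
#         ERE metacharacter and "/" in each entry.
# Step 2: on every line that does NOT match it (/regex/!), replace http
#         with httx.
# Assumes a non-empty whitelist: an empty address regex would make sed
# fall back to the "previous regex" and fail.
defang_lines() {
  local wl_regex
  wl_regex=$(sed -E '/^$/d; s#[][\\/.*^$|+?(){}]#\\&#g' "$1" | paste -s -d '|' -)
  sed -E '/'"$wl_regex"'/! s/http/httx/g'
}
```

Usage: `defang_lines whitelist < data` should reproduce the "Final Output" shown above.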

Edit 1: Since sed is line-oriented, you will have to split the data into lines of text first. For HTML, inserting a newline before every < works, like this:

$ cat data1 
<div dir="ltr"><div><a href="http://www.white.com">http://www.white.com</a><br></div><a href="http://www.example.com">http://www.example.com</a><br></div>
$ cat whitelist 
http://www.white.com
http://www.whitedomain.com
$ sed 's/</\n</g' data1 | sed -r '/'"$(sed -r 's/\\/\\\\/g;s/\//\\\//g;s/\^/\\^/g;s/\[/\\[/g;s/'\''/'\'"\\\\"\'\''/g;s/\]/\\]/g;s/\*/\\*/g;s/\$/\\$/g;s/\./\\./g' whitelist | paste -s -d '|')"'/! s/http/httx/g'

<div dir="ltr">
<div>
<a href="http://www.white.com">http://www.white.com
</a>
<br>
</div>
<a href="httx://www.example.com">httx://www.example.com
</a>
<br>
</div>
$
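Splitting on `<` helps only while each resulting line carries a single kind of link. A plain-text line that mixes a whitelisted and a non-whitelisted URL is still skipped as a whole, and a whitelist entry also matches longer host names such as `http://www.white.com.evil.example`. The following per-URL variant is a sketch that uses awk's `match()` with `RSTART`/`RLENGTH` instead of sed. It assumes URLs end at whitespace, quotes or angle brackets, and it treats a URL as whitelisted only if it equals an entry or continues with `/` right after it. The function name `defang_urls` is hypothetical.

```shell
#!/usr/bin/env bash
# defang_urls WHITELIST < input  (hypothetical helper)
# Turns the leading "http" of every non-whitelisted URL into "httx" and
# leaves the rest of the line, including whitelisted URLs, untouched.
defang_urls() {
  awk -v wlfile="$1" '
    BEGIN {
      while ((getline line < wlfile) > 0)
        if (line != "") wl[++n] = line
      close(wlfile)
    }
    # A URL is allowed if it starts with an entry that is followed by
    # nothing or by "/", so white.com.evil.example is not allowed.
    function allowed(url,    i, rest) {
      for (i = 1; i <= n; i++) {
        if (index(url, wl[i]) != 1) continue
        rest = substr(url, length(wl[i]) + 1)
        if (rest == "" || substr(rest, 1, 1) == "/") return 1
      }
      return 0
    }
    {
      out = ""; text = $0
      # Walk through the line one URL at a time.
      while (match(text, /https?:\/\/[^ \t"<>]+/)) {
        url = substr(text, RSTART, RLENGTH)
        if (!allowed(url)) sub(/^http/, "httx", url)
        out = out substr(text, 1, RSTART - 1) url
        text = substr(text, RSTART + RLENGTH)
      }
      print out text
    }
  '
}
```

With this approach the HTML from `data1` can be processed as one line, without the `s/</\n</g` step, and `https` links are defanged as `httxs`, just as with the original `s/http/httx/g`.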
