Removing styling from HTML

Problem description

I have a database full of product descriptions that were entered as horrible computer-generated HTML, littered with assorted styling information: style attributes, font tags, background attributes, and so on.

I have to redesign the website, but first I need to remove all the styling from the product descriptions. Before anyone suggests doing it manually: there are 100,000 products. I am thinking some creative regexes in PHP might do the trick.
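The regex idea could be sketched roughly as follows. It is written in Python rather than PHP purely for illustration, and the patterns and function name are hypothetical. They target the style, bgcolor and background attributes and the font tags that the question lists further down, and they assume every attribute value is quoted.

```python
import re

# Hypothetical patterns for illustration; they assume every attribute value is quoted.
# A style, bgcolor or background attribute, including the whitespace before it.
STYLE_ATTR = re.compile(
    r"""\s+(?:style|bgcolor|background)\s*=\s*(?:"[^"]*"|'[^']*')""",
    re.IGNORECASE,
)
# An opening or closing <font> tag; whatever sits between the tags is kept.
FONT_TAG = re.compile(r"</?font\b[^>]*>", re.IGNORECASE)


def strip_styling_regex(fragment: str) -> str:
    """Drop styling attributes, then unwrap <font> tags, by plain regex substitution."""
    without_attrs = STYLE_ATTR.sub("", fragment)
    return FONT_TAG.sub("", without_attrs)


sample = ('<td width="516" bgcolor="#999966" align="center">'
          '<font face="Verdana" color="#FFFFFF"><b>Part Number: 728540</b></font></td>')
print(strip_styling_regex(sample))
# <td width="516" align="center"><b>Part Number: 728540</b></td>
```

Because these patterns know nothing about HTML structure, they would also rewrite matching text inside scripts, comments or visible copy, and a `>` inside a quoted font attribute would cut the match short. That fragility is the usual argument for a real parser.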

Ideally I would remove all the HTML and keep only plain text, but the descriptions contain tables, and tables nested inside tables, so that would just end in tears.

Looking forward to your creative solutions :)

EDIT-

On second thought, I could also do it in VBA, since I can export the descriptions to an Excel sheet. So PHP or VBA solutions would be great.

EDIT-

    <div class="XXXX-template-06">
          <table border="0" cellpadding="0" cellspacing="0" style="border-collapse: collapse" bordercolor="#111111" width="694" id="AutoNumber1">
            <tbody><tr>
              <td width="516" height="18" bgcolor="#999966" align="center">
              <p align="center"><font face="Verdana" color="#FFFFFF"><b>Mont Blanc Scott Roof mounted cycle bike carrier<br>
              <br>
              Part Number: 728540</b></font></p></td>
              <td width="178" height="18" bgcolor="#999966" align="center">
              <a href="/shippingcalculator.html?SKU=728540" target="_blank"><img border="0" src="http://images.ZZZZpro.com/2145/" width="88" height="33"></a></td>
            </tr>
            <tr>
              <td width="694" height="57" bgcolor="#CCCC99" align="center" colspan="2">
              <b><font face="Verdana" size="2" class="CustomStyle-CycleCarrier">
    <script type="text/javascript">
    <!--function click() { if (event.button==2) { alert('All graphics, descriptions and other information, including the HTML code of this listing are the property of XXXX Limited and may not be reproduced in any form without the express permission of XXXX Limited. Email us: sales@XXXX.com'); } } document.onmousedown=click // -->
    <!---->
    <!---->
    <!---->
    <!---->
    <!---->
    <!---->
    <!---->
    <!---->
    <!---->
    <!---->
    <!---->
    <!----> -->
    </script>


    <div align="center">
      <center>
        <table height="336" background="http://images.ZZZZpro.com/2145/I/21/fade1.jpg" width="680" border="0">
          <tbody><tr>
            <td height="49" width="136"><p align="center"><img height="62" src="http://XXXXbiz.ipage.com/XXXX/Images/Mont%20Blanc/montblanc.jpg" width="165" border="0"></p></td>
            <td height="49" width="378"><p align="center"><font face="Verdana" color="#0000ff" size="5"><u><strong>Mont Blanc </strong></u></font><u><strong><font face="Verdana" color="#0000FF" size="5">Scott Roof Bar Rack 1 Cycle Carrier</font></strong></u></p></td>
            <td height="49" width="146"><img height="69" src="http://images.ZZZZpro.com/2145/I/20/logomed.gif" width="174" border="0"></td>
          </tr>
          <tr>
            <td height="241" colspan="3" width="672"><hr><p align="center"><img height="223" src="http://XXXXbiz.ipage.com/XXXX/Images/Mont%20Blanc/scottlrg.jpg" width="237" border="0"></p><p><font color="black"><b>Scott</b> </font></p><ul><li>Stylish, easy to use roof mounted cycle carrier, distinctive oval carrying bar.<br></li><li>Extra Soft Frame clamps hold cycle safely and gently<br></li><li>Extra wide wheel holders take the fattest tyres<br></li><li>Strong Webbing straps fasten wheels securely to carrier<br></li><li><font size="3" color="black">Upright, roof bar mounted, locking cycle carrier<br></font></li><li><font size="3" color="black">&nbsp;Locks to roof rails and locks bikes<br></font></li><li><font size="3" color="black">&nbsp;Quick and easy to use<br></font></li><li><font size="3" color="black">Adjustable for most cycle styles</font></li></ul><center><table cellspacing="0" width="100%" cellpadding="20" border="0" height="1" class="featuretable">
                  <tbody><tr>
                    <td height="55" class="featuretd" width="110"><p align="center"><a target="_blank" href="http://www.montblancuk.co.uk/support/inst/scott.pdf"><img width="20" alt="Open document" src="http://espimages.biz/2145/I/20/mount_link.gif" border="0" height="20"></a></p></td>
                    <td height="55" class="featuretd">To view Fitting Instructions in PDF format please click the spanner</td>
                  </tr>
                </tbody></table>
                <table height="317">
                  <tbody><tr class="technicaltr" valign="top">
                    <td height="1" class="technicalfirstcolumn"><font class="technicalheader">Technical data</font></td>
                    <td height="1" class="technicalsecondcolumn"><p><font class="heading1">Mont </font>Blanc Scott</p><p align="center"><img height="107" src="http://XXXXbiz.ipage.com/XXXX/Images/Mont%20Blanc/scottfaint.jpg" width="127" border="0"></p></td>
                  </tr>
                  <tr class="technicaltr" valign="top">
                    <td height="21" class="technicalfirstcolumn"><div>Max number of bikes</div></td>
                    <td height="21" class="technicalsecondcolumn"><div>1</div></td>
                  </tr>
                  <tr class="technicaltr" valign="top">
                    <td height="18" class="technicalfirstcolumn"><div>Load capacity (kg)</div></td>
                    <td height="18" class="technicalsecondcolumn"><div>15 KG</div></td>
                  </tr>
                  <tr class="technicaltr" valign="top">
                    <td height="21" class="technicalfirstcolumn"><div>Weight (kg)</div></td>
                    <td height="21" class="technicalsecondcolumn"><div>2.2KG</div></td>
                  </tr>
                  <tr class="technicaltr" valign="top">
                    <td height="21" class="technicalfirstcolumn"><div>Fits frame-dimensions (mm)</div></td>
                    <td height="21" class="technicalsecondcolumn">Up to 80mm</td>
                  </tr>
                  <tr class="technicaltr" valign="top">
                    <td height="21" class="technicalfirstcolumn"><div>Fits wheel-dimensions</div></td>
                    <td height="21" class="technicalsecondcolumn"><div>All</div></td>
                  </tr>
                  <tr class="technicaltr" valign="top">
                    <td height="21" class="technicalfirstcolumn"><div>Locks bikes to carrier</div></td>
                    <td height="21" class="technicalsecondcolumn"><div>Yes</div></td>
                  </tr>
                  <tr class="technicaltr" valign="top">
                    <td height="21" class="technicalfirstcolumn"><div>Locks carrier to car</div></td>
                    <td height="21" class="technicalsecondcolumn"><div>Yes</div></td>
                  </tr>
                  <tr class="technicaltr" valign="top">
                    <td height="21" class="technicalfirstcolumn"><div>Tilt function, with bikes</div></td>
                    <td height="21" class="technicalsecondcolumn"><div>NA</div></td>
                  </tr>
                  <tr class="technicaltr" valign="top">
                    <td height="21" class="technicalfirstcolumn"><div>TÜV/EuroBE approved</div></td>
                    <td height="21" class="technicalsecondcolumn"><div>NA</div></td>
                  </tr>
                  <tr class="technicaltr" valign="top">
                    <td height="21" class="technicalfirstcolumn"><div>Fullfills City Crash norms</div></td>
                    <td height="21" class="technicalsecondcolumn"><div>NA</div></td>
                  </tr>
                  <tr class="technicaltr" valign="top">
                    <td height="21" class="technicalfirstcolumn"><div>Miscellaneous</div></td>
                    <td height="21" class="technicalsecondcolumn"><div><p>Fits all types of Roof Bars,</p></div></td>
                  </tr>
                </tbody></table>
                <p align="center">
                  <font size="2" face="Verdana">The cycle carrier is
                  guaranteed for Five year from date of purchase.
                  <br>
                  <br>We stock a wide range of towbars and towing accessories.
                  <a href="mailto:sales@XXXX.com?subject=Witter ZX88 Cycle Carrier"><br>Click
                  here to email us</a> if you require details of our other
                  towing equipment.</font>
                </p>


                <hr>
              </center>

            </td>

          </tr>
        </tbody></table>
      </center>

    </div>

              <br>
              Please note that with the Type of cycle carrier where you mount it
              <br>
              onto a flange ball you may need the long reach ball which will <br>
              allow you enough clearance from the bumper</font></b></td>
            </tr>
            <tr>
              <td width="694" height="57" bgcolor="#CCCC99" align="center" colspan="2">
              <a href="http://www.XXXXeuro.ZZZZprostorefront.co.uk/products/728540-mont-blanc-scott-roof-mounted-cycle-bike-carrier-728540.html" target="_blank"><img border="0" src="http://images.ZZZZpro.com/2145/" width="55" height="40"></a>
              <b><font face="Verdana" size="2">Not from the UK ? Click the flag
              to purchase this item from our EU site </font></b><a href="http://www.XXXXeuro.ZZZZprostorefront.co.uk/products/728540-mont-blanc-scott-roof-mounted-cycle-bike-carrier-728540.html" target="_blank"><img border="0" src="http://images.ZZZZpro.com/2145/" width="57" height="40"></a></td>
            </tr>
          </tbody></table>
    </div>

EDIT-

Looking through it, I think I need to get rid of the following:

Attributes: style, bgcolor, background

Tags: font
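Given that concrete list, a parser-based pass is a sturdier alternative to regexes. The sketch below swaps in Python's standard-library `html.parser.HTMLParser` (not PHP, VBA or XSLT): it re-serialises each fragment, skipping the style, bgcolor and background attributes and dropping `<font>` tags while keeping their contents. The class and function names are hypothetical.

```python
from html import escape
from html.parser import HTMLParser

# The attributes and tags listed in the question; adjust as needed.
DROP_ATTRS = {"style", "bgcolor", "background"}
UNWRAP_TAGS = {"font"}  # the tags vanish, their contents stay


class StyleStripper(HTMLParser):
    """Re-serialise an HTML fragment without styling attributes or <font> tags."""

    def __init__(self) -> None:
        # convert_charrefs=False reports entities such as &nbsp; as separate
        # events, so they can be written back out unchanged.
        super().__init__(convert_charrefs=False)
        self.out: list[str] = []

    def _attrs(self, attrs) -> str:
        kept = []
        for name, value in attrs:
            if name in DROP_ATTRS:  # HTMLParser hands over lowercased names
                continue
            kept.append(f" {name}" if value is None else f' {name}="{escape(value)}"')
        return "".join(kept)

    def handle_starttag(self, tag, attrs):
        if tag not in UNWRAP_TAGS:
            self.out.append(f"<{tag}{self._attrs(attrs)}>")

    def handle_startendtag(self, tag, attrs):  # e.g. <img ... />
        if tag not in UNWRAP_TAGS:
            self.out.append(f"<{tag}{self._attrs(attrs)} />")

    def handle_endtag(self, tag):
        if tag not in UNWRAP_TAGS:
            self.out.append(f"</{tag}>")

    def handle_data(self, data):
        self.out.append(data)

    def handle_entityref(self, name):
        self.out.append(f"&{name};")

    def handle_charref(self, name):
        self.out.append(f"&#{name};")

    def handle_comment(self, data):
        self.out.append(f"<!--{data}-->")


def strip_styling(fragment: str) -> str:
    parser = StyleStripper()
    parser.feed(fragment)
    parser.close()  # flush any buffered trailing text
    return "".join(parser.out)


print(strip_styling('<p><font size="3" color="black">&nbsp;Quick and easy to use<br></font></p>'))
# <p>&nbsp;Quick and easy to use<br></p>
```

Unlike the regex version, this only touches real tags and attributes, so script bodies and comments pass through as they are. The trade-offs are that tag and attribute names come out lowercased, attribute values are re-quoted with double quotes, and declarations such as a doctype are not re-emitted because no `handle_decl` is defined.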

Solution

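I would recommend using XSLT to strip out all the unwanted content. A simple identity template, which copies every node and attribute unchanged, would be a good starting point; you then add more specific templates that override it for the attributes and elements you want to remove.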
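In XSLT this means an identity template plus a few more specific templates that take priority for the nodes to change. To stay in one language, the sketch below mirrors that structure with Python's `xml.etree.ElementTree` instead of a real XSLT processor: a default copy rule, one override that unwraps `font` elements, and one that drops the style, bgcolor and background attributes. Like XSLT, it needs well-formed XML, so it assumes the descriptions have already been converted to XHTML with no namespace and a single root element that is not itself a `font`. The function names are hypothetical and only loosely echo XSLT terminology.

```python
import xml.etree.ElementTree as ET
from typing import Optional

# Assumption: input is well-formed XHTML, no namespace, single non-<font> root.
DROP_ATTRS = {"style", "bgcolor", "background"}
UNWRAP_TAGS = {"font"}


def append_text(parent: ET.Element, text: Optional[str]) -> None:
    """Append text at the current end of parent: its .text, or the last child's .tail."""
    if not text:
        return
    if len(parent):
        last = parent[-1]
        last.tail = (last.tail or "") + text
    else:
        parent.text = (parent.text or "") + text


def apply_templates(src: ET.Element, dst: ET.Element) -> None:
    """Copy src's content into dst using an identity rule plus two overrides."""
    append_text(dst, src.text)
    for child in src:
        if child.tag in UNWRAP_TAGS:
            # Override: emit the <font> element's content but not the element itself.
            apply_templates(child, dst)
        else:
            # Identity: copy the element, minus the unwanted attributes.
            kept = {k: v for k, v in child.attrib.items() if k not in DROP_ATTRS}
            apply_templates(child, ET.SubElement(dst, child.tag, kept))
        append_text(dst, child.tail)  # text that followed the original child


def strip_styling_xhtml(xhtml: str) -> str:
    root = ET.fromstring(xhtml)
    kept = {k: v for k, v in root.attrib.items() if k not in DROP_ATTRS}
    new_root = ET.Element(root.tag, kept)
    apply_templates(root, new_root)
    return ET.tostring(new_root, encoding="unicode")


print(strip_styling_xhtml('<p align="center"><font size="2" face="Verdana">Five year guarantee</font></p>'))
# <p align="center">Five year guarantee</p>
```

Note that ElementTree drops comments when parsing by default, so this pass discards them, which for these descriptions is arguably a feature.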
