XHTML to XML conversion


Problem description

I'm trying to do some "screen scraping", and am using
<http://www.oreilly.com/catalog/xmlhks/> for inspiration.

First I'd like to convert XHTML to XML, or extract XML from XHTML; I'm
not sure how to phrase that.

"Use Cocoon to Create a Well-Formed View of a Web Page, Then Scrape It
for Data"
<http://hacks.oreilly.com/pub/h/2125>

That is what I'd like to do down the line, but for now I'm working on
something simpler.
First,

"Convert an HTML Document to XHTML with HTML Tidy"
<http://hacks.oreilly.com/pub/h/2054>

Instead of Tidy, I went with TagSoup
<http://mercury.ccil.org/~cowan/XML/tagsoup/>.
Then I'd like to go from XHTML to XML in order to:

"Generate an XSLT Identity Stylesheet with Relaxer"
<http://hacks.oreilly.com/pub/h/2069>

How do I get the XML from the XHTML, please?

Here's what I have:

[thufir@arrakis tagSoup]$
[thufir@arrakis tagSoup]$ date
Sun Aug 14 23:34:13 IST 2005
[thufir@arrakis tagSoup]$ pwd
/home/thufir/Desktop/tagSoup
[thufir@arrakis tagSoup]$ ll
total 60
-rw-rw-r-- 1 thufir thufir 7662 Aug 13 22:08 google.html
-rw-rw-r-- 1 thufir thufir 42207 Aug 14 23:32 tagsoup.jar
[thufir@arrakis tagSoup]$ java -jar tagsoup.jar --files google.html
src: google.html dst: google.xhtml
[thufir@arrakis tagSoup]$ ll
total 76
-rw-rw-r-- 1 thufir thufir 7662 Aug 13 22:08 google.html
-rw-rw-r-- 1 thufir thufir 10568 Aug 14 23:34 google.xhtml
-rw-rw-r-- 1 thufir thufir 42207 Aug 14 23:32 tagsoup.jar
[thufir@arrakis tagSoup]$ cat google.xhtml -n
1 <?xml version="1.0" standalone="yes"?>
2
3 <html version="-//W3C//DTD HTML 4.01 Transitional//EN"
xmlns="http://www.w3.org/1999/xhtml"><head><title>Google
Directory</title><style>&lt;!--
4 body,td,a,p,.h{font-family: arial,sans-serif;}
.h{color:#008000}
.q{text-decoration:none; color:#0000cc;}
5 //--&gt;</style><script>
6 &lt;!--
7 function sf(){document.f.q.focus();}
8 // --&gt;
9 </script></head><body bgcolor="#ffffff" text="#000000"
link="#3300cc" vlink="#660066" alink="#ff0000" onload="sf();">
10 <center>
11 <table cellpadding="0" cellspacing="0" border="0"><tr><td
align="right" colspan="1" rowspan="1" valign="bottom"><img
src="http://www.google.com/images/hp0.gif" width="158" height="78"
alt="Google Directory"></img></td><td colspan="1" rowspan="1"
valign="bottom"><img src="http://www.google.com/images/hp1.gif"
width="50" height="78" alt=""></img></td><td colspan="1" rowspan="1"
valign="bottom"><img src="http://www.google.com/images/hp2.gif"
width="68" height="78" alt=""></img></td></tr><tr><td align="right"
colspan="1" rowspan="1" valign="top" class="h"><b>Directory</b></td><td
colspan="1" rowspan="1" valign="top"><img
src="http://www.google.com/images/hp3.gif" width="50" height="32"
alt=""></img></td><td colspan="1" rowspan="1" valign="top"
class="h"></td></tr></table><br clear="none"></br><table border="0"
cellspacing="0" cellpadding="0"><tr><td colspan="1" rowspan="1"
width="15"> </td><td align="center" colspan="1" nowrap="nowrap"
rowspan="1" id="0" bgcolor="#efefef" width="95"><a shape="rect"
class="q" id="0a" href="http://www.google.com/webhp?hl=en"><font
size="-1">Web</font></a></td><td colspan="1" rowspan="1"
width="15"> </td><td align="center" colspan="1" nowrap="nowrap"
rowspan="1" id="1" bgcolor="#efefef" width="95"><a shape="rect"
class="q" id="1a" href="http://www.google.com/imghp?hl=en"><font
size="-1">Images</font></a></td><td colspan="1" rowspan="1"
width="15"> </td><td align="center" colspan="1" nowrap="nowrap"
rowspan="1" id="2" bgcolor="#efefef" width="95"><a shape="rect"
class="q" id="2a" href="http://www.google.com/grphp?hl=en"><font
size="-1">Groups</font></a></td><td colspan="1" rowspan="1"
width="15"> </td><td align="center" colspan="1" nowrap="nowrap"
rowspan="1" id="3" bgcolor="#008000" width="95"><font color="#ffffff"
size="-1"><b>Directory</b></font></td><td colspan="1" rowspan="1"
width="15"> </td><td align="center" colspan="1" nowrap="nowrap"
rowspan="1" id="4" bgcolor="#efefef" width="95"><a shape="rect"
class="q" id="4a" href="http://www.google.com/nwshp?hl=en"><font
size="-1">News</font></a></td><td colspan="1" rowspan="1"
width="15"> </td><td colspan="1" rowspan="1"
width="15"> </td></tr><tr><td colspan="12" rowspan="1"
bgcolor="#008000"><img width="1" height="1"
alt=""></img></td></tr></table><br clear="none"></br><form
enctype="application/x-www-form-urlencoded" method="get"
action="http://www.google.com/search" name="f"><table cellpadding="0"
cellspacing="0"><tr align="middle" valign="center"><td colspan="1"
rowspan="1" width="150"> </td><td colspan="1" rowspan="1"><input
maxlength="256" type="text" name="q" size="40"
value=""></input><script>document.f.q.focus();</script><input
type="submit" name="btnG" value="Google Search"></input><input
type="hidden" name="hl" value="en"></input><input type="hidden"
name="cat" value="gwd/Top"></input></td><td align="left" colspan="1"
rowspan="1" width="150"><font size="-2"> • <a
shape="rect" href="http://www.google.com/dirhelp.html">Directory
Help</a></font></td></tr></table></form><p><font color="#008000"><b>The
web organized by topic into categories.</b></font></p><p></p><table
align="center" width="1%" border="0" cellspacing="7"
cellpadding="0"><tr><td colspan="4" rowspan="1" bgcolor="#008000"><img
width="1" height="1" alt=""></img></td></tr><tr><td colspan="1"
rowspan="1"> </td><td colspan="1" nowrap="nowrap" rowspan="1">
12 <b><a shape="rect" href="/Top/Arts/">Arts</a></b><br
clear="none"></br>
13 <font size="-1"><a shape="rect"
href="/Top/Arts/Movies/">Movies</a>, <a shape="rect"
href="/Top/Arts/Music/">Music</a>, <a shape="rect"
href="/Top/Arts/Television/">Television</a>, ...</font><p>
14 <b><a shape="rect" href="/Top/Business/">Business</a></b><br
clear="none"></br>
15 <font size="-1"><a shape="rect"
href="/Top/Business/Major_Companies/">Companies</a>, <a shape="rect"
href="/Top/Business/Financial_Services/">Finance</a>, <a shape="rect"
href="/Top/Business/Employment/">Jobs</a>, ...</font></p><p>
16 <b><a shape="rect" href="/Top/Computers/">Computers</a></b><br
clear="none"></br>
17 <font size="-1"><a shape="rect"
href="/Top/Computers/Internet/">Internet</a>, <a shape="rect"
href="/Top/Computers/Hardware/">Hardware</a>, <a shape="rect"
href="/Top/Computers/Software/">Software</a>, ...</font></p><p>
18 <b><a shape="rect" href="/Top/Games/">Games</a></b><br
clear="none"></br>
19 <font size="-1"><a shape="rect"
href="/Top/Games/Board_Games/">Board</a>, <a shape="rect"
href="/Top/Games/Roleplaying/">Roleplaying</a>, <a shape="rect"
href="/Top/Games/Video_Games/">Video</a>, ...</font></p><p>
20 <b><a shape="rect" href="/Top/Health/">Health</a></b><br
clear="none"></br>
21 <font size="-1"><a shape="rect"
href="/Top/Health/Alternative/">Alternative</a>, <a shape="rect"
href="/Top/Health/Fitness/">Fitness</a>, <a shape="rect"
href="/Top/Health/Medicine/">Medicine</a>, ...</font></p><p>
22 </p></td><td colspan="1" nowrap="nowrap" rowspan="1">
23 <b><a shape="rect" href="/Top/Home/">Home</a></b><br
clear="none"></br>
24 <font size="-1"><a shape="rect"
href="/Top/Home/Consumer_Information/">Consumers</a>, <a shape="rect"
href="/Top/Home/Homeowners/">Homeowners</a>, <a shape="rect"
href="/Top/Home/Family/">Family</a>, ...</font><p>
25 <b><a shape="rect" href="/Top/Kids_and_Teens/">Kids and
Teens</a></b><br clear="none"></br>
26 <font size="-1"><a shape="rect"
href="/Top/Kids_and_Teens/Computers/">Computers</a>, <a shape="rect"
href="/Top/Kids_and_Teens/Entertainment/">Entertainment</a>, <a
shape="rect" href="/Top/Kids_and_Teens/School_Time/">School</a>,
...</font></p><p>
27 <b><a shape="rect" href="/Top/News/">News</a></b><br
clear="none"></br>
28 <font size="-1"><a shape="rect"
href="/Top/News/Media/">Media</a>, <a shape="rect"
href="/Top/News/Newspapers/">Newspapers</a>, <a shape="rect"
href="/Top/News/Current_Events/">Current Events</a>, ...</font></p><p>
29 <b><a shape="rect"
href="/Top/Recreation/">Recreation</a></b><br
clear="none"></br> 30 <font size="-1"><a shape="rect"
href="/Top/Recreation/Food/">Food</a>, <a shape="rect"
href="/Top/Recreation/Outdoors/">Outdoors</a>, <a shape="rect"
href="/Top/Recreation/Travel/">Travel</a>, ...</font></p><p>
31 <b><a shape="rect" href="/Top/Reference/">Reference</a></b><br
clear="none"></br>
32 <font size="-1"><a shape="rect"
href="/Top/Reference/Education/">Education</a>, <a shape="rect"
href="/Top/Reference/Libraries/">Libraries</a>, <a shape="rect"
href="/Top/Reference/Maps/">Maps</a>, ...</font></p><p>
33 </p></td><td colspan="1" nowrap="nowrap" rowspan="1">
34 <b><a shape="rect" href="/Top/Regional/">Regional</a></b><br
clear="none"></br>
35 <font size="-1"><a shape="rect"
href="/Top/Regional/Asia/">Asia</a>, <a shape="rect"
href="/Top/Regional/Europe/">Europe</a>, <a shape="rect"
href="/Top/Regional/North_America/">North America</a>, ...</font><p>
36 <b><a shape="rect" href="/Top/Science/">Science</a></b><br
clear="none"></br>
37 <font size="-1"><a shape="rect"
href="/Top/Science/Biology/">Biology</a>, <a shape="rect"
href="/Top/Science/Social_Sciences/Psychology/">Psychology</a>, <a
shape="rect" href="/Top/Science/Physics/">Physics</a>,
...</font></p><p>
38 <b><a shape="rect" href="/Top/Shopping/">Shopping</a></b><br
clear="none"></br>
39 <font size="-1"><a shape="rect"
href="/Top/Shopping/Vehicles/Autos/">Autos</a>, <a shape="rect"
href="/Top/Shopping/Clothing/">Clothing</a>, <a shape="rect"
href="/Top/Shopping/Gifts/">Gifts</a>, ...</font></p><p>
40 <b><a shape="rect" href="/Top/Society/">Society</a></b><br
clear="none"></br>
41 <font size="-1"><a shape="rect"
href="/Top/Society/Issues/">Issues</a>, <a shape="rect"
href="/Top/Society/People/">People</a>, <a shape="rect"
href="/Top/Society/Religion_and_Spirituality/">Religion</a>,
...</font></p><p>
42 <b><a shape="rect" href="/Top/Sports/">Sports</a></b><br
clear="none"></br>
43 <font size="-1"><a shape="rect"
href="/Top/Sports/Basketball/">Basketball</a>, <a shape="rect"
href="/Top/Sports/Football/">Football</a>, <a shape="rect"
href="/Top/Sports/Soccer/">Soccer</a>, ...</font></p><p>
44 </p></td></tr><tr><td colspan="1" rowspan="1"> </td><td
colspan="3" rowspan="1"><b><a shape="rect"
href="/Top/World/">World</a></b><br clear="none"></br>
45 <font size="-1"><a shape="rect"
href="/Top/World/Deutsch/">Deutsch</a>, <a shape="rect"
href="/Top/World/Espa%C3%B1ol/">Español</a>, <a shape="rect"
href="/Top/World/Fran%C3%A7ais/">Français</a>, <a shape="rect"
href="/Top/World/Italiano/">Italiano</a>, <a shape="rect"
href="/Top/World/Japanese/">Japanese</a>, <a shape="rect"
href="/Top/World/Korean/">Korean</a>, <a shape="rect"
href="/Top/World/Nederlands/">Nederlands</a>, <a shape="rect"
href="/Top/World/Polska/">Polska</a>, <a shape="rect"
href="/Top/World/Svenska/">Svenska</a>, ...</font><p>
46 </p></td></tr><tr><td colspan="1" rowspan="1"> </td><td
colspan="1" nowrap="nowrap" rowspan="1"><font
size="-1"> </font></td></tr><tr><td colspan="4" rowspan="1"
bgcolor="#008000"><img width="1" height="1"
alt=""></img></td></tr></table><br clear="none"></br><font size="-1"><a
shape="rect"
href="http://www.google.com/ads/">Advertise with Us</a> - <a
shape="rect"
href="http://www.google.com/about.html">Jobs, Press, Cool Stuff...</a></font><p><font
face="arial,sans-serif" size="-1"> ©2004 Google</font></p><br
clear="none"></br><table align="center" border="0" bgcolor="#336600"
cellpadding="3" cellspacing="0"><tr><td colspan="1" rowspan="1"> <table
width="100%" cellpadding="2" cellspacing="0" border="0"><tr
align="center"><td colspan="1" rowspan="1"><font face="sans-serif,
Arial, Helvetica" size="2" color="#ffffff">Help build the largest
human-edited directory on the web.</font></td></tr><tr align="center"
bgcolor="#cccccc"><td colspan="1" rowspan="1"><font face="sans-serif,
Arial, Helvetica" size="2">
47 <a shape="rect" href="http://dmoz.org/add.html">
48 Submit a Site</a> - <a shape="rect"
href="http://dmoz.org/about.html"><b>Open Directory Project</b></a> -
49 <a shape="rect" href="http://dmoz.org/cgi-bin/apply.cgi">Become
an Editor</a> </font>
50 </td></tr></table>
51 </td></tr></table>
52 </center></body></html>
53
[thufir@arrakis tagSoup]$ date
Sun Aug 14 23:34:57 IST 2005
[thufir@arrakis tagSoup]$
Thanks,

Thufir

Solution
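
Well-formed XHTML is already XML, so there may be no separate conversion step: the google.xhtml file that TagSoup wrote can probably be handed straight to an XML parser or XSLT processor. Below is a minimal sketch, assuming a Java runtime with the standard JAXP classes. The class name XhtmlAsXml, the method extractLinks, and the sample string (a shortened excerpt modeled on the TagSoup output in the question) are illustrative only and are not part of TagSoup or the book.

```java
import java.io.StringReader;
import java.util.ArrayList;
import java.util.List;
import javax.xml.parsers.DocumentBuilder;
import javax.xml.parsers.DocumentBuilderFactory;
import org.w3c.dom.Document;
import org.w3c.dom.Element;
import org.w3c.dom.NodeList;
import org.xml.sax.InputSource;

// Hypothetical helper: treats TagSoup-style XHTML as ordinary XML.
final class XhtmlAsXml {
    // Namespace that TagSoup declares on the <html> element.
    static final String XHTML_NS = "http://www.w3.org/1999/xhtml";

    /** Parses well-formed XHTML with the JDK's XML parser and returns every a/@href. */
    static List<String> extractLinks(String xhtml) throws Exception {
        DocumentBuilderFactory factory = DocumentBuilderFactory.newInstance();
        // Needed so that elements can be looked up by namespace URI.
        factory.setNamespaceAware(true);
        DocumentBuilder builder = factory.newDocumentBuilder();
        Document doc = builder.parse(new InputSource(new StringReader(xhtml)));

        // Elements are returned in document order.
        NodeList anchors = doc.getElementsByTagNameNS(XHTML_NS, "a");
        List<String> hrefs = new ArrayList<>();
        for (int i = 0; i < anchors.getLength(); i++) {
            Element a = (Element) anchors.item(i);
            hrefs.add(a.getAttribute("href"));
        }
        return hrefs;
    }

    public static void main(String[] args) throws Exception {
        // Assumed sample input, not the full google.xhtml file.
        String sample =
            "<?xml version=\"1.0\" standalone=\"yes\"?>"
            + "<html xmlns=\"http://www.w3.org/1999/xhtml\">"
            + "<head><title>Google Directory</title></head>"
            + "<body><b><a shape=\"rect\" href=\"/Top/Arts/\">Arts</a></b>"
            + "<br clear=\"none\"></br>"
            + "<a shape=\"rect\" href=\"/Top/Arts/Movies/\">Movies</a>"
            + "</body></html>";
        System.out.println(extractLinks(sample)); // prints [/Top/Arts/, /Top/Arts/Movies/]
    }
}
```

One thing to watch for: TagSoup places every element in the http://www.w3.org/1999/xhtml namespace, so an XPath expression or XSLT pattern such as //a matches nothing unless it uses a prefix bound to that namespace.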



