Extracting data from a website with XSLT

Question

I'm trying to learn XSLT and have run into a problem. What I'd like to do is extract some data from a website, transform it with XSLT templates, and finally show it in my own XHTML page.

Let's say I have an XML file (this will be my XHTML site):

<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Transitional//EN"
"http://www.w3.org/TR/xhtml1/DTD/xhtml1-transitional.dtd">
<?xml-stylesheet type="text/xsl" href="myXSLTFile.xsl"?>
<html xmlns="http://www.w3.org/1999/xhtml">
<head></head>
<body>
<!--here I want to have markup produced by xslt file-->
</body>
</html>

The question is: how can I achieve this? I want my XSLT file to work on nodes from a particular website (for example http://www.example.com) and write the result into my own XML file.

If you find my explanation confusing, please ask and I'll try to explain the problem better.

EDIT: Here's an example. Let's say we have this page: http://www.w3.org/TR/xhtml1/. I want to develop an XSLT document that extracts the chapter and section titles from the Full Table of Contents and puts them into a table in my own XML file. What I'm having trouble with is how to reference the page http://www.w3.org/TR/xhtml1/ in my XSLT file so that the stylesheet works on that page's nodes (the page is written in XHTML, so I don't have to worry about converting HTML to XML).
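
The sketch below covers only the extraction step and leaves aside how the page is loaded. To keep it testable, it runs the stylesheet through the XSLT 1.0 processor bundled with the JDK (`javax.xml.transform`) against an XHTML string, such as a saved copy of the page. It makes a hypothetical assumption about the markup: chapter titles are XHTML `h2` elements and section titles are `h3` elements. Check the real table-of-contents markup and adjust the XPath to match. The class name `TocTable`, the column labels, and the sample markup are all made up for illustration.

```java
import java.io.StringReader;
import java.io.StringWriter;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class TocTable {
    // Hypothetical stylesheet: assumes chapter titles are XHTML h2 elements and
    // section titles are h3 elements. Adjust the XPath to the real page's markup.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns='http://www.w3.org/1999/xhtml'"
        + " xmlns:h='http://www.w3.org/1999/xhtml'"
        + " exclude-result-prefixes='h'>"
        + "<xsl:output method='xml' omit-xml-declaration='yes'/>"
        + "<xsl:template match='/'>"
        + "<table><tr><th>Kind</th><th>Title</th></tr>"
        // A union is processed in document order, so chapters and sections interleave.
        + "<xsl:for-each select='//h:h2 | //h:h3'>"
        + "<tr><td><xsl:choose>"
        + "<xsl:when test='self::h:h2'>chapter</xsl:when>"
        + "<xsl:otherwise>section</xsl:otherwise>"
        + "</xsl:choose></td>"
        + "<td><xsl:value-of select='normalize-space(.)'/></td></tr>"
        + "</xsl:for-each>"
        + "</table>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Applies the stylesheet to an XHTML document given as a string. */
    static String transform(String xhtml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(xhtml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    public static void main(String[] args) {
        // Made-up sample markup standing in for a local copy of the spec page.
        String sample = "<html xmlns='http://www.w3.org/1999/xhtml'><body>"
                + "<h2>1. Introduction</h2><p>Body text</p>"
                + "<h3>1.1 Background</h3></body></html>";
        System.out.println(transform(sample));
    }
}
```

An XPath union (`|`) yields its nodes in document order, so each section row follows the row of the chapter it belongs to.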

EDIT 2: After further research, it seems that Thomas W.'s answer solves the problem, but you have to deal with XSS problems (see the tips in LarsH's answer).

Solution

In theory, you can do something like

<?xml version="1.0" encoding="UTF-8"?>
<?xml-stylesheet type="text/xsl" href="test.xsl"?>
<page href="http://www.w3.org/TR/xslt/index.htm"/>

and have a stylesheet like

<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="1.0" 
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform" 
    xmlns="http://www.w3.org/1999/xhtml"
    xmlns:h="http://www.w3.org/1999/xhtml">

  <xsl:template match="/">
    <html>
      <head></head>
      <body>
        <xsl:for-each select="document(*/@href)//h:h2">
          <xsl:copy-of select="."/>
        </xsl:for-each>
      </body>
    </html>
  </xsl:template>

</xsl:stylesheet>

But this doesn't really work across browsers (as far as I can tell, it only works in Chrome). One reason might be XSS security features that block loading the foreign page.
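
If the blocker really is a browser security feature, one possible workaround is to run the same transformation outside the browser, on a server or as a build step, and serve the resulting XHTML. The sketch below assumes a Java environment. It feeds the stylesheet above to the JDK's built-in XSLT 1.0 processor, with only the attribute quoting changed. The names `ServerSideTransform`, `run`, `pageFor`, and `runOnLocalCopy` are hypothetical. Whatever `document()` fetches must be well-formed XML, or the parse fails.

```java
import java.io.IOException;
import java.io.StringReader;
import java.io.StringWriter;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.nio.file.Files;
import java.nio.file.Path;
import javax.xml.transform.Transformer;
import javax.xml.transform.TransformerException;
import javax.xml.transform.TransformerFactory;
import javax.xml.transform.stream.StreamResult;
import javax.xml.transform.stream.StreamSource;

final class ServerSideTransform {
    // The stylesheet from the answer, with attribute values in single quotes.
    static final String XSLT =
        "<xsl:stylesheet version='1.0'"
        + " xmlns:xsl='http://www.w3.org/1999/XSL/Transform'"
        + " xmlns='http://www.w3.org/1999/xhtml'"
        + " xmlns:h='http://www.w3.org/1999/xhtml'>"
        + "<xsl:template match='/'>"
        + "<html><head></head><body>"
        + "<xsl:for-each select='document(*/@href)//h:h2'>"
        + "<xsl:copy-of select='.'/>"
        + "</xsl:for-each>"
        + "</body></html>"
        + "</xsl:template>"
        + "</xsl:stylesheet>";

    /** Transforms a <page href="..."/> document; document() loads the href target. */
    static String run(String pageXml) {
        try {
            Transformer t = TransformerFactory.newInstance()
                    .newTransformer(new StreamSource(new StringReader(XSLT)));
            StringWriter out = new StringWriter();
            t.transform(new StreamSource(new StringReader(pageXml)), new StreamResult(out));
            return out.toString();
        } catch (TransformerException e) {
            throw new IllegalStateException("XSLT transformation failed", e);
        }
    }

    /** Builds the <page> wrapper, escaping characters that are special in an attribute. */
    static String pageFor(String href) {
        return "<page href='" + href.replace("&", "&amp;").replace("'", "&apos;") + "'/>";
    }

    /** Saves xhtml to a temporary file and runs the stylesheet against its file: URI. */
    static String runOnLocalCopy(String xhtml) {
        try {
            Path file = Files.createTempFile("page", ".xhtml");
            try {
                Files.write(file, xhtml.getBytes(StandardCharsets.UTF_8));
                return run(pageFor(file.toUri().toString()));
            } finally {
                Files.deleteIfExists(file);
            }
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
    }

    public static void main(String[] args) {
        if (args.length > 0) {
            // A live URL; the target must be well-formed XML.
            System.out.println(run(pageFor(args[0])));
        } else {
            // Offline demo with made-up markup.
            System.out.println(runOnLocalCopy(
                "<html xmlns='http://www.w3.org/1999/xhtml'><body>"
                + "<h2>Alpha</h2><p>not copied</p><h2>Beta</h2></body></html>"));
        }
    }
}
```

Run it with no arguments for an offline demo against a temporary file. To try a live page, pass a URL such as `http://www.w3.org/TR/xslt/index.htm` as the first argument; this needs network access and a page that parses as XML.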
