Word Frequency Counter in XSLT

Question

I am trying to make a word frequency counter in XSLT. I want it to use stop words. I got started with Michael Kay's book. But I have trouble getting the stop words to work.

This code will work on any source XML file.

<?xml version="1.0" encoding="iso-8859-1"?>
<xsl:stylesheet
   version="2.0"
   xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

<xsl:output method="xml" indent="yes"/>

<xsl:template match="/">   
    <xsl:variable name="stopwords" select="'a about an are as at be by for from how I in is it of on or that the this to was what when where who will with'"/>
     <wordcount>
        <xsl:for-each-group group-by="." select="
            for $w in //text()/tokenize(., '\W+')[not(.=$stopwords)] return $w">
            <word word="{current-grouping-key()}" frequency="{count(current-group())}"/>
        </xsl:for-each-group>
     </wordcount>
</xsl:template>

</xsl:stylesheet>

I think the not(.=$stopwords) is where my problem is. But I'm not sure what to do about it.

Also, I'll take hints on how to load the stop words from an external file.

Recommended answer

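Your $stopwords variable is now a single string, so . = $stopwords compares each token with the entire list as one string and never matches. You want it to be a sequence of strings, so that the comparison is true whenever the token equals any one item. You can do this in any of the following ways: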

  • Change its declaration to

<xsl:variable name="stopwords" 
  select="('a', 'about', 'an', 'are', 'as', 'at', 
           'be', 'by', 'for', 'from', 'how', 
           'I', 'in', 'is', 'it', 
           'of', 'on', 'or', 
           'that', 'the', 'this', 'to', 
           'was', 'what', 'when', 'where', 
           'who', 'will', 'with')"/>

  • Change its declaration to

    <xsl:variable name="stopwords" 
      select="tokenize('a about an are as at 
                        be by for from how I in is it 
                        of on or that the this to was 
                        what when where who will with',
                        '\s+')"/>
    

  • Read it from an external XML document named (e.g.) stoplist.xml, of the form

    <stop-list>
      <p>This is a sample stop list [further description ...]</p>
      <w>a</w>
      <w>about</w>
      ...
    </stop-list>
    

    and then load it, e.g. with

    <xsl:variable name="stopwords"
      select="document('stoplist.xml')//w/string()"/>
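
Putting the pieces together, the question's stylesheet might be revised roughly as follows. This is a minimal sketch rather than a tested solution: it assumes an XSLT 2.0 processor, uses the tokenize() form of the stop list, and adds a [. ne ''] predicate that was not in the original code, on the assumption that tokenize() can return empty strings when a text node starts or ends with non-word characters.

```xml
<?xml version="1.0" encoding="UTF-8"?>
<xsl:stylesheet version="2.0"
    xmlns:xsl="http://www.w3.org/1999/XSL/Transform">

  <xsl:output method="xml" indent="yes"/>

  <!-- Stop words as a sequence of strings: one item per word -->
  <xsl:variable name="stopwords"
    select="tokenize('a about an are as at be by for from how I in is it of on or that the this to was what when where who will with', '\s+')"/>

  <xsl:template match="/">
    <wordcount>
      <!-- [. ne ''] drops empty tokens (an addition, see the note above).
           not(. = $stopwords) keeps a token only if it equals none of the stop words. -->
      <xsl:for-each-group group-by="."
          select="//text()/tokenize(., '\W+')[. ne ''][not(. = $stopwords)]">
        <word word="{current-grouping-key()}"
              frequency="{count(current-group())}"/>
      </xsl:for-each-group>
    </wordcount>
  </xsl:template>

</xsl:stylesheet>
```

Note that the comparison is case-sensitive, so "The" would still be counted even though "the" is in the stop list. If that matters, one option is to apply lower-case() to the text before tokenizing and to keep the stop list in lower case as well (which would mean writing 'i' instead of 'I').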
    
