Mapping a function over all the letters of a token in Python

Problem description

The purpose of this program is to read in an array of tokens, remove the punctuation, convert all the letters to lowercase, and then print the resulting array. The readTokens and depunctuateTokens functions both work correctly. My problem is with the decapitalizeTokens function. When I run the program, I receive this error:

the name of the program is words.py
['hello', 'hello1', 'hello2']
Traceback (most recent call last):
  File "words.py", line 41, in <module>
    main()    
  File "words.py", line 10, in main
    words = decapitalizeTokens(cleanTokens)
  File "words.py", line 35, in decapitalizeTokens
    if (ord(ch) <= ord('Z')):
TypeError: ord() expected string of length 1, but list found

My question is what formal parameters I should put into the decapitalizeTokens function in order to return the array resulting from the depunctuateTokens function, but with all the letters lowercase.

This is my program:

import sys
from scanner import *
arr=[]
def main():
    print("the name of the program is",sys.argv[0])
    for i in range(1,len(sys.argv),1):
        print("   argument",i,"is", sys.argv[i])
    tokens = readTokens("text.txt")
    cleanTokens = depunctuateTokens(arr)
    words = decapitalizeTokens(cleanTokens)

def readTokens(s):
    s=Scanner("text.txt")
    token=s.readtoken()
    while (token != ""):
        arr.append(token)
        token=s.readtoken()
    s.close()
    return arr

def depunctuateTokens(arr):
    result=[]
    for i in range(0,len(arr),1):
        string=arr[i]
        cleaned=""
        punctuation="""!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~"""
        for i in range(0,len(string),1):
            if string[i] not in punctuation:
                cleaned += string[i]
        result.append(cleaned)
    print(result)
    return result

def decapitalizeTokens(result):
    if (ord(result) <= ord('Z')):
        return chr(ord(result) + ord('a') - (ord('A')))
    else:
        print(result)
        return result


main()

Solution

Your decapitalizeTokens function works on a single character. You're passing it a list of strings. If you want to call it on every character of every string in that list, you need to loop over the list, and then loop over each string, somewhere.
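
A minimal reproduction of the error may help here. This is only a sketch: the helper name `decapitalize_char` is hypothetical (it is not in the original program), and its body simply mirrors the logic of the posted decapitalizeTokens function.

```python
def decapitalize_char(ch):
    # Same logic as the posted function: it assumes ch is ONE character.
    if ord(ch) <= ord('Z'):
        return chr(ord(ch) + ord('a') - ord('A'))
    return ch

tokens = ['Hello', 'hello1', 'hello2']

# A single character works as intended.
single = decapitalize_char('H')
print(single)

# A whole list does not: ord() cannot take a list.
try:
    decapitalize_char(tokens)
    error_message = None
except TypeError as exc:
    error_message = str(exc)
print(error_message)
```

The second call fails inside `ord()`, which is the same failure the traceback in the question reports.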

You can do this with explicit loop statements, like this:

words = []
for token in cleanTokens:
    word = ''
    for char in token:
        word += decapitalizeTokens(char)
    words.append(word)

… or by using comprehensions:

words = [''.join(decapitalizeTokens(char) for char in token) 
         for token in cleanTokens]


However, I think it would make far more sense to move the loops into the decapitalizeTokens function, both because of its plural name and because you already have exactly the same loops in the similarly named depunctuateTokens function. If you build decapitalizeTokens the same way you built depunctuateTokens, then your existing call works fine:

words = decapitalizeTokens(cleanTokens)
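
One possible shape for such a function is sketched below. It follows the loop structure of depunctuateTokens from the question. The sample input list is made up for illustration, and the explicit `'A' <= ch <= 'Z'` range check is an addition that the posted code does not have. Without it, any character with a code below `'A'` would be shifted as well.

```python
def decapitalizeTokens(tokens):
    """Return a new list in which ASCII uppercase letters are lowercased."""
    result = []
    for token in tokens:
        lowered = ""
        for ch in token:
            # Only shift characters that really are 'A'..'Z' (added check).
            if 'A' <= ch <= 'Z':
                lowered += chr(ord(ch) + ord('a') - ord('A'))
            else:
                lowered += ch
        result.append(lowered)
    return result

# Hypothetical sample data standing in for the output of depunctuateTokens.
cleanTokens = ['Hello', 'HELLO1', 'hello 2']
words = decapitalizeTokens(cleanTokens)
print(words)  # ['hello', 'hello1', 'hello 2']
```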


As a side note, the built-in lower method on strings already does what you want, so you could replace this whole mess with:

words = [token.lower() for token in cleanTokens]

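… which would also fix a nasty bug in your attempt. Consider what decapitalizeTokens would do with, say, a digit or a space: both have character codes at or below that of 'Z', so they would be shifted as if they were uppercase letters.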
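
The bug can be made concrete with a short sketch. The helper name `decapitalize_char` is hypothetical and reproduces only the per-character logic of the posted function, so the built-in `str.lower` can be compared against it.

```python
def decapitalize_char(ch):
    # Per-character logic from the question: anything <= 'Z' gets shifted.
    if ord(ch) <= ord('Z'):
        return chr(ord(ch) + ord('a') - ord('A'))
    return ch

shifted_digit = decapitalize_char('1')  # ord('1') is 49, and 49 + 32 = 81
shifted_space = decapitalize_char(' ')  # ord(' ') is 32, and 32 + 32 = 64

print(repr(shifted_digit), repr(shifted_space))  # 'Q' '@'
print(repr('1'.lower()), repr(' '.lower()))      # '1' ' '
```

So a token such as `hello1` would come back as `helloQ`, whereas `str.lower` leaves non-letters alone.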

Likewise, depunctuateTokens can be replaced by a call to the translate method. For example (this is slightly different in Python 2.x, but you can read the docs and figure it out):

punctuation="""!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~"""
punctmap = {ord(char): None for char in punctuation}
cleanTokens = [token.translate(punctmap) for token in cleanTokens]
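
Putting both suggestions together, the whole cleanup step could look roughly like the sketch below for Python 3. The function name `clean` and the sample tokens are made up for illustration. It swaps in `string.punctuation` from the standard library, which holds the same ASCII punctuation characters as the literal above, and builds the table with `str.maketrans` instead of a dict comprehension.

```python
import string

# The third argument of str.maketrans lists characters to delete.
punctmap = str.maketrans('', '', string.punctuation)

def clean(tokens):
    """Strip ASCII punctuation from each token, then lowercase it."""
    return [token.translate(punctmap).lower() for token in tokens]

# Hypothetical sample input.
words = clean(['Hello,', 'World!', "it's", 'A-1'])
print(words)  # ['hello', 'world', 'its', 'a1']
```

Doing both steps in one comprehension avoids building an intermediate list, though keeping them as two separate functions, as in the question, is just as valid.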
