Mapping a function over all the letters of a token in Python
Problem description
The purpose of this program is to read in an array of tokens, remove the punctuation, convert all the letters to lowercase, and then print the resulting array. The readTokens and depunctuateTokens functions both work correctly. My problem is with the decapitalizeTokens function. When I run the program, I receive this error:
the name of the program is words.py
['hello', 'hello1', 'hello2']
Traceback (most recent call last):
  File "words.py", line 41, in <module>
    main()
  File "words.py", line 10, in main
    words = decapitalizeTokens(cleanTokens)
  File "words.py", line 35, in decapitalizeTokens
    if (ord(ch) <= ord('Z')):
TypeError: ord() expected string of length 1, but list found
My question is what formal parameters I should put into the decapitalizeTokens function in order to return the array resulting from the depunctuateTokens function, but with all the letters lowercase.
This is my program:
import sys
from scanner import *

arr=[]

def main():
    print("the name of the program is",sys.argv[0])
    for i in range(1,len(sys.argv),1):
        print(" argument",i,"is", sys.argv[i])
    tokens = readTokens("text.txt")
    cleanTokens = depunctuateTokens(arr)
    words = decapitalizeTokens(cleanTokens)

def readTokens(s):
    s=Scanner("text.txt")
    token=s.readtoken()
    while (token != ""):
        arr.append(token)
        token=s.readtoken()
    s.close()
    return arr

def depunctuateTokens(arr):
    result=[]
    for i in range(0,len(arr),1):
        string=arr[i]
        cleaned=""
        punctuation="""!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~"""
        for i in range(0,len(string),1):
            if string[i] not in punctuation:
                cleaned += string[i]
        result.append(cleaned)
    print(result)
    return result

def decapitalizeTokens(result):
    if (ord(result) <= ord('Z')):
        return chr(ord(result) + ord('a') - (ord('A')))
    else:
        print(result)
        return result

main()
Your decapitalizeTokens function works on a single character. You're passing it a list of strings. If you want to call it on every character of every string in that list, you need to loop over the list, and then loop over each string, somewhere.
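To see the mismatch concretely, here is a small sketch that reproduces the error outside the program. The `tokens` list simply mirrors the output shown in the question; nothing else is assumed.

```python
tokens = ['hello', 'hello1', 'hello2']

try:
    # This is effectively what the program does: ord() receives the whole list.
    ord(tokens)
except TypeError as exc:
    message = str(exc)
    print("TypeError:", message)

# ord() is only happy with a single character, e.g. the first letter of the first token.
first_code = ord(tokens[0][0])
print(first_code)  # 104, the code point of 'h'
```

So the call has to reach individual characters, which means looping over the list and then over each string.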
You can do this with explicit loop statements, like this:
words = []
for token in cleanTokens:
    word = ''
    for char in token:
        word += decapitalizeTokens(char)
    words.append(word)
… or by using comprehensions:
words = [''.join(decapitalizeTokens(char) for char in token)
         for token in cleanTokens]
However, I think it would make far more sense to move the loops into the decapitalizeTokens function, both because of its plural name and because you have exactly the same loops in the similarly named depunctuateTokens function. If you build decapitalizeTokens the same way you built depunctuateTokens, then your existing call works fine:
words = decapitalizeTokens(cleanTokens)
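A minimal sketch of what that restructuring might look like. The per-character helper name `decapitalizeChar` is hypothetical (it is not in the original program), and the explicit 'A' to 'Z' range check is an addition so that digits and other characters pass through unchanged.

```python
def decapitalizeChar(ch):
    # Hypothetical helper: lowercase one ASCII capital, leave anything else alone.
    if 'A' <= ch <= 'Z':
        return chr(ord(ch) + ord('a') - ord('A'))
    return ch


def decapitalizeTokens(tokens):
    # Same shape as depunctuateTokens: loop over the tokens,
    # then over the characters of each token.
    result = []
    for token in tokens:
        lowered = ""
        for ch in token:
            lowered += decapitalizeChar(ch)
        result.append(lowered)
    return result


words = decapitalizeTokens(['Hello', 'HELLO1', 'hello2'])
print(words)
```

With the loops inside the function, the caller can keep passing the whole list, just as it does for depunctuateTokens.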
As a side note, the built-in lower method on strings already does what you want, so you could replace this whole mess with:
words = [token.lower() for token in cleanTokens]
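… which would also fix a nasty bug in your attempt. Consider what, say, decapitalizeTokens would do with a digit or a space: both have character codes below ord('Z'), so they would be shifted as if they were uppercase letters.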
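A quick way to check this is to run the question's per-character logic on a few inputs. In the sketch below the function is renamed `decapitalizeOriginal` (a hypothetical name) and the debugging print is dropped; the comparison itself is unchanged.

```python
def decapitalizeOriginal(ch):
    # The comparison from the question: anything at or below 'Z' gets shifted.
    if ord(ch) <= ord('Z'):
        return chr(ord(ch) + ord('a') - ord('A'))
    else:
        return ch


for ch in ['A', '1', ' ']:
    # Compare the hand-rolled result with the built-in str.lower().
    print(repr(ch), '->', repr(decapitalizeOriginal(ch)), 'vs', repr(ch.lower()))
```

The digit and the space are both shifted by 32 code points, whereas str.lower() leaves them untouched.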
And, likewise, depunctuateTokens can be similarly replaced by a call to the translate method. For example (slightly different for Python 2.x, but you can read the docs and figure it out):
punctuation="""!"#$%&'()*+,-./:;<=>?@[\]^_`{|}~"""
punctmap = {ord(char): None for char in punctuation}
cleanTokens = [token.translate(punctmap) for token in cleanTokens]
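Putting both suggestions together, the whole cleanup can be sketched in a few lines of Python 3. This is one possible arrangement, not the only one: `normalizeTokens` is a hypothetical name, and the standard library's `string.punctuation` and `str.maketrans` stand in for the hand-written punctuation string and dict above.

```python
import string

# Three-argument str.maketrans: every character in the third argument
# is mapped to None, so translate() deletes it.
punctmap = str.maketrans('', '', string.punctuation)


def normalizeTokens(tokens):
    # Hypothetical helper: strip ASCII punctuation, then lowercase each token.
    return [token.translate(punctmap).lower() for token in tokens]


words = normalizeTokens(['Hello,', '"HELLO1"', 'hello2!'])
print(words)
```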