Find strings and substrings from the wordlist


Problem description

I have a test.txt file and want to find strings and substrings from the wordlist it contains:

<aardwolf>
<Aargau>
<Aaronic>
<aac>
<akac>
<abaca>
<abactinal>
<abacus>  

The test.py file:

import sys  # the sys module
import os
import re
def hasattr(str,list):
    expr = re.compile(str)
    # yield the elements
    return [elem for elem in list if expr.match(elem)]

isword = {}
FH = open(sys.argv[1],'r',encoding="ISO-8859-1")
for strLine in FH.readlines():  isword.setdefault(''.join(sorted(strLine[1:strLine.find('>')].upper())),[]).append(strLine[:-1])
print (isword)
basestring=str()
for ARGV in sys.argv[2:]:
    print ("\n*** %s\n" %ARGV )#print Argv

diffpatletters = re.compile(u'[a-zA-Z]').findall(ARGV.upper())
#print (diffpatletters)
diffpat = '.*' + '(.*)'.join(sorted(diffpatletters)) + '.*'
#print (diffpat)
for KEY in hasattr(diffpat,isword.keys()):
#       print (KEY)
       SUBKEY = KEY
       for X in diffpatletters:
         #print (X)
         SUBKEY1 = SUBKEY.replace(X,'')
          #print (SUBKEY)
       if SUBKEY1 in isword:
           #print (SUBKEY)
           basestring+=  "%s -> %s" %(isword[KEY], isword[SUBKEY1])
print (basestring + "\n")

Below is how I run the file from the command line:

python test.py test.txt  aack aadfl

The expected output is the matched strings and substrings for each argument after the second one. My basestring is not printing.

Recommended answer

Do you have to use regexp? If not, do you want results like this?

with open('test.txt', 'r') as f:
    s = f.read()
s = s.split('\n')
s

Out[1]:
['<aardwolf>',
 '<Aargau>',
 '<Aaronic>',
 '<aac>',
 '<akac>',
 '<abaca>',
 '<abactinal>',
 '<abacus>  ']
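
Note that `s.split('\n')` keeps the trailing spaces on `'<abacus>  '`, and it yields an empty final string if the file ends with a newline. A minimal sketch of a cleaner loader follows, assuming every entry sits on its own line wrapped in angle brackets. The helper name `parse_wordlist` is hypothetical and not part of the original answer.

```python
def parse_wordlist(text):
    """Return the bare words from text holding one '<word>' entry per line."""
    words = []
    for line in text.splitlines():
        entry = line.strip()          # drop trailing spaces and stray '\r'
        if not entry:
            continue                  # skip blank lines
        # Remove the surrounding angle brackets if both are present.
        if entry.startswith('<') and entry.endswith('>'):
            entry = entry[1:-1]
        words.append(entry)
    return words


# Inline sample standing in for the contents of test.txt.
sample = "<aardwolf>\n<Aargau>\n<abacus>  \n"
print(parse_wordlist(sample))  # ['aardwolf', 'Aargau', 'abacus']
```

With a real file, the same function could be fed the result of `f.read()`.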

For a list-type result:

ARGVs = ['aard', 'onic', 'abacu']

matches = [x for x in s for arg in ARGVs if arg.lower() in x.lower()]
print(matches)

Out[2]:
['<aardwolf>', '<Aaronic>', '<abacus>  ']
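
One caveat with the nested comprehension: a line is emitted once per matching argument, so with arguments such as `'aba'` and `'ac'` the entry `'<abaca>'` would appear twice. A possible variant that returns each line at most once is sketched below. The function name `match_any` is an assumption made for illustration.

```python
s = ['<aardwolf>', '<Aargau>', '<Aaronic>', '<aac>',
     '<akac>', '<abaca>', '<abactinal>', '<abacus>  ']
ARGVs = ['aba', 'ac']


def match_any(lines, needles):
    """Return each line once if it contains any needle, ignoring case."""
    lowered = [n.lower() for n in needles]
    return [line for line in lines
            if any(n in line.lower() for n in lowered)]


print(match_any(s, ARGVs))
# ['<aac>', '<akac>', '<abaca>', '<abactinal>', '<abacus>  ']
```

`any()` stops at the first needle that matches, which is why the order of the input lines is preserved and no duplicates occur.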

For a dict-type result:

ARGVs = ['aard', 'onic', 'abacu', 'aaro', 'ac']

{key:[x for x in s if key in x] for key in ARGVs if len([x for x in s if key in x]) != 0}

Out[3]:

{'aard': ['<aardwolf>'],
 'onic': ['<Aaronic>'],
 'abacu': ['<abacus>  '],
 'ac': ['<aac>', '<akac>', '<abaca>', '<abactinal>', '<abacus>  ']}
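
In the output above, `'aaro'` is absent because `key in x` is case-sensitive and the only candidate line is `'<Aaronic>'`. If case should be ignored, as in the list example, something like the following sketch may be closer to what is wanted. It also builds each match list only once. The name `group_matches` is hypothetical.

```python
s = ['<aardwolf>', '<Aargau>', '<Aaronic>', '<aac>',
     '<akac>', '<abaca>', '<abactinal>', '<abacus>  ']
ARGVs = ['aard', 'onic', 'abacu', 'aaro', 'ac']


def group_matches(lines, needles):
    """Map each needle to the lines containing it, ignoring case."""
    result = {}
    for needle in needles:
        hits = [line for line in lines if needle.lower() in line.lower()]
        if hits:                      # keep only needles with a match
            result[needle] = hits
    return result


grouped = group_matches(s, ARGVs)
print(grouped['aaro'])  # ['<Aaronic>']
```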

Using RegExp:

import re

with open('test.txt', 'r') as f:
    s = f.read()

ARGVs = ['wol','ac']
cond = '|'.join([rf'\w*{patt}\w*' for patt in ARGVs])
re.findall(cond,s)  

Out[4]:
['aardwolf', 'aac', 'akac', 'abaca', 'abactinal', 'abacus']
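
If the search terms come from the command line, they may contain regex metacharacters, and the pattern above is case-sensitive. A hedged sketch of a more defensive variant is shown here. It assumes that terms should be matched literally and without regard to case. The function name `find_words` and the inline `text` sample are illustrative only.

```python
import re

# Inline sample standing in for the contents of test.txt.
text = ("<aardwolf>\n<Aargau>\n<Aaronic>\n<aac>\n"
        "<akac>\n<abaca>\n<abactinal>\n<abacus>  \n")
ARGVs = ['wol', 'ac', 'AARG']


def find_words(text, needles):
    """Return every word in text containing any needle, ignoring case."""
    # re.escape keeps characters such as '.' or '*' in a needle literal.
    alternatives = '|'.join(re.escape(n) for n in needles)
    # Non-capturing group, so findall returns the whole matched word.
    pattern = re.compile(rf'\w*(?:{alternatives})\w*', re.IGNORECASE)
    return pattern.findall(text)


print(find_words(text, ARGVs))
# ['aardwolf', 'Aargau', 'aac', 'akac', 'abaca', 'abactinal', 'abacus']
```

Because `\w` does not match `<`, `>`, or spaces, the returned words come back without the angle brackets or trailing whitespace.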
