How to extract the common words before a particular symbol and find a particular word

Problem description

If I have a dictionary:

mydict = {"g18_84pp_2A_MVP1_GoodiesT0-HKJ-DFG_MIX-CMVP1_Y1000-MIX.txt" : 0,
          "g18_84pp_2A_MVP2_GoodiesT0-HKJ-DFG_MIX-CMVP2_Y1000-MIX.txt" : 1,
          "g18_84pp_2A_MVP3_GoodiesT0-HKJ-DFG_MIX-CMVP3_Y1000-MIX.txt" : 2,
          "g18_84pp_2A_MVP4_GoodiesT0-HKJ-DFG_MIX-CMVP4_Y1000-MIX.txt" : 3,
          "g18_84pp_2A_MVP5_GoodiesT0-HKJ-DFG_MIX-CMVP5_Y1000-MIX.txt" : 4,
          "g18_84pp_2A_MVP6_GoodiesT0-HKJ-DFG_MIX-CMVP6_Y1000-MIX.txt" : 5,
          "h18_84pp_3A_MVP1_GoodiesT1-HKJ-DFG-CMVP1_Y1000-FIX.txt" : 6,
          "g18_84pp_2A_MVP7_GoodiesT0-HKJ-DFG_MIX-CMVP7_Y1000-MIX.txt" : 7,
          "h18_84pp_3A_MVP2_GoodiesT1-HKJ-DFG-CMVP2_Y1000-FIX.txt" : 8,
          "h18_84pp_3A_MVP3_GoodiesT1-HKJ-DFG-CMVP3_Y1000-FIX.txt" : 9,
          "p18_84pp_2B_MVP1_GoodiesT2-HKJ-DFG-CMVP3_Y1000-FIX.txt" : 10}

I want to extract the common part before the first -, which is g18_84pp_2A_MVP_GoodiesT0.

I also want to append _MIX after g18_84pp_2A_MVP_GoodiesT0 when the particular word MIX is found in the first group. Assuming I can classify the names into groups depending on whether MIX or FIX appears in the keys of mydict, the final output dictionary would be:

OutputNameDict= {"g18_84pp_2A_MVP_GoodiesT0_MIX" : 0,
                  "h18_84pp_3A_MVP_GoodiesT1_FIX" : 1,
                  "p18_84pp_2B_MVP_FIX": 2}

Is there any function I could use to find the common part? How do I pick up the word before or after a particular symbol like -, and find particular words like MIX or FIX?

Recommended answer

You can use split to get the common part:

s = "g18_84pp_2A_MVP1_GoodiesT0-HKJ-DFG_MIX-CMVP1_Y1000-MIX.txt"
n = s.split('-')[0]

In fact, split gives you a list of every token delimited by '-', so s.split('-') yields:

['g18_84pp_2A_MVP1_GoodiesT0', 'HKJ', 'DFG_MIX', 'CMVP1_Y1000', 'MIX.txt']
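If only the text before the first '-' is needed, str.partition is an alternative that stops at the first separator instead of building the whole list. A minimal sketch, using the sample filename from the question (the variable names head, tail and last are only for illustration):

```python
s = "g18_84pp_2A_MVP1_GoodiesT0-HKJ-DFG_MIX-CMVP1_Y1000-MIX.txt"

# partition splits at the first '-' only and returns (head, separator, tail)
head, sep, tail = s.partition('-')
print(head)  # g18_84pp_2A_MVP1_GoodiesT0
print(tail)  # HKJ-DFG_MIX-CMVP1_Y1000-MIX.txt

# rsplit with maxsplit=1 splits once from the right, giving the last token
last = s.rsplit('-', 1)[-1]
print(last)  # MIX.txt
```

Both calls leave s unchanged, since Python strings are immutable.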

To see if MIX or FIX is in a string, you can use in:

if 'MIX' in s:
    print("then MIX is in the string s")
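Note that in matches the substring anywhere in the name. If the group should be decided only by the final token (the MIX or FIX just before .txt), a stricter check might look like the sketch below. The helper name get_kind is hypothetical, and it assumes the marker is always the last '-'-separated token before the extension, as in the question's sample data.

```python
import os

def get_kind(filename):
    """Return 'MIX' or 'FIX' if it is the last '-'-separated token, else None."""
    stem, _ext = os.path.splitext(filename)  # drop the ".txt" extension
    last = stem.rsplit('-', 1)[-1]           # token after the final '-'
    return last if last in ('MIX', 'FIX') else None

print(get_kind("g18_84pp_2A_MVP1_GoodiesT0-HKJ-DFG_MIX-CMVP1_Y1000-MIX.txt"))  # MIX
print(get_kind("h18_84pp_3A_MVP1_GoodiesT1-HKJ-DFG-CMVP1_Y1000-FIX.txt"))      # FIX
```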

If you want to get rid of the numbers after 'MVP', you can use the re module:

import re
s = 'g18_84pp_2A_MVP1_GoodiesT0'
s = re.sub('MVP[0-9]*','MVP',s)

Here is a sample function that returns a list of the common parts, one entry per key (so repeated prefixes appear more than once):

def foo(mydict):
    return [re.sub('MVP[0-9]*', 'MVP', k.split('-')[0]) for k in mydict]
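Putting the pieces together, one possible way to build the output dictionary from the question is sketched below. The function name build_output and the numbering rule (groups are numbered in order of first appearance, following the original values in mydict) are assumptions, not something the answer specifies. The sketch also keeps the full prefix for every group, so the third key comes out as p18_84pp_2B_MVP_GoodiesT2_FIX rather than the shorter p18_84pp_2B_MVP_FIX shown in the question.

```python
import re

def build_output(mydict):
    """Map each distinct '<prefix>_<MIX|FIX>' name to a running index."""
    out = {}
    # Visit keys in the order of their original values
    for name in sorted(mydict, key=mydict.get):
        # Part before the first '-', with the digits after 'MVP' removed
        prefix = re.sub(r'MVP[0-9]*', 'MVP', name.split('-')[0])
        if 'MIX' in name:
            kind = 'MIX'
        elif 'FIX' in name:
            kind = 'FIX'
        else:
            continue  # skip names that belong to neither group
        key = prefix + '_' + kind
        if key not in out:
            out[key] = len(out)  # next unused index
    return out

# A shortened version of the dictionary from the question
sample = {
    "g18_84pp_2A_MVP1_GoodiesT0-HKJ-DFG_MIX-CMVP1_Y1000-MIX.txt": 0,
    "g18_84pp_2A_MVP2_GoodiesT0-HKJ-DFG_MIX-CMVP2_Y1000-MIX.txt": 1,
    "h18_84pp_3A_MVP1_GoodiesT1-HKJ-DFG-CMVP1_Y1000-FIX.txt": 6,
    "p18_84pp_2B_MVP1_GoodiesT2-HKJ-DFG-CMVP3_Y1000-FIX.txt": 10,
}
result = build_output(sample)
print(result)
```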
