Pandas: Make function map partial Dict match


Question

This function looks at strings in a pandas DataFrame. If the substring captured by the regular expression matches a key in the dictionary, the corresponding dictionary value is passed on to other parts of the function, which finally returns statement.

def f(value):
    f1 = lambda x: dictionary[regex.findall(x)[0]] if regex.findall(x)[0] in dictionary else ""
    match = f1(value)
    #Do stuff
    return statement

Problem:

How can I make it accept partial matches and replace the matching word, while keeping the rest of the string intact? Right now it only accepts literal matches.

Goal:

The string is "BULL GOOGLE X3 VON". I would like the entry {"GOOG": "Google"} in the dictionary to be sufficient to transform the word to "Google". The transformed string would be "BULL Google X3 VON", and the function would pass on "Google".

Note: I want to continue using a dict for the implementation because other parts of the program depend on it.

Code:

import re
import pandas as pd

#DataFrame
df = pd.DataFrame(["BULL GOOGLE X3 VON", "BEAR TWITTER 12X S"], columns=["Name"])

#Dict
google = {"GOOG":"Google"}
twitter = {"TWITT":"Twitter"}
dictionary = google.copy()
dictionary.update(twitter)

#Regex
regex = re.compile(r"\s(\S+)\s", flags=re.IGNORECASE)

#Function
def f(value):
    f1 = lambda x: dictionary[regex.findall(x)[0]] if regex.findall(x)[0] in dictionary else ""
    match = f1(value)
    #Do stuff
    return statement

#Map Function
df["Statement"] = df["Name"].map(lambda x:f(x))
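
As a small illustration of why only literal matches succeed, the sketch below reuses the regex and dictionary from the code above on one sample string. The variable names word, is_exact and is_prefix are introduced here only for illustration.

```python
import re

# Same dictionary and regex as in the question.
dictionary = {"GOOG": "Google", "TWITT": "Twitter"}
regex = re.compile(r"\s(\S+)\s", flags=re.IGNORECASE)

# The regex captures the whole word between the first two spaces.
word = regex.findall("BULL GOOGLE X3 VON")[0]
print(word)  # GOOGLE

# The dict lookup is an exact-key test, so the full word "GOOGLE" misses "GOOG".
is_exact = word in dictionary
print(is_exact)  # False

# A prefix test is what the goal actually asks for.
is_prefix = any(word.startswith(key) for key in dictionary)
print(is_prefix)  # True
```

The captured word is longer than the key, so the membership test fails even though the key is a prefix of the word.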

Ideas:

If it's possible to modify the function directly to accept partial matches, that would be good.

Otherwise, a solution might be to first replace the matching word in the string, keeping the rest of the string intact, and then match the regex substring against the dictionary. These steps could happen in a temporary column so that the column "Name" keeps its original state for future use.

Recommended answer

I think this might be what you are looking for.

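import re
import pandas as pd

df = pd.DataFrame(["BULL GOOGLE X3 VON", "BEAR TWITTER 12X S"], columns=["Name"])

#Dict
google = {"GOOG":"Google"}
twitter = {"TWITT":"Twitter"}
dictionary = google.copy()
dictionary.update(twitter)

#Regex: a whole word (group 1) that starts with one of the dictionary keys (group 2)
regex = re.compile(r"\b((%s)\S*)\b" % "|".join(dictionary.keys()), re.I)

def dictionary_lookup(match):
    #Replacement callback for regex.sub: map the matched key to its dictionary value
    return dictionary[match.group(2)]

#Function
def f(value):
    #Value for the first matching key, e.g. "Google"
    match = dictionary[regex.search(value).group(2)]
    #Do stuff
    #Replace each matched word, leaving the rest of the string intact
    statement = regex.sub(dictionary_lookup, value)
    return statement

#Map Function
df["Statement"] = df["Name"].map(f)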

This matches any word that starts with one of the keys in the dictionary, assigns the corresponding dictionary value to the variable match, and then returns the original string with the matched word replaced.
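
The answer's code assumes that every row contains a matching word and that the matched text has the same case as the dictionary key. Otherwise regex.search returns None, or the lookup raises KeyError despite the case-insensitive flag. The following is a minimal sketch of one way to relax both assumptions using only the re module. The helper name replace_partial, the upper-casing of the captured prefix (which assumes all keys are upper-case, as in the question), and the empty-string fallback are illustrative choices, not part of the original answer.

```python
import re

# Lookup table mirroring the dictionary from the question (keys assumed upper-case).
dictionary = {"GOOG": "Google", "TWITT": "Twitter"}

# Escape each key so regex metacharacters are matched literally, and try
# longer keys first so overlapping prefixes resolve to the most specific key.
keys = sorted(dictionary, key=len, reverse=True)
pattern = re.compile(r"\b(%s)\S*\b" % "|".join(map(re.escape, keys)), re.IGNORECASE)


def replace_partial(value):
    """Return (looked-up value or "", string with matched words replaced)."""
    found = pattern.search(value)
    if found is None:
        # No key is a prefix of any word: leave the string untouched.
        return "", value
    # Normalise the captured prefix to upper-case before the dict lookup,
    # because the pattern itself ignores case.
    name = dictionary[found.group(1).upper()]
    replaced = pattern.sub(lambda m: dictionary[m.group(1).upper()], value)
    return name, replaced


print(replace_partial("BULL GOOGLE X3 VON"))  # ('Google', 'BULL Google X3 VON')
print(replace_partial("bear twitter 12x s"))  # ('Twitter', 'bear Twitter 12x s')
print(replace_partial("BEAR APPLE 2X"))       # ('', 'BEAR APPLE 2X')
```

Because the helper takes and returns plain strings, it could be applied to the "Name" column with Series.map in the same way as f, writing the result to a separate column so that "Name" stays unchanged.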
