Pandas: replace the value of a column in a dataframe only where a condition is true


Question

I have a problem with replacing values in a pandas dataframe.

I want to check whether each value in a column ('URL') of a dataframe contains any of several strings.

If it does, I want to replace the value of another column in the dataframe, but only on the SAME ROW. If one of the strings is found in a URL from the 'URL' column, I want to write that string in the 'Model' column of the same row and write, for example, 'Samsung' in the 'Brand' column.

At the moment, when one of the if conditions is true, it replaces all the values in the other columns, not just the one on the matching row, and I don't want that.

Python code:

import pandas as pd

dataframe_initial = pd.DataFrame()
dataframe_initial = pd.read_excel('tele2.xlsx')
dataframe_initial['Model'] = ""
dataframe_initial['Brand'] = ""

str1 = 'galaxy-S9'
str2 = 'note-9'
str3 = 'galaxy-a6'
str4 = 'Huawei'
str5 = 'P20'
str6 = 'Apple'
str7 = 'Iphone-X'

for url in dataframe_initial['URL']:
    if str1 in url:
        dataframe_initial['Model'] = str(str1)
        dataframe_initial['Brand'] = str('Samsung')
    if str3 in url:
        dataframe_initial['Model'] = str(str3)
        dataframe_initial['Brand'] = str('Samsung')
    if str2 in url:
        dataframe_initial['Model'] = str(str2)
        dataframe_initial['Brand'] = str('Samsung')

Recommended answer

First, you should avoid creating a variable number of variables. You can use a list instead:

values = ['galaxy-S9', 'note-9', 'galaxy-a6', 'Huawei', 'P20', 'Apple', 'Iphone-X']

Next, you are iterating over rows and, on each iteration, assigning to an entire series: dataframe_initial['Model'] = ... sets every row of the column, not just the current one. This is inefficient and incorrect. A better idea is to iterate over your list of values and use pandas Boolean indexing:

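# df is the dataframe from the question (named dataframe_initial there)
for value in values:
    # Boolean mask: True on rows whose URL contains the literal substring
    mask = df['URL'].str.contains(value, regex=False)
    # Update only the matching rows
    df.loc[mask, 'Model'] = value
    df.loc[mask, 'Brand'] = 'Samsung'  # hard-coded; 'Huawei', 'Apple', etc. need their own brand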
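
To try the Boolean-indexing loop end to end, here is a minimal sketch on a small made-up frame. The URLs and the brand_by_model dictionary are assumptions for illustration (they are not in the question); the dictionary is one possible way to give each model its own brand instead of writing 'Samsung' for every match.

```python
import pandas as pd

# Hypothetical sample data standing in for tele2.xlsx.
df = pd.DataFrame({
    "URL": [
        "https://example.com/samsung/galaxy-S9",
        "https://example.com/samsung/note-9",
        "https://example.com/Apple/Iphone-X",
        "https://example.com/other/unknown-phone",
    ]
})
df["Model"] = ""
df["Brand"] = ""

# Hypothetical mapping from model substring to brand.
brand_by_model = {
    "galaxy-S9": "Samsung",
    "note-9": "Samsung",
    "galaxy-a6": "Samsung",
    "P20": "Huawei",
    "Iphone-X": "Apple",
}

for model, brand in brand_by_model.items():
    # True only on rows whose URL contains the literal model string.
    mask = df["URL"].str.contains(model, regex=False)
    # Write to the matching rows only; other rows keep their values.
    df.loc[mask, "Model"] = model
    df.loc[mask, "Brand"] = brand

print(df)
```

Rows whose URL matches none of the strings keep the empty 'Model' and 'Brand' values they started with. The match is case-sensitive, as in the original loop.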

Note that you don't need to call str on objects which are already strings.
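
To see why the loop in the question overwrote every row, compare the two kinds of assignment side by side. This is a small sketch on a made-up two-row frame; the URLs are hypothetical.

```python
import pandas as pd

# Hypothetical two-row frame: only the first URL contains 'galaxy-S9'.
df = pd.DataFrame({"URL": ["shop/galaxy-S9", "shop/other"], "Model": ["", ""]})

# Assigning a scalar to a column broadcasts it to every row,
# which is what the loop in the question does.
broadcast = df.copy()
broadcast["Model"] = "galaxy-S9"

# Assigning through .loc with a Boolean mask touches only matching rows.
masked = df.copy()
mask = masked["URL"].str.contains("galaxy-S9", regex=False)
masked.loc[mask, "Model"] = "galaxy-S9"

print(broadcast["Model"].tolist())
print(masked["Model"].tolist())
```

In the first case both rows end up with the model name; in the second, only the row whose URL matched is changed.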
