如果一个数据框列中包含的匹配字符串与另一数据框列匹配,则取消该字符串 [英] Nullify the matched string contained in one Data frame column if that is a match with another Data frame column

查看:43
本文介绍了如果一个数据框列中包含的匹配字符串与另一数据框列匹配,则取消该字符串的处理方法,对大家解决问题具有一定的参考价值,需要的朋友们下面随着小编来一起学习吧!

问题描述

我需要编写一个脚本来读取 CSV 并删除出现在另一个单元格中的字符.即:

I need to do a script that read a CSV and delete the characters that appears in another cell. I.e:

示例

在第 4 行的calle"列中,出现cod_postal"列中出现的28011"我需要从calle"列中删除28011",但保持其余部分不变

In the line 4, in "calle" column, appear the '28011', that appear in column "cod_postal" I need to delete '28011' from "calle" column but keep the rest untouched

我尝试了一些简单的脚本并进行了研究,但无法达到我所需要的.

I tried some simple scripts and researching but I can't reach what I need.

是的,图像是一个例子,我有一个包含 2k 行的完整 CSV

Yeah, the image is a example, I have a full CSV with 2k lines

我试过这样的事情,但我无法让它工作..

I tried something like this but I can't get it to work..

#-*-coding: latin1 -*-
import csv
import pandas

with open ('C:/trabajos/dani_cliente.csv') as csvfile:
    readcsv = csv.reader (csvfile, delimiter = ';')
    for row in readcsv:
        df ['cod_postal'] = np.where(df["cod_postal"]) < threshold, 
0,alt_value)
        print (row)    

编辑 3:也试试这个,可以开始工作,但仅限于指定的字符,我需要 CSV 中的每个cod_postal"

EDIT 3: Trying also this, can get to work but only for specified character, and I would need every "cod_postal" in the CSV

#-*-coding: latin1 -*-

with open("C:/trabajos/extraccion_copia2.csv", 'r') as infile, \
     open("C:/trabajos/dani_cliente.test.csv", 'w') as outfile:


   # for row in infile
    #readcsv = csv.reader(infile, delimiter=';')
    data = infile.read()
    data = data.replace("28011", " ")
    outfile.write(data)

但是使用完整的 CSV 而不是示例 CSV,我收到以下错误

But using the full CSV instead of the sample one, I get the following error

回溯(最近一次调用最后一次):文件C:/Users/dalo​​nso/PycharmProjects/untitled/switchtest.py",第 18 行,在数据 = infile.read()文件C:\Users\dalonso\AppData\Local\Programs\Python\Python37\lib\encodings\cp1252.py",第 23 行,解码返回 codecs.charmap_decode(input,self.errors,decoding_table)[0]UnicodeDecodeError: 'charmap' 编解码器无法解码位置 577860 中的字节 0x90:字符映射到未定义

Traceback (most recent call last): File "C:/Users/dalonso/PycharmProjects/untitled/switchtest.py", line 18, in data = infile.read() File "C:\Users\dalonso\AppData\Local\Programs\Python\Python37\lib\encodings\cp1252.py", line 23, in decode return codecs.charmap_decode(input,self.errors,decoding_table)[0] UnicodeDecodeError: 'charmap' codec can't decode byte 0x90 in position 577860: character maps to undefined

推荐答案

我想我明白这个问题了...如果它只是一个值,你可以简单地使用

I think I understand the question... If it's just one value, you can simply use

df.loc[4,'cod_postal'] = 0
#if you want, can use NaN, but suggest just keeping 0. 

df['cod_postal].iloc[4] = 0

如果有一些特定的指南,请使用 np.where() 或 pd.where()

If there's some specific guideline, use np.where() or pd.where()

https://pandas.pydata.org/pandas-docs/stable/reference/api/pandas.DataFrame.where.html

np.where(condition, true_val, false_val) 
np.where(condition, true_val) # or if you want untouched in else condition

df['cod_postal'] = np.where(df["cod_postal"] < threshold, 0, alt_value)

下次您提问时,请在您的问题中输入数据框/您的代码

next time you ask, please enter the dataframe/your code in your question

这篇关于如果一个数据框列中包含的匹配字符串与另一数据框列匹配,则取消该字符串的文章就介绍到这了,希望我们推荐的答案对大家有所帮助,也希望大家多多支持IT屋!

查看全文
相关文章
登录 关闭
扫码关注1秒登录
发送“验证码”获取 | 15天全站免登陆