Adding new column with condition


Question

I need to extend a data frame by adding more columns. A sample of my data is:

Date      Sentence
28 Jan    who.c
30 Jan    house.a
02 Feb    eurolet.it

I need to add another column, Tp, that assigns a value to each link:


  • if a sentence ends with a, assign apartment; if it ends with b, assign bungalow; and so on, as listed in original;
  • if a sentence ends with UK, assign United Kingdom; if it ends with IT, assign Italy; and so on, with values taken from country.

I would expect something like this:

Date      Sentence      Tp
28 Jan    who.c         church
30 Jan    house.a       apartment
02 Feb    eurolet.it    Italy

I wrote the following:

conditions = [df['Sentence'].str.endswith(original), df['Sentence'].str.endswith(country)]
choices = [original, country]
# df['Tp'] = df.apply(lambda row: urlparse(row['Sentence']).netloc, axis = 1)
df['Tp'] = np.select(conditions, choices, default ='Unknown')
print(df)

where

original= [('a', 'apartment'), ('b', 'bungalow'), ('c', 'church')]

country = [('UK', 'United Kingdom'), ('IT', 'Italy'), ('DE', 'Germany'), ('H', 'Holland'), ..., ('F', 'France'), ('S', 'Spain')]

country contains more than 50 elements.

Could you tell me how to fix it? The column should be added to the data frame, which is then written to a csv file.

Thanks

Update:

                      Sentences  \
    0                                      
    1                       who.c  
    2                  citta.me.it   
    3                    office.of
    4                   eurolet.eu   
    ..                               ...   
    995                    uilpa.ie   
    996                      fog.de

Original and country are from

list_country=np.array(country).tolist()
list_country_name=np.array(country_name).tolist()
flat_name_country = [item for sublist in list_country for item in sublist]
flat_country_name = [item for sublist in list_country_name for item in sublist] 

zip_domains=list(zip(flat_name_country, flat_country_name))


Recommended answer

First, let's make some dictionaries from your tuples and combine them:

country = {k.lower() : v for (k,v) in country}
og = {k : v for (k,v) in original}
country.update(og)

print(country)

{'uk': 'United Kingdom',
 'it': 'Italy',
 'de': 'Germany',
 'h': 'Holland',
 'f': 'France',
 's': 'Spain',
 'a': 'apartment',
 'b': 'bungalow',
 'c': 'church'}

Then split each sentence on "." and take the last element of each row. Any other full stops in the text are ignored, and only the final element is looked at. Finally, use .map to associate your values.

# Take the text after the last "." in each row, lower-case it to match the
# dictionary keys, and look it up. Rows with no matching suffix get NaN.
df['value'] = df["Sentence"].str.split(".").str[-1].str.lower().map(country)

print(df)

     Date    Sentence      value
0  28 Jan       who.c     church
1  30 Jan     house.a  apartment
2  02 Feb  eurolet.it      Italy
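
For completeness, the np.select attempt from the question could probably be made to work as well. np.select expects one boolean mask per choice, so the sketch below builds a separate endswith condition for each (suffix, label) pair instead of passing the whole list of tuples. It is only a sketch under some assumptions: the lists are shortened stand-ins for the real ones, matching is case-insensitive, the suffix is expected to follow a ".", and an in-memory buffer stands in for the csv file (the name output.csv in the comment is hypothetical).

```python
import io

import numpy as np
import pandas as pd

# Sample data from the question.
df = pd.DataFrame({
    "Date": ["28 Jan", "30 Jan", "02 Feb"],
    "Sentence": ["who.c", "house.a", "eurolet.it"],
})

# Shortened stand-ins for the question's lists (the real `country` has 50+ entries).
original = [("a", "apartment"), ("b", "bungalow"), ("c", "church")]
country = [("UK", "United Kingdom"), ("IT", "Italy"), ("DE", "Germany")]

pairs = original + country
lowered = df["Sentence"].str.lower()

# One boolean mask per pair: does the sentence end with ".<suffix>"?
# Assumption: matching ignores case and the suffix follows a full stop.
conditions = [lowered.str.endswith("." + suffix.lower()) for suffix, _ in pairs]
choices = [label for _, label in pairs]

# The first matching condition wins; rows with no match get "Unknown".
df["Tp"] = np.select(conditions, choices, default="Unknown")

# Write the result as CSV. A real run would pass a path such as "output.csv"
# (hypothetical name) instead of the in-memory buffer used here.
buffer = io.StringIO()
df.to_csv(buffer, index=False)
csv_text = buffer.getvalue()
```

With 50+ suffixes this builds 50+ masks, so the dictionary lookup with .map is likely the lighter option; np.select is mainly worth keeping if a default such as "Unknown" is wanted in place of NaN.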
