Adding new column with condition
Question
I need to extend a data frame by adding more columns. A sample of my data, with its headers:
Date     Sentence
28 Jan   who.c
30 Jan   house.a
02 Feb   eurolet.it
I need to add another column, `Tp`, that assigns a value to each link:
- if a sentence ends with `a`, assign `apartment`; if it ends with `b`, assign `bungalow`; and so on, as listed in `original`;
- if a sentence ends with `UK`, assign `United Kingdom`; if it ends with `IT`, assign `Italy`; and so on. These values come from `country`.
I would expect something like this:
Date     Sentence     Tp
28 Jan   who.c        church
30 Jan   house.a      apartment
02 Feb   eurolet.it   Italy
I wrote the following:
conditions = [df['Sentence'].str.endswith(original), df['Sentence'].str.endswith(country)]
choices = [original, country]
# df['Tp'] = df.apply(lambda row: urlparse(row['Sentence']).netloc, axis = 1)
df['Tp'] = np.select(conditions, choices, default ='Unknown')
print(df)
where
original= [('a', 'apartment'), ('b', 'bungalow'), ('c', 'church')]
and
country = [('UK', 'United Kingdom'), ('IT', 'Italy'), ('DE', 'Germany'), ('H', 'Holland'), ..., ('F', 'France'), ('S', 'Spain')]
`country` contains more than 50 elements.
Could you tell me how to fix it? The column should be added to the data frame, which should then be written to a CSV file.
Thanks
Update:
Sentences \
0
1 who.c
2 citta.me.it
3 office.of
4 eurolet.eu
.. ...
995 uilpa.ie
996 fog.de
`original` and `country` come from:
list_country=np.array(country).tolist()
list_country_name=np.array(country_name).tolist()
flat_name_country = [item for sublist in list_country for item in sublist]
flat_country_name = [item for sublist in list_country_name for item in sublist]
zip_domains=list(zip(flat_name_country, flat_country_name))
Recommended answer
First, let's make some dictionaries from your tuples and combine them:
country = {k.lower() : v for (k,v) in country}
og = {k : v for (k,v) in original}
country.update(og)
print(country)
{'uk': 'United Kingdom',
'it': 'Italy',
'de': 'Germany',
'h': 'Holland',
'f': 'France',
's': 'Spain',
'a': 'apartment',
'b': 'bungalow',
'c': 'church'}
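Since each list already holds (key, value) pairs, the combined lookup can also be built without reassigning `country`. The following is only a sketch: the lists are shortened for illustration, and the names `country_map`, `original_map`, `overlap`, and `lookup` are hypothetical. It also checks for keys that appear in both lists, because merging dictionaries silently keeps only the later value for a shared key.

```python
original = [("a", "apartment"), ("b", "bungalow"), ("c", "church")]
# Shortened for illustration; the real list has more than 50 entries.
country = [("UK", "United Kingdom"), ("IT", "Italy"), ("DE", "Germany")]

# Lower-case the country codes, as in the answer, so they match lower-case suffixes.
country_map = {code.lower(): name for code, name in country}
# dict() accepts a list of (key, value) tuples directly.
original_map = dict(original)

# Keys present in both dictionaries would be overwritten when merging.
overlap = sorted(country_map.keys() & original_map.keys())

# Later entries win on duplicate keys, the same as country.update(og).
lookup = {**country_map, **original_map}

print(overlap)
print(lookup["it"], lookup["c"])
```

If `overlap` is not empty, the two lists disagree about what a suffix means, and that is worth resolving before mapping.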
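Then let's split on the full stop and keep the element at the highest position, so that only the final part of the text is looked at. Finally, we use `.map` to associate your values. Note that `level_1.max()` is computed over the whole column, so this assumes every value splits into the same number of parts; rows with fewer parts than the longest one end up as NaN.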
df['value'] = df["Sentence"].str.split(".", expand=True).stack().reset_index(1).query(
"level_1 == level_1.max()"
)[0].map(country)
print(df)
Date Sentence value
0 28 Jan who.c church
1 30 Jan house.a apartment
2 02 Feb eurolet.it Italy
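If the values contain different numbers of full stops (the update in the question shows both `who.c` and `citta.me.it`), taking the text after the last full stop row by row may be more robust. This is a minimal sketch, not the answer's method: it swaps in `str.rsplit`, uses a shortened, hypothetical `lookup` dictionary and made-up dates for the extra rows, and falls back to `'Unknown'` as in the question's `np.select` call.

```python
import pandas as pd

df = pd.DataFrame({
    "Date": ["28 Jan", "30 Jan", "02 Feb", "03 Feb", "04 Feb"],
    "Sentence": ["who.c", "house.a", "eurolet.it", "citta.me.it", "office.of"],
})

# Shortened lookup for illustration; keys are lower-case suffixes.
lookup = {"uk": "United Kingdom", "it": "Italy",
          "a": "apartment", "b": "bungalow", "c": "church"}

# Split once from the right and keep the last piece of each row.
suffix = df["Sentence"].str.rsplit(".", n=1).str[-1].str.lower()

# Suffixes missing from the lookup become NaN, then 'Unknown'.
df["Tp"] = suffix.map(lookup).fillna("Unknown")

# With no path, to_csv returns the CSV text; pass a file name to write a file.
csv_text = df.to_csv(index=False)
print(df)
```

To write the result to disk as the question asks, pass a path to `to_csv`, for example `df.to_csv("output.csv", index=False)`, where the file name is only a placeholder.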