Creating a year column in Pandas

Question

I'm trying to create a year column, with the year taken from the title column of my dataframe. The code below works, but the resulting column has dtype object. For example, in row 1 the year displays as [2013].

How can I do this so that the column dtype is float instead?

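import re

year_list = []

# re.findall returns a list of all matches, so each cell ends up holding a list such as ['2013']
for i in range(len(wine['title'])):
    year = re.findall(r'\d{4}', wine['title'][i])
    year_list.append(year)

wine['year'] = year_list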

Here is the head of my dataframe:

country   designation     points    province               title             year
Italy     Vulkà Bianco     87        Sicily     Nicosia 2013 Vulkà Bianco   [2013]

Recommended answer

re.findall returns a list of strings for every row, which is why each cell holds a list and the column dtype is object. Instead, you can use str.extract():

wine['year'] = wine['title'].str.extract(r'\b(\d{4})\b')

Or, if you only want to match years in the 1900s and 2000s:

wine['year'] = wine['title'].str.extract(r'\b((?:19|20)\d{2})\b')
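Either pattern yields the year as a string, so the column is still not float. A minimal sketch of one way to finish the conversion, assuming a small made-up dataframe modeled on the row in the question (the second title is hypothetical and only there to show a row without a year):

```python
import pandas as pd

# Hypothetical sample data; only the first title comes from the question.
wine = pd.DataFrame({
    "title": [
        "Nicosia 2013 Vulkà Bianco",
        "Some Winery NV Brut",  # hypothetical title with no year in it
    ]
})

# expand=False makes str.extract return a Series instead of a one-column DataFrame.
year = wine["title"].str.extract(r"\b(\d{4})\b", expand=False)

# Rows with no match are missing values, which a float column can represent as NaN.
wine["year"] = year.astype(float)

print(wine["year"].dtype)
```

A float column is a natural fit here because titles without a 4-digit number produce NaN, which an ordinary integer column cannot store.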

Note that the pattern passed to str.extract must contain at least one capturing group; the value of that group is used to populate the new column. Only the first match in each string is considered, so if a title can contain more than one 4-digit number, you may need to make the pattern's context more specific.
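To illustrate the first-match behavior, here is a small sketch using a made-up title that contains two plausible years. The title is an assumption for demonstration only; str.findall is used just to show what else is in the string.

```python
import pandas as pd

# Hypothetical title containing two 4-digit numbers that both look like years.
titles = pd.Series(["Estate 1905 Reserve 2013"])

# str.extract keeps only the first match of the capturing group.
first = titles.str.extract(r"\b((?:19|20)\d{2})\b", expand=False)

# str.findall lists every match, which helps when checking for ambiguous titles.
all_matches = titles.str.findall(r"\b(?:19|20)\d{2}\b")

print(first.iloc[0])
print(all_matches.iloc[0])
```

If titles like this occur in the real data, the pattern would need extra context to pick the intended number.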

I suggest wrapping the \d{4} pattern in word boundaries (\b) so that it matches 4-digit chunks only as whole words and avoids partial matches inside longer digit strings such as 1234567890.
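The effect of the word boundaries can be checked with a quick comparison. This is a sketch on made-up input; the first string is hypothetical and exists only to contain a long digit run.

```python
import pandas as pd

# The first entry is a hypothetical title with a 10-digit number in it.
s = pd.Series(["Lot 1234567890 Red", "Nicosia 2013 Vulkà Bianco"])

# Without boundaries, the first four digits of the long number are picked up.
without_boundaries = s.str.extract(r"(\d{4})", expand=False)

# With boundaries, the long number is skipped and the row has no match.
with_boundaries = s.str.extract(r"\b(\d{4})\b", expand=False)

print(without_boundaries.tolist())
print(with_boundaries.tolist())
```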
