Extracting rows from a CSV file based on specific keywords


Question

I have written some code to help me retrieve data from a CSV file:

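import re

# all your keywords (note: the missing comma after "solar" used to merge it with
# "financial" into "solarfinancial"); this set is not used by the loop below yet
keywords = {"metal", "energy", "team", "sheet", "solar", "financial", "transportation",
            "electrical", "scientists", "electronic", "workers"}

# case-insensitive pattern; currently it only looks for "energy"
keyre = re.compile("energy", re.IGNORECASE)
with open("2006-data-8-8-2016.csv") as infile, open("new_data.csv", "w") as outfile:
    outfile.write(infile.readline())  # Save the header
    for line in infile:
        # search() returns a match (truthy) if "energy" occurs anywhere in the line
        if keyre.search(line):
            outfile.write(line)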
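The pattern above only looks for "energy". A minimal sketch, assuming you simply want any of the keywords to match anywhere on the line, is to escape each keyword and join them into one case-insensitive alternation. The helper `line_matches` and the sample lines are hypothetical, and this still scans the whole line rather than only the two columns:

```python
import re

# same words as in the question
keywords = {"metal", "energy", "team", "sheet", "solar", "financial",
            "transportation", "electrical", "scientists", "electronic", "workers"}

# escape each keyword so regex metacharacters are taken literally, then join
# them with "|"; sorting just makes the compiled pattern deterministic
keyre = re.compile("|".join(re.escape(k) for k in sorted(keywords)), re.IGNORECASE)


def line_matches(line: str) -> bool:
    """Return True if any keyword occurs anywhere in the line (case-insensitive)."""
    return keyre.search(line) is not None


print(line_matches("42,Solar Panel Installer,Installs panels"))
print(line_matches("43,Accountant,Keeps the books"))
```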

I need it to look for each keyword in two main columns, "position" and "Job description", and then write every whole row that contains any of these words to the new file. Any ideas on the simplest way to do this?

Answer

If you are looking for rows in which the whole "position" or "Job description" value is exactly one of the keywords, you can do this with pandas as follows:

import pandas as pd

keywords = ["metal", "energy", "team", "sheet", "solar", "financial", "transportation",
            "electrical", "scientists", "electronic", "workers"]

# read the csv data into a dataframe
# change "," to the data separator in your csv file
df = pd.read_csv("2006-data-8-8-2016.csv", sep=",")
# filter the data: keep only the rows whose "position" or "Job description"
# value is exactly equal to one of the keywords (case-sensitive)
df = df[df["position"].isin(keywords) | df["Job description"].isin(keywords)]
# write the data back to a csv file
df.to_csv("new_data.csv", sep=",", index=False)
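To illustrate what the `isin()` filter above actually keeps, here is a small sketch on a hypothetical in-memory DataFrame (the rows are made up; only the column names come from the question). `isin()` compares each whole cell value with the list and is case-sensitive, so "financial engineering" and "Solar" are not kept:

```python
import pandas as pd

keywords = ["metal", "energy", "team", "sheet", "solar", "financial",
            "transportation", "electrical", "scientists", "electronic", "workers"]

# hypothetical data standing in for the real CSV file
df = pd.DataFrame({
    "position": ["energy", "financial engineering", "Solar"],
    "Job description": ["grid work", "modelling", "panels"],
})

# True only where a whole cell equals one of the keywords exactly
mask = df["position"].isin(keywords) | df["Job description"].isin(keywords)
print(df[mask]["position"].tolist())  # ['energy']
```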

If you are looking for substrings within those columns (e.g. finding "financial" in "financial engineering"), you can do the following:

import re

import pandas as pd

keywords = ["metal", "energy", "team", "sheet", "solar", "financial", "transportation",
            "electrical", "scientists", "electronic", "workers"]
# escape each keyword so any regex metacharacters are matched literally
searched_keywords = "|".join(map(re.escape, keywords))

# read the csv data into a dataframe
# change "," to the data separator in your csv file
df = pd.read_csv("2006-data-8-8-2016.csv", sep=",")
# filter the data: keep only the rows whose position or Job description column
# contains one of the keywords as a substring; case=False ignores case (like the
# question's re.IGNORECASE) and na=False treats empty cells as non-matching
df = df[df["position"].str.contains(searched_keywords, case=False, na=False)
        | df["Job description"].str.contains(searched_keywords, case=False, na=False)]
# write the data back to a csv file
df.to_csv("new_data.csv", sep=",", index=False)
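If you would rather not depend on pandas, the same column-restricted, case-insensitive substring filter can be sketched with the standard library `csv` module. This is an alternative, not part of the original answer; `filter_rows`, `SEARCH_COLUMNS` and the sample rows are hypothetical, and the column names are assumed to match your header exactly:

```python
import csv
import io
import re

keywords = ["metal", "energy", "team", "sheet", "solar", "financial",
            "transportation", "electrical", "scientists", "electronic", "workers"]
pattern = re.compile("|".join(map(re.escape, keywords)), re.IGNORECASE)

# columns to search (names from the question, assumed to match the CSV header)
SEARCH_COLUMNS = ("position", "Job description")


def filter_rows(infile, outfile):
    """Copy the header plus every row whose search columns contain a keyword."""
    reader = csv.DictReader(infile)
    writer = csv.DictWriter(outfile, fieldnames=reader.fieldnames)
    writer.writeheader()
    for row in reader:
        # "or ''" treats a missing value (None for short rows) as empty text
        if any(pattern.search(row.get(col) or "") for col in SEARCH_COLUMNS):
            writer.writerow(row)


# hypothetical sample data; with real files pass
# open("2006-data-8-8-2016.csv", newline="") and open("new_data.csv", "w", newline="")
sample = io.StringIO(
    "id,position,Job description\n"
    "1,Solar Engineer,designs panels\n"
    "2,Accountant,keeps the books\n"
    "3,Driver,transportation of goods\n"
)
out = io.StringIO()
filter_rows(sample, out)
print(out.getvalue())
```

Opening the real files with `newline=""` follows the `csv` module documentation, so that quoted fields containing line breaks are read and written correctly.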
