Creating many feature columns in Tensorflow

Question

I'm getting started on a Tensorflow project and am in the middle of defining and creating my feature columns. However, I have hundreds and hundreds of features; it's a pretty extensive dataset. Even after preprocessing and scrubbing, I still have a lot of columns.

The traditional way of creating a feature_column is defined in the Tensorflow tutorial and even this StackOverflow post. You essentially declare and initialize a Tensorflow object for each feature column:

gender = tf.feature_column.categorical_column_with_vocabulary_list(
    "gender", ["Female", "Male"])

This works all well and good if your dataset has only a few columns, but in my case, I surely don't want to have hundreds of lines of code initializing different feature_column objects.

What's the best way to resolve this issue? I notice that in the tutorial, all the columns are collected as a list:

base_columns = [
    gender, native_country, education, occupation, workclass, relationship,
    age_buckets,
]

Which is ultimately passed into your estimator:

m = tf.estimator.LinearClassifier(
    model_dir=model_dir, feature_columns=base_columns)

So would the ideal way of handling feature_column creation for hundreds of columns be to append them directly into a list? Something like this?

import tensorflow as tf
from pandas.api.types import is_numeric_dtype, is_string_dtype

my_columns = []

for col in df.columns:
    if is_string_dtype(df[col]):  # string columns become hashed categorical columns
        my_columns.append(tf.feature_column.categorical_column_with_hash_bucket(
            col, hash_bucket_size=len(df[col].unique())))

    elif is_numeric_dtype(df[col]):  # numeric columns are passed through as-is
        my_columns.append(tf.feature_column.numeric_column(col))

Is this the best way of creating these feature columns? Or am I missing some functionality to Tensorflow that allows me to work around this step?

Recommended answer

What you have makes sense to me. :) Copying from your own code, with a branch added for pandas categorical columns:

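import pandas.api.types as ptypes
import tensorflow as tf

my_columns = []
for col in df.columns:
  if ptypes.is_string_dtype(df[col]):
    # String column: hash values into one bucket per distinct value seen.
    my_columns.append(tf.feature_column.categorical_column_with_hash_bucket(
        col, hash_bucket_size=len(df[col].unique())))

  elif ptypes.is_numeric_dtype(df[col]):
    my_columns.append(tf.feature_column.numeric_column(col))

  elif ptypes.is_categorical_dtype(df[col]):
    # tf.feature_column has no plain categorical_column(); for a pandas
    # categorical, use its known categories as the vocabulary instead.
    my_columns.append(tf.feature_column.categorical_column_with_vocabulary_list(
        col, vocabulary_list=list(df[col].cat.categories)))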
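
With hundreds of columns it can help to see what the loop is going to build before any Tensorflow objects are created. Below is a minimal sketch of that idea, not part of the original answer: it only decides which constructor each column would get and returns a plain list you can print or check. The helper name `plan_feature_columns`, the spec tuple format, and the dict-of-lists input (standing in for the DataFrame) are all assumptions made for illustration.

```python
from typing import Any, Dict, List, Sequence, Tuple

# Hypothetical spec format: (tf.feature_column constructor name, column name, kwargs)
ColumnSpec = Tuple[str, str, Dict[str, Any]]


def plan_feature_columns(data: Dict[str, Sequence[Any]]) -> List[ColumnSpec]:
    """Decide which feature-column constructor each column would need."""
    specs: List[ColumnSpec] = []
    for name, values in data.items():
        present = [v for v in values if v is not None]
        if not present:
            continue  # nothing to infer a type from, so skip the column
        if all(isinstance(v, str) for v in present):
            # Same sizing rule as the loop above: one bucket per distinct value.
            specs.append(("categorical_column_with_hash_bucket", name,
                          {"hash_bucket_size": len(set(present))}))
        elif all(isinstance(v, (int, float)) and not isinstance(v, bool)
                 for v in present):
            specs.append(("numeric_column", name, {}))
    return specs


# Small stand-in for the real dataset.
sample = {
    "gender": ["Female", "Male", "Female"],
    "age": [39, 50, 38],
    "empty": [None, None, None],
}
specs = plan_feature_columns(sample)
for spec in specs:
    print(spec)
```

Once the plan looks right, and assuming `tensorflow` is imported as `tf`, the specs could be turned into real columns with something like `my_columns = [getattr(tf.feature_column, fn)(name, **kwargs) for fn, name, kwargs in specs]`.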
