Creating many feature columns in TensorFlow
Question
I'm getting started on a TensorFlow project and am in the middle of defining and creating my feature columns. However, I have hundreds of features; it's a pretty extensive dataset. Even after preprocessing and scrubbing, I have a lot of columns.
The traditional way of creating a feature_column is described in the TensorFlow tutorial and in a StackOverflow post. Essentially, you declare and initialize a TensorFlow object for each feature column:
import tensorflow as tf

gender = tf.feature_column.categorical_column_with_vocabulary_list(
    "gender", ["Female", "Male"])
This works well if your dataset has only a few columns, but in my case I certainly don't want hundreds of lines of code initializing different feature_column objects.
What's the best way to resolve this issue? I notice that in the tutorial, all the columns are collected as a list:
base_columns = [
gender, native_country, education, occupation, workclass, relationship,
age_buckets,
]
Which is ultimately passed into your estimator:
m = tf.estimator.LinearClassifier(
    model_dir=model_dir, feature_columns=base_columns)
So would the ideal way of handling feature_column creation for hundreds of columns be to append them directly into a list? Something like this?
from pandas.api.types import is_numeric_dtype, is_string_dtype
import tensorflow as tf

my_columns = []
for col in df.columns:
    if is_string_dtype(df[col]):
        my_columns.append(tf.feature_column.categorical_column_with_hash_bucket(
            col, hash_bucket_size=len(df[col].unique())))
    elif is_numeric_dtype(df[col]):
        my_columns.append(tf.feature_column.numeric_column(col))
Is this the best way of creating these feature columns? Or am I missing some TensorFlow functionality that lets me avoid this step?
Answer
What you have makes sense to me. :) Here is your code again, with a branch added for pandas categorical columns:
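import pandas as pd
import pandas.api.types as ptypes
import tensorflow as tf

my_columns = []
for col in df.columns:
    # Check for pandas categoricals first: some pandas versions also report a
    # categorical column with string categories as a string dtype.
    if isinstance(df[col].dtype, pd.CategoricalDtype):
        # A categorical already knows its categories, so use them as the vocabulary.
        my_columns.append(tf.feature_column.categorical_column_with_vocabulary_list(
            col, df[col].cat.categories.tolist()))
    elif ptypes.is_string_dtype(df[col]):
        my_columns.append(tf.feature_column.categorical_column_with_hash_bucket(
            col, hash_bucket_size=len(df[col].unique())))
    elif ptypes.is_numeric_dtype(df[col]):
        my_columns.append(tf.feature_column.numeric_column(col))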
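One possible refinement, sketched under my own assumptions rather than taken from the answer above: first decide a spec for each column from its pandas dtype, then turn the specs into feature columns, so the choices can be inspected or tested without TensorFlow. The helper names plan_feature_columns and build_feature_columns and the max_vocab_size threshold are hypothetical. Giving low-cardinality string columns a vocabulary list instead of a hash bucket is a heuristic I'm swapping in: with hash_bucket_size equal to the number of distinct values, different values can still land in the same bucket, while a vocabulary list gives each known value its own id.

```python
import pandas as pd
import pandas.api.types as ptypes

# Hypothetical helpers sketching a plan-then-build approach; not a TensorFlow API.


def plan_feature_columns(df: pd.DataFrame, max_vocab_size: int = 50) -> dict:
    """Map each column name to a (kind, argument) spec based on its pandas dtype.

    max_vocab_size is a hypothetical threshold: string columns with at most this
    many distinct non-null values get a vocabulary list, larger ones a hash bucket.
    """
    plan = {}
    for col in df.columns:
        series = df[col]
        if isinstance(series.dtype, pd.CategoricalDtype):
            # Categoricals already carry their category list.
            plan[col] = ("vocabulary", series.cat.categories.tolist())
        elif ptypes.is_string_dtype(series):
            values = sorted(series.dropna().drop_duplicates().tolist())
            if not values:
                continue  # an all-null column has nothing to encode
            if len(values) <= max_vocab_size:
                plan[col] = ("vocabulary", values)
            else:
                plan[col] = ("hash", len(values))
        elif ptypes.is_numeric_dtype(series):
            plan[col] = ("numeric", None)
        # Columns of any other dtype (e.g. datetimes) are left out of the plan.
    return plan


def build_feature_columns(plan: dict) -> list:
    """Turn a plan from plan_feature_columns into tf.feature_column objects."""
    # Imported here so that planning (and testing the plan) works without TensorFlow.
    import tensorflow as tf

    columns = []
    for col, (kind, arg) in plan.items():
        if kind == "numeric":
            columns.append(tf.feature_column.numeric_column(col))
        elif kind == "vocabulary":
            columns.append(
                tf.feature_column.categorical_column_with_vocabulary_list(col, arg))
        else:  # "hash"
            columns.append(
                tf.feature_column.categorical_column_with_hash_bucket(
                    col, hash_bucket_size=arg))
    return columns
```

With TensorFlow installed, my_columns = build_feature_columns(plan_feature_columns(df)) should give a list that can be passed as feature_columns in the same way as base_columns in the question. Keeping the plan as plain Python data also makes it easy to print or override individual columns before building.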