Tensorflow feature columns in Dataset map Table already initialized issue


Problem description

I've run into an issue trying to use Tensorflow's feature_column mappings inside of a function passed to the Dataset map method. This happens when trying to one-hot encode categorical string features of a Dataset as part of the input pipeline using Dataset.map. The error message I'm getting is: tensorflow.python.framework.errors_impl.FailedPreconditionError: Table already initialized.

The following code is a basic example that recreates the problem:

import numpy as np    
import tensorflow as tf
from tensorflow.contrib.lookup import index_table_from_tensor

# generate tfrecords with two string categorical features and write to file
vlists = dict(season=['Spring', 'Summer', 'Fall', 'Winter'],
              day=['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat'])

writer = tf.python_io.TFRecordWriter('test.tfr')
for s,d in zip(np.random.choice(vlists['season'],50), 
               np.random.choice(vlists['day'],50)):
    example = tf.train.Example(
        features = tf.train.Features(
            feature={
                'season':tf.train.Feature(
                    bytes_list=tf.train.BytesList(value=[s.encode()])),
                'day':tf.train.Feature(
                    bytes_list=tf.train.BytesList(value=[d.encode()]))
            }
        )
    )
    serialized = example.SerializeToString()
    writer.write(serialized)
writer.close()

Now there's a tfrecord file in the cwd called test.tfr with 50 records, and each record consists of two string features, 'season' and 'day'. The following code then creates a Dataset that parses the tfrecords and creates batches of size 4:

def parse_record(element):
    feats = {
        'season': tf.FixedLenFeature((), tf.string),
        'day': tf.FixedLenFeature((), tf.string)
    }
    return tf.parse_example(element, feats)

fname = tf.placeholder(tf.string, [])
ds = tf.data.TFRecordDataset(fname)
ds = ds.batch(4).map(parse_record)

At this point, if you create an iterator and call get_next on it several times, it works as expected, and you would see output like this on each run:

sess = tf.Session()  # a Session is needed for the run calls below
iterator = ds.make_initializable_iterator()
nxt = iterator.get_next()
sess.run(tf.tables_initializer())
sess.run(iterator.initializer, feed_dict={fname:'test.tfr'})
sess.run(nxt)
# output of run(nxt) would look like
# {'day': array([b'Sat', b'Thu', b'Fri', b'Thu'], dtype=object), 'season': array([b'Winter', b'Winter', b'Fall', b'Summer'], dtype=object)}

However, if I want to use feature_columns to one-hot encode those categoricals as a Dataset transformation using map, then it runs once, producing correct output, but every subsequent call to run(nxt) gives the Table already initialized error, e.g.:

# using the same Dataset ds from above
season_enc = tf.feature_column.categorical_column_with_vocabulary_list(
    key='season', vocabulary_list=vlists['season'])
season_col = tf.feature_column.indicator_column(season_enc)
day_enc = tf.feature_column.categorical_column_with_vocabulary_list(
    key='day', vocabulary_list=vlists['day'])
day_col = tf.feature_column.indicator_column(day_enc)
cols = [season_col, day_col]

def _encode(element, feat_cols=cols):
    return tf.feature_column.input_layer(element, feat_cols)

ds1 = ds.map(_encode)
iterator = ds1.make_initializable_iterator()
nxt = iterator.get_next()
sess.run(tf.tables_initializer())
sess.run(iterator.initializer, feed_dict={fname:'test.tfr'})
sess.run(nxt)
# first run will produce correct one hot encoded output
sess.run(nxt)
# second run will generate

W tensorflow/core/framework/op_kernel.cc:1192] Failed precondition: Table 
already initialized.
2018-01-25 19:29:55.802358: W tensorflow/core/framework/op_kernel.cc:1192] 
Failed precondition: Table already initialized.
2018-01-25 19:29:55.802612: W tensorflow/core/framework/op_kernel.cc:1192] 
Failed precondition: Table already initialized.

tensorflow.python.framework.errors_impl.FailedPreconditionError: Table already initialized.

However, if I try to do the one-hot encoding manually without feature_columns, as below, then it only works if the tables are created before the map function; otherwise it gives the same error as above:

# using same original Dataset ds
tables = dict(season=index_table_from_tensor(vlists['season']),
              day=index_table_from_tensor(vlists['day']))
def to_dummy(element):
    s = tables['season'].lookup(element['season'])
    d = tables['day'].lookup(element['day'])
    return (tf.one_hot(s, depth=len(vlists['season']), axis=-1),
            tf.one_hot(d, depth=len(vlists['day']), axis=-1))

ds2 = ds.map(to_dummy)
iterator = ds2.make_initializable_iterator()
nxt = iterator.get_next()
sess.run(tf.tables_initializer())
sess.run(iterator.initializer, feed_dict={fname:'test.tfr'})
sess.run(nxt)

It seems as if it has something to do with the scope or namespace of the index lookup tables created by feature_columns, but I'm not sure how to figure out what's happening here. I've tried changing where and when the feature_column objects are defined, but it hasn't made a difference.
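For reference, the transformation those lookup tables perform is conceptually simple: map each string to its vocabulary index, then expand the index into a one-hot vector. A plain-Python sketch (no TensorFlow; the names here are illustrative, not the library's internals):

```python
# Plain-Python sketch of what categorical_column_with_vocabulary_list
# plus indicator_column compute for a single value.
vocab = ['Spring', 'Summer', 'Fall', 'Winter']
index = {term: i for i, term in enumerate(vocab)}  # the "lookup table"

def one_hot(term, vocabulary_index, depth):
    # Look up the term's index, then build a one-hot vector of length depth.
    vec = [0.0] * depth
    vec[vocabulary_index[term]] = 1.0
    return vec

print(one_hot('Fall', index, len(vocab)))  # [0.0, 0.0, 1.0, 0.0]
```

In the TF1 graph, the equivalent of that dict is a stateful lookup-table resource that requires explicit initialization, which is presumably why running its initializer more than once raises the error above.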

Recommended answer

I just came here through another recent question and would like to propose a potential solution. Since I'm pretty late to this question, I'm not sure whether the problem here has been solved or not; please correct me if there's already a good solution.

I really don't know exactly how this error happens. But learning from the canned estimators, I realized there might be an alternative way to do the job, which is to iterate the dataset first and parse the examples afterwards. One good thing about this method is that it separates the feature-column mapping from the function mapped onto the dataset. This may be related to the unknown cause of the error here, since it is known that:

when using a hash_table from "tensorflow.python.ops.gen_lookup_ops" in a tf.data.Dataset.map function, the hash_table cannot be initialized, because tf.data.Dataset.map does not use the default graph.
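The quoted explanation can be illustrated with a plain-Python analogy (illustrative only, not actual TensorFlow semantics): a stateful resource built inside the per-element function is re-created on every call, while one built outside is initialized once and only referenced afterwards:

```python
init_count = {'n': 0}

def make_table():
    init_count['n'] += 1          # stands in for table initialization
    return {'Sun': 0, 'Mon': 1}

# Anti-pattern: the table is (re)built inside the per-element function.
def lookup_inside(day):
    return make_table()[day]

# Pattern used in the question's working manual example: build once,
# then close over the existing table.
table = make_table()
def lookup_outside(day):
    return table[day]

for d in ['Sun', 'Mon', 'Sun']:
    lookup_inside(d)
print(init_count['n'])  # 4: three rebuilds in the loop plus the one shared build
```

The working manual-encoding snippet in the question follows the second pattern by calling index_table_from_tensor before Dataset.map.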

I'm not sure if this will fit what you really want, but a potential example using the "test.tfr" generated by your code could be:

import tensorflow as tf

# using the same Dataset ds from above
vlists = dict(season=['Spring', 'Summer', 'Fall', 'Winter'],
              day=['Sun', 'Mon', 'Tue', 'Wed', 'Thu', 'Fri', 'Sat'])
season_enc = tf.feature_column.categorical_column_with_vocabulary_list(
    key='season', vocabulary_list=vlists['season'])
season_col = tf.feature_column.indicator_column(season_enc)
day_enc = tf.feature_column.categorical_column_with_vocabulary_list(
    key='day', vocabulary_list=vlists['day'])
day_col = tf.feature_column.indicator_column(day_enc)
cols = [season_col, day_col]

def _encode(element, feat_cols=cols):
    element = tf.parse_example(element, features=tf.feature_column.make_parse_example_spec(feat_cols))
    return tf.feature_column.input_layer(element, feat_cols)

fname = tf.placeholder(tf.string, [])
ds = tf.data.TFRecordDataset(fname)
ds = ds.batch(4)
ds1 = ds  # note: _encode is deliberately NOT applied here via Dataset.map
iterator = ds1.make_initializable_iterator()
nxt = iterator.get_next()
nxt = _encode(nxt)  # feature columns are applied after get_next instead

with tf.Session() as sess:
    sess.run(tf.tables_initializer())
    sess.run(iterator.initializer, feed_dict={fname:'test.tfr'})
    print(sess.run(nxt))
    # first run will produce correct one hot encoded output
    print(sess.run(nxt))
