tensorflow.python.framework.errors_impl.InvalidArgumentError: node exception when disabling eager mode
I wrote a model that inherits from tf.keras.models.Model and overrides its call() method. When I run it in eager mode, everything works fine. Now I am trying to improve performance (long runs with different parameters) by disabling eager mode, but I get the error below.
Running TF 2.3 on 64-bit Ubuntu 18.04.4 LTS.
2020-08-03 11:15:49.852459: I tensorflow/stream_executor/platform/default/dso_loader.cc:48] Successfully opened dynamic library libcuda.so.1
2020-08-03 11:15:49.864526: E tensorflow/stream_executor/cuda/cuda_driver.cc:314] failed call to cuInit: CUDA_ERROR_NO_DEVICE: no CUDA-capable device is detected
2020-08-03 11:15:49.864590: I tensorflow/stream_executor/cuda/cuda_diagnostics.cc:156] kernel driver does not appear to be running on this host (tau): /proc/driver/nvidia/version does not exist
2020-08-03 11:15:49.864990: I tensorflow/core/platform/cpu_feature_guard.cc:142] This TensorFlow binary is optimized with oneAPI Deep Neural Network Library (oneDNN)to use the following CPU instructions in performance-critical operations: AVX2 FMA
To enable them in other operations, rebuild TensorFlow with the appropriate compiler flags.
2020-08-03 11:15:49.876833: I tensorflow/core/platform/profile_utils/cpu_utils.cc:104] CPU Frequency: 2904005000 Hz
2020-08-03 11:15:49.877129: I tensorflow/compiler/xla/service/service.cc:168] XLA service 0x5725890 initialized for platform Host (this does not guarantee that XLA will be used). Devices:
2020-08-03 11:15:49.877147: I tensorflow/compiler/xla/service/service.cc:176] StreamExecutor device (0): Host, Default Version
2020-08-03 11:15:50.836795: W tensorflow/c/c_api.cc:326] Operation '{name:'train_full_model_cell/StatefulPartitionedCall' id:189 op device:{} def:{{{node train_full_model_cell/StatefulPartitionedCall}} = StatefulPartitionedCall[Tin=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_BOOL, ..., DT_RESOURCE, DT_RESOURCE, DT_RESOURCE, DT_RESOURCE, DT_RESOURCE], Tout=[DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, ..., DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT, DT_FLOAT], _collective_manager_ids=[], _read_only_resource_inputs=[5, 6, 7, 8, 9, 10, 11, 12, 13, 14], config="", config_proto="\n\007\n\003CPU\020\001\n\007\n\003GPU\020\0002\002J\0008\001\202\001\000", executor_type="", f=__forward_call_1504[]](input_1, input_2, input_3, input_4, keras_learning_phase, emb_net_hid_0/kernel, emb_net_hid_0/bias, emb_net_hid_1/kernel, emb_net_hid_1/bias, base_net_hid_0/kernel, base_net_hid_0/bias, base_net_hid_1/kernel, base_net_hid_1/bias, Output_Layer/kernel, Output_Layer/bias)}}' was changed by setting attribute after it was run by a session. This mutation will have no effect, and will trigger an error in the future. Either don't modify nodes after running them or create a new session.
Traceback (most recent call last):
File "/home/user/venvs/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1365, in _do_call
return fn(*args)
File "/home/user/venvs/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1348, in _run_fn
self._extend_graph()
File "/home/user/venvs/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1388, in _extend_graph
tf_session.ExtendSession(self._session)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Node 'training/Adam/gradients/gradients/train_full_model_cell/StatefulPartitionedCall_grad/PartitionedCall': Connecting to invalid output 1 of source node train_full_model_cell/StatefulPartitionedCall which has 1 outputs.
During handling of the above exception, another exception occurred:
Traceback (most recent call last):
File "/mnt/Projects/ForcesNN/train_full_model_cell.py", line 205, in <module>
main(**vars(arguments))
File "/mnt/Projects/ForcesNN/train_full_model_cell.py", line 173, in main
callbacks=callbacks)
File "/home/user/venvs/venv/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_v1.py", line 809, in fit
use_multiprocessing=use_multiprocessing)
File "/home/user/venvs/venv/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 666, in fit
steps_name='steps_per_epoch')
File "/home/user/venvs/venv/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 206, in model_iteration
val_iterator = _get_iterator(val_inputs, model._distribution_strategy)
File "/home/user/venvs/venv/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_arrays.py", line 542, in _get_iterator
return training_utils.get_iterator(inputs)
File "/home/user/venvs/venv/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_utils.py", line 1715, in get_iterator
initialize_iterator(iterator)
File "/home/user/venvs/venv/lib/python3.6/site-packages/tensorflow/python/keras/engine/training_utils.py", line 1722, in initialize_iterator
K.get_session((init_op,)).run(init_op)
File "/home/user/venvs/venv/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 630, in get_session
_initialize_variables(session)
File "/home/user/venvs/venv/lib/python3.6/site-packages/tensorflow/python/keras/backend.py", line 1053, in _initialize_variables
[variables_module.is_variable_initialized(v) for v in candidate_vars])
File "/home/user/venvs/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 958, in run
run_metadata_ptr)
File "/home/user/venvs/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1181, in _run
feed_dict_tensor, options, run_metadata)
File "/home/user/venvs/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1359, in _do_run
run_metadata)
File "/home/user/venvs/venv/lib/python3.6/site-packages/tensorflow/python/client/session.py", line 1384, in _do_call
raise type(e)(node_def, op, message)
tensorflow.python.framework.errors_impl.InvalidArgumentError: Node 'training/Adam/gradients/gradients/train_full_model_cell/StatefulPartitionedCall_grad/PartitionedCall': Connecting to invalid output 1 of source node train_full_model_cell/StatefulPartitionedCall which has 1 outputs.
Process finished with exit code 1
Here is the relevant part of the code:
class Model(tf.keras.models.Model):
    def __init__(self, **parameters):
        super(Model, self).__init__()
        self.parameters = parameters  # Save model parameters


class TrainFullModelCell(Model):
    def __init__(self, feature_list, k, r_cs, r_c, base_dim_l, base_act_l, g_dim_l, g_act_l):
        super(TrainFullModelCell, self).__init__(feature_list=feature_list, k=k, r_cs=r_cs, r_c=r_c,
                                                 base_dim_l=base_dim_l, base_act_l=base_act_l,
                                                 g_dim_l=g_dim_l, g_act_l=g_act_l)
        self.G = self._embedding_net(k, g_dim_l, g_act_l)
        self.base_net = self._base_net(len(feature_list) * g_dim_l[-1], base_dim_l, base_act_l)
    def dataset_preprocess(self, ds, with_rotation_aug):
        """
        Runs all pre-processing mappings on the dataset
        :param ds: Tensorflow dataset
        :param with_rotation_aug: Apply rotation augmentation to samples
        :return: Preprocessed dataset
        """
        # Generate parameter-dependent preprocess mapping functions
        find_neighbours_in_radius = preprocess.generate_find_neighbours_in_radius(self.parameters["r_c"])
        trim_neighbours = preprocess.generate_trim_neighbours(self.parameters["k"])
        neighbour_info_to_features = preprocess.generate_neighbour_info_to_features(self.parameters["feature_list"],
                                                                                   self.parameters["r_cs"],
                                                                                   self.parameters["r_c"])
        # Preprocess
        if with_rotation_aug:
            ds = ds.map(preprocess.remove_mean_of_pos_vectors)
            ds = ds.map(preprocess.rotation_augmentation)
        ds = ds.map(find_neighbours_in_radius)
        ds = ds.map(trim_neighbours)
        ds = ds.map(neighbour_info_to_features)
        # Create a zip of two datasets: Inputs and Labels
        ds = ds.flat_map(lambda *args: zip(tf.data.Dataset.from_tensor_slices(args[:-1]),
                                           tf.data.Dataset.from_tensor_slices(args[-1])))
        return ds
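For clarity on what the final flat_map step is meant to do — unbatch each sample into per-element (inputs, label) pairs — here is a plain-Python analogue (lists stand in for tensors; `unbatch_to_pairs` and the sample data are illustrative, not part of the actual pipeline):

```python
def unbatch_to_pairs(*args):
    """Yield one (inputs_tuple, label) pair per element, mimicking
    the flat_map over from_tensor_slices in dataset_preprocess."""
    *inputs, labels = args
    # Slice every input sequence and the label sequence along axis 0.
    for row in zip(*inputs, labels):
        yield tuple(row[:-1]), row[-1]

X = [[0, 1], [2, 3], [4, 5]]   # three samples, two features each
y = [10, 20, 30]               # one label per sample
pairs = list(unbatch_to_pairs(X, y))
```

Each yielded pair corresponds to one training example handed to fit().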
    @tf.function
    def call(self, inputs, training=None, mask=None):
        X, s_r, _, _ = inputs
        s_r = tf.expand_dims(s_r, axis=-1)
        X = tf.transpose(self.G(s_r), [0, 2, 1]) @ X
        X = tf.keras.layers.Flatten()(X)
        return self.base_net(X)
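To make the shape flow through call() concrete, here is a NumPy walkthrough. It assumes X has shape (batch, k, n_features) with n_features == len(feature_list), s_r has shape (batch, k), and a single linear map stands in for self.G (all sizes below are illustrative):

```python
import numpy as np

batch, k, n_features, g_dim = 4, 8, 3, 5   # illustrative sizes

s_r = np.random.rand(batch, k)
X = np.random.rand(batch, k, n_features)

# tf.expand_dims(s_r, axis=-1): (batch, k) -> (batch, k, 1)
s_r = s_r[..., np.newaxis]

# Stand-in for self.G: (batch, k, 1) -> (batch, k, g_dim),
# like a stack of kernel_size-1 Conv1D layers.
W = np.random.rand(1, g_dim)
G_out = s_r @ W                             # (batch, k, g_dim)

# Transpose to (batch, g_dim, k), then batch-matmul with X.
Xp = np.transpose(G_out, (0, 2, 1)) @ X     # (batch, g_dim, n_features)

# Flatten: (batch, g_dim * n_features) -- matching the base_net
# input size len(feature_list) * g_dim_l[-1].
Xf = Xp.reshape(batch, -1)
```

This also shows why the constructor passes len(feature_list) * g_dim_l[-1] as the base_net input size.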
    @staticmethod
    def _embedding_net(input_size, layer_dim_list, layer_act_list):
        """
        Create embedding network.
        We can't use a Keras embedding layer because we are not dealing with integers.
        Instead we use a Conv1D layer with filter_size as output_dim and kernel_size 1.
        This way we get an output tensor with shape (batch_size, input_size, filter_size).
        To really make it an embedding layer, we need to transpose the resulting matrix afterwards.
        :param layer_dim_list: List of layer dimensions
        :param layer_act_list: List of layer activation functions
        :return: Embedding network
        """
        emb_net = tf.keras.Sequential(name="Embedding_Net")
        emb_net.add(tf.keras.layers.Input((int(input_size), 1), name='emb_net_input'))
        for i, (layer_i_dim, layer_i_act) in enumerate(zip(layer_dim_list, layer_act_list)):
            layer_i_act = tf.compat.as_text(layer_i_act)
            emb_net.add(tf.keras.layers.Conv1D(layer_i_dim, 1, activation=layer_i_act,
                                               name='emb_net_hid_{}'.format(i)))
        return emb_net
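The docstring's claim — that a Conv1D with kernel_size 1 acts as a per-position embedding — can be verified in NumPy: a size-1 convolution is the same shared linear map applied independently at each position (the kernel and bias below are random placeholders):

```python
import numpy as np

rng = np.random.default_rng(0)
batch, input_size, filters = 2, 6, 4

x = rng.standard_normal((batch, input_size, 1))
kernel = rng.standard_normal((1, filters))   # Conv1D kernel of size 1
bias = rng.standard_normal(filters)

# kernel_size-1 convolution: broadcasts the same map over positions.
conv_out = x @ kernel + bias                 # (batch, input_size, filters)

# The same shared linear layer applied position-by-position.
dense_out = np.stack(
    [x[:, i, :] @ kernel + bias for i in range(input_size)], axis=1)
```

Both computations produce identical (batch_size, input_size, filter_size) tensors, which is exactly the embedding-like output the docstring describes.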
    @staticmethod
    def _base_net(input_size, layer_dim_list, layer_act_list):
        """
        Create base-net
        :param layer_dim_list: List of layer dimensions
        :param layer_act_list: List of layer activation functions
        :return: Sequential model
        """
        base_net = tf.keras.Sequential(name="Base_Net")
        base_net.add(tf.keras.layers.Input(int(input_size), name='base_net_input'))
        for i, (layer_i_dim, layer_i_act) in enumerate(zip(layer_dim_list, layer_act_list)):
            layer_i_act = tf.compat.as_text(layer_i_act)
            base_net.add(tf.keras.layers.Dense(layer_i_dim, activation=layer_i_act,
                                               name='base_net_hid_{}'.format(i)))
        base_net.add(tf.keras.layers.Dense(3, activation='linear', name='Output_Layer'))
        return base_net