Ray.Tune PB2 consistently fails for the same actor at the same point in training because the Tune code raises a ValueError

I started several trials using Ray.tune PB2. They use 8 actors and perturb every 20 steps. Actors 0-6 have no problems, but actor 7 consistently hits an error in the second 20-step epoch. In the terminal I get the following message:

Traceback (most recent call last):  
  File "./tune_pb2.py", line 303, in <module>  
    raise_on_failed_trial=False)  
  File "/home/john/anaconda3/envs/python3.7/lib/python3.7/site-packages/ray/tune/tune.py", line 411, in run  
    runner.step()  
  File "/home/john/anaconda3/envs/python3.7/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 572, in step  
    self.trial_executor.on_no_available_trials(self)  
  File "/home/john/anaconda3/envs/python3.7/lib/python3.7/site-packages/ray/tune/trial_executor.py", line 183, in on_no_available_trials  
    raise TuneError("There are paused trials, but no more pending "
ray.tune.error.TuneError: There are paused trials, but no more pending trials with sufficient resources.

I am training with 2 GPUs and 2 CPUs, one of each per actor. At this point, actors 0-6 have finished the second epoch and stopped. Actor 7 is the only one still running. The error.txt file for this trial contains the following:

Traceback (most recent call last):  
  File "/home/john/anaconda3/envs/python3.7/lib/python3.7/site-packages/ray/tune/trial_runner.py", line 755, in _process_trial
    self, trial, flat_result)  
  File "/home/john/anaconda3/envs/python3.7/lib/python3.7/site-packages/ray/tune/schedulers/pbt.py", line 415, in on_trial_result
    lower_quantile)  
  File "/home/john/anaconda3/envs/python3.7/lib/python3.7/site-packages/ray/tune/schedulers/pbt.py", line 479, in _perturb_trial
    self._exploit(trial_runner.trial_executor, trial, trial_to_clone)  
  File "/home/john/anaconda3/envs/python3.7/lib/python3.7/site-packages/ray/tune/schedulers/pbt.py", line 532, in _exploit
    new_config = self._get_new_config(trial, trial_to_clone)  
  File "/home/john/anaconda3/envs/python3.7/lib/python3.7/site-packages/ray/tune/schedulers/pb2.py", line 357, in _get_new_config
    trial_to_clone.config)  
  File "/home/john/anaconda3/envs/python3.7/lib/python3.7/site-packages/ray/tune/schedulers/pb2.py", line 174, in explore
    X, y, current, newpoint, bounds, num_f=len(t_r.columns))  
  File "/home/john/anaconda3/envs/python3.7/lib/python3.7/site-packages/ray/tune/schedulers/pb2.py", line 83, in select_config
    m = GPy.models.GPRegression(X, y, kernel)  
  File "/home/john/anaconda3/envs/python3.7/lib/python3.7/site-packages/paramz/parameterized.py", line 58, in __call__
    self.initialize_parameter()  
  File "/home/john/anaconda3/envs/python3.7/lib/python3.7/site-packages/paramz/core/parameter_core.py", line 337, in initialize_parameter
    self.trigger_update()  
  File "/home/john/anaconda3/envs/python3.7/lib/python3.7/site-packages/paramz/core/updateable.py", line 79, in trigger_update
    self._trigger_params_changed(trigger_parent)  
  File "/home/john/anaconda3/envs/python3.7/lib/python3.7/site-packages/paramz/core/parameter_core.py", line 134, in _trigger_params_changed
    self.notify_observers(None, None if trigger_parent else -np.inf)  
  File "/home/john/anaconda3/envs/python3.7/lib/python3.7/site-packages/paramz/core/observable.py", line 91, in notify_observers
    [callble(self, which=which) for _, _, callble in self.observers]  
  File "/home/john/anaconda3/envs/python3.7/lib/python3.7/site-packages/paramz/core/observable.py", line 91, in <listcomp>
    [callble(self, which=which) for _, _, callble in self.observers]  
  File "/home/john/anaconda3/envs/python3.7/lib/python3.7/site-packages/paramz/core/parameter_core.py", line 508, in _parameters_changed_notification
    self.parameters_changed()  
  File "/home/john/anaconda3/envs/python3.7/lib/python3.7/site-packages/GPy/core/gp.py", line 267, in parameters_changed
    self.posterior, self._log_marginal_likelihood, self.grad_dict = self.inference_method.inference(self.kern, self.X, self.likelihood, self.Y_normalized, self.mean_function, self.Y_metadata)  
  File "/home/john/anaconda3/envs/python3.7/lib/python3.7/site-packages/GPy/inference/latent_function_inference/exact_gaussian_inference.py", line 53, in inference
    K = kern.K(X)  
  File "/home/john/anaconda3/envs/python3.7/lib/python3.7/site-packages/GPy/kern/src/kernel_slice_operations.py", line 110, in wrap
    ret = f(self, s.X, s.X2, *a, **kw)  
  File "/home/john/anaconda3/envs/python3.7/lib/python3.7/site-packages/ray/tune/schedulers/pb2_utils.py", line 42, in K
    dists = pairwise_distances(T1, T2, "cityblock")  
  File "/home/john/anaconda3/envs/python3.7/lib/python3.7/site-packages/sklearn/utils/validation.py", line 72, in inner_f
    return f(**kwargs)  
  File "/home/john/anaconda3/envs/python3.7/lib/python3.7/site-packages/sklearn/metrics/pairwise.py", line 1779, in pairwise_distances
    return _parallel_pairwise(X, Y, func, n_jobs, **kwds)  
  File "/home/john/anaconda3/envs/python3.7/lib/python3.7/site-packages/sklearn/metrics/pairwise.py", line 1360, in _parallel_pairwise
    return func(X, Y, **kwds)  
  File "/home/john/anaconda3/envs/python3.7/lib/python3.7/site-packages/sklearn/utils/validation.py", line 72, in inner_f
    return f(**kwargs)  
  File "/home/john/anaconda3/envs/python3.7/lib/python3.7/site-packages/sklearn/metrics/pairwise.py", line 781, in manhattan_distances
    X, Y = check_pairwise_arrays(X, Y)  
  File "/home/john/anaconda3/envs/python3.7/lib/python3.7/site-packages/sklearn/utils/validation.py", line 72, in inner_f
    return f(**kwargs)  
  File "/home/john/anaconda3/envs/python3.7/lib/python3.7/site-packages/sklearn/metrics/pairwise.py", line 147, in check_pairwise_arrays
    estimator=estimator)  
  File "/home/john/anaconda3/envs/python3.7/lib/python3.7/site-packages/sklearn/utils/validation.py", line 72, in inner_f
    return f(**kwargs)  
  File "/home/john/anaconda3/envs/python3.7/lib/python3.7/site-packages/sklearn/utils/validation.py", line 645, in check_array
    allow_nan=force_all_finite == 'allow-nan')  
  File "/home/john/anaconda3/envs/python3.7/lib/python3.7/site-packages/sklearn/utils/validation.py", line 99, in _assert_all_finite
    msg_dtype if msg_dtype is not None else X.dtype)  
ValueError: Input contains NaN, infinity or a value too large for dtype('float64').

It looks like the error originates inside the ray.tune code itself, unless I am missing something. If my tuning code is relevant, I can provide that as well.
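Since the final ValueError complains about NaN or infinity in the input, one thing I can check on my side is whether any value my trainable reports, or any hyperparameter in the trial config, is non-finite before it reaches the scheduler. Below is a minimal sketch of such a check. The helper `find_non_finite` and the dict keys are hypothetical names for illustration and are not part of Ray; I am only assuming that the values PB2 sees come from flat dicts like this one.

```python
import math


def find_non_finite(values):
    """Return the keys of a flat dict whose float values are NaN or +/-inf.

    Hypothetical diagnostic helper, not part of Ray Tune.
    """
    bad_keys = []
    for key, value in values.items():
        # Only floats can be NaN or infinite; ints, strings, etc. are skipped.
        # numpy.float64 subclasses Python's float, so it is covered as well.
        if isinstance(value, float) and not math.isfinite(value):
            bad_keys.append(key)
    return bad_keys


# Hypothetical example of a result dict a trainable might report.
result = {"reward": float("nan"), "training_iteration": 40, "lr": 1e-3}
print(find_non_finite(result))
```

Calling this on each result before reporting it, and on the config of actor 7, should at least show whether the non-finite value comes from my code or is produced later inside the scheduler.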

Any help would be greatly appreciated.
