Incompatible shapes: [11,768] vs. [1,5,768] - inference in production with a saved huggingface model

Question


I saved a pretrained version of distilbert, distilbert-base-uncased-finetuned-sst-2-english, from the huggingface models, and I am trying to serve it with TensorFlow Serving and make predictions. For now, everything is being tested in Colab.

I'm having trouble getting the prediction input into the correct format for the model under TensorFlow Serving. TensorFlow Serving itself runs fine and serves the model, but my prediction code is wrong, and I need help understanding how to make predictions by sending JSON to the API.

import json
import requests

# tokenize and encode a simple positive instance
instances = tokenizer.tokenize('this is the best day of my life!')
instances = tokenizer.encode(instances)
data = json.dumps({"signature_name": "serving_default", "instances": instances})
print(data)

{"signature_name": "serving_default", "instances": [101, 2023, 2003, 1996, 2190, 2154, 1997, 2026, 2166, 999, 102]}

# setup json_response object
headers = {"content-type": "application/json"}
json_response = requests.post('http://localhost:8501/v1/models/my_model:predict', data=data, headers=headers)
predictions = json.loads(json_response.text)

predictions

{'error': '{{function_node __inference__wrapped_model_52602}} {{function_node __inference__wrapped_model_52602}} Incompatible shapes: [11,768] vs. [1,5,768]\n\t [[{{node tf_distil_bert_for_sequence_classification_3/distilbert/embeddings/add}}]]\n\t [[StatefulPartitionedCall/StatefulPartitionedCall]]'}
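A likely reading of the error, assuming TF Serving's REST row format (each element of `instances` is one example): the flat list of 11 ids is sent as 11 examples holding one id each, which would match the leading 11 in `[11,768]`, while the `[1,5,768]` side suggests the exported graph expects 5 positions. The sketch below only contrasts the two payload shapes; `payload_shape` is a hypothetical helper, not part of TF Serving.

```python
import json

ids = [101, 2023, 2003, 1996, 2190, 2154, 1997, 2026, 2166, 999, 102]

# Flat list: in the row format every element of "instances" is one example,
# so this reads as 11 examples holding a single token id each.
flat = {"signature_name": "serving_default", "instances": ids}

# Nested list: one example holding the whole 11-token sequence.
nested = {"signature_name": "serving_default", "instances": [ids]}


def payload_shape(payload):
    """Hypothetical helper: the [batch] or [batch, seq_len] shape implied by "instances"."""
    instances = payload["instances"]
    if all(isinstance(x, int) for x in instances):
        return [len(instances)]
    return [len(instances), len(instances[0])]


print(payload_shape(flat))    # [11]
print(payload_shape(nested))  # [1, 11]
print(json.dumps(nested))
```

Nesting alone may still fail if the exported signature fixes the sequence length; the answer below deals with that by exporting a signature with a fixed length of 384 and padding every request to it.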

Any pointers would be appreciated.


tensorflow-serving distilbert


user13795545, Aug 29 '20 at 03:47


user13795545, Aug 31 '20 at 03:48 · Answer 1

I was able to find a solution by defining a signature for the input IDs and the attention mask, as shown below. This is a simple implementation: it uses a fixed input shape for the saved model and requires you to pad every input to the expected length of 384. I have seen implementations that call custom signatures and build the model to match the expected input shapes, but the simple case below worked for what I wanted to achieve with a huggingface model through TF Serving. If anyone has better examples or ways to improve this, please post them for future reference.
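The fixed-length idea can be shown without the tokenizer: every sequence is padded (or truncated) to exactly 384 ids, and the attention mask marks real tokens with 1 and padding with 0. This is a sketch only; `pad_to_length` is a hypothetical helper, and the pad id of 0 is an assumption (it is `[PAD]` in the BERT-style vocabularies DistilBERT uses, but check `tokenizer.pad_token_id`).

```python
from typing import List, Tuple

MAX_LENGTH = 384  # must match the sequence length fixed in the serving signature


def pad_to_length(ids: List[int], max_length: int = MAX_LENGTH,
                  pad_id: int = 0) -> Tuple[List[int], List[int]]:
    """Hypothetical helper: right-pad or truncate ids and build the attention mask."""
    ids = ids[:max_length]   # truncate overly long inputs
    mask = [1] * len(ids)    # 1 = real token
    padding = max_length - len(ids)
    return ids + [pad_id] * padding, mask + [0] * padding  # 0 = padding


ids, mask = pad_to_length([101, 2023, 2003, 102])
print(len(ids), len(mask), sum(mask))  # 384 384 4
```

Truncating this naively can drop the final `[SEP]`; the tokenizer's own `truncation=True` handles that correctly, so prefer it for real requests.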

# create a callable from the model's call method
import tensorflow as tf
from transformers import TFDistilBertForQuestionAnswering

distilbert = TFDistilBertForQuestionAnswering.from_pretrained('distilbert-base-cased-distilled-squad')
call_fn = tf.function(distilbert.call)  # named call_fn to avoid shadowing the built-in callable()

By calling get_concrete_function, we trace and compile the model's TensorFlow operations for an input signature made of two tensors of shape [None, 384]: the first holds the input IDs, the second the attention mask.

concrete_function = call_fn.get_concrete_function([
    tf.TensorSpec([None, 384], tf.int32, name="input_ids"),
    tf.TensorSpec([None, 384], tf.int32, name="attention_mask"),
])
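Because the signature fixes both inputs at [None, 384], a request only succeeds if every instance supplies exactly 384 values for `input_ids` and `attention_mask`. A client-side check along these lines (hypothetical `check_instances`, not a TF Serving feature) catches a mismatch before the server reports an incompatible-shapes error.

```python
from typing import Dict, List

# Input names and lengths taken from the TensorSpecs above.
SIGNATURE = {"input_ids": 384, "attention_mask": 384}


def check_instances(instances: List[Dict[str, List[int]]], signature=SIGNATURE) -> None:
    """Hypothetical helper: raise ValueError if an instance does not fit the signature."""
    for i, inst in enumerate(instances):
        missing = set(signature) - set(inst)
        if missing:
            raise ValueError(f"instance {i} is missing inputs: {sorted(missing)}")
        for name, length in signature.items():
            if len(inst[name]) != length:
                raise ValueError(
                    f"instance {i}: {name} has {len(inst[name])} values, expected {length}")


good = [{"input_ids": [0] * 384, "attention_mask": [1] * 384}]
check_instances(good)  # passes silently
```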

Save the model with the signature:

# stored model path for TF Serve (1 = version 1) --> '/path/to/my/model/distilbert_qa/1/'
distilbert_qa_save_path = 'path_to_model'
tf.saved_model.save(distilbert, distilbert_qa_save_path, signatures=concrete_function)
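TF Serving looks for numbered version subdirectories under `--model_base_path` and, with its default version policy, serves the highest one. Assuming that layout (`<base>/1/`, `<base>/2/`, ...), here is a small sketch for picking the next export directory; `next_version_dir` is hypothetical, and the temporary directory stands in for '/path/to/my/model/distilbert_qa'.

```python
import os
import tempfile


def next_version_dir(base_path: str) -> str:
    """Hypothetical helper: return <base_path>/<highest numeric version + 1>."""
    versions = [int(d) for d in os.listdir(base_path)
                if d.isdigit() and os.path.isdir(os.path.join(base_path, d))]
    return os.path.join(base_path, str(max(versions, default=0) + 1))


base = tempfile.mkdtemp()                         # empty stand-in for the model base path
print(os.path.basename(next_version_dir(base)))  # 1
os.makedirs(os.path.join(base, "1"))
print(os.path.basename(next_version_dir(base)))  # 2
```

The returned path is what you would pass to `tf.saved_model.save`, while `MODEL_DIR` for the server stays the base path without the version number.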

Check that it contains the correct signature:

saved_model_cli show --dir 'path_to_model' --tag_set serve --signature_def serving_default

The output should look like this:

The given SavedModel SignatureDef contains the following input(s):
  inputs['attention_mask'] tensor_info:
      dtype: DT_INT32
      shape: (-1, 384)
      name: serving_default_attention_mask:0
  inputs['input_ids'] tensor_info:
      dtype: DT_INT32
      shape: (-1, 384)
      name: serving_default_input_ids:0
The given SavedModel SignatureDef contains the following output(s):
  outputs['output_0'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 384)
      name: StatefulPartitionedCall:0
  outputs['output_1'] tensor_info:
      dtype: DT_FLOAT
      shape: (-1, 384)
      name: StatefulPartitionedCall:1
Method name is: tensorflow/serving/predict

TEST THE MODEL:

import tensorflow as tf
from transformers import DistilBertTokenizer
tokenizer = DistilBertTokenizer.from_pretrained('distilbert-base-cased')

question, text = "Who was Benjamin?", "Benjamin was a silly dog."
input_dict = tokenizer(question, text, return_tensors='tf')

# index the outputs so this works whether the model returns a tuple or a ModelOutput
outputs = distilbert(input_dict)
start_scores, end_scores = outputs[0], outputs[1]

all_tokens = tokenizer.convert_ids_to_tokens(input_dict["input_ids"].numpy()[0])
start = int(tf.math.argmax(start_scores, 1)[0])
end = int(tf.math.argmax(end_scores, 1)[0])
answer = ' '.join(all_tokens[start : end + 1])
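Taking the argmax of the start and end scores independently can give an end index before the start index, which produces an empty answer. A common alternative, sketched here in plain Python (hypothetical `best_span`, with an assumed `max_answer_len` of 30 tokens), picks the pair with the highest combined score subject to start <= end. With the model above you would pass one row of logits converted to lists, e.g. `start_scores.numpy()[0].tolist()`.

```python
from typing import List, Tuple


def best_span(start_scores: List[float], end_scores: List[float],
              max_answer_len: int = 30) -> Tuple[int, int]:
    """Hypothetical helper: (start, end) maximising start + end score with start <= end."""
    best, best_score = (0, 0), float("-inf")
    for s, s_score in enumerate(start_scores):
        # only consider ends at or after the start, within the length limit
        for e in range(s, min(s + max_answer_len, len(end_scores))):
            score = s_score + end_scores[e]
            if score > best_score:
                best, best_score = (s, e), score
    return best


start = [0.1, 2.0, 0.3, 0.2]
end = [3.0, 0.1, 1.5, 0.4]
print(best_span(start, end))  # (1, 2); independent argmaxes would give start 1, end 0
```

This brute-force loop is quadratic in the window size, which is fine for 384 tokens and a short answer limit.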

FOR TF SERVE (in Colab), which was my original goal:

!echo "deb http://storage.googleapis.com/tensorflow-serving-apt stable tensorflow-model-server tensorflow-model-server-universal" | tee /etc/apt/sources.list.d/tensorflow-serving.list && \
curl https://storage.googleapis.com/tensorflow-serving-apt/tensorflow-serving.release.pub.gpg | apt-key add -
!apt update

!apt-get install tensorflow-model-server

import os
# path_to_model --> versions directory --> '/path/to/my/model/distilbert_qa/'
# actual stored model path version 1 --> '/path/to/my/model/distilbert_qa/1/'
MODEL_DIR = 'path_to_model'
os.environ["MODEL_DIR"] = os.path.abspath(MODEL_DIR)

%%bash --bg
nohup tensorflow_model_server --rest_api_port=8501 --model_name=my_model --model_base_path="${MODEL_DIR}" >server.log 2>&1

!tail server.log
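`tail server.log` shows whether the server started, but the model may still be loading. TF Serving's REST API also reports model status at `GET /v1/models/<name>`; assuming the default port 8501 and the `my_model` name used above, here is a polling sketch using only the standard library (`wait_for_model` is hypothetical).

```python
import json
import time
import urllib.request


def wait_for_model(url: str = "http://localhost:8501/v1/models/my_model",
                   timeout: float = 30.0, interval: float = 1.0) -> bool:
    """Hypothetical helper: poll the model status endpoint until a version is AVAILABLE."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(url, timeout=interval) as resp:
                status = json.load(resp)
            states = [v.get("state") for v in status.get("model_version_status", [])]
            if "AVAILABLE" in states:
                return True
        except (OSError, ValueError):
            pass  # server not reachable yet, HTTP error (e.g. 404), or non-JSON body
        time.sleep(interval)
    return False


# print(wait_for_model())  # True once the model has loaded
```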

MAKE A REQUEST:

import json
!pip install -q requests
import requests
import numpy as np

max_length = 384  # must equal model signature expected input value
question, text = "Who was Benjamin?", "Benjamin was a good boy."

# padding='max_length' pads the input to the expected length (otherwise: incompatible shapes error);
# truncation=True cuts longer inputs down to max_length
input_dict = tokenizer(question, text, return_tensors='tf', padding='max_length',
                       truncation=True, max_length=max_length)

input_ids = input_dict["input_ids"].numpy().tolist()[0]
att_mask = input_dict["attention_mask"].numpy().tolist()[0]
features = [{'input_ids': input_ids, 'attention_mask': att_mask}]

data = json.dumps({ "signature_name": "serving_default", "instances": features})

headers = {"content-type": "application/json"}
json_response = requests.post('http://localhost:8501/v1/models/my_model:predict', data=data, headers=headers)
print(json_response)

predictions = json.loads(json_response.text)['predictions']

all_tokens = tokenizer.convert_ids_to_tokens(input_ids)
# the predictions are plain Python lists, so use NumPy and convert to int for slicing
start = int(np.argmax(predictions[0]['output_0']))
end = int(np.argmax(predictions[0]['output_1']))
answer = ' '.join(all_tokens[start : end + 1])
print(answer)
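The request above uses the row (`instances`) format. TF Serving's REST predict API also accepts a columnar format, where `inputs` maps each input name to a batched list and the response comes back under `outputs` instead of `predictions`. Here is a sketch that builds the equivalent columnar body (hypothetical `to_columnar`), assuming the same `input_ids`/`attention_mask` names; the ids are shortened and purely illustrative, whereas a real request would carry 384 values per input.

```python
import json
from typing import Dict, List


def to_columnar(features: List[Dict[str, List[int]]]) -> Dict[str, List[List[int]]]:
    """Hypothetical helper: turn row-format instances into per-input batched lists."""
    return {name: [f[name] for f in features] for name in features[0]}


features = [{"input_ids": [101, 2040, 102], "attention_mask": [1, 1, 1]}]
data = json.dumps({"signature_name": "serving_default", "inputs": to_columnar(features)})
print(data)
```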