CUDA out of memory error during PEFT LoRA fine-tuning
I am trying to fine-tune the weights of a FLAN-T5 model downloaded from Hugging Face. I am doing this with PEFT, specifically LoRA, using the Python 3 code below. I am running it on an Ubuntu 18.04 LTS server with an Nvidia GPU that has 8 GB of memory. I get a "CUDA out of memory" error; the full error message is included below. I tried adding:
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"
but I still get the same error message. The code and the error message are given below. Can anyone see what the problem might be and suggest how to fix it?
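For reference, a quick way to check how much GPU memory is actually free in the same environment (this snippet is separate from the training script and only uses standard torch.cuda calls):

import torch

# (free_bytes, total_bytes) reported by the CUDA driver for the current device
free, total = torch.cuda.mem_get_info()
print(f"GPU memory free: {free / 1024**3:.2f} GiB of {total / 1024**3:.2f} GiB")

# PyTorch allocator statistics (allocated vs. reserved memory, fragmentation, etc.)
print(torch.cuda.memory_summary(abbreviated=True))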
Code:
from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, GenerationConfig, TrainingArguments, Trainer
import torch
import time
import evaluate
import pandas as pd
import numpy as np
# added to deal with memory allocation error
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"
#
# ### Load Dataset and LLM
huggingface_dataset_name = "knkarthick/dialogsum"
dataset = load_dataset(huggingface_dataset_name)
dataset
# Load the pre-trained [FLAN-T5 model](https://huggingface.co/docs/transformers/model_doc/flan-t5) and its tokenizer directly from Hugging Face. Using the [base version](https://huggingface.co/google/flan-t5-base) of FLAN-T5. Setting `torch_dtype=torch.bfloat16` specifies the data type (precision) used by this model.
# In[17]:
model_name='google/flan-t5-base'
original_model = AutoModelForSeq2SeqLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_name)
index = 200
dialogue = dataset['test'][index]['dialogue']
summary = dataset['test'][index]['summary']
prompt = f"""
Summarize the following conversation.
{dialogue}
Summary:
"""
inputs = tokenizer(prompt, return_tensors='pt')
output = tokenizer.decode(
    original_model.generate(
        inputs["input_ids"],
        max_new_tokens=200,
    )[0],
    skip_special_tokens=True
)
dash_line = '-'.join('' for x in range(100))
# updated 11/1/23 to ensure the GPU is used
def tokenize_function(example):
    start_prompt = 'Summarize the following conversation.\n\n'
    end_prompt = '\n\nSummary: '
    prompt = [start_prompt + dialogue + end_prompt for dialogue in example["dialogue"]]
    example['input_ids'] = tokenizer(prompt, padding="max_length", truncation=True, return_tensors="pt").input_ids.cuda()
    example['labels'] = tokenizer(example["summary"], padding="max_length", truncation=True, return_tensors="pt").input_ids.cuda()
    return example
# The dataset actually contains 3 different splits: train, validation, test.
# The tokenize_function code handles all data across all splits in batches.
tokenized_datasets = dataset.map(tokenize_function, batched=True)
tokenized_datasets = tokenized_datasets.remove_columns(['id', 'topic', 'dialogue', 'summary',])
# To save some time subsample the dataset:
tokenized_datasets = tokenized_datasets.filter(lambda example, index: index % 100 == 0, with_indices=True)
# Check the shapes of all three parts of the dataset:
# In[7]:
# print(f"Shapes of the datasets:")
# print(f"Training: {tokenized_datasets['train'].shape}")
# print(f"Validation: {tokenized_datasets['validation'].shape}")
# print(f"Test: {tokenized_datasets['test'].shape}")
#
# print(tokenized_datasets)
# The output dataset is ready for fine-tuning.
#
# ### Perform Parameter Efficient Fine-Tuning (PEFT)
# - use LoRA
#
# ### Setup the PEFT/LoRA model for Fine-Tuning
#
# - set up the PEFT/LoRA model for fine-tuning with a new layer/parameter adapter
# - freezing the underlying LLM and only training the adapter
# - LoRA configuration below
# - Note the rank (`r`) hyper-parameter, which defines the rank/dimension of the adapter to be trained
# In[8]:
from peft import LoraConfig, get_peft_model, TaskType
lora_config = LoraConfig(
    # r=4, # Rank
    # lora_alpha=4,
    r=32,  # Rank
    lora_alpha=32,
    target_modules=["q", "v"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.SEQ_2_SEQ_LM  # FLAN-T5
)
# Add LoRA adapter layers/parameters to the original LLM to be trained.
# In[9]:
peft_model = get_peft_model(original_model, lora_config)
# print(print_number_of_trainable_model_parameters(peft_model))
# Enable gradient checkpointing in the model's configuration.
# peft_model.config.gradient_checkpointing = True
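# (Note: if gradient checkpointing is actually meant to be enabled, the usual
# transformers API is the method call below, or gradient_checkpointing=True in
# TrainingArguments; it is left commented out here because it was not active
# in the run that produced the error.)
# peft_model.gradient_checkpointing_enable()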
#
# ### Train PEFT Adapter
#
# Define training arguments and create `Trainer` instance.
# In[10]:
output_dir = f'/home/username/stuff/username_storage/LLM/PEFT/train_args/no_log_max_depth_{str(int(time.time()))}'
peft_training_args = TrainingArguments(
    output_dir=output_dir,
    # auto_find_batch_size=True,
    per_device_train_batch_size=4,
    learning_rate=1e-3,  # Higher learning rate than full fine-tuning.
    num_train_epochs=1,
    # max_steps=1
)
peft_trainer = Trainer(
    model=peft_model,
    args=peft_training_args,
    train_dataset=tokenized_datasets["train"],
)
# In[11]:
peft_trainer.train()
peft_model_path="/home/username/stuff/username_storage/LLM/PEFT/peft-dialogue-summary-checkpoint-local"
peft_trainer.model.save_pretrained(peft_model_path)
tokenizer.save_pretrained(peft_model_path)
Error:
return _VF.dropout_(input, p, training) if inplace else _VF.dropout(input, p, training)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 7.79 GiB total capacity; 1.10 GiB already allocated; 17.31 MiB free; 1.11 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
0%| | 0/32 [00:00<?, ?it/s]
Update:
I tried going down to a batch size of 1 and got the error message below.
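The only change for that run was the batch size in the training arguments, i.e. roughly:

peft_training_args = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=1,  # reduced from 4 for this run
    learning_rate=1e-3,
    num_train_epochs=1,
)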
attn_weights = nn.functional.softmax(scores.float(), dim=-1).type_as(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 12.00 MiB (GPU 0; 7.79 GiB total capacity; 1.10 GiB already allocated; 11.31 MiB free; 1.12 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation. See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF