CUDA out of memory error during PEFT LoRA fine-tuning

I am trying to fine-tune the weights of a FLAN-T5 model downloaded from Hugging Face, using PEFT and specifically LoRA. I am using the Python 3 code below. I am running this on an Ubuntu 18.04 LTS server with an NVIDIA GPU that has 8 GB of memory. I get a "CUDA out of memory" error; the full error message is given below. I tried adding:

import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"

but I still get the same error message. The code and the error message are below. Can anyone see what the problem might be and suggest how to fix it?
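For context, two memory-saving options are currently commented out in the code below: `auto_find_batch_size` in `TrainingArguments`, and gradient checkpointing on the model (which, as I understand it, can also be passed as `gradient_checkpointing=True` to `TrainingArguments`). A rough sketch of how I would turn them on, in case that is relevant:

peft_training_args = TrainingArguments(
    output_dir=output_dir,
    auto_find_batch_size=True,       # let Trainer retry with a smaller batch size on OOM (needs accelerate)
    per_device_train_batch_size=4,
    gradient_checkpointing=True,     # recompute activations in the backward pass to save memory
    learning_rate=1e-3,
    num_train_epochs=1,
)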

Code:

from datasets import load_dataset
from transformers import AutoModelForSeq2SeqLM, AutoTokenizer, GenerationConfig, TrainingArguments, Trainer
import torch
import time
import evaluate
import pandas as pd
import numpy as np

# added to deal with memory allocation error
import os
os.environ["PYTORCH_CUDA_ALLOC_CONF"] = "max_split_size_mb:512"

#
# ### Load Dataset and LLM   

huggingface_dataset_name = "knkarthick/dialogsum"

dataset = load_dataset(huggingface_dataset_name)

dataset


# Load the pre-trained [FLAN-T5 model](https://huggingface.co/docs/transformers/model_doc/flan-t5) and its tokenizer directly from Hugging Face. Using the [base version](https://huggingface.co/google/flan-t5-base) of FLAN-T5. Setting `torch_dtype=torch.bfloat16` specifies the data type (and therefore the memory footprint) of the model weights.

model_name='google/flan-t5-base'

original_model = AutoModelForSeq2SeqLM.from_pretrained(model_name, torch_dtype=torch.bfloat16)
tokenizer = AutoTokenizer.from_pretrained(model_name)



index = 200

dialogue = dataset['test'][index]['dialogue']
summary = dataset['test'][index]['summary']

prompt = f"""
Summarize the following conversation.

{dialogue}

Summary:
"""

inputs = tokenizer(prompt, return_tensors='pt')
output = tokenizer.decode(
    original_model.generate(
        inputs["input_ids"],
        max_new_tokens=200,
    )[0],
    skip_special_tokens=True
)

dash_line = '-'.join('' for x in range(100))


# updated 11/1/23 to ensure using gpu
def tokenize_function(example):
    start_prompt = 'Summarize the following conversation.\n\n'
    end_prompt = '\n\nSummary: '
    prompt = [start_prompt + dialogue + end_prompt for dialogue in example["dialogue"]]
    example['input_ids'] = tokenizer(prompt, padding="max_length", truncation=True, return_tensors="pt").input_ids\
    .cuda()
    example['labels'] = tokenizer(example["summary"], padding="max_length", truncation=True, return_tensors="pt").input_ids\
    .cuda()

    return example

# The dataset actually contains 3 diff splits: train, validation, test.
# The tokenize_function code is handling all data across all splits in batches.
tokenized_datasets = dataset.map(tokenize_function, batched=True)
tokenized_datasets = tokenized_datasets.remove_columns(['id', 'topic', 'dialogue', 'summary',])


# To save some time subsample the dataset:

tokenized_datasets = tokenized_datasets.filter(lambda example, index: index % 100 == 0, with_indices=True)


# Check the shapes of all three parts of the dataset:

# print(f"Shapes of the datasets:")
# print(f"Training: {tokenized_datasets['train'].shape}")
# print(f"Validation: {tokenized_datasets['validation'].shape}")
# print(f"Test: {tokenized_datasets['test'].shape}")
#
# print(tokenized_datasets)


# The output dataset is ready for fine-tuning.

#
# ### Perform Parameter Efficient Fine-Tuning (PEFT)
# - use LoRA

#
# ### Setup the PEFT/LoRA model for Fine-Tuning
#
# - set up the PEFT/LoRA model for fine-tuning with a new layer/parameter adapter
# - freezing the underlying LLM and only training the adapter
# - LoRA configuration below
# - Note the rank (`r`) hyper-parameter, which defines the rank/dimension of the adapter to be trained

from peft import LoraConfig, get_peft_model, TaskType

lora_config = LoraConfig(
#     r=4, # Rank
#     lora_alpha=4,
    r=32, # Rank
    lora_alpha=32,
    target_modules=["q", "v"],
    lora_dropout=0.05,
    bias="none",
    task_type=TaskType.SEQ_2_SEQ_LM # FLAN-T5

)


# Add LoRA adapter layers/parameters to the original LLM to be trained.

peft_model = get_peft_model(original_model,
                            lora_config)
# print(print_number_of_trainable_model_parameters(peft_model))

# Enable gradient checkpointing in the model's configuration.
# peft_model.config.gradient_checkpointing = True


#
# ### Train PEFT Adapter
#
# Define training arguments and create `Trainer` instance.

output_dir = f'/home/username/stuff/username_storage/LLM/PEFT/train_args/no_log_max_depth_{str(int(time.time()))}'

peft_training_args = TrainingArguments(
    output_dir=output_dir,
#     auto_find_batch_size=True,
    per_device_train_batch_size=4, 
    learning_rate=1e-3, # Higher learning rate than full fine-tuning.
    num_train_epochs=1,
#     max_steps=1
)

peft_trainer = Trainer(
    model=peft_model,
    args=peft_training_args,
    train_dataset=tokenized_datasets["train"],
)


peft_trainer.train()

peft_model_path="/home/username/stuff/username_storage/LLM/PEFT/peft-dialogue-summary-checkpoint-local"

peft_trainer.model.save_pretrained(peft_model_path)
tokenizer.save_pretrained(peft_model_path)

Error:

return _VF.dropout_(input, p, training) if inplace else _VF.dropout(input, p, training)
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 20.00 MiB (GPU 0; 7.79 GiB total capacity; 1.10 GiB already allocated; 17.31 MiB free; 1.11 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
  0%|          | 0/32 [00:00<?, ?it/s]

Update:

I tried reducing the batch size to 1 and got the error below.
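The only thing I changed for that run was the batch size in `TrainingArguments`, i.e. roughly:

peft_training_args = TrainingArguments(
    output_dir=output_dir,
    per_device_train_batch_size=1,   # reduced from 4; everything else unchanged
    learning_rate=1e-3,
    num_train_epochs=1,
)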

attn_weights = nn.functional.softmax(scores.float(), dim=-1).type_as(
torch.cuda.OutOfMemoryError: CUDA out of memory. Tried to allocate 12.00 MiB (GPU 0; 7.79 GiB total capacity; 1.10 GiB already allocated; 11.31 MiB free; 1.12 GiB reserved in total by PyTorch) If reserved memory is >> allocated memory try setting max_split_size_mb to avoid fragmentation.  See documentation for Memory Management and PYTORCH_CUDA_ALLOC_CONF
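
If it helps with diagnosis, I can also post the GPU memory numbers captured right before `peft_trainer.train()`, e.g. with something like:

# Print GPU memory stats just before training (all standard torch.cuda calls).
print(torch.cuda.get_device_properties(0).total_memory)  # total VRAM in bytes
print(torch.cuda.memory_allocated())                      # bytes currently held by tensors
print(torch.cuda.memory_reserved())                       # bytes reserved by PyTorch's caching allocator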
