How to correctly initialize latent vector parameters that have size dependent...
Hi, may I ask how to correctly create a set of latents for each sample in the training dataset? I.e., suppose you would like to have optimizable latent codes for each frame. The total...
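A minimal sketch of one way to set this up, assuming the dataset returns (sample_index, target) pairs; the class name, latent dimension, and placeholder decoder are all made up for illustration:

```python
import torch
import lightning as L


class PerSampleLatents(L.LightningModule):
    """One optimizable latent code per training sample (hypothetical decoder)."""

    def __init__(self, num_samples: int, latent_dim: int = 64):
        super().__init__()
        self.save_hyperparameters()
        # One row per sample; registered as a parameter so it is trained and checkpointed.
        self.latents = torch.nn.Embedding(num_samples, latent_dim)
        self.decoder = torch.nn.Linear(latent_dim, 3)  # placeholder decoder

    def training_step(self, batch, batch_idx):
        # Assumes the dataset returns (sample_index, target) so each frame finds its latent.
        idx, target = batch
        z = self.latents(idx)
        loss = torch.nn.functional.mse_loss(self.decoder(z), target)
        self.log("train_loss", loss)
        return loss

    def configure_optimizers(self):
        # The latent table and the decoder are both parameters of this module.
        return torch.optim.Adam(self.parameters(), lr=1e-3)
```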
Custom model definition is not included in checkpoint hyper_parameters
Hi, I have the following dummy LightningModule: class MyLightningModule(LightningModule): def __init__( self, param_1: torch.nn.Module = torch.nn.Conv2d(1,1,1), param_2: torch.nn.Module =...
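One workaround, sketched under the assumption that the modules themselves don't need to be serialized as hyperparameters: exclude them with save_hyperparameters(ignore=...) and pass them back in when restoring.

```python
import torch
import lightning as L


class MyLightningModule(L.LightningModule):
    def __init__(
        self,
        param_1: torch.nn.Module = torch.nn.Conv2d(1, 1, 1),
        param_2: torch.nn.Module = torch.nn.Conv2d(1, 1, 1),
    ):
        super().__init__()
        # nn.Module arguments are not plain values, so keep them out of the
        # saved hyperparameters and supply them again when restoring.
        self.save_hyperparameters(ignore=["param_1", "param_2"])
        self.param_1 = param_1
        self.param_2 = param_2


# Restoring: pass the excluded modules back in explicitly; their weights are
# then overwritten by the checkpoint's state_dict.
# model = MyLightningModule.load_from_checkpoint(
#     "my.ckpt", param_1=torch.nn.Conv2d(1, 1, 1), param_2=torch.nn.Conv2d(1, 1, 1)
# )
```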
save_hyperparameters and OptimizerCallable
If I have an OptimizerCallable argument in my model's constructor, using save_hyperparameters just gives python/name:jsonargparse._typehints.partial_instance rather than the arguments used to build the...
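A hedged sketch of the usual OptimizerCallable pattern with the CLI; excluding the callable from save_hyperparameters is just one way to avoid the partial_instance placeholder, and the class name here is hypothetical:

```python
import torch
import lightning as L
from lightning.pytorch.cli import OptimizerCallable


class LitModel(L.LightningModule):
    def __init__(self, optimizer: OptimizerCallable = torch.optim.Adam):
        super().__init__()
        # The callable is not a plain value, so excluding it avoids the
        # jsonargparse partial_instance placeholder in the saved hparams.
        self.save_hyperparameters(ignore=["optimizer"])
        self.optimizer = optimizer
        self.layer = torch.nn.Linear(4, 1)

    def configure_optimizers(self):
        # With LightningCLI the injected callable only needs the parameters.
        return self.optimizer(self.parameters())
```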
Disabling autocast for certain modules
Hi, I was wondering what the recommended way is in Lightning to disable mixed precision for certain sub-modules. Is there a way to do this through callbacks? Thanks
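There is no dedicated Lightning callback for this as far as I know; one common approach is to wrap the sensitive sub-module so its forward runs with autocast disabled. A sketch (the wrapper name is made up):

```python
import torch


class FullPrecisionWrapper(torch.nn.Module):
    """Runs the wrapped sub-module with autocast disabled (not a Lightning API)."""

    def __init__(self, inner: torch.nn.Module):
        super().__init__()
        self.inner = inner

    def forward(self, x):
        # Locally turn off mixed precision and compute in float32.
        with torch.autocast(device_type=x.device.type, enabled=False):
            return self.inner(x.float())
```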
Size mismatch for model
Hi! I load a checkpoint from a model with head size 1599 into the same model with head size 59. I set strict=False, but got the error: Traceback (most recent call last): File...
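strict=False only tolerates missing or unexpected keys, not shape mismatches, so one option is to drop the incompatible head weights before loading. A sketch (the function name and checkpoint layout assumptions are mine):

```python
import torch


def load_compatible_weights(model: torch.nn.Module, ckpt_path: str) -> None:
    """Load only the tensors whose shapes match the target model."""
    ckpt = torch.load(ckpt_path, map_location="cpu")
    state_dict = ckpt.get("state_dict", ckpt)  # Lightning checkpoints nest under "state_dict"
    model_state = model.state_dict()
    filtered = {
        k: v
        for k, v in state_dict.items()
        if k in model_state and v.shape == model_state[k].shape
    }
    skipped = sorted(set(state_dict) - set(filtered))
    print(f"Skipped keys (missing or shape mismatch): {skipped}")
    model.load_state_dict(filtered, strict=False)
```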
Where should I load the model checkpoint when using configure_model?
When I load the model checkpoint in configure_model, the following error occurs. It seems to create an empty model; where should I load the model checkpoint? size mismatch for...
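A sketch of one way to keep the build and the weight loading together inside configure_model, guarding against repeated calls; the placeholder network, path argument, and key handling are assumptions:

```python
from typing import Optional

import torch
import lightning as L


class LitWrapper(L.LightningModule):
    def __init__(self, pretrained_path: Optional[str] = None):
        super().__init__()
        self.save_hyperparameters()
        self.model = None

    def configure_model(self):
        # configure_model may be called more than once, so keep it idempotent.
        if self.model is not None:
            return
        self.model = torch.nn.Linear(128, 10)  # placeholder for the real network
        if self.hparams.pretrained_path:
            state = torch.load(self.hparams.pretrained_path, map_location="cpu")
            # Adjust the key prefixes here if the file is a full Lightning checkpoint.
            self.model.load_state_dict(state.get("state_dict", state), strict=False)
```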
Load checkpoint with dynamically created model
Hi, in the LightningModule docs, the setup hook is described as a way to dynamically build a model (instead of initializing it in __init__). See the example here. However, when I load a...
ERROR:root:Attempting to deserialize object on a CUDA device but...
Dear all, I trained a model that came from Hugging Face; the training works and the checkpoint is saved. Afterwards, when I try to load the model on a PC without CUDA, I get the error: ERROR:root:Attempting...
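A sketch of the usual fix: map the checkpoint onto the CPU when loading on a machine without CUDA (the file name and class name are placeholders):

```python
import torch

# Force all checkpoint tensors onto the CPU when no GPU is present.
ckpt = torch.load("model.ckpt", map_location=torch.device("cpu"))

# load_from_checkpoint accepts the same argument (class name hypothetical):
# model = MyLitModel.load_from_checkpoint("model.ckpt", map_location="cpu")
```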
Logging one value per epoch?
Reading the documentation and following the examples, there doesn’t seem to be a way to log just one value per epoch. This is insane, because when you’re trying to figure out a model architecture,...
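It can be done with the on_step/on_epoch flags of self.log, which aggregate the metric and emit a single value at epoch end. A minimal sketch (module and metric names are made up):

```python
import torch
import lightning as L


class LitModel(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(8, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.mse_loss(self.layer(x), y)
        # Accumulates over the epoch and writes exactly one value at epoch end.
        self.log("train_loss_epoch", loss, on_step=False, on_epoch=True)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())
```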
ValueError: too many values to unpack (expected 3)
For studying purposes, I am trying to create a simple fine-tuning example using T5 and Lightning: import pandas as pd df = pd.DataFrame({ "text": ["O Brasil é um país localizado na América do Sul.", "A...
Question about recovering a nested model from a checkpoint
I have a nested model: class MovieScoreTask(pl.LightningModule): def __init__(self, base_model: nn.Module, learning_rate: float): super().__init__() self.save_hyperparameters() # self.example_input_array...
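One common pattern, sketched with assumptions about the rest of the class: exclude the nested module from the saved hyperparameters and pass a freshly built instance back into load_from_checkpoint, letting the checkpoint's state_dict overwrite its weights.

```python
import torch
import lightning as L


class MovieScoreTask(L.LightningModule):
    def __init__(self, base_model: torch.nn.Module, learning_rate: float):
        super().__init__()
        # Keep the scalar hyperparameters; the module cannot be rebuilt from hparams alone.
        self.save_hyperparameters(ignore=["base_model"])
        self.base_model = base_model

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=self.hparams.learning_rate)


# Restoring: rebuild the nested module however it was built originally and pass it in.
# base = build_base_model()  # hypothetical factory
# task = MovieScoreTask.load_from_checkpoint("task.ckpt", base_model=base)
```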
Metrics not logged properly in PyTorch Lightning
The logging feature is not working correctly. It gives the following output on the console → v_num:z3_3 val_loss:3.105 val_kappa:0.34 val_accuracy:0.295 train_loss:2.436 train_kappa: nan train_accuracy:0.0...
Mixed precision training (how to appropriately scale the manual gradient...)
I’m working with mixed precision training. My loss conceptually has two components: loss1 and loss2. I call self.manual_backward(loss1, retain_graph=True). This fills the gradients of all params. For...
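A sketch of the manual-optimization pattern in which Lightning's manual_backward applies the AMP grad scaling to both backward calls; the network and the second loss term are placeholders:

```python
import torch
import lightning as L


class TwoLossModule(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.automatic_optimization = False
        self.net = torch.nn.Linear(16, 1)

    def training_step(self, batch, batch_idx):
        opt = self.optimizers()
        opt.zero_grad()

        x, y = batch
        pred = self.net(x)
        loss1 = torch.nn.functional.mse_loss(pred, y)
        loss2 = pred.abs().mean()  # placeholder second loss term

        # Let Lightning scale both backward passes consistently instead of
        # calling loss.backward() or the grad scaler directly.
        self.manual_backward(loss1, retain_graph=True)
        self.manual_backward(loss2)
        opt.step()

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters(), lr=1e-3)
```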
RuntimeError: one of the variables needed for gradient computation has been...
My first forward pass went smoothly, but then I encountered this runtime error: Variable._execution_engine.run_backward( # Calls into the C++ engine to run the backward pass RuntimeError: one of the...
How can I remove metric parameters from the model?
Hi, I have a problem: Lightning saves my metric parameters, so plain PyTorch cannot load the weights directly. How can I exclude them? Below is my code: class IAT_enhancement(L.LightningModule):...
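One possible workaround, assuming the offending keys come from metric attributes whose names you know (the prefixes and file names below are hypothetical): export a filtered copy of the checkpoint's state_dict for plain PyTorch use.

```python
import torch

# Export a metric-free copy of the weights for plain PyTorch loading.
ckpt = torch.load("enhancement.ckpt", map_location="cpu")
clean = {
    k: v
    for k, v in ckpt["state_dict"].items()
    if not k.startswith(("train_metrics.", "val_metrics."))  # hypothetical metric attribute names
}
torch.save(clean, "weights_only.pth")
```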
Confusion about load_from_checkpoint() and save_hyperparameters()
According to Saving and loading checkpoints (basic) — PyTorch Lightning 2.1.3 documentation, there is a model like this: class Encoder(L.LightningModule): ... class Decoder(L.LightningModule): ...
Save and restore persisted DataLoader states from checkpoint
Hi! I am working on a project to save and restore persisted DataLoader states from a checkpoint, especially with the vanilla PyTorch DataLoader. Can you provide suggestions on how to implement that?...
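A vanilla DataLoader has no state_dict of its own, so a true mid-epoch resume likely needs something like torchdata's StatefulDataLoader; below is only a rough sketch of stashing coarse resume information (a sampler seed, RNG state) through the checkpoint hooks, with all names hypothetical:

```python
import torch
import lightning as L


class StatefulLoaderModule(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.sampler_seed = 0  # hypothetical seed used to rebuild the sampler

    def on_save_checkpoint(self, checkpoint):
        # Anything added to the checkpoint dict comes back in on_load_checkpoint.
        checkpoint["dataloader_state"] = {
            "sampler_seed": self.sampler_seed,
            "torch_rng_state": torch.get_rng_state(),
        }

    def on_load_checkpoint(self, checkpoint):
        state = checkpoint.get("dataloader_state", {})
        self.sampler_seed = state.get("sampler_seed", 0)
        if "torch_rng_state" in state:
            torch.set_rng_state(state["torch_rng_state"])
```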
How to interactively run inference with a model in a Jupyter notebook created...
example: RAD-MMM/tts_main.py at main · NVIDIA/RAD-MMM · GitHub
Do I need to detach when using self.logger.experiment.add_scalars?
I am aware that when we use self.log("train_loss", loss), for instance, the loss tensor is automatically detached to avoid a CPU RAM leak. However, if I am logging something else through the method...
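self.logger.experiment is the raw TensorBoard SummaryWriter, so a conservative habit is to pass detached values (or plain floats via .item()) to add_scalars. A sketch with placeholder losses:

```python
import torch
import lightning as L


class LitModel(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(8, 1)

    def training_step(self, batch, batch_idx):
        x, y = batch
        pred = self.layer(x)
        loss1 = torch.nn.functional.mse_loss(pred, y)
        loss2 = pred.abs().mean()
        # self.log() detaches tensors for you; the raw SummaryWriter is not part of
        # that machinery, so hand it .item() values (or .detach()) to be safe.
        self.logger.experiment.add_scalars(
            "losses",
            {"loss1": loss1.item(), "loss2": loss2.item()},
            global_step=self.global_step,
        )
        return loss1 + loss2

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())
```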
Skip instances during training
Hi, I am using the LightningModule to train a neural network across many instances/GPUs; however, the data is imbalanced (I cannot change this), so I want to skip over some instances during training...
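One option in automatic optimization is to return None from training_step for batches you want to skip; note that with DDP this can stall if ranks skip different numbers of batches, so a sampler-level fix may be safer. A sketch with a made-up skip rule:

```python
import torch
import lightning as L


class SkipSomeBatches(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(8, 2)

    def training_step(self, batch, batch_idx):
        x, y = batch
        # Hypothetical skip rule: drop batches that contain only the majority class.
        if torch.all(y == 0):
            return None  # returning None skips the optimizer step for this batch
        loss = torch.nn.functional.cross_entropy(self.layer(x), y)
        return loss

    def configure_optimizers(self):
        return torch.optim.Adam(self.parameters())
```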
LightningModule.train_dataloader()
How do the hooks for the LightningModule interact with the hooks for the LightningDataModule? Does one override the other? Previously, I was able to call the LightningDataModule.train_dataloader()...
Passes the sanity check but gets CUDA OUT OF MEMORY in the validation loop
Hi, when I run the training code, it passes the sanity check and uses about 15 GB of 24 GB of memory. But when the code reaches the validation loop, I get a CUDA OUT OF MEMORY error (it was fine in the training loop; my...
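A frequent cause of this pattern (sanity check fine, full validation loop OOM) is accumulating per-batch outputs that still live on the GPU; the sanity check only runs a couple of batches, so it never shows. A sketch of keeping only small detached CPU copies (the model and attribute names are placeholders):

```python
import torch
import lightning as L


class LitEval(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(8, 1)
        self.val_outputs = []

    def validation_step(self, batch, batch_idx):
        x, y = batch
        loss = torch.nn.functional.mse_loss(self.layer(x), y)
        # Keep only small, detached CPU copies; storing full GPU tensors for the
        # entire validation set is a common source of OOM here.
        self.val_outputs.append(loss.detach().cpu())
        self.log("val_loss", loss)
        return loss

    def on_validation_epoch_end(self):
        self.log("val_loss_mean", torch.stack(self.val_outputs).mean())
        self.val_outputs.clear()
```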
Save torchmetrics plots after logging them in LightningModule
Hello, I am using a LightningModule and a Trainer with multiple metrics from torchmetrics; some are native to the library and some are custom Metric objects. I'm only interested...
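Recent torchmetrics versions expose Metric.plot(), which returns a matplotlib figure you can save like any other; a sketch with a placeholder model and a BinaryAccuracy metric:

```python
import torch
import lightning as L
from torchmetrics.classification import BinaryAccuracy


class LitWithMetrics(L.LightningModule):
    def __init__(self):
        super().__init__()
        self.layer = torch.nn.Linear(8, 1)
        self.val_acc = BinaryAccuracy()

    def validation_step(self, batch, batch_idx):
        x, y = batch
        preds = torch.sigmoid(self.layer(x)).squeeze(-1)
        self.val_acc.update(preds, y.int())

    def on_validation_epoch_end(self):
        # Metric.plot() returns a matplotlib (fig, ax) pair; save it like any figure.
        fig, ax = self.val_acc.plot()
        fig.savefig(f"val_acc_epoch_{self.current_epoch}.png")
        self.val_acc.reset()
```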
Fine-tuning using LLaMA models
Hello, my code was working with the T5 model for fine-tuning: # train.py import os import torch import datasets from transformers import T5ForConditionalGeneration, T5Tokenizer import lightning as L...
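A hedged sketch of how the same Lightning skeleton might look with a causal LM instead of T5; the model name is a gated Hugging Face checkpoint used only as an example, and the batch keys are assumptions about how the data is tokenized:

```python
import torch
import lightning as L
from transformers import AutoModelForCausalLM, AutoTokenizer


class CausalLMFineTuner(L.LightningModule):
    def __init__(self, model_name: str = "meta-llama/Llama-2-7b-hf", lr: float = 2e-5):
        super().__init__()
        self.save_hyperparameters()
        self.tokenizer = AutoTokenizer.from_pretrained(model_name)
        # LLaMA tokenizers usually ship without a pad token.
        if self.tokenizer.pad_token is None:
            self.tokenizer.pad_token = self.tokenizer.eos_token
        self.model = AutoModelForCausalLM.from_pretrained(model_name)

    def training_step(self, batch, batch_idx):
        # Unlike T5 there are no separate decoder inputs: labels are the input_ids
        # (with pad positions normally masked to -100 during tokenization).
        out = self.model(
            input_ids=batch["input_ids"],
            attention_mask=batch["attention_mask"],
            labels=batch["labels"],
        )
        self.log("train_loss", out.loss)
        return out.loss

    def configure_optimizers(self):
        return torch.optim.AdamW(self.parameters(), lr=self.hparams.lr)
```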
DLRM run failed in torchrec+lightning
model: recipes/torchrecipes/rec at main · facebookresearch/recipes · GitHub error: dlrm_main/0 [0]:[rank0]: Traceback (most recent call last): dlrm_main/0 [0]:[rank0]: File...