Channel: LightningModule - Lightning AI
Browsing all 34 articles

Logging using a torchmetric object that returns dictionary

Hi everyone, some metrics in torchmetrics return a dictionary when I call their compute() method. For example, torchmetrics.SQuAD() returns {'exact_match': tensor(0., device='cuda:0'), 'f1':...
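
A minimal sketch of one way to handle this, assuming the standard LightningModule API: call compute() yourself and pass the resulting dictionary to self.log_dict. The step helper producing predictions and targets is hypothetical.

```python
import torchmetrics
from lightning.pytorch import LightningModule

class QAModule(LightningModule):
    def __init__(self):
        super().__init__()
        self.squad = torchmetrics.SQuAD()

    def validation_step(self, batch, batch_idx):
        preds, targets = self.qa_step(batch)  # hypothetical helper
        self.squad.update(preds, targets)

    def on_validation_epoch_end(self):
        # compute() returns {'exact_match': ..., 'f1': ...}; log every key.
        self.log_dict(self.squad.compute())
        self.squad.reset()
```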

View Article



Run_training_epoch duration increases with more epochs

Reposting this discussion question here because I read in another discussion that Lightning wants to move from discussions to this forum: I have a LightningModule, DataModule and Trainer that I am...
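
One common cause of this symptom (an assumption about the post, not a confirmed diagnosis) is accumulating tensors that still carry the autograd graph, which grows memory and slows every subsequent epoch. A minimal sketch of the fix:

```python
# Assumed cause, for illustration: storing step outputs without detaching.
def training_step(self, batch, batch_idx):
    loss = self.compute_loss(batch)  # hypothetical helper
    # Appending `loss` directly would keep the whole computation graph
    # alive across the epoch; detach and move to CPU before storing.
    self.history.append(loss.detach().cpu())  # self.history: list from __init__
    return loss
```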

View Article

[CLI] How to Pass Arguments to Initialize an Object in L.LightningModule?

I want to use Lightning CLI to pass arguments to initialize a LightningModule and some objects inside (e.g., a nn.Module). Lightning CLI provides some helpful features that allow me to create and...
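
A sketch of the usual approach: Lightning CLI instantiates constructor arguments from the config when they are typed, so a submodule annotated as torch.nn.Module can be selected and configured via class_path/init_args. The class path in the YAML comment is hypothetical.

```python
import torch
from lightning.pytorch import LightningModule
from lightning.pytorch.cli import LightningCLI

class LitModel(LightningModule):
    def __init__(self, backbone: torch.nn.Module, lr: float = 1e-3):
        super().__init__()
        self.backbone = backbone
        self.lr = lr

# config.yaml (hypothetical module path):
#   model:
#     backbone:
#       class_path: my_models.SmallCNN
#       init_args:
#         hidden_dim: 128
#     lr: 0.001

if __name__ == "__main__":
    LightningCLI(LitModel)
```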

View Article

How to get the checkpoint without saving it?

When I train a LightningModule using a Trainer, how do I get the checkpoint object (which is presumably a Python dict) without saving it to disk? 2 posts - 2 participants Read full topic
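
A sketch that relies on a private API (the underscore-prefixed connector may change between Lightning versions, so verify against your installed release): the checkpoint connector can assemble the checkpoint dict in memory without writing it to disk.

```python
# Private API; check it exists in your Lightning version before relying on it.
trainer.fit(model)
ckpt = trainer._checkpoint_connector.dump_checkpoint(weights_only=False)
print(sorted(ckpt.keys()))  # typically 'state_dict', 'optimizer_states', ...
```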

View Article

Improving poor training efficiency on A100 40 GB

Hi all! First, thank you for the amazing framework and blog. I am training falcon-7b on a custom dataset, with the following hyperparams: batch_size = 2 aggregate_batch = 4 epochs = 10 train set size =...
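
Some general throughput knobs for Ampere GPUs (a sketch, not a diagnosis of this particular setup): bf16 mixed precision and TF32 matmuls are usually the first things to try on an A100.

```python
import torch
from lightning.pytorch import Trainer

torch.set_float32_matmul_precision("high")  # allow TF32 matmuls on Ampere

trainer = Trainer(
    accelerator="gpu",
    devices=1,
    precision="bf16-mixed",      # A100 supports bfloat16 natively
    accumulate_grad_batches=4,   # mirrors the post's aggregate_batch
)
```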

View Article



What's wrong with the PyTorch Lightning doc

I can’t see any details of each option in the navigation menu. It’s hard to learn from this document without navigation entries like on_train_start, on_save_checkpoint, etc. 5 posts - 2 participants...

View Article

How to access the returned values of *_step()

Hey guys, I have a question about overriding LightningModule: the newest version removed the *_epoch_end hooks but kept the return values in all the *_step() functions. Now how do I access the returned values...
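
The replacement pattern suggested in Lightning's 2.0 upgrade notes: collect the step outputs yourself on the module and consume them in the on_*_epoch_end hooks. The compute_loss helper below is hypothetical.

```python
import torch
from lightning.pytorch import LightningModule

class LitModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.validation_outputs = []

    def validation_step(self, batch, batch_idx):
        loss = self.compute_loss(batch)  # hypothetical helper
        self.validation_outputs.append(loss.detach())
        return loss

    def on_validation_epoch_end(self):
        epoch_mean = torch.stack(self.validation_outputs).mean()
        self.log("val_loss", epoch_mean)
        self.validation_outputs.clear()  # free memory for the next epoch
```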

View Article

How do I get the metric in on_validation_epoch_end()?

def validation_step(self, batch, batch_idx, dataloader_idx=None): I calculate the metric here: metric = XXXX. def on_validation_epoch_end(self): I would like to get the metric here. 3 posts - 2...
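
One option, assuming the metric can be expressed as a torchmetrics object: keep it as a module attribute, update it per step, and read it in on_validation_epoch_end. The predict_batch helper is hypothetical.

```python
import torchmetrics
from lightning.pytorch import LightningModule

class LitModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.val_acc = torchmetrics.classification.BinaryAccuracy()

    def validation_step(self, batch, batch_idx, dataloader_idx=0):
        preds, target = self.predict_batch(batch)  # hypothetical helper
        self.val_acc.update(preds, target)

    def on_validation_epoch_end(self):
        # The metric has accumulated state across all validation steps.
        self.log("val_acc", self.val_acc.compute())
        self.val_acc.reset()
```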

View Article


Autograd issue

Hey folks! I am having an issue where I am executing code that throws an error which I suppose is related to autograd. I have defined a forward step as follows: def step(self, batch, mode): anc, pos = batch...

View Article


`self.lr_schedulers().optimizer` and `self.optimizers()` return different...

I’m training a GPT2 network and my configure_optimizers() is as follows: def configure_optimizers(self): opt = optim.Adam(self.model.parameters(), self.lr) # logging.info() total_steps = \...
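
A likely explanation, sketched below: in automatic optimization self.optimizers() returns a LightningOptimizer wrapper, while the scheduler holds the raw torch.optim object, so the two compare unequal even though they refer to the same underlying optimizer.

```python
# Identity check between the wrapper and the raw optimizer.
def on_train_start(self):
    wrapped = self.optimizers()           # LightningOptimizer wrapper
    raw = wrapped.optimizer               # the underlying torch.optim.Adam
    sched_opt = self.lr_schedulers().optimizer
    assert raw is sched_opt               # same object under the wrapper
```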

View Article

Lightning Module isn't loading checkpoint from the path as per documentation

Hi. I'm trying to use the method below to load my checkpoint; however, it throws IsADirectoryError when I pass the ckpt path as shown in the documentation. Here’s my code: def main(): <some data...
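
The usual cause of IsADirectoryError here: load_from_checkpoint expects a path to a .ckpt file, not the directory that contains it. A sketch with a hypothetical path:

```python
# Point at the checkpoint *file*, not the checkpoints/ directory.
model = MyLightningModule.load_from_checkpoint(
    "lightning_logs/version_0/checkpoints/epoch=9-step=1000.ckpt"
)
```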

View Article

How to correctly initialize latent vector parameters that have size dependent...

Hi, may I ask how you correctly create a set of latents for each sample in the training dataset? I.e., suppose you would like to have optimizable latent codes for each frame. The total...
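
A common way to keep one optimizable latent per training sample (a sketch, assuming the dataset length is known when the module is built, and that the dataset also yields each sample's index):

```python
import torch
from lightning.pytorch import LightningModule

class LatentModel(LightningModule):
    def __init__(self, num_samples: int, latent_dim: int = 64):
        super().__init__()
        # nn.Embedding registers the latents as one trainable parameter table.
        self.latents = torch.nn.Embedding(num_samples, latent_dim)

    def training_step(self, batch, batch_idx):
        x, sample_idx = batch             # dataset must also return the index
        z = self.latents(sample_idx)      # per-sample latent codes
        return self.reconstruction_loss(x, z)  # hypothetical helper
```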

View Article

Custom model definition is not included in checkpoint hyper_parameters

Hi, I have the following dummy LightningModule: class MyLightningModule(LightningModule): def __init__( self, param_1: torch.nn.Module = torch.nn.Conv2d(1,1,1) param_2: torch.nn.Module =...
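
One robust pattern (a sketch, not the only option): pass plain, serializable values instead of module instances, and build the nn.Module inside __init__, so save_hyperparameters can record everything needed to rebuild the model from the checkpoint.

```python
import torch
from lightning.pytorch import LightningModule

class MyLightningModule(LightningModule):
    def __init__(self, in_channels: int = 1, out_channels: int = 1):
        super().__init__()
        # Only plain values end up in hparams, so they serialize cleanly.
        self.save_hyperparameters()
        self.param_1 = torch.nn.Conv2d(in_channels, out_channels, 1)
```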

View Article


Save_hyperparameters and OptimizerCallable

If I have an OptimizerCallable argument in my model's constructor, using save_hyperparameters just gives python/name:jsonargparse._typehints.partial_instance rather than the arguments used to build the...
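
A possible workaround (a sketch, assuming Lightning 2.x's cli module): exclude the callable from hparams, since jsonargparse records it as a partial rather than the resolved arguments, and keep it only as a factory for configure_optimizers.

```python
import torch
from lightning.pytorch import LightningModule
from lightning.pytorch.cli import OptimizerCallable

class LitModel(LightningModule):
    def __init__(self, optimizer: OptimizerCallable = torch.optim.Adam):
        super().__init__()
        self.save_hyperparameters(ignore=["optimizer"])  # skip the partial
        self.optimizer_factory = optimizer

    def configure_optimizers(self):
        # The callable receives the parameters and returns the optimizer.
        return self.optimizer_factory(self.parameters())
```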

View Article

Disabling autocast for certain modules

Hi, I was wondering what is the way in Lightning to disable mixed precision for certain sub-modules? Is there a way to do this through callbacks? Thanks 2 posts - 2 participants Read full topic
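
A sketch using plain PyTorch inside forward rather than a callback: disable autocast locally and cast the inputs back to full precision for the sensitive sub-module (the sub-module names are hypothetical).

```python
import torch

def forward(self, x):
    x = self.encoder(x)  # runs under the Trainer's mixed-precision autocast
    with torch.autocast(device_type="cuda", enabled=False):
        x = self.sensitive_head(x.float())  # forced to run in float32
    return x
```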

View Article


Size mismatch for model

Hi! I load a checkpoint from a model with head size = 1599 into the same model with head size = 59. I set strict=False, but got the error: Traceback (most recent call last): File...
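
The underlying reason, with a sketch of one workaround: strict=False only tolerates missing or unexpected keys, not shape mismatches, so the mismatched head weights have to be dropped before loading.

```python
import torch

ckpt = torch.load("old_model.ckpt", map_location="cpu")
state = ckpt["state_dict"]
model_state = model.state_dict()

# Keep only the entries whose shapes match the new model.
filtered = {
    k: v for k, v in state.items()
    if k in model_state and v.shape == model_state[k].shape
}
model.load_state_dict(filtered, strict=False)
```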

View Article

Where should I load the model checkpoint when using configure_model?

When I load the model checkpoint in configure_model, the following error occurs. It seems to create an empty model; where should I load the model checkpoint? size mismatch for...
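
A sketch of the approach that usually fits configure_model: leave the weight restoration to the Trainer by passing ckpt_path to fit, since the Trainer materializes the model before loading the checkpoint state (build_big_model is a hypothetical factory).

```python
from lightning.pytorch import LightningModule, Trainer

class LitModel(LightningModule):
    def __init__(self):
        super().__init__()
        self.model = None

    def configure_model(self):
        if self.model is None:              # the hook may run more than once
            self.model = build_big_model()  # hypothetical factory

trainer = Trainer(accelerator="gpu", devices=1)
trainer.fit(LitModel(), ckpt_path="path/to/last.ckpt")
```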

View Article


Load checkpoint with dynamically created model

Hi, in the LightningModule docs, the setup hook is described as a possibility to dynamically build a model (instead of instantiating it in __init__). See the example here. However, when I load a...
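
A sketch of manual loading when the layers are created in setup(): build the module, run setup() yourself so the weights exist, then load the state dict from the checkpoint file (load_from_checkpoint alone cannot work, because __init__ creates no parameters).

```python
import torch

model = LitModel()        # layers not created yet
model.setup(stage="fit")  # dynamically builds the submodules
ckpt = torch.load("path/to/checkpoint.ckpt", map_location="cpu")
model.load_state_dict(ckpt["state_dict"])
```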

View Article

ERROR:root:Attempting to deserialize object on a CUDA device but...

Dear all, I trained a model that came from Hugging Face; the training works and the checkpoint is saved. Afterwards, when I try to load the model on a PC without CUDA, I get the error: ERROR:root:Attempting...
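
The usual fix (a sketch with hypothetical paths): map the stored CUDA tensors to CPU at load time.

```python
import torch

# Via the LightningModule classmethod:
model = MyModel.load_from_checkpoint("path/to/checkpoint.ckpt", map_location="cpu")

# Or with plain torch:
ckpt = torch.load("path/to/checkpoint.ckpt", map_location=torch.device("cpu"))
```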

View Article

Logging one value per epoch?

Reading the documentation and following the examples, there doesn’t seem to be a way to log just one value per epoch. This is insane, because when you’re trying to figure out a model architecture,...
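
Two ways to get exactly one logged value per epoch, sketched below: reduce a step-level value across the epoch with on_epoch=True, or log once from an epoch-end hook (compute_loss and some_epoch_value are hypothetical).

```python
def training_step(self, batch, batch_idx):
    loss = self.compute_loss(batch)  # hypothetical helper
    # Reduce across the epoch and emit a single value at epoch end.
    self.log("train_loss", loss, on_step=False, on_epoch=True)
    return loss

def on_train_epoch_end(self):
    # Or log any epoch-level quantity exactly once here.
    self.log("epoch_metric", self.some_epoch_value)
```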

View Article