`self.lr_schedulers().optimizer` and `self.optimizers()` return different optimizers after resuming training

I’m training a GPT2 network and my configure_optimizers() is as follows:

    def configure_optimizers(self):
        opt = optim.Adam(self.model.parameters(), self.lr)
        # Total number of optimizer steps across the whole run.
        total_steps = (
            len(self.trainer.datamodule.train_dataset) * self.trainer.max_epochs
            // self.trainer.datamodule.train_batch_size
            // (self.trainer.num_devices * self.trainer.accumulate_grad_batches)
        )
        # The snippet is cut off here; a scheduler driven by total_steps is
        # created and returned, roughly like this (assumed shape, the actual
        # scheduler class is not shown above):
        sched = optim.lr_scheduler.OneCycleLR(opt, max_lr=self.lr, total_steps=total_steps)
        return [opt], [{"scheduler": sched, "interval": "step"}]
Due to some issues with my training data, I interrupted training.

After resuming training (specifying ckpt_path), I found a strange issue with the learning rate: when I call self.optimizers().param_groups[0]['lr'], the returned value is always 0, while self.lr_schedulers().optimizer.param_groups[0]['lr'] returns the expected learning rate.
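
For reference, here is roughly how I compare the two values (a minimal sketch assuming a single optimizer and a single scheduler; the hook is just an illustrative place to put the check):

    # Minimal sketch of the comparison above; the hook choice is illustrative.
    def on_train_batch_start(self, batch, batch_idx):
        wrapped_lr = self.optimizers().param_groups[0]["lr"]           # via the Lightning wrapper
        raw_lr = self.lr_schedulers().optimizer.param_groups[0]["lr"]  # via the raw torch optimizer
        print(f"wrapped lr: {wrapped_lr}, scheduler optimizer lr: {raw_lr}")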

Through debugging, I found that the former call returns a LightningAdam instance while the latter returns PyTorch's Adam instance.
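
If I understand Lightning's API correctly, self.optimizers() returns a LightningOptimizer wrapper by default, and self.optimizers(use_pl_optimizer=False) returns the underlying torch optimizer, so the two handles can be compared by object identity (a sketch under that assumption):

    # Sketch: unwrap the Lightning wrapper and compare object identity.
    wrapped = self.optimizers()                    # LightningAdam wrapper
    raw = self.optimizers(use_pl_optimizer=False)  # plain torch.optim.Adam
    # True in a healthy run; the symptom above suggests it may be False
    # after resuming from the checkpoint.
    print(raw is self.lr_schedulers().optimizer)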

So which optimizer actually updates the model, the one from self.lr_schedulers().optimizer or the one from self.optimizers()? And does this mean there is a bug in saving and resuming lr_schedulers?
