A replayable fit() method - diff/patch attached #305

Hi Daniel

First of all, nice library. Thanks.

I am running a large training job on a remote supercomputing cluster that imposes a fixed time limit on each job. Once the limit is reached, the job is terminated.

Saving the learned parameters at the end with save_params_to() does not help when the job is killed before fit() completes: all learned parameters are lost and training has to start again from scratch.

So I modified the fit() method to save the learned parameters periodically, every given number of epochs. If the job is terminated before fit() completes, re-running the same job invokes fit() again, which loads the saved parameters with load_params_from() and resumes training (a warm start) from where it stopped.
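For anyone who wants the idea without opening the patch, here is a minimal sketch of the checkpoint-and-resume pattern described above. It is not the code in the attached patch and does not use nolearn: `ResumableTrainer`, `checkpoint_path`, `save_every` and `stop_after` are hypothetical names, the "model" is a one-weight linear regression standing in for a real net, and the checkpoint is a plain pickle file rather than the output of save_params_to()/load_params_from().

```python
import os
import pickle
import tempfile


class ResumableTrainer:
    """Hypothetical stand-in for a net whose fit() checkpoints every N epochs."""

    def __init__(self, checkpoint_path, save_every=5, lr=0.1):
        self.checkpoint_path = checkpoint_path
        self.save_every = save_every
        self.lr = lr
        self.w = 0.0    # toy model: y ~ w * x
        self.epoch = 0  # number of completed epochs

    def _save(self):
        # Store the epoch counter together with the weights so a resumed
        # run knows how many epochs are left. Write to a temp file and then
        # atomically rename it, so a job killed mid-write cannot leave a
        # truncated checkpoint behind.
        state = {"epoch": self.epoch, "w": self.w}
        directory = os.path.dirname(os.path.abspath(self.checkpoint_path))
        fd, tmp = tempfile.mkstemp(dir=directory)
        with os.fdopen(fd, "wb") as f:
            pickle.dump(state, f)
        os.replace(tmp, self.checkpoint_path)

    def _load(self):
        with open(self.checkpoint_path, "rb") as f:
            state = pickle.load(f)
        self.epoch, self.w = state["epoch"], state["w"]

    def _train_one_epoch(self, X, y):
        # One full-batch gradient-descent step on the mean squared error.
        grad = sum(2 * (self.w * xi - yi) * xi for xi, yi in zip(X, y)) / len(X)
        self.w -= self.lr * grad

    def fit(self, X, y, max_epochs, stop_after=None):
        # Warm start: resume from the last checkpoint if one exists.
        if os.path.exists(self.checkpoint_path):
            self._load()
        while self.epoch < max_epochs:
            # stop_after only simulates the cluster killing the job.
            if stop_after is not None and self.epoch >= stop_after:
                raise TimeoutError("simulated job time limit")
            self._train_one_epoch(X, y)
            self.epoch += 1
            if self.epoch % self.save_every == 0 or self.epoch == max_epochs:
                self._save()
        return self
```

Re-running the same script after the job is killed picks up from the last saved epoch, so only the epochs since that checkpoint are repeated. The temp-file-plus-rename step matters on a cluster: if the time limit hits during a save, the previous checkpoint stays intact. A real net may also carry optimizer state (for example momentum buffers) that a weights-only checkpoint does not capture.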

Attached is a diff/patch of what I did.

[patch-replay.txt](https://github.com/dnouri/nolearn/files/568064/patch-replay.txt)

If you find it useful, feel free to merge the code upstream, or modify it as you see fit.

Would be happy to contribute more at some point.

Cheers

Srimal.
