feat: Allow study progress to be saved & resumed#106
Conversation
Summary of ChangesHello @anrp, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request significantly enhances the robustness of optimization studies by implementing a comprehensive checkpointing and resumption system. It leverages Optuna's Highlights
🧠 New Feature in Public Preview: You can now enable Memory to help Gemini Code Assist learn from your team's feedback. This makes future code reviews more consistent and personalized to your project's style. Click here to enable Memory in your admin console. Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Code Review
This pull request introduces a valuable feature to save and resume study progress, which is crucial for long-running optimization tasks. The implementation correctly uses Optuna's JournalStorage for persistence, and the logic for handling resumed studies, including the continuation of startup trials, is well-implemented. I've identified a critical typo that would prevent the code from running, along with a medium-severity style guide violation. After addressing these points, this will be a solid contribution.
af67313 to
b49e0ce
Compare
b49e0ce to
3c37150
Compare
src/heretic/main.py
Outdated
| remaining_trials = settings.n_trials - start_index | ||
| if remaining_trials > 0 and random_trials_to_run > 0: | ||
| if start_index > 0: | ||
| print(f"Running additional {random_trials_to_run} random trials") |
There was a problem hiding this comment.
This should not be displayed in the frontend. Sampling is an implementation detail that the user can do nothing with.
|
Ultimately from all of this discussion - would you like "configuration option for directory", "all settings saved in study name so that it can't restart from a different place" and "no logic for more random tries"? That converts this to a resume-only capability (and you can extend the number of tries interactively, that's it)? (Maybe just don't include n_tries in the study name, and read from study file (resuming) or command line (initial)? |
|
We need to decide what we actually want here. My idea was to have a system that can recover from a crash by resuming where it left off (e.g. after another program used up VRAM and led to an OOM), or allow the user to stop the run and continue the next day. In that case, it's obvious what should happen: All settings are restored, and the trials continue from the cutoff point. But it seems that you and @spikymoth have a more ambitious vision: That the user should be able to stop the run and resume it with different settings. I agree that this ability would be nice to have in principle, but it seems very difficult to implement in a way that's correct, non-confusing, and maintainable. |
|
The only setting that really makes sense to resume with differently is the n_trials, and that can effectively already be overriden with a resume-only capability by the interactive menu option. I would be fine with that, because it does simplify every other part of this. |
|
Here's a sketch of how I think a reasonable implementation might look:
|
|
1-4 SGTM but I'd like to push back on 5 being automatic, since it's sometimes useful to just be able to drop back in to the chat interface to test things i.e. I'd like to keep the few-KB result file which represents the $hours of computation. Thoughts? |
|
I think a "Delete study log? (y/n)" question at the end if you select "None (exit program)" would make sense, personally. And I agree that |
Okay, but how exactly will this work? Let's say the study is complete and we keep the journal. Now Heretic is run again with the same model, and the user chooses to "resume" the study. But the study is already complete, and the number of completed trials is equal to the number of trials to complete. Now what? We just tell them that the run they just started is already complete, and show the trial selection menu with the Pareto front?
But we already have that functionality. When the Pareto front is displayed, one of the options is "Continue optimization (run more trials)". This was implemented in #76. |
is necessary if you want to
after deciding to exit. |
|
Ok, how about this: Instead of asking the user whether to delete the journal file when they exit (which interrupts the exit process and might be the wrong time to make that decision), we prompt them when the program starts? If the previous run is complete: If the previous run is incomplete: |
3c37150 to
ba1f085
Compare
|
I like that, basically moves the delete question to startup time. Implemented, PTAL. |
ba1f085 to
2d1fdbb
Compare
2d1fdbb to
19d9f52
Compare
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a valuable feature to save and resume study progress, which is a great improvement for long-running optimization tasks. The implementation is mostly solid, leveraging Optuna's journaling storage. I've found one area in src/heretic/main.py that could be improved for robustness and efficiency by handling potential missing keys in study attributes and avoiding redundant API calls. My suggestion addresses a potential crash and makes the code cleaner. The rest of the changes are well-implemented and consistent with the new feature.
19d9f52 to
73d7dba
Compare
f95b0ce to
a68fc36
Compare
ca72b30 to
083a6b3
Compare
083a6b3 to
d1d0fb6
Compare
d1d0fb6 to
e38921c
Compare
e38921c to
6483012
Compare
d83a056 to
3b63699
Compare
|
/gemini review |
|
Warning Gemini encountered an error creating the review. You can try again by commenting |
|
/gemini review |
There was a problem hiding this comment.
Code Review
This pull request introduces a valuable feature allowing study progress to be saved and resumed, which significantly improves the user experience by preventing loss of work. The changes are well-structured, touching .gitignore for checkpoint exclusion, config.py for new settings, and main.py and utils.py for the core logic of saving, loading, and resuming studies. The refactoring of settings source customization in config.py is also a good improvement. Overall, the implementation seems robust and directly addresses the stated goal of the pull request.
|
Yup, that looks good now. Merged! |
Fixes #95
Followup/improvement of https://github.com/spikymoth/heretic/tree/save-and-resume