feat: add max_memory parameter to limit memory usage #83
Closes #80
src/heretic/model.py
try:
    max_memory = (
        {
            int(k) if k.isdigit() else k: v
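The snippet above is cut off. Assuming `settings.max_memory` is either `None` or a dict whose keys are all strings (as a TOML parser produces), the complete expression plausibly amounts to the sketch below. `normalize_max_memory` is a hypothetical name used only for illustration; in the PR the logic is written inline.

```python
from typing import Optional, Union


def normalize_max_memory(
    raw: Optional[dict[str, str]],
) -> Optional[dict[Union[int, str], str]]:
    """Hypothetical helper: turn TOML-style string keys like "0" into GPU indices.

    Accelerate expects integer keys for GPU/XPU devices and the strings
    "mps", "cpu" and "disk" for everything else.
    """
    if not raw:
        # A missing or empty setting means "no limit", so pass None through.
        return None
    # All-digit keys become ints; named devices such as "cpu" stay strings.
    return {int(k) if k.isdigit() else k: v for k, v in raw.items()}


print(normalize_max_memory({"0": "16GB", "cpu": "64GB"}))  # {0: '16GB', 'cpu': '64GB'}
```

Keys such as `"mps"` or `"disk"` pass through unchanged; only all-digit keys are converted.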
Doesn't this happen automatically in Accelerate? And if not, why would we do it? The interface we offer should match what Accelerate does, not add conveniences on top.
It doesn't seem that way. When I try without it, I get:
Trying dtype bfloat16... Failed (name 'max_memory' is not defined)
Well, then we shouldn't support string keys at all, if Accelerate doesn't.
Ah wait, is TOML the problem, because it doesn't support integer keys?
I guess so. When I try to pass it as-is, I get the error:
Trying dtype bfloat16... Failed (Device 0 is not recognized, available devices are integers(for GPU/XPU), 'mps', 'cpu' and 'disk')
src/heretic/model.py
settings.model,
dtype=dtype,
device_map=settings.device_map,
max_memory=max_memory,
This also needs to happen in reload_model below; otherwise, abliteration will fail.
I forgot to add that, silly me
src/heretic/model.py
        }
        if settings.max_memory
        else None
    )
No need to duplicate this code. You should be able to reuse the max_memory value stored in the model object. See the equivalent code for dtype a few lines up.
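The suggestion amounts to computing `max_memory` once, storing it on the model object, and passing the stored value to both the initial load and `reload_model`. A minimal sketch of that shape follows; `Settings`, `Model`, `_load` and the injected `loader` (a stand-in for a `from_pretrained`-style call) are hypothetical names, not heretic's actual code, and the `dtype` handling is left out.

```python
from dataclasses import dataclass
from typing import Any, Callable, Optional, Union


@dataclass
class Settings:
    """Hypothetical subset of heretic's settings."""
    model: str
    device_map: str = "auto"  # hypothetical default
    max_memory: Optional[dict[str, str]] = None


class Model:
    """Hypothetical model wrapper; `loader` stands in for from_pretrained."""

    def __init__(self, settings: Settings, loader: Callable[..., Any]):
        self.settings = settings
        self.loader = loader
        # Compute the Accelerate-ready mapping once and store it on the object.
        self.max_memory: Optional[dict[Union[int, str], str]] = (
            {int(k) if k.isdigit() else k: v for k, v in settings.max_memory.items()}
            if settings.max_memory
            else None
        )
        self.model = self._load()

    def _load(self) -> Any:
        # Initial load and reload share this call, so both get the same limits.
        return self.loader(
            self.settings.model,
            device_map=self.settings.device_map,
            max_memory=self.max_memory,
        )

    def reload_model(self) -> None:
        self.model = self._load()


# Usage with a fake loader that just records the keyword arguments it receives.
calls: list[dict[str, Any]] = []


def fake_loader(name: str, **kwargs: Any) -> str:
    calls.append(kwargs)
    return name


m = Model(Settings("some/model", max_memory={"0": "16GB", "cpu": "64GB"}), fake_loader)
m.reload_model()
print(calls[1]["max_memory"])  # {0: '16GB', 'cpu': '64GB'}
```

Routing both code paths through one helper is what keeps `reload_model` from silently dropping the limit, which is the failure mode raised earlier in the review.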
I'm really sorry for the mess; I'm a scatterbrain today.
…ble, then reuse it in both locations
Merged, thank you!
Thank you for this fantastic project!
With this pull request, we add the max_memory parameter so that we can limit the memory used on the specified devices.
For example, if we set it in config.toml like this:
max_memory = {0 = "16GB", "cpu" = "64GB"}
we will use at most 16GB of VRAM on the first GPU and at most 64GB of CPU RAM.
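One detail about the values themselves: as far as we can tell, Accelerate reads `"GB"` as decimal gigabytes (10^9 bytes) and `"GiB"` as binary gibibytes (2^30 bytes), so `"16GB"` allows roughly 14.9 GiB of VRAM. The hypothetical `size_to_bytes` below only illustrates that convention under this assumption; it is not Accelerate's parser.

```python
import re

# Hypothetical helper, not Accelerate's code. Assumes decimal units for
# KB/MB/GB and binary units for KiB/MiB/GiB; bare integers are not handled.
_UNIT_FACTORS = {
    "KB": 10**3,
    "MB": 10**6,
    "GB": 10**9,
    "KIB": 2**10,
    "MIB": 2**20,
    "GIB": 2**30,
}


def size_to_bytes(size: str) -> int:
    """Convert a size string such as "16GB" or "1.5GiB" to a byte count."""
    match = re.fullmatch(r"\s*(\d+(?:\.\d+)?)\s*([A-Za-z]+)\s*", size)
    if match is None:
        raise ValueError(f"unrecognized size: {size!r}")
    number, unit = match.groups()
    factor = _UNIT_FACTORS.get(unit.upper())
    if factor is None:
        raise ValueError(f"unknown unit: {unit!r}")
    return int(float(number) * factor)


for device, size in {0: "16GB", "cpu": "64GB"}.items():
    print(device, size_to_bytes(size))
# 0 16000000000
# cpu 64000000000
```

If an exact binary budget is wanted, writing `"16GiB"` under this convention avoids the roughly 7% gap between the two units.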