
perf: empty cache after using residuals and between trials#15

Merged
p-e-w merged 1 commit into p-e-w:master from red40maxxer:optimize-memory-usage
Nov 17, 2025

Conversation

@red40maxxer
Contributor

Great work, btw: this is an awesome project, and I'm looking forward to seeing it grow. I'm still learning, and this has been a fun repo to hack around in :)

I'm pretty sure we can free the residuals as soon as the refusal directions have been computed, and we can also clear the cache between trials.

Tested on my laptop's RTX 4060 while abliterating Qwen3-0.6B; peak GPU usage saw a slight decline.

score, kl_divergence, refusals = evaluator.get_score()

# free memory between trials
empty_cache()
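The freeing pattern discussed here can be sketched in plain Python. This is a minimal stand-in, not the project's actual code: the `Residuals` class and its size are hypothetical, and on a CUDA build of PyTorch a `torch.cuda.empty_cache()` call would follow the `del` to return the freed blocks from the caching allocator to the driver.

```python
import gc
import weakref


class Residuals:
    """Hypothetical stand-in for the per-layer residual tensors."""

    def __init__(self):
        self.data = [0.0] * 1_000_000


residuals = Residuals()
ref = weakref.ref(residuals)  # lets us observe when the object is gone

# ... compute refusal directions from `residuals` here ...

# Free the residuals as soon as the directions are computed.
del residuals
gc.collect()
# On a CUDA build of PyTorch, torch.cuda.empty_cache() would go here
# so the allocator actually releases the freed memory to the driver.

assert ref() is None  # the residuals object has been collected
```

Note that `del` only drops the Python reference; the GPU memory is not handed back to the driver until the caching allocator is emptied, which is why the diff pairs the two steps.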
Owner


This is done by reload_model() already, which happens at the start of each trial. I'd be very surprised if that line makes a difference because we're literally milliseconds away from calling that function anyway.

Contributor Author


Oops, you're right. I'll remove that line.

Owner


Can you retest and check if the performance gain is still there with just the residuals garbage collected?

Contributor Author


Memory usage without residual gc:

█░█░█▀▀░█▀▄░█▀▀░▀█▀░█░█▀▀  v1.0.1
█▀█░█▀▀░█▀▄░█▀▀░░█░░█░█░░
▀░▀░▀▀▀░▀░▀░▀▀▀░░▀░░▀░▀▀▀  https://github.com/p-e-w/heretic

GPU type: NVIDIA GeForce RTX 4060 Laptop GPU

Loading model Qwen/Qwen3-0.6B...
* Trying dtype auto... Ok
* Transformer model with 28 layers
* Abliterable components:
  * attn.o_proj: 1 matrices per layer
  * mlp.down_proj: 1 matrices per layer

Loading good prompts from mlabonne/harmless_alpaca...
* 400 prompts loaded

Loading bad prompts from mlabonne/harmful_behaviors...
* 400 prompts loaded

Loading good evaluation prompts from mlabonne/harmless_alpaca...
* 100 prompts loaded
* Obtaining first-token probability distributions...

Loading bad evaluation prompts from mlabonne/harmful_behaviors...
* 100 prompts loaded
* Counting model refusals...
* Initial refusals: 52/100

Calculating per-layer refusal directions...
GPU memory before residuals: 1261976064 (peak so far: 3135785472)
* Obtaining residuals for good prompts...
* Obtaining residuals for bad prompts...
GPU memory after residuals: 1358445056 (peak so far: 3135785472)
GPU memory after clearing residuals: 1358563840 (peak so far: 3135785472)

With GC:

█░█░█▀▀░█▀▄░█▀▀░▀█▀░█░█▀▀  v1.0.1
█▀█░█▀▀░█▀▄░█▀▀░░█░░█░█░░
▀░▀░▀▀▀░▀░▀░▀▀▀░░▀░░▀░▀▀▀  https://github.com/p-e-w/heretic

GPU type: NVIDIA GeForce RTX 4060 Laptop GPU

Loading model Qwen/Qwen3-0.6B...
* Trying dtype auto... Ok
* Transformer model with 28 layers
* Abliterable components:
  * attn.o_proj: 1 matrices per layer
  * mlp.down_proj: 1 matrices per layer

Loading good prompts from mlabonne/harmless_alpaca...
* 400 prompts loaded

Loading bad prompts from mlabonne/harmful_behaviors...
* 400 prompts loaded

Loading good evaluation prompts from mlabonne/harmless_alpaca...
* 100 prompts loaded
* Obtaining first-token probability distributions...

Loading bad evaluation prompts from mlabonne/harmful_behaviors...
* 100 prompts loaded
* Counting model refusals...
* Initial refusals: 52/100

Calculating per-layer refusal directions...
GPU memory before residuals: 1261976064 (peak so far: 3135785472)
* Obtaining residuals for good prompts...
* Obtaining residuals for bad prompts...
GPU memory after residuals: 1358445056 (peak so far: 3135785472)
GPU memory after clearing residuals: 1262094848 (peak so far: 3135785472)

So clearing the residuals saves ~100 MB, though it doesn't reduce the peak usage. I think the savings would scale with the number of prompts?
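Log lines like the ones in the runs above can be produced from PyTorch's allocator counters. The sketch below is hedged: `mem_report` is a hypothetical helper, and on a real CUDA run the two numbers would come from `torch.cuda.memory_allocated()` (current) and `torch.cuda.max_memory_allocated()` (peak), which is also why the peak stays flat even after the residuals are cleared.

```python
def mem_report(label: str, current: int, peak: int) -> str:
    """Format a memory log line in the style shown above (hypothetical helper)."""
    return f"GPU memory {label}: {current} (peak so far: {peak})"


# On a CUDA machine these numbers would come from
#   torch.cuda.memory_allocated() and torch.cuda.max_memory_allocated()
print(mem_report("after clearing residuals", 1262094848, 3135785472))
```

Because `max_memory_allocated()` is a high-water mark, freeing the residuals lowers the *current* figure but leaves the *peak* untouched unless the peak stats are reset between measurements.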

@red40maxxer force-pushed the optimize-memory-usage branch from e29e667 to cec47df on November 17, 2025 16:42
@p-e-w merged commit 7bad84b into p-e-w:master on Nov 17, 2025
@p-e-w
Owner

p-e-w commented Nov 17, 2025

Thanks, this is a reasonable change!

