feat: make parameter ranges configurable #138

Open
spikymoth wants to merge 1 commit into p-e-w:master from spikymoth:configurable-params

@spikymoth
Contributor

Alright, this will probably still need some fine-tuning, but I think it should be relatively simple to extend #53 for this or vice versa.

The idea is as follows:

  • Parameter ranges are specified by CategoricalParamSpecification and FloatParamSpecification.
  • A particular ablation method defines a set of default ranges that correspond to the parameters used by the objective function.
  • For each parameter, we look up the most specific parameter specification - either the default, or an override from the config. More specific overrides are preferred: The longer the matching suffix, the higher the priority.
  • Once a parameter specification is obtained, we call suggest_float or suggest_categorical on that specification.
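
As a rough illustration of the lookup described in the list above (not the code from this PR), the suffix matching could look something like the following. The function name `find_spec`, the dotted `module.parameter` key layout, the plain-tuple specs, and the rule that any matching override beats a default are all assumptions made for this sketch.

```python
def find_spec(name, defaults, overrides):
    """Return the spec for a dotted name such as "attn.o_proj.max_weight".

    Hypothetical helper: overrides are keyed by dotted suffixes of the name,
    defaults by the bare parameter name (the last component).
    """
    parts = name.split(".")
    # Walk from the longest suffix (the full name) to the shortest (the bare
    # parameter name), so the most specific override wins.
    for start in range(len(parts)):
        key = ".".join(parts[start:])
        if key in overrides:
            return overrides[key]
    # No override matched, so fall back to the method's default.
    return defaults[parts[-1]]


defaults = {"max_weight": (0.8, 1.5)}
overrides = {
    "max_weight": (0.5, 1.5),              # applies to every module
    "attn.o_proj.max_weight": (1.0, 2.0),  # applies to attn.o_proj only
}

print(find_spec("attn.o_proj.max_weight", defaults, overrides))    # (1.0, 2.0)
print(find_spec("mlp.down_proj.max_weight", defaults, overrides))  # (0.5, 1.5)
print(find_spec("mlp.down_proj.max_weight", defaults, {}))         # (0.8, 1.5)
```

Trying the longest suffix first keeps the priority rule implicit in the loop order, so no explicit scoring of candidates is needed.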

The logic in config.py doesn't make any assumptions about what parameters are used, so it should be reusable for any set. In the current implementation, we call settings.set_parameter_range_defaults() from main.py, but this could be abstracted into a Modifier or similar in the context of #53 (as far as I can see, that PR currently does not touch the objective function).

The parameter ranges themselves mostly match the current defaults, with one caveat:

  • To make the ranges model-agnostic (required for the defaults), I used ranges between 0 and 1 for the positional parameters, multiplying with last_layer_index afterward.
  • For min_weight_distance, this meant I had to set the minimum to 0, which is a slight change. I think this is harmless: a distance of 0 just means that set of parameters will do nothing. Technically, values between 0 and 1 post-multiplication can also still be effective, since the nearest layer should have a distance < 1. Either way, it shouldn't impact the sampler much in practice.
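
To make the scaling concrete, here is a small sketch of how a fraction sampled from a 0 to 1 range maps onto a layer position. The helper name `scale_to_layers` and the 32-layer model are made up for the example; only the multiplication by last_layer_index comes from the description above.

```python
def scale_to_layers(fraction: float, last_layer_index: int) -> float:
    """Map a model-agnostic fraction in [0, 1] to a (fractional) layer position."""
    return fraction * last_layer_index


last_layer_index = 31  # hypothetical model with 32 layers

# A sampled direction_index fraction of 0.5 lands in the middle of the stack.
print(scale_to_layers(0.5, last_layer_index))  # 15.5

# A min_weight_distance fraction of 0.0 stays 0.0 for any model size.
print(scale_to_layers(0.0, last_layer_index))  # 0.0
```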

A few more notes:

  • The type syntax in Python 3.10 (the minimum version targeted by Heretic) is somewhat limited, requiring use of Type, TypeVar, TypeAlias and TypeAliasType. Python 3.12 adds a cleaner syntax for this.
  • ty doesn't seem to support narrowing generics with unions (see e.g. Incorrect type narrowing for generic variable with bound astral-sh/ty#2726), so I had to add a suppression.
  • To avoid having to repeat the module and parameter name when calling suggest_float or suggest_categorical, I set the private attribute _name during _param_spec(). It works fine, but it does mean that the returned object is specific to that loop iteration. Alternatives would be to create a copy with a specific name, or to repeat the name twice, but the former felt wasteful and the latter felt like a potential footgun.
  • I added all the current parameter ranges to config.default.toml. That's not really required, and it makes config.default.toml specific to the default ablation method, but I think it's very helpful for discoverability.

@gemini-code-assist
Contributor

Summary of Changes

Hello @spikymoth, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a robust and configurable system for managing parameter ranges within the application, particularly for ablation experiments. By defining parameter specifications and a lookup mechanism that supports both global defaults and specific overrides, the changes enhance flexibility and maintainability. This allows users to easily customize the sampling behavior of various parameters, ensuring that the system can adapt to different models and experimental requirements without modifying core logic.

Highlights

  • Configurable Parameter Ranges: Introduced a flexible system to define and override parameter ranges for ablation experiments, moving away from hardcoded values.
  • New Parameter Specification Classes: Added CategoricalParamSpecification and FloatParamSpecification to encapsulate parameter range definitions and their sampling logic.
  • Dynamic Parameter Lookup: Implemented a mechanism in Settings to find the most specific parameter range, allowing for module-specific overrides based on dotted keys.
  • Model-Agnostic Defaults: Default parameter ranges are now defined in main.py and stored in config.default.toml, designed to be reusable across different models.
  • Relative Positional Parameters: Positional parameters like direction_index, max_weight_position, and min_weight_distance are now sampled between 0 and 1 and then scaled by last_layer_index.

Changelog
  • config.default.toml
    • Added a new [parameter_ranges] section to define default configurable ranges for ablation parameters such as direction_scope, direction_index, max_weight, max_weight_position, min_weight, and min_weight_distance.
    • Included detailed comments explaining the purpose of each parameter and how to specify module-specific overrides using dotted keys or nested objects.
  • src/heretic/config.py
    • Introduced CategoricalParamSpecification and FloatParamSpecification classes, which are Pydantic models designed to define and manage categorical and float parameter ranges, respectively, including methods for Optuna's Trial.suggest_categorical and Trial.suggest_float.
    • Added ParamSpecification, ParamSpecificationRecursive, and ParamSpecificationType type aliases for improved type hinting.
    • Modified the Settings class to include parameter_ranges (a dictionary for configurable overrides) and _parameter_range_defaults (for storing default ranges).
    • Implemented set_parameter_range_defaults to initialize default ranges and _param_spec to dynamically retrieve the most specific parameter specification based on a hierarchical name, supporting overrides.
    • Added convenience methods categorical_spec and float_spec to access parameter specifications.
  • src/heretic/main.py
    • Updated imports to include the new parameter specification classes.
    • In the run function, settings.set_parameter_range_defaults() is now called to register the application's default parameter ranges for ablation.
    • Refactored the objective function to utilize the new configurable parameter range system, replacing direct trial.suggest_categorical and trial.suggest_float calls with calls to settings.categorical_spec() and settings.float_spec().
    • Adjusted the calculation of positional parameters (direction_index, max_weight_position, min_weight_distance) to first sample a 0-1 range from the configured specification and then scale it by last_layer_index.
Activity
  • No human activity (comments, reviews, etc.) has been recorded for this pull request yet.

@spikymoth spikymoth force-pushed the configurable-params branch from 2f03214 to 804fc13 Compare February 8, 2026 16:01
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Code Review

This pull request introduces a flexible configuration system for parameter ranges, which is a great feature for improving the tunability of the abliteration process. The implementation is well-structured, leveraging Pydantic for configuration management and cleanly integrating with Optuna. My review focuses on two main areas: a potential bug in dictionary access that could lead to a KeyError, and a design choice around mutable objects for parameter specifications. While the author noted the trade-offs of the mutable design, I've suggested an alternative immutable approach that would improve safety and prevent potential side effects, especially if the same default specifications are reused. Overall, this is a solid contribution that significantly enhances the tool's flexibility.

@spikymoth spikymoth force-pushed the configurable-params branch from 804fc13 to a68a171 Compare February 8, 2026 16:05
@spikymoth
Contributor Author

As expected, Gemini didn't really like my usage of _name and set_name() ;) Let me know what you think. I pushed a small change to turn the asserts into exceptions.

@spikymoth spikymoth force-pushed the configurable-params branch 3 times, most recently from 096ef14 to 62e4109 Compare February 10, 2026 19:08
Owner

@p-e-w p-e-w left a comment

Just a quick first look.

@spikymoth
Contributor Author

spikymoth commented Feb 11, 2026

Perhaps another way to go here would be to initialize the settings as simple dicts and lists, then instantiate an immutable *ParamSpecification with the requested name on the fly. They wouldn't even need to inherit from BaseModel or RootModel (though they would be accessed essentially the same way).

The downside is that you would lose pydantic checks for the presence of low, high and the optional log (they could be moved into the constructor, but wouldn't trigger until first use).

Edit: Of course, we could also do both - 2 classes for setting defaults and reading the config, 2 classes for actually using them. 2 more classes, but probably a simpler implementation overall.
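
For what it's worth, a minimal sketch of the first alternative (plain settings plus an immutable, named object created on demand) might look like this. `NamedFloatParam` and `float_param` are hypothetical names and the settings layout is assumed; the only library call relied on is Optuna's `Trial.suggest_float(name, low, high, log=...)`.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class NamedFloatParam:
    """Immutable float parameter bound to one fully qualified name (hypothetical)."""

    name: str
    low: float
    high: float
    log: bool = False

    def __post_init__(self) -> None:
        # Validation moves here, so it only runs when the object is created.
        if self.low > self.high:
            raise ValueError(f"{self.name}: low ({self.low}) exceeds high ({self.high})")

    def suggest(self, trial) -> float:
        # `trial` is expected to be an optuna.trial.Trial.
        return trial.suggest_float(self.name, self.low, self.high, log=self.log)


def float_param(raw_settings: dict, module: str, param: str) -> NamedFloatParam:
    """Build a fresh named object from plain dict settings on each call."""
    # A missing "low" or "high" key raises TypeError here, at first use.
    return NamedFloatParam(name=f"{module}.{param}", **raw_settings[param])


raw_settings = {"max_weight": {"low": 0.8, "high": 1.5}}
spec = float_param(raw_settings, "attn.o_proj", "max_weight")
print(spec)
```

Because the object is frozen and created per call, nothing is shared between loop iterations, at the cost of the late validation mentioned above.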

@KaraKaraWitch

KaraKaraWitch commented Feb 17, 2026

Commenting here since I also looked into this in another PR. While it's less flexible as it doesn't use generic parameter specs, it is strictly typed and, in my opinion, much more readable from a maintainer's standpoint.

I have to agree that using 0 to skip a weight is the wrong approach. It adds complexity by conflating quantity (the weight value) with behavior (control flow/skipping).

Plus, it's an additional mental step (and potentially a docstring to read!) to remember that "0 means to skip a layer" rather than "0 means zero weight."

As for the toml format, I would prefer explicit tables over inline dictionaries for readability:

[parameter_ranges]
direction_scope = [
    "global",
    "per layer",
]

[parameter_ranges.direction_index]
low=0.4
high=0.9

...instead of the inline direction_index = { low = 0.4, high = 0.9 }

Lastly, regarding this block in main.py:

    settings.set_parameter_range_defaults(
        {
            "direction_scope": CategoricalParamSpecification(["global", "per layer"]),
            # Discrimination between "harmful" and "harmless" inputs is usually strongest
            # in layers slightly past the midpoint of the layer stack. See the original
            # abliteration paper (https://arxiv.org/abs/2406.11717) for a deeper analysis.
            "direction_index": FloatParamSpecification(low=0.4, high=0.9),
            # The parameter ranges are based on experiments with various models
            # and much wider ranges. They are not set in stone and might have to be
            # adjusted for future models.
            "max_weight": FloatParamSpecification(low=0.8, high=1.5, log=False),
            "max_weight_position": FloatParamSpecification(low=0.6, high=1.0),
            # For sampling purposes, min_weight is expressed as a fraction of max_weight,
            # because multivariate TPE doesn't support variable-range parameters.
            "min_weight": FloatParamSpecification(low=0.0, high=1.0),
            "min_weight_distance": FloatParamSpecification(low=0.0, high=0.6),
        }
    )

We really should put the configuration in the toml rather than in Python.

Addn. We should also generalize parameter_ranges to be constraints since future abliteration methods or plugins may not use parameter_ranges and the parameter_ranges name would feel misleading.

@spikymoth
Contributor Author

I have to agree that using 0 to skip a weight is the wrong approach. It adds complexity by conflating quantity (the weight value) with behavior (control flow/skipping).

Plus, it's an additional mental step (and potentially a docstring to read!) to remember that "0 means to skip a layer" rather than "0 means zero weight."

Point taken, especially with the possibility of a negative minimum weight (for the opposite effect). I still think it would be nice to have an explicit way to disable optimization for a particular module, but maybe there should be a dedicated parameter for it (e.g. disabled_modules or some such that just takes a list of module names).

As for the toml format, I would prefer explicit tables over inline dictionaries for readability:

I think this actually Just Works™, although I haven't tested it. So it would just be a matter of changing the defaults.

Lastly, regarding this block in main.py:

We really should put the configuration in the toml rather than in Python.

I don't see how we could do that; the defaults have to be somewhere and the user's config file might be empty. I did add the defaults to config.default.toml as well (as you saw), but it can't be the only source of truth unless we start explicitly reading config.default.toml (and then plugins would still need some way to augment/replace it).

Addn. We should also generalize parameter_ranges to be constraints since future abliteration methods or plugins may not use parameter_ranges and the parameter_ranges name would feel misleading.

I'm open to other names, though I'm not sure "constraints" is the best name either given that TPESampler takes an explicit constraints_func which Heretic currently doesn't use (and has a different purpose than parameter selection).

@spikymoth
Contributor Author

I'll have a think about the best way to incorporate the feedback so far and aim for something more straightforward.

@p-e-w
Owner

p-e-w commented Feb 18, 2026

As for the toml format, I would prefer explicit tables over inline dictionaries for readability:

I think this actually Just Works™

Yes. The format you use in this PR is called an "inline table", and per the TOML specification is equivalent to the full table syntax. The configuration file can use either format and will be deserialized to the same Python object.
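
A quick way to check this equivalence is with Python's standard-library parser (falling back to the tomli backport on 3.10). The snippet is only an illustration; the values are the direction_index range quoted earlier in this thread.

```python
try:
    import tomllib  # Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # backport for Python 3.10

inline_form = tomllib.loads("""
[parameter_ranges]
direction_index = { low = 0.4, high = 0.9 }
""")

table_form = tomllib.loads("""
[parameter_ranges.direction_index]
low = 0.4
high = 0.9
""")

# Both spellings deserialize to the same nested dict.
print(inline_form == table_form)  # True
```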

I actually prefer the inline syntax from this PR for this particular use case, because the number of fields is very small.

@p-e-w
Owner

p-e-w commented Feb 18, 2026

We should also generalize parameter_ranges to be constraints since future abliteration methods or plugins may not use parameter_ranges and the parameter_ranges name would feel misleading.

The name parameter_ranges is indeed problematic, because categorical parameters are not "ranges", but Heretic will not generalize arbitrarily and Optuna will remain our backbone. Therefore, constraints is too generic.

I suggest we simply use the name parameters.

@spikymoth
Contributor Author

Sorry about the silence here, I've been a bit drained from work. Planning to do some work on this and my other PRs starting Saturday.

@spikymoth spikymoth force-pushed the configurable-params branch from 14d3b9f to 6efa857 Compare March 2, 2026 12:18
@spikymoth
Contributor Author

Okay, refactored significantly.

  • Float param specifications are now represented by FloatParamSpec, which only defines the actual fields.
  • Categorical param lists just use list without any frills.
  • New classes ParamCategorical and ParamFloat now handle the actual parameter suggestion. They inherit from base class Parameter.

After using the previous implementation myself, I realized that there was a mismatch between the parameter ranges that get displayed and the parameter ranges that are set in the config file: The default and configurable ranges are between 0 and 1, but the values that are actually used are multiplied by the last layer index.

That makes them annoying to adjust between runs: After figuring out the range you want to use (e.g. for the direction index), you then first have to divide it by the last layer index in order to actually apply it in the config file.

To solve this problem, I moved the multiplication into abliterate, so that the stored and displayed values can match the settings. This does mean that we now have the opposite problem: To map displayed positions and distances, you first have to multiply them with the last layer index. But I personally think that making the parameter ranges configurable is more important.

@spikymoth spikymoth force-pushed the configurable-params branch from 6efa857 to 1aed34c Compare March 9, 2026 22:08
@spikymoth
Contributor Author

Okay, I think this is ready for another look. I made it as statically typed as I could while also implementing the features we discussed. Unfortunately it's quite a lot more code.

One annoyance I ran into that I can't really fix: TOML v1.0 doesn't allow inline tables (like { low = 0.0, high = 1.0 }) to span multiple lines. TOML v1.1 does allow it, but that spec is quite recent and Python's tomllib doesn't support it yet. tomli does support it, but Pydantic uses tomllib for Python 3.11+ so you'd need a hacky override like sys.modules["tomllib"] = tomli.
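
To see the limitation in isolation, the following sketch feeds both spellings to whichever parser is available. The outcome for the multi-line form depends on the parser's TOML version, so the helper reports a rejection instead of assuming one; `try_parse` is a name invented for the example.

```python
try:
    import tomllib  # Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # backport for Python 3.10

SINGLE_LINE = 'min_weight = { low = 0.0, high = 1.0 }\n'
MULTI_LINE = """\
min_weight = {
    low = 0.0,
    high = 1.0,
}
"""


def try_parse(text: str):
    """Return the parsed document, or None if the parser rejects it."""
    try:
        return tomllib.loads(text)
    except tomllib.TOMLDecodeError as error:
        print(f"rejected: {error}")
        return None


print(try_parse(SINGLE_LINE))  # valid in TOML 1.0 and 1.1
print(try_parse(MULTI_LINE))   # None when the parser only implements TOML 1.0
```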

I also couldn't really do static typing for attn.o_proj and mlp.down_proj because class members with dotted keys aren't allowed. I settled for naming the fields attn_o_proj and mlp_down_proj, while still requiring "attn.o_proj" and "mlp.down_proj" on the config file level. That might lead to attn_o_proj and mlp_down_proj being displayed in some configuration error messages, but should otherwise work transparently.

I had to extend the error reporting in main.py slightly as it was only using the first element of error['loc'], which ended up hiding the true location for nested settings. It now also handles array elements like parameters.direction_index[0].
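
For reference, the kind of formatting described here can be sketched as follows. This is not the code from main.py; `format_location` is a hypothetical helper, and it only assumes that Pydantic reports `error['loc']` as a tuple of strings (field names) and integers (sequence indices).

```python
def format_location(loc: tuple) -> str:
    """Render a Pydantic error location as a dotted path with [index] suffixes."""
    path = ""
    for part in loc:
        if isinstance(part, int):
            # Sequence index, e.g. the 0 in parameters.direction_index[0].
            path += f"[{part}]"
        elif path:
            # Nested field name: join with a dot.
            path += f".{part}"
        else:
            # First component: no leading dot.
            path = str(part)
    return path


print(format_location(("parameters", "direction_index", 0)))
# parameters.direction_index[0]
print(format_location(("parameters", "max_weight", "low")))
# parameters.max_weight.low
```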

To get errors to show up properly I also had to move some additional validation into the BeforeValidator (instead of just handling constant values like "global" and 1.0 there). Allowing the union types to flow into Pydantic's normal validation made for bad error locations like ('parameters', 'max_weight', 'function-after[_validate_float(), FloatParamSpec]', 'low') and ('parameters', 'max_weight', 'FloatParamModuleSpec', 'attn_o_proj'), which appears to be a known limitation.

Owner

@p-e-w p-e-w left a comment

We have to talk about this in some more detail. I would have expected 80% of the logic in this PR to be handled automatically by Pydantic.

column = "prompt"
prefix = "Write a short story based on the writing prompt below.\n\nWriting prompt:"

# The parameters used to choose suggest settings for abliteration. With the
Owner

Remove the changes from config.noslop.toml please; we only keep the noslop-relevant settings here and use the defaults for everything else.

direction_fraction = None

parameters = {}
suggested_params: dict[str, AbliterationParameters] = {}
Owner

Why this rename?

return UnitParamSpec.model_validate(param)
else:
return UnitParamModuleSpec.model_validate(param)
raise ValueError(f"Cannot determine param type for: {param!r}")
Owner

We have to rethink this code. There is massive duplication here, and pretty much all of this I would expect Pydantic to handle automatically.

)

# Note: Although the above type aliases correctly type categorical parameters,
# ty is unable to see through them, so we use a specialized type here.
Owner

We're not changing valid code to satisfy ty. This is a shortcoming of ty, and the "solution" is to disable ty on the relevant lines until it is fixed.

)

max_weight_position_fraction: UnitParamTypeComplex = Field(
description="The position (layer) at which the maximum weight should be applied.",
Owner

Misleading, because it's not a layer but a fractional layer index. Same for direction_fraction above.

@p-e-w p-e-w mentioned this pull request Mar 11, 2026
4 tasks
@spikymoth
Contributor Author

spikymoth commented Mar 12, 2026

Just organizing my thoughts a bit based on our conversation:

  1. Decouple any conversion and merging logic, make Pydantic purely responsible for type checking (and serialization/deserialization)
    1. @model_validator(mode="after") can be moved to AfterValidator in an Annotated[], but the implementation is still a function - is that okay?
  2. Error messages are problematic with union types, inserting extra components in error["loc"] and reporting errors for every branch.
    1. The "too many errors" thing can be worked around by using a Discriminator and Tags - but the discriminator still takes a function that receives the raw parsed TOML.
    2. Getting rid of the extra entries in error["loc"] entirely requires something more like the current logic (preventing Pydantic from having to resolve the union type).
    3. Alternatively we could modify the error printing logic to omit elements of error["loc"] that we recognize as implementation details.
  3. To be able to natively work with the dotted keys for attn.o_proj and mlp.down_proj we would need a dict[ModelComponent, ...]. MinLen probably does work but I guess we'd want to enforce at least 1 key, not 2.
  4. Defaults should probably be specified on the individual Parameters class fields so that individual parameters can be omitted, instead of passing a single Parameters() with all the defaults.
  5. Custom merging logic could look at the default values via type(Settings).model_fields["parameters"] (from memory) for fallbacks if values for a component are not specified.
  6. The TOML 1.0 limitation is still awkward - I specified all the per-component values in config.default.toml the way I did because if you just have min_weight_relative = { low = 0.0, high = 1.0 } you'd have to either write min_weight_relative = { "attn.o_proj" = { low = 0.0, high = 1.0 }, "mlp.down_proj" = { low = 0.0, high = 1.0 } } (all one line) or switch to [parameters.min_weight_relative] as I did here.
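
As a sketch of what the fallback merging in points 4 and 5 could look like once it is decoupled from Pydantic, assuming plain dicts after parsing: `resolve_per_component` is a hypothetical helper, and telling a per-component table apart from a single spec by its keys is an assumption of this example, not something settled in the thread.

```python
from typing import Optional

COMPONENTS = ("attn.o_proj", "mlp.down_proj")  # component names used in this thread


def resolve_per_component(default: dict, configured: Optional[dict]) -> dict:
    """Expand one parameter's setting into an entry per component (hypothetical)."""
    if configured is None:
        # Parameter omitted entirely: every component gets the default.
        return {component: default for component in COMPONENTS}
    if any(key in COMPONENTS for key in configured):
        # Per-component table: fill in components that were left out.
        return {c: configured.get(c, default) for c in COMPONENTS}
    # A single spec such as {low, high}: it applies to all components.
    return {component: configured for component in COMPONENTS}


default = {"low": 0.0, "high": 1.0}
partial = {"attn.o_proj": {"low": 0.2, "high": 0.8}}
print(resolve_per_component(default, partial))
```

With this shape, a config that names only attn.o_proj keeps the default for mlp.down_proj, which is the "at least 1 key" behaviour mentioned in point 3.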

@p-e-w
Owner

p-e-w commented Mar 13, 2026

The TOML 1.0 thing is super unfortunate indeed.

I thought some more about upgrading to Python 3.12, but Transformers 5 still supports 3.10, and it appears that you often get 3.10 by default in cloud environments for older (Ampere-series) GPUs. So I don't think this is an option just yet.

@p-e-w
Owner

p-e-w commented Mar 13, 2026

Wait... according to the TOML 1.0 spec, the following is possible:

[dog."tater.man"]
type.name = "pug"

Doesn't this completely solve the problem?

@spikymoth
Contributor Author

Wait... according to the TOML 1.0 spec, the following is possible:

[dog."tater.man"]
type.name = "pug"

Doesn't this completely solve the problem?

This is essentially what I ended up doing; I went with

# ... other parameters ...

[parameters.min_weight_relative]
"attn.o_proj"   = { low = 0.0, high = 1.0 }
"mlp.down_proj" = { low = 0.0, high = 1.0 }

# ... other parameters ...

for config.default.toml instead of

[parameters]
# ... other parameters ...
min_weight_relative = { low = 0.0, high = 1.0 }
# ... other parameters ...

so users wouldn't need to figure out how to go from the latter to the former themselves.
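
To illustrate why the quotes matter in the first form, here is a small check using the standard-library parser (tomli on 3.10). It is only a demonstration of TOML key semantics, not code from this PR.

```python
try:
    import tomllib  # Python 3.11+
except ModuleNotFoundError:
    import tomli as tomllib  # backport for Python 3.10

quoted = tomllib.loads('''
[parameters.min_weight_relative]
"attn.o_proj"   = { low = 0.0, high = 1.0 }
"mlp.down_proj" = { low = 0.0, high = 1.0 }
''')

unquoted = tomllib.loads('''
[parameters.min_weight_relative]
attn.o_proj = { low = 0.0, high = 1.0 }
''')

# Quoted keys keep the dot as part of a single key.
print(list(quoted["parameters"]["min_weight_relative"]))
# ['attn.o_proj', 'mlp.down_proj']

# Unquoted dotted keys are split into nested tables instead.
print(unquoted["parameters"]["min_weight_relative"])
# {'attn': {'o_proj': {'low': 0.0, 'high': 1.0}}}
```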

@spikymoth
Contributor Author

Oh, right. The reason I brought it up above is that if we want to make supplying a configuration for both components optional, explicitly specifying both in config.default.toml makes less sense.

@p-e-w
Owner

p-e-w commented Mar 15, 2026

I don't understand. What does this have to do with the TOML 1.0 restrictions?
