feat: make parameter ranges configurable #138
Conversation
Summary of Changes: Hello @spikymoth, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request introduces a robust and configurable system for managing parameter ranges within the application, particularly for ablation experiments. By defining parameter specifications and a lookup mechanism that supports both global defaults and specific overrides, the changes enhance flexibility and maintainability. This allows users to easily customize the sampling behavior of various parameters, ensuring that the system can adapt to different models and experimental requirements without modifying core logic.
Activity
Force-pushed from 2f03214 to 804fc13.
Code Review
This pull request introduces a flexible configuration system for parameter ranges, which is a great feature for improving the tunability of the abliteration process. The implementation is well-structured, leveraging Pydantic for configuration management and cleanly integrating with Optuna. My review focuses on two main areas: a potential bug in dictionary access that could lead to a KeyError, and a design choice around mutable objects for parameter specifications. While the author noted the trade-offs of the mutable design, I've suggested an alternative immutable approach that would improve safety and prevent potential side effects, especially if the same default specifications are reused. Overall, this is a solid contribution that significantly enhances the tool's flexibility.
Force-pushed from 804fc13 to a68a171.
As expected, Gemini didn't really like my usage of …
Force-pushed from 096ef14 to 62e4109.
Perhaps another way to go here would be to initialize the settings as simple dicts and lists, then instantiate an immutable … The downside is that you would lose pydantic checks for the presence of …

Edit: Of course, we could also do both: 2 classes for setting defaults and reading the config, 2 classes for actually using them. 2 more classes, but probably a simpler implementation overall.
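A minimal sketch of that two-stage idea, using a frozen dataclass as a stand-in for the immutable specification class (all names here are illustrative, not the PR's actual API):

```python
from dataclasses import dataclass, FrozenInstanceError

# Hypothetical immutable spec; the PR itself uses Pydantic models with
# different names.
@dataclass(frozen=True)
class FloatParamSpecification:
    low: float
    high: float
    log: bool = False

raw = {"low": 0.4, "high": 0.9}         # settings read in as a plain dict
spec = FloatParamSpecification(**raw)   # frozen once instantiated

mutation_blocked = False
try:
    spec.low = 0.0                      # any later mutation is rejected
except FrozenInstanceError:
    mutation_blocked = True
```

The trade-off noted above still applies: the plain dicts get no validation at all until the immutable object is actually instantiated.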
Force-pushed from 62e4109 to 14d3b9f.
Commenting here since I also looked into this in another PR. While it's less flexible as it doesn't use generic parameter specs, it is strictly typed and, in my opinion, much more readable from a maintainer's standpoint.

I have to agree that using … Plus, it's an additional mental step (and potentially a docstring to read!) to remember that "0 means to skip a layer" rather than "0 means zero weight."

As for the TOML format, I would prefer explicit tables over inline dictionaries for readability:

```toml
[parameter_ranges]
direction_scope = [
    "global",
    "per layer",
]

[parameter_ranges.direction_index]
low = 0.4
high = 0.9
```

...instead of the inline …

Lastly, regarding this block in main.py:

```python
settings.set_parameter_range_defaults(
    {
        "direction_scope": CategoricalParamSpecification(["global", "per layer"]),
        # Discrimination between "harmful" and "harmless" inputs is usually strongest
        # in layers slightly past the midpoint of the layer stack. See the original
        # abliteration paper (https://arxiv.org/abs/2406.11717) for a deeper analysis.
        "direction_index": FloatParamSpecification(low=0.4, high=0.9),
        # The parameter ranges are based on experiments with various models
        # and much wider ranges. They are not set in stone and might have to be
        # adjusted for future models.
        "max_weight": FloatParamSpecification(low=0.8, high=1.5, log=False),
        "max_weight_position": FloatParamSpecification(low=0.6, high=1.0),
        # For sampling purposes, min_weight is expressed as a fraction of max_weight,
        # because multivariate TPE doesn't support variable-range parameters.
        "min_weight": FloatParamSpecification(low=0.0, high=1.0),
        "min_weight_distance": FloatParamSpecification(low=0.0, high=0.6),
    }
)
```

We really should put the configuration in the TOML rather than in Python.

Addendum: We should also generalize …
Point taken, especially with the possibility of a negative minimum weight (for the opposite effect). I still think it would be nice to have an explicit way to disable optimization for a particular module, but maybe there should be a dedicated parameter for it (e.g. …).
I think this actually Just Works™, although I haven't tested it. So it would just be a matter of changing the defaults.
I don't see how we could do that; the defaults have to be somewhere, and the user's config file might be empty. I did add the defaults to …
I'm open to other names, though I'm not sure "constraints" is the best name either, given that …
I'll have a think about the best way to incorporate the feedback so far and aim for something more straightforward.
Yes. The format you use in this PR is called an "inline table", and per the TOML specification is equivalent to the full table syntax. The configuration file can use either format and will be deserialized to the same Python object. I actually prefer the inline syntax from this PR for this particular use case, because the number of fields is very small.
The name … I suggest we simply use the name …
Sorry about the silence here, I've been a bit drained from work. Planning to do some work on this and my other PRs starting Saturday.
Force-pushed from 14d3b9f to 6efa857.
Okay, refactored significantly.
After using the previous implementation myself, I realized that there was a mismatch between the parameter ranges that get displayed and the parameter ranges that are set in the config file: the default and configurable ranges are between 0 and 1, but the values that are actually used are multiplied by the last layer index. That makes them annoying to adjust between runs: after figuring out the range you want to use (e.g. for the direction index), you first have to divide it by the last layer index in order to actually apply it in the config file. To solve this problem, I moved the multiplication into …
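The fix described here amounts to keeping configured ranges in the unit interval and converting to concrete layer indices only when a value is consumed. A rough sketch (the function and variable names are hypothetical, not the PR's actual code):

```python
# Parameter ranges stay in [0, 1] in the config and in all displayed output;
# the multiplication by last_layer_index happens only at the point of use.

def fraction_to_layer(fraction: float, last_layer_index: int) -> int:
    """Map a fractional position in [0, 1] onto a concrete layer index."""
    return round(fraction * last_layer_index)

# With 32 layers (last index 31), a configured range of 0.4-0.9 covers
# roughly the layers past the midpoint of the stack.
low, high = fraction_to_layer(0.4, 31), fraction_to_layer(0.9, 31)
```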
Force-pushed from 6efa857 to 1aed34c.
Okay, I think this is ready for another look. I made it as statically typed as I could while also implementing the features we discussed. Unfortunately it's quite a lot more code.

One annoyance I ran into that I can't really fix: TOML v1.0 doesn't allow inline tables (like …). I also couldn't really do static typing for … I had to extend the error reporting in … To get errors to show up properly, I also had to move some additional validation into the …
p-e-w left a comment
We have to talk about this in some more detail. I would have expected 80% of the logic in this PR to be handled automatically by Pydantic.
```toml
column = "prompt"
prefix = "Write a short story based on the writing prompt below.\n\nWriting prompt:"
```

```toml
# The parameters used to choose suggest settings for abliteration. With the
```
Remove the changes from config.noslop.toml please; we only keep the noslop-relevant settings here and use the defaults for everything else.
```python
direction_fraction = None
```

```python
parameters = {}
suggested_params: dict[str, AbliterationParameters] = {}
```
```python
    return UnitParamSpec.model_validate(param)
else:
    return UnitParamModuleSpec.model_validate(param)
raise ValueError(f"Cannot determine param type for: {param!r}")
```
We have to rethink this code. There is massive duplication here, and pretty much all of this I would expect Pydantic to handle automatically.
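For reference, Pydantic v2 can indeed dispatch a raw dict to the matching model through a union `TypeAdapter`, which is presumably what the reviewer has in mind; the models below are simplified stand-ins for the PR's actual classes:

```python
from typing import Union
from pydantic import BaseModel, TypeAdapter

class UnitParamSpec(BaseModel):
    low: float
    high: float

class UnitParamModuleSpec(BaseModel):
    # Simplified stand-in: per-module overrides keyed by module name.
    modules: dict[str, UnitParamSpec]

# Pydantic's "smart" union tries each member and picks the model that
# validates, replacing hand-written type-dispatch code like the block above.
adapter = TypeAdapter(Union[UnitParamSpec, UnitParamModuleSpec])
spec = adapter.validate_python({"low": 0.4, "high": 0.9})
```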
```python
)
```

```python
# Note: Although the above type aliases correctly type categorical parameters,
# ty is unable to see through them, so we use a specialized type here.
```
We're not changing valid code to satisfy ty. This is a shortcoming of ty, and the "solution" is to disable ty on the relevant lines until it is fixed.
```python
)
```

```python
max_weight_position_fraction: UnitParamTypeComplex = Field(
    description="The position (layer) at which the maximum weight should be applied.",
```
Misleading, because it's not a layer but a fractional layer index. Same for direction_fraction above.
Just organizing my thoughts a bit based on our conversation:
The TOML 1.0 thing is super unfortunate indeed. I thought some more about upgrading to Python 3.12, but Transformers 5 still supports 3.10, and it appears that you often get 3.10 by default in cloud environments for older (Ampere-series) GPUs. So I don't think this is an option just yet.
Wait... according to the TOML 1.0 spec, the following is possible:

```toml
[dog."tater.man"]
type.name = "pug"
```

Doesn't this completely solve the problem?
This is essentially what I ended up doing; I went with …

```toml
# ... other parameters ...
[parameters.min_weight_relative]
"attn.o_proj" = { low = 0.0, high = 1.0 }
"mlp.down_proj" = { low = 0.0, high = 1.0 }
# ... other parameters ...
```

for …

```toml
[parameters]
# ... other parameters ...
min_weight_relative = { low = 0.0, high = 1.0 }
# ... other parameters ...
```

so users wouldn't need to figure out how to go from the latter to the former themselves.
Oh, right. The reason I brought it up above is that if we want to make supplying a configuration for both components optional, explicitly specifying both in …
I don't understand. What does this have to do with the TOML 1.0 restrictions?
Alright, this will probably still need some fine-tuning, but I think it should be relatively simple to extend #53 for this or vice versa.

The idea is as follows:

- … `CategoricalParameterSpecification` and `FloatParameterSpecification`.
- … `suggest_float` or `suggest_categorical` on that specification.

The logic in `config.py` doesn't make any assumptions about what parameters are used, so it should be reusable for any set. In the current implementation, we call `settings.set_parameter_range_defaults()` from `main.py`, but this could be abstracted into a `Modifier` or similar in the context of #53 (as far as I can see, that PR currently does not touch the objective function).

The parameter ranges themselves mostly match the current defaults, with one caveat:

- … `last_layer_index` afterward.
- … `min_weight_distance`, this meant I had to set the minimum to 0, which is a slight change. I think this is harmless - a distance of 0 just means that set of parameters will do nothing. Technically, values between 0 and 1 post-multiplication can also still be effective, since the nearest layer should have a distance < 1. Either way, it shouldn't impact the sampler much in practice.

A few more notes:

- … `Type`, `TypeVar`, `TypeAlias` and `TypeAliasType`. Python 3.12 adds a cleaner syntax for this.
- … `suggest_float` or `suggest_categorical`, I set the private attribute `_name` during `_param_spec()`. It works fine, but it does mean that the returned object is specific to that loop iteration. Alternatives would be to create a copy with a specific name, or repeat the name twice, but the former felt wasteful and the latter felt like a potential footgun.
- … `config.default.toml`. That's not really required, and it makes `config.default.toml` specific to the default ablation method, but I think it's very helpful for discoverability.
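To make the spec-to-Optuna mapping concrete, here is a minimal sketch under stated assumptions: the class names mirror this thread, the `suggest` method is illustrative rather than the PR's actual API, and a stub stands in for `optuna.Trial` so the snippet runs without Optuna installed.

```python
from dataclasses import dataclass
from typing import Sequence

@dataclass(frozen=True)
class FloatParamSpecification:
    low: float
    high: float
    log: bool = False

    def suggest(self, trial, name: str) -> float:
        # Delegates to the trial, matching optuna.Trial.suggest_float's
        # (name, low, high, log=...) signature.
        return trial.suggest_float(name, self.low, self.high, log=self.log)

@dataclass(frozen=True)
class CategoricalParamSpecification:
    choices: Sequence[str]

    def suggest(self, trial, name: str) -> str:
        return trial.suggest_categorical(name, list(self.choices))

class StubTrial:
    """Deterministic stand-in for optuna.Trial, for illustration only."""
    def suggest_float(self, name, low, high, log=False):
        return (low + high) / 2  # midpoint of the configured range
    def suggest_categorical(self, name, choices):
        return choices[0]

# Defaults taken from the ranges discussed in this thread.
defaults = {
    "direction_index": FloatParamSpecification(low=0.4, high=0.9),
    "direction_scope": CategoricalParamSpecification(["global", "per layer"]),
}
trial = StubTrial()
sampled = {name: spec.suggest(trial, name) for name, spec in defaults.items()}
```

This keeps the objective function free of assumptions about which parameters exist: it just iterates over the configured specs and calls `suggest` on each.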