Conversation
Summary of Changes

This pull request enhances the system's capability to work with Visual Language (VL) models by introducing a configuration flag to identify them and dynamically selecting the appropriate Hugging Face model class.
Code Review
This pull request introduces support for Vision-Language (VL) models by adding a model_is_vl setting and a get_model_class helper function to select the appropriate AutoModel class. The changes are applied consistently where models are loaded. I have one comment regarding a style guide violation for a missing type hint.
A more critical concern is that the PEFT task_type appears to be hardcoded to CAUSAL_LM in src/heretic/model.py (lines 169 and 239). VL models, such as the Qwen-VL series mentioned in the context of this change, are often sequence-to-sequence models. These require task_type='SEQ_2_SEQ_LM' in the LoraConfig. Using the wrong task type will likely lead to incorrect behavior or errors when abliterating VL models. While this is outside the changed lines of this PR, it's a crucial point to address for this feature to work correctly.
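The reviewer's concern can be sketched in a few lines. The helper name and flag below are hypothetical illustrations of the suggestion, not heretic's actual API, and whether a given VL model truly needs `SEQ_2_SEQ_LM` is the reviewer's claim, not established fact:

```python
# Hedged sketch of the review suggestion: select the PEFT task_type per
# model family instead of hardcoding "CAUSAL_LM". The function name and
# the model_is_vl flag are illustrative only.
def pick_peft_task_type(model_is_vl: bool) -> str:
    # Per the review comment, some VL models are sequence-to-sequence
    # and would need "SEQ_2_SEQ_LM" in their LoraConfig; plain decoder
    # LLMs keep "CAUSAL_LM".
    return "SEQ_2_SEQ_LM" if model_is_vl else "CAUSAL_LM"
```

The returned string would then be passed as `task_type` when constructing the `LoraConfig`, rather than a hardcoded literal.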
Force-pushed 9259582 to 9d41693 (Compare)
Having a configuration flag for this is not the correct solution. This should be determined automatically.

Do you know if there's a reason it wasn't just using AutoModel to begin with? I think I tried that before but got some vague error, but maybe I just did it wrong...
I never tried
OK, per https://huggingface.co/docs/transformers/model_doc/auto#transformers.AutoModel, we need the "{model}ForConditionalGeneration" type as a return value, but per that page only the specific AutoModelFor{CausalLM,ImageTextToText,(others)} classes can build them. I think you have to pick the superclass?
Further evidence that you have to choose the task:
https://discuss.huggingface.co/t/difference-between-automodel-and-automodelforlm/5967
https://insujang.github.io/2023-04-19/using-huggingface-transformers/
OK - figured out how to get the class from the config. No config flag needed now.
Force-pushed 1d18920 to 2cdef6b (Compare)

/gemini review
Code Review
This pull request introduces a valuable enhancement by allowing the abliteration of Vision-Language (VL) models. It achieves this by dynamically determining the appropriate AutoModel class based on the model's configuration, rather than hardcoding AutoModelForCausalLM. The implementation is clean and correctly applied throughout the model loading and manipulation logic. My review includes a couple of suggestions to further improve code clarity and ensure full adherence to the repository's style guide.
Force-pushed 40aab65 to 95a229a (Compare)
That's the most generic class transformers have, the last fallback users can reach for when no more specific class applies.

So is this working? Which VL models have you tested it with?
src/heretic/model.py (Outdated)

```python
def get_model_class(
    settings: Settings,
```
We only use the model field from the settings here, so I think it's cleaner to just pass the model path directly instead of the entire settings object.
src/heretic/model.py (Outdated)

```python
config_type = type(AutoConfig.from_pretrained(settings.model))
if config_type in MODEL_FOR_IMAGE_TEXT_TO_TEXT_MAPPING:
    return AutoModelForImageTextToText
if config_type in MODEL_FOR_CAUSAL_LM_MAPPING:
```
Are those the same dicts that are used by AutoModelFor* internally? We don't want to lose the ability to load models that were previously supported.
https://github.com/huggingface/transformers/blob/57278c904c5158999d31a0db8bfcd63360c37b48/src/transformers/models/auto/modeling_auto.py#L1979
https://github.com/huggingface/transformers/blob/57278c904c5158999d31a0db8bfcd63360c37b48/src/transformers/models/auto/modeling_auto.py#L2183
Yes - exactly the same.
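For illustration, the membership check mirrors the registry pattern the AutoModelFor* classes use internally. A pure-Python sketch, with the mappings abbreviated to toy sets of class-name strings (the real transformers mappings hold config classes, and these names are illustrative):

```python
# Toy stand-ins for transformers' internal config -> model registries.
# Each task-specific Auto class keeps its own mapping; membership of a
# model's config type decides which task class can build the model.
MODEL_FOR_CAUSAL_LM_MAPPING = {"LlamaConfig", "Qwen3Config"}
MODEL_FOR_IMAGE_TEXT_TO_TEXT_MAPPING = {"Qwen3VLConfig", "Gemma3Config"}

def pick_task(config_cls_name: str) -> str:
    # Check the more specific VL mapping first, then fall back to causal LM.
    if config_cls_name in MODEL_FOR_IMAGE_TEXT_TO_TEXT_MAPPING:
        return "image-text-to-text"
    if config_cls_name in MODEL_FOR_CAUSAL_LM_MAPPING:
        return "causal-lm"
    raise ValueError(f"no Auto class registered for {config_cls_name}")
```

Because the PR checks the very same mappings AutoModelFor* consults, any model loadable before remains loadable.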
I've tested Qwen3-VL; I will test a few more (InternVL, Qwen2.5-VL), but given that it's the same data structure that transformers uses internally in AutoModel, I don't expect trouble.

Let me know when you're satisfied that this works, then I'll merge.
Had to rework this, as models not known to transformers (InternVL3, InternVL3.5) do not register for auto classes, but instead provide data in config.json under auto_map (example: https://huggingface.co/OpenGVLab/InternVL3-1B/blob/main/config.json#L7). Also pulled the param back to settings, as getting asked repeatedly about remote code was annoying.
Transformers simply didn't merge those earlier versions at first, but for InternVL 3.5 they did merge it after some time:

```json
"auto_map": {
    "AutoConfig": "configuration_internvl_chat.InternVLChatConfig",
    "AutoModel": "modeling_internvl_chat.InternVLChatModel",
    "AutoModelForCausalLM": "modeling_internvl_chat.InternVLChatModel"
}
```

If it's there, it works with both Auto classes; it just needs trust_remote_code=True, which heretic already has.
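A minimal sketch of resolving the Auto class from such an auto_map entry; the helper name is hypothetical and the config is treated as a plain dict loaded from config.json:

```python
def resolve_from_auto_map(config: dict) -> "str | None":
    """Return the most specific Auto class name a custom (non-registered)
    model advertises in its config.json "auto_map", if any.
    Hypothetical helper, not heretic's actual implementation."""
    auto_map = config.get("auto_map") or {}
    # Prefer task-specific classes over the generic AutoModel fallback.
    for name in ("AutoModelForImageTextToText",
                 "AutoModelForCausalLM",
                 "AutoModel"):
        if name in auto_map:
            return name
    return None
```

With the InternVL auto_map above, this would pick AutoModelForCausalLM over the generic AutoModel entry.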
src/heretic/model.py (Outdated)

```python
# If the model is not self registered, it may provide task class details, documented at
# https://huggingface.co/docs/transformers/en/custom_models#upload
maybe_auto_map = getattr(config, "auto_map", None)
```
Can't we get rid of all that complexity by checking for a vision_config key in the configuration? It seems that most or all VL models use that. Basically, we just load the config.json (which can be done without trust_remote_code), then check for the vision_config key. If it is present, we return AutoModelForImageTextToText, else we return AutoModelForCausalLM.
Per https://huggingface.co/docs/transformers/en/model_doc/auto#auto-classes: "There is one class of AutoModel for each task." Use the presence of "vision_config" in the config.json to determine which.
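The vision_config heuristic suggested above is simple enough to sketch directly. The helper name is hypothetical, and this reads the raw config.json as a dict (which needs no trust_remote_code), whereas heretic's real code returns the Auto class objects themselves:

```python
import json

def pick_auto_class_name(config_path: str) -> str:
    """Sketch of the heuristic: VL models carry a nested "vision_config"
    section in their config.json; plain LLMs don't."""
    with open(config_path) as f:
        config = json.load(f)
    return ("AutoModelForImageTextToText"
            if "vision_config" in config
            else "AutoModelForCausalLM")
```

As the later testing in this thread notes, the heuristic covers most VL models, though a few outliers (e.g. Molmo's "vit_config") use a different key.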
👍 This looks great now! This has got to be the cleanest way to solve the problem. As far as I'm concerned, this is ready to merge. Are you satisfied that it works correctly with all/most VL models? |
I'll test with some models I know can come in under both superclasses (gemma3), as well as some others.

Tested working: openai/gpt-oss-20b, google/gemma-3-4b-it, Qwen/Qwen3-0.6B, Qwen/Qwen3-VL-2B-Instruct, Qwen/Qwen2.5-VL-3B-Instruct, OpenGVLab/InternVL3_5-1B-hf

A couple are weird: allenai/Molmo2-8B uses "vit_config" and not "vision_config"; mistralai/Ministral-3-3B-Instruct-2512 complains about chat user-assistant ordering (not related to this change).

Please give me a clear signal when this is ready to merge.

This is a strict improvement, and the failures are unrelated (Molmo doesn't support system prompts; Ministral 2512 needs transformers 5).

Ok, merged. This is a great improvement!
Per https://huggingface.co/docs/transformers/en/model_doc/auto#auto-classes: "There is one class of AutoModel for each task." Since we're not using AutoModel (which would pick the task automatically), expose a flag to allow for a different "task", which allows (at least) Qwen3-VL to be abliterated.