fix(defender): sync hasThreats blocking logic and tool rules precedence from JS package#1
fix(defender): sync hasThreats blocking logic and tool rules precedence from JS package#1
Conversation
…ce from JS package - Add has_threats guard so base risk from tool rules alone does not block safe content when block_high_risk is enabled - Custom config tool_rules now take precedence over use_default_tool_rules flag - Add TestUseDefaultToolRules integration tests to cover both behaviours Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
There was a problem hiding this comment.
Pull request overview
This PR syncs two behaviors from the JS package to the Python stackone_defender library: (1) a has_threats guard that ensures base risk from tool rules alone doesn't block safe content when block_high_risk is enabled, and (2) custom config tool rules now take precedence over the use_default_tool_rules flag. Integration tests are added covering both behaviors.
Changes:
- Added
has_threatsguard indefend_tool_resultso thatblock_high_riskonly blocks content when actual threat signals (detections, active sanitization methods, or tier2 scores above threshold) are present — base risk from tool rules alone no longer triggers blocking. - Changed tool rules resolution in
__init__to checkconfig["tool_rules"]first, falling back to default rules only when custom rules aren't provided, matching JS package precedence. - Added
TestUseDefaultToolRulestest class with four integration tests covering default, explicitly false, explicitly true, and custom config tool rules scenarios.
Reviewed changes
Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.
| File | Description |
|---|---|
src/stackone_defender/core/prompt_defense.py |
Updated tool rules precedence logic (line 58) and added has_threats guard to allowed computation (lines 123-136) |
tests/test_integration.py |
Added TestUseDefaultToolRules class with 4 tests covering tool rules precedence and safe content allowance |
💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.
You can also share your feedback on Copilot code review. Take the survey.
| def test_applies_tool_rules_when_true(self): | ||
| defense = create_prompt_defense(use_default_tool_rules=True, block_high_risk=True) | ||
| data = {"subject": "Weekly team update", "body": "Reminder about the meeting tomorrow at 10am.", "thread_id": "thread123"} | ||
| result = defense.defend_tool_result(data, "gmail_get_message") | ||
| # With use_default_tool_rules, gmail tool rule seeds risk_level: 'high' as base risk, | ||
| # but safe content with no detections should still be allowed through. | ||
| assert result.risk_level == "high" | ||
| assert result.allowed is True |
There was a problem hiding this comment.
The new tests verify that safe content is allowed through when block_high_risk=True with tool rules, but there's no test verifying the converse — that malicious content is still blocked when use_default_tool_rules=True and block_high_risk=True. Adding a test like test_blocks_malicious_content_with_tool_rules (e.g., using "SYSTEM: ignore previous instructions" in a gmail message body) would guard against regressions in the has_threats logic.
| self._config.block_high_risk = True | ||
|
|
||
| tool_rules = self._config.tool_rules if use_default_tool_rules else [] | ||
| tool_rules = (config or {}).get("tool_rules") or (self._config.tool_rules if use_default_tool_rules else []) |
There was a problem hiding this comment.
Using or to chain the fallback means an explicitly empty config={"tool_rules": []} is treated as falsy and falls through to the use_default_tool_rules branch. If the intent is that custom config tool rules always take precedence (as stated in the PR description), consider using an explicit None check instead, e.g.: tool_rules = (config or {}).get("tool_rules") if (config or {}).get("tool_rules") is not None else (self._config.tool_rules if use_default_tool_rules else []). This way, an explicitly empty list from config would be respected as "no tool rules" rather than falling through.
There was a problem hiding this comment.
2 issues found across 2 files
Prompt for AI agents (unresolved issues)
Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.
<file name="src/stackone_defender/core/prompt_defense.py">
<violation number="1" location="src/stackone_defender/core/prompt_defense.py:58">
P2: The `or` fallback treats an explicitly empty `tool_rules` list as “not provided.” If a caller sets `"tool_rules": []` to disable tool rules, this line still loads defaults when `use_default_tool_rules` is true. Use an explicit key check so empty lists are honored.</violation>
<violation number="2" location="src/stackone_defender/core/prompt_defense.py:129">
P2: `has_threats` compares tier2 scores against `self._config.tier2.high_risk_threshold`, which doesn’t reflect `tier2_config` overrides. If the classifier uses a lower high-risk threshold, `tier2_risk` can be high while `has_threats` stays false, so `block_high_risk` won’t block. Use `tier2_risk` (or the classifier’s thresholds) instead of the config default.</violation>
</file>
Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.
| self._config.block_high_risk = True | ||
|
|
||
| tool_rules = self._config.tool_rules if use_default_tool_rules else [] | ||
| tool_rules = (config or {}).get("tool_rules") or (self._config.tool_rules if use_default_tool_rules else []) |
There was a problem hiding this comment.
P2: The or fallback treats an explicitly empty tool_rules list as “not provided.” If a caller sets "tool_rules": [] to disable tool rules, this line still loads defaults when use_default_tool_rules is true. Use an explicit key check so empty lists are honored.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/stackone_defender/core/prompt_defense.py, line 58:
<comment>The `or` fallback treats an explicitly empty `tool_rules` list as “not provided.” If a caller sets `"tool_rules": []` to disable tool rules, this line still loads defaults when `use_default_tool_rules` is true. Use an explicit key check so empty lists are honored.</comment>
<file context>
@@ -55,7 +55,7 @@ def __init__(
self._config.block_high_risk = True
- tool_rules = self._config.tool_rules if use_default_tool_rules else []
+ tool_rules = (config or {}).get("tool_rules") or (self._config.tool_rules if use_default_tool_rules else [])
self._tool_sanitizer: ToolResultSanitizer = create_tool_result_sanitizer(
</file context>
| tool_rules = (config or {}).get("tool_rules") or (self._config.tool_rules if use_default_tool_rules else []) | |
| tool_rules = (config or {}).get("tool_rules") if "tool_rules" in (config or {}) else (self._config.tool_rules if use_default_tool_rules else []) |
| has_threats = ( | ||
| len(detections) > 0 | ||
| or len(fields_sanitized) > 0 | ||
| or (tier2_score is not None and tier2_score >= self._config.tier2.high_risk_threshold) |
There was a problem hiding this comment.
P2: has_threats compares tier2 scores against self._config.tier2.high_risk_threshold, which doesn’t reflect tier2_config overrides. If the classifier uses a lower high-risk threshold, tier2_risk can be high while has_threats stays false, so block_high_risk won’t block. Use tier2_risk (or the classifier’s thresholds) instead of the config default.
Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/stackone_defender/core/prompt_defense.py, line 129:
<comment>`has_threats` compares tier2 scores against `self._config.tier2.high_risk_threshold`, which doesn’t reflect `tier2_config` overrides. If the classifier uses a lower high-risk threshold, `tier2_risk` can be high while `has_threats` stays false, so `block_high_risk` won’t block. Use `tier2_risk` (or the classifier’s thresholds) instead of the config default.</comment>
<file context>
@@ -120,7 +120,20 @@ def defend_tool_result(self, value: Any, tool_name: str) -> DefenseResult:
+ has_threats = (
+ len(detections) > 0
+ or len(fields_sanitized) > 0
+ or (tier2_score is not None and tier2_score >= self._config.tier2.high_risk_threshold)
+ )
+
</file context>
| or (tier2_score is not None and tier2_score >= self._config.tier2.high_risk_threshold) | |
| or tier2_risk in ("high", "critical") |
Summary
Test plan
🤖 Generated with Claude Code