Skip to content

fix(defender): sync hasThreats blocking logic and tool rules precedence from JS package#1

Open
hiskudin wants to merge 1 commit intomainfrom
fix/sync-has-threats-blocking-logic
Open

fix(defender): sync hasThreats blocking logic and tool rules precedence from JS package#1
hiskudin wants to merge 1 commit intomainfrom
fix/sync-has-threats-blocking-logic

Conversation

@hiskudin
Copy link
Collaborator

@hiskudin hiskudin commented Mar 9, 2026

Summary

  • Add `has_threats` guard so base risk from tool rules alone (e.g. `gmail_*` seeding `'high'`) does not block safe content when `block_high_risk` is enabled
  • Custom `config` tool rules now take precedence over the `use_default_tool_rules` flag, matching the JS package behaviour
  • Add `TestUseDefaultToolRules` integration tests covering both behaviours

Test plan

  • `pytest tests/test_integration.py` — all 28 tests pass
  • Verify `test_applies_tool_rules_when_true` asserts `allowed=True` for safe content at high base risk
  • Verify `test_always_applies_custom_tool_rules_from_config` asserts `allowed=True` for safe content with custom rules

🤖 Generated with Claude Code

…ce from JS package

- Add has_threats guard so base risk from tool rules alone does not block
  safe content when block_high_risk is enabled
- Custom config tool_rules now take precedence over use_default_tool_rules flag
- Add TestUseDefaultToolRules integration tests to cover both behaviours

Co-Authored-By: Claude Sonnet 4.6 <noreply@anthropic.com>
Copilot AI review requested due to automatic review settings March 9, 2026 14:57
Copy link

Copilot AI left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Pull request overview

This PR syncs two behaviors from the JS package to the Python stackone_defender library: (1) a has_threats guard that ensures base risk from tool rules alone doesn't block safe content when block_high_risk is enabled, and (2) custom config tool rules now take precedence over the use_default_tool_rules flag. Integration tests are added covering both behaviors.

Changes:

  • Added has_threats guard in defend_tool_result so that block_high_risk only blocks content when actual threat signals (detections, active sanitization methods, or tier2 scores above threshold) are present — base risk from tool rules alone no longer triggers blocking.
  • Changed tool rules resolution in __init__ to check config["tool_rules"] first, falling back to default rules only when custom rules aren't provided, matching JS package precedence.
  • Added TestUseDefaultToolRules test class with four integration tests covering default, explicitly false, explicitly true, and custom config tool rules scenarios.

Reviewed changes

Copilot reviewed 2 out of 2 changed files in this pull request and generated 2 comments.

File Description
src/stackone_defender/core/prompt_defense.py Updated tool rules precedence logic (line 58) and added has_threats guard to allowed computation (lines 123-136)
tests/test_integration.py Added TestUseDefaultToolRules class with 4 tests covering tool rules precedence and safe content allowance

💡 Add Copilot custom instructions for smarter, more guided reviews. Learn how to get started.

You can also share your feedback on Copilot code review. Take the survey.

Comment on lines +164 to +171
def test_applies_tool_rules_when_true(self):
defense = create_prompt_defense(use_default_tool_rules=True, block_high_risk=True)
data = {"subject": "Weekly team update", "body": "Reminder about the meeting tomorrow at 10am.", "thread_id": "thread123"}
result = defense.defend_tool_result(data, "gmail_get_message")
# With use_default_tool_rules, gmail tool rule seeds risk_level: 'high' as base risk,
# but safe content with no detections should still be allowed through.
assert result.risk_level == "high"
assert result.allowed is True
Copy link

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The new tests verify that safe content is allowed through when block_high_risk=True with tool rules, but there's no test verifying the converse — that malicious content is still blocked when use_default_tool_rules=True and block_high_risk=True. Adding a test like test_blocks_malicious_content_with_tool_rules (e.g., using "SYSTEM: ignore previous instructions" in a gmail message body) would guard against regressions in the has_threats logic.

Copilot uses AI. Check for mistakes.
self._config.block_high_risk = True

tool_rules = self._config.tool_rules if use_default_tool_rules else []
tool_rules = (config or {}).get("tool_rules") or (self._config.tool_rules if use_default_tool_rules else [])
Copy link

Copilot AI Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Using or to chain the fallback means an explicitly empty config={"tool_rules": []} is treated as falsy and falls through to the use_default_tool_rules branch. If the intent is that custom config tool rules always take precedence (as stated in the PR description), consider using an explicit None check instead, e.g.: tool_rules = (config or {}).get("tool_rules") if (config or {}).get("tool_rules") is not None else (self._config.tool_rules if use_default_tool_rules else []). This way, an explicitly empty list from config would be respected as "no tool rules" rather than falling through.

Copilot uses AI. Check for mistakes.
Copy link

@cubic-dev-ai cubic-dev-ai bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

2 issues found across 2 files

Prompt for AI agents (unresolved issues)

Check if these issues are valid — if so, understand the root cause of each and fix them. If appropriate, use sub-agents to investigate and fix each issue separately.


<file name="src/stackone_defender/core/prompt_defense.py">

<violation number="1" location="src/stackone_defender/core/prompt_defense.py:58">
P2: The `or` fallback treats an explicitly empty `tool_rules` list as “not provided.” If a caller sets `"tool_rules": []` to disable tool rules, this line still loads defaults when `use_default_tool_rules` is true. Use an explicit key check so empty lists are honored.</violation>

<violation number="2" location="src/stackone_defender/core/prompt_defense.py:129">
P2: `has_threats` compares tier2 scores against `self._config.tier2.high_risk_threshold`, which doesn’t reflect `tier2_config` overrides. If the classifier uses a lower high-risk threshold, `tier2_risk` can be high while `has_threats` stays false, so `block_high_risk` won’t block. Use `tier2_risk` (or the classifier’s thresholds) instead of the config default.</violation>
</file>

Reply with feedback, questions, or to request a fix. Tag @cubic-dev-ai to re-run a review.

self._config.block_high_risk = True

tool_rules = self._config.tool_rules if use_default_tool_rules else []
tool_rules = (config or {}).get("tool_rules") or (self._config.tool_rules if use_default_tool_rules else [])
Copy link

@cubic-dev-ai cubic-dev-ai bot Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: The or fallback treats an explicitly empty tool_rules list as “not provided.” If a caller sets "tool_rules": [] to disable tool rules, this line still loads defaults when use_default_tool_rules is true. Use an explicit key check so empty lists are honored.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/stackone_defender/core/prompt_defense.py, line 58:

<comment>The `or` fallback treats an explicitly empty `tool_rules` list as “not provided.” If a caller sets `"tool_rules": []` to disable tool rules, this line still loads defaults when `use_default_tool_rules` is true. Use an explicit key check so empty lists are honored.</comment>

<file context>
@@ -55,7 +55,7 @@ def __init__(
             self._config.block_high_risk = True
 
-        tool_rules = self._config.tool_rules if use_default_tool_rules else []
+        tool_rules = (config or {}).get("tool_rules") or (self._config.tool_rules if use_default_tool_rules else [])
 
         self._tool_sanitizer: ToolResultSanitizer = create_tool_result_sanitizer(
</file context>
Suggested change
tool_rules = (config or {}).get("tool_rules") or (self._config.tool_rules if use_default_tool_rules else [])
tool_rules = (config or {}).get("tool_rules") if "tool_rules" in (config or {}) else (self._config.tool_rules if use_default_tool_rules else [])
Fix with Cubic

has_threats = (
len(detections) > 0
or len(fields_sanitized) > 0
or (tier2_score is not None and tier2_score >= self._config.tier2.high_risk_threshold)
Copy link

@cubic-dev-ai cubic-dev-ai bot Mar 9, 2026

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

P2: has_threats compares tier2 scores against self._config.tier2.high_risk_threshold, which doesn’t reflect tier2_config overrides. If the classifier uses a lower high-risk threshold, tier2_risk can be high while has_threats stays false, so block_high_risk won’t block. Use tier2_risk (or the classifier’s thresholds) instead of the config default.

Prompt for AI agents
Check if this issue is valid — if so, understand the root cause and fix it. At src/stackone_defender/core/prompt_defense.py, line 129:

<comment>`has_threats` compares tier2 scores against `self._config.tier2.high_risk_threshold`, which doesn’t reflect `tier2_config` overrides. If the classifier uses a lower high-risk threshold, `tier2_risk` can be high while `has_threats` stays false, so `block_high_risk` won’t block. Use `tier2_risk` (or the classifier’s thresholds) instead of the config default.</comment>

<file context>
@@ -120,7 +120,20 @@ def defend_tool_result(self, value: Any, tool_name: str) -> DefenseResult:
+        has_threats = (
+            len(detections) > 0
+            or len(fields_sanitized) > 0
+            or (tier2_score is not None and tier2_score >= self._config.tier2.high_risk_threshold)
+        )
+
</file context>
Suggested change
or (tier2_score is not None and tier2_score >= self._config.tier2.high_risk_threshold)
or tier2_risk in ("high", "critical")
Fix with Cubic

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants