Skip to content

feat: apply RLS conservatively#38683

Open
betodealmeida wants to merge 3 commits intomasterfrom
conservative-rls
Open

feat: apply RLS conservatively#38683
betodealmeida wants to merge 3 commits intomasterfrom
conservative-rls

Conversation

@betodealmeida
Copy link
Member

@betodealmeida betodealmeida commented Mar 17, 2026

User description

SUMMARY

This PR fixes errors like this:

psycopg2.errors.UndefinedColumn: column "is_green" does not exist in virtual_table

This happens on Redshift (and potentially other databases) when RLS rules are applied to virtual datasets whose SQL uses table-name-qualified column references without an explicit alias.

Root Cause

RLSAsSubqueryTransformer replaces tables with filtered subqueries when applying RLS. When a table has no explicit alias, it constructed the subquery alias from the fully-qualified table name:

-- Virtual dataset SQL:
SELECT pens.pen_id, pens.is_green FROM public.pens

-- After RLS (before fix):
SELECT pens.pen_id, pens.is_green
-- column refs use "pens" ↑
FROM (SELECT * FROM public.pens WHERE user_id = 1) AS "public.pens"
--                                        but alias is ↑ "public.pens"

The column references (pens.column) can't resolve against the quoted alias "public.pens" because they're different identifiers, causing Redshift to return column "is_green" does not exist in virtual_table.

Fix

  • superset/sql/parse.py — Use just the table name as the subquery alias instead of the schema-qualified path:
  -- After RLS (after fix):
  SELECT pens.pen_id, pens.is_green
  FROM (SELECT * FROM public.pens WHERE user_id = 1) AS "pens"
  • superset/utils/rls.pyapply_rls() now returns bool indicating whether any RLS predicates were actually applied.
  • superset/models/helpers.pyget_from_clause() and validate_adhoc_subquery() only regenerate SQL through sqlglot's format() when RLS was actually applied. Previously, all virtual dataset SQL was round-tripped
    through sqlglot even when no RLS rules existed, which could silently rewrite dialect-specific syntax (e.g., NVLCOALESCE, current_timestampGETDATE(), :: castsCAST() on Redshift).

BEFORE/AFTER SCREENSHOTS OR ANIMATED GIF

N/A

TESTING INSTRUCTIONS

Added 12 unit tests in tests/unit_tests/models/test_virtual_dataset_format.py covering:

  • SQL is preserved verbatim when no RLS applies (4 tests including Redshift-specific syntax)
  • SQL is regenerated when RLS is applied (1 test)
  • apply_rls() return value correctness (3 tests)
  • RLS subquery alias uses table name only, not schema-qualified path (4 tests)

ADDITIONAL INFORMATION

  • Has associated issue:
  • Required feature flags:
  • Changes UI
  • Includes DB Migration (follow approval process in SIP-59)
    • Migration is atomic, supports rollback & is backwards-compatible
    • Confirm DB migration upgrade and downgrade tested
    • Runtime estimates and downtime expectations provided
  • Introduces new feature or API
  • Removes existing feature or API

CodeAnt-AI Description

Apply RLS only when it changes SQL and fix subquery aliasing for virtual datasets

What Changed

  • RLS application now returns whether any predicates were actually applied; code only regenerates SQL when RLS changed the query
  • Virtual dataset SQL is preserved verbatim when no RLS rules apply, avoiding silent rewrites of dialect-specific syntax (e.g., NVL, :: casts, current_timestamp)
  • When RLS replaces a table with a filtered subquery, the subquery alias uses just the table name (not schema/catalog-qualified name) so column references like table.col resolve correctly on Redshift
  • Unit tests added to validate: no sqlglot round-trip when RLS not applied, correct formatting when RLS applied, apply_rls boolean return behavior, and correct subquery aliases

Impact

✅ Fewer Redshift "column does not exist" errors for virtual datasets
✅ Preserves Redshift-specific SQL syntax in virtual datasets when RLS doesn't apply
✅ Clearer detection when RLS changes a query (boolean result for callers)

💡 Usage Guide

Checking Your Pull Request

Every time you make a pull request, our system automatically looks through it. We check for security issues, mistakes in how you're setting up your infrastructure, and common code problems. We do this to make sure your changes are solid and won't cause any trouble later.

Talking to CodeAnt AI

Got a question or need a hand with something in your pull request? You can easily get in touch with CodeAnt AI right here. Just type the following in a comment on your pull request, and replace "Your question here" with whatever you want to ask:

@codeant-ai ask: Your question here

This lets you have a chat with CodeAnt AI about your pull request, making it easier to understand and improve your code.

Example

@codeant-ai ask: Can you suggest a safer alternative to storing this secret?

Preserve Org Learnings with CodeAnt

You can record team preferences so CodeAnt AI applies them in future reviews. Reply directly to the specific CodeAnt AI suggestion (in the same thread) and replace "Your feedback here" with your input:

@codeant-ai: Your feedback here

This helps CodeAnt AI learn and adapt to your team's coding style and standards.

Example

@codeant-ai: Do not flag unused imports.

Retrigger review

Ask CodeAnt AI to review the PR again, by typing:

@codeant-ai: review

Check Your Repository Health

To analyze the health of your code repository, visit our dashboard at https://app.codeant.ai. This tool helps you identify potential issues and areas for improvement in your codebase, ensuring your repository maintains high standards of code health.

@betodealmeida betodealmeida marked this pull request as ready for review March 17, 2026 00:20
@dosubot dosubot bot added authentication:row-level-security Related to Row Level Security change:backend Requires changing the backend data:dataset Related to dataset configurations labels Mar 17, 2026
@codeant-ai-for-open-source codeant-ai-for-open-source bot added the size:L This PR changes 100-499 lines, ignoring generated files label Mar 17, 2026
@codeant-ai-for-open-source
Copy link
Contributor

Sequence Diagram

This PR changes RLS handling to avoid unnecessary SQL reformatting and to preserve valid column resolution. SQL is only regenerated when RLS predicates are actually applied, and subquery aliases now use the base table name so qualified column references continue to work.

sequenceDiagram
    participant QueryFlow
    participant SQLHandler
    participant RLSEngine
    participant SQLTransformer

    QueryFlow->>SQLHandler: Build SQL for virtual or adhoc query
    SQLHandler->>RLSEngine: Apply RLS to parsed statement
    RLSEngine->>SQLTransformer: Wrap table with filtered subquery using table name alias
    SQLTransformer-->>RLSEngine: Updated statement
    RLSEngine-->>SQLHandler: Return rls applied true or false
    alt RLS applied
        SQLHandler->>SQLHandler: Regenerate SQL from updated AST
    else No RLS predicates
        SQLHandler->>SQLHandler: Keep original SQL text unchanged
    end
    SQLHandler-->>QueryFlow: Return final SQL for execution
Loading

Generated by CodeAnt AI

Copy link
Contributor

@bito-code-review bito-code-review bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review Agent Run #a1ed00

Actionable Suggestions - 1
  • superset/models/helpers.py - 1
Review Details
  • Files reviewed - 4 · Commit Range: 06b23f3..dd98832
    • superset/models/helpers.py
    • superset/sql/parse.py
    • superset/utils/rls.py
    • tests/unit_tests/models/test_virtual_dataset_format.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

@betodealmeida betodealmeida force-pushed the conservative-rls branch 2 times, most recently from 9c26ef0 to ab8bdf3 Compare March 17, 2026 00:49
@bito-code-review
Copy link
Contributor

bito-code-review bot commented Mar 17, 2026

Code Review Agent Run #0056b9

Actionable Suggestions - 0
Additional Suggestions - 1
  • tests/unit_tests/models/test_virtual_dataset_format.py - 1
    • Incomplete test assertion · Line 366-382
      The test verifies that unqualified columns work with RLS subquery transformation and that the predicate is applied, but it should also check that the alias is properly generated for tables without explicit aliases. This ensures the transformation creates the expected subquery structure.
      Code suggestion
       @@ -381,2 +381,3 @@
      -        assert "is_green" in result
      -        assert "WHERE" in result  # RLS predicate applied
      +        assert "is_green" in result
      +        assert "WHERE" in result  # RLS predicate applied
      +        assert 'AS "pens"' in result
Review Details
  • Files reviewed - 6 · Commit Range: 0bd4983..28e1ee8
    • superset/models/helpers.py
    • superset/sql/parse.py
    • superset/utils/rls.py
    • tests/integration_tests/sqla_models_tests.py
    • tests/unit_tests/models/test_virtual_dataset_format.py
    • tests/unit_tests/sql/parse_tests.py
  • Files skipped - 0
  • Tools
    • Whispers (Secret Scanner) - ✔︎ Successful
    • Detect-secrets (Secret Scanner) - ✔︎ Successful
    • MyPy (Static Code Analysis) - ✔︎ Successful
    • Astral Ruff (Static Code Analysis) - ✔︎ Successful

Bito Usage Guide

Commands

Type the following command in the pull request comment and save the comment.

  • /review - Manually triggers a full AI review.

  • /pause - Pauses automatic reviews on this pull request.

  • /resume - Resumes automatic reviews.

  • /resolve - Marks all Bito-posted review comments as resolved.

  • /abort - Cancels all in-progress reviews.

Refer to the documentation for additional commands.

Configuration

This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.

Documentation & Help

AI Code Review powered by Bito Logo

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

authentication:row-level-security Related to Row Level Security change:backend Requires changing the backend data:dataset Related to dataset configurations preset-io size/L size:L This PR changes 100-499 lines, ignoring generated files

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants