feat: a manager for SQLAlchemy engines#34826
Conversation
|
Based on your review schedule, I'll hold off on reviewing this PR until it's marked as ready for review. If you'd like me to take a look now, comment
|
b4537e6 to
7ce936a
Compare
a8d1555 to
757f175
Compare
a37d137 to
63ce879
Compare
63ce879 to
4256dba
Compare
✅ Deploy Preview for superset-docs-preview canceled.
|
259eeb2 to
c00fae5
Compare
- Use sshtunnel.open_tunnel() instead of SSHTunnelForwarder directly to properly handle debug_level parameter - Fix keepalive parameter name (set_keepalive, not keepalive) - Fix test assertions that were inside pytest.raises blocks and never executed - now check error_type instead of string messages - Update SSH tunnel test mocks to patch open_tunnel Co-Authored-By: Claude Opus 4.5 <noreply@anthropic.com>
| import pytest | ||
|
|
||
| from superset.engines.manager import _LockManager, EngineManager, EngineModes | ||
|
|
There was a problem hiding this comment.
Suggestion: The tests call EngineManager._get_engine and get_engine without a Flask request or application context, but those methods internally use get_query_source_from_request and get_user_id, which rely on Flask's request and g and will raise "working outside of request context"; patching these functions in the EngineManager module for tests avoids spurious failures unrelated to engine logic. [logic error]
Severity Level: Major ⚠️
- ❌ Unit tests fail with Flask context errors.
- ⚠️ CI pipeline test suite blocked by failures.
- ⚠️ Developer workflow slowed by spurious test noise.| @pytest.fixture(autouse=True) | |
| def _mock_engine_manager_dependencies(monkeypatch): | |
| """ | |
| Avoid Flask request/g dependencies when calling EngineManager in unit tests. | |
| """ | |
| import superset.engines.manager as manager | |
| monkeypatch.setattr(manager, "get_query_source_from_request", lambda: None) | |
| monkeypatch.setattr(manager, "get_user_id", lambda: None) |
Steps of Reproduction ✅
1. Run the unit tests for the engine manager: pytest
tests/unit_tests/engines/manager_test.py. The test module imports EngineManager (file:
tests/unit_tests/engines/manager_test.py) and executes tests such as
test_get_engine_new_mode and test_engine_oauth2_error_handling.
2. In test_get_engine_new_mode (defined in the same file, added around lines 128-149 in
the PR), the test calls engine_manager._get_engine(mock_database, "catalog1", "schema1",
None). That code path inside EngineManager uses helper functions tied to Flask request
context (e.g., get_query_source_from_request / get_user_id) which access flask.request or
flask.g.
3. Because pytest runs these tests outside of any Flask request/app context, calling those
helpers triggers a "working outside of request context" runtime error coming from Flask
internals when EngineManager calls get_query_source_from_request/get_user_id. The failure
surface is seen during test collection/execution of the test functions in this file.
4. The suggested change (add an autouse fixture patching get_query_source_from_request and
get_user_id) prevents the Flask-dependent helpers from executing during tests so the
EngineManager logic is exercised without needing a Flask request context. This reproduces
locally by removing the fixture and re-running pytest to observe the Flask "working
outside of request context" error.Prompt for AI Agent 🤖
This is a comment left during a code review.
**Path:** tests/unit_tests/engines/manager_test.py
**Line:** 27:27
**Comment:**
*Logic Error: The tests call `EngineManager._get_engine` and `get_engine` without a Flask request or application context, but those methods internally use `get_query_source_from_request` and `get_user_id`, which rely on Flask's `request` and `g` and will raise "working outside of request context"; patching these functions in the EngineManager module for tests avoids spurious failures unrelated to engine logic.
Validate the correctness of the flagged issue. If correct, How can I resolve this? If you propose a fix, implement it and please make it concise.There was a problem hiding this comment.
Code Review Agent Run #0c218e
Actionable Suggestions - 5
-
superset/extensions/engine_manager.py - 1
- Incorrect teardown handler registration · Line 74-74
-
tests/unit_tests/engines/manager_test.py - 1
- Blind exception catch too broad · Line 208-208
-
superset/engines/manager.py - 3
- Blind exception catch without specificity · Line 627-627
- KeyError in poolclass lookup · Line 337-337
- Blind exception catch without specificity · Line 627-627
Additional Suggestions - 5
-
tests/unit_tests/engines/manager_test.py - 5
-
Incorrect Mock Setup · Line 165-165The mock for make_url_safe is incorrectly set to return the real_engine, but make_url_safe returns a URL object. This could cause the test to pass incorrectly or mask issues.
Code suggestion
@@ -162,6 +162,7 @@ - from sqlalchemy import create_engine - from sqlalchemy.pool import StaticPool - - real_engine = create_engine("sqlite:///:memory:", poolclass=StaticPool) - mock_create_engine.return_value = real_engine - mock_make_url.return_value = real_engine + from sqlalchemy import create_engine + from sqlalchemy.pool import StaticPool + from sqlalchemy.engine.url import URL + + real_engine = create_engine("sqlite:///:memory:", poolclass=StaticPool) - mock_create_engine.return_value = real_engine + mock_make_url.return_value = URL("sqlite:///:memory:")
-
Missing Test Implementation · Line 176-176The test comments about verifying different parameters create new engines but lacks the implementation.
Code suggestion
@@ -171,1 +171,5 @@ - # Call with different params - should create new engine + # Call with different params - should create new engine + result3 = engine_manager._get_engine(mock_database, "catalog2", "schema1", None) + assert result3 is not result1 # Different engine + assert mock_create_engine.call_count == 2 # Called twice now
-
Incorrect Mock Setup · Line 189-189The mock for make_url_safe should return a URL object instead of the engine; update mock_make_url.return_value to a URL instance.
-
Incomplete Test Assertions · Line 332-335The test does not verify the actual behavior of _get_engine_args, such as correct poolclass for NEW mode.
Code suggestion
@@ -334,2 +334,5 @@ - assert str(uri) == "trino://" - assert "connect_args" in database.get_extra.return_value + assert str(uri) == "trino://" + assert "connect_args" in database.get_extra.return_value + from sqlalchemy.pool import NullPool + assert kwargs['poolclass'] == NullPool + assert kwargs['connect_args'] == {"source": "Apache Superset"}
-
Incomplete Test Assertions · Line 374-381The test verifies the call but not the returned values from impersonation logic.
Code suggestion
@@ -381,1 +381,3 @@ - assert call_args[0][2] is None # access_token (no OAuth2) + assert call_args[0][2] is None # access_token (no OAuth2) + assert uri is mock_uri + assert kwargs == {"connect_args": {"user": "alice", "source": "Apache Superset"}}
-
Review Details
-
Files reviewed - 15 · Commit Range:
5753dfb..c00fae5- superset/config.py
- superset/engines/manager.py
- superset/extensions/__init__.py
- superset/extensions/engine_manager.py
- superset/extensions/ssh.py
- superset/initialization/__init__.py
- superset/models/core.py
- superset/superset_typing.py
- tests/integration_tests/conftest.py
- tests/integration_tests/databases/commands_tests.py
- tests/integration_tests/model_tests.py
- tests/unit_tests/engines/manager_test.py
- tests/unit_tests/initialization_test.py
- tests/unit_tests/models/core_test.py
- tests/unit_tests/sql/execution/conftest.py
-
Files skipped - 1
- UPDATING.md - Reason: Filter setting
-
Tools
- Whispers (Secret Scanner) - ✔︎ Successful
- Detect-secrets (Secret Scanner) - ✔︎ Successful
- MyPy (Static Code Analysis) - ✔︎ Successful
- Astral Ruff (Static Code Analysis) - ✔︎ Successful
Bito Usage Guide
Commands
Type the following command in the pull request comment and save the comment.
-
/review- Manually triggers a full AI review. -
/pause- Pauses automatic reviews on this pull request. -
/resume- Resumes automatic reviews. -
/resolve- Marks all Bito-posted review comments as resolved. -
/abort- Cancels all in-progress reviews.
Refer to the documentation for additional commands.
Configuration
This repository uses Superset You can customize the agent settings here or contact your Bito workspace admin at evan@preset.io.
Documentation & Help
| mock_database, "catalog1", "schema1", None | ||
| ) | ||
| results.append(engine) | ||
| except Exception as e: |
There was a problem hiding this comment.
Replace broad Exception catch with specific exception types. Multiple similar issues exist (line 208). Catch specific exceptions like RuntimeError or AssertionError instead.
Code suggestion
Check the AI-generated fix before applying
| except Exception as e: | |
| except (RuntimeError, AssertionError) as e: |
Code Review Run #0c218e
Should Bito avoid suggestions like this for future reviews? (Manage Rules)
- Yes, avoid them
Code Review Agent Run #741e2dActionable Suggestions - 0Review Details
Bito Usage GuideCommands Type the following command in the pull request comment and save the comment.
Refer to the documentation for additional commands. Configuration This repository uses Documentation & Help |
|
@betodealmeida I believe the original SIP has been abandoned, and before it was closed I left a comment about the design not considering typical distributed worker deployments, for which the singleton doesn't really work: #8574 (comment) With this design connections will be grow linearly with the number of pods, which makes it more difficult to manage multi-pod deployments. So I think we need to revisit the architecture to come up with a design where we can coordinate connection pooling across the whole worker fleet. |
Agree with @villebro. To illustrate the problem in more detail: 1. Connection Pool MultiplicationIn Pod 1: EngineManager → Pool(size=5) → 5 connections With 10 pods and pool size 5, you could hit 50 connections to the database, not 5. This can easily exhaust 2. SSH Tunnel MultiplicationSame problem with SSH tunnels: If you have 20 Celery workers across multiple pods, you could have 20 separate SSH tunnels to the same bastion 3. Celery Prefork Workers Make It WorseCelery's default prefork model spawns multiple worker processes: Pod 1: Each forked process gets its own memory space, so connection pools aren't shared even within a single pod. 4. No Cluster-Wide CoordinationThe cache keys are deterministic but there's no coordination: Each process independently decides to create connections without knowing cluster-wide state. What's Missing for True Connection PoolingFor distributed connection pooling, you'd typically need:
|
|
@villebro @michael-s-molina I understand this is not a solution for every deployment out there, but it's still incredibly valuable for other architectures (like Preset, but also smaller Superset deployments). For distributed workers you can continue using Regardless, IMHO this PR is a big win just for the fact that it centralizes the engine creation in a single place in a Flask extension. |
SUMMARY
This PR implements SIP-26: Connection Pooling for Analytics Database Connections (#8574).
Problem: Superset previously created and discarded database connections for each query without pooling, causing:
Solution: A new
EngineManagerclass that provides centralized SQLAlchemy engine management with two modes:NullPool(current behavior)Key Features:
EngineManager, consolidating logic previously scattered inDatabase.get_sqla_engine()QueuePool,SingletonThreadPool,StaticPool,NullPool, andAssertionPoolvia databaseextraconfigurationSSHManager)EngineManagerExtensionhandles initialization and shutdownConfiguration Options (in
config.py):ENGINE_MANAGER_MODE:EngineModes.NEW(default) orEngineModes.SINGLETONENGINE_MANAGER_CLEANUP_INTERVAL: Interval for cleanup thread (default: 5 minutes)ENGINE_MANAGER_AUTO_START_CLEANUP: Auto-start cleanup in SINGLETON mode (default: True)Code Changes:
superset/engines/manager.pywithEngineManagerclass (~680 lines)superset/extensions/engine_manager.pyFlask extensionDatabase.get_sqla_engine()to delegate toEngineManagersuperset/extensions/ssh.py(functionality merged into EngineManager)DBConnectionMutatorandEngineContextManagerLockManagerandEngineManagerBEFORE/AFTER SCREENSHOTS OR ANIMATED GIF
N/A - Backend infrastructure change with no UI impact.
TESTING INSTRUCTIONS
Verify backward compatibility (NEW mode - default):
Test SINGLETON mode with connection pooling:
Test SSH tunnel connections:
Run unit tests:
ADDITIONAL INFORMATION
SSHManagerin favor of integratedEngineManager)