-
Notifications
You must be signed in to change notification settings - Fork 25
Closed
Labels
Description
Summary
Add cache_httpfs DuckDB extension to cache S3 Parquet files in memory, improving query performance for Grafana dashboards with many panels.
Changes
// internal/database/duckdb.go
// Cache S3 Parquet files in memory
db.Exec("INSTALL cache_httpfs FROM community")
db.Exec("LOAD cache_httpfs")
db.Exec("SET cache_httpfs_type='in_memory'")CTE/Subquery Performance Comparison
| Run | With cache_httpfs | Without cache_httpfs | Improvement |
|---|---|---|---|
| 1 | 268 ms | 1,801 ms | 6.7x faster |
| 2 | 399 ms | 4,124 ms | 10.3x faster |
| 3 | 527 ms | 2,082 ms | 4.0x faster |
| 4 | 406 ms | 2,612 ms | 6.4x faster |
| 5 | 492 ms | 5,238 ms | 10.6x faster |
| Average | 418 ms | 3,171 ms | 7.6x faster |
| Total | 2,092 ms | 15,857 ms | -13,765 ms saved |
Why This Helps
When a Grafana dashboard has queries with subqueries (CTEs), the same Parquet files are read multiple times. Without caching, each read downloads from S3. With cache_httpfs, the first read caches the data and subsequent reads are served from memory.
-- This query reads 'measurements' table twice
WITH top_regions AS (
SELECT region FROM measurements WHERE time > X GROUP BY region LIMIT 50
)
SELECT * FROM measurements WHERE region IN (SELECT region FROM top_regions)