Skip to content

Add cache_httpfs for Faster S3 Reads #147

@khalid244

Description

@khalid244

Summary

Add cache_httpfs DuckDB extension to cache S3 Parquet files in memory, improving query performance for Grafana dashboards with many panels.

Changes

// internal/database/duckdb.go

// Cache S3 Parquet files in memory
db.Exec("INSTALL cache_httpfs FROM community")
db.Exec("LOAD cache_httpfs")
db.Exec("SET cache_httpfs_type='in_memory'")

CTE/Subquery Performance Comparison

Run With cache_httpfs Without cache_httpfs Improvement
1 268 ms 1,801 ms 6.7x faster
2 399 ms 4,124 ms 10.3x faster
3 527 ms 2,082 ms 4.0x faster
4 406 ms 2,612 ms 6.4x faster
5 492 ms 5,238 ms 10.6x faster
Average 418 ms 3,171 ms 7.6x faster
Total 2,092 ms 15,857 ms -13,765 ms saved

Why This Helps

When a Grafana dashboard has queries with subqueries (CTEs), the same Parquet files are read multiple times. Without caching, each read downloads from S3. With cache_httpfs, the first read caches the data and subsequent reads are served from memory.

-- This query reads 'measurements' table twice
WITH top_regions AS (
  SELECT region FROM measurements WHERE time > X GROUP BY region LIMIT 50
)
SELECT * FROM measurements WHERE region IN (SELECT region FROM top_regions)

Metadata

Metadata

Assignees

Projects

No projects

Milestone

No milestone

Relationships

None yet

Development

No branches or pull requests

Issue actions