@khalid244
Contributor

@khalid244 khalid244 commented Jan 22, 2026

Summary

  • Add cache_httpfs extension support for faster S3 reads
  • Add [query] config section with enable_s3_cache, s3_cache_size, and s3_cache_ttl_seconds options
  • Add logging for cache setup (info on success, warn on failure)
  • Add config tests for defaults and environment variable overrides

Test plan

  • Config tests pass (TestQueryConfig_Defaults, TestQueryConfig_EnvOverride)
  • Build succeeds

Add optional S3 file caching via DuckDB's cache_httpfs extension.
This improves query performance for CTEs/subqueries that read the
same Parquet files multiple times (5-10x faster in benchmarks).

Configuration (disabled by default):
  [query]
  enable_s3_cache = true

Fixes Basekick-Labs#147
Member

@xe-nvdk xe-nvdk left a comment


Thanks for implementing the in-memory cache approach! This addresses the portability concern I raised in Issue #147.

Review Summary

The implementation uses the in_memory cache mode, which preserves ephemeral compute. Good decision.

Required Changes

1. Bug: Config not wired to database

The EnableS3Cache field is added to database.Config but never set in cmd/arc/main.go:

// cmd/arc/main.go:153-169
dbConfig := &database.Config{
    MaxConnections: cfg.Database.MaxConnections,
    // ... other fields ...
    AzureEndpoint:    cfg.Storage.AzureEndpoint,
    // EnableS3Cache is missing here!
}

Fix: Add EnableS3Cache: cfg.Query.EnableS3Cache, to the dbConfig initialization.
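
A minimal sketch of the wiring described above, assuming trimmed-down stand-ins for the real types. AppConfig, QueryConfig, DBConfig, and newDBConfig are hypothetical names used only for illustration; the actual structs in cmd/arc/main.go carry many more fields.

```go
package main

import "fmt"

// QueryConfig is a hypothetical stand-in for the [query] config section.
type QueryConfig struct {
	EnableS3Cache bool
}

// AppConfig is a hypothetical stand-in for the top-level application config.
type AppConfig struct {
	Query QueryConfig
}

// DBConfig is a hypothetical stand-in for database.Config.
type DBConfig struct {
	EnableS3Cache bool
}

// newDBConfig copies the query-level flag onto the database config so the
// database layer actually sees the setting.
func newDBConfig(cfg AppConfig) *DBConfig {
	return &DBConfig{
		EnableS3Cache: cfg.Query.EnableS3Cache,
	}
}

func main() {
	cfg := AppConfig{Query: QueryConfig{EnableS3Cache: true}}
	fmt.Println(newDBConfig(cfg).EnableS3Cache)
}
```

Without the copy, the field keeps Go's zero value (false) and the cache stays off regardless of what the config file says.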

2. Add logging for cache setup

Log when the cache is being enabled and when its setup fails:

if cfg.EnableS3Cache {
    logger.Info().Msg("Enabling S3 file caching via cache_httpfs extension")
    if _, err := db.Exec("INSTALL cache_httpfs FROM community"); err == nil {
        // ...
        logger.Info().Msg("cache_httpfs extension loaded with in_memory mode")
    } else {
        logger.Warn().Err(err).Msg("Failed to install cache_httpfs extension, continuing without cache")
    }
}
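
The fail-soft pattern in the snippet above can be sketched in isolation. This is only an illustration under assumptions: execFunc stands in for db.Exec, warn stands in for the logger, and setupS3Cache is a hypothetical helper. Only the INSTALL statement comes from the snippet; LOAD is standard DuckDB syntax, and the extension's own SET options are deliberately left out.

```go
package main

import (
	"errors"
	"fmt"
)

// errNoNetwork is a sample error used to simulate a failed install.
var errNoNetwork = errors.New("no network")

// execFunc is a hypothetical stand-in for db.Exec that returns only the error.
type execFunc func(query string) error

// setupS3Cache runs the setup statements in order. On the first failure it
// reports a warning and returns false, so the caller can keep running
// without the cache instead of aborting startup.
func setupS3Cache(exec execFunc, warn func(msg string, err error)) bool {
	steps := []string{
		"INSTALL cache_httpfs FROM community",
		"LOAD cache_httpfs",
	}
	for _, q := range steps {
		if err := exec(q); err != nil {
			warn("cache setup step failed, continuing without cache: "+q, err)
			return false
		}
	}
	return true
}

func main() {
	failing := func(q string) error { return errNoNetwork }
	ok := setupS3Cache(failing, func(msg string, err error) {
		fmt.Println("WARN:", msg, "-", err)
	})
	fmt.Println("cache enabled:", ok)
}
```

Returning a boolean rather than an error keeps the cache strictly optional: a missing extension degrades performance but never blocks the service from starting.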

3. Add tests

Please add at least a config loading test in internal/config/config_test.go:

func TestQueryConfig(t *testing.T) {
    // Test that enable_s3_cache defaults to false
    // Test that it can be set to true
}
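
One way such a test could be approached is to make the loader take its environment lookup as a parameter, so defaults and overrides can be checked without touching the real process environment. The sketch below is an assumption throughout: loadQueryConfig, QueryConfig, and the variable name ARC_QUERY_ENABLE_S3_CACHE are hypothetical and not taken from Arc's config package.

```go
package main

import (
	"fmt"
	"strconv"
)

// QueryConfig is a hypothetical stand-in for the [query] config section.
type QueryConfig struct {
	EnableS3Cache bool
}

// envEnableS3Cache is an assumed variable name, not taken from Arc.
const envEnableS3Cache = "ARC_QUERY_ENABLE_S3_CACHE"

// loadQueryConfig starts from the defaults (cache disabled) and applies an
// environment override only when the variable holds a valid boolean.
func loadQueryConfig(getenv func(string) string) QueryConfig {
	cfg := QueryConfig{EnableS3Cache: false}
	if raw := getenv(envEnableS3Cache); raw != "" {
		if v, err := strconv.ParseBool(raw); err == nil {
			cfg.EnableS3Cache = v
		}
	}
	return cfg
}

func main() {
	env := map[string]string{envEnableS3Cache: "true"}
	fmt.Println(loadQueryConfig(func(string) string { return "" }).EnableS3Cache)
	fmt.Println(loadQueryConfig(func(k string) string { return env[k] }).EnableS3Cache)
}
```

In a real test file the same checks would sit inside a func TestQueryConfig(t *testing.T), with os.Getenv passed in production code.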

Questions

  1. Should we also log the s3_cache_enabled status in the DuckDB initialization log message (line ~84-93)?
  2. Should there be a way to configure the cache size? The default is ~128MB.

Let me know if you have questions!

@khalid244
Contributor Author

Thanks for the review @xe-nvdk!

I'll address all the required changes.

For your questions:

  1. Yes, I will add s3_cache_enabled status to the DuckDB initialization log message.

  2. Great idea! I will also add configuration options for cache size and TTL.

@khalid244 changed the title from "Add cache_httpfs extension support for faster S3 reads" to "Add S3 file caching via cache_httpfs extension" Jan 22, 2026
@khalid244 requested a review from xe-nvdk January 22, 2026 17:25
@xe-nvdk added the enhancement label Jan 22, 2026
@xe-nvdk merged commit 1e9e164 into Basekick-Labs:main Jan 22, 2026
@xe-nvdk
Member

xe-nvdk commented Jan 22, 2026

Thanks @khalid244! Great work on this feature.

Merged and will ship in Arc 2026.02.1. I added a small follow-up commit to log warnings if any of the SET commands fail (edge case, but good to have visibility).

Appreciate you using in-memory caching - keeps Arc's stateless compute philosophy intact.

Member

@xe-nvdk xe-nvdk left a comment


Everything looks good! Thank you again

@khalid244
Contributor Author

Thanks for the merge and the follow-up improvement! Excited to see it ship in 2026.02.1.

