Skip to content

Conversation

@svaroqui
Copy link
Collaborator

…dir="./mydump"

Contributor Agreement

By submitting this pull request, I agree to the terms outlined in the Contributor Agreement.

@claude
Copy link

claude bot commented Jan 28, 2026

PR Review: Implement splitdump CLI command

Summary

This PR adds a new splitdump CLI command that converts MySQL/MariaDB dump streams into split table files similar to mydumper output, with gzip support and metadata extraction.

✅ Positive Aspects

  • Clear separation of concerns with proper build tags
  • Correct gzip detection using peek-based approach
  • Good use of goroutine pipeline with channels
  • Proper GTID/binlog regex patterns for both MariaDB and MySQL

🔴 Critical Issues

1. Variable Name Typo (Line 82)
cliOuputDir should be cliOutputDir

2. Broken fmt Calls (Lines 56, 60, 63)
fmt.Sprintf() calls don't print anything - should be fmt.Printf(). Users won't see progress/timing.

3. Unused --inputfile Flag
The flag is defined but SplitDumpLineReader() always reads stdin, ignoring the flag. This breaks the documented feature.

4. Unchecked os.Create() Errors (14+ locations)
Errors are silently ignored with f, _ = os.Create(). Could cause data loss or panics.

5. No Goroutine Error Handling
If SplitDumpLineReader panics, SplitDumpLineParser will deadlock. No recovery mechanism.

6. Unused splitDumpOpenReader() Function
Defined but never called. Also has unchecked gzip.NewReader() error.

🟡 Medium Issues

  • OOM Risk: 1GB per shard hardcoded with no bounds checking
  • String Performance: Multiple Replace operations in tight loop
  • Incomplete Metadata: No warning if binlog info missing
  • Race Condition: cliOuputDir could be modified during execution
  • Misleading Function Name: initStatusFlagsDumpSplit() isn't status-related
  • No Input Validation: Table names not sanitized, output dir not validated
  • Unused Channels: SplitDumpChannelBus has unused fields

📋 Recommendations (Priority)

MUST FIX (blocks merge):

  1. Change fmt.Sprintffmt.Printf (3 calls)
  2. Implement --inputfile support or remove flag
  3. Add error handling to all os.Create calls

SHOULD FIX (before merge):
4. Fix typo: cliOuputDircliOutputDir
5. Add panic recovery in goroutines
6. Handle gzip.NewReader errors properly
7. Remove unused splitDumpOpenReader() function

NICE TO HAVE:
8. Sanitize table names for path safety
9. Validate output directory writable on startup
10. Add metadata validation warnings
11. Improve string operations for performance

🔐 Security Concerns

  • Table names from SQL comments not sanitized - potential path traversal
  • Output files world-readable (default 0644) - consider restricting to 0600

Questions

  1. Is --inputfile intentionally stubbed or unfinished?
  2. What's the expected behavior for permission errors on output directory?
  3. Should metadata file be created with empty binlog info or skipped entirely?

Status: 🔴 Requires Changes - Multiple critical issues must be addressed before production use.

@svaroqui
Copy link
Collaborator Author

PR Review: Implement splitdump CLI command

Summary

This PR adds a new splitdump CLI command that converts MySQL/MariaDB dump streams into split table files similar to mydumper output, with gzip support and metadata extraction.

✅ Positive Aspects

  • Clear separation of concerns with proper build tags
  • Correct gzip detection using peek-based approach
  • Good use of goroutine pipeline with channels
  • Proper GTID/binlog regex patterns for both MariaDB and MySQL

🔴 Critical Issues

1. Variable Name Typo (Line 82) cliOuputDir should be cliOutputDir

2. Broken fmt Calls (Lines 56, 60, 63) fmt.Sprintf() calls don't print anything - should be fmt.Printf(). Users won't see progress/timing.

3. Unused --inputfile Flag The flag is defined but SplitDumpLineReader() always reads stdin, ignoring the flag. This breaks the documented feature.

4. Unchecked os.Create() Errors (14+ locations) Errors are silently ignored with f, _ = os.Create(). Could cause data loss or panics.

5. No Goroutine Error Handling If SplitDumpLineReader panics, SplitDumpLineParser will deadlock. No recovery mechanism.

6. Unused splitDumpOpenReader() Function Defined but never called. Also has unchecked gzip.NewReader() error.

🟡 Medium Issues

  • OOM Risk: 1GB per shard hardcoded with no bounds checking
  • String Performance: Multiple Replace operations in tight loop
  • Incomplete Metadata: No warning if binlog info missing
  • Race Condition: cliOuputDir could be modified during execution
  • Misleading Function Name: initStatusFlagsDumpSplit() isn't status-related
  • No Input Validation: Table names not sanitized, output dir not validated
  • Unused Channels: SplitDumpChannelBus has unused fields

📋 Recommendations (Priority)

MUST FIX (blocks merge):

  1. Change fmt.Sprintffmt.Printf (3 calls)
  2. Implement --inputfile support or remove flag
  3. Add error handling to all os.Create calls

SHOULD FIX (before merge): 4. Fix typo: cliOuputDircliOutputDir 5. Add panic recovery in goroutines 6. Handle gzip.NewReader errors properly 7. Remove unused splitDumpOpenReader() function

NICE TO HAVE: 8. Sanitize table names for path safety 9. Validate output directory writable on startup 10. Add metadata validation warnings 11. Improve string operations for performance

🔐 Security Concerns

  • Table names from SQL comments not sanitized - potential path traversal
  • Output files world-readable (default 0644) - consider restricting to 0600

Questions

  1. Is --inputfile intentionally stubbed or unfinished?
  2. What's the expected behavior for permission errors on output directory?
  3. Should metadata file be created with empty binlog info or skipped entirely?

Status: 🔴 Requires Changes - Multiple critical issues must be addressed before production use.

@svaroqui svaroqui closed this Jan 28, 2026
@svaroqui svaroqui reopened this Jan 28, 2026
@svaroqui svaroqui merged commit 1e2959a into develop Jan 28, 2026
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants