Add structured datasets loading capability in valkey benchmark #2823
Open
VoletiRam wants to merge 15 commits into valkey-io:unstable from VoletiRam:dataset-support
Changes from 7 commits
Commits (15):
211ad83 Add structured datasets loading capability in valkey benchmark
fa33f69 Fix memory leak in memory reporting
ba03cc6 Add documentation for valkey-benchmark and improve xml field discovery
933ee94 Merge branch 'unstable' into dataset-support
be05541 Fix memory leak in xml scan
1833c6e Load fields only we care about from dataset
b3a4516 Remove warning log for incomplete document
8053c4d Separate dataset changes into new file
f98d01d Merge remote-tracking branch 'fork/unstable' into dataset-support
6d3c7fc Fix build
4562857 Fix cmake build issue for latest ubuntu-cmake
d088b22 Replace placeholders with in same field
f98630f Address the comments on cmake file, field size and count limitation
6ff88c9 Merge remote-tracking branch 'origin/unstable' into dataset-support
bc3432c Address comments on license and make file
| Original file line number | Diff line number | Diff line change |
|---|---|---|
| @@ -0,0 +1,272 @@ | ||
| # Valkey Benchmark | ||
|
|
||
| Benchmark utility for measuring Valkey server performance. | ||
|
|
||
| ```bash | ||
| valkey-benchmark [OPTIONS] [--] [COMMAND ARGS...] | ||
| ``` | ||
|
|
||
| ## Connection Options | ||
|
|
||
| | Option | Description | | ||
| |--------|-------------| | ||
| | `-h <hostname>` | Server hostname (default: 127.0.0.1) | | ||
| | `-p <port>` | Server port (default: 6379) | | ||
| | `-s <socket>` | Server socket (overrides host and port) | | ||
| | `-u <uri>` | Server URI: `valkey://user:password@host:port/dbnum` | | ||
| | `-a <password>` | Password for Valkey Auth | | ||
| | `--user <username>` | Used to send ACL style 'AUTH username pass'. Needs `-a` | | ||
| | `--dbnum <db>` | SELECT the specified db number (default: 0) | | ||
| | `-3` | Start session in RESP3 protocol mode | | ||
|
|
||
| ## Performance Options | ||
|
|
||
| | Option | Description | | ||
| |--------|-------------| | ||
| | `-c <clients>` | Number of parallel connections (default: 50) | | ||
| | `-n <requests>` | Total number of requests (default: 100000) | | ||
| | `-d <size>` | Data size of SET/GET value in bytes (default: 3) | | ||
| | `-P <numreq>` | Pipeline requests (default: 1, no pipeline) | | ||
| | `-k <boolean>` | Keep alive: 1=keep alive, 0=reconnect (default: 1) | | ||
| | `--threads <num>` | Enable multi-thread mode | | ||
| | `--rps <requests>` | Limit requests per second (default: 0, no limit) | | ||
|
|
||
| ## Test Selection | ||
|
|
||
| | Option | Description | | ||
| |--------|-------------| | ||
| | `-t <tests>` | Comma-separated list of tests to run | | ||
| | `-l` | Loop mode: run tests forever | | ||
| | `-I` | Idle mode: open N idle connections and wait | | ||
|
|
||
| Available tests: `ping`, `ping_inline`, `ping_mbulk`, `set`, `get`, `incr`, `lpush`, `rpush`, `lpop`, `rpop`, `sadd`, `hset`, `spop`, `zadd`, `zpopmin`, `lrange`, `lrange_100`, `lrange_300`, `lrange_500`, `lrange_600`, `mset`, `mget`, `xadd`, `function_load`, `fcall` | ||
|
|
||
| ## Output Options | ||
|
|
||
| | Option | Description | | ||
| |--------|-------------| | ||
| | `-q` | Quiet mode: show only query/sec values | | ||
| | `--csv` | Output in CSV format | | ||
| | `--precision <num>` | Number of decimal places in latency output (default: 0) | | ||
|
|
||
| ## Cluster Options | ||
|
|
||
| | Option | Description | | ||
| |--------|-------------| | ||
| | `--cluster` | Enable cluster mode | | ||
| | `--rfr <mode>` | Read from replicas: `no`/`yes`/`all` (default: `no`) | | ||
|
|
||
| ## Randomization Options | ||
|
|
||
| | Option | Description | | ||
| |--------|-------------| | ||
| | `-r <keyspacelen>` | Use random keys in range [0, keyspacelen-1] | | ||
| | `--sequential` | Use sequential numbers instead of random | | ||
| | `--seed <num>` | Set random number generator seed | | ||
|
|
||
| ## Dataset Support | ||
|
|
||
| | Option | Description | | ||
| |--------|-------------| | ||
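| | `--dataset <file>` | Dataset file for field placeholder replacement | | ||
| | `--xml-root-element <element>` | Root element for XML datasets (required for XML files); determines which fields are discovered | | ||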
|
|
||
| ### File Formats | ||
|
|
||
| **CSV** | ||
| ```csv | ||
| term,category | ||
| anarchism,politics | ||
| democracy,politics | ||
| ``` | ||
| Header row required, comma-delimited, field names become `__field:name__` placeholders. | ||
|
|
||
| **TSV** | ||
| Tab-delimited with header row. | ||
|
|
||
| **XML** | ||
| ```xml | ||
| <page> | ||
| <title>Anarchism</title> | ||
| <id>12</id> | ||
| <revision> | ||
| <id>1317806107</id> | ||
| <text bytes="112881">Article content...</text> | ||
| </revision> | ||
| </page> | ||
| ``` | ||
| Requires `--xml-root-element` parameter. Root element choice affects discovered fields - deeper elements include nested content. | ||
|
|
||
| ### Dataset Behavior | ||
|
|
||
| - One row per command | ||
| - Sequential iteration with wraparound | ||
| - Thread-safe atomic selection | ||
| - Duplicate XML field names: first occurrence wins | ||
|
|
||
| ### Usage | ||
|
|
||
| ```bash | ||
| # CSV dataset | ||
| valkey-benchmark --dataset terms.csv \ | ||
| -n 50000 FT.SEARCH myindex "__field:term__" | ||
|
|
||
| # Wikipedia XML | ||
| valkey-benchmark --dataset wiki.xml --xml-root-element page \ | ||
| -n 10000 HSET "doc:__rand_int__" title "__field:title__" body "__field:text__" | ||
| ``` | ||
|
|
||
| **Memory:** Large datasets may require GB-scale RAM. | ||
|
|
||
| ## Additional Options | ||
|
|
||
| | Option | Description | | ||
| |--------|-------------| | ||
| | `--enable-tracking` | Send CLIENT TRACKING on | | ||
| | `--num-functions <num>` | Functions in Lua lib (default: 10) | | ||
| | `--num-keys-in-fcall <num>` | Keys for FCALL (default: 1) | | ||
| | `--seed <num>` | RNG seed | | ||
| | `-x` | Read last arg from STDIN | | ||
| | `--mptcp` | Enable MPTCP | | ||
| | `--help` | Show help | | ||
| | `--version` | Show version | | ||
|
|
||
| ## Placeholder System | ||
|
|
||
| ### Random Placeholders | ||
|
|
||
| | Placeholder | Behavior | | ||
| |-------------|----------| | ||
| | `__rand_int__` | Different random value per occurrence | | ||
| | `__rand_1st__` | Same random value for all occurrences in command | | ||
| | `__rand_2nd__` | Same random value for all occurrences in command | | ||
| | ... | ... | | ||
| | `__rand_9th__` | Same random value for all occurrences in command | | ||
|
|
||
| Random values are 12-digit zero-padded numbers in range [0, keyspacelen-1]. | ||
|
|
||
| ### Data Placeholders | ||
|
|
||
| | Placeholder | Description | | ||
| |-------------|-------------| | ||
| | `__data__` | Random data of size specified by `-d` option | | ||
|
|
||
| ### Cluster Placeholders | ||
|
|
||
| | Placeholder | Description | | ||
| |-------------|-------------| | ||
| | `{tag}` | Cluster slot hashtag for proper key distribution | | ||
|
|
||
| Required in cluster mode to ensure commands route to correct nodes. | ||
|
|
||
| ## Command Sequences | ||
|
|
||
| Commands can be chained using semicolon separators: | ||
|
|
||
| ```bash | ||
| valkey-benchmark -- multi ';' set key:__rand_int__ __data__ ';' incr counter ';' exec | ||
| ``` | ||
|
|
||
| ### Repetition Syntax | ||
|
|
||
| Prefix commands with a number to repeat: | ||
|
|
||
| ```bash | ||
| valkey-benchmark -- 5 set key:__rand_int__ value ';' get key:__rand_int__ | ||
| ``` | ||
|
|
||
| This executes 5 SET commands followed by 1 GET command per pipeline iteration. | ||
|
|
||
|
|
||
| ## Examples | ||
|
|
||
| ### Basic Benchmarking | ||
|
|
||
| ```bash | ||
| # Default benchmark suite | ||
| valkey-benchmark | ||
|
|
||
| # Specific tests | ||
| valkey-benchmark -t ping,set,get -n 100000 | ||
|
|
||
| # Custom data size | ||
| valkey-benchmark -t set -d 1024 -n 50000 | ||
| ``` | ||
|
|
||
| ### Random Key Distribution | ||
|
|
||
| ```bash | ||
| # Random keys in range [0, 999999] | ||
| valkey-benchmark -t set,get -r 1000000 -n 100000 | ||
|
|
||
| # Sequential keys | ||
| valkey-benchmark -t set --sequential -r 1000000 -n 100000 | ||
| ``` | ||
|
|
||
| ### Dataset-Driven Benchmarking | ||
|
|
||
| ```bash | ||
| # CSV dataset | ||
| valkey-benchmark --dataset terms.csv \ | ||
| -n 50000 FT.SEARCH myindex "__field:term__" | ||
|
|
||
| # Wikipedia XML dataset (page-level) | ||
| valkey-benchmark --dataset wiki_sample.xml --xml-root-element page \ | ||
| -n 10000 HSET "doc:__rand_int__" title "__field:title__" content "__field:text__" id "__field:id__" | ||
|
|
||
| # Wikipedia XML dataset (revision-level) | ||
| valkey-benchmark --dataset wiki_sample.xml --xml-root-element revision \ | ||
| -n 10000 HSET "doc:__rand_int__" content "__field:text__" timestamp "__field:timestamp__" | ||
|
|
||
| # Multiple field usage | ||
| valkey-benchmark --dataset products.csv \ | ||
| -- HSET product:__field:id__ name "__field:name__" price __field:price__ | ||
| ``` | ||
|
|
||
| ### Cluster Benchmarking | ||
|
|
||
| ```bash | ||
| # Cluster mode with proper key distribution | ||
| valkey-benchmark --cluster -t set,get \ | ||
| -- SET key:{tag}:__rand_int__ __data__ | ||
|
|
||
| # Read from replicas | ||
| valkey-benchmark --cluster --rfr yes -t get \ | ||
| -- GET key:{tag}:__rand_int__ | ||
| ``` | ||
|
|
||
| ### Pipelining | ||
|
|
||
| ```bash | ||
| # Pipeline 10 requests | ||
| valkey-benchmark -P 10 -t set -n 100000 | ||
|
|
||
| # Pipeline with datasets | ||
| valkey-benchmark --dataset terms.csv -P 5 \ | ||
| -n 50000 FT.SEARCH index "__field:term__" | ||
| ``` | ||
|
|
||
| ### Complex Command Sequences | ||
|
|
||
| ```bash | ||
| # Transaction benchmark | ||
| valkey-benchmark -r 100000 -n 10000 \ | ||
| -- multi ';' set key:__rand_int__ __data__ ';' \ | ||
| incr counter:__rand_int__ ';' exec | ||
|
|
||
| # Mixed operations with repetition | ||
| valkey-benchmark -r 100000 \ | ||
| -- 3 set key:__rand_int__ __data__ ';' \ | ||
| 2 get key:__rand_int__ ';' \ | ||
| del key:__rand_int__ | ||
| ``` | ||
|
|
||
| ### Rate Limiting | ||
|
|
||
| ```bash | ||
| # Limit to 1000 requests/second | ||
| valkey-benchmark --rps 1000 -t set -n 50000 | ||
|
|
||
| # Dataset with rate limiting | ||
| valkey-benchmark --dataset search_terms.csv --rps 500 \ | ||
| -n 10000 FT.SEARCH index "__field:term__" | ||
| ``` |
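The placeholder and dataset rules documented in the README diff above can be illustrated with a minimal Python sketch. This is not the C implementation from this PR: the names `load_csv_rows`, `expand_command`, and `run` are hypothetical, the spellings of the placeholders between `__rand_2nd__` and `__rand_9th__` are assumed to follow the ordinal pattern, and thread-safe selection and XML field discovery are not modeled. It only shows one row per command, sequential iteration with wraparound, and the difference between `__rand_int__` and the per-command shared placeholders.

```python
import csv
import io
import random
import re

# One pattern for both placeholder families, so a substituted field value is
# never re-scanned for placeholders. The ordinal spellings are an assumption.
PLACEHOLDER_RE = re.compile(
    r"__field:(?P<field>\w+?)__|__rand_(?P<rand>int|[1-9](?:st|nd|rd|th))__"
)


def load_csv_rows(text):
    """Parse CSV text; the header row supplies the field names."""
    return list(csv.DictReader(io.StringIO(text)))


def expand_command(args, row, keyspacelen, rng):
    """Expand placeholders in the argument list of a single command.

    __field:name__  -> value of column `name` in `row` (KeyError if absent)
    __rand_int__    -> a fresh random number for every occurrence
    __rand_1st__ .. -> one random number per command, shared by all
                       occurrences of the same placeholder
    """
    shared = {}  # reset for every command

    def random_value():
        # 12-digit zero-padded number in [0, keyspacelen - 1]
        return "%012d" % rng.randrange(keyspacelen)

    def substitute(match):
        field = match.group("field")
        if field is not None:
            return row[field]
        kind = match.group("rand")
        if kind == "int":
            return random_value()
        if kind not in shared:
            shared[kind] = random_value()
        return shared[kind]

    return [PLACEHOLDER_RE.sub(substitute, arg) for arg in args]


def run(template, rows, count, keyspacelen=1000, seed=0):
    """Yield `count` expanded commands, one dataset row per command,
    taking rows in order and wrapping around at the end."""
    rng = random.Random(seed)
    for i in range(count):
        yield expand_command(template, rows[i % len(rows)], keyspacelen, rng)
```

With the two-row `terms.csv` example from the diff, three commands built from the template `FT.SEARCH myindex "__field:term__"` would use the terms `anarchism`, `democracy`, and then `anarchism` again, which is the wraparound behavior the README describes.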
This doc is great! I noticed the `--warmup` and `--duration` options weren't documented. Are there other options not covered? I'd like to either document all of them or have a list of undocumented options here. :)
Done.