
Changelog generation is complicated / takes a long time #9332

@alamb

Is your feature request related to a problem or challenge? Please describe what you are trying to do.
This is me ranting (while waiting for the changelog generator to run).

The current instructions in https://github.com/apache/arrow-rs/blob/main/dev/release/README.md consume far more time than they should, in my opinion:

  1. I have to manually edit the script
  2. It takes many minutes to generate a CHANGELOG file
  3. I have to then manually touch up the various CHANGELOGs
  4. It doesn't seem to know how to deal with branches

For example, while creating a CHANGELOG for 57.3.0 (which has 4 commits on the branch), it produces the nonsense shown below, which includes commits from main.

This happens even after I tried to tell it to use 57_maintenance as the base, with the following change:

diff --git a/dev/release/update_change_log.sh b/dev/release/update_change_log.sh
index 7f0195bbd7b..0617079171b 100755
--- a/dev/release/update_change_log.sh
+++ b/dev/release/update_change_log.sh
@@ -29,8 +29,8 @@

 set -e

-SINCE_TAG="57.1.0"
-FUTURE_RELEASE="57.2.0"
+SINCE_TAG="57.2.0"
+FUTURE_RELEASE="57.3.0"

 SOURCE_DIR="$(cd "$(dirname "${BASH_SOURCE[0]}")" && pwd)"
 SOURCE_TOP_DIR="$(cd "${SOURCE_DIR}/../../" && pwd)"
@@ -49,6 +49,8 @@ docker run -it --rm -e CHANGELOG_GITHUB_TOKEN="$ARROW_GITHUB_API_TOKEN" -v "$(pw
     --max-issues=300 \
     --exclude-tags-regex "^object_store_\d+\.\d+\.\d+$|-rc\d$" \
     --since-tag ${SINCE_TAG} \
+    --since-commit 2026-02-01 00:00:00 \
+    --release-branch 57_maintenance \
     --future-release ${FUTURE_RELEASE}

Changelog

57.3.0 (2026-02-02)

Full Changelog

Implemented enhancements:

  • Optimize data page statistics conversion #9306
  • A more generic convenience method to create list arrays from nested iterators #9267 [arrow]
  • Speed up string view comparison #9253
  • Improve parquet BinaryView / StringView decoder performance #9238
  • Speedup filter (up to 1.5x) filter/BitIndexIterator/iter_set_bits_rev #9230
  • [Parquet] Optimize struct reading #9216
  • [Parquet] Add benchmarks for reading struct arrays from parquet #9209
  • Support casting negative scale decimals to numeric #9201
  • Add ability to reuse DictionaryTracker when creating new IPC Stream #9195
  • [regression] Sealing the Array trait broke downstream crates #9184
  • perf: optimize RowGroupIndexReader for single row group reads #9180
  • Support formatting ListView #9174
  • Row format support for ListView #9173
  • Add lossy flag in CastOptions #9172
  • Consolidate parquet examples into the doc comments #9154
  • Uncomment part of test_utf8_single_column_reader_test #9147
  • Document / Add an example of RowFilter usage #9096 [parquet]
  • Document / Add an example of preserving dictionary encoding when reading parquet #9095 [parquet]
  • Reduce overhead to create an Array from ArrayData (make_array) #9061
  • [arrow-avro] Add Explicit Projection API to ReaderBuilder #8923

...

This may be the result of my not configuring the changelog generator correctly, but it is really quite a pain.
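
For comparison, plain git can list the commits that are on the release branch but not yet in the previous tag, which is a quick way to check what a 57.3.0 changelog should contain. The following is only a minimal sketch, not part of the current release tooling: the function name `list_branch_commits` is hypothetical, and it assumes the `57.2.0` tag and the `57_maintenance` branch both exist in the local checkout.

```shell
#!/usr/bin/env bash
# Sketch only: list the commits on a release branch since a given tag.
# `list_branch_commits` is a hypothetical helper, not an existing script.

# Print one bullet per non-merge commit that is reachable from BRANCH
# but not from SINCE_TAG, oldest first.
list_branch_commits() {
    local since_tag="$1"
    local branch="$2"
    # "A..B" selects commits reachable from B that are not reachable from A,
    # so commits that exist only on main are never included.
    git log --no-merges --reverse --format='- %s' "${since_tag}..${branch}"
}

# Example (assumes the tag and branch exist locally):
#   list_branch_commits 57.2.0 57_maintenance
```

Because the range is computed from the commit graph rather than from dates or issue metadata, commits that only landed on main should not appear in the output.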

Describe the solution you'd like
A process that doesn't require me to spend much time working on changelogs.

Describe alternatives you've considered
@andygrove made a changelog script for DataFusion that seems to work well: https://github.com/apache/datafusion/blob/main/dev/release/generate-changelog.py

Maybe @kylebarron or @Jefffrey have some suggestions for a better solution too

Additional context
