file_util: improve rmtree performance #1672

nikitych · 2025-12-09T14:26:02Z

Optimize rmtree to significantly reduce cleanup time, especially for large buildroots (e.g., from ~13 minutes for a ~2M-file tree down to ~1 minute).

Profiling showed that a substantial amount of time was spent in the trace decorator invoked on every recursive rmtree call.

Mon Dec  8 13:31:32 2025    /tmp/out.prof

         3207566512 function calls (3205556916 primitive calls) in 1490.879 seconds

   Ordered by: cumulative time
   List reduced from 463 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
1004587/1   20.540    0.000 1499.566 1499.566 /usr/lib/python3.9/site-packages/mockbuild/trace_decorator.py:57(trace)
1004587/1    9.368    0.000 1499.554 1499.554 /usr/lib/python3.9/site-packages/mockbuild/file_util.py:34(rmtree)
  1004587   30.361    0.000 1341.630    0.001 /usr/lib64/python3.9/inspect.py:1524(getouterframes)
 25119143   78.405    0.000 1294.998    0.000 /usr/lib64/python3.9/inspect.py:1485(getframeinfo)
 25119143  122.171    0.000  850.117    0.000 /usr/lib64/python3.9/inspect.py:809(findsource)
 51242960  103.391    0.000  634.200    0.000 /usr/lib64/python3.9/inspect.py:693(getsourcefile)
 25119143   87.511    0.000  213.004    0.000 /usr/lib64/python3.9/inspect.py:727(getmodule)
 51242970   87.414    0.000  209.656    0.000 /usr/lib64/python3.9/inspect.py:655(getfile)
102485927   66.293    0.000  195.296    0.000 {built-in method builtins.any}
 76362217  143.114    0.000  143.114    0.000 {built-in method posix.stat}
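A report in this shape (sorted by cumulative time, limited to the top 10 entries) can be produced with the standard `cProfile` and `pstats` modules. The helper below is a minimal sketch; the name `profile_call` is hypothetical and is not part of Mock, and the PR does not say how the profile was actually captured.

```python
import cProfile
import io
import pstats


def profile_call(func, *args, top=10, **kwargs):
    """Run func under cProfile and return (result, text report).

    Hypothetical helper for illustration only.
    """
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args, **kwargs)

    # Render the statistics into a string instead of stdout.
    buf = io.StringIO()
    stats = pstats.Stats(profiler, stream=buf)
    # "cumulative" gives the "Ordered by: cumulative time" view;
    # the integer restricts the listing to the first `top` rows.
    stats.sort_stats("cumulative").print_stats(top)
    return result, buf.getvalue()
```

Wrapping a single top-level call this way keeps the profiler overhead out of the code under test and makes per-function call counts, such as the `inspect.getframeinfo` totals above, easy to compare before and after a change.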

The repeated logic has been moved into an internal helper without the decorator, cutting the call stack depth roughly in half and eliminating redundant os.path.islink checks and if path in exclude lookups. os.listdir was also replaced with os.scandir, improving memory efficiency and reducing os.stat calls.
The new implementation spends most of its time in syscalls.

Tue Dec  9 11:17:38 2025    /tmp/out2.prof

         8157029 function calls (7152018 primitive calls) in 74.268 seconds

   Ordered by: cumulative time
   List reduced from 481 to 10 due to restriction <10>

   ncalls  tottime  percall  cumtime  percall filename:lineno(function)
        1    0.000    0.000   74.247   74.247 /usr/lib/python3.9/site-packages/mockbuild/trace_decorator.py:57(trace)
        1    0.000    0.000   74.238   74.238 /usr/lib/python3.9/site-packages/mockbuild/file_util.py:34(rmtree)
1004587/1    9.660    0.000   74.238   74.238 /usr/lib/python3.9/site-packages/mockbuild/file_util.py:48(_recursive_rmtree)
  1052067   28.968    0.000   28.968    0.000 {built-in method posix.remove}
  1004587   18.780    0.000   18.780    0.000 {built-in method posix.rmdir}
  1004587   16.170    0.000   16.170    0.000 {built-in method posix.scandir}
  2056653    0.349    0.000    0.349    0.000 {method 'is_dir' of 'posix.DirEntry' objects}
  1004587    0.214    0.000    0.214    0.000 {method '__exit__' of 'posix.ScandirIterator' objects}
  1006359    0.099    0.000    0.099    0.000 {method 'append' of 'list' objects}
     27/2    0.000    0.000    0.022    0.011 <frozen importlib._bootstrap>:1002(_find_and_load)
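The pattern behind these numbers can be sketched as follows: a decorated public entry point that runs once per tree, and an undecorated recursive helper built on `os.scandir`. This is an illustration under stated assumptions, not the code from the PR: `traced`, `remove_tree`, `_remove_tree` and `demo` are hypothetical names, the decorator is a stand-in that only counts calls, and the `selinux`, `exclude` and retry handling of the real `rmtree` are left out.

```python
import functools
import os
import tempfile


def traced(func):
    """Hypothetical stand-in for a costly tracing decorator; it only counts calls."""
    @functools.wraps(func)
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper


@traced
def remove_tree(path):
    """Public entry point: decorated, but invoked once per tree."""
    if os.path.islink(path):
        raise OSError("Cannot call rmtree on a symbolic link: %s" % path)
    _remove_tree(path)


def _remove_tree(path):
    """Undecorated recursive helper, so recursion never re-enters the decorator."""
    subdirs = []
    with os.scandir(path) as entries:
        for entry in entries:
            # DirEntry usually carries the file type from the directory
            # listing, so this check normally needs no extra stat call.
            if entry.is_dir(follow_symlinks=False):
                subdirs.append(entry.path)
            else:
                os.remove(entry.path)
    # Recurse only after the scandir iterator has been closed.
    for subdir in subdirs:
        _remove_tree(subdir)
    os.rmdir(path)


def demo():
    """Build a small tree, remove it, and report (still_exists, decorator_calls)."""
    root = tempfile.mkdtemp()
    os.makedirs(os.path.join(root, "a", "b"))
    with open(os.path.join(root, "a", "b", "f.txt"), "w") as fh:
        fh.write("x")
    remove_tree(root)
    return os.path.exists(root), remove_tree.calls
```

Collecting subdirectories in a list and recursing after the `with` block keeps at most one directory handle open per call, which matters for deep trees.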

mock/py/mockbuild/file_util.py

+    _recursive_rmtree(path, selinux, exclude)
+
+
+def _recursive_rmtree(path, selinux, exclude):


mock/tests/test_file_util.py

@@ -0,0 +1,334 @@
+import os


mock/tests/test_file_util.py

+                          universal_newlines=True)
+
+
+def chattr_works_or_skip(path: Path):


mock/tests/test_file_util.py

+    try:
+        set_immutable(path, True)
+        set_immutable(path, False)
+    except subprocess.CalledProcessError as e:


mock/tests/test_file_util.py

+                "nested": {}
+            }
+        }
+        create_dir_structure(temp_dir, struct)


mock/tests/test_file_util.py

+                file_util.rmtree(str(temp_dir))
+            assert (readonly_dir / "file.txt").exists()
+
+            # Return write permission on readonly_dir


mock/tests/test_file_util.py

+        link = temp_dir / "link"
+        real_dir.mkdir()
+        os.symlink(real_dir, link)
+


mock/tests/test_file_util.py

+        with pytest.raises(OSError, match="Cannot call rmtree on a symbolic link"):
+            file_util.rmtree(str(link))
+
+    def test_rmtree_error_retry_simulated(self, temp_dir):


mock/tests/test_file_util.py

+
+            assert not temp_dir.exists()
+
+    def test_rmtree_long_path(self, temp_dir):


mock/tests/test_file_util.py

+            if islonglongpath:
+                file_util.rmtree(str(temp_dir))
+                assert not temp_dir.exists()
+        except OSError:


gemini-code-assist

Code Review

This pull request significantly improves the performance of rmtree by refactoring the recursive logic into a helper function, avoiding a costly decorator on each recursive call. The use of os.scandir further enhances efficiency. The change is well-supported by a comprehensive new test suite, which is excellent. My review includes a few suggestions to improve the clarity and robustness of the new tests. Overall, this is a high-quality contribution that effectively addresses the performance bottleneck.

gemini-code-assist · 2025-12-09T14:28:12Z

mock/tests/test_file_util.py

+    def test_rmtree_error_retry_simulated(self, temp_dir):
+        """Simulate delayed deletion."""
+        (temp_dir / "file.txt").write_text("will be deleted late")
+
+        # Monkey-patch os.remove to fail first few times
+        original_remove = os.remove
+        retries = 10 * 2 + 2
+
+        def fake_remove(path):
+            nonlocal retries
+            if path == str(temp_dir / "file.txt") and retries:
+                retries -= 1
+                if retries < 12:
+                    raise OSError(errno.EBUSY, "tst EBUSY", path)
+                return
+            original_remove(path)
+
+        with patch("os.remove", fake_remove):
+            # Patch time.sleep to avoid long waits during retry
+            with patch("time.sleep"):
+                with pytest.raises(OSError, match="Directory not empty"):
+                    file_util.rmtree(str(temp_dir))
+                with pytest.raises(OSError, match="tst EBUSY"):
+                    file_util.rmtree(str(temp_dir))
+                file_util.rmtree(str(temp_dir))
+
+            assert not temp_dir.exists()


This test is clever in its use of a stateful counter to test multiple scenarios in sequence. However, this approach makes the test difficult to understand and brittle, as each assertion depends on the state left by the previous one. Best practices for testing recommend that test cases be independent.

Consider refactoring this into separate, more focused tests for each scenario (ENOTEMPTY failure, EBUSY failure, and successful retry). This would improve readability and maintainability.

For example, an isolated test for the ENOTEMPTY scenario could look like this:

def test_rmtree_enotempty_retry_fails(self, temp_dir):
    """Test that rmtree fails after exhausting retries on ENOTEMPTY."""
    (temp_dir / "file.txt").write_text("data")
    original_remove = os.remove

    def fake_remove_noop(path):
        if path == str(temp_dir / "file.txt"):
            return  # Do nothing to simulate a persistent file
        original_remove(path)

    with patch("os.remove", fake_remove_noop), patch("time.sleep"):
        with pytest.raises(OSError) as excinfo:
            file_util.rmtree(str(temp_dir))
        assert excinfo.value.errno == errno.ENOTEMPTY

gemini-code-assist · 2025-12-09T14:28:12Z

mock/tests/test_file_util.py

+        except OSError:
+            pytest.skip("Support of path > PATH_MAX not yet implemented")


The broad except OSError: can mask actual bugs in file_util.rmtree. If rmtree were to fail with an OSError for a reason other than the filesystem's lack of support for long paths, this test would be skipped incorrectly instead of failing.

To make the test more robust, the exception handling should be more specific. The try...except block should ideally only wrap the code that is expected to fail on certain systems (i.e., create_dir_tree), and it should catch specific error codes like errno.ENAMETOOLONG. An OSError raised from file_util.rmtree should generally be considered a test failure.
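One way to narrow the handling is sketched below. It assumes the setup step can be passed in as a callable; `setup_or_unsupported` is a hypothetical helper, not something that exists in the test suite.

```python
import errno


def setup_or_unsupported(create_tree):
    """Run the setup callable.

    Returns True on success and False only when the platform rejects the
    path as too long. Any other OSError is re-raised so that it shows up
    as a real failure. Hypothetical helper for illustration only.
    """
    try:
        create_tree()
    except OSError as exc:
        if exc.errno != errno.ENAMETOOLONG:
            raise  # unrelated error: do not hide it behind a skip
        return False
    return True
```

In the test, the skip would then depend only on this return value, and the call to `file_util.rmtree` would sit outside any `try` block, so an `OSError` raised by `rmtree` itself fails the test instead of skipping it.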

gemini-code-assist · 2025-12-09T14:28:13Z

releng/release-notes-next/improve-file_util_rmtree-performance.bugfix

@@ -0,0 +1 @@
+The `file_util.rmtree` cleanup process has been significantly accelerated, especially for very large buildroots. The previous approach could take over 13 minutes to remove ~2 million files. A new implementation reducing cleanup times to one minute on same data.


The release note is clear, but the phrasing could be slightly more polished for the user-facing announcement. Consider this alternative for improved flow and conciseness:

"The file_util.rmtree cleanup process has been significantly accelerated, especially for very large buildroots. For example, cleanup time for a ~2M-file tree has been reduced from over 13 minutes to approximately one minute."

praiskup · 2025-12-10T05:43:45Z

See also #1666.

What if we just removed the decorator from affected calls?

praiskup

This PR is simply too large (it appears to be generated) for me to provide a
comprehensive review. My time is limited, and we will likely close this soon.
We really need to discuss this with a human and have a dialogue.

Also, given the risk associated with a first-time GitHub contributor (not just Mock), we need to be careful about potential security threats (e.g., xz-like attacks).

Suggestion: If we know the root cause of the problem, let's implement a
minimal, few-line fix (avoiding the trace decorator) and evaluate the actual
benefits of such a large-scale change. The time spent on cleanups is not a
critical bottleneck—we often use tmpfs, minimal buildroots are truly minimal,
and source tarballs rarely contain millions of files. There is even an environment
variable available to experiment with this approach.

nikitych · 2025-12-16T14:25:36Z

Ah yes, the modern AI-powered world - “it will make you more productive”, they said.

This PR consists of a performance fix for file_util.rmtree and a set of tests that verify that the behavior of this function remains unchanged before and after the change. Apologies in advance if I’m stating the obvious. Normally, when refactoring unfamiliar code, it’s common to rely on test coverage. I do acknowledge that while rm -r can give an extra 10-15% performance boost, the associated risks are not worth it. I also admit that I made a mistake by not running a profiler before implementing the fix.

The actual business-logic changes affect only 19 lines of code, mostly due to Python formatting. If we exclude moved code, the real changes amount to 11 lines, most of which are related to replacing os.listdir with os.scandir, which @xsuchy explicitly pointed out as worthwhile here.
Given that, I don’t quite understand the “too large” concern.

Yes, the test skeleton was generated, but the polishing was done manually. The changes in file_util were also written by a human initially (plus autopep8, although I’m not sure whether that counts as intelligence).
The current implementation performs 4 syscalls per directory in the tree; the proposed one performs 3. (Because os.scandir caches the entry type in most cases, entry.is_dir may not require a syscall, which reduces the count to 2 in that case.) That seems like a meaningful improvement to me.

Regarding the “dialog with a human” comment, if #1672 (comment) was addressed to me, then I didn’t realize that - apologies. My answer would be: how am I supposed to know how a change to the public API of utility code (removing the decorator) will affect downstream code that I am not familiar with?

The statement about this being my first GitHub contribution is technically incorrect. While I contribute infrequently, this is not my first PR.

Unfortunately, the size of the buildroot is caused not so much by sources as by build artifacts; this is a production environment.

To summarize, before proceeding further I’d like clarification on the following points:

Should I reduce the size of the PR (for safety?) by removing tests and possibly docstring?
Does it make sense to continue with this PR, or would it be better to open an Issue (possibly with a script that generates a test buildroot) and leave the fix to Members?
Taking into account the current direction of the PR, it’s probably better not to look at implementing paths longer than PATH_MAX for now, right?

nikitych added 2 commits December 9, 2025 16:58

Adds tests for file_util.rmtree

1c9be39

github-advanced-security bot found potential problems Dec 9, 2025

gemini-code-assist bot reviewed Dec 9, 2025

praiskup requested changes Dec 16, 2025

praiskup added the blocked label Dec 16, 2025
