Dynamically create sitemap.xml #4060
Merged
veganstraightedge merged 26 commits into main on Dec 17, 2024
just1602 approved these changes on Dec 16, 2024
This was referenced Dec 16, 2024
veganstraightedge added a commit that referenced this pull request on Dec 17, 2024
Same data as `/sitemap.xml`, but as a flat file list of URLs
- #4060
The purpose is to give an archivist a simple way to back up the whole site using cURL, wget, or similar tools.
# TODO
- add URLs of CSS/JS files
- add URLs of images (!!!)
- add URLs of PDFs (downloads of zines, posters, etc)
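
To illustrate the "same data, two formats" idea in that commit message, here is a minimal plain-Ruby sketch. It is not the code from this PR: the `Entry` struct, the `ENTRIES` list, the `example.com` URLs, and both method names are hypothetical, and a real Rails app would load entries from its models and render through views.

```ruby
# Hypothetical entry type and data; the real app would build these from its models.
Entry = Struct.new(:loc, :lastmod)

ENTRIES = [
  Entry.new('https://example.com/', '2024-12-17'),
  Entry.new('https://example.com/articles?page=1&sort=new', '2024-12-16')
].freeze

# XML flavour: one <url> element per entry inside a single <urlset>.
def sitemap_xml(entries)
  lines = [
    '<?xml version="1.0" encoding="UTF-8"?>',
    '<urlset xmlns="http://www.sitemaps.org/schemas/sitemap/0.9">'
  ]
  entries.each do |entry|
    # encode(xml: :text) escapes &, < and > so the URL is safe inside <loc>.
    loc = entry.loc.encode(xml: :text)
    lines << "  <url><loc>#{loc}</loc><lastmod>#{entry.lastmod}</lastmod></url>"
  end
  lines << '</urlset>'
  lines.join("\n") + "\n"
end

# Plain-text flavour: the same URLs, one per line, with no escaping needed.
def sitemap_txt(entries)
  entries.map(&:loc).join("\n") + "\n"
end

puts sitemap_txt(ENTRIES)
```

A one-URL-per-line file like this can be fed straight to a downloader, for example `wget --input-file=FILE`, which is the archiving use case the commit message describes.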
veganstraightedge added a commit that referenced this pull request on Dec 17, 2024
Cleanup following:
- #4155
- #4060
# Summary
- remove `sitemap_generator` config initializer
- remove `sitemap_generator` in `Procfile` and a test
- remove `sitemap_generator` gem
- make xml/txt formats explicit in the routes (`curl .../sitemap.xml` was getting the `.txt` version mistakenly)
- remove duplicate `/tce` URL in both sitemaps
Issues
closes #3780
closes #1192
An investigation and discoveries
So, it turns out that we maybe never had a live sitemap.xml or .xml.gz in production, this whole time. 🤦🏻
The reason it seemed to work in development, and seemed to succeed in the deploy/release in production, but was not actually available in production is… Heroku's ephemeral filesystem.
So, what was happening was: in development, running `bundle exec rails sitemap:create` or `sitemap:refresh` etc, would create the `sitemap.xml.gz` in our local `/public` folders. And stay there. Seems good.
In production, during the release stage of a production deploy (as defined in the `Procfile`), the `sitemap:refresh` would "succeed", but then the `/public` folder it was created in wouldn't necessarily be in the actual dyno/s serving any real requests. AFAICT.
My preferred requirements
When working on this, I went round and round trying to make it work with all of these conditions:
- `robots.txt`
The gem suggests, and has functionality for, storing the generated file somewhere else (say, S3), but I'd like to keep it in its well-known location.
Conclusion
In the end, I decided to create a sitemaps controller and dynamically create the file to:
- `/sitemap.xml`
- `robots.txt` (a separate issue/pr to do!)
The challenge and risk, of course, is performance. Namely around articles and some of the tools (zines, etc) which have the biggest tables to scan. Especially since most items in the sitemap never change.
But that never-really-changing-ness is what allowed me to use Rails' fragment caching around each `<url>` item in the long `<urlset>` list and reduce the page load time from ~1s to ~200ms (depending on warm cache, etc). Even at 1s, it's not the end of the world, since (I suspect) this file doesn't get read a ton.
TODO follow up
- remove `sitemap:refresh` from `Procfile`
- remove `config/sitemap.rb`
- remove `sitemap_generator` gem
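
For readers unfamiliar with the per-`<url>` fragment caching mentioned in the Conclusion, the idea can be sketched in plain Ruby. This is only an illustration under stated assumptions, not the code from this PR: `Item`, `FragmentCache`, `render_urlset`, and the `example.com` URLs are all hypothetical, and the in-memory hash merely stands in for a real cache store. In a Rails view, the `cache` helper would usually be wrapped around each item instead.

```ruby
# Hypothetical record; stands in for an article, zine, etc.
Item = Struct.new(:path, :updated_at)

# Minimal in-memory stand-in for a fragment cache (an assumption for this
# sketch, NOT Rails' cache store).
class FragmentCache
  attr_reader :misses

  def initialize
    @store = {}
    @misses = 0
  end

  # Return the cached fragment for key, or build it with the block and keep it.
  def fetch(key)
    @store.fetch(key) do
      @misses += 1
      @store[key] = yield
    end
  end
end

# Render one <url> fragment per item. The cache key includes updated_at, so an
# unchanged item reuses its fragment and a changed item gets a fresh one.
def render_urlset(items, cache)
  fragments = items.map do |item|
    cache.fetch("#{item.path}/#{item.updated_at}") do
      "<url><loc>https://example.com#{item.path}</loc>" \
        "<lastmod>#{item.updated_at}</lastmod></url>"
    end
  end
  "<urlset>#{fragments.join}</urlset>"
end

items = [Item.new('/articles/1', '2024-12-01'), Item.new('/zines/2', '2024-12-02')]
cache = FragmentCache.new
first  = render_urlset(items, cache) # cold cache: every fragment is built
second = render_urlset(items, cache) # warm cache: every fragment is reused
```

Because most sitemap items never change, almost every request after the first hits the warm path, which is consistent with the drop in load time described above.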