Description
Version
0.7.0
Problem
When sending several (10~100) requests in a row, some requests fail nondeterministically with the following error:
JSONDecodeError: Expecting value: line 1 column 1 (char 0)
Upon closer investigation, the actual response is a 429,
- with "reason":
Too many requests. Please comply with the User-Agent policy to get a higher rate limit: https://meta.wikimedia.org/wiki/User-Agent_policy
- and with the following "body":
<!DOCTYPE html>
<html lang="en">
<meta charset="utf-8">
<title>Wikimedia Error</title>
<style>
* { margin: 0; padding: 0; }
body { background: #fff; font: 15px/1.6 sans-serif; color: #333; }
.content { margin: 7% auto 0; padding: 2em 1em 1em; max-width: 640px; }
.footer { clear: both; margin-top: 14%; border-top: 1px solid #e5e5e5; background: #f9f9f9; padding: 2em 0; font-size: 0.8em; text-align: center; }
img { float: left; margin: 0 2em 2em 0; }
a img { border: 0; }
h1 { margin-top: 1em; font-size: 1.2em; }
.content-text { overflow: hidden; overflow-wrap: break-word; word-wrap: break-word; -webkit-hyphens: auto; -moz-hyphens: auto; -ms-hyphens: auto; hyphens: auto; }
p { margin: 0.7em 0 1em 0; }
a { color: #0645ad; text-decoration: none; }
a:hover { text-decoration: underline; }
code { font-family: sans-serif; }
.text-muted { color: #777; }
</style>
<div class="content" role="main">
<a href="https://www.wikimedia.org"><img src="https://www.wikimedia.org/static/images/wmf-logo.png" srcset="https://www.wikimedia.org/static/images/wmf-logo-2x.png 2x" alt="Wikimedia" width="135" height="101">
</a>
<h1>Error</h1>
<div class="content-text">
<p>Our servers are currently under maintenance or experiencing a technical problem.
Please <a href="" title="Reload this page" onclick="window.location.reload(false); return false">try again</a> in a few minutes.</p>
<p>See the error message at the bottom of this page for more information.</p>
</div>
</div>
<div class="footer"><p>If you report this error to the Wikimedia System Administrators, please include the details below.</p><p class="text-muted"><code>Request from 122.216.10.145 via cp5012 cp5012, Varnish XID 477962109<br>Upstream caches: cp5012 int<br>Error: 429, Too many requests. Please comply with the User-Agent policy to get a higher rate limit: https://meta.wikimedia.org/wiki/User-Agent_policy at Sun, 17 Jul 2022 22:28:20 GMT</code></p>
</div>
</html>
Root cause
This library doesn't follow Wikimedia's user-agent policy, specifically:
<client name>/<version> (<contact information>) <library/framework name>/<version> [<library name>/<version> ...]. Parts that are not applicable can be omitted.
which leads to temporary rate limiting/blacklisting of the agent:
Requests from disallowed user agents may instead encounter a less helpful error message like this:
Our servers are currently experiencing a technical problem. Please try again in a few minutes.
See also: https://meta.wikimedia.org/wiki/User-Agent_policy
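The reason the 429 surfaces as a JSONDecodeError is that the HTML error page is handed straight to the JSON decoder. The following is only a minimal sketch of how a caller could make that failure explicit; `decode_json_response` and `UnexpectedResponseError` are hypothetical names, not part of this library, and the `text/html` content type of the error page is an assumption.

```python
import json


class UnexpectedResponseError(Exception):
    """Raised when a response is not the JSON document the caller expected."""


def decode_json_response(status, content_type, body):
    """Decode a JSON body, failing with a descriptive error for non-JSON replies."""
    if status != 200 or "json" not in content_type.lower():
        # Surface the HTTP status instead of letting json.loads fail on HTML.
        raise UnexpectedResponseError(
            "HTTP %d with Content-Type %r; body starts with %r"
            % (status, content_type, body[:40])
        )
    return json.loads(body)


# Shortened stand-in for the HTML error page quoted above.
html_error_page = "<!DOCTYPE html>\n<html lang=\"en\">..."

# Decoding the HTML directly reproduces the misleading error from the report.
try:
    json.loads(html_error_page)
except json.JSONDecodeError as exc:
    masked_error = str(exc)

# Checking status and content type first reports the real cause.
try:
    decode_json_response(429, "text/html; charset=utf-8", html_error_page)
except UnexpectedResponseError as exc:
    explicit_error = str(exc)
```

Here `masked_error` is the same "Expecting value" message as in the report, while `explicit_error` names the 429 status.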
Solution
Set a User-Agent header compliant with the above policy, e.g.:
>>> import urllib.request
>>> od = urllib.request.OpenerDirector()
>>> od.addheaders
[('User-agent', 'Python-urllib/3.9')]
>>>
>>> import wikidata
>>> wikidata.__version__
'0.7.0'
>>>
>>> import sys
>>> # addheaders must be a list of (name, value) tuples, not a dict
>>> od.addheaders = [
...     ("Accept", "application/sparql-results+json"),
...     ("User-Agent", "wikidata-based-bot/%s (https://github.com/dahlia/wikidata ; [email protected]) python/%s.%s.%s Wikidata/%s" % (wikidata.__version__, sys.version_info.major, sys.version_info.minor, sys.version_info.micro, wikidata.__version__)),
... ]
>>>
>>> od.addheaders
[('Accept', 'application/sparql-results+json'), ('User-Agent', 'wikidata-based-bot/0.7.0 (https://github.com/dahlia/wikidata ; [email protected]) python/3.9.13 Wikidata/0.7.0')]
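As a rough sketch of the same idea outside the REPL, the policy format could be assembled by a small helper and attached to an opener created with `urllib.request.build_opener()` (a bare `OpenerDirector()` has no handlers installed, so it cannot open URLs by itself). The helper names, the `example-bot` client name and the `https://example.org/bot-info` contact URL are placeholders chosen for illustration, not values from this library.

```python
import sys
import urllib.request


def build_user_agent(client_name, client_version, contact, libraries=()):
    """Format '<client>/<version> (<contact>) <library>/<version> ...'."""
    parts = ["%s/%s (%s)" % (client_name, client_version, contact)]
    parts.extend("%s/%s" % (name, version) for name, version in libraries)
    return " ".join(parts)


def build_opener_with_user_agent(user_agent):
    """Return an opener whose default headers carry the given User-Agent."""
    opener = urllib.request.build_opener()
    # addheaders is a list of (name, value) tuples; replacing it also
    # drops the default 'Python-urllib/x.y' User-agent entry.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


python_version = "%d.%d.%d" % sys.version_info[:3]
user_agent = build_user_agent(
    "example-bot",                   # placeholder client name
    "1.0",                           # placeholder client version
    "https://example.org/bot-info",  # placeholder contact information
    libraries=[("python", python_version), ("Wikidata", "0.7.0")],
)
opener = build_opener_with_user_agent(user_agent)
```

If the installed version of the library lets the client take a custom opener (I believe `wikidata.client.Client` accepts an `opener` argument, but please check), the resulting `opener` could be passed in there so that every request carries the compliant header.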