Empty response (JSONDecodeError) when sending many requests in a row #43

@marccarre

Description

Version

0.7.0

Problem

When sending several (10~100) requests in a row, some requests fail nondeterministically with the following error:

JSONDecodeError: Expecting value: line 1 column 1 (char 0)

Upon closer investigation, the actual response is a 429,

  • with "reason": Too many requests. Please comply with the User-Agent policy to get a higher rate limit: https://meta.wikimedia.org/wiki/User-Agent_policy, and
  • with the following "body":
<!DOCTYPE html>
<html lang="en">
<meta charset="utf-8">
<title>Wikimedia Error</title>
<style>
	* { margin: 0; padding: 0; }
body { background: #fff; font: 15px/1.6 sans-serif; color: #333; }
.content { margin: 7% auto 0; padding: 2em 1em 1em; max-width: 640px; }
.footer { clear: both; margin-top: 14%; border-top: 1px solid #e5e5e5; background: #f9f9f9; padding: 2em 0; font-size: 0.8em; text-align: center; }
img { float: left; margin: 0 2em 2em 0; }
a img { border: 0; }
h1 { margin-top: 1em; font-size: 1.2em; }
.content-text { overflow: hidden; overflow-wrap: break-word; word-wrap: break-word; -webkit-hyphens: auto; -moz-hyphens: auto; -ms-hyphens: auto; hyphens: auto; }
p { margin: 0.7em 0 1em 0; }
a { color: #0645ad; text-decoration: none; }
a:hover { text-decoration: underline; }
code { font-family: sans-serif; }
.text-muted { color: #777; }
</style>
<div class="content" role="main">
	<a href="https://www.wikimedia.org"><img src="https://www.wikimedia.org/static/images/wmf-logo.png" srcset="https://www.wikimedia.org/static/images/wmf-logo-2x.png 2x" alt="Wikimedia" width="135" height="101">
</a>
<h1>Error</h1>
<div class="content-text">
	<p>Our servers are currently under maintenance or experiencing a technical problem.

	Please <a href="" title="Reload this page" onclick="window.location.reload(false); return false">try again</a> in a few&nbsp;minutes.</p>

<p>See the error message at the bottom of this page for more&nbsp;information.</p>
</div>
</div>
<div class="footer"><p>If you report this error to the Wikimedia System Administrators, please include the details below.</p><p class="text-muted"><code>Request from 122.216.10.145 via cp5012 cp5012, Varnish XID 477962109<br>Upstream caches: cp5012 int<br>Error: 429, Too many requests. Please comply with the User-Agent policy to get a higher rate limit: https://meta.wikimedia.org/wiki/User-Agent_policy at Sun, 17 Jul 2022 22:28:20 GMT</code></p>
</div>
</html>
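
The failure mode above can be reproduced offline: feeding an HTML error page to the JSON decoder yields exactly this `JSONDecodeError`, which hides the real 429. Below is a minimal sketch of checking the status before decoding. The names `RateLimitedError` and `decode_json_response` are hypothetical and not part of this library.

```python
import json


class RateLimitedError(Exception):
    """Hypothetical error type: the server answered 429 instead of JSON."""


def decode_json_response(status: int, reason: str, body: bytes):
    """Decode a JSON body, but surface HTTP errors before trying to parse."""
    if status == 429:
        # The body is an HTML error page here, so parsing it would only
        # produce a misleading JSONDecodeError.
        raise RateLimitedError(f"HTTP 429: {reason}")
    if status >= 400:
        raise RuntimeError(f"HTTP {status}: {reason}")
    return json.loads(body.decode("utf-8"))


# What happens without the check: the HTML page is not valid JSON.
try:
    json.loads("<!DOCTYPE html>")
except json.JSONDecodeError as error:
    print("undecodable body:", error)
```

With such a check in place, callers would see the rate limit itself rather than a parsing error.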

Root cause

This library doesn't follow Wikimedia's user-agent policy, specifically:

<client name>/<version> (<contact information>) <library/framework name>/<version> [<library name>/<version> ...]. Parts that are not applicable can be omitted.

which leads to temporary rate limiting/blacklisting of the agent:

Requests from disallowed user agents may instead encounter a less helpful error message like this:
Our servers are currently experiencing a technical problem. Please try again in a few minutes.

See also: https://meta.wikimedia.org/wiki/User-Agent_policy
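
To make the format quoted above concrete, here is a small sketch that assembles such a string. The helper `build_user_agent`, the client name, and the contact URL are placeholders chosen for illustration, not values prescribed by the policy or by this library.

```python
from typing import Iterable, Tuple


def build_user_agent(client: str, version: str, contact: str,
                     libraries: Iterable[Tuple[str, str]] = ()) -> str:
    """Assemble '<client>/<version> (<contact>) <lib>/<version> ...'."""
    parts = [f"{client}/{version} ({contact})"]
    # Each library or framework is appended as name/version.
    parts.extend(f"{name}/{ver}" for name, ver in libraries)
    return " ".join(parts)


print(build_user_agent(
    "my-wikidata-bot", "0.1", "https://example.org/my-wikidata-bot",
    [("python", "3.9.13"), ("Wikidata", "0.7.0")],
))
# prints: my-wikidata-bot/0.1 (https://example.org/my-wikidata-bot) python/3.9.13 Wikidata/0.7.0
```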

Solution

Set a User-Agent header compliant with the above policy, e.g.:

>>> import urllib.request  # a bare "import urllib" does not reliably load the submodule
>>> od = urllib.request.OpenerDirector()
>>> od.addheaders
[('User-agent', 'Python-urllib/3.9')]
>>>
>>> import wikidata
>>> wikidata.__version__
'0.7.0'
>>>
>>> import sys
>>> od.addheaders = [  # must stay a list of (name, value) tuples, not a dict
...     ("Accept", "application/sparql-results+json"),
...     ("User-Agent", "wikidata-based-bot/%s (https://github.com/dahlia/wikidata ; [email protected]) python/%s.%s.%s Wikidata/%s" % (wikidata.__version__, sys.version_info.major, sys.version_info.minor, sys.version_info.micro, wikidata.__version__)),
... ]
>>>
>>> od.addheaders
[('Accept', 'application/sparql-results+json'), ('User-Agent', 'wikidata-based-bot/0.7.0 (https://github.com/dahlia/wikidata ; [email protected]) python/3.9.13 Wikidata/0.7.0')]
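
As a possible workaround on the caller's side, an opener with a compliant header can be prepared up front. This is only a sketch: `make_opener`, the bot name, and the contact URL are placeholders, and whether the library's client accepts a custom opener is an assumption to verify against its documentation. It uses `urllib.request.build_opener()` because a bare `OpenerDirector()` has no handlers installed and cannot open URLs by itself.

```python
import sys
import urllib.request


def make_opener(user_agent: str) -> urllib.request.OpenerDirector:
    """Return an opener that sends the given User-Agent with every request."""
    opener = urllib.request.build_opener()  # comes with the default handlers
    # Replace the default ('User-agent', 'Python-urllib/x.y') entry.
    opener.addheaders = [("User-Agent", user_agent)]
    return opener


python_version = "%d.%d.%d" % sys.version_info[:3]
opener = make_opener(
    # Placeholder identity; use your own client name and contact details.
    "my-wikidata-bot/0.1 (https://example.org/my-wikidata-bot) python/" + python_version
)
```

The resulting `opener` could then be handed to whatever code performs the requests, provided that code allows the opener to be overridden.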
