Long text query may incorrectly return empty results array #116
Description
The reconciliation query's query field "is searched for with both search APIs provided by the Wikibase instance (the auto-complete API and the search API)".
The auto-complete API (wbsearchentities) "searches for entities using labels and aliases". Wikidata labels and aliases seem to be limited to 250 characters (I'm not sure what the limit is in other Wikibase instances). As a result, any query longer than 250 characters would return an empty results array from the Wikidata API's wbsearchentities. (I've just posted a task in Phabricator suggesting that it return an error instead, as the query would be nonsense.)
On the other hand, the search API (query&list=search) searches page content (including labels and aliases, I understand). This endpoint has a query-length limit of 300 characters. In this case, the endpoint does return an error (instead of an empty results array) if the limit is exceeded, but openrefine-wikibase seems to ignore this error.
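To illustrate what "not ignoring the error" could look like, here is a minimal sketch in Python. It is not openrefine-wikibase's actual code: `SearchApiError` and `extract_search_hits` are hypothetical names, and I'm assuming the response has already been decoded from JSON and uses the MediaWiki API's default error format (a top-level `error` object with `code` and `info` keys).

```python
from typing import Any, Dict, List


class SearchApiError(Exception):
    """Hypothetical exception for an error object in a MediaWiki API response."""

    def __init__(self, code: str, info: str) -> None:
        super().__init__(f"{code}: {info}")
        self.code = code
        self.info = info


def extract_search_hits(response: Dict[str, Any]) -> List[Dict[str, Any]]:
    """Return the hits of a decoded query&list=search response.

    Raises SearchApiError instead of silently returning an empty list
    when the response carries an error object.
    """
    # Assumption: default error format, i.e. {"error": {"code": ..., "info": ...}}.
    error = response.get("error")
    if error is not None:
        raise SearchApiError(error.get("code", "unknown"), error.get("info", ""))
    # Successful responses list their hits under query.search.
    return response.get("query", {}).get("search", [])
```

With something like this, an over-long query would surface as an exception that the reconciliation service could report to the user, rather than as "no matches".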
As a result, reconciliation queries with a query field longer than 300 characters will always return an empty results array (as long as the query doesn't fit one of the exceptions to the reconciliation workflow).
This may lead a user to believe that there is no item matching their query, when in reality the query itself caused an error.
Would it make sense to either limit the length of the query field, or handle the error returned by the search API?
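For the first option, a rough sketch of a length check that could run before either API is called. The function name and constants are hypothetical, the two limits are simply the ones I observed on Wikidata (other Wikibase instances may differ), and I'm assuming the limits count characters rather than bytes.

```python
from typing import Optional

# Limits observed on Wikidata, as described in this issue (assumptions elsewhere).
WBSEARCHENTITIES_MAX_CHARS = 250  # label/alias length limit
SEARCH_API_MAX_CHARS = 300        # query&list=search query-length limit


def query_length_problem(query: str) -> Optional[str]:
    """Return a human-readable problem description, or None if the length is fine."""
    length = len(query)
    if length > SEARCH_API_MAX_CHARS:
        return (
            f"query is {length} characters long; the search API "
            f"accepts at most {SEARCH_API_MAX_CHARS}"
        )
    if length > WBSEARCHENTITIES_MAX_CHARS:
        return (
            f"query is {length} characters long; no label or alias "
            f"exceeds {WBSEARCHENTITIES_MAX_CHARS}, so wbsearchentities cannot match"
        )
    return None
```

The service could then return this message as an error for the affected query instead of an empty results array.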
Why is the wbsearchentities endpoint used if the search API searches page content including labels and aliases? Assuming there is a reason (I'm sure there is), would this reason imply that the length of the query field should be further limited to 250 characters instead, or that the wbsearchentities error response proposed in my Phabricator ticket should be handled (if ever implemented)?
Thank you!