This repository was archived by the owner on Oct 11, 2025. It is now read-only.

Long text query may incorrectly return empty results array #116

@diegodlh

The reconciliation query's query field "is searched for with both search APIs provided by the Wikibase instance (the auto-complete API and the search API)".

The auto-complete API (wbsearchentities) "searches for entities using labels and aliases". Wikidata labels and aliases seem to be limited to 250 characters (I'm not sure what the limit is in other Wikibase instances). As a result, any query longer than 250 characters returns an empty results array from the Wikidata API's wbsearchentities. (I've just posted a task in Phabricator suggesting that it return an error instead, as the query would be nonsense.)

On the other hand, the search API (query&list=search) searches page content (including labels and aliases, I understand). This endpoint has a query-length limit of 300 characters. In this case, the endpoint does return an error (instead of an empty results array) if the limit is exceeded, but openrefine-wikibase seems to ignore this error.
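
To illustrate what "not ignoring the error" could look like, here is a minimal sketch that inspects the parsed JSON of a `query&list=search` response. It relies on the MediaWiki API's default error format, where a failed request carries a top-level `error` object with `code` and `info` fields. The names `SearchApiError` and `extract_search_hits` are hypothetical and are not taken from openrefine-wikibase.

```python
class SearchApiError(Exception):
    """Raised when a MediaWiki API response carries an error object."""

    def __init__(self, code, info):
        super().__init__(f"{code}: {info}")
        self.code = code
        self.info = info


def extract_search_hits(response_json):
    """Return the hits from a parsed `query&list=search` response.

    Raises SearchApiError instead of silently treating an error
    response as "no results".
    """
    error = response_json.get("error")
    if error is not None:
        # Default MediaWiki error format: {"error": {"code": ..., "info": ...}}
        raise SearchApiError(error.get("code", "unknown"), error.get("info", ""))
    # A successful response nests the hits under query.search.
    return response_json.get("query", {}).get("search", [])
```

With something like this, a caller could tell "the search ran and found nothing" (an empty list) apart from "the search was rejected" (an exception) and report the latter to the user.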

As a result, reconciliation queries with a query field longer than 300 characters will always return an empty results array (as long as the query doesn't fit one of the exceptions to the reconciliation workflow).

This may make a user believe that there is no item matching their query, when in reality the query failed with an error.

Would it make sense to either limit the length of the query field, or handle the error returned by the search API?
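
As a rough sketch of the first option, the query length could be checked before either API is called. The 250 and 300 limits are the ones I observed on Wikidata as described above; other Wikibase instances may differ, and I'm assuming here that the limits are counted in characters rather than bytes. The function and constant names are hypothetical.

```python
# Limits observed on Wikidata (see above); assumptions for other instances.
WBSEARCHENTITIES_MAX_CHARS = 250  # label/alias length limit
SEARCH_API_MAX_CHARS = 300        # query&list=search query-length limit


def classify_query_length(query):
    """Classify a reconciliation query string by which search APIs can handle it."""
    n = len(query)  # counts Unicode code points, not bytes
    if n > SEARCH_API_MAX_CHARS:
        return "too_long_for_both"
    if n > WBSEARCHENTITIES_MAX_CHARS:
        return "too_long_for_autocomplete"
    return "ok"
```

A service could return an explicit error for `"too_long_for_both"` and skip the wbsearchentities call for `"too_long_for_autocomplete"`, rather than returning an empty results array in both cases.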

Why is the wbsearchentities endpoint used if the search API searches page content including labels and aliases? Assuming there is a reason (I'm sure there is), would this reason imply that the length of the query field should be further limited to 250 characters instead, or that the wbsearchentities error response proposed in my Phabricator ticket should be handled (if ever implemented)?

Thank you!
