Add predicate-object tuples to Solr

Description

It should be possible to query metadata by requesting entries that match certain predicate-object tuples.

This can be implemented by introducing a new dynamic field: metadata.predicate.[literal|uri].{md5-hash-of-predicate}

Because the limits on the size and length of Solr field names are unclear, a hash is preferable to an escaped URI. MD5 is probably a good choice: it produces hashes of constant length, it is available in most programming languages, it is fast, and it has a standardized hexadecimal representation.
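A minimal Python sketch of the naming scheme, assuming the predicate URI is hashed as its UTF-8 bytes and that the lowercase hex digest is used. The function name `predicate_field_name` and the `hash_length` parameter are hypothetical; the default of 8 follows the truncation proposed in this issue, and the predicate URI in the usage line is only illustrative.

```python
import hashlib

# Prefix shared by all predicate fields (from the scheme in this issue).
FIELD_PREFIX = "metadata.predicate"


def predicate_field_name(predicate_uri: str, object_is_uri: bool, hash_length: int = 8) -> str:
    """Return metadata.predicate.[literal|uri].{hash} for a predicate URI.

    Hypothetical helper: assumes UTF-8 encoding of the URI and the lowercase
    MD5 hex digest, truncated to `hash_length` characters (32 = full digest).
    """
    if not 1 <= hash_length <= 32:
        raise ValueError("hash_length must be between 1 and 32")
    digest = hashlib.md5(predicate_uri.encode("utf-8")).hexdigest()
    kind = "uri" if object_is_uri else "literal"
    return f"{FIELD_PREFIX}.{kind}.{digest[:hash_length]}"


if __name__ == "__main__":
    # Illustrative predicate URI; prints the literal field name for it.
    print(predicate_field_name("http://purl.org/dc/terms/subject", object_is_uri=False))
```

Because MD5 always yields 32 hex characters, every field name produced this way has the same length for a given `hash_length`, regardless of how long the predicate URI is.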

To shorten the hash (an MD5 hash is 32 characters long in hexadecimal notation), it can probably be truncated. Truncation risks collisions, but only a limited number of predicate URIs are used in practice, so the risk is probably rather small. Even if collisions occur, there should be no negative effects other than false positives: entries that use a different predicate with the same truncated hash would also match a query. In summary, it should be safe to truncate after 8 characters or even fewer.
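The collision risk can be checked against the predicates a repository actually uses, and estimated with the birthday bound p ≈ 1 − exp(−n(n−1) / (2 · 16^k)) for n distinct predicates and k hex characters. The sketch below is a hypothetical helper set, not part of any existing indexing code; it assumes the same UTF-8/lowercase-hex hashing as the field names.

```python
import hashlib
import math
from collections import defaultdict


def truncated_md5(value: str, length: int) -> str:
    """Lowercase MD5 hex digest of `value` (UTF-8), cut to `length` characters."""
    return hashlib.md5(value.encode("utf-8")).hexdigest()[:length]


def find_collisions(predicates: list[str], length: int) -> dict[str, list[str]]:
    """Group distinct predicates by truncated hash; keep only groups with >1 member."""
    groups: dict[str, list[str]] = defaultdict(list)
    for predicate in sorted(set(predicates)):
        groups[truncated_md5(predicate, length)].append(predicate)
    return {prefix: members for prefix, members in groups.items() if len(members) > 1}


def shortest_safe_length(predicates: list[str], max_length: int = 32) -> int:
    """Smallest truncation length at which the given predicates do not collide."""
    for length in range(1, max_length + 1):
        if not find_collisions(predicates, length):
            return length
    raise ValueError("collision even with the full digest")


def collision_probability(n: int, length: int) -> float:
    """Birthday-bound estimate that n distinct predicates share a truncated hash."""
    pairs = n * (n - 1) / 2
    space = 16 ** length
    return -math.expm1(-pairs / space)  # 1 - exp(-pairs / space), numerically stable
```

For 1,000 distinct predicates, `collision_probability(1000, 8)` is roughly 1.2e-4, whereas with 4 characters it exceeds 0.99, so how far the hash can be shortened depends on how many distinct predicates are actually indexed.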

As an example, the Solr field metadata.predicate.literal.af00b1ee allows querying for dc:subject; the hash has been truncated after 8 characters.
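A query for a predicate-object tuple then combines the field name with the object value. The sketch below assumes Solr's standard query parser, quotes the object as a phrase (escaping backslashes and double quotes), and joins several tuples with AND; the helper names `tuple_query` and `all_tuples_query` are hypothetical.

```python
import hashlib


def predicate_field_name(predicate_uri: str, object_is_uri: bool, hash_length: int = 8) -> str:
    # Scheme from this issue: metadata.predicate.[literal|uri].{truncated md5 hex}
    digest = hashlib.md5(predicate_uri.encode("utf-8")).hexdigest()
    kind = "uri" if object_is_uri else "literal"
    return f"metadata.predicate.{kind}.{digest[:hash_length]}"


def quote_solr_phrase(value: str) -> str:
    """Wrap `value` in double quotes, escaping backslashes and quotes inside it."""
    escaped = value.replace("\\", "\\\\").replace('"', '\\"')
    return f'"{escaped}"'


def tuple_query(predicate_uri: str, obj: str, object_is_uri: bool, hash_length: int = 8) -> str:
    """Query clause matching entries with the given predicate-object tuple."""
    field = predicate_field_name(predicate_uri, object_is_uri, hash_length)
    return f"{field}:{quote_solr_phrase(obj)}"


def all_tuples_query(tuples: list[tuple[str, str, bool]], hash_length: int = 8) -> str:
    """Combine several tuples so that an entry must match all of them."""
    return " AND ".join(tuple_query(p, o, is_uri, hash_length) for p, o, is_uri in tuples)
```

For instance, `tuple_query("http://purl.org/dc/terms/subject", "chemistry", False)` yields a clause of the form `metadata.predicate.literal.<hash>:"chemistry"`, where `<hash>` depends on the exact predicate URI used.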

Because literals and URIs need different field types in Solr, it is probably necessary to distinguish the two cases by including either literal or uri in the Solr field name.
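At indexing time, each statement of an entry's metadata is routed to the literal or uri field for its predicate, and a predicate may occur with several objects, so the fields need to be multi-valued. The sketch below is hypothetical (`Statement` and `index_fields` are illustrative names). It assumes the schema declares two dynamic fields with trailing wildcards, e.g. `metadata.predicate.literal.*` and `metadata.predicate.uri.*`, each with its own field type.

```python
import hashlib
from collections import defaultdict
from typing import Iterable, NamedTuple


class Statement(NamedTuple):
    predicate: str
    obj: str
    object_is_uri: bool  # True for URI objects, False for literals


def predicate_field_name(predicate_uri: str, object_is_uri: bool, hash_length: int = 8) -> str:
    # Scheme from this issue: metadata.predicate.[literal|uri].{truncated md5 hex}
    digest = hashlib.md5(predicate_uri.encode("utf-8")).hexdigest()
    kind = "uri" if object_is_uri else "literal"
    return f"metadata.predicate.{kind}.{digest[:hash_length]}"


def index_fields(statements: Iterable[Statement], hash_length: int = 8) -> dict[str, list[str]]:
    """Map statements to multi-valued Solr fields, keyed by the generated field name."""
    fields: dict[str, list[str]] = defaultdict(list)
    for st in statements:
        field = predicate_field_name(st.predicate, st.object_is_uri, hash_length)
        if st.obj not in fields[field]:  # skip duplicate values for the same field
            fields[field].append(st.obj)
    return dict(fields)
```

The same predicate used with a literal and with a URI object ends up in two separate fields, which is what allows the two cases to be indexed with different field types.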

Environment

None

Activity

Hannes Ebner
March 11, 2016, 1:23 PM

Modified the indexing procedure to allow more exact queries for predicate-object combinations; see also the KB at entrystore.org.

Fixed

Assignee

Hannes Ebner

Reporter

Hannes Ebner

Labels

None

Fix versions

Priority

Normal