
qEndpoint Full Text Indexing

Antoine Willerval edited this page Feb 27, 2023 · 9 revisions

In qEndpoint, you can configure an index in the repo_model.ttl file to enable full-text or GeoSPARQL indexing.

Several example models are available here; this page describes how to add a simple node to handle this.

Simple text indexing

You can describe a simple text-indexing node like this:

# Specify the main node
mdlc:main mdlc:node _:mainNode .
_:mainNode mdlc:type mdlc:luceneNode ;
            # Describe the location of the Lucene directory; you can use mdlc:parsedString for template strings
            mdlc:dirLocation "${locationNative}lucene"^^mdlc:parsedString ;
            # Define the language(s) indexed by the Lucene index, here "fr" (French) and "es" (Spanish) (uncomment to use)
            # mdlc:luceneLang "es", "fr" ;
            # Define the reindex query for the Lucene sail; the query should be ordered by ?s
            mdlc:luceneReindexQuery "SELECT * {?s ?p ?o} order by ?s" ;
            # Describe the evaluation mode of the queries; for native or endpointStore storage, use NATIVE
            mdlc:luceneEvalMode "NATIVE"^^mdlc:parsedString .

For the location on disk, you can use predefined options such as locationNative; the full list of predefined options is available here.

You can then query the index with the search: virtual properties in your SPARQL queries:

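PREFIX search: <http://www.openrdf.org/contrib/lucenesail#>
# my: is a placeholder namespace for your own data
PREFIX my: <http://example.org/my#>

SELECT ?subj ?score ?snippet WHERE {
  ?subj search:matches [
        search:query "search terms...";
        search:property my:property;
        search:score ?score;
        search:snippet ?snippet ] .
}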

Multiple indexes with stream filtering

You can find demo models here.

With qEndpoint, you can configure multiple full-text indexes, with specific rules for searching over each of them.

In your model.ttl, you can create a Lucene node as explained in the Simple text indexing section, and then add filters to that node so that it only handles certain triples.

Filters

Start by creating a filter node. Here we call it _:filterNode, and it filters the node _:luceneNode:

_:filterNode mdlc:type mdlc:filterNode ;
             mdlc:paramFilter [
                  mdlc:type mdlc:typeFilterLuceneExp
             ] ;
             mdlc:paramLink _:luceneNode .
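To show where the filter node sits in a model, here is a minimal sketch of a complete model body that places this filter in front of the Lucene node from the Simple text indexing section. It assumes that a filter node can be referenced as the main node through mdlc:main mdlc:node, and that the mdlc: prefix is declared as in your existing repo_model.ttl; it is an illustration, not a reference configuration.

```turtle
# Sketch only: assumes the mdlc: prefix is declared as in your repo_model.ttl
# and that a filter node can be used as the main node.
mdlc:main mdlc:node _:filterNode .

# The filter node sits in front of the Lucene node and decides what reaches it
_:filterNode mdlc:type mdlc:filterNode ;
             mdlc:paramFilter [
                  mdlc:type mdlc:typeFilterLuceneExp
             ] ;
             mdlc:paramLink _:luceneNode .

# The filtered Lucene node, described as in the Simple text indexing section
_:luceneNode mdlc:type mdlc:luceneNode ;
             mdlc:dirLocation "${locationNative}lucene"^^mdlc:parsedString ;
             mdlc:luceneReindexQuery "SELECT * {?s ?p ?o} order by ?s" ;
             mdlc:luceneEvalMode "NATIVE"^^mdlc:parsedString .
```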

You can set the type of the filter with the mdlc:type predicate inside mdlc:paramFilter. Several types are available:

  • mdlc:typeFilterLuceneExp: Only passes the SPARQL queries containing a Lucene search:matches query.

    Example:

    _:filterNode mdlc:type mdlc:filterNode ;
                 mdlc:paramFilter [
                      mdlc:type mdlc:typeFilterLuceneExp
                 ] ;
                 mdlc:paramLink _:luceneNode .
  • mdlc:typeFilterLuceneGeoExp: Only passes the SPARQL queries containing a Lucene GeoSPARQL query.

    Example:

    _:filterNode mdlc:type mdlc:filterNode ;
                 mdlc:paramFilter [
                      mdlc:type mdlc:typeFilterLuceneGeoExp
                 ] ;
                 mdlc:paramLink _:luceneNode .
  • mdlc:predicateFilter: During add/remove/get operations, only passes the triples with the given predicate(s).

    • Required param: mdlc:typeFilterPredicate <predicates>

    Example, where my:text1, my:text2 and my:text3 are the filtered predicates (you can specify any number of predicates):

    _:filterNode mdlc:type mdlc:filterNode ;
                 mdlc:paramFilter [
                       mdlc:type mdlc:predicateFilter ;
                       # The filtered predicates
                       mdlc:typeFilterPredicate my:text1, my:text2, my:text3
                 ] ;
                 mdlc:paramLink _:luceneNode .
  • mdlc:languageFilter: During add/remove/get operations, only passes the triples whose literal has one of the given languages. For Lucene nodes, the mdlc:luceneLang parameter is faster; this filter is mentioned here for custom implementations.

    • Required param: mdlc:languageFilterLang "langs": set the filtered languages
    • Optional param: mdlc:acceptNoLanguageLiterals []: also passes literals without a language tag

    Example, where "es", "fr" and "it" are the filtered languages (you can specify any number of languages):

    _:filterNode mdlc:type mdlc:filterNode ;
                 mdlc:paramFilter [
                       mdlc:type mdlc:languageFilter ;
                       # The filtered languages
                       mdlc:languageFilterLang "es", "fr", "it" ;
                       # Do we accept literals without any language
                       # mdlc:acceptNoLanguageLiterals []
                 ] ;
                 mdlc:paramLink _:luceneNode .
  • mdlc:typeFilter: During add/remove/get operations, only passes the triples whose subject has one of the given types. The mdlc:multiFilterNode node is faster and better suited for checking multiple types.

    • Required param: mdlc:typeFilterPredicate <is_of_type>: the predicate that defines the type of a subject
    • Required param: mdlc:typeFilterObject <types>: the filtered types

    Example, where my:type1 and my:type2 are the filtered types (you can specify any number of types):

    _:filterNode mdlc:type mdlc:filterNode ;
                 mdlc:paramFilter [
                       mdlc:type mdlc:typeFilter ;
                       # The predicate describing the type for a subject
                       mdlc:typeFilterPredicate my:oftype ;
                       # The filtered types
                       mdlc:typeFilterObject my:type1, my:type2
                 ] ;
                 mdlc:paramLink _:luceneNode .

Boolean operations on filters

Now we can filter our streams, but what if we want to combine multiple filters? qEndpoint has a syntax for that: the mdlc:paramFilterAnd and mdlc:paramFilterOr predicates inside the mdlc:paramFilter object.

Example 1

_:filterNode mdlc:type mdlc:filterNode ;
             mdlc:paramFilter [
                  mdlc:type mdlc:typeFilterLuceneGeoExp ;
                  mdlc:paramFilterOr [
                      mdlc:type mdlc:typeFilterLuceneExp
                  ]
             ] ;
             mdlc:paramLink _:luceneNode .

Here, only the queries containing a GeoSPARQL query or a full-text search query are passed; all other expressions are filtered out. The mdlc:paramFilterOr object can contain multiple filters and uses the same predicates as the mdlc:paramFilter objects.

Example 2

_:filterNode mdlc:type mdlc:filterNode ;
             mdlc:paramFilter [
                  mdlc:type mdlc:typeFilterLuceneExp ;
                  mdlc:paramFilterAnd [
                      mdlc:type mdlc:predicateFilter ;
                      mdlc:typeFilterPredicate my:description
                  ]
             ] ;
             mdlc:paramLink _:luceneNode .

In this example, only the queries with a full-text search and the triples with a my:description predicate are passed; everything else is filtered out. This can be used, for example, to index only the descriptions.
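The same mdlc:paramFilterAnd syntax can combine the triple filters listed earlier. As a sketch, assuming they compose the same way as in Example 2, the node below would only pass the triples whose subject has the hypothetical type my:type1 (through the hypothetical my:oftype predicate) and whose literal is in French or Spanish. For a Lucene node, mdlc:luceneLang remains the faster way to restrict languages, as noted above; the language filter is used here only to illustrate the combination.

```turtle
# Sketch only: my:oftype and my:type1 are hypothetical names for your own data
_:filterNode mdlc:type mdlc:filterNode ;
             mdlc:paramFilter [
                  # Pass the triples whose subject is of type my:type1...
                  mdlc:type mdlc:typeFilter ;
                  mdlc:typeFilterPredicate my:oftype ;
                  mdlc:typeFilterObject my:type1 ;
                  # ...and whose literal is in French or Spanish
                  mdlc:paramFilterAnd [
                      mdlc:type mdlc:languageFilter ;
                      mdlc:languageFilterLang "fr", "es"
                  ]
             ] ;
             mdlc:paramLink _:luceneNode .
```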

The boolean operators follow the same precedence as in most programming languages: and binds more tightly than or. For example:

[] mdlc:paramFilter [
    mdlc:type <FILTER_A> ;
    mdlc:paramFilterAnd [
        mdlc:type <FILTER_B>
    ] ;
    mdlc:paramFilterOr [
        mdlc:type <FILTER_C>
    ]
] .

This example translates to the following expression:

(FILTER_A and FILTER_B) or FILTER_C
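As a concrete instance of this pattern, the sketch below substitutes filter types from this page for the placeholders: FILTER_A is mdlc:typeFilterLuceneExp, FILTER_B is a mdlc:predicateFilter on the hypothetical my:description predicate, and FILTER_C is mdlc:typeFilterLuceneGeoExp. Following the translation above, it should pass (full-text query and my:description triple) or GeoSPARQL query.

```turtle
# Sketch only: my:description is a hypothetical predicate
# Reads as (FILTER_A and FILTER_B) or FILTER_C
_:filterNode mdlc:type mdlc:filterNode ;
             mdlc:paramFilter [
                  # FILTER_A: full-text search queries
                  mdlc:type mdlc:typeFilterLuceneExp ;
                  # and FILTER_B: triples with the my:description predicate
                  mdlc:paramFilterAnd [
                      mdlc:type mdlc:predicateFilter ;
                      mdlc:typeFilterPredicate my:description
                  ] ;
                  # or FILTER_C: GeoSPARQL queries
                  mdlc:paramFilterOr [
                      mdlc:type mdlc:typeFilterLuceneGeoExp
                  ]
             ] ;
             mdlc:paramLink _:luceneNode .
```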
// TODO: Explain multi-filter

// TODO: Explain Reindex query

// TODO: Explain chains