qEndpoint Full Text Indexing
In qEndpoint, you can declare an index in the repo_model.ttl file to enable full-text or GeoSPARQL indexing.
Several example models are available here, but this page describes how to add a simple node to handle indexing.
A simple node for text indexing can be described like this:
# Specify the main node
mdlc:main mdlc:node _:mainNode .
_:mainNode mdlc:type mdlc:luceneNode ;
    # Describe the location of the lucene directory, you can use mdlc:parsedString for template strings
    mdlc:dirLocation "${locationNative}lucene"^^mdlc:parsedString ;
    # Define the language(s) indexed by the Lucene index, here "fr" (French) and "es" (Spanish) (uncomment to add)
    # mdlc:luceneLang "es", "fr" ;
    # Define the reindex query for the lucene sail, the query should be ordered by ?s
    mdlc:luceneReindexQuery "SELECT * {?s ?p ?o} order by ?s" ;
    # Describe the evaluation mode of the queries, for native or endpointStore storage, use NATIVE
    mdlc:luceneEvalMode "NATIVE"^^mdlc:parsedString .
For the location on disk, you can use predefined options such as locationNative; all the predefined options are listed here.
You can then search with the search virtual properties in your SPARQL queries:
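PREFIX search: <http://www.openrdf.org/contrib/lucenesail#>
# Hypothetical namespace for my:property, replace it with your own
PREFIX my: <http://example.org/my#>

SELECT ?subj ?score ?snippet WHERE {
    ?subj search:matches [
        search:query "search terms..." ;
        search:property my:property ;
        search:score ?score ;
        search:snippet ?snippet ] .
}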
With qEndpoint, you can configure multiple full-text indexes, each with specific rules for searching over it.
In your model.ttl, you can create a Lucene node as explained in the Simple text indexing part, and you can also make the node handle only certain triples. This is done by using filters.
Start by creating a filter node, here called _:filterNode, which filters the node _:luceneNode:
_:filterNode mdlc:type mdlc:filterNode ;
    mdlc:paramFilter [
        mdlc:type mdlc:typeFilterLuceneExp
    ] ;
    mdlc:paramLink _:luceneNode .
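To see how the two nodes fit together, here is a minimal sketch of a model fragment that places the filter in front of a Lucene node. It reuses only predicates shown on this page. Declaring the filter node as the main node and naming the Lucene node _:luceneNode are assumptions made for illustration, and the prefix declarations are omitted as in the other snippets.

```turtle
# Assumption: the filter node is the entry point of the model
mdlc:main mdlc:node _:filterNode .

# The filter node only forwards what its filter accepts
_:filterNode mdlc:type mdlc:filterNode ;
    mdlc:paramFilter [
        mdlc:type mdlc:typeFilterLuceneExp
    ] ;
    mdlc:paramLink _:luceneNode .

# The Lucene node, configured as in the simple text indexing example
_:luceneNode mdlc:type mdlc:luceneNode ;
    mdlc:dirLocation "${locationNative}lucene"^^mdlc:parsedString ;
    mdlc:luceneReindexQuery "SELECT * {?s ?p ?o} order by ?s" ;
    mdlc:luceneEvalMode "NATIVE"^^mdlc:parsedString .
```

Check the result against the example models of the repository before relying on it, since the exact wiring expected by qEndpoint is not spelled out here.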
The type of the filter is set with the mdlc:type predicate. Several types are available:
- mdlc:typeFilterLuceneExp: only passes the SPARQL queries that contain a Lucene search:matches query. Example:
    _:filterNode mdlc:type mdlc:filterNode ;
        mdlc:paramFilter [
            mdlc:type mdlc:typeFilterLuceneExp
        ] ;
        mdlc:paramLink _:luceneNode .
- mdlc:typeFilterLuceneGeoExp: only passes the SPARQL queries that contain a Lucene GeoSPARQL query. Example:
    _:filterNode mdlc:type mdlc:filterNode ;
        mdlc:paramFilter [
            mdlc:type mdlc:typeFilterLuceneGeoExp
        ] ;
        mdlc:paramLink _:luceneNode .
- mdlc:predicateFilter: only passes, during add/remove/get, the triples with the described predicate(s).
  - Required param: mdlc:typeFilterPredicate <predicates>
  Example, where my:text1, my:text2 and my:text3 are the filtered predicates (you can also specify only 1 or more than 3):
    _:filterNode mdlc:type mdlc:filterNode ;
        mdlc:paramFilter [
            mdlc:type mdlc:predicateFilter ;
            # The filtered predicates
            mdlc:typeFilterPredicate my:text1, my:text2, my:text3
        ] ;
        mdlc:paramLink _:luceneNode .
- mdlc:languageFilter: only passes, during add/remove/get, the triples with a literal of a particular language. For Lucene nodes the mdlc:luceneLang parameter is faster; this filter is mentioned here for custom implementations.
  - Required param: mdlc:languageFilterLang "langs": sets the filtered languages
  - Optional param: mdlc:acceptNoLanguageLiterals []: also lets literals without a language pass
  Example, where "es", "fr" and "it" are the filtered languages (you can also specify only 1 or more than 3):
    _:filterNode mdlc:type mdlc:filterNode ;
        mdlc:paramFilter [
            mdlc:type mdlc:languageFilter ;
            # The filtered languages
            mdlc:languageFilterLang "es", "fr", "it" ;
            # Do we accept literals without any language
            # mdlc:acceptNoLanguageLiterals []
        ] ;
        mdlc:paramLink _:luceneNode .
- mdlc:typeFilter: only passes, during add/remove/get, the triples with a subject of a particular type. The mdlc:multiFilterNode node is faster and better suited to multiple type checks.
  - Required param: mdlc:typeFilterPredicate <is_of_type>: the predicate that defines the type of a subject
  - Required param: mdlc:typeFilterObject <types>: the filtered types
  Example, where my:type1 and my:type2 are the filtered types (you can also specify only 1 or more than 2):
    _:filterNode mdlc:type mdlc:filterNode ;
        mdlc:paramFilter [
            mdlc:type mdlc:typeFilter ;
            # The predicate describing the type for a subject
            mdlc:typeFilterPredicate my:oftype ;
            # The filtered types
            mdlc:typeFilterObject my:type1, my:type2
        ] ;
        mdlc:paramLink _:luceneNode .
Now we can filter our streams, but what if we want to combine multiple filters? qEndpoint has a syntax for that: use the mdlc:paramFilterAnd and mdlc:paramFilterOr predicates inside the mdlc:paramFilter object.
Example 1
_:filterNode mdlc:type mdlc:filterNode ;
    mdlc:paramFilter [
        mdlc:type mdlc:typeFilterLuceneGeoExp ;
        mdlc:paramFilterOr [
            mdlc:type mdlc:typeFilterLuceneExp
        ]
    ] ;
    mdlc:paramLink _:luceneNode .
Here the filter removes every expression that contains neither a GeoSPARQL query nor a full-text search query, so only expressions containing one of the two pass. The mdlc:paramFilterOr object can contain multiple filters, and it accepts the same predicates as the mdlc:paramFilter objects.
Example 2
_:filterNode mdlc:type mdlc:filterNode ;
    mdlc:paramFilter [
        mdlc:type mdlc:typeFilterLuceneExp ;
        mdlc:paramFilterAnd [
            mdlc:type mdlc:predicateFilter ;
            mdlc:typeFilterPredicate my:description
        ]
    ] ;
    mdlc:paramLink _:luceneNode .
In this example, the filter only passes the expressions that contain a full-text search and the triples that have the my:description predicate; everything else is filtered out. It can be used, for example, to index all the descriptions.
The Boolean operators have the same precedence as in most programming languages: "and" binds more tightly than "or".
[] mdlc:paramFilter [
    mdlc:type <FILTER_A> ;
    mdlc:paramFilterAnd [
        mdlc:type <FILTER_B>
    ] ;
    mdlc:paramFilterOr [
        mdlc:type <FILTER_C>
    ]
] .
This little example can be translated to this expression:
(FILTER_A and FILTER_B) or FILTER_C
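As a concrete instance of this pattern, the sketch below substitutes filter types described earlier on this page. It is meant to read as (full-text search query and my:description triples) or GeoSPARQL query. The combination itself and the my:description predicate are illustrative assumptions, and the predicate-object pairs are separated with the standard Turtle ";".

```turtle
_:filterNode mdlc:type mdlc:filterNode ;
    mdlc:paramFilter [
        # FILTER_A: expressions with a full-text search
        mdlc:type mdlc:typeFilterLuceneExp ;
        # and FILTER_B: triples with the my:description predicate
        mdlc:paramFilterAnd [
            mdlc:type mdlc:predicateFilter ;
            mdlc:typeFilterPredicate my:description
        ] ;
        # or FILTER_C: expressions with a GeoSPARQL query
        mdlc:paramFilterOr [
            mdlc:type mdlc:typeFilterLuceneGeoExp
        ]
    ] ;
    mdlc:paramLink _:luceneNode .
```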
// TODO: Explain multi-filter
// TODO: Explain Reindex query
// TODO: Explain chains