
Conversation


@sampaccoud sampaccoud commented Aug 7, 2025

Purpose

We want to add fulltext (and semantic in a second phase) search to Docs.

The goal is to enable efficient and scalable search across document content by pushing relevant data to a dedicated search backend, such as OpenSearch. The backend should be pluggable.

Proposal

  • Add indexing logic in a search indexer that can be declared as a backend (a minimal sketch follows at the end of this description)
  • Implement indexing for the Find backend. See corresponding PR in Find
  • Implement search views as a proxy
  • Implement triggers to update the search index when a document or its accesses change. Synchronization should be done asynchronously, as changing a document or its accesses affects all its descendants...

Fixes #322
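
A minimal sketch of the pluggable indexer idea, assuming a hypothetical SEARCH_INDEXER_CLASS setting and a BaseDocumentIndexer interface (names are illustrative, not the final API):

# Hypothetical sketch: the indexer backend is declared in settings and loaded dynamically.
from django.conf import settings
from django.utils.module_loading import import_string


class BaseDocumentIndexer:
    """Interface every search backend (OpenSearch, "Find", ...) would implement."""

    def push(self, data):
        """Send a batch of serialized documents to the search backend."""
        raise NotImplementedError

    def search(self, query, user):
        """Return the documents matching `query` that `user` can access."""
        raise NotImplementedError


def get_document_indexer():
    """Instantiate the indexer class declared in settings, or None when search is disabled."""
    dotted_path = getattr(settings, "SEARCH_INDEXER_CLASS", None)
    if not dotted_path:
        return None
    return import_string(dotted_path)()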

@sampaccoud sampaccoud requested a review from joehybird August 7, 2025 16:40
@sampaccoud sampaccoud added feature add a new feature backend labels Aug 7, 2025
@joehybird joehybird force-pushed the index-to-search branch 3 times, most recently from 10bfd94 to 5bd6b18 on September 8, 2025 12:38

gitguardian bot commented Sep 8, 2025

✅ There are no secrets present in this pull request anymore.

If these secrets were true positives and are still valid, we highly recommend that you revoke them.
While these secrets were previously flagged, we no longer have a reference to the specific commits where they were detected. Once a secret has been leaked into a git repository, you should consider it compromised, even if it was deleted immediately. Find more information about risks here.


🦉 GitGuardian detects secrets in your source code to help developers and security teams secure the modern development process. You are seeing this because you or someone else with access to this repository has authorized GitGuardian to scan your pull request.

Search in Docs relies on an external project like "La Suite Find".
We need to declare a common external network in order to connect to
the search app and index our documents.
We need to add content to our demo documents so that we can test
indexing.
Add an indexer that loops over the documents in the database, formats them
as JSON objects and indexes them in the remote "Find" micro-service.
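
A rough sketch of what that loop could look like, assuming a hypothetical serialize_document() helper, an accesses relation on Document, and a SEARCH_INDEXER_INDEX_URL setting (all names are illustrative):

import requests
from django.conf import settings

from core import models  # assumed app layout


def serialize_document(document):
    """Format a document as a JSON-serializable dict for the remote "Find" micro-service (illustrative fields)."""
    return {
        "_id": str(document.pk),
        "title": document.title,
        # Assumption: a plain-text version of the body is available on the model.
        "content": getattr(document, "plain_text", ""),
        # Users allowed to see the document, so "Find" can filter results per user.
        "users": [str(access.user_id) for access in document.accesses.all()],
    }


def index_documents(documents, timeout=10):
    """Push a batch of serialized documents to the remote indexing endpoint of "Find"."""
    payload = [serialize_document(document) for document in documents]
    response = requests.post(settings.SEARCH_INDEXER_INDEX_URL, json=payload, timeout=timeout)
    response.raise_for_status()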

@qbey qbey left a comment


First review, I know work is still ongoing and I did not read all the tests... :)

Comment on lines 829 to 837
    q = serializers.CharField(required=True)

    def validate_q(self, value):
        """Ensure the text field is not empty."""

        if len(value.strip()) == 0:
            raise serializers.ValidationError("Text field cannot be empty.")

        return value

Suggested change
-    q = serializers.CharField(required=True)
-    def validate_q(self, value):
-        """Ensure the text field is not empty."""
-        if len(value.strip()) == 0:
-            raise serializers.ValidationError("Text field cannot be empty.")
-        return value
+    q = serializers.CharField(required=True, allow_blank=False)

You may also add trim_whitespace=True
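
For reference, a minimal sketch of the field with both options (note that trim_whitespace is already True by default on DRF's CharField, so blank and whitespace-only values are both rejected):

from rest_framework import serializers


class SearchQuerySerializer(serializers.Serializer):
    # Illustrative serializer name. A whitespace-only "q" is trimmed to "" and then
    # rejected by allow_blank=False, so no custom validate_q is needed.
    q = serializers.CharField(required=True, allow_blank=False, trim_whitespace=True)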

        serializer.is_valid(raise_exception=True)

        try:
            indexer = FindDocumentIndexer()

I guess this class should come from settings, because not everyone will have an indexer. I also think this view might fall back to searching locally on the title if no indexer is configured.
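
Something along these lines, assuming a hypothetical SEARCH_INDEXER_CLASS setting and a plain title filter as the local fallback (access filtering omitted for brevity):

from django.conf import settings
from django.utils.module_loading import import_string

from core import models  # assumed app layout


def search_documents(user, query):
    """Use the configured indexer when available, otherwise fall back to a local title search."""
    indexer_path = getattr(settings, "SEARCH_INDEXER_CLASS", None)
    if indexer_path:
        indexer = import_string(indexer_path)()
        return indexer.search(query, user=user)
    # No indexer configured: degrade to a simple title lookup on documents.
    return models.Document.objects.filter(title__icontains=query)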

        url = getattr(settings, "SEARCH_INDEXER_QUERY_URL", None)

        if not url:
            raise RuntimeError(

Suggested change
-            raise RuntimeError(
+            raise ImproperlyConfigured(

        Returns:
            dict: A JSON-serializable dictionary.
        """
        url = getattr(settings, "SEARCH_INDEXER_QUERY_URL", None)

Suggested change
-        url = getattr(settings, "SEARCH_INDEXER_QUERY_URL", None)
+        url = settings.SEARCH_INDEXER_QUERY_URL

Comment on lines 224 to 225
            logger.error("HTTPError: %s", e)
            logger.error("Response content: %s", response.text)  # type: ignore

Log the error only once
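
For instance, a single entry carrying both pieces of information (sketch):

        except requests.HTTPError as error:
            # One log entry with both the exception and the response body.
            logger.error("HTTPError: %s - response content: %s", error, error.response.text)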

@@ -1169,6 +1185,15 @@ def get_abilities(self, user):
        }


@receiver(signals.post_save, sender=DocumentAccess)

We try to follow "Use signals as a last resort" (Two Scoops of Django): is there a problem with calling this from the save method?
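
A structural sketch of the signal-free alternative, assuming a hypothetical enqueue_document_indexing() helper that schedules the async job (the base class name is also illustrative, so this is not runnable as-is):

class DocumentAccess(BaseModel):
    ...

    def save(self, *args, **kwargs):
        """Persist the access, then schedule reindexing of the affected document."""
        super().save(*args, **kwargs)
        # Explicit call instead of a post_save signal; hypothetical task trigger.
        enqueue_document_indexing(self.document_id)

    def delete(self, *args, **kwargs):
        """Remove the access and reindex so search results reflect the change."""
        document_id = self.document_id
        super().delete(*args, **kwargs)
        enqueue_document_indexing(document_id)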

"""
return models.Document.objects.filter(pk__in=[d["_id"] for d in data])

def push(self, data):

Same comments as before in this method

Comment on lines 37 to 38
    def sortkey(d):
        return d["id"]

    push_call_args = [call.args[0] for call in mock_push.call_args_list]

    assert len(push_call_args) == 1  # called once but with a batch of docs
    assert sorted(push_call_args[0], key=sortkey) == sorted(

Interesting, actually I think the document sorting should be deterministic, in case we need to run the index command several times => I think we should change the index command to sort documents by creation date or something ^^

We can keep it this way for now, but we surely need to add a comment in the "index" management command.


The sort is by id because the indexing is done in batches: loop + id__gt=prev_batch_last_id
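
i.e. roughly this kind of keyset pagination (sketch with illustrative names):

from core import models  # assumed app layout


def iter_document_batches(batch_size=500):
    """Yield documents in deterministic id order, batch by batch (id__gt keyset pagination)."""
    last_id = None
    while True:
        queryset = models.Document.objects.order_by("id")
        if last_id is not None:
            queryset = queryset.filter(id__gt=last_id)
        batch = list(queryset[:batch_size])
        if not batch:
            break
        yield batch
        last_id = batch[-1].id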

Comment on lines 43 to 45
    push_call_args = [call.args[0] for call in mock_push.call_args_list]

    assert len(push_call_args) == 1  # called once but with a batch of docs

Don't you want to simply check with assert_called_once, then use the first value?
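
i.e. something like this (sketch; expected_documents stands for whatever the test builds):

    mock_push.assert_called_once()
    pushed_documents = mock_push.call_args.args[0]
    assert sorted(pushed_documents, key=sortkey) == sorted(expected_documents, key=sortkey)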

@joehybird joehybird force-pushed the index-to-search branch 2 times, most recently from 42e69af to 41f4967 on September 11, 2025 13:32
sampaccoud and others added 4 commits September 11, 2025 15:39
On document content or permission changes, start a Celery job that calls the
indexing API of the "Find" app (see the sketch after this commit list).

Signed-off-by: Fabre Florian <ffabre@hybird.org>
Signed-off-by: Fabre Florian <ffabre@hybird.org>
Signed-off-by: Fabre Florian <ffabre@hybird.org>
New API view that calls the indexed-document search view
(resource server) of the "Find" app.

Signed-off-by: Fabre Florian <ffabre@hybird.org>
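
A minimal sketch of such a Celery task, reusing the serialization helper sketched earlier; the descendants() call and module layout are assumptions, not the PR's actual code:

import requests
from celery import shared_task
from django.conf import settings


@shared_task(bind=True, max_retries=3)
def index_document_tree(self, document_id):
    """Reindex a document and its descendants in "Find" after a content or access change."""
    from core import models  # assumed app layout; imported lazily inside the task

    document = models.Document.objects.get(pk=document_id)
    # descendants() is hypothetical shorthand for walking the document tree.
    documents = [document, *document.descendants()]
    payload = [serialize_document(doc) for doc in documents]  # helper from the earlier sketch
    try:
        response = requests.post(settings.SEARCH_INDEXER_INDEX_URL, json=payload, timeout=10)
        response.raise_for_status()
    except requests.RequestException as exc:
        # Retry later so a transient "Find" outage does not lose the update.
        raise self.retry(exc=exc, countdown=60)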
@joehybird joehybird force-pushed the index-to-search branch 4 times, most recently from 0290e02 to 2acbaa0 on September 12, 2025 05:27
New SEARCH_INDEXER_CLASS setting to define the indexer service class.
Raise ImproperlyConfigured errors instead of RuntimeError in the index service.

Signed-off-by: Fabre Florian <ffabre@hybird.org>
@joehybird joehybird force-pushed the index-to-search branch 3 times, most recently from 2058095 to 6e9c6ec on September 12, 2025 12:22
Signed-off-by: Fabre Florian <ffabre@hybird.org>