Skip to content

Conversation

@perfectra1n
Copy link
Member

@perfectra1n perfectra1n commented Oct 10, 2025

  • make sure that =text returns matches that only include "text"
  • create additional e2e and unit tests to validate change

Phrase example:
image

Word example (with working highlighting!)
image

@dosubot dosubot bot added the size:XL This PR changes 500-999 lines, ignoring generated files. label Oct 10, 2025
@gemini-code-assist
Copy link
Contributor

Summary of Changes

Hello @perfectra1n, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request resolves a long-standing issue in the search mechanism where the = operator failed to provide true exact word matches, often returning partial or substring results. The changes introduce a precise word-boundary matching algorithm, enhancing the accuracy and reliability of exact searches. This ensures that users can confidently use the = operator to find specific terms without irrelevant noise, backed by a robust suite of new and updated tests.

Highlights

  • Exact Search Fix: The core search functionality has been updated to correctly handle the = operator for exact word matching, ensuring that queries like =text only return results that contain 'text' as a whole word, rather than partial matches.
  • New containsExactWord Method: A new private utility method, containsExactWord, was introduced in note_content_fulltext.ts to perform case-insensitive, word-boundary-aware matching, which is now leveraged by the = and != search operators.
  • Updated Comparator Logic: The build_comparator.ts service now uses the new exact word matching logic for the = and != string comparators, ensuring consistent behavior across different search contexts.
  • Comprehensive Test Coverage: Extensive new E2E tests (exact_search.spec.ts) and expanded unit tests (search.spec.ts) have been added to validate the exact search functionality, covering scenarios such as partial matches, exact matches, case insensitivity, and handling of special characters.
Using Gemini Code Assist

The full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips.

Invoking Gemini

You can request assistance from Gemini at any point by creating a comment using either /gemini <command> or @gemini-code-assist <command>. Below is a summary of the supported commands on the current page.

Feature Command Description
Code Review /gemini review Performs a code review for the current pull request in its current state.
Pull Request Summary /gemini summary Provides a summary of the current pull request in its current state.
Comment @gemini-code-assist Responds in comments when explicitly tagged, both in pull request comments and review comments.
Help /gemini help Displays a list of available commands.

Customization

To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a .gemini/ folder in the base of the repository. Detailed instructions can be found here.

Limitations & Feedback

Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here.

You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

Copy link
Contributor

@gemini-code-assist gemini-code-assist bot left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Code Review

This pull request aims to fix the exact search operator =. The changes correctly implement exact word matching for single words. However, there's a critical issue with how quoted phrases (multi-word tokens) are handled in exact searches. The current implementation checks if any single word in the content matches the entire phrase, which will almost always fail. It should instead verify that all words from the phrase are present as distinct words in the content. This issue is present in both note_content_fulltext.ts and build_comparator.ts. Additionally, this logic is duplicated in both files and should be centralized. The new e2e and unit tests are very thorough and a great addition, though one e2e test has a weak assertion that should be strengthened.

Comment on lines 92 to 104
/**
* Checks if content contains the exact word (with word boundaries)
* This is case-insensitive since content and token are already normalized
*/
private containsExactWord(token: string, content: string): boolean {
// Normalize both for case-insensitive comparison
const normalizedToken = normalizeSearchText(token);
const normalizedContent = normalizeSearchText(content);

// Split content into words and check for exact match
const words = normalizedContent.split(/\s+/);
return words.some(word => word === normalizedToken);
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The containsExactWord method is implemented incorrectly for multi-word tokens (e.g., from a quoted search like ="exact phrase"). It checks if any single word in the content is equal to the entire token phrase, which will almost always be false. The correct logic should be to split the token into words and verify that all of those words exist in the content.

Additionally, the comment on line 94 is misleading as the function performs normalization. The normalization of content is also redundant since it's already normalized before this function is called.

The logic here is also duplicated in apps/server/src/services/search/services/build_comparator.ts. This should ideally be centralized in a utility function.

    /**
     * Checks if content contains all words from the given token.
     * This is case-insensitive. Assumes content is already normalized.
     */
    private containsExactWord(token: string, content: string): boolean {
        // content is assumed to be pre-normalized.
        const contentWords = new Set(content.split(/\s+/));
        const tokenWords = normalizeSearchText(token).split(/\s+/);

        return tokenWords.every(word => contentWords.has(word));
    }

Comment on lines 16 to 37
"=": (comparedValue) => (val) => {
// For the = operator, check if the value contains the exact word (word-boundary matching)
// This is case-insensitive since both values are already lowercased
if (!val) return false;

const normalizedVal = normalizeSearchText(val);
const normalizedCompared = normalizeSearchText(comparedValue);

// Split into words and check for exact match
const words = normalizedVal.split(/\s+/);
return words.some(word => word === normalizedCompared);
},
"!=": (comparedValue) => (val) => {
// Negation of exact word match
if (!val) return true;

const normalizedVal = normalizeSearchText(val);
const normalizedCompared = normalizeSearchText(comparedValue);

const words = normalizedVal.split(/\s+/);
return !words.some(word => word === normalizedCompared);
},
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

critical

The comparators for = and != are implemented incorrectly for multi-word values (e.g., from a quoted attribute search like #title="foo bar"). The current logic for = checks if any single word in the attribute value is equal to the entire search phrase, which is incorrect. It should check if all words from the search phrase are present in the attribute value. The != comparator has the inverse issue.

The comment on line 18 is also misleading as normalization is performed within the function. This logic is also duplicated from note_content_fulltext.ts and would benefit from being a shared utility function.

    "=": (comparedValue) => (val) => {
        // For the = operator, check if the value contains all words from the comparedValue.
        // This is case-insensitive.
        if (!val) return false;

        const valWords = new Set(normalizeSearchText(val).split(/\s+/));
        const comparedWords = normalizeSearchText(comparedValue).split(/\s+/);

        return comparedWords.every(cw => valWords.has(cw));
    },
    "!=": (comparedValue) => (val) => {
        // Negation of exact word match: check if value does NOT contain all words from comparedValue.
        if (!val) return true;

        const valWords = new Set(normalizeSearchText(val).split(/\s+/));
        const comparedWords = normalizeSearchText(comparedValue).split(/\s+/);

        return !comparedWords.every(cw => valWords.has(cw));
    },

Comment on lines +18 to +173
test.beforeEach(async ({ page, context }) => {
const app = new App(page, context);
await app.goto();

// Get CSRF token
csrfToken = await page.evaluate(() => {
return (window as any).glob.csrfToken;
});

expect(csrfToken).toBeTruthy();

// Create test notes with specific content patterns
// Note 1: Contains exactly "pagio" in title
const note1 = await page.request.post(`${BASE_URL}/api/notes/root/children?target=into&targetBranchId=`, {
headers: { "x-csrf-token": csrfToken },
data: {
title: "Test Note with pagio",
content: "This note contains the word pagio in the content.",
type: "text"
}
});
expect(note1.ok()).toBeTruthy();
const note1Data = await note1.json();
createdNoteIds.push(note1Data.note.noteId);

// Note 2: Contains "page" (not exact match)
const note2 = await page.request.post(`${BASE_URL}/api/notes/root/children?target=into&targetBranchId=`, {
headers: { "x-csrf-token": csrfToken },
data: {
title: "Test Note with page",
content: "This note contains the word page in the content.",
type: "text"
}
});
expect(note2.ok()).toBeTruthy();
const note2Data = await note2.json();
createdNoteIds.push(note2Data.note.noteId);

// Note 3: Contains "pages" (plural, not exact match)
const note3 = await page.request.post(`${BASE_URL}/api/notes/root/children?target=into&targetBranchId=`, {
headers: { "x-csrf-token": csrfToken },
data: {
title: "Test Note with pages",
content: "This note contains the word pages in the content.",
type: "text"
}
});
expect(note3.ok()).toBeTruthy();
const note3Data = await note3.json();
createdNoteIds.push(note3Data.note.noteId);

// Note 4: Contains "homepage" (contains "page", not exact match)
const note4 = await page.request.post(`${BASE_URL}/api/notes/root/children?target=into&targetBranchId=`, {
headers: { "x-csrf-token": csrfToken },
data: {
title: "Homepage Note",
content: "This note is about homepage content.",
type: "text"
}
});
expect(note4.ok()).toBeTruthy();
const note4Data = await note4.json();
createdNoteIds.push(note4Data.note.noteId);

// Note 5: Another note with exact "pagio" in content
const note5 = await page.request.post(`${BASE_URL}/api/notes/root/children?target=into&targetBranchId=`, {
headers: { "x-csrf-token": csrfToken },
data: {
title: "Another pagio Note",
content: "This is another note with pagio content for testing exact matches.",
type: "text"
}
});
expect(note5.ok()).toBeTruthy();
const note5Data = await note5.json();
createdNoteIds.push(note5Data.note.noteId);

// Note 6: Contains "pagio" in title only
const note6 = await page.request.post(`${BASE_URL}/api/notes/root/children?target=into&targetBranchId=`, {
headers: { "x-csrf-token": csrfToken },
data: {
title: "pagio",
content: "This note has pagio as the title.",
type: "text"
}
});
expect(note6.ok()).toBeTruthy();
const note6Data = await note6.json();
createdNoteIds.push(note6Data.note.noteId);

// Wait a bit for indexing
await page.waitForTimeout(500);
});

test.afterEach(async ({ page }) => {
// Clean up created notes
for (const noteId of createdNoteIds) {
try {
const taskId = `cleanup-${Math.random().toString(36).substr(2, 9)}`;
await page.request.delete(`${BASE_URL}/api/notes/${noteId}?taskId=${taskId}&last=true`, {
headers: { "x-csrf-token": csrfToken }
});
} catch (e) {
console.error(`Failed to delete note ${noteId}:`, e);
}
}
createdNoteIds = [];
});

test("Quick search without = operator returns all partial matches", async ({ page }) => {
// Test the /quick-search endpoint without the = operator
const response = await page.request.get(`${BASE_URL}/api/quick-search/pag`, {
headers: { "x-csrf-token": csrfToken }
});

expect(response.ok()).toBeTruthy();
const data = await response.json();

// Should return multiple notes including "page", "pages", "homepage"
expect(data.searchResultNoteIds).toBeDefined();
expect(data.searchResults).toBeDefined();

// Filter to only our test notes
const testResults = data.searchResults.filter((result: any) =>
result.noteTitle.includes("page") ||
result.noteTitle.includes("pagio") ||
result.noteTitle.includes("Homepage")
);

// Should find at least "page", "pages", "homepage", and "pagio" notes
expect(testResults.length).toBeGreaterThanOrEqual(4);

console.log("Quick search 'pag' found:", testResults.length, "matching notes");
console.log("Note titles:", testResults.map((r: any) => r.noteTitle));
});

test("Quick search with = operator returns only exact matches", async ({ page }) => {
// Test the /quick-search endpoint WITH the = operator
const response = await page.request.get(`${BASE_URL}/api/quick-search/=pagio`, {
headers: { "x-csrf-token": csrfToken }
});

expect(response.ok()).toBeTruthy();
const data = await response.json();

// Should return only notes with exact "pagio" match
expect(data.searchResultNoteIds).toBeDefined();
expect(data.searchResults).toBeDefined();

// Filter to only our test notes
const testResults = data.searchResults.filter((result: any) =>
createdNoteIds.includes(result.notePath.split("/").pop() || "")
);

console.log("Quick search '=pagio' found:", testResults.length, "matching notes");
console.log("Note titles:", testResults.map((r: any) => r.noteTitle));
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The test file has some opportunities for improvement in terms of maintainability and type safety:

  • Code Duplication: The beforeEach block (lines 18-110) has a lot of duplicated code for creating notes. This could be refactored into a helper function to make the setup cleaner and easier to maintain.
  • Type Safety: The filter callbacks use any for the result parameter (e.g., line 141, 168). Defining an interface for the search result object would improve type safety.
  • Debugging Logs: There are several console.log statements (e.g., lines 150-151, 172-173) which are useful for debugging but should be removed from the final test code to keep the test output clean.


// Should find the note with exact "test" match, but not "testing"
// Note: This test may fail if the implementation doesn't properly handle exact matching in content
expect(ourTestNotes.length).toBeGreaterThan(0);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

medium

The assertion expect(ourTestNotes.length).toBeGreaterThan(0); is too lenient for this test case. To properly verify that an exact search for =test does not match "testing", the assertion should be more specific and check that exactly one note is found.

Suggested change
expect(ourTestNotes.length).toBeGreaterThan(0);
expect(ourTestNotes.length).toBe(1);

@contributor
Copy link
Contributor

contributor commented Oct 21, 2025

@perfectra1n search with = operator only shows one note for me in this PR. But actually there are multiple notes in db.

@perfectra1n
Copy link
Member Author

@contributor can you provide more details? Was the search term ='test' or was it ='test phrase'? How many notes would you expect for it to have found, were they all text notes?

@contributor
Copy link
Contributor

@perfectra1n

Commit 4fa4112

The search term =pagio and ='pagio' returns the same (incorrect) result - only one note.
I have 45 notes with word pagio.

The search term pagio return 114 notes - this includes false positives due to fuzzy search (the first 45 of them are correct notes).

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

size:XL This PR changes 500-999 lines, ignoring generated files.

Projects

None yet

Development

Successfully merging this pull request may close these issues.

2 participants