-
-
Couldn't load subscription status.
- Fork 2.2k
fix(search): resolve issue when using = operator in search
#7268
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: main
Are you sure you want to change the base?
Conversation
Summary of ChangesHello @perfectra1n, I'm Gemini Code Assist1! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed! This pull request resolves a long-standing issue in the search mechanism where the Highlights
Using Gemini Code AssistThe full guide for Gemini Code Assist can be found on our documentation page, here are some quick tips. Invoking Gemini You can request assistance from Gemini at any point by creating a comment using either
Customization To customize Gemini Code Assist for GitHub experience, repository maintainers can create a configuration file and/or provide a custom code review style guide (such as PEP-8 for Python) by creating and adding files to a Limitations & Feedback Gemini Code Assist may make mistakes. Please leave feedback on any instances where its feedback is incorrect or counter productive. You can react with 👍 and 👎 on @gemini-code-assist comments. If you're interested in giving your feedback about your experience with Gemini Code Assist for Github and other Google products, sign up here. You can also get AI-powered code generation, chat, as well as code reviews directly in the IDE at no cost with the Gemini Code Assist IDE Extension. Footnotes
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Code Review
This pull request aims to fix the exact search operator =. The changes correctly implement exact word matching for single words. However, there's a critical issue with how quoted phrases (multi-word tokens) are handled in exact searches. The current implementation checks if any single word in the content matches the entire phrase, which will almost always fail. It should instead verify that all words from the phrase are present as distinct words in the content. This issue is present in both note_content_fulltext.ts and build_comparator.ts. Additionally, this logic is duplicated in both files and should be centralized. The new e2e and unit tests are very thorough and a great addition, though one e2e test has a weak assertion that should be strengthened.
| /** | ||
| * Checks if content contains the exact word (with word boundaries) | ||
| * This is case-insensitive since content and token are already normalized | ||
| */ | ||
| private containsExactWord(token: string, content: string): boolean { | ||
| // Normalize both for case-insensitive comparison | ||
| const normalizedToken = normalizeSearchText(token); | ||
| const normalizedContent = normalizeSearchText(content); | ||
|
|
||
| // Split content into words and check for exact match | ||
| const words = normalizedContent.split(/\s+/); | ||
| return words.some(word => word === normalizedToken); | ||
| } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The containsExactWord method is implemented incorrectly for multi-word tokens (e.g., from a quoted search like ="exact phrase"). It checks if any single word in the content is equal to the entire token phrase, which will almost always be false. The correct logic should be to split the token into words and verify that all of those words exist in the content.
Additionally, the comment on line 94 is misleading as the function performs normalization. The normalization of content is also redundant since it's already normalized before this function is called.
The logic here is also duplicated in apps/server/src/services/search/services/build_comparator.ts. This should ideally be centralized in a utility function.
/**
* Checks if content contains all words from the given token.
* This is case-insensitive. Assumes content is already normalized.
*/
private containsExactWord(token: string, content: string): boolean {
// content is assumed to be pre-normalized.
const contentWords = new Set(content.split(/\s+/));
const tokenWords = normalizeSearchText(token).split(/\s+/);
return tokenWords.every(word => contentWords.has(word));
}| "=": (comparedValue) => (val) => { | ||
| // For the = operator, check if the value contains the exact word (word-boundary matching) | ||
| // This is case-insensitive since both values are already lowercased | ||
| if (!val) return false; | ||
|
|
||
| const normalizedVal = normalizeSearchText(val); | ||
| const normalizedCompared = normalizeSearchText(comparedValue); | ||
|
|
||
| // Split into words and check for exact match | ||
| const words = normalizedVal.split(/\s+/); | ||
| return words.some(word => word === normalizedCompared); | ||
| }, | ||
| "!=": (comparedValue) => (val) => { | ||
| // Negation of exact word match | ||
| if (!val) return true; | ||
|
|
||
| const normalizedVal = normalizeSearchText(val); | ||
| const normalizedCompared = normalizeSearchText(comparedValue); | ||
|
|
||
| const words = normalizedVal.split(/\s+/); | ||
| return !words.some(word => word === normalizedCompared); | ||
| }, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The comparators for = and != are implemented incorrectly for multi-word values (e.g., from a quoted attribute search like #title="foo bar"). The current logic for = checks if any single word in the attribute value is equal to the entire search phrase, which is incorrect. It should check if all words from the search phrase are present in the attribute value. The != comparator has the inverse issue.
The comment on line 18 is also misleading as normalization is performed within the function. This logic is also duplicated from note_content_fulltext.ts and would benefit from being a shared utility function.
"=": (comparedValue) => (val) => {
// For the = operator, check if the value contains all words from the comparedValue.
// This is case-insensitive.
if (!val) return false;
const valWords = new Set(normalizeSearchText(val).split(/\s+/));
const comparedWords = normalizeSearchText(comparedValue).split(/\s+/);
return comparedWords.every(cw => valWords.has(cw));
},
"!=": (comparedValue) => (val) => {
// Negation of exact word match: check if value does NOT contain all words from comparedValue.
if (!val) return true;
const valWords = new Set(normalizeSearchText(val).split(/\s+/));
const comparedWords = normalizeSearchText(comparedValue).split(/\s+/);
return !comparedWords.every(cw => valWords.has(cw));
},| test.beforeEach(async ({ page, context }) => { | ||
| const app = new App(page, context); | ||
| await app.goto(); | ||
|
|
||
| // Get CSRF token | ||
| csrfToken = await page.evaluate(() => { | ||
| return (window as any).glob.csrfToken; | ||
| }); | ||
|
|
||
| expect(csrfToken).toBeTruthy(); | ||
|
|
||
| // Create test notes with specific content patterns | ||
| // Note 1: Contains exactly "pagio" in title | ||
| const note1 = await page.request.post(`${BASE_URL}/api/notes/root/children?target=into&targetBranchId=`, { | ||
| headers: { "x-csrf-token": csrfToken }, | ||
| data: { | ||
| title: "Test Note with pagio", | ||
| content: "This note contains the word pagio in the content.", | ||
| type: "text" | ||
| } | ||
| }); | ||
| expect(note1.ok()).toBeTruthy(); | ||
| const note1Data = await note1.json(); | ||
| createdNoteIds.push(note1Data.note.noteId); | ||
|
|
||
| // Note 2: Contains "page" (not exact match) | ||
| const note2 = await page.request.post(`${BASE_URL}/api/notes/root/children?target=into&targetBranchId=`, { | ||
| headers: { "x-csrf-token": csrfToken }, | ||
| data: { | ||
| title: "Test Note with page", | ||
| content: "This note contains the word page in the content.", | ||
| type: "text" | ||
| } | ||
| }); | ||
| expect(note2.ok()).toBeTruthy(); | ||
| const note2Data = await note2.json(); | ||
| createdNoteIds.push(note2Data.note.noteId); | ||
|
|
||
| // Note 3: Contains "pages" (plural, not exact match) | ||
| const note3 = await page.request.post(`${BASE_URL}/api/notes/root/children?target=into&targetBranchId=`, { | ||
| headers: { "x-csrf-token": csrfToken }, | ||
| data: { | ||
| title: "Test Note with pages", | ||
| content: "This note contains the word pages in the content.", | ||
| type: "text" | ||
| } | ||
| }); | ||
| expect(note3.ok()).toBeTruthy(); | ||
| const note3Data = await note3.json(); | ||
| createdNoteIds.push(note3Data.note.noteId); | ||
|
|
||
| // Note 4: Contains "homepage" (contains "page", not exact match) | ||
| const note4 = await page.request.post(`${BASE_URL}/api/notes/root/children?target=into&targetBranchId=`, { | ||
| headers: { "x-csrf-token": csrfToken }, | ||
| data: { | ||
| title: "Homepage Note", | ||
| content: "This note is about homepage content.", | ||
| type: "text" | ||
| } | ||
| }); | ||
| expect(note4.ok()).toBeTruthy(); | ||
| const note4Data = await note4.json(); | ||
| createdNoteIds.push(note4Data.note.noteId); | ||
|
|
||
| // Note 5: Another note with exact "pagio" in content | ||
| const note5 = await page.request.post(`${BASE_URL}/api/notes/root/children?target=into&targetBranchId=`, { | ||
| headers: { "x-csrf-token": csrfToken }, | ||
| data: { | ||
| title: "Another pagio Note", | ||
| content: "This is another note with pagio content for testing exact matches.", | ||
| type: "text" | ||
| } | ||
| }); | ||
| expect(note5.ok()).toBeTruthy(); | ||
| const note5Data = await note5.json(); | ||
| createdNoteIds.push(note5Data.note.noteId); | ||
|
|
||
| // Note 6: Contains "pagio" in title only | ||
| const note6 = await page.request.post(`${BASE_URL}/api/notes/root/children?target=into&targetBranchId=`, { | ||
| headers: { "x-csrf-token": csrfToken }, | ||
| data: { | ||
| title: "pagio", | ||
| content: "This note has pagio as the title.", | ||
| type: "text" | ||
| } | ||
| }); | ||
| expect(note6.ok()).toBeTruthy(); | ||
| const note6Data = await note6.json(); | ||
| createdNoteIds.push(note6Data.note.noteId); | ||
|
|
||
| // Wait a bit for indexing | ||
| await page.waitForTimeout(500); | ||
| }); | ||
|
|
||
| test.afterEach(async ({ page }) => { | ||
| // Clean up created notes | ||
| for (const noteId of createdNoteIds) { | ||
| try { | ||
| const taskId = `cleanup-${Math.random().toString(36).substr(2, 9)}`; | ||
| await page.request.delete(`${BASE_URL}/api/notes/${noteId}?taskId=${taskId}&last=true`, { | ||
| headers: { "x-csrf-token": csrfToken } | ||
| }); | ||
| } catch (e) { | ||
| console.error(`Failed to delete note ${noteId}:`, e); | ||
| } | ||
| } | ||
| createdNoteIds = []; | ||
| }); | ||
|
|
||
| test("Quick search without = operator returns all partial matches", async ({ page }) => { | ||
| // Test the /quick-search endpoint without the = operator | ||
| const response = await page.request.get(`${BASE_URL}/api/quick-search/pag`, { | ||
| headers: { "x-csrf-token": csrfToken } | ||
| }); | ||
|
|
||
| expect(response.ok()).toBeTruthy(); | ||
| const data = await response.json(); | ||
|
|
||
| // Should return multiple notes including "page", "pages", "homepage" | ||
| expect(data.searchResultNoteIds).toBeDefined(); | ||
| expect(data.searchResults).toBeDefined(); | ||
|
|
||
| // Filter to only our test notes | ||
| const testResults = data.searchResults.filter((result: any) => | ||
| result.noteTitle.includes("page") || | ||
| result.noteTitle.includes("pagio") || | ||
| result.noteTitle.includes("Homepage") | ||
| ); | ||
|
|
||
| // Should find at least "page", "pages", "homepage", and "pagio" notes | ||
| expect(testResults.length).toBeGreaterThanOrEqual(4); | ||
|
|
||
| console.log("Quick search 'pag' found:", testResults.length, "matching notes"); | ||
| console.log("Note titles:", testResults.map((r: any) => r.noteTitle)); | ||
| }); | ||
|
|
||
| test("Quick search with = operator returns only exact matches", async ({ page }) => { | ||
| // Test the /quick-search endpoint WITH the = operator | ||
| const response = await page.request.get(`${BASE_URL}/api/quick-search/=pagio`, { | ||
| headers: { "x-csrf-token": csrfToken } | ||
| }); | ||
|
|
||
| expect(response.ok()).toBeTruthy(); | ||
| const data = await response.json(); | ||
|
|
||
| // Should return only notes with exact "pagio" match | ||
| expect(data.searchResultNoteIds).toBeDefined(); | ||
| expect(data.searchResults).toBeDefined(); | ||
|
|
||
| // Filter to only our test notes | ||
| const testResults = data.searchResults.filter((result: any) => | ||
| createdNoteIds.includes(result.notePath.split("/").pop() || "") | ||
| ); | ||
|
|
||
| console.log("Quick search '=pagio' found:", testResults.length, "matching notes"); | ||
| console.log("Note titles:", testResults.map((r: any) => r.noteTitle)); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The test file has some opportunities for improvement in terms of maintainability and type safety:
- Code Duplication: The
beforeEachblock (lines 18-110) has a lot of duplicated code for creating notes. This could be refactored into a helper function to make the setup cleaner and easier to maintain. - Type Safety: The
filtercallbacks useanyfor theresultparameter (e.g., line 141, 168). Defining an interface for the search result object would improve type safety. - Debugging Logs: There are several
console.logstatements (e.g., lines 150-151, 172-173) which are useful for debugging but should be removed from the final test code to keep the test output clean.
|
|
||
| // Should find the note with exact "test" match, but not "testing" | ||
| // Note: This test may fail if the implementation doesn't properly handle exact matching in content | ||
| expect(ourTestNotes.length).toBeGreaterThan(0); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The assertion expect(ourTestNotes.length).toBeGreaterThan(0); is too lenient for this test case. To properly verify that an exact search for =test does not match "testing", the assertion should be more specific and check that exactly one note is found.
| expect(ourTestNotes.length).toBeGreaterThan(0); | |
| expect(ourTestNotes.length).toBe(1); |
|
@perfectra1n search with |
|
@contributor can you provide more details? Was the search term |
|
Commit 4fa4112 The search term The search term |
=textreturns matches that only include "text"Phrase example:

Word example (with working highlighting!)
