azaddhirajkumar
Contributor

No description provided.

azaddhirajkumar requested a review from a team as a code owner on October 14, 2025, 11:37

Summary of Changes

Hello @azaddhirajkumar, I'm Gemini Code Assist[1]! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new, in-depth tutorial designed to help developers build semantic search applications. It focuses on integrating Microsoft's Semantic Kernel with Couchbase's robust vector search features via the Couchbase .NET Vector Store Connector. The tutorial covers the entire process, from setting up the development environment and defining data models to generating embeddings with OpenAI, ingesting data, and executing various vector search queries. It also provides extensive information on configuring different types of Couchbase vector indexes, offering a practical guide for leveraging AI-powered search capabilities.

Highlights

  • New Tutorial Added: A comprehensive tutorial has been added, guiding users through building vector search applications using the Couchbase .NET Semantic Kernel Connector and OpenAI.
  • Couchbase Integration with Semantic Kernel: The tutorial demonstrates seamless integration of Microsoft Semantic Kernel with Couchbase's vector search capabilities, covering BHIVE, Composite, and FTS index types.
  • Embedding Generation and Storage: It explains how to generate text embeddings using OpenAI's text-embedding-ada-002 model and efficiently store them within Couchbase.
  • Vector Search Operations: The tutorial details both pure and filtered vector search queries, illustrating their underlying translation to SQL++ with ANN_DISTANCE and WHERE clauses.
  • Advanced Index Configuration: Detailed guidance is provided on configuring Couchbase vector indexes, including BHIVE, Composite, and FTS, along with explanations of parameters like centroids and quantization.

Footnotes

  1. Review the Privacy Notices, Generative AI Prohibited Use Policy, Terms of Service, and learn how to configure Gemini Code Assist in GitHub here. Gemini can make mistakes, so double check it and use code with caution.

gemini-code-assist bot left a comment

Code Review

This pull request adds a new tutorial for using the Semantic Kernel with Couchbase. The tutorial is comprehensive and well-structured. However, I've found several issues that need to be addressed before merging. There are critical errors in the provided JSON and SQL++ code snippets (missing commas, trailing commas) that will prevent them from working. Additionally, some links point to temporary or internal resources (a feature branch and a test documentation server), which should be updated to stable, public URLs. There are also some invalid tags in the frontmatter that will likely fail validation, and a section on embedding generation is potentially confusing. I've left specific comments with suggestions for each of these points.

Comment on lines +84 to +91

```json
"Couchbase": {
  "ConnectionString": "couchbase://localhost",
  "Username": "Administrator",
  "Password": "your-password",
  "BucketName": "demo",
  "ScopeName": "semantic-kernel",
  "CollectionName": "glossary"
}
```

critical

The JSON snippet for the Couchbase configuration is invalid. It is missing commas (,) between the key-value pairs. This will cause a JSON parsing error when a user tries to copy and use this configuration.

Suggested change

```json
"Couchbase": {
  "ConnectionString": "couchbase://localhost",
  "Username": "Administrator",
  "Password": "your-password",
  "BucketName": "demo",
  "ScopeName": "semantic-kernel",
  "CollectionName": "glossary"
}
```

```json
"Couchbase": {
  "ConnectionString": "couchbase://localhost",
  "Username": "Administrator",
  "Password": "your-password",
  "BucketName": "demo",
  "ScopeName": "semantic-kernel",
  "CollectionName": "glossary"
}
```

This is valid

Comment on lines +187 to +191

```sql
USING GSI WITH {
  "dimension": 1536,
  "similarity": "cosine",
  "description": "IVF,SQ8"
}
```

critical

The JSON object in the WITH clause of the CREATE VECTOR INDEX statement is invalid due to a trailing comma after the "similarity": "cosine" line. This will cause a SQL++ syntax error when executing the statement.

Suggested change

```sql
USING GSI WITH {
  "dimension": 1536,
  "similarity": "cosine",
  "description": "IVF,SQ8"
}
```

```sql
USING GSI WITH {
  "dimension": 1536,
  "similarity": "cosine",
  "description": "IVF,SQ8"
}
```
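
Both "critical" findings above are about syntax, so they can be checked mechanically. The following is a minimal C# sketch (System.Text.Json only; the raw string literals need C# 11, i.e. .NET 7 or later) that parses the two quoted snippets with the parser's default, strict options, under which trailing commas are rejected. It rests on two assumptions: the appsettings excerpt is a property of the file's root object, so it is wrapped in braces before parsing, and the WITH-clause object is parsed as plain JSON, which only approximates SQL++ object syntax. `IsValidJson` is a hypothetical helper, not part of the tutorial or the connector.

```csharp
using System;
using System.Text.Json;

// Hypothetical helper: true when the text is one complete JSON document under the
// default (strict) JsonDocumentOptions, which reject trailing commas and comments.
static bool IsValidJson(string text)
{
    try
    {
        using JsonDocument doc = JsonDocument.Parse(text);
        return true;
    }
    catch (JsonException)
    {
        return false;
    }
}

// Lines +84 to +91 as quoted in the review. Assumption: in appsettings.Development.json
// this property sits inside the file's root object.
const string couchbaseSection = """
    "Couchbase": {
      "ConnectionString": "couchbase://localhost",
      "Username": "Administrator",
      "Password": "your-password",
      "BucketName": "demo",
      "ScopeName": "semantic-kernel",
      "CollectionName": "glossary"
    }
    """;

// Object from the WITH clause (lines +187 to +191), parsed as plain JSON.
const string withClause = """
    {
      "dimension": 1536,
      "similarity": "cosine",
      "description": "IVF,SQ8"
    }
    """;

bool bareSectionOk = IsValidJson(couchbaseSection);                  // a bare property is not a document
bool wrappedSectionOk = IsValidJson("{" + couchbaseSection + "}");
bool withClauseOk = IsValidJson(withClause);
bool trailingCommaOk = IsValidJson("{ \"similarity\": \"cosine\", }"); // the pattern the bot flagged

Console.WriteLine($"Bare Couchbase property:    {bareSectionOk}");    // False
Console.WriteLine($"Wrapped Couchbase section:  {wrappedSectionOk}"); // True
Console.WriteLine($"WITH clause object:         {withClauseOk}");     // True
Console.WriteLine($"Object with trailing comma: {trailingCommaOk}");  // False
```

Run as a top-level program, it reports that both quoted snippets parse (the appsettings one once it sits inside a root object) and that only the deliberately malformed trailing-comma object is rejected, which matches the author's "This is valid" reply to the first finding.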

## Repository Links

- **Connector Repository**: [couchbase-semantic-kernel](https://github.com/Couchbase-Ecosystem/couchbase-semantic-kernel) - The official Couchbase .NET Vector Store Connector for Microsoft Semantic Kernel
- **This Example**: [CouchbaseVectorSearchDemo](https://github.com/Couchbase-Ecosystem/couchbase-semantic-kernel/tree/Support-Bhive-and-Composite-Index/CouchbaseVectorSearchDemo) - Complete working example demonstrating vector search with Couchbase

high

The link to the example code points to a feature branch (Support-Bhive-and-Composite-Index). This is not ideal for a public tutorial, as feature branches are often temporary and may be deleted. It's recommended to update this link to point to the main branch (e.g., main or master) or a specific release tag once the code is merged.

Suggested change
- **This Example**: [CouchbaseVectorSearchDemo](https://github.com/Couchbase-Ecosystem/couchbase-semantic-kernel/tree/Support-Bhive-and-Composite-Index/CouchbaseVectorSearchDemo) - Complete working example demonstrating vector search with Couchbase
- **This Example**: [CouchbaseVectorSearchDemo](https://github.com/Couchbase-Ecosystem/couchbase-semantic-kernel/tree/main/CouchbaseVectorSearchDemo) - Complete working example demonstrating vector search with Couchbase

Should the example not be in Couchbase-Examples?

You can link the example from the framework README

Comment on lines +151 to +157
**Automatic Embedding Generation** - The connector integrates with Semantic Kernel's `IEmbeddingGenerator` interface to automatically generate embeddings from text. When you provide an embedding generator (in this case, OpenAI's `text-embedding-ada-002`), the text is automatically converted to vectors:

```csharp
// Generate embedding from text
var embedding = await embeddingGenerator.GenerateAsync(glossary.Definition);
glossary.DefinitionEmbedding = embedding.Vector;
```

high

The title of this section is "Automatic Embedding Generation" and the text says "the text is automatically converted to vectors". However, the code snippet that follows shows a manual process of generating an embedding and assigning it to the model property. This is contradictory and could confuse the reader. If the UpsertAsync method handles embedding generation automatically, this section should explain that and either omit this manual code snippet or clarify that this is what happens "under the hood".

- Use **Composite** when scalar filters eliminate large portions of data before vector comparison
- Use **FTS** when you need hybrid search combining full-text and semantic search

For more details, see the [Couchbase Vector Index Documentation](https://preview.docs-test.couchbase.com/docs-server-DOC-12565_vector_search_concepts/server/current/vector-index/use-vector-indexes.html).

high

The link to the "Couchbase Vector Index Documentation" points to a preview.docs-test.couchbase.com URL. This appears to be an internal or test documentation server. For a public tutorial, this should be updated to the final, public documentation URL.

Suggested change
For more details, see the [Couchbase Vector Index Documentation](https://preview.docs-test.couchbase.com/docs-server-DOC-12565_vector_search_concepts/server/current/vector-index/use-vector-indexes.html).
For more details, see the [Couchbase Vector Index Documentation](https://docs.couchbase.com/server/current/vector-search/vector-search-overview.html).

- `IVF1000,SQ6` - 1000 centroids, 6-bit quantization (faster, less accurate)
- `IVF,PQ32x8` - Auto centroids, product quantization (better accuracy)

For detailed configuration options, see the [Quantization & Centroid Settings](https://preview.docs-test.couchbase.com/docs-server-DOC-12565_vector_search_concepts/server/current/vector-index/hyperscale-vector-index.html#algo_settings) documentation.

high

This link points to a preview.docs-test.couchbase.com URL, which appears to be an internal or staging documentation server. For a public tutorial, this should be updated to point to the official public documentation.

Suggested change
For detailed configuration options, see the [Quantization & Centroid Settings](https://preview.docs-test.couchbase.com/docs-server-DOC-12565_vector_search_concepts/server/current/vector-index/hyperscale-vector-index.html#algo_settings) documentation.
For detailed configuration options, see the [Quantization & Centroid Settings](https://docs.couchbase.com/server/current/vector-search/indexing-vectors.html) documentation.
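
The centroid and quantization settings quoted in this hunk mostly trade memory for precision. As a back-of-the-envelope illustration, the C# sketch below computes approximate per-vector code sizes for a 1536-dimension embedding (the `"dimension": 1536` used in the tutorial's index). It assumes the description strings follow FAISS index-factory conventions (`SQ<b>`: one b-bit code per dimension; `PQ<m>x<b>`: m sub-vector codes of b bits each) and ignores IVF list overhead and document IDs; the linked Couchbase documentation is the authority on the exact format. The IVF centroid count changes how vectors are partitioned, not how large each stored code is.

```csharp
using System;

// Back-of-the-envelope code size per stored vector. Assumption: description strings
// follow FAISS index-factory conventions (SQ<b> = one b-bit code per dimension,
// PQ<m>x<b> = m sub-vector codes of b bits each). IVF list overhead and IDs are ignored.
const int dimensions = 1536; // "dimension": 1536 in the tutorial's index definition

int rawBytes = dimensions * sizeof(float); // unquantized float32 vector
int sq8Bytes = dimensions * 8 / 8;         // "IVF,SQ8"
int sq6Bytes = dimensions * 6 / 8;         // "IVF1000,SQ6" (1000 centroids do not change code size)
int pq32x8Bytes = 32 * 8 / 8;              // "IVF,PQ32x8"

Console.WriteLine($"float32: {rawBytes} bytes");                                        // 6144
Console.WriteLine($"SQ8:     {sq8Bytes} bytes ({rawBytes / sq8Bytes}x smaller)");       // 1536, 4x
Console.WriteLine($"SQ6:     {sq6Bytes} bytes");                                         // 1152
Console.WriteLine($"PQ32x8:  {pq32x8Bytes} bytes ({rawBytes / pq32x8Bytes}x smaller)"); // 32, 192x
```

Under these assumptions SQ6 saves a quarter of the space of SQ8, while PQ32x8 is 48 times smaller than SQ8 (192 times smaller than raw float32). The arithmetic says nothing about recall, so accuracy claims for each setting are worth confirming against the documentation.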

Comment on lines 17 to 20
tags:
- Semantic Kernel
- OpenAI
- Vector Search

medium

The tags Semantic Kernel and Vector Search are not in the list of allowed tags defined in test/test-markdown-frontmatter.js. This will likely cause the frontmatter validation test to fail. Please use existing tags or add these new tags to the validation script. Some existing tags that might be relevant are Artificial Intelligence and .NET.

Suggested change
tags:
- Semantic Kernel
- OpenAI
- Vector Search
tags:
- .NET
- OpenAI
- Artificial Intelligence

### 2. OpenAI API Access
- **OpenAI API Key** - Get one from: https://platform.openai.com/api-keys
- Used for generating text embeddings with `text-embedding-ada-002` model

Any particular reason to use the older embedding model? text-embedding-3-small should be better in terms of both cost and performance.


### 3. Configuration Setup

Update `appsettings.Development.json` with your credentials:

Do the bucket, scope, and collection need to exist beforehand?

You could also mention that these values can be changed, as long as the code is updated accordingly.

"glossary",
new CouchbaseQueryCollectionOptions
{
IndexName = "bhive_glossary_index", // BHIVE index name

Can you create the index before any data exists, or do you create it after inserting the data? I think this point is worth highlighting.
Also, is the index optional?

- **Include Fields**: Non-vector fields for faster retrieval
- **Quantization**: `IVF,SQ8` (Inverted File with 8-bit scalar quantization)

> **Note**: Composite vector indexes can be created similarly by adding scalar fields to the index definition. Use composite indexes when your queries frequently filter on scalar values before vector comparison. For this demo, we use BHIVE since we're demonstrating pure semantic search capabilities.

Can you link to the composite index docs?

2. **Get Collection** - Use `GetCollection<TKey, TRecord>()` to get a typed collection reference
3. **Generate Embeddings** - Use Semantic Kernel's `IEmbeddingGenerator` to convert text to vectors
4. **Upsert Records** - Call `UpsertAsync()` to insert/update records with embeddings
5. **Create Index** - Set up a vector index using SQL++ for optimal search performance

This is optional, right? Without an index, a brute-force KNN search would be performed.
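
For context on the reviewer's point, the C# sketch below shows what an exhaustive ("brute-force") k-nearest-neighbour search does: every stored vector is scored against the query and only the k best are kept, so each query costs time proportional to the number of documents times the vector dimension. It is a conceptual, in-memory illustration with hypothetical document keys and toy 2-dimension vectors, using cosine similarity to mirror the tutorial's `"similarity": "cosine"` setting. It is not how the Couchbase query engine is implemented, and it does not confirm whether the connector falls back to this behaviour when no vector index exists; that is the question being asked.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Cosine similarity of two equal-length, non-zero vectors (1 = same direction, 0 = orthogonal).
static double CosineSimilarity(float[] a, float[] b)
{
    double dot = 0, normA = 0, normB = 0;
    for (int i = 0; i < a.Length; i++)
    {
        dot += (double)a[i] * b[i];
        normA += (double)a[i] * a[i];
        normB += (double)b[i] * b[i];
    }
    return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
}

// Exhaustive k-NN: score every stored vector against the query, then keep the k best.
// Each query touches all n vectors of dimension d, i.e. O(n * d) work.
static List<(string Key, double Score)> BruteForceKnn(
    IReadOnlyDictionary<string, float[]> vectors, float[] query, int k) =>
    vectors.Select(entry => (entry.Key, Score: CosineSimilarity(entry.Value, query)))
           .OrderByDescending(result => result.Score)
           .Take(k)
           .ToList();

// Hypothetical documents with toy 2-dimension embeddings.
var store = new Dictionary<string, float[]>
{
    ["glossary::vector"] = new[] { 1.0f, 0.0f },
    ["glossary::index"] = new[] { 0.9f, 0.1f },
    ["glossary::bucket"] = new[] { 0.0f, 1.0f },
};

foreach (var (key, score) in BruteForceKnn(store, new[] { 1.0f, 0.0f }, k: 2))
{
    Console.WriteLine($"{key}: {score:F3}");
}
```

With the query `[1, 0]`, the two keys returned are `glossary::vector` (score 1) followed by `glossary::index`. Approximate (ANN) vector indexes exist precisely to avoid scoring every document on every query.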
