Semantic Kernel Tutorial #68
Summary of Changes
Hello @azaddhirajkumar, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!

This pull request introduces a new, in-depth tutorial designed to help developers build semantic search applications. It focuses on integrating Microsoft's Semantic Kernel with Couchbase's vector search features via the Couchbase .NET Vector Store Connector. The tutorial covers the entire process, from setting up the development environment and defining data models to generating embeddings with OpenAI, ingesting data, and executing various vector search queries. It also provides extensive information on configuring different types of Couchbase vector indexes, offering a practical guide for leveraging AI-powered search capabilities.
Code Review
This pull request adds a new tutorial for using the Semantic Kernel with Couchbase. The tutorial is comprehensive and well-structured. However, I've found several issues that need to be addressed before merging. There are critical errors in the provided JSON and SQL++ code snippets (missing commas, trailing commas) that will prevent them from working. Additionally, some links point to temporary or internal resources (a feature branch and a test documentation server), which should be updated to stable, public URLs. There are also some invalid tags in the frontmatter that will likely fail validation, and a section on embedding generation is potentially confusing. I've left specific comments with suggestions for each of these points.
"Couchbase": {
  "ConnectionString": "couchbase://localhost",
  "Username": "Administrator",
  "Password": "your-password",
  "BucketName": "demo",
  "ScopeName": "semantic-kernel",
  "CollectionName": "glossary"
}
The JSON snippet for the `Couchbase` configuration is invalid. It is missing commas (`,`) between the key-value pairs. This will cause a JSON parsing error when a user tries to copy and use this configuration.
- "Couchbase": {
-   "ConnectionString": "couchbase://localhost",
-   "Username": "Administrator",
-   "Password": "your-password",
-   "BucketName": "demo",
-   "ScopeName": "semantic-kernel",
-   "CollectionName": "glossary"
- }
+ "Couchbase": {
+   "ConnectionString": "couchbase://localhost",
+   "Username": "Administrator",
+   "Password": "your-password",
+   "BucketName": "demo",
+   "ScopeName": "semantic-kernel",
+   "CollectionName": "glossary"
+ }
This is valid
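One way to settle whether a configuration snippet like this is valid JSON is to parse it. The sketch below is only an illustration: the `JsonCheck` class and the two deliberately broken variants are hypothetical and not part of the PR. It relies on `System.Text.Json`, which rejects both missing and trailing commas under its default options. The snippet is wrapped in braces because it is a fragment of a larger settings file.

```csharp
using System;
using System.Text.Json;

public static class JsonCheck
{
    // Returns true when the text parses as strict JSON.
    // With default options, JsonDocument.Parse rejects trailing commas.
    public static bool IsValidJson(string text)
    {
        try
        {
            using var doc = JsonDocument.Parse(text);
            return true;
        }
        catch (JsonException)
        {
            return false;
        }
    }

    public static void Main()
    {
        // The snippet from the diff, wrapped in an object.
        string snippet = @"{
  ""Couchbase"": {
    ""ConnectionString"": ""couchbase://localhost"",
    ""Username"": ""Administrator"",
    ""Password"": ""your-password"",
    ""BucketName"": ""demo"",
    ""ScopeName"": ""semantic-kernel"",
    ""CollectionName"": ""glossary""
  }
}";

        // Hypothetical broken variants, for comparison only.
        string missingComma = snippet.Replace(
            "\"couchbase://localhost\",", "\"couchbase://localhost\"");
        string trailingComma = snippet.Replace("\"glossary\"", "\"glossary\",");

        Console.WriteLine(IsValidJson(snippet));       // True
        Console.WriteLine(IsValidJson(missingComma));  // False
        Console.WriteLine(IsValidJson(trailingComma)); // False
    }
}
```

The snippet as quoted in the diff parses cleanly, which matches the reply that it is valid.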
USING GSI WITH {
  "dimension": 1536,
  "similarity": "cosine",
  "description": "IVF,SQ8"
}
The JSON object in the `WITH` clause of the `CREATE VECTOR INDEX` statement is invalid due to a trailing comma after the `"similarity": "cosine"` line. This will cause a SQL++ syntax error when executing the statement.
- USING GSI WITH {
-   "dimension": 1536,
-   "similarity": "cosine",
-   "description": "IVF,SQ8"
- }
+ USING GSI WITH {
+   "dimension": 1536,
+   "similarity": "cosine",
+   "description": "IVF,SQ8"
+ }
## Repository Links

- **Connector Repository**: [couchbase-semantic-kernel](https://github.com/Couchbase-Ecosystem/couchbase-semantic-kernel) - The official Couchbase .NET Vector Store Connector for Microsoft Semantic Kernel
- **This Example**: [CouchbaseVectorSearchDemo](https://github.com/Couchbase-Ecosystem/couchbase-semantic-kernel/tree/Support-Bhive-and-Composite-Index/CouchbaseVectorSearchDemo) - Complete working example demonstrating vector search with Couchbase
The link to the example code points to a feature branch (`Support-Bhive-and-Composite-Index`). This is not ideal for a public tutorial, as feature branches are often temporary and may be deleted. It's recommended to update this link to point to the main branch (e.g., `main` or `master`) or a specific release tag once the code is merged.
- - **This Example**: [CouchbaseVectorSearchDemo](https://github.com/Couchbase-Ecosystem/couchbase-semantic-kernel/tree/Support-Bhive-and-Composite-Index/CouchbaseVectorSearchDemo) - Complete working example demonstrating vector search with Couchbase
+ - **This Example**: [CouchbaseVectorSearchDemo](https://github.com/Couchbase-Ecosystem/couchbase-semantic-kernel/tree/main/CouchbaseVectorSearchDemo) - Complete working example demonstrating vector search with Couchbase
Should the example not be in Couchbase-Examples?
You can link the example from the framework README
**Automatic Embedding Generation** - The connector integrates with Semantic Kernel's `IEmbeddingGenerator` interface to automatically generate embeddings from text. When you provide an embedding generator (in this case, OpenAI's `text-embedding-ada-002`), the text is automatically converted to vectors:

```csharp
// Generate embedding from text
var embedding = await embeddingGenerator.GenerateAsync(glossary.Definition);
glossary.DefinitionEmbedding = embedding.Vector;
```
The title of this section is "Automatic Embedding Generation" and the text says "the text is automatically converted to vectors". However, the code snippet that follows shows a manual process of generating an embedding and assigning it to the model property. This is contradictory and could confuse the reader. If the `UpsertAsync` method handles embedding generation automatically, this section should explain that and either omit this manual code snippet or clarify that this is what happens "under the hood".
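To make the manual versus automatic distinction concrete, here is a self-contained sketch. Everything in it is hypothetical: `ITextEmbedder`, `ToyEmbedder`, and `ToyStore` are stand-ins invented for illustration and are not the Semantic Kernel or Couchbase connector APIs. Whether the real `UpsertAsync` embeds text on its own is exactly the open question in this comment, so the sketch only shows what each reading would look like from the caller's side.

```csharp
#nullable enable
using System;
using System.Collections.Generic;
using System.Threading.Tasks;

// Hypothetical stand-in for an embedding generator (not the Semantic Kernel interface).
public interface ITextEmbedder
{
    Task<float[]> EmbedAsync(string text);
}

// Toy embedder so the sketch runs without an API key; it also counts calls.
public sealed class ToyEmbedder : ITextEmbedder
{
    public int Calls { get; private set; }

    public Task<float[]> EmbedAsync(string text)
    {
        Calls++;
        return Task.FromResult(new float[] { text.Length, 1f });
    }
}

public sealed class Glossary
{
    public string Key { get; set; } = "";
    public string Definition { get; set; } = "";
    public float[]? DefinitionEmbedding { get; set; }
}

// Hypothetical in-memory store that embeds on upsert when no vector was supplied.
public sealed class ToyStore
{
    private readonly ITextEmbedder _embedder;
    private readonly Dictionary<string, Glossary> _records = new();

    public ToyStore(ITextEmbedder embedder) => _embedder = embedder;

    public async Task UpsertAsync(Glossary record)
    {
        // "Automatic" path: the store fills in the vector only if it is missing.
        record.DefinitionEmbedding ??= await _embedder.EmbedAsync(record.Definition);
        _records[record.Key] = record;
    }

    public Glossary Get(string key) => _records[key];
}

public static class EmbeddingDemo
{
    public static async Task Main()
    {
        var embedder = new ToyEmbedder();
        var store = new ToyStore(embedder);

        // Manual: the caller generates and assigns the vector, as in the tutorial snippet.
        var manual = new Glossary { Key = "1", Definition = "API" };
        manual.DefinitionEmbedding = await embedder.EmbedAsync(manual.Definition);
        await store.UpsertAsync(manual);

        // Automatic: the caller supplies text only and the store embeds it.
        var auto = new Glossary { Key = "2", Definition = "LLM" };
        await store.UpsertAsync(auto);

        Console.WriteLine(embedder.Calls); // 2: one manual call, one inside UpsertAsync
    }
}
```

If the connector behaves like the second path, the tutorial's snippet is redundant. If it behaves like the first, the "Automatic" heading is misleading.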
- Use **Composite** when scalar filters eliminate large portions of data before vector comparison
- Use **FTS** when you need hybrid search combining full-text and semantic search

For more details, see the [Couchbase Vector Index Documentation](https://preview.docs-test.couchbase.com/docs-server-DOC-12565_vector_search_concepts/server/current/vector-index/use-vector-indexes.html).
The link to the "Couchbase Vector Index Documentation" points to a `preview.docs-test.couchbase.com` URL. This appears to be an internal or test documentation server. For a public tutorial, this should be updated to the final, public documentation URL.
- For more details, see the [Couchbase Vector Index Documentation](https://preview.docs-test.couchbase.com/docs-server-DOC-12565_vector_search_concepts/server/current/vector-index/use-vector-indexes.html).
+ For more details, see the [Couchbase Vector Index Documentation](https://docs.couchbase.com/server/current/vector-search/vector-search-overview.html).
- `IVF1000,SQ6` - 1000 centroids, 6-bit quantization (faster, less accurate)
- `IVF,PQ32x8` - Auto centroids, product quantization (better accuracy)

For detailed configuration options, see the [Quantization & Centroid Settings](https://preview.docs-test.couchbase.com/docs-server-DOC-12565_vector_search_concepts/server/current/vector-index/hyperscale-vector-index.html#algo_settings) documentation.
This link points to a `preview.docs-test.couchbase.com` URL, which appears to be an internal or staging documentation server. For a public tutorial, this should be updated to point to the official public documentation.
- For detailed configuration options, see the [Quantization & Centroid Settings](https://preview.docs-test.couchbase.com/docs-server-DOC-12565_vector_search_concepts/server/current/vector-index/hyperscale-vector-index.html#algo_settings) documentation.
+ For detailed configuration options, see the [Quantization & Centroid Settings](https://docs.couchbase.com/server/current/vector-search/indexing-vectors.html) documentation.
tags:
  - Semantic Kernel
  - OpenAI
  - Vector Search
The tags `Semantic Kernel` and `Vector Search` are not in the list of allowed tags defined in `test/test-markdown-frontmatter.js`. This will likely cause the frontmatter validation test to fail. Please use existing tags or add these new tags to the validation script. Some existing tags that might be relevant are `Artificial Intelligence` and `.NET`.
- tags:
-   - Semantic Kernel
-   - OpenAI
-   - Vector Search
+ tags:
+   - .NET
+   - OpenAI
+   - Artificial Intelligence
### 2. OpenAI API Access
- **OpenAI API Key** - Get one from: https://platform.openai.com/api-keys
- Used for generating text embeddings with `text-embedding-ada-002` model
Any particular reason to use the old embedding model? `text-embedding-3-small` should be better from both a cost and a performance perspective.
### 3. Configuration Setup

Update `appsettings.Development.json` with your credentials:
Should the bucket, scope & collection exist?
You can also mention that these values can be changed, as long as the code is updated to match.
"glossary",
new CouchbaseQueryCollectionOptions
{
    IndexName = "bhive_glossary_index", // BHIVE index name
Are you able to create the index without having any data, or do you create the index after inserting the data? I think this point is worth highlighting.
Also, is the index optional?
- **Include Fields**: Non-vector fields for faster retrieval
- **Quantization**: `IVF,SQ8` (Inverted File with 8-bit scalar quantization)

> **Note**: Composite vector indexes can be created similarly by adding scalar fields to the index definition. Use composite indexes when your queries frequently filter on scalar values before vector comparison. For this demo, we use BHIVE since we're demonstrating pure semantic search capabilities.
Can you link to the composite index docs?
2. **Get Collection** - Use `GetCollection<TKey, TRecord>()` to get a typed collection reference
3. **Generate Embeddings** - Use Semantic Kernel's `IEmbeddingGenerator` to convert text to vectors
4. **Upsert Records** - Call `UpsertAsync()` to insert/update records with embeddings
5. **Create Index** - Set up a vector index using SQL++ for optimal search performance
This is optional, right? Without an index, a brute-force kNN search would be performed.
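On the question of whether the vector index is optional: a brute-force kNN search means scoring the query against every stored vector and keeping the best k, which is why an index matters as data grows. The sketch below illustrates that idea with cosine similarity. The `BruteForceKnn` class, its method names, and the sample vectors are made up for illustration. How Couchbase or the connector actually executes an unindexed search is not shown in this PR.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class BruteForceKnn
{
    // Cosine similarity between two vectors of equal length.
    public static double CosineSimilarity(float[] a, float[] b)
    {
        if (a.Length != b.Length)
            throw new ArgumentException("Vectors must have the same dimension.");

        double dot = 0, normA = 0, normB = 0;
        for (int i = 0; i < a.Length; i++)
        {
            dot += a[i] * b[i];
            normA += a[i] * a[i];
            normB += b[i] * b[i];
        }

        // Convention used here: similarity with a zero vector is 0.
        if (normA == 0 || normB == 0) return 0;
        return dot / (Math.Sqrt(normA) * Math.Sqrt(normB));
    }

    // Scores every stored vector against the query (O(n * d)) and keeps the top k.
    public static List<(string Key, double Score)> Search(
        IReadOnlyDictionary<string, float[]> vectors, float[] query, int k)
    {
        return vectors
            .Select(kv => (Key: kv.Key, Score: CosineSimilarity(kv.Value, query)))
            .OrderByDescending(r => r.Score)
            .Take(k)
            .ToList();
    }

    public static void Main()
    {
        // Tiny 2-dimensional vectors standing in for 1536-dimensional embeddings.
        var vectors = new Dictionary<string, float[]>
        {
            ["api"] = new[] { 1f, 0f },
            ["llm"] = new[] { 0f, 1f },
            ["sdk"] = new[] { 0.9f, 0.1f },
        };

        foreach (var (key, score) in Search(vectors, new[] { 1f, 0f }, 2))
            Console.WriteLine($"{key}: {score:F3}");
    }
}
```

Every query touches every record, so the cost grows linearly with collection size. An IVF-style index such as the `IVF,SQ8` configuration quoted earlier trades some accuracy for avoiding that full scan.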
No description provided.