Dolly-3B model fails to produce correct answers from my own dataset

I'm trying to use Databricks' open-source Dolly model to build a bot that answers questions about my dataset with expert, domain-specific replies.
![image](https://github.com/liaokongVFX/LangChain-Chinese-Getting-Started-Guide/assets/83204959/aba872bb-f56f-4d20-a603-70055b106ca4)
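My database mostly contains step-by-step instructions like this for solving problems.
I want to implement this with LangChain, but when I actually ran it, the replies were not the answers I was looking for.
Here is my code: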

```python
import torch
from google.colab import drive
from PyPDF2 import PdfReader
from transformers import pipeline
from langchain.embeddings import HuggingFaceEmbeddings
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import FAISS
from langchain.chains.question_answering import load_qa_chain
from langchain.llms import HuggingFacePipeline
from langchain.prompts import PromptTemplate

# Mount Google Drive so the PDF path below exists (Colab only)
drive.mount('/content/gdrive', force_remount=True)
root_dir = "/content/gdrive/My Drive/"

# Embedding model used to index and search the manual
hf_embed = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")

# Extract the text page by page; the trailing newline keeps pages from running together
reader = PdfReader(root_dir + 'data/operation Manual.pdf')
raw_text = ''
for page in reader.pages:
    text = page.extract_text()
    if text:
        raw_text += text + "\n"

# Split on newlines into ~1000-character chunks with 200 characters of overlap
text_splitter = CharacterTextSplitter(
    separator="\n",
    chunk_size=1000,
    chunk_overlap=200,
    length_function=len,
)
texts = text_splitter.split_text(raw_text)
docsearch = FAISS.from_texts(texts, hf_embed)

# Dolly v2 ships its own generation pipeline, hence trust_remote_code=True;
# return_full_text=True because LangChain expects the full text back (see the Dolly README)
model_name = "databricks/dolly-v2-3b"
instruct_pipeline = pipeline(model=model_name, torch_dtype=torch.bfloat16, trust_remote_code=True,
                             device_map="auto", return_full_text=True,
                             max_new_tokens=256, top_p=0.95, top_k=50)
hf_pipe = HuggingFacePipeline(pipeline=instruct_pipeline)

prompt_template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.

{context}

Question: {question}
"""
PROMPT = PromptTemplate(template=prompt_template, input_variables=["context", "question"])

query = "I forgot my login password."
# Retrieve the 4 most similar chunks and print them to check that the
# relevant procedure was actually found before blaming the model
docs = docsearch.similarity_search(query, k=4)
for d in docs:
    print(d.page_content[:200], "\n---")

# "stuff" puts every retrieved chunk into {context} in a single prompt
chain = load_qa_chain(llm=hf_pipe, chain_type="stuff", prompt=PROMPT)
result = chain({"input_documents": docs, "question": query}, return_only_outputs=True)
print(result["output_text"])
```
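Two details in the indexing step may hurt retrieval for step-by-step manuals: concatenating pages with `raw_text += text` can glue the last line of one page onto the first line of the next, and splitting on `\n` every ~1000 characters can cut a single procedure in half. The sketch below is a simplified, stdlib-only illustration of separator-based splitting with overlap. It approximates what `CharacterTextSplitter` does but is not LangChain's actual implementation, and `join_pages` / `split_with_overlap` are hypothetical helper names.

```python
def join_pages(pages):
    """Join per-page text with newlines so that the last line of one page
    does not run into the first line of the next. Empty pages are skipped."""
    return "\n".join(p for p in pages if p)


def split_with_overlap(text, separator="\n", chunk_size=1000, chunk_overlap=200):
    """Simplified separator-based splitter (not LangChain's code).

    Greedily merges separator-delimited pieces into chunks of at most
    chunk_size characters; when a chunk is emitted, trailing pieces totalling
    at most chunk_overlap characters are carried into the next chunk.
    A single piece longer than chunk_size becomes its own oversized chunk.
    """
    pieces = [p for p in text.split(separator) if p]
    chunks = []
    current = []  # pieces in the chunk being built
    length = 0    # equals len(separator.join(current))

    for piece in pieces:
        extra = len(piece) + (len(separator) if current else 0)
        if current and length + extra > chunk_size:
            chunks.append(separator.join(current))
            # Drop pieces from the front until what is left fits the overlap
            # budget and leaves room for the new piece.
            while current and (length > chunk_overlap or length + extra > chunk_size):
                removed = current.pop(0)
                length -= len(removed) + (len(separator) if current else 0)
            extra = len(piece) + (len(separator) if current else 0)
        current.append(piece)
        length += extra

    if current:
        chunks.append(separator.join(current))
    return chunks


lines = ["aaaa", "bbbb", "cccc", "dddd"]
print(split_with_overlap("\n".join(lines), chunk_size=9, chunk_overlap=4))
```

With `chunk_size=9, chunk_overlap=4`, the four lines come out as `['aaaa\nbbbb', 'bbbb\ncccc', 'cccc\ndddd']`: each chunk repeats the last line of the previous one. If one answer's steps span several chunks, a larger `chunk_size` or splitting on the manual's own section headings may keep each procedure together.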
Not only is the reply not the solution to this question from my dataset, but generating a single reply also takes about 2.5 hours.
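One way to check why the reply does not match the manual is to look at the exact text the model receives. The stdlib-only sketch below rebuilds what a `stuff` chain sends: all retrieved chunks joined into `{context}`, then wrapped in the instruction format applied by Dolly's custom pipeline. The Dolly wrapper text is my reading of `instruct_pipeline.py` in the databricks/dolly repository (verify it there), the `\n\n` chunk separator is an assumption about LangChain's default, and 4 characters per token is only a rough heuristic for English text. dolly-v2-3b is based on Pythia-2.8B, whose context window is 2048 tokens, so the prompt plus `max_new_tokens=256` has to fit in it.

```python
# Templates are assumptions modeled on the issue's PROMPT and on Dolly's
# instruct_pipeline.py; check the exact wording against the repository.
QA_TEMPLATE = (
    "Use the following pieces of context to answer the question at the end. "
    "If you don't know the answer, just say that you don't know, "
    "don't try to make up an answer.\n\n{context}\n\nQuestion: {question}\n"
)
DOLLY_TEMPLATE = (
    "Below is an instruction that describes a task. "
    "Write a response that appropriately completes the request.\n\n"
    "### Instruction:\n{instruction}\n\n### Response:\n"
)
CONTEXT_TOKENS = 2048  # context length of the Pythia base model
CHARS_PER_TOKEN = 4    # rough heuristic, not a real tokenizer


def build_prompt(chunks, question, separator="\n\n"):
    """Mimic a 'stuff' chain: join all chunks into {context}, then wrap the
    result in the instruction format the Dolly pipeline applies."""
    qa_prompt = QA_TEMPLATE.format(context=separator.join(chunks), question=question)
    return DOLLY_TEMPLATE.format(instruction=qa_prompt)


def fits_context(prompt, max_new_tokens=256):
    """Crude estimate of whether prompt + generated tokens fit the window."""
    estimated_prompt_tokens = len(prompt) / CHARS_PER_TOKEN
    return estimated_prompt_tokens + max_new_tokens <= CONTEXT_TOKENS


# Hypothetical retrieved chunk, for illustration only
prompt = build_prompt(["Step 1: Open the login page and click 'Forgot password'."],
                      "I forgot my login password.")
print(prompt)
print(fits_context(prompt))
```

By this estimate, four 1000-character chunks fit in the window while eight do not. If the chunk that contains the password-reset steps is not among the retrieved `docs`, a 3B model cannot be expected to answer from it, however the prompt is worded.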
I can't figure out which step I'm doing wrong.
Could someone more experienced help me out? Thanks!
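On the 2.5-hour generation time: the issue does not say which Colab runtime was used, but times like this often point to the model running on CPU, or to a dtype the hardware does not support natively. `torch.bfloat16` has native GPU support only on Ampere (compute capability 8.x) or newer, and the T4 commonly assigned in free Colab is older. The sketch below uses hypothetical helpers, `pick_dtype` and `weight_memory_gb`: one picks a dtype name from two capability flags, the other gives a rough weight-only memory estimate for the roughly 2.8B-parameter dolly-v2-3b.

```python
def pick_dtype(cuda_available: bool, bf16_supported: bool) -> str:
    """Return a torch dtype name to pass as torch_dtype (hypothetical helper).

    Assumption: bfloat16 only helps on GPUs with native support; older GPUs
    such as the T4 use float16; on CPU, half precision is often slow, so
    fall back to float32."""
    if not cuda_available:
        return "float32"
    return "bfloat16" if bf16_supported else "float16"


def weight_memory_gb(n_params: float, dtype: str) -> float:
    """Approximate memory for the weights alone (no activations or KV cache)."""
    bytes_per_param = {"float32": 4, "float16": 2, "bfloat16": 2}[dtype]
    return n_params * bytes_per_param / 1e9


for dtype in ("float32", "float16"):
    print(dtype, round(weight_memory_gb(2.8e9, dtype), 1), "GB")
```

In the notebook, the flags could come from `torch.cuda.is_available()` and, only when CUDA is available, `torch.cuda.get_device_capability()[0] >= 8`; the result would then be passed as `torch_dtype=getattr(torch, pick_dtype(...))`. In float16 the weights alone take about 5.6 GB, which fits on a 16 GB T4.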

Dolly-3B model fails to produce correct answers from my own dataset #60