-
Notifications
You must be signed in to change notification settings - Fork 664
Open
Description
目前我想用databricks 開源的Dolly模型做出個可以針對我給的數據集內的問題給專業知識的回覆的機器人
數據庫裡大概都是這樣的教你步驟去解決問題
我想要用langchain去實現 不過我實際使用後發現回覆的不是我要的答案
這是我的code
from langchain.embeddings import HuggingFaceEmbeddings
from PyPDF2 import PdfReader
from langchain.text_splitter import CharacterTextSplitter
from langchain.vectorstores import ElasticVectorSearch, Pinecone, Weaviate, FAISS
from langchain.chains.question_answering import load_qa_chain
from langchain.llms import HuggingFacePipeline
from transformers import AutoTokenizer, AutoModelForCausalLM, pipeline
from langchain.prompts import PromptTemplate
import torch
hf_embed = HuggingFaceEmbeddings(model_name="sentence-transformers/all-mpnet-base-v2")
from google.colab import drive
drive.mount('/content/gdrive', force_remount=True)
root_dir = "/content/gdrive/My Drive/"
reader = PdfReader('/content/gdrive/My Drive/data/operation Manual.pdf')
raw_text = ''
for i, page in enumerate(reader.pages):
text = page.extract_text()
if text:
raw_text += text
text_splitter = CharacterTextSplitter(
separator = "\n",
chunk_size = 1000,
chunk_overlap = 200,
length_function = len,
)
texts = text_splitter.split_text(raw_text)
docsearch = FAISS.from_texts(texts, hf_embed)
model_name = "databricks/dolly-v2-3b"
instruct_pipeline = pipeline(model=model_name, torch_dtype=torch.bfloat16, trust_remote_code=True, device_map="auto",
return_full_text=True, max_new_tokens=256, top_p=0.95, top_k=50)
hf_pipe = HuggingFacePipeline(pipeline=instruct_pipeline)
prompt_template = """Use the following pieces of context to answer the question at the end. If you don't know the answer, just say that you don't know, don't try to make up an answer.
{context}
Question: {question}
"""
PROMPT = PromptTemplate(
template=prompt_template, input_variables=["context", "question"]
)
query = "I forgot my login password."
docs = docsearch.similarity_search(query)
chain = load_qa_chain(llm = hf_pipe, chain_type="stuff", prompt=PROMPT)
chain({"input_documents": docs, "question": query}, return_only_outputs=True)
不只回覆的答案不是我數據集中的解答這個問題,產生回覆的速度也需要花費大約2.5個小時
不知道我在哪個步驟使用錯誤了
求大老幫忙解答一下 謝謝!
Metadata
Metadata
Assignees
Labels
No labels