# Conversational Interface - Chatbot with Claude LLM

> *This notebook should work well with the **`Data Science 3.0`** kernel in SageMaker Studio*

In this notebook, we build a chatbot using the foundation models (FMs) in Amazon Bedrock. For our use case we use Claude 3 Sonnet as the foundation model. For more details, refer to the [documentation](https://aws.amazon.com/bedrock/claude/). Claude 3 Sonnet aims for the ideal balance between intelligence and speed, particularly for enterprise workloads. It excels at complex reasoning, nuanced content creation, scientific queries, math, and coding. Data teams can use Sonnet for RAG as well as for search and retrieval across vast amounts of information, while sales teams can use Sonnet for product recommendations, forecasting, and targeted marketing.

## Overview

Conversational interfaces such as chatbots and virtual assistants can be used to enhance the user experience for your customers. Chatbots use natural language processing (NLP) and machine learning algorithms to understand and respond to user queries. They can be used in a variety of applications, such as customer service, sales, and e-commerce, to provide quick and efficient responses to users, and they can be accessed through various channels such as websites, social media platforms, and messaging apps.

### This notebook goes into the details of how to speed up the responses for large contexts and data sources


## Chatbot using Amazon Bedrock

![Amazon Bedrock - Conversational Interface](./images/chatbot_bedrock.png)


## Use Cases

1. **Chatbot (Basic)** - Zero-shot chatbot with an FM
2. **Chatbot using a prompt template (LangChain)** - Chatbot with some context provided in the prompt template
3. **Chatbot with persona** - Chatbot with defined roles, e.g. Career Coach and Human interactions
4. **Context-aware chatbot** - Passing in context from an external file by generating embeddings

## Langchain framework for building Chatbot with Amazon Bedrock
In conversational interfaces such as chatbots, it is important to remember previous interactions, both in the short term and in the long term.

LangChain provides memory components in two forms. First, it provides helper utilities for managing and manipulating previous chat messages; these are designed to be modular and useful regardless of how they are used. Second, it provides easy ways to incorporate these utilities into chains.
Together, these abstractions make it easy to define the pieces of a chatbot, interact with them, and build powerful chatbots.
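As a rough illustration of what a buffer memory does, the sketch below keeps prior turns in order and replays them as a Messages API `messages` list on each request. `SimpleBufferMemory` is a hypothetical class, not LangChain's `ConversationBufferMemory`, and the optional `max_turns` window is an assumption added to show the short-term side of memory.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional


@dataclass
class SimpleBufferMemory:
    """Hypothetical conversation buffer: keeps prior turns and replays them."""

    max_turns: Optional[int] = None  # if set, keep only the last N user/assistant pairs
    turns: List[Dict[str, str]] = field(default_factory=list)

    def add_turn(self, user_text: str, assistant_text: str) -> None:
        self.turns.append({"role": "user", "content": user_text})
        self.turns.append({"role": "assistant", "content": assistant_text})
        if self.max_turns is not None:
            keep = 2 * self.max_turns  # each turn is two messages
            self.turns = self.turns[-keep:] if keep > 0 else []

    def build_messages(self, new_user_text: str) -> List[Dict[str, str]]:
        # History first, then the new user message, as the Messages API expects.
        return self.turns + [{"role": "user", "content": new_user_text}]
```

The returned list has the same shape as the `messages` field of the Claude request body used in this notebook.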

## Building Chatbot with Context - Key Elements

The first step in building a context-aware chatbot is to **generate embeddings** for the context. Typically, an ingestion process runs your documents through an embedding model and stores the resulting embeddings in a vector store. In this example we use the Amazon Titan Embeddings model (`amazon.titan-embed-text-v1`).

![Embeddings](./images/embeddings_lang.png)
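A minimal sketch of what the vector store does at query time, assuming the embeddings are already computed: store one vector per chunk, then return the chunks whose vectors score highest against the query vector. `InMemoryVectorStore` and the toy 3-dimensional vectors are hypothetical (Titan embeddings have far more dimensions), and this brute-force scan uses cosine similarity; FAISS, used later in this notebook, provides optimized indexes and may use a different distance metric.

```python
import math
from typing import List, Sequence, Tuple


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0  # a zero vector has no direction; treat it as unrelated
    return dot / (norm_a * norm_b)


class InMemoryVectorStore:
    """Hypothetical minimal vector store: a list of (embedding, text) pairs."""

    def __init__(self) -> None:
        self._rows: List[Tuple[List[float], str]] = []

    def add(self, embedding: Sequence[float], text: str) -> None:
        self._rows.append((list(embedding), text))

    def search(self, query_embedding: Sequence[float], k: int = 4) -> List[str]:
        # Brute-force scan: score every stored vector, return the k best texts.
        scored = sorted(
            self._rows,
            key=lambda row: cosine_similarity(query_embedding, row[0]),
            reverse=True,
        )
        return [text for _, text in scored[:k]]
```

With real data, the query vector would come from the same embedding model as the chunks, for example `br_embeddings.embed_query(question)`.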

The second step is orchestrating the user request: handling the interaction, invoking the model, and returning the results.

![Chatbot](./images/chatbot_lang.png)
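The orchestration step can be sketched as three calls: retrieve, assemble the prompt, invoke the model. In this hypothetical sketch, `retrieve` and `llm` are injected plain callables so that it runs without AWS access; in this notebook those roles are played by the FAISS index and `BedrockChat`. `answer_question` and `SYSTEM_TEMPLATE` are illustrative names.

```python
from typing import Callable, List

SYSTEM_TEMPLATE = (
    "Use the following context to answer the question. "
    "If the context does not contain the answer, say you do not know.\n"
    "Context:\n{context}"
)


def answer_question(
    question: str,
    retrieve: Callable[[str], List[str]],
    llm: Callable[[str, str], str],
) -> str:
    """Hypothetical orchestration: retrieve -> build prompt -> invoke -> return."""
    chunks = retrieve(question)  # 1. look up the relevant chunks
    system_prompt = SYSTEM_TEMPLATE.format(context="\n\n".join(chunks))  # 2. assemble the prompt
    return llm(system_prompt, question)  # 3. call the model and return its text
```

Keeping the retriever and the model as parameters makes it easy to swap the full-context and vector DB options compared later in this notebook.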

## Architecture [Context Aware Chatbot]
![Context-aware chatbot architecture](./images/context-aware-chatbot.png)


## Setup

⚠️ ⚠️ ⚠️ Before running this notebook, make sure you have run the installs below


In [None]:
#!pip install langchain==0.1.17
#!pip install langchain-anthropic
#!pip install boto3==1.34.95
#!pip install faiss-cpu==1.8.0
#!pip install pypdf  # required by PyPDFLoader

#### Installing langchain-aws

To install `langchain-aws`, run `pip install langchain-aws`. To get the latest release, run `pip install --upgrade langchain-aws`.

In [None]:
import warnings

from io import StringIO
import sys
import textwrap
import os
from typing import Optional

# External Dependencies:
import boto3
from botocore.config import Config

warnings.filterwarnings('ignore')

def print_ww(*args, width: int = 100, **kwargs):
    """Like print(), but wraps output to `width` characters (default 100)"""
    buffer = StringIO()
    try:
        _stdout = sys.stdout
        sys.stdout = buffer
        print(*args, **kwargs)
        output = buffer.getvalue()
    finally:
        sys.stdout = _stdout
    for line in output.splitlines():
        print("\n".join(textwrap.wrap(line, width=width)))
        



def get_bedrock_client(
    assumed_role: Optional[str] = None,
    region: Optional[str] = None,
    runtime: Optional[bool] = True,
):
    """Create a boto3 client for Amazon Bedrock, with optional configuration overrides

    Parameters
    ----------
    assumed_role :
        Optional ARN of an AWS IAM role to assume for calling the Bedrock service. If not
        specified, the current active credentials will be used.
    region :
        Optional name of the AWS Region in which the service should be called (e.g. "us-east-1").
        If not specified, AWS_REGION or AWS_DEFAULT_REGION environment variable will be used.
    runtime :
        Optional choice of getting different client to perform operations with the Amazon Bedrock service.
    """
    if region is None:
        target_region = os.environ.get("AWS_REGION", os.environ.get("AWS_DEFAULT_REGION"))
    else:
        target_region = region

    print(f"Create new client\n  Using region: {target_region}")
    session_kwargs = {"region_name": target_region}
    client_kwargs = {**session_kwargs}

    profile_name = os.environ.get("AWS_PROFILE")
    if profile_name:
        print(f"  Using profile: {profile_name}")
        session_kwargs["profile_name"] = profile_name

    retry_config = Config(
        region_name=target_region,
        retries={
            "max_attempts": 10,
            "mode": "standard",
        },
    )
    session = boto3.Session(**session_kwargs)

    if assumed_role:
        print(f"  Using role: {assumed_role}", end='')
        sts = session.client("sts")
        response = sts.assume_role(
            RoleArn=str(assumed_role),
            RoleSessionName="langchain-llm-1"
        )
        print(" ... successful!")
        client_kwargs["aws_access_key_id"] = response["Credentials"]["AccessKeyId"]
        client_kwargs["aws_secret_access_key"] = response["Credentials"]["SecretAccessKey"]
        client_kwargs["aws_session_token"] = response["Credentials"]["SessionToken"]

    if runtime:
        service_name='bedrock-runtime'
    else:
        service_name='bedrock'

    bedrock_client = session.client(
        service_name=service_name,
        config=retry_config,
        **client_kwargs
    )

    print("boto3 Bedrock client successfully created!")
    print(bedrock_client._endpoint)
    return bedrock_client

In [None]:
import json
import os
import sys

import boto3




# ---- ⚠️ Un-comment and edit the below lines as needed for your AWS setup ⚠️ ----

# os.environ["AWS_DEFAULT_REGION"] = "<REGION_NAME>"  # E.g. "us-east-1"
# os.environ["AWS_PROFILE"] = "<YOUR_PROFILE>"
# os.environ["BEDROCK_ASSUME_ROLE"] = "<YOUR_ROLE_ARN>"  # E.g. "arn:aws:..."


boto3_bedrock = get_bedrock_client(
    assumed_role=os.environ.get("BEDROCK_ASSUME_ROLE", None),
    region='us-west-2',  # hard-coded; use os.environ.get("AWS_DEFAULT_REGION", None) to honor the setting above
    runtime=True
)

### Anthropic Claude

#### Input

```json
{
    "anthropic_version": "bedrock-2023-05-31",
    "max_tokens": 100,
    "messages": [
        {"role": "user", "content": "Hello, Claude"},
        {"role": "assistant", "content": "Hello!"},
        {"role": "user", "content": "Can you describe LLMs to me?"}
    ],
    "temperature": 0.5,
    "top_p": 0.9
}
```

#### Output

```python
# Parsed response body (a Python dict, as printed by print(response_body))
{
    'id': 'msg_01T',
    'type': 'message',
    'role': 'assistant',
    'content': [
        {
            'type': 'text',
            'text': 'Sure, the concept...'
        }
    ],
    'model': 'model_id',
    'stop_reason': 'max_tokens',
    'stop_sequence': None,
    'usage': {'input_tokens': ..., 'output_tokens': ...}
}
```
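Two small helpers, assuming the request and response shapes shown above: one serializes the request body, and one pulls the generated text out of the parsed response's `content` blocks. `build_claude_body` and `extract_text` are hypothetical helper names, not part of boto3.

```python
import json
from typing import Any, Dict, List


def build_claude_body(messages: List[Dict[str, Any]], max_tokens: int = 100,
                      temperature: float = 0.5, top_p: float = 0.9) -> str:
    """Serialize a Messages API request body in the shape shown above."""
    return json.dumps({
        "anthropic_version": "bedrock-2023-05-31",
        "max_tokens": max_tokens,
        "messages": messages,
        "temperature": temperature,
        "top_p": top_p,
    })


def extract_text(response_body: Dict[str, Any]) -> str:
    """Concatenate the text blocks of a parsed response body."""
    return "".join(
        block["text"]
        for block in response_body.get("content", [])
        if block.get("type") == "text"
    )
```

With the `invoke_model` calls below, `extract_text(json.loads(response.get('body').read()))` would return only the generated text. A `stop_reason` of `'max_tokens'` means the answer was cut off at the `max_tokens` limit.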




## Chatbot (Basic - without context)

These are bare-bones `boto3` `invoke_model` examples: a multi-turn request and a reusable single-prompt helper.

**Note:** The model outputs are non-deterministic

In [None]:
from langchain_community.chat_models import BedrockChat
from langchain_core.messages import HumanMessage
from langchain.chains import ConversationChain
from langchain.memory import ConversationBufferMemory
from langchain_core.output_parsers import StrOutputParser
from langchain.prompts import ChatPromptTemplate

In [None]:
modelId = "anthropic.claude-3-sonnet-20240229-v1:0" #"anthropic.claude-v2"

messages=[
    { 
        "role":'user', 
        "content":[{
            'type':'text',
            'text': "What is quantum mechanics? "
        }]
    },
    { 
        "role":'assistant', 
        "content":[{
            'type':'text',
            'text': "It is a branch of physics that describes how matter and energy interact with discrete energy values "
        }]
    },
    { 
        "role":'user', 
        "content":[{
            'type':'text',
            'text': "Can you explain a bit more about discrete energies?"
        }]
    }
]
body=json.dumps(
        {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 100,
            "messages": messages,
            "temperature": 0.5,
            "top_p": 0.9
        }  
    )  
    
response = boto3_bedrock.invoke_model(body=body, modelId=modelId)
response_body = json.loads(response.get('body').read())
print(response_body)


def test_sample_claude_invoke(prompt_str, boto3_bedrock):
    modelId = "anthropic.claude-3-sonnet-20240229-v1:0" #"anthropic.claude-v2"
    messages=[{ 
        "role":'user', 
        "content":[{
            'type':'text',
            'text': prompt_str
        }]
    }]
    body=json.dumps(
        {
            "anthropic_version": "bedrock-2023-05-31",
            "max_tokens": 100,
            "messages": messages,
            "temperature": 0.5,
            "top_p": 0.9
        }  
    )  
    response = boto3_bedrock.invoke_model(body=body, modelId=modelId)
    response_body = json.loads(response.get('body').read())
    return response_body


test_sample_claude_invoke("what is quantum mechanics", boto3_bedrock)   

### Use the BedrockChat class: a bare-bones sample

In [None]:
modelId = "anthropic.claude-3-sonnet-20240229-v1:0" #"anthropic.claude-v2"

# BedrockChat builds the Messages API request body (anthropic_version, messages, ...)
# itself, so pass it chat messages. Passing a pre-serialized JSON body to predict()
# would send that JSON text to the model as the user's message.
# Inference parameters such as temperature and max_tokens go in model_kwargs.
cl_llm = BedrockChat(
    model_id=modelId,
    client=boto3_bedrock,
    model_kwargs={"temperature": 0.1, "max_tokens": 100},
)

response = cl_llm.invoke([HumanMessage(content="What is quantum mechanics?")])
print(response.content)


## OPTION 1: Use the Vector DB 

We use a vector DB to build curated, simplified content and context, which helps the model return curated responses.

This reduces overall latency because the model processes only the retrieved chunks instead of the entire document.


LangChain provides several classes and functions to make constructing and working with prompts easy. We use the [PromptTemplate](https://python.langchain.com/en/latest/modules/prompts/getting_started.html) family of classes (here, `ChatPromptTemplate`) to construct the prompt from an f-string template.
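A sketch of filling the `{context}` slot, assuming LangChain-style documents with `page_content` and `metadata`: join the page text rather than passing the list of documents itself, whose string form includes the metadata. `Doc` is a hypothetical stand-in for LangChain's `Document`, and `approx_tokens` uses the rough 4-characters-per-token heuristic this notebook applies later.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Doc:
    """Hypothetical stand-in for a LangChain Document."""
    page_content: str
    metadata: Dict[str, Any] = field(default_factory=dict)


# An f-string style template with a {context} slot, like the system message used below.
CONTEXT_TEMPLATE = "Context: {context}"


def format_context(docs: List[Doc]) -> str:
    # Join only the page text. Formatting the list itself would insert each
    # document's repr, metadata included, into the prompt.
    return "\n\n".join(doc.page_content for doc in docs)


def approx_tokens(text: str) -> float:
    # Rough heuristic used elsewhere in this notebook: about 4 characters per token.
    return len(text) / 4
```

This is also where the latency saving comes from: with `k=4` retrieved chunks of about 2,000 characters each, the context is roughly 8,000 characters, or about 2,000 tokens by this heuristic, no matter how long the full document is.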

In [None]:
from langchain_community.document_loaders import PyPDFLoader  # requires the `pypdf` package
from langchain_community.vectorstores import FAISS
from langchain_community.embeddings import BedrockEmbeddings
from langchain_text_splitters import CharacterTextSplitter

# Titan text embeddings model used to embed the document chunks
br_embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v1", client=boto3_bedrock)

# PyPDFLoader returns one Document per PDF page
loader = PyPDFLoader("./rag_data/Amazon-com-Inc-2023-Shareholder-Letter.pdf")
documents_aws = loader.load()
print(f"Number of documents={len(documents_aws)}")

In [None]:

# Split the pages into ~2,000-character chunks with 400 characters of overlap
docs = CharacterTextSplitter(chunk_size=2000, chunk_overlap=400, separator="\n").split_documents(documents_aws)
print(f"Number of documents after split and chunking={len(docs)}")

# Embed every chunk with Titan and build an in-memory FAISS index
vectorstore_faiss_aws = FAISS.from_documents(
    documents=docs,
    embedding=br_embeddings,
)

print(f"vectorstore_faiss_aws: number of elements in the index={vectorstore_faiss_aws.index.ntotal}::")



In [None]:
from langchain.memory import ConversationBufferMemory
from langchain.prompts import PromptTemplate

boto3_bedrock = get_bedrock_client(assumed_role=os.environ.get("BEDROCK_ASSUME_ROLE", None),region='us-west-2', runtime=True)

# Re-create the chat model with a low temperature and a 100-token response limit
modelId = "anthropic.claude-3-sonnet-20240229-v1:0" #"anthropic.claude-v2"
cl_llm = BedrockChat(
    model_id=modelId,
    client=boto3_bedrock,
    model_kwargs={"temperature": 0.1, 'max_tokens': 100},
)

br_embeddings = BedrockEmbeddings(model_id="amazon.titan-embed-text-v1", client=boto3_bedrock)


##### Use a LangChain chain to send in either the full context or the chunks retrieved from the vector DB

The cell below shows both the streaming and the non-streaming options.
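Streaming does not make the model finish sooner; it makes the first tokens visible sooner. The sketch below measures time to first chunk and total time for any iterable of text chunks. `fake_token_stream` is a hypothetical stand-in that sleeps between tokens; the same `measure_stream` could wrap `chain.stream(chain_input)`, which yields strings when the chain ends in `StrOutputParser`.

```python
import time
from typing import Iterable, Iterator, Optional, Tuple


def fake_token_stream(tokens: Iterable[str], delay_s: float = 0.01) -> Iterator[str]:
    """Hypothetical stand-in for chain.stream(): yields one token every `delay_s` seconds."""
    for token in tokens:
        time.sleep(delay_s)  # simulated generation time per token
        yield token


def measure_stream(stream: Iterable[str]) -> Tuple[str, float, float]:
    """Consume a stream and return (full_text, seconds_to_first_chunk, total_seconds)."""
    start = time.perf_counter()
    first_at: Optional[float] = None
    parts = []
    for chunk in stream:
        if first_at is None:
            first_at = time.perf_counter() - start
        parts.append(chunk)
    total = time.perf_counter() - start
    # For an empty stream there is no first chunk; report the total instead.
    return "".join(parts), (first_at if first_at is not None else total), total
```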

In [None]:
import time
system_message = """
System: Use the following portion of the long document below in Context to see if any of the text is relevant to answer the question. Return any relevant text verbatim.
Use the relevant text returned and the question to create a final answer.
Context: {context}
"""
human_message = "{text}"
question = "How did Amazon’s Advertising work do?"

messages = [
    ("system", system_message),
    ("human", human_message)
]

# Full context: every chunk of the document (no overlap)
context_doc = CharacterTextSplitter(chunk_size=2000, chunk_overlap=0, separator="\n").split_documents(documents_aws)
# Vector DB alternative: only the top-k chunks from the FAISS index built above
# context_doc = vectorstore_faiss_aws.similarity_search(question, k=4)

prompt = ChatPromptTemplate.from_messages(messages)

chain = prompt | cl_llm | StrOutputParser()
chain_input = {
    # Join the page text; passing the Document list directly would put its repr
    # (including metadata) into the prompt
    "context": "\n\n".join(doc.page_content for doc in context_doc),
    "text": question,
}

start_time = time.time()
print("\n starting:: Streaming:\n")
for chunk in chain.stream(chain_input):
    print(chunk, end="", flush=True)
print(f"\n\nComplete: Results:returned in {time.time()-start_time}:: seconds")

print("\n starting::NON-STREAMING::\n")
start_time = time.time()
result = chain.invoke(chain_input)
print(result)
print(f"\n\nComplete: NON-STREAMING:Results:returned in {time.time()-start_time}:: seconds")


## Option 2: Map-Reduce with asyncio

LangChain's built-in map-reduce chain still runs the MAP calls sequentially, so here we implement map-reduce ourselves with a custom prompt template.

**Here we run the MAP calls in parallel and then run the REDUCE step after all MAP calls have completed.**
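A self-contained sketch of the parallel MAP and the REDUCE prompt, assuming an async model call like `model_llm.ainvoke`. `fake_llm`, `is_relevant`, `map_and_filter` and `build_reduce_prompt` are hypothetical names; `fake_llm` only sleeps and marks chunks tagged `[relevant]`. Because `asyncio.gather` runs the calls concurrently, the MAP step takes about as long as the slowest call rather than the sum of all calls, and the results come back in input order.

```python
import asyncio
import time
from typing import Awaitable, Callable, List


async def fake_llm(prompt: str) -> str:
    """Hypothetical stand-in for model_llm.ainvoke: simulated latency, then an answer."""
    await asyncio.sleep(0.05)
    # Echo the context line for relevant chunks, otherwise answer "NO".
    return prompt.splitlines()[0] if "[relevant]" in prompt else "NO"


def is_relevant(answer: str) -> bool:
    # Exact "NO" check (ignoring case, whitespace and a trailing period), so answers
    # that merely contain "NO" (e.g. "NOTE: ...") are kept; empty answers are dropped.
    text = answer.strip()
    return bool(text) and text.rstrip(".").upper() != "NO"


async def map_and_filter(chunks: List[str], question: str,
                         llm: Callable[[str], Awaitable[str]]) -> List[str]:
    # MAP: build one prompt per chunk and run all calls concurrently.
    prompts = [f"Context: {chunk}\nquestion: {question}" for chunk in chunks]
    answers = await asyncio.gather(*(llm(p) for p in prompts))
    # Keep only the answers the model did not mark as irrelevant.
    return [a for a in answers if is_relevant(a)]


def build_reduce_prompt(relevant: List[str], question: str) -> str:
    # REDUCE: one final prompt over the surviving MAP outputs.
    return "Context: " + "\n".join(relevant) + f"\nquestion: {question}"


if __name__ == "__main__":
    start = time.perf_counter()
    kept = asyncio.run(map_and_filter(["[relevant] A", "B", "[relevant] C"], "q", fake_llm))
    print(kept, f"{time.perf_counter() - start:.2f}s")
```

In a notebook, which already runs an event loop, use `await map_and_filter(...)` instead of `asyncio.run(...)`.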

In [None]:
# Break the document down into smaller (~2,000-character) chunks
context_doc = CharacterTextSplitter(chunk_size=2000, chunk_overlap=0, separator="\n").split_documents(documents_aws)
print(f"len:context:doc={len(context_doc)}", f"length characters of the doc={len(context_doc[0].page_content)}", f"approx tokens = {len(context_doc[0].page_content)/4}" )

# Keep only the page text of each chunk; ~4 characters per token is a rough estimate
context_doc_list = []
for each_doc in context_doc:
    context_doc_list.append(each_doc.page_content)
    print(f"len:context:doc={len(each_doc.page_content)}:: approx tokens={len(each_doc.page_content)/4}")

##### The callback handler classes below are not required; they are shown for completeness

In [None]:
import asyncio
from typing import Any, Dict, List

from langchain.callbacks.base import AsyncCallbackHandler, BaseCallbackHandler
from langchain_core.messages import HumanMessage
from langchain_core.outputs import LLMResult
import boto3
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_community.chat_models import BedrockChat
from langchain.embeddings import BedrockEmbeddings

boto3_bedrock = get_bedrock_client(assumed_role=os.environ.get("BEDROCK_ASSUME_ROLE", None),region='us-west-2', runtime=True)

model_id = "anthropic.claude-3-sonnet-20240229-v1:0"

model_kwargs =  { 
    "max_tokens": 256,
    "temperature": 0.0,
    "top_k": 250,
    "top_p": 1,
    "stop_sequences": ["\n\nHuman"],
}

class MyCustomSyncHandler(BaseCallbackHandler):
    def on_llm_new_token(self, token: str, **kwargs) -> None:
        print(f"Sync handler being called in a `thread_pool_executor`: token: {token}")


class MyCustomAsyncHandler(AsyncCallbackHandler):
    """Async callback handler that can be used to handle callbacks from langchain."""

    async def on_llm_start(self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any) -> None:
        """Run when the LLM call starts."""
        class_name = serialized["name"]
        print(f"{class_name}:: Sonnet: calling for prompt:approx tokens={len(prompts[0])/4}::")

    async def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
        """Run when the LLM call ends."""
        #print(f"Sonnet call ended response::{response}::")
        pass
        


In [None]:
model_llm = BedrockChat(
    client=boto3_bedrock,
    model_id=model_id,
    model_kwargs=model_kwargs,
    callbacks=[MyCustomSyncHandler(), MyCustomAsyncHandler()],
)

In [None]:
question = "How did Amazon’s Advertising work do?"

def create_map_question(context_doc):
    prompt_question = """
    System: Use the following portion of a long document to see if any of the text is relevant to answer the question. 
    Return the content verbatim if relevant. If not relevant then return 'NO'. Do not return anything else

    Context:{context}

    question: {question}
    """.format(question=question, context=context_doc)
    return prompt_question

def parse_map_yes_no(map_call_list):
    """Keep only the MAP responses that are not an exact 'NO'."""
    ret_list = []
    for ai_message_ret_call in map_call_list:
        print(ai_message_ret_call.content)
        content = ai_message_ret_call.content.strip()
        # Compare against "NO" exactly: a substring check would also drop relevant
        # passages that merely contain "NO" (e.g. "NOTE" or "NOW")
        if content and content.rstrip(".").upper() != "NO":
            ret_list.append(content)
    return ret_list

start_time=time.time()
# MAP: one ainvoke per chunk, all awaited concurrently; gather preserves chunk order
map_call_list = await asyncio.gather(*[model_llm.ainvoke(create_map_question(doc)) for doc in context_doc_list])
map_call_list_new = parse_map_yes_no(map_call_list)
print(f"Time taken for MAP function::{time.time()-start_time}:seconds")
map_call_list_new

In [None]:
print(map_call_list_new)

In [None]:
question = "How did Amazon’s Advertising work do?"

def create_map_question(context_doc):
    prompt_question = """
    System: Use the following portion of a long document to answer the question below. If the context does not have the answer, reply with 'I do not know'. Do not reply with anything else.

    Context:{context}

    question: {question}
    """.format(question=question, context=context_doc)
    return prompt_question

final_context = create_map_question("\n".join(map_call_list_new))
    
await model_llm.ainvoke(final_context)

## Summary
#### Summary template for MAP-reduce

**For summarization, every chunk contributes to the result, so the MAP step keeps every (summarized) response instead of filtering.**

Create chunks that are longer than the ones used for QA, roughly 4,000 characters.
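For summarization the MAP step is the same fan-out, except that no response is filtered out. The sketch below also caps how many requests are in flight with `asyncio.Semaphore`; that cap is our assumption, not something this notebook does: firing one request per chunk at once can run into throttling on large documents, which the retry configuration in `get_bedrock_client` otherwise has to absorb. `summarize_all`, `build_summary_reduce_prompt` and `max_concurrency` are hypothetical names.

```python
import asyncio
from typing import Awaitable, Callable, List


async def summarize_all(chunks: List[str], llm: Callable[[str], Awaitable[str]],
                        max_concurrency: int = 4) -> List[str]:
    """MAP step for summarization: one summary per chunk, nothing filtered out."""
    # Cap the number of in-flight requests; 4 is an illustrative value, not a Bedrock quota.
    semaphore = asyncio.Semaphore(max_concurrency)

    async def summarize(chunk: str) -> str:
        async with semaphore:
            return await llm(f"Summarize in fewer than 50 words:\n{chunk}")

    # gather returns results in the same order as the input chunks.
    return list(await asyncio.gather(*(summarize(c) for c in chunks)))


def build_summary_reduce_prompt(summaries: List[str], question: str) -> str:
    # REDUCE: combine every chunk summary into one final prompt.
    return "Context:\n" + "\n".join(summaries) + f"\nquestion: {question}"
```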

In [None]:
import boto3
from langchain_core.output_parsers import StrOutputParser
from langchain_core.prompts import ChatPromptTemplate
from langchain_community.chat_models import BedrockChat
from langchain.embeddings import BedrockEmbeddings

boto3_bedrock = get_bedrock_client(assumed_role=os.environ.get("BEDROCK_ASSUME_ROLE", None),region='us-west-2', runtime=True)

model_id = "anthropic.claude-3-sonnet-20240229-v1:0"

model_kwargs =  { 
    "max_tokens": 256,
    "temperature": 0.0,
    "top_k": 250,
    "top_p": 1,
    "stop_sequences": ["\n\nHuman"],
}


# break down into smaller chunks

context_doc = CharacterTextSplitter(chunk_size=4000, chunk_overlap=0, separator="\n").split_documents(documents_aws) #-  separator=","
print(f"len:context:doc={len(context_doc)}", f"length characters of the doc={len(context_doc[0].page_content)}", f"approx tokens = {len(context_doc[0].page_content)/4}" )

context_doc_list = []
for each_doc in context_doc:
    context_doc_list.append(each_doc.page_content)
    print(f"len:context:doc={len(each_doc.page_content)}:: approx tokens={len(each_doc.page_content)/4}")

In [None]:
import asyncio
from typing import Any, Dict, List

from langchain.callbacks.base import AsyncCallbackHandler, BaseCallbackHandler
from langchain_core.messages import HumanMessage
from langchain_core.outputs import LLMResult

class MyCustomSyncHandler(BaseCallbackHandler):
    def on_llm_new_token(self, token: str, **kwargs) -> None:
        print(f"Sync handler being called in a `thread_pool_executor`: token: {token}")


class MyCustomAsyncHandler(AsyncCallbackHandler):
    """Async callback handler that can be used to handle callbacks from langchain."""

    async def on_llm_start(self, serialized: Dict[str, Any], prompts: List[str], **kwargs: Any) -> None:
        """Run when the LLM call starts."""
        class_name = serialized["name"]
        print(f"{class_name}:: Sonnet: calling for prompt:approx tokens={len(prompts[0])/4}::")

    async def on_llm_end(self, response: LLMResult, **kwargs: Any) -> None:
        """Run when the LLM call ends."""
        #print(f"Sonnet call ended response::{response}::")
        pass
        
model_llm = BedrockChat(
    client=boto3_bedrock,
    model_id=model_id,
    model_kwargs=model_kwargs,
    callbacks=[MyCustomSyncHandler(), MyCustomAsyncHandler()],
)

In [None]:

def create_map_question(context_doc):
    prompt_question = """
    System: Summarize the following portion of the document below in Context in fewer than 50 words. Do not return anything else

    Context:{context}

    """.format(context=context_doc)
    return prompt_question

def collect_map_summaries(map_call_list):
    """Return every MAP response: for a summary, each chunk's summary is kept."""
    ret_list = []
    for ai_message_ret_call in map_call_list:
        print(ai_message_ret_call.content)
        ret_list.append(ai_message_ret_call.content)
    return ret_list

start_time=time.time()
# MAP: summarize every chunk concurrently; gather preserves chunk order
map_call_list = await asyncio.gather(*[model_llm.ainvoke(create_map_question(doc)) for doc in context_doc_list])
map_call_list_new = collect_map_summaries(map_call_list)
print(f"Time taken for MAP function::{time.time()-start_time}:seconds")
map_call_list_new

In [None]:
# Characters and approximate tokens (~4 characters per token) of the combined summaries
len("".join(map_call_list_new)), len("".join(map_call_list_new))/4

In [None]:
question = "What does this document talk about?"

def create_map_question(context_doc):
    prompt_question = """
    System: Use the following portion of a long document to answer the question below. Summarize your findings in fewer than 50 words

    Context:{context}

    question: {question}
    """.format(question=question, context=context_doc)
    return prompt_question

final_context = create_map_question("\n".join(map_call_list_new))
#print(final_context)
    
await model_llm.ainvoke(final_context)

### Finished the test calls