Article
· Jul 4, 2023 · 6 min read

Step-by-step guide to creating a personalized AI with ChatGPT using LangChain

As an AI language model, ChatGPT is capable of performing a variety of tasks, such as language translation, writing songs, answering research questions, and even generating computer code. With its impressive abilities, ChatGPT has quickly become a popular tool for various applications, from chatbots to content creation.
But despite its advanced capabilities, ChatGPT cannot access your personal data. So in this article, I will demonstrate the following steps to build a custom ChatGPT AI using the LangChain framework:

  • Step 1: Load the document

  • Step 2: Split the document into chunks

  • Step 3: Embed the chunks and convert them to vectors

  • Step 4: Save the data to the vector database

  • Step 5: Take the question from the user and get its embedding

  • Step 6: Connect to the vector database and do a semantic search

  • Step 7: Retrieve relevant responses based on the user query and send them to the LLM (ChatGPT)

  • Step 8: Get the answer from the LLM and send it back to the user

  NOTE: Please read my previous article LangChain – Unleashing the full potential of LLMs for more details about LangChain and about how to get an OpenAI API key.

 

So, let's begin.

Step 1: Load the document

First of all, we need to load the document, so we will import PyPDFLoader for PDF documents.

ClassMethod LoadPDF(filePath) [ Language = python ]
{
#for PDF files we need to import PyPDFLoader from the langchain framework
from langchain.document_loaders import PyPDFLoader
#for CSV files we would import CSVLoader
#for Word documents we would import UnstructuredWordDocumentLoader
#for text documents we would import TextLoader
#import os to set the environment variable
import os
#assign your OpenAI API key to the environment variable
os.environ['OPENAI_API_KEY'] = "apiKey"
#init the loader
loader = PyPDFLoader(filePath)
#load the document
documents = loader.load()
return documents
}

Step 2: Split the document into chunks

Language models are often limited by the amount of text that you can pass to them. Therefore, it is necessary to split documents up into smaller chunks. LangChain provides several utilities for doing so.

Using a text splitter can also help improve the results from vector store searches, since smaller chunks may sometimes be more likely to match a query. Testing different chunk sizes (and chunk overlaps) is a worthwhile exercise to tailor the results to your use case; a quick experiment follows the code below.

ClassMethod splitText(documents) [ Language = python ]
{
#In order to split the document we need to import RecursiveCharacterTextSplitter from Langchain framework  
from langchain.text_splitter import RecursiveCharacterTextSplitter
#Init text splitter, define chunk size 1000 and overlap = 0
text_splitter = RecursiveCharacterTextSplitter(chunk_size=1000, chunk_overlap=0)
#Split document into chunks
texts = text_splitter.split_documents(documents)
return texts
}
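
To see how chunk size and overlap interact, here is a quick standalone experiment (plain Python; the sample text and the size/overlap pairs are invented for illustration):

#Quick experiment: how chunk_size and chunk_overlap affect the number of chunks
from langchain.text_splitter import RecursiveCharacterTextSplitter

sample = "LangChain splits long documents recursively, paragraph by paragraph. " * 50
for size, overlap in [(100, 0), (100, 20), (500, 50)]:
    splitter = RecursiveCharacterTextSplitter(chunk_size=size, chunk_overlap=overlap)
    chunks = splitter.split_text(sample)
    print(f"chunk_size={size}, chunk_overlap={overlap} -> {len(chunks)} chunks")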

Step 3: Embed the chunks and convert them to vectors

Text embeddings are the heart and soul of LLM operations. Technically, we can work with language models using natural language directly, but storing and retrieving natural language is highly inefficient.

To make it more efficient, we need to transform text data into vector form. There are dedicated ML models for creating embeddings from texts: the texts are converted into multidimensional vectors. Once embedded, we can group, sort, search, and more over this data. We can calculate the distance between two sentences to know how closely they are related. And the best part is that these operations are not limited to keywords, as in traditional database searches, but rather capture the semantic closeness of two sentences. This makes it a lot more powerful, thanks to machine learning.
 

Text embedding models take text as input and return a list of floats (the embedding), which is the numerical representation of the input text. Embeddings help extract information from a text; this information can then be used later, e.g., for calculating similarities between texts (such as movie summaries).

ClassMethod getEmbeddings() [ Language = python ]
{
#get the embeddings model from the LangChain framework
from langchain.embeddings import OpenAIEmbeddings
#define the embedding
embedding = OpenAIEmbeddings()
return embedding
}
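
To see the semantic closeness idea in action, here is a minimal sketch that embeds two sentences and compares them with cosine similarity (the sentences are invented for illustration; numpy is assumed to be available):

#Minimal sketch: cosine similarity between two sentence embeddings
import numpy as np
from langchain.embeddings import OpenAIEmbeddings

embedding = OpenAIEmbeddings()
v1 = np.array(embedding.embed_query("The cat sat on the mat"))
v2 = np.array(embedding.embed_query("A feline rested on the rug"))
#a cosine similarity close to 1.0 means the sentences are semantically close
print(np.dot(v1, v2) / (np.linalg.norm(v1) * np.linalg.norm(v2)))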
    

Step 4: Save the data to the vector database

    ClassMethod saveDB(texts, embedding) [ Language = python ]
    {
    #get the Chroma db from langchain
    from langchain.vectorstores import Chroma
    #embed and store the texts
    #supplying a persist_directory will store the embeddings on disk,
    #e.g. we are saving data in the myData folder under the current application path
    persist_directory = "myData"
    vectordb = Chroma.from_documents(documents=texts, embedding=embedding, persist_directory=persist_directory)
    #save the document store locally
    vectordb.persist()
    vectordb = None
    }
    

Step 5: Take the question from the user and get its embedding

    ClassMethod getVectorData(query) [ Language = python ]
    {
    #NOTE: we must use the same embedding model that was used when the data was saved
    from langchain.embeddings import OpenAIEmbeddings
    #get the embeddings
    embedding = OpenAIEmbeddings()
    #query is the user's question (parameter); the vector store embeds it during search
    #code continues in Step 6...

Step 6: Connect to the vector database and do a semantic search

 #...code continued from Step 5
 from langchain.vectorstores import Chroma
 persist_directory = "myData"
 #now we can load the persisted database from disk and use it as normal
 vectordb = Chroma(persist_directory=persist_directory, embedding_function=embedding)
 return vectordb
 }
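
With the database loaded, the semantic search itself is a single call. Here is a minimal sketch (the query string and the k value are just examples). Note that in Step 7 the ConversationalRetrievalChain performs this retrieval internally via vectordb.as_retriever(), so the explicit call is mainly useful for inspecting what the retriever returns:

#Minimal sketch: run a semantic search against the loaded vector store
query = "What is the refund policy?"
docs = vectordb.similarity_search(query, k=3)
for doc in docs:
    print(doc.page_content[:200])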

Step 7: Retrieve relevant responses based on the user query and send them to the LLM (ChatGPT)

Conversational memory is how a chatbot can respond to multiple queries in a chat-like manner. It enables a coherent conversation, and without it, every query would be treated as an entirely independent input without considering past interactions.

[Figure: the LLM with and without conversational memory. The blue boxes are user prompts and the grey ones are the LLM's responses; without conversational memory, the LLM cannot respond using knowledge of previous interactions.]

The memory allows a Large Language Model (LLM) to remember previous interactions with the user. By default, LLMs are stateless — meaning each incoming query is processed independently of other interactions. The only thing that exists for a stateless agent is the current input, nothing else.


The ConversationalRetrievalChain, part of the LangChain framework, is designed to retrieve relevant responses based on user queries. It uses a retrieval-based approach: it condenses the chat history and the new question into a standalone query, searches the vector store for the most relevant document chunks, and passes them to the LLM so that it can produce an accurate and helpful answer in context.

ClassMethod retrieveResponse(vectordb) [ Language = python ]
{
from langchain.llms import OpenAI
from langchain.memory import ConversationBufferMemory
from langchain.chains import ConversationalRetrievalChain
#conversational memory is how a chatbot can respond to multiple queries in context
memory = ConversationBufferMemory(memory_key="chat_history", return_messages=True)
#the ConversationalRetrievalChain retrieves relevant chunks and sends them to the LLM
qa = ConversationalRetrievalChain.from_llm(OpenAI(temperature=0), vectordb.as_retriever(), memory=memory)
return qa
}
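
To see the conversational memory at work, here is a small sketch that asks two related questions (the questions are invented for illustration). When a memory object is attached, the chain is invoked with a "question" key and returns an "answer" key:

#Sketch: two related questions, where the second relies on the chat history
result = qa({"question": "Who is the author of the document?"})
print(result["answer"])
#the pronoun "they" only resolves because the previous exchange is stored in memory
result = qa({"question": "What else have they written?"})
print(result["answer"])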


Step 8: Get the answer from the LLM and send it back to the user

ClassMethod getAnswer(qa, query) [ Language = python ]
{
#get the answer from the LLM and send it back to the user
answer = qa.run(query)
return answer
}
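
Putting it all together, here is a minimal end-to-end sketch of how the pieces connect, treating each ClassMethod body above as a plain Python function (the file path and question are placeholders):

#End-to-end sketch (file path and question are placeholders)
documents = LoadPDF("myDocument.pdf")        #Step 1
texts = splitText(documents)                 #Step 2
embedding = getEmbeddings()                  #Step 3
saveDB(texts, embedding)                     #Step 4
query = "What is this document about?"
vectordb = getVectorData(query)              #Steps 5 and 6
qa = retrieveResponse(vectordb)              #Step 7
print(getAnswer(qa, query))                  #Step 8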

For more details and features, please visit my application irisChatGPT.


Thanks

Article
· Jul 4, 2023 · 2 min read

Build an IRIS image with CPF merge

When it comes to building an IRIS image, we can use CPF merge files.

Here is a CPF merge example that creates two databases, maps them as the globals and routines databases of a new interoperability-enabled IRISAPP namespace, enables the %Service_CallIn service, creates a /frn REST application, and sets the SuperUser password hash:

[Actions]
CreateDatabase:Name=IRISAPP_DATA,Directory=/usr/irissys/mgr/IRISAPP_DATA

CreateDatabase:Name=IRISAPP_CODE,Directory=/usr/irissys/mgr/IRISAPP_CODE

CreateNamespace:Name=IRISAPP,Globals=IRISAPP_DATA,Routines=IRISAPP_CODE,Interop=1

ModifyService:Name=%Service_CallIn,Enabled=1,AutheEnabled=48

CreateApplication:Name=/frn,NameSpace=IRISAPP,DispatchClass=Formation.REST.Dispatch,AutheEnabled=48

ModifyUser:Name=SuperUser,PasswordHash=a31d24aecc0bfe560a7e45bd913ad27c667dc25a75cbfd358c451bb595b6bd52bd25c82cafaa23ca1dd30b3b4947d12d3bb0ffb2a717df29912b743a281f97c1,0a4c463a2fa1e7542b61aa48800091ab688eb0a14bebf536638f411f5454c9343b9aa6402b4694f0a89b624407a5f43f0a38fc35216bb18aab7dc41ef9f056b1,10000,SHA512
Article
· Jul 3, 2023 · 3 min read

Password Manager using Flask and InterSystems IRIS

Introduction

A password manager is an important security tool that allows users to store and manage their passwords without the need to remember or write them down in insecure places. In this article, we will explore the development of a simple password manager using the Flask framework and the InterSystems IRIS database.

Key Features

Our password manager application will provide the following key features:

  1. User registration with account creation.
  2. User authentication during login.
  3. Adding new passwords with a title, login, and password.
  4. Encryption and secure storage of passwords in the database.
  5. Viewing the list of saved passwords for a user.
  6. Editing and deleting saved passwords.
  7. Ability to log out and end the session.

Future Plans

While the current version of our password manager application already offers essential functionality, there are several potential future enhancements that can be considered:

  1. Password strength evaluation: Implement a feature to analyze the strength of passwords entered by users. This can include checking for complexity, length, and the presence of special characters.
  2. Two-factor authentication (2FA): Integrate a 2FA mechanism to add an extra layer of security during login. This can involve using SMS verification codes, email verification, or authenticator apps.
  3. Password generator: Include a password generator that can generate strong, random passwords for users. This feature can provide suggestions for creating unique and secure passwords.
  4. Password expiration and change reminders: Implement a mechanism to notify users when their passwords are due for expiration or recommend periodic password changes to enhance security.
  5. Secure password sharing: Allow users to securely share passwords with others, such as family members or team members, while maintaining the necessary encryption and access controls.

 

By incorporating these enhancements, our password manager can evolve into a more robust and feature-rich application, catering to the increasing security needs of users.

Tools Used

To develop our password manager, we will be using the following tools:

  • Flask: A lightweight web framework for building web applications in Python.
  • InterSystems IRIS: A high-performance database that provides reliable data storage and management.

Benefits of Using Flask and InterSystems IRIS

  • Flask provides simplicity and conciseness in web development by offering a wide range of tools and functionalities.
  • Flask allows easy creation of routes, handling requests, and returning HTML responses.
  • InterSystems IRIS ensures reliable data storage, providing high performance and security.
  • Storing passwords in a database, encrypted by the application, protects them against unauthorized access, as the short sketch below illustrates.
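
As a concrete illustration of the storage path, here is a minimal sketch of a Flask route that encrypts a password with Fernet (from the cryptography package) before writing it to IRIS via the Native SDK for Python. The global structure, connection details, and key handling are assumptions for illustration only; a real application would load the key from secure configuration and add authentication:

#Minimal sketch (assumptions flagged in comments): Flask + Fernet + IRIS Native SDK
from flask import Flask, request
from cryptography.fernet import Fernet
import irisnative

app = Flask(__name__)
fernet = Fernet(Fernet.generate_key())  #assumption: in practice, load the key from secure config

#assumption: placeholder connection details
connection = irisnative.createConnection("localhost", 1972, "USER", "_SYSTEM", "SYS")
irisdb = irisnative.createIris(connection)

@app.route("/passwords", methods=["POST"])
def add_password():
    data = request.get_json()
    token = fernet.encrypt(data["password"].encode()).decode()
    #hypothetical global structure: ^passwords(user, title) = "login|encrypted password"
    irisdb.set(data["login"] + "|" + token, "passwords", data["user"], data["title"])
    return {"status": "saved"}, 201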

Conclusion

A password manager developed using Flask and InterSystems IRIS provides a convenient and secure way to store and manage passwords. It allows users to store complex passwords without the need to remember or write them down in insecure places. Developing such an application is a great exercise for learning web development with Flask and working with databases.

Article
· Jul 2, 2023 · 4 min read

LangChain – Unleashing the full potential of LLMs


Hi Community

In this article, I will introduce my application irisChatGPT which is built on LangChain Framework.

First of all, let us have a brief overview of the framework.

The entire world is talking about ChatGPT and how Large Language Models (LLMs) have become so powerful and have been performing beyond expectations, giving human-like conversations. This is just the beginning of how this can be applied to every enterprise and every domain!

The most important question that remains is how to apply this power to domain-specific data and scenario-specific response behavior suitable to the needs of the enterprise. 

LangChain provides a structured and effective answer to this problem! LangChain can help realize the immense potential of LLMs to build astounding applications by providing a layer of abstraction around them and making their use easy and effective. LangChain is a framework that enables quick and easy development of applications that make use of Large Language Models, for example, GPT-3.

The framework, however, introduces additional possibilities, for example, that of easily using external data sources, such as Wikipedia, to amplify the capabilities provided by the model. You have all probably tried to use ChatGPT and found that it fails to answer questions about events that occurred beyond a certain date. In this case, a search on Wikipedia could help GPT answer more questions.


LangChain Structure

The framework is organized into six modules; each module allows you to manage a different aspect of the interaction with the LLM. Let’s see what the modules are.

  • Models: Allows you to instantiate and use three different types of language models, which are:
    • Large Language Models (LLMs): these are foundational machine learning models that are able to understand natural language. They accept strings as input and generate strings as output.
    • Chat Models: models powered by LLMs but specialized to chat with the user. You can read more here.
    • Text Embedding Models: these models are used to project textual data into a geometric space. They take text as input and return a list of numbers, the embedding of the text.
  • Prompts: The prompt is how we interact with the model to obtain an output from it. Knowing how to write an effective prompt is now of critical importance. This framework module allows us to better manage prompts, for example by creating templates that we can reuse.
  • Indexes: The best results often come from combining models with some of your own textual data, in order to add context or explain something to the model. This module helps us do just that.
  • Chains: Many times a single API call to an LLM is not enough to solve a task. This module allows other tools to be integrated: for example, one call can be a composed chain that gets information from Wikipedia and then feeds it as input to the model. This module allows multiple tools to be concatenated in order to solve complex tasks.
  • Memory: This module allows us to create a persisting state between calls of a model. Being able to use a model that remembers what has been said in the past will surely improve our application.
  • Agents: An agent is an LLM that makes a decision, takes an action, makes an observation about what it has done, and continues in this manner until it can complete its task. This module provides a set of agents that can be used.

Now let’s go into a little more detail and see how to implement code by taking advantage of the different modules.
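
As a small taste before the walkthrough, here is a minimal sketch combining the Models, Prompts, and Chains modules (the prompt text is invented for illustration):

#Minimal sketch: a reusable prompt template piped into an LLM via a chain
from langchain.llms import OpenAI
from langchain.prompts import PromptTemplate
from langchain.chains import LLMChain

prompt = PromptTemplate(
    input_variables=["product"],
    template="Suggest a good name for a company that makes {product}.",
)
chain = LLMChain(llm=OpenAI(temperature=0.7), prompt=prompt)
print(chain.run("artisanal coffee"))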

How LangChain works
 

Step 1:
The user sends a question to LangChain.

Step 2:
LangChain sends this question to the embedding model.

Step 3:
The embedding model converts the text to a vector, since text is stored as vectors in the database, and returns it to LangChain.

Step 4:
LangChain sends this vector to the vector database (there are multiple vector databases; we are using Chroma in our application).

Step 5:
The vector database returns the top-k approximately nearest neighbor vectors.

Step 6:
LangChain sends the question along with these vectors to the Large Language Model (LLM) (we are using OpenAI in our application).

Step 7:
The LLM returns the answer to LangChain.

Step 8:
LangChain returns the answer to the user.


About the Application

The irisChatGPT application leverages the functionality of one of the hottest Python frameworks, LangChain, built around Large Language Models (LLMs). LangChain is a framework that enables quick and easy development of applications that make use of Large Language Models. The application is built using ObjectScript with the help of InterSystems Embedded Python. It also contains a Streamlit web application; Streamlit is an open-source Python app framework to create beautiful web apps for data science and machine learning.

 

Features

Below is the list of application features (the related screenshots are omitted here):

  • Built-in InterSystems ObjectScript Reference ChatGPT
  • Built-in InterSystems Grand Prix Contest 2023 ChatGPT
  • ChatGPT with FHIR server
  • Answer questions over a Cache database by using SQLDatabaseChain
  • Create your own ChatGPT model and chat with it
  • OpenAI ChatGPT
  • Wikipedia Search
  • Search on the internet by using the DuckDuckGo (DDG) general search engine
  • Generate Python code by using the Python REPL LangChain functionality

Streamlit Web application (ONLINE DEMO)

  • ObjectScript Reference
  • Grand Prix Contest 2023
  • Personal ChatGPT
  • OpenAI ChatGPT

Thanks
