
Article
· Jul 27, 2024 7m read

An Expanded Explanation of RAG, Vector Search, and how it is implemented on IRIS in the IRIS RAG App

I received some really excellent feedback from a community member on my submission to the Python 2024 contest. I hope it's okay if I repost it here:

you build a container more than 5 times the size of pure IRIS

and this takes time

container start is also slow but completes

backend is accessible as described

a production is hanging around

frontend reacts

I fail to understand what is intended to show

the explanation is meant for experts other than me

The submission is here: https://openexchange.intersystems.com/package/IRIS-RAG-App

I really appreciate this feedback, not least because it is a great prompt for an article about the project. This project includes fairly comprehensive documentation, but it does assume familiarity with vector embeddings, RAG pipelines, and LLM text generation, as well as Python and certain popular Python libraries, like LlamaIndex.

This article, written completely without AI, is an attempt to explain those things and how they fit together in this project to demonstrate the RAG workflow on IRIS.

The container is large because the library dependencies needed for the python packages involved in creating vector embeddings are very large. It is possible that through more selective imports, the size could be cut down considerably. 

It does take time to build the container initially, but once you have done so, starting it takes less time. The startup time could still definitely be improved. The main reason startup takes so long is that entrypoint.sh was written with the assumption that any part of the application might have changed since the last startup, including the database migrations, the CSS and JavaScript configuration, and the Python backend code, so it recompiles the entire project every time it starts up. This makes it easier to get started developing on this project, since otherwise it can be tricky to properly run the frontend and backend builds whenever changes are made. This way, if you change any of the code in the project, you just need to restart the container, maybe recover the production in the backend, and your changes should be reflected in the interface and operation of the application.

I am fairly sure that the production in the backend is what passes the HTTP requests to the Django application, and it is crucial to the interoperability in this package. I am new to the IRIS platform, however, and have more to learn about productions.

Next I'd like to provide a more complete explanation of vector embeddings, LLMs, and RAG. The first of these to be invented was the vector embedding, so let's start by describing a vector. In most contexts a vector is a direction. It's an arrow pointing somewhere in space. More formally, a vector is "a quantity having direction as well as magnitude". This could be exemplified by a firework, which travels in a particular direction and explodes at a particular point in space.

Let's say every firework is fired from the same central point, a point of origin, [0,0,0], but they all fly out and explode in a cloud around that origin point. Mathematically, you could describe the location of each firework explosion using a three-coordinate system, [x,y,z], and that would be a "vector embedding" for a firework explosion. If you took lots of video of a firework display and recorded all of the firework explosions as a dataset, you would be creating a kind of vector embedding database, or vector store, of the fireworks display.

What could you do with that information about the fireworks display? If I pointed out a particular firework and asked for the fireworks that exploded closest to the same point throughout the entire display, you could find those other fireworks that exploded at nearby points in space. You just find the ones that are closest, and there’s math to do that.
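To make that concrete, here is a minimal Python sketch (the coordinates are made up for illustration) that finds which firework exploded closest to a chosen one using plain Euclidean distance:

import math

# Each "vector embedding" here is just the [x, y, z] explosion point of one firework.
fireworks = {
    "red_peony":   [10.0, 42.0, 80.0],
    "blue_willow": [12.0, 40.0, 78.0],
    "gold_ring":   [-35.0, 5.0, 60.0],
}

def distance(a, b):
    # Euclidean distance between two points with the same number of dimensions.
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

query = fireworks["red_peony"]
closest = min((name for name in fireworks if name != "red_peony"),
              key=lambda name: distance(query, fireworks[name]))
print(closest)  # "blue_willow" exploded nearest to "red_peony"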

Remember, we only recorded three numbers for each firework, the x, y, and z coordinates in a three dimensional space with [0,0,0] being the firework launcher on the ground. 

What if I wanted to also know the firework that exploded both closest in distance, and closest in time to another particular firework? To know that, we would have to go back through our video footage of the fireworks display and record the time of each explosion as well. Now we have a 4-dimensional vector with 4 numbers: the three dimensional position of the firework explosion and the time of the explosion. Now we have a more descriptive type of embedding for the firework display by adding another dimension to our vector embeddings.
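Continuing that toy sketch, adding the time dimension just means appending a fourth number to each vector; the same distance function then measures closeness in space and time at once (values again made up):

# [x, y, z, t]: the explosion point plus the time (seconds into the show) it happened.
fireworks_4d = {
    "red_peony":   [10.0, 42.0, 80.0, 12.5],
    "blue_willow": [12.0, 40.0, 78.0, 95.0],  # near in space, far in time
    "gold_ring":   [14.0, 45.0, 82.0, 13.0],  # near in both space and time
}
# distance() from the previous sketch works unchanged on these 4-component vectors,
# and now ranks "gold_ring" (not "blue_willow") as the closest to "red_peony".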

How does this translate to machine learning? Well, long story short, by processing a huge amount of text data, computer scientists managed to create embedding models that can take a piece of text, whether a phrase, sentence, paragraph, or even a page, and turn it into a very long series of numbers representing a point in a theoretical high-dimensional space.

Instead of 4 numbers, there are 300, or 700, or even 1500. These represent 1500 ways in which one piece of text can be "close" to or "far" from another, or 1500 dimensions of meaning. It's a captivating concept for many that we have the means to create numbers that represent, in some way, the semantic meaning of a piece of text.

Using math, two of these high-dimensional text vector embeddings can be compared to find out how similar, or "close", they are to one another, as long as they were created by the same model.
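If it helps to see the arithmetic, here is a minimal sketch of one common comparison, cosine similarity (the vectors are short and invented; real embeddings have hundreds of components):

import math

def cosine_similarity(a, b):
    # Close to 1.0 means the two vectors point in nearly the same direction,
    # i.e. the two pieces of text are semantically similar; near 0 means unrelated.
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(x * x for x in b))
    return dot / (norm_a * norm_b)

# Tiny made-up "embeddings"; a real model would return 300 to 1500 numbers each.
sentence_a = [0.12, -0.40, 0.88, 0.05]
sentence_b = [0.10, -0.35, 0.90, 0.07]
print(round(cosine_similarity(sentence_a, sentence_b), 3))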

That's the first thing that happens in this app. The user must upload a document and name it, and then choose a type of embedding. The server takes that document, breaks it into text chunks, turns each of those chunks into a vector embedding, and saves each chunk as a row in a dedicated table for that document. Each document is stored in its own dedicated table to allow for the variable length of the vector embeddings created by different text embedding models.
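As a rough sketch of that ingestion step (this is not the app's actual code: the embedding model, the chunking, the table name, and the exact IRIS vector SQL syntax are all assumptions for illustration and may differ by version), the flow looks something like this:

from sentence_transformers import SentenceTransformer  # one possible embedding model

model = SentenceTransformer("all-MiniLM-L6-v2")  # produces 384-dimensional vectors

def naive_chunks(text, size=1000):
    # Split the document into fixed-size chunks; real pipelines split more carefully.
    return [text[i:i + size] for i in range(0, len(text), size)]

def ingest(cursor, document_text, table_name="hamlet_chunks"):
    chunks = naive_chunks(document_text)
    vectors = model.encode(chunks)  # one embedding per chunk
    # One dedicated table per document, sized to this model's embedding length.
    cursor.execute(
        f"CREATE TABLE {table_name} (chunk VARCHAR(4000), embedding VECTOR(DOUBLE, 384))"
    )
    for chunk, vector in zip(chunks, vectors):
        cursor.execute(
            f"INSERT INTO {table_name} (chunk, embedding) VALUES (?, TO_VECTOR(?, DOUBLE))",
            (chunk, ",".join(str(x) for x in vector)),
        )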

Once a document is stored in the database as vector embeddings, the user can enter a query to "ask" the document. The query is used in two ways. The first way is to search the document. We don't do a traditional text search; instead we do a "vector search". The app takes the query, turns it into a vector embedding, and then finds the sections of the document whose embeddings are most similar to the query embedding. A similarity score between 0 and 1 is generated for every document section, and several sections are retrieved from the vector database based on the top_k_similarity and the similarity_threshold. Basically, you can tell it how many document sections to retrieve, and how similar they must be to your query to qualify for retrieval.
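Here is a matching sketch of the retrieval step, under the same assumptions as above (invented table name, IRIS's VECTOR_COSINE used for the similarity score, and the same embedding model as at ingestion):

from sentence_transformers import SentenceTransformer

model = SentenceTransformer("all-MiniLM-L6-v2")  # must match the model used at ingestion

def retrieve(cursor, query, top_k=5, similarity_threshold=0.7, table_name="hamlet_chunks"):
    # Embed the query with the same model that embedded the document chunks.
    query_vector = ",".join(str(x) for x in model.encode([query])[0])
    cursor.execute(
        f"SELECT TOP {int(top_k)} chunk, "
        f"VECTOR_COSINE(embedding, TO_VECTOR(?, DOUBLE)) AS score "
        f"FROM {table_name} ORDER BY score DESC",
        (query_vector,),
    )
    # Keep only the sections whose similarity clears the threshold.
    return [(chunk, score) for chunk, score in cursor.fetchall()
            if score >= similarity_threshold]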

That’s the Retrieval in Retrieval Augmented Generation. The next step is the generation.

Once computer scientists figured out how to convert text to semantically significant numeric vector embeddings, the next step was to create models that could produce text. They did so with great success, and now we have Large Language Models like GPT-4, LLama3, and Claude 3.5. These LLMs can take a prompt, or query, and deliver a completion, or answer, which is the text it thinks most likely to continue from the text presented, the prompt.

LLMs must be trained on large amounts of text data, and their responses, or completions, are limited to that training data. When we want the LLMs to provide completions that might include data not in their training sets, or ground their completions in a particular set of knowledge, one way to do that is to include extra contextual data in the prompt. Basically, if we want an answer from an LLM about something it wasn’t trained on, we have to give it the information in the prompt.

Many people found themselves in a situation in which they wished chatGPT or their local LLama installation could provide answers based on their own personal documents. It’s simple enough to search your documents for that information, paste it into the prompt, and put in your question, and people found themselves doing it manually. That is its own form of Retrieval Augmented Generation. RAG is just the automation of finding information relevant to the user query and providing it with the query to the LLM for a more accurate or useful response.

In this app, the document sections we retrieve with the vector search are sent with the query to the chosen LLM, labeled in the interface as the Model, to provide the context to the answer.
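Putting the two halves together, the generation step is essentially prompt assembly. A minimal sketch (the prompt wording is invented, and send_to_llm stands in for whatever client the selected Model uses):

def answer(question, retrieved_sections, send_to_llm):
    # retrieved_sections is the list of (chunk, score) pairs from the vector search.
    context = "\n\n".join(chunk for chunk, _score in retrieved_sections)
    prompt = (
        "Answer the question using only the context below.\n\n"
        f"Context:\n{context}\n\n"
        f"Question: {question}\n"
        "Answer:"
    )
    return send_to_llm(prompt)  # e.g. a call out to GPT-4, Llama 3, or Claude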

In the video example I made for this project, I ask the question “Who is the villain in this play?” with the documents “Hamlet” and “King Lear”, which contain the entire text of the two Shakespeare plays. The IRIS database already has two tables, one for Hamlet, and the other for King Lear. Each table is filled with rows of vector embeddings created from splitting the text of each play into sections. These embeddings are long series of numbers representing the many dimensions of meaning in each of the document sections. 

The server converts the question “Who is the villain in this play” into a numeric vector using the same text-to-vector model that generated the vector embeddings for King Lear, and finds the sections in the King Lear table that are most similar to it. These are probably sections that mention the word villain, yes, but possibly other villainous things, such as treachery, betrayal, and deceit, even if villainy is not explicitly mentioned. These document sections are added to the query and sent together as a prompt to an LLM which then answers the question based on the provided document sections.

This is done separately for each document, and this is why the answer to the query is different depending on the document being queried. This completes the acronym, since we are Augmenting the Generation of our answer from the LLM with the Retrieval of relevant context information using the power of vector search.

Many thanks to anyone who takes the time to read this and I would be happy to expand on any of these topics in a future article. Feedback is always welcome.

Article
· Jul 27, 2024 1m read

Chapter 5: Controlling the Use of the xsi:type Attribute

Controlling the Use of the xsi:type Attribute

By default, SOAP messages include the xsi:type attribute only for the top-level type. For example:

Question
· Jul 26, 2024

Process a flat file to the SDA using a RecordMap

I need an example of how to consume a pipe-delimited flat file and place parts of its content into parts of the SDA. I have the RecordMap built, but am unsure of where to go from here.

Any help would be greatly appreciated.

 

Thanks,

Lawrence

Question
· Jul 26, 2024

IRIS code and GitLab CICD Pipeline

I was watching this video about IRIS and GitHub, and it is all clear to me how it works and how code from each branch gets deployed to each IRIS environment, but the deployment process is manual. My question is: how can I, if possible, utilize git-source-control from a GitLab CICD pipeline to deploy code automatically after PR approval instead of going to the Git UI?

Thanks

Article
· Jul 26, 2024 7m read

Cookie monster and other troubles (and some workarounds too) we ran into while doing Django on IRIS WSGI

As part of the IRIS Python 2024 contest, my colleague Damir and I went with an idea to build a platform called ShelterShare for connecting victims and volunteers for shelter requests. To do so, we chose Django as the framework and proceeded to build the first version with 3 different Docker containers, django, iris, and nginx, which would utilize IRIS as a pure database engine via the beautifully composed django-iris (kudos to Dimitry). As we were progressing fast, we decided to explore the option of running it within the same container as IRIS by utilizing the WSGI support added in 2024.1. We knew ahead of time that we wouldn't be able to rely entirely on WSGI, as we were using WebSockets for instantaneous updates and communication in the tool, but we figured we could always run uvicorn in the container in parallel to IRIS and hook the WebSocket up to it on a different port.

And, that's when we started running into problems...

Our first issue was that we were using an older version of django-iris, which relied on a package called iris that conflicted with the built-in iris.py (i.e. part of IRIS WSGI). The issue had been resolved in a later django-iris release by renaming iris to intersystems_iris, so we updated django-iris, resolved the issue, and moved on.

Our second issue came up when we wanted to use ipm to install the package from the module. For whatever reason, it kept failing during the migration with strange ConnectionReset errors...

 

sheltershare    | Waited 3 seconds for InterSystems IRIS to reach state 'running'
sheltershare    | 
sheltershare    | Load started on 07/26/2024 14:39:06
sheltershare    | Loading file /usr/irissys/csp/sheltershare/module.xml as xml
sheltershare    | Imported document: sheltershare.ZPM
sheltershare    | Load finished successfully.
sheltershare    | 
sheltershare    | Skipping preload - directory does not exist.
sheltershare    | Load started on 07/26/2024 14:39:07
sheltershare    | Loading file /usr/irissys/csp/sheltershare/module.xml as xml
sheltershare    | Imported document: sheltershare.ZPM
sheltershare    | Load finished successfully.
sheltershare    | 
sheltershare    | Loading sheltershare in process 716
sheltershare    | [%SYS|sheltershare]	Reload START (/usr/irissys/csp/sheltershare/)
sheltershare    | [%SYS|sheltershare]	requirements.txt START
sheltershare    | Collecting Django==5.0.7
sheltershare    |   Using cached Django-5.0.7-py3-none-any.whl (8.2 MB)
sheltershare    | Collecting uvicorn[standard]
sheltershare    |   Using cached uvicorn-0.30.3-py3-none-any.whl (62 kB)
sheltershare    | Collecting channels
sheltershare    |   Using cached channels-4.1.0-py3-none-any.whl (30 kB)
sheltershare    | Collecting Faker
sheltershare    |   Using cached Faker-26.0.0-py3-none-any.whl (1.8 MB)
sheltershare    | Collecting django-iris
sheltershare    |   Using cached django_iris-0.2.4-py3-none-any.whl (134 kB)
sheltershare    | Collecting tzdata
####LOG TRIMMED FOR BREVITY####
sheltershare    | [%SYS|sheltershare]	requirements.txt SUCCESS
sheltershare    | Skipping preload - directory does not exist.
sheltershare    | [%SYS|sheltershare]	Reload SUCCESS
sheltershare    | [sheltershare]	Module object refreshed.
sheltershare    | [%SYS|sheltershare]	Validate START
sheltershare    | [%SYS|sheltershare]	Validate SUCCESS
sheltershare    | [%SYS|sheltershare]	Compile START
sheltershare    | [%SYS|sheltershare]	Compile SUCCESS
sheltershare    | [%SYS|sheltershare]	Activate START
sheltershare    | [%SYS|sheltershare]	Configure START
sheltershare    | [%SYS|sheltershare]	Configure SUCCESS
sheltershare    | Studio project created/updated: sheltershare.PRJ
sheltershare    | [%SYS|sheltershare]	Activate SUCCESS
sheltershare    | ShelterShare installed successfully!
sheltershare    | 
sheltershare    | 126 static files copied to '/usr/irissys/csp/sheltershare/static', 3 unmodified.
sheltershare    | /usr/irissys/mgr/python/django/core/management/commands/makemigrations.py:160: RuntimeWarning: Got an error checking a consistent migration history performed for database connection 'default': [Errno 104] Connection reset by peer
sheltershare    |   warnings.warn(
sheltershare    | No changes detected
sheltershare    | Traceback (most recent call last):
sheltershare    |   File "/usr/irissys/mgr/python/intersystems_iris/dbapi/_DBAPI.py", line 47, in connect
sheltershare    |     return native_connect(
sheltershare    |   File "/usr/irissys/mgr/python/intersystems_iris/_IRISNative.py", line 183, in connect
sheltershare    |     connection._connect(hostname, port, namespace, username, password, timeout, sharedmemory, logfile, sslcontext, autoCommit, isolationLevel, featureOptions, application_name)
sheltershare    |   File "/usr/irissys/mgr/python/intersystems_iris/_IRISConnection.py", line 304, in _connect
sheltershare    |     raise e
sheltershare    |   File "/usr/irissys/mgr/python/intersystems_iris/_IRISConnection.py", line 212, in _connect
sheltershare    |     self._in_message._read_message_sql(sequence_number)
sheltershare    |   File "/usr/irissys/mgr/python/intersystems_iris/_InStream.py", line 46, in _read_message_sql
sheltershare    |     is_for_gateway = self.__read_message_internal(expected_message_id, expected_statement_id, type)
sheltershare    |   File "/usr/irissys/mgr/python/intersystems_iris/_InStream.py", line 59, in __read_message_internal
sheltershare    |     self.__read_buffer(header.buffer, 0, _MessageHeader.HEADER_SIZE)
sheltershare    |   File "/usr/irissys/mgr/python/intersystems_iris/_InStream.py", line 138, in __read_buffer
sheltershare    |     data = self._device.recv(length)
sheltershare    |   File "/usr/irissys/mgr/python/intersystems_iris/_Device.py", line 40, in recv
sheltershare    |     return self._socket.recv(len)
sheltershare    | ConnectionResetError: [Errno 104] Connection reset by peer

which we weren't able to resolve, so we fell back to using a Dockerfile, entrypoint.sh, and docker-compose to fully set up the Django app in the /usr/irissys/csp folder, and then relied on ipm to load our app.xml into Security.Applications.

import iris

iris.system.Process.SetNamespace("%SYS")
# NumImported is an output (by-reference) parameter, so pass it with iris.ref()
num_imported = iris.ref(0)
imported_status = iris.cls("Security.Applications").Import("/usr/irissys/csp/sheltershare/app.xml", num_imported, 0)

This works well and is a reliable way of deploying. You can check out an example of setting it up here: ShelterShare-SingleDocker

Note though that without the merge.cpf, the irissetup.py script will fail to run iris due to Authorization issues... i.e.
 

RUN \
  cd /usr/irissys/csp/sheltershare && \
  iris start IRIS && \
  iris merge IRIS merge.cpf

 

Now we ran into a big problem, well, big for us because it was really difficult to understand what was going on, but in the end fairly easy to work around...

You see, we had relied on Django authentication to handle our user accounts, groups, etc., which we utilized by simply doing:

from django.contrib.auth import authenticate, login
from django.shortcuts import redirect

user_obj = authenticate(username=request.POST['username'],
                        password=request.POST['password'])
if user_obj:
    login(request, user_obj)
    return redirect("index")

This works great with gunicorn, uvicorn, even a simple python manage.py runserver... but on IRIS it kept silently failing and throwing us back to the login screen.

After much digging, browser console debugging, and inspecting, we realized that under normal circumstances the WSGI/ASGI server in question (e.g. uvicorn, gunicorn) will, on a successful login, return 2 Set-Cookie headers: one with the CSRF token and some other info, and the other with the sessionid.

The sessionid then gets saved to browser storage and is utilized whenever accessing a page under the same domain.
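As an aside, one way to see exactly which Set-Cookie headers a server sends back (a hypothetical check; the URL, path, and credentials below are placeholders for your own setup) is to post the login form with requests and read the raw headers, since requests merges duplicates in resp.headers:

import requests

# Placeholder URL and credentials; point this at your own login view.
resp = requests.post(
    "http://localhost:8000/login/",
    data={"username": "demo", "password": "demo"},
    allow_redirects=False,
)
# resp.headers merges duplicate headers, so read the raw urllib3 headers instead;
# a server that sends two Set-Cookie headers will show two entries here.
for value in resp.raw.headers.getlist("Set-Cookie"):
    print(value)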

However, it seems that IRIS WSGI, for whatever reason, combines the two Set-Cookie headers into one. See difference below:

UVICORN:

[screenshot: the two separate Set-Cookie headers returned by uvicorn]

IRIS WSGI:

[screenshot: the single combined Set-Cookie header returned by IRIS WSGI]

and the associated Cookie store:

[screenshot: the browser's cookie store]
So we started figuring out how to circumvent this problem, as the browser was obviously ignoring the sessionid in the single Set-Cookie header, and we attempted the following:

WORKAROUND:

from django.contrib.auth import authenticate, login
from django.shortcuts import redirect

user_obj = authenticate(username=request.POST['username'],
                        password=request.POST['password'])
if user_obj:
    login(request, user_obj)
    response = redirect("index")

    # Read the session id from the Django session and re-attach it
    # to the response as its own cookie.
    sessionid = request.session.session_key

    if sessionid:
        # Add the sessionid as a separate Set-Cookie header
        response.set_cookie('sessionid', sessionid, httponly=True)

    return response

 

and now the sessionid and CSRF token were still in the same Set-Cookie header, but as it happens, sessionid was at the very beginning of the Set-Cookie, so the browser picked it up without any issue...
AFTER WORKAROUND IRIS WSGI:

[screenshot: the Set-Cookie header after the workaround]

and the result in the browser Cookie storage:

[screenshot: the browser's cookie store after the workaround]

So, with this workaround, the Django authentication began working properly and we could use the application for most of the non-async content. However, we then ran into the issue of POSTs not working correctly, possibly because the CSRF token was now pushed to the back of the Set-Cookie header (though it could be some other reason), and we didn't have enough time to research further workarounds. In the end, with the POST issues and our need for ASGI, we went back to the initial three-container solution and plan to investigate WSGI (and perhaps ASGI) some more in the future...
