Technical surprises using VECTORs
>>> UPDATED
Building my tech. example provided me with a bunch of findings htt I want to share.
The first vectors I touched appeared with text analysis and more than 200 dimensions.
I have to confess that I feel well with Einstein's 4 dimensional world.
7 to 15 dimensions populating the String Theory are somewhat across the border.
But 200 and more is definitely far beyond my mathematical horizon.
So I looked to our Globe and found a vector of (latitude,longitude) with 2 dimension is enough for testing.
A handy table of capitals was found and provided sample test data (shortened).
CAPITAL | COUNTRY | LATITUDE | LONGITUDE |
---|---|---|---|
Kabul | Afghanistan | 34.28N | 69.11E |
Tirana | Albania | 41.18N | 19.49E |
Algiers | Algeria | 36.42N | 03.08E |
Pago Pago | American Samoa | 14.16S | 170.43W |
Andorra la Vella | Andorra | 42.31N | 01.32E |
Luanda | Angola | 08.50S | 13.15E |
Saint John's | Antigua and Barbuda | 17.127N | 61.846W |
Buenos Aires | Argentina | 36.30S | 60.00W |
Yerevan | Armenia | 40.10N | 44.31E |
Oranjestad | Aruba | 12.32N | 70.02W |
Canberra | Australia | 35.15S | 149.08E |
Vienna | Austria | 48.12N | 16.22E |
Baku | Azerbaijan | 40.29N | 49.56E |
Nassau | Bahamas | 25.05N | 77.20W |
Manama | Bahrain | 26.10N | 50.30E |
Dhaka | Bangladesh | 23.43N | 90.26E |
Bridgetown | Barbados | 13.05N | 59.30W |
Minsk | Belarus | 53.52N | 27.30E |
Brussels | Belgium | 50.51N | 04.21E |
#1
Loading that TAB separated text file with LOAD DATA (SQL) worked perfect
#2
Transforming the geo coordinates int INT was a minor coding exercises
It resulted in a ClassMethod projected as SQL Procedure used inana UPDATE over the table.
#3
As geo coordinates refer to (0°N,0°W) somewhere in the Atlantic this is just a theoretical base for my vectors.
IRIS supports some VECTOR functions but I found no AddVector() or SubtractVector() function.
so this was done "manually" from the input coordinates.
The need arises from transforming coordinates to a useful base point for later comparing vectors
So you see static BASE coordinates and active WORK coordinates.
Getting the Vector's values is easy with SQL using %EXTERNAL() function
while in ObjectScript I ended up with set vectorvalues=##class(%Vector).LogicalToOdbc(vectorvariable)
this was less impressive for working with Vectors.
#4
Similarity is calculated with VECTOR_COSINE() function.
You calculate the angle between 2 Vectors and COSINE norm it between +1 and -1
The input needs 2 Vectors of the same type and same dimension.
Examples as in documentation work fine if you compose your SQL String as suggested
and TO_VETOR(?,type,size) is OK with %SQLStatement for execution.
BUT:
I tried it with embedded SQL.
Code checking signalled some disagreement but compiled without problem
At runtime it turned out that host variables in TO_VECTOR(:myvec,INT,2) failedwhatever combination of quotes, braces, .... I tried.
So be warned. I returned to %SQLStatement to get my VCOS done.
UPDATE: TO_VECTOR(:myvec,INT,2)
set myvec="1314,-7979"
Nothing special: Just a plain String with comma-separated values
It seems I couldn't believe that simple approach.
My apologizes to ISC Engineering.
#5
It was a surprise to learn how wide VECTOR_COSINE is spreading.
checking the vector Paris >> Bucuresti traced half of Middle and East Asia.
So limiting the results to > 0.999 is a good practice in this scenario.