
Summary
· Jul 22

Vote for the Best Demo in the InterSystems Demo Games!

Dear Community Member,

The InterSystems Demo Games is a new contest where our Sales Engineers showcase their creativity and technical skills in short demo videos — and now it’s your turn to help pick the winners!

Browse the demo entries, discover fresh ideas and smart solutions, and vote for the demos that impress you most. Your vote will help decide the winners.

✔ Voting is open until September 14th on the Developer Community.
👉 Visit the contest page and cast your vote!

🔗 Full details here

New demos are added throughout the contest, so don’t miss your chance to discover and support your favorites.

May the best demo win!

Announcement
· Jul 22

Reminder: Beta testers needed for our upcoming InterSystems IRIS SQL Professional certification exam

Hello again,

InterSystems Certification is still looking for people to beta test the InterSystems IRIS SQL Professional certification exam. This is a great way to get certified for free! We have extended the beta test deadline to August 31, 2025.

Please note that only candidates with the pre-existing InterSystems IRIS SQL Specialist certification are eligible to take the beta. For details, see the original announcement.

Thank you!

Question
· Jul 22

Does IRIS support JavaScript asynchronous calls (async/await)?

I'm working on wrapping an IRIS Caché ObjectScript method that runs for a few seconds. I'm trying to get UI updates to show BEFORE the method runs by calling it inside an async/await function, but it seems to run synchronously rather than asynchronously. So my question is: do IRIS/ObjectScript CSP pages support futures/promises with JavaScript, or does everything run synchronously?
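
To illustrate the pattern being described: an async function still runs on the UI thread, and a synchronous server call (such as a CSP #server(...)# hyperevent) blocks rendering no matter where it is awaited from. A minimal sketch, assuming a hypothetical wrapper named runServerMethod() for the synchronous call, is to yield to the event loop first so the browser can repaint the status before the blocking call starts:

// Sketch only: runServerMethod() is a hypothetical placeholder for the
// synchronous server call that wraps the ObjectScript method.
async function runLongMethod() {
  document.getElementById('status').textContent = 'Running...';

  // Yield to the event loop so the status update is painted before
  // the synchronous server call blocks the UI thread.
  await new Promise(resolve => setTimeout(resolve, 0));

  const result = runServerMethod();   // still blocks while the server works
  document.getElementById('status').textContent = 'Done: ' + result;
}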

Article
· Jul 22 · 7m read

Introduction to IRIS for SQL Developers and DBAs.

Are you familiar with SQL databases, but not familiar with IRIS?  Then read on...

About a year ago I joined InterSystems, and that is how IRIS got on my radar.  I've been using databases for over 40 years—much of that time working for database vendors—and assumed IRIS would be largely the same as the other databases I knew.  However, I was surprised to find that IRIS is in several ways quite unlike other databases, and often much better.  With this, my first article in the Dev Community, I'll give a high-level overview of IRIS for people who are already familiar with other databases such as Oracle, SQL Server, Snowflake, PostgreSQL, etc.  I hope I can make things clearer and simpler for you and save you some time getting started.

First of all, IRIS supports ANSI standard SQL commands and syntax. It has tables, columns, data types, stored procs, functions... all that relational stuff.  And you can connect via ODBC or JDBC and use DBeaver or whatever your favorite DB browser is.  So, yes, most of what you know and do from other DBs will work just fine on IRIS.  Yay!

But what about those differences I mentioned?  Okay, buckle up:

Multi-Model: IRIS is a relational database, however it is also an object-oriented database, and also a document store, and supports vectors, and cubes/MDX, and... well, you see where I'm going.  The amazing thing about this is you can take advantage of all these models... in the same SQL statement!  And in many cases the same data can serve as multiple of these data structures — no need to store it twice — and certainly no need to have more than one type of DB!  When you access the very same data as though it were different data models, InterSystems calls that CDP (Common Data Plane).  This is at least rare, if not unique, in the database industry.  Nobody paid much attention to CDP until the AI revolution made support for multi-model suddenly important. It's not a feature other DBs are likely to implement, as it is baked right into the IRIS kernel.  IRIS makes multi-model and NoSQL and NewSQL easy for SQL folks:

For the object/document model, you work with key-val pairs from a JSON tree, where each value is just a value in a regular table.

-- example of Object database query
SELECT JSON_OBJECT('product':Product,'sizes':PopularSizes) FROM Sample.Clothing

-- this will return a list of key-val pairs. If a pair is missing IRIS 
-- will, by default, create one with a value of null.

For Vector, just think of it as simply another data type, but with some special functions that only work with that data type. 

-- example of creating a table with a vector column
CREATE TABLE Sample.CustEventV1 (
  CustID INTEGER,
  EventDt DATE,
  Vtest VECTOR(integer,4),
  EventValue NUMERIC(12,2),  
  EventCD VARCHAR(8)) 

-- You can use functions like VECTOR_DOT_PRODUCT or VECTOR_COSINE on Vtest
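
To make that concrete, a similarity query might look like the sketch below (the query vector literal and the TOP 3 cutoff are made up for illustration; TO_VECTOR builds a vector value from a delimited string):

-- Hypothetical similarity search against the table above
SELECT TOP 3 CustID, EventDt,
       VECTOR_COSINE(Vtest, TO_VECTOR('1,2,3,4', integer, 4)) AS Similarity
FROM Sample.CustEventV1
ORDER BY Similarity DESC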

 

Taxonomy:  The terms database, schema, deployment, instance, etc. are not used in exactly the same way by different database vendors.

  • Instance: When you install the database software, that installation is usually called an 'instance' by DB companies. I hear that term at InterSystems sometimes, but more often I hear the term 'deployment', probably because 'instance' is already used in the object-oriented world.  Whichever you call it, the hierarchy for other databases is usually:
    • instance/deployment
      • database
        • schema
          • tables, views, etc.

    .. or maybe just:

    • instance/deployment (this *is* the database)
      • schema
        • tables, views, etc.

    .. but IRIS is a bit different in that it has an additional layer called 'namespace':

    • instance/deployment
      • namespace
        • database
          • schema
            • tables, views, etc.

A namespace is a logical entity that contains databases. Yet multiple namespaces can contain the same database, so strictly speaking it's not a hierarchy.  Namespaces are used mostly for access control. And a namespace can even contain databases from other instances/deployments!

HA: High Availability is accomplished through something called mirroring.  This is a type of replication where the entire DB is replicated, including code.  You may be thinking you don't want to replicate the entire database.  But because of namespaces, you can treat a database as a sort of schema and split your data up so that what you want mirrored and what you don't are in separate databases.

Code Storage: So, yes, you heard me right: when you mirror a database the code goes with it!  This is a very new feature for some trendy databases, but IRIS has always had it. You can put both code and data in the same database, but typically people separate them.

ECP: Okay, Enterprise Cache Protocol is where IRIS gets really interesting.  I was not even aware this was possible, though I have since learned there are a couple of obscure NoSQL DBs that can do it.  With ECP you can set it up so that different deployments share their caches!  Yes, I mean their actual memory caches, not just sharing the table data. It does this by keeping the cache of one deployment automatically in sync with the cache of a different deployment.  Talk about staying in sync!  It's super easy to set up, though it must be complicated behind the scenes. It's a whole different type of horizontal scaling and can make apps fly.

Translytical: This word, translytical, is used to describe a database that is both OLTP and OLAP. It may also be called HTAP or HOAP. Sometimes it's called hybrid, but that term is too overused in the tech world so I'll stick with the T-word.  In the early days all DBs were translytical.  But with the advent of columnar and other structures, as well as new types of storage (think block store vs blob store), they got separated into OLTP and OLAP. Now vendors are trying to be both again.  It's a heck of a lot easier to add OLAP to an OLTP kernel than the other way around. Sure, DW-focused vendors can paste on some indexing for single-row lookups, but I doubt you'll see them adding support for hard things, like triggers and fast inserts/updates, any time soon. The fact is that speedy OLTP is more complicated to build than OLAP... it's a much more mature technology. IRIS is an excellent translytical database (see the analyst ratings for why). For instance, some DBs support both row-store and column-store, but in separate tables.  IRIS can have row-store columns in the same table as column-store columns.

/* Example of row-store/column-store mix. 
   All columns are row-store (the default) except for EventValue.
   EventValue is explicitly defined as column-store. 
   If you queried average EventValue for the whole table it would be fast! */
CREATE TABLE Sample.CustEvent (
  CustID INTEGER,
  EventDt DATE,
  EventValue NUMERIC(12,2) WITH STORAGETYPE = COLUMNAR,
  EventCD VARCHAR(8))
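
For example, an aggregate over the columnar EventValue column, like the illustrative query below, only has to scan that one column's data:

-- Illustrative query: a full-table aggregate over the columnar column
SELECT AVG(EventValue) AS AvgEventValue, COUNT(*) AS EventCount
FROM Sample.CustEvent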

Installation: With other databases you usually need to either install it somewhere (on-prem or in the cloud), as you do with Postgres or SQL Server, or else use a cloud SaaS like Redshift or Snowflake. With IRIS it depends. There are three ways to get IRIS: via a license, via a managed service, or via Cloud SQL.

  1. With a license you install, configure, and maintain it yourself. This can be on premises or whatever cloud you choose. Mostly, I've heard of it being run on AWS, Azure, GCP, and Tencent.
  2. With a managed service InterSystems will install, configure, and maintain IRIS for you on a public cloud.
  3. With Cloud SQL it is SaaS (or should I say PaaS? DBaaS?).  You don't install anything.  However, Cloud SQL is a special case. It is designed to integrate into larger systems as a composable module, offering only a subset of IRIS functionality, such as SQL and machine learning (ML) functions.  The rest of this article is about licensed and managed IRIS, not Cloud SQL.

Embedded Languages: Besides SQL, IRIS has always supported an object-oriented language called ObjectScript, which is a descendant of the MUMPS medical language. It's a very powerful language, but not many people know it. Don't worry, IRIS also supports Embedded Python.
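
As a quick taste, here is a minimal Embedded Python sketch (reusing the sample table from earlier in this article); iris.sql.exec() runs a SQL statement from inside an IRIS process:

# Minimal sketch: run inside an IRIS process, e.g. an irispython shell
# or a class method marked [ Language = python ]
import iris

# Run a SQL statement and iterate the result set rows
rs = iris.sql.exec("SELECT TOP 5 CustID, EventValue FROM Sample.CustEvent")
for cust_id, event_value in rs:
    print(cust_id, event_value)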

Documentation: Because IRIS has historically been intertwined with ObjectScript, the documentation tends to be worded in object-oriented terminology.  You may find simple things like tables referred to as 'persistent classes'.  But this seems to be fading from the documentation over time, and anyway you can just ignore it unless you want to be an IRIS coder.

So IRIS supports the SQL you know and love, as well as Python, is translytical, runs on prem or in the cloud, is multi-model, and has some futuristic features like ECP.  There is much more, but these are the things that stood out to me as important and interesting.  I think they would be relevant to other SQL devs and DBAs coming from other products.  If that is you and you are trying IRIS, I would be interested to hear your thoughts on the experience.

Article
· Jul 22 · 5m read

Vector Search Performance

Test Objectives

InterSystems has been testing Vector Search since it was announced as an “experimental feature” in IRIS 2024.1. The first test cycle was aimed at identifying algorithmic inefficiencies and performance constraints during the Development/QD cycle. The next test cycle used simple vector searches for single-threaded performance analysis to model reliable, scalable, and performant behaviour at production database scale in IRIS, and performed a series of comparative tests of key IRIS vector search features against PostgreSQL/pgvector. The current test cycle models the expected behaviour of real-world customer deployments using complex queries that span multiple indices (vector and non-vector) and run in up to 1000 concurrent threads. These tests will be run against IRIS, PostgreSQL/pgvector, and Elasticsearch.

Test Platform

Testing has been carried out on a variety of Linux deployment platforms. At the high end we have utilised the facilities of the InterSystems Scalability Lab, and our primary test platform has been an HPE bare-metal server with 48 cores and 528GB RAM running Ubuntu 24.04 and current versions of IRIS and PostgreSQL/pgvector.

Test Data

Access to appropriate volumes of high-quality embedding data has been a significant issue during all iterations of testing. Testing during the Development and QD phases was typically run against thousands of rows of synthetic (randomly generated) embedding data; production-scale testing requires a minimum of one million rows. Synthetic embedding data provides an unsatisfactory basis for production-scale testing because it does not support plausibility checking (there’s no source data to validate semantic proximity against) and does not reflect the clustering of data in vector space and its effects on indexed search performance and accuracy (specifically recall). We have used the following datasets during testing:

  • 40 thousand embeddings generated from historical bid submissions by the internal RF-Eye project
  • reference dataset downloaded from Hugging Face of embeddings generated against Simple Wikipedia articles at paragraph level using the Cohere multilingual-22-12 model, initially 1 million rows and then an additional 9 million rows (out of 35 million available)
  • 25 million rows of embeddings generated by a customer against public domain medical data using a custom model

To support our tests, data is loaded into a staging table in the database and then incrementally moved to vector data structures to support testing at increasing database sizes. Where required, indexing is deferred and run as a separate post-move step. A small percentage (generally 0.1%) of the staged data is written to a query data store and subsequently used to drive the test harness; depending on whether we want to model exact vector matching, this data may also be written to the vector data store.

We intend to scale our tests up to 500 million rows of embeddings when we can identify and access an appropriate dataset.

Test Methodology

Our standard vector search unit test performs 1000 timed queries against the vector database using randomly selected (not generated) query parameters. For indexed searches we also run an equivalent unindexed query with the same parameters to establish the ground-truth result set, then compute recall at 1, 5, and 10 records. Each test run performs one hundred unit tests, and we perform multiple test runs against each configuration/database size.
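
To make the recall measurement concrete, here is a small illustrative sketch (not the actual test harness) of computing recall at k from the ANN result list and the unindexed ground-truth list, both ordered by similarity:

# Illustrative recall@k computation (not the actual test harness).
# ground_truth and ann_results are record IDs ordered by similarity,
# from the unindexed and ANN-indexed queries respectively.
def recall_at_k(ground_truth, ann_results, k):
    return len(set(ground_truth[:k]) & set(ann_results[:k])) / k

ground_truth = [101, 205, 87, 42, 319, 12, 77, 903, 554, 68]   # made-up IDs
ann_results  = [101, 87, 205, 42, 888, 12, 77, 554, 903, 330]  # made-up IDs
for k in (1, 5, 10):
    print(f"recall@{k} = {recall_at_k(ground_truth, ann_results, k):.2f}")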

Unindexed Vector Search

In IRIS, unindexed COSINE and DOT_PRODUCT search performance demonstrates predictable linear scaling until we reach the compute constraints of the server (typically by exhausting global buffers). Running our test harness against the same dataset in PostgreSQL/pgvector demonstrates that unindexed search is faster in IRIS than in current versions of PostgreSQL/pgvector.

 

Vector storage footprint and unindexed search performance are affected by the dimensionality of the embedding (the number of discrete components that define it in vector space), which is a function of the embedding model. IRIS DOT_PRODUCT and COSINE search cost per n dimensions shows better-than-linear scaling (the relative cost of searching n dimensions in an embedding decreases as the dimension count of the embedding increases), because the fixed overhead of accessing and processing the record is amortized over a greater number of component calculations.

   

Indexed Vector Search

We have tested Approximate Nearest Neighbor (ANN) search (implemented in IRIS using the Hierarchical Navigable Small World (HNSW) algorithm) with datasets up to 25 million rows of real-world embedding data and 100 million rows of synthetic data.

IRIS ANN index build times are predictable and consistent as the number of records being indexed increases; in our test environment (48-core physical server) each increment of 1 million records increases the index build time by around 3000 seconds.

IRIS ANN search performance is significantly better than unindexed search performance, with sub-second execution of our test harness at database sizes up to 25 million real-world records.

ANN search recall was consistently above 90% in our IRIS tests with real-world data.

Comparative testing against the same dataset in PostgreSQL/pgvector demonstrates that IRIS ANN search runs more slowly than an equivalent search in PostgreSQL/pgvector, but has a more compact data footprint and quicker index builds.

What’s Next?

Our current benchmarking effort is focused on complex multi-threaded queries which span multiple indices (ANN and non-ANN). The ANN index is fully integrated into the InterSystems query optimizer with an assigned pseudo-selectivity of 0.5%. The query optimizer generates a query plan by comparing the selectivity of all indices available on the table being queried, with query execution starting with the most selective index and evaluating the ANN index at the appropriate step.

Key Application Considerations (so far)

  • if you need guaranteed exact results, unindexed search is the only option – try to limit the size of the dataset to fit into global buffers for best performance
  • restrict dimensionality to the minimum value that will provide the search granularity required by your application (model choice)
  • if you’re using ANN indexing, understand the recall value your application needs and tune the index accordingly (remember this is data-dependent)
  • the index-time resource requirement is likely to be different from the query-time requirement; you may need different environments
  • index build is resource intensive – maximize the number of worker jobs with Work Queue Manager and be aware of the constraints in your environment (IOPS in public cloud); ingest then index rather than index on insert for initial data load
  • keep the index global in buffers at query time; consider pre-fetching for optimal performance from first access
  • good IRIS system management remains key.