
InterSystems Official
· June 19

InterSystems Announces InterSystems IRIS Support for Red Hat Enterprise Linux 10

InterSystems is pleased to announce that the following products are now available for Red Hat Enterprise Linux 10:

  • InterSystems IRIS Data Platform 2025.1.0.230.2
  • InterSystems IRIS for Health 2025.1.0.230.2
  • HealthShare Health Connect 2025.1.0.230.2

This release adds support for the Red Hat Enterprise Linux 10 operating system. RHEL 10 includes the 6.12.0 Linux kernel, security enhancements, performance optimizations, and improved development tools.

How do I get the software?

Installation packages are available on the WRC's extended maintenance releases page. You will also find kits on the evaluation services website.

Article
· June 19 · 10 min read

Towards Smarter Table Statistics

This article describes a significant enhancement of how InterSystems IRIS deals with table statistics, a crucial element for IRIS SQL processing, in the 2025.2 release. We'll start with a brief refresher on what table statistics are, how they are used, and why we needed this enhancement. Then, we'll dive into the details of the new infrastructure for collecting and saving table statistics, after which we'll zoom in onto what the change means in practice for your applications. We'll end with a few additional notes on patterns enabled by the new model, and look forward to the follow-on phases of this initial delivery.

I don't use SQL, why am I here? That's a relevant existential question (wink), but this article may still have some valuable information for you. Some higher-level capabilities in IRIS, notably Interoperability, make use of IRIS SQL under the hood, and some of the changes in behaviour may mean you want to switch on the new maintenance task. See the section on "Automatic collection of table statistics" below.

Table Statistics 101

SQL stands for Structured Query Language: a language in which you express what you want (the query), rather than how to get it (the code to fetch the query result). It is up to the database's query engine to consider all options and pick the presumably optimal plan to retrieve the result you're asking for. In order to figure out the optimal plan, the query optimizer essentially needs two things: knowledge about your table structure, such as where your data is stored and what indices are available, and knowledge about the table data itself: table statistics. These table statistics cover information such as the estimated size of the table and the selectivity of every field, which expresses how large a fraction of that table corresponds, on average, to one particular field value. These are critical for making good decisions about query plans. Say you need to find the number of female customers in one particular zip code, and can choose between starting from an index on gender or one on zip code. The selectivity of the gender index will be around 50%, while the one on zip code may be as small as 1% or less, meaning you'll filter the set of rows to look up much more quickly by starting from the latter.
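The selectivity arithmetic in this example can be sketched in a few lines. This is an illustrative approximation in Python, not the IRIS implementation; the helper name and the uniform-data assumption are ours:

```python
def selectivity(values):
    """Average fraction of rows matching a single field value:
    roughly 1 / (number of distinct values), assuming uniform-ish data."""
    distinct = len(set(values))
    return 1.0 / distinct if distinct else 0.0

# A toy customer table: 2 genders vs. 100 zip codes, 1000 rows each.
genders = ["F", "M"] * 500
zips    = [f"{z:05d}" for z in range(100)] * 10

print(selectivity(genders))  # 0.5  -> a gender index filters poorly
print(selectivity(zips))     # 0.01 -> a zip index narrows the row set 50x faster
```

The optimizer prefers the index with the smaller selectivity, since it eliminates the most rows per lookup.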

While the table structure is understandably part of your application and can be packaged as part of the application code, that isn't necessarily the case for table statistics. If the previous example is taken from a small bookstore application, it's likely the field selectivity values hold for every bookstore the application gets deployed to, but if it is an electronic health record system, the selectivity of the gender field will differ between a maternity clinic and a general practice, and the number of zip codes (and therefore their selectivity) will depend on the size of the area the hospital serves. Clearly, you want your statistics and therefore query plans to be based on the actual data in the environment the application is deployed to.

Table statistics on IRIS - before 2025.2

On IRIS, table statistics historically have been stored as part of a class' definition. This has a number of drawbacks:

  • As described, data distributions may vary significantly between the environments in which you're deploying your application. This is not just true for estimated table size and selectivity, but even more so for other, more advanced types of statistics such as outlier information and histograms.
  • When deploying an updated version of your application, you would usually want to keep any existing statistics – assuming they are based on that environment's actual data. This is trickier when they are stored as part of the code, and obviously you don't want to accidentally deploy the table stats from your dev or test environment to a customer system.
  • When application code is deployed in the form of a read-only database, or when it is part of an IRIS system library (like Interoperability messages), there is no good way to update those tables' statistics whatsoever.

See also this earlier article that goes into a little more detail on the above. These are some of the reasons why we set out to redesign how we manage table statistics on IRIS, and the new infrastructure is now included with InterSystems IRIS 2025.2.

What's changing?

Collected statistics vs Fixed statistics

As described in the examples before, storing statistics as part of application code only makes sense in very specific, controlled environments. In most cases, it is preferable to keep statistics with the data they were collected from, rather than fix them as part of the code, so that's what we're doing in the new model. We distinguish between collected statistics that are always based on actual data, and stored in the extent index (a global with other registry-style details about your table data, also referred to as extents) and fixed statistics that are hardcoded into the class definition by the application developer, and may be based on actual data or some educated assumptions. As you can guess, fixed statistics correspond to the pre-2025.2 model for managing table statistics. 

Use the new COLLECT STATISTICS FOR TABLE t command (which is 100% synonymous with the existing TUNE TABLE one, and just us standardizing terminology) to populate a new set of statistics based on the current data in your table. Table statistics are tiny, so we actually won't overwrite the previous stats, but rather store a new set and keep the older statistics around for informational purposes until they meet default purge criteria or are purged explicitly. If you'd like to switch to the fixed model, you can save your collected stats to the class definition using ALTER TABLE t FIX STATISTICS. This may be part of your application packaging procedure, if you're confident those stats will match your target environment and you prefer the static model.
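The keep-a-history behaviour can be pictured with a small sketch. This is illustrative Python only: the real statistics live in the extent index, and the actual purge criteria are IRIS defaults, not the `keep_last` rule invented here:

```python
import time

class StatsHistory:
    """Toy version of 'store a new set, keep older ones until purged'."""
    def __init__(self, keep_last=2):
        self.keep_last = keep_last
        self.sets = []  # newest last: (timestamp, stats-dict) pairs

    def collect(self, stats):
        # A new collection never overwrites: it appends a new set.
        self.sets.append((time.time(), stats))

    def current(self):
        return self.sets[-1][1] if self.sets else None

    def purge(self):
        # Hypothetical criterion: keep only the N most recent sets.
        self.sets = self.sets[-self.keep_last:]

h = StatsHistory(keep_last=2)
for rowcount in (100, 5000, 5200):
    h.collect({"rowcount": rowcount})
h.purge()
print(h.current())  # {'rowcount': 5200}
print(len(h.sets))  # 2 -> one older set retained for reference, oldest purged
```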

By default, fixed statistics take precedence over collected statistics. This ensures backwards compatibility, and means as a developer you still have the final say. In fact, having any fixed statistic, even for just one field, means collected statistics will be ignored, and we'll fall back to safe estimates based on the field's datatype for any fields that don't have fixed statistics. If you want to test with collected stats without dropping your fixed stats, you can include the %NOFIXEDSTATS hint in the FROM clause of your query text (per table) to obtain the query plan solely based on collected stats, or use ALTER TABLE t SET RUNTIME IGNOREFIXEDSTATS = TRUE to set this behaviour at the table level. 
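Since the precedence rule is easy to get wrong, here is a sketch of the behaviour as described above. This is illustrative Python; the field names and datatype default values are made up:

```python
def effective_stats(fixed, collected, datatype_defaults, ignore_fixed=False):
    """Any fixed statistic present (and not ignored) means collected stats
    are skipped entirely; fields without a fixed value fall back to
    datatype-based defaults rather than collected values."""
    if fixed and not ignore_fixed:
        return {f: fixed.get(f, datatype_defaults[f]) for f in datatype_defaults}
    return dict(collected)

defaults  = {"gender": 0.5, "zip": 0.04}   # crude per-datatype guesses
collected = {"gender": 0.48, "zip": 0.01}  # measured from actual data
fixed     = {"gender": 0.5}                # just one field fixed

print(effective_stats(fixed, collected, defaults))
# {'gender': 0.5, 'zip': 0.04} -> even the collected zip stat is ignored
print(effective_stats(fixed, collected, defaults, ignore_fixed=True))
# {'gender': 0.48, 'zip': 0.01} -> e.g. %NOFIXEDSTATS / IGNOREFIXEDSTATS
```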

Automatic collection of table statistics

The introduction of collected statistics already addresses many of the packaging and deployment inconveniences pointed out in the introduction. However, it does not really do anything about one of the most common problems we see with customers: the lack of up-to-date statistics. As described in the introduction, accurate statistics are key to getting the best query plans for your specific data distribution. If you never bothered to gather any, the optimizer will use crude defaults based on your fields' datatypes, but these are unlikely to properly account for the specifics of your application in all cases. Starting with 2021.2, IRIS would automatically collect stats at query time for tables that don't have any and are eligible for a fast sampling technique, but that would only happen once, and might be at a time when your table has not yet been populated with representative data. But even when statistics were collected at a reasonable moment, data distributions shift over time (notably histograms), and we very often see customers getting suboptimal query plans simply because they are based on stale statistics.

With 2025.2, we're introducing a new system task that runs during a nightly maintenance window. For each namespace, it considers all tables that have no statistics at all, for which the most recent stats have been invalidated (for example after making changes to the table's storage definition and recompiling), or for which at least 25% of the table data has changed since stats were last collected (approximately; this is measured based on the combined ROWCOUNT for DML operations against that table). Statistics are then collected for these tables one by one, until the configurable maximum duration for the maintenance task is reached. Other configuration options include whether to consider tables with fixed statistics (ON by default), tables that map to remote storage (OFF by default), and tables that are not eligible for a fast sampling technique (OFF by default). If needed, you can mark individual tables with a SKIPAUTOMATICSTATSCOLLECTION class parameter (or equivalent runtime flag) so that they are skipped by this system task (note that this new flag also affects the pre-existing AutoTune behaviour).
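The eligibility rules for the nightly task, as described above, can be summarized in a sketch. This is illustrative Python; the flag names mirror the prose, but the real task's bookkeeping and configuration live inside IRIS:

```python
def needs_collection(table, include_fixed=True, include_remote=False,
                     include_slow_sampling=False, change_threshold=0.25):
    """Decide whether the nightly task should (re)collect stats for a table."""
    if table.get("skip_flag"):                 # SKIPAUTOMATICSTATSCOLLECTION
        return False
    if table.get("fixed_stats") and not include_fixed:
        return False
    if table.get("remote") and not include_remote:
        return False
    if not table.get("fast_sampling", True) and not include_slow_sampling:
        return False
    if not table.get("has_stats"):             # never collected at all
        return True
    if table.get("invalidated"):               # e.g. storage def recompiled
        return True
    changed = table.get("dml_rowcount", 0)     # rows touched since last stats
    total = max(table.get("rowcount", 1), 1)
    return changed / total >= change_threshold

print(needs_collection({"has_stats": False}))                      # True
print(needs_collection({"has_stats": True, "rowcount": 1000,
                        "dml_rowcount": 300}))                     # True (30% changed)
print(needs_collection({"has_stats": True, "rowcount": 1000,
                        "dml_rowcount": 100}))                     # False (10% < 25%)
```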

In 2025.2, this system task is OFF by default, as we want to assess with customers the impact this has on IO patterns for real-world systems. We encourage customers to enable the system task and get in touch to share their experiences with respect to system load so we can make any necessary adjustments before switching it ON by default in a future release. At that time, we intend to retire the earlier AutoTune behaviour that might kick in during query processing.

Manual collection of table statistics is of course still supported, and running COLLECT STATISTICS after major data loads or purges is still a best practice.

What does this mean for your application?

We've taken care to design the new model such that for SQL applications that upgrade to IRIS 2025.2, nothing really changes. Pre-existing statistics will now be treated as fixed statistics, which take precedence over any statistics that would be collected automatically in the background. We advise that customers upgrading to the new model consider the following three recommendations:

Drop fixed statistics after upgrading

If your application, or in fact any table, had statistics before upgrading (which would be the case for every table if you followed best practices), consider dropping them outright. Post-upgrade, they will be considered fixed statistics and take precedence over any stats you collect explicitly (maybe through some custom scripts calling TUNE TABLE) or implicitly. Fixed table statistics can be dropped using a new ALTER TABLE t DROP FIXED STATISTICS command, or the equivalent ALTER SCHEMA s DROP FIXED STATISTICS. You can use the %NOFIXEDSTATS hint to verify the impact on individual queries upfront, as you prefer.

Stick with fixed statistics only when you believe the predefined stats must remain in place to maintain the current query plans, and no changes to data distributions are expected.

Consider statistics for system classes

It's worth noting that the new collected stats model is applicable to tables whose definition resides in a read-only database, including IRIS system tables such as Ens.MessageHeader. For such tables, table stats were impractical or even impossible to maintain before, but now they become relevant and, when you have the collection system task enabled, will be maintained automatically. When using Interoperability productions or other pieces of IRIS infrastructure that may involve SQL under the hood, we recommend keeping an eye on the performance of search and similar operations after upgrading, and collecting statistics manually or enabling the system task.

When using Message Search, we noted significant improvements in query performance after adopting the new model, as it could now use much more realistic selectivity and other stats, where previously it relied on crude defaults. As such, we have dropped the fixed statistics we shipped with these system tables.

Take care when importing and exporting across versions

When exporting table definitions with the /exportselectivity=1 flag (the default), statistics are included using a new format that caters to both fixed and the most recent collected statistics, but is incompatible with earlier versions. To support importing into IRIS instances running earlier releases, use /exportversion=2025.1 or /exportselectivity=0 as appropriate. Please note that collected statistics that are exported using this format will still be imported as collected statistics, and not silently become fixed because they were included in the class definition export file. Given these nuances, you may want to review your source control strategy to ensure the right information is tracked for your deployment model. Note also the symmetrical /importselectivity flag that can be used when importing class definitions. 

The Import() and Export() methods in the $SYSTEM.SQL.Stats.Table class have also been extended with an additional type argument to properly differentiate between both types of stats. See the class reference for details.

Future work

This release includes all the relevant infrastructure to take advantage of the new model. Apart from any minor convenience enhancements, we plan the following two updates in an upcoming release (presumably 2025.3):

Enabling the automatic collection task

As noted earlier, we introduced a new system task for automatic collection of table statistics during a nightly batch window, but kept it disabled in order to avoid unexpected IO load post-upgrade. In an upcoming release, we will switch it ON by default, if the customer has not already enabled it by that point.

Smarter sample size

We noted that our fast block sampling algorithm may in some cases overestimate the selectivity of highly selective fields in large tables, meaning the optimizer might not use corresponding indices when that would have been optimal. We're implementing an algorithm that will dynamically refine the sample size if we detect too much variability between samplings. This change is mostly independent of the overall infrastructure change described earlier in this article.
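One way to picture "refine the sample size when samplings disagree" is the sketch below. This is a hedged illustration in Python; the actual IRIS algorithm is not published, and the estimator, trial count, and tolerance here are ours:

```python
import random
import statistics

def estimate_distinct(population, sample_size, trials=5, tolerance=0.1,
                      rng=random):
    """Repeatedly sample and estimate the distinct-value count; if the
    estimates vary too much between samplings, grow the sample and retry."""
    max_size = len(population)
    while True:
        estimates = []
        for _ in range(trials):
            sample = rng.sample(population, min(sample_size, max_size))
            scale = max_size / len(sample)
            estimates.append(len(set(sample)) * scale)  # naive scale-up
        mean = statistics.mean(estimates)
        spread = statistics.pstdev(estimates) / mean if mean else 0.0
        if spread <= tolerance or sample_size >= max_size:
            return mean, sample_size
        sample_size = min(sample_size * 2, max_size)  # too variable: refine

est, size = estimate_distinct(list(range(1000)), sample_size=50)
print(est, size)  # 1000.0 50 -> all-distinct field: samplings agree immediately
```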

Share your experiences

We've been looking forward to rolling out this change for a long time, and are really glad it's now being included with IRIS 2025.2. We've taken care to test this change in many different setups and are confident the new infrastructure is robust, but realize the changes may touch highly customized deployment practices, including namespace mappings and code packaging. Therefore, we welcome any feedback you'd have, and look forward to making the new infrastructure make your life easier.

Article
· June 19 · 3 min read

Innovating for Generative Elegance

Audience

Those curious about exploring new generative AI use cases.

This article shares thoughts and rationale from training generative AI for pattern matching.

Challenge 1 - Simple but no simpler

A developer aspires to conceive an elegant solution to requirements.
Pattern matches (like regular expressions) can be solved in many ways. Which one is the better code solution?
Can an AI postulate an elegant pattern-match solution for a range of simple-to-complex data samples?

Consider the three string values:

  • "AA"
  • "BB"
  • "CC"

The expression "2 Alphabetic characters" matches all these values, and other intuitively similar values, in a general, flexible way.

Alternatively, the expression "AA" or "BB" or "CC" would be a very specific way to match only these values.

Another way to solve it would be: "A" or "B" or "C", twice over.
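For readers more comfortable with regular expressions, the three candidate solutions map roughly onto the following. This is an illustrative Python translation; the article's own examples use ObjectScript pattern matching:

```python
import re

values = ["AA", "BB", "CC"]

general  = re.compile(r"[A-Za-z]{2}")   # "2 Alphabetic characters"
specific = re.compile(r"AA|BB|CC")      # exact enumeration
twice    = re.compile(r"[ABC]{2}")      # "A" or "B" or "C", twice over

assert all(general.fullmatch(v) for v in values)
assert all(specific.fullmatch(v) for v in values)
assert all(twice.fullmatch(v) for v in values)

print(bool(general.fullmatch("ZZ")))   # True: general form accepts similar new values
print(bool(specific.fullmatch("ZZ")))  # False: enumeration matches only the sample
print(bool(twice.fullmatch("AB")))     # True: broader than the enumeration intends
```

Each is a valid solution for the sample; they differ in how they generalize beyond it.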

Challenge 2 - Incomplete sample

Pattern problems rarely have every example specified.
A performant AI needs to accept a limited, incomplete sample of data rows and postulate a reasonable pattern-match expression.
A Turing goal would be to reach parity with human inference of a pattern from representative but incomplete data.

Better quality of sample processing is a higher priority than expanding the token window for larger sample sizes.

 

Challenge 3 - Leverage repeating sequences

Extending the previous example to also include single-character values:

  • "A"
  • "B"
  • "C"

Specifying ALL the possible values long-hand:

 if test?1(1"A",1"B",1"C",1"AA",1"BB",1"CC")

A pattern that leverages the repeating sequence would seem more elegant than this.

 

Challenge 4 - Delimited data bias

A common need beyond generalized patterns is to solve for delimited data, for example a phone number format:

213-5729-5798

Could be solved by expression:
3 Numeric, dash, 4 numeric, dash, 4 numeric 

 if test?3N1"-"4N1"-"4N

This can be normalized with repeat sequence to:

 if test?3N2(1"-"4N)

Essentially, this means preferring to specify a delimiter explicitly (for example, "-") instead of generalizing delimiters as punctuation characters. Generated output should therefore avoid over-generalization such as:

 if test?3N1P4N1P4N
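The same preference can be shown in regular-expression form. This is an illustrative Python translation of the ObjectScript patterns above:

```python
import re

number = "213-5729-5798"

explicit   = re.compile(r"\d{3}-\d{4}-\d{4}")            # 3N1"-"4N1"-"4N
normalized = re.compile(r"\d{3}(?:-\d{4}){2}")           # 3N2(1"-"4N)
overbroad  = re.compile(r"\d{3}[^\w\s]\d{4}[^\w\s]\d{4}")  # 3N1P4N1P4N

assert explicit.fullmatch(number)
assert normalized.fullmatch(number)

# The over-generalized punctuation form also accepts unintended delimiters:
print(bool(overbroad.fullmatch("213.5729.5798")))   # True
print(bool(normalized.fullmatch("213.5729.5798")))  # False
```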

 

Challenge 5 - Repeating sequences

Consider formatted numbers with common prefix codes.

The AI model detects three common sequences across the values and biases the solution to reflect an interest in this feature:

On this occasion the AI has decided to generate a superfluous "13" string match.

However, as indicated by the tool, the pattern will match all the values provided.

The pattern can easily be adjusted in the free text description and regenerated.

Inference Speed

Workbench AI assistance with qualified partial success can be an accelerator to implementation.
Above a complexity threshold, an AI assistant can deduce proposals faster than manual analysis.
Consider the following AI inference attempt with qualified partial success:

The AI assistant uses as many rows of data as it can fit into its token context window for processing, skipping excess data rows.
The number of rows is quantified in the generated output showing how data was truncated for inference.
This can be useful to elevate preferred data rows back into the context window for refined reprocessing.
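The truncation behaviour described here can be sketched as follows. This is illustrative Python; the real assistant's tokenizer and window size are internal, and the whitespace token counter is a stand-in:

```python
def fit_rows(rows, token_budget, count_tokens=lambda row: len(row.split())):
    """Keep as many leading rows as fit in the token budget and report how
    many were skipped, so preferred rows can be promoted and reprocessed."""
    kept, used = [], 0
    for row in rows:
        cost = count_tokens(row)
        if used + cost > token_budget:
            break
        kept.append(row)
        used += cost
    return kept, len(rows) - len(kept)

sample = ["row %02d" % i for i in range(10)]  # each row costs 2 tokens here
kept, skipped = fit_rows(sample, token_budget=12)
print(len(kept), skipped)  # 6 4 -> output quantifies how data was truncated
```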

 

Training Effort

Targeting an NVIDIA CUDA A10 GPU on Hugging Face.
Supervised model training.

Continuous GPU training per stage:

  • Prototype base dataset: 4 days
  • Main dataset: 13 days
  • Second refined dataset: 2 days

 

Conclusion

Single-shot generative inference with a constrained token size can usefully approach the elegance of discrete code solutions, even in the absence of chain-of-thought processing, by curating subject-expert bias into the base training data.
AI assistants can participate in iterative solution workflows.

 

Explore more

Get hands-on and explore the technology demo currently hosted on Hugging Face.
The cog symbol on buttons in the demo indicates where AI generation is being employed.

The demo is written for English, French, and Spanish audiences.

Question
· June 19

Stop HTTP Response content

Hi Guys,

Below is a URL run in a browser to connect to a Cassia AC server and get live advertising packet data; it keeps the connection alive and keeps populating data.

 

Now I'm doing the same in the code below, but the issue is that I'm not getting anything back: it gets stuck at the Get(HttpURL) line, I guess because it hangs on that live connection and never returns. How can I get the HTTP request to exit and give me only a snapshot of that data, maybe something like 20 seconds' worth?

 Set BleMac="DC:0D:30:9E:3A:EC",GatewayMac="CC:1B:E0:E2:56:18"
Set Token=##class(SX3.Task.TemperatureCollection).GetAuth()
Set Httprequest=##class(%Net.HttpRequest).%New()
Set Httprequest.SSLConfiguration="RTLS"
Set Httprequest.Server="MyURL"
Set Httprequest.Timeout=30
Set Httprequest.Https=1
//Set Httprequest.WriteTimeout=10
Set Httprequest.ContentType="application/json"
Do Httprequest.SetHeader("Accept","application/json")
Do Httprequest.SetHeader("Accept-Language","en_US")
Do Httprequest.SetHeader("Authorization","Bearer "_Token)
Set HttpURL="/api/gap/nodes?filter_mac="_BleMac_"&filter_rssi=-75&mac="_GatewayMac_"&active=1&event=1&chip=1&access_token="_Token
Set ^check("1222")=HttpURL_"|"_Token
Set tSc=Httprequest.Get(HttpURL)
Set ^check("122")=1
Set ^data=Httprequest.HttpResponse.Data.Read(Httprequest.HttpResponse.Data.Size)
//Write !,Httprequest.HttpResponse.Data.ReadLine()
//Set obj=##class(%ZEN.Auxiliary.jsonProvider).%ConvertJSONToObject(Httprequest.HttpResponse.Data.Read(Httprequest.HttpResponse.Data.Size),,.list)

 

Thanks

Question
· June 18

Custom Description Popup Text for Interoperability Production Business Hosts

I seem to remember making this work before, but I'm not having any luck digging up examples.

I've defined some custom properties for a business operation that could definitely benefit from having popup descriptions available in the Production Configuration. I have triple-slash comments before each property that do just that in the source. I thought those provided the text for the popup descriptions when clicking on the property name, but apparently not.

Any thoughts?
