PDMX - Excelmeer

PDMX
Poll & Direct Metadata for sdmX
Last update

Introduction
The Excel PDMX tool supports the user in managing its SDMX repository by reporting all the major SDMX artifacts from its SDMX endpoint (dataflows, (meta)datastructures, conceptschemes and codelists). Poll & Direct Metadata for SDMX (with Excel)

Only JSON formatted content is supported, based on the SDMX standard (see below for more info on SDMX). Please note that some organizations require authentication for fair use of their SDMX API (which is not supported by PDMX).

The main features of PDMX are:

Reports all dataflows, (meta)datastructures, conceptschemes and codelists
Checks the mutual consistency of the retrieved artifacts
Enables comparison of results between two systems (e.g. preview and production system)
Counts the size of data of the dataflows (based on the number of observations or raw data)
Provides a report with results per (sub-)agency
Supports saving results to a history overview
Easy to deploy in any Office environment (with Excel)

In the main sheet (named PDMX) the results are summarized (see figure 1 below for an impression) and the requests can be initiated by selecting the proper button in the Form (to launch by the 'Form'-button or F1). Retrieving the artifacts may take some time depending on the number within the selected system and your network connection. In the statusbar the status of the retrieval can be followed.

Some supporting functions can be selected as well, to unhide or hide the configuration (of the SDMX endpoint details), to initialise the worksheet, to resize the column widths of all data sheets, to save summarized results to a history sheet. The results for the different artifacts are checked for their usage as well (to find any unused artifacts or 'orphans', see figure 2).

The standard interrelationships of the especially the datastructure artifact are therefore of great importance. The tool can be used for all standard SDMX-API endpoints (to configure by their url, accept header and REST parameters, see below the options). You can try it out with the already configured examples, they all work.

The configuration of the SDMX endpoints can be hidden of unhidden by the button in the general form. It is only needed to enter data in the configuration cells (yellow cells) and to select the endpoint of your choice in the main sheet (in the green or blue cell).

And remember, to prevent any malfunctioning of PDMX, do not update or delete cells in the existing sheets and do not change sheet names without a good reason!

Usage
After downloading PDMX from the download-page or by this link, you must first enable the macro by trusting this file (at your own risk). You can rename the file for your own convenience. The next thing is to add and select your endpoint-parameters in the configuration-part of the main sheet (named PDMX), which can be hidden afterwards. With the Form-button (or F1) all available functions can be disclosed:

Get DFs: Retrieve all dataflows from system (data is collected in sheet DF1 or DF2)
Get DSs: Retrieve all datastructures from system (data is collected in sheet DS1 or DS2)
Get CSs: Retrieve all conceptschemes from system (data is collected in sheet CS1 or CS2)
Get CLs: Retrieve all codelists from system (data is collected in sheet CL1 or CL2)
Agency rep.: Produces summary of all retrieved artifacts per agency (in sheet AG1 or AG2)
Count data: Retrieve datasize in sheet DT1 or DT2 for all dataflows (collected in sheet DF1 or DF2)
Count categ.: Retrieve categories and counts only the number present
Count constr.: Retrieve constraints and counts only the number present
Compare DFs: Compare all retrieved dataflows of system1 with system2 (in sheet DFc)
Compare DSs: Compare all retrieved datastructures of system1 with system2 (in sheet DSc)
Compare CSs: Compare all retrieved conceptschemes of system1 with system2 (in sheet CSc)
Compare CLs: Compare all retrieved codelists of system1 with system2 (in sheet CLc)
Add history: Add current totals of summary in main sheet as a new row in the History sheet
Unhide config: Unhide (and hide) the configuration data in the main sheet
Resize sheets: Resize all data sheets to its initial column sizes
Initialise: Remove all retrieved data from the worksheet (except the configuration)
Cancel: Cancel function selection with the form menu

Note 1: The first 8 functions can be applied for the 'green' and 'blue' system (or system1 and system2).
Note 2: The right order in retrieving a complete and consistent collection of a system is: dataflows, datastructures, conceptschemes and codelists.
Note 3: By hitting the little button next to label System 1 or System 2, the DFs, DSs, CSs and CLs can be retrieved in 'one go'.
Note 4: The Count data function may take a very long time depending on your connection or system endpoint. You can abort this function at any time (the data size is marked with an asterisk as incomplete). And if you start another Excel workbook, it will automatically abort (only one Excel workbook opened is allowed during data counting).

The results are displayed in the main PDMX-sheet, as well the last url used and a snippet of the reponse retrieved. The previous results are moved to the next column to the right. After retrieving the artifacts, you can examine the results in the appropriate sheets. Artifacts used are counted by their references, so artifacts not used can easily be recognized. Older versions of artifacts can be such a reason.

Unused artifacts
One of the main goals of PDMX is to detect artifacts that are isolated and not (anymore) used. The SDMX standard provides a highly interoperable and reusable way to manage statistical data with its metadata, but it has a drawback: managing all the artifacts present and their versions can become cumbersome. PDMX can help you with this, for that PDMX needs to count all related artifacts for datastructures and conceptschemas (see below also the paragraph on the reusability factor). Related artifacts are part of the reference information in each artifact, available with the request option 'details=full', which is the default option for the SDMX API.

There are two challenges though: First, it can be time consuming. Especially if the number of dataflows and datastructures is high. Second, it may require a lot of data to be retrieved from the API at once, which can result in gateway timeouts (504) or other errors.

So, to enable the detection of unused artifacts depends on the size of the artifact sets in your system. If this size is too high and gives an http error, unfortunately the detail option must be set to 'detail=allstubs'. The maximum number of artifacts to retrieve by PDMX is currently set to 8000, the maximum number of concepts derived from the datastructures and conceptschemes is currently set to 80000.

Besides reducing your metadata by reusing it whenever possible, it makes managing your metadata, even with the help of PDMX, more pleasant. To control the counting of related artifacts, PDMX offers a configuration parameter Count related artifacts, which is enabled by default. Disable it by entering 'N' as the value. A second processing option could be the Show processing details in status bar, it gives detailed information in the status bar when artifacts are retrieved and processed. Particularly useful for systems with large artifact sets.

Data counting
Statistical systems can contain a lot of data, which makes their maintenance difficult. The Data count function allows the data available for each dataflow to be counted. Most SDMX systems do support some way to get an indication of the size of a dataflow (like OECD), the AvailableConstraint artifact provides this functionality. It generally contains the number of observations used by PDMX. In the event that the SDMX system does not provide this artifact (like BIS or Unicef), the 'brute force' method is selectable via the 'Data' checkbox (by which PDMX retrieves the data itself). Especially in the latter case, it will be a costly operation, at least in duration. That's why PDMX offers an incremental approach (to enable via the 'Incremental' checkbox).

The data can be requested intermittently and repeated regularly. After all, the size of the dataflows once retrieved will not change much or very often. To facilitate incremental data retrieval, PDMX also provides two configurable parameters:

Speed calibration: adjust this counter so that the speed of the counter value (dependant of your system) is about one second equal
Maximum wait: set this counter to maximum 'seconds' (defined by speed calibration), set to 0 to disable

PS As PDMX retrieves the data, a counter (starting from zero) is displayed in the row of the counted dataflow. Once the data retrieval is complete, the result (number of bytes) replaces this counter value.

Both parameters allow you to set a maximum wait in duration before skipping the dataflow and moving on to the next one.. However, you can always stop it and continue on another occasion (via alt-b). The result is displayed as http status code 200 (if ok) or any 4** or 5** value. If the retrieval fails due to a maximum wait, the fictitious result value will be -100.

Some systems do not like continuous retrieval (or 'harvesting'), so a dynamic response may be possible. An appropriate way may be to return status code 429. PDMX can automatically respond to the 429 status by waiting a minute before retrying (if enabled by the Retry after 429 parameter in the configuration).

In summary, an incremental approach does not repeat previous successful requests and retries all other data count requests. And if you want to start with a clean slate, simply disable the 'Incremental' checkbox.

Reusability factor
Below an overview of the PDMX-statistics of different (SDMX-)systems is presented, with their numbers and their (experimental) reusability factor. These 'statictics of the statistics' were collected in early June 2025 and may change in the future. The reusability factor is calculated by this formula: (DFs/DSs+DSs/CSs+((DFs+DSs+CSs)/CLs))/3
Note: the reusability factor is currently only shown when the 'one go' function is selected.

System	DFs	DSs	CSs	CLs	Rf	Rl
ISTAT	3947	839	27	569	14.7	Excellent
ABS	1240	1239	73	618	7.5	High
OECD	1866	464	282	886	2.9	Medium
UNICEF	177	84	18	273	2.7	Medium
BIS	32	27	10	148	1.5	Poor
StatLine*	428	426	436	3041	0.8	Poor

* current beta version

Disclaimer
Please note that you use the PDMX tool at your own risk. Excelmeer is in no way liable for any damage resulting from its use.

Version	Date	Information
v0.9a	11-01-1025	First release made public, updated interface (to be independent of any locale), improved messages, added and checked examples.
v0.9b	27-01-2025	Add an incremental way of counting all available data, add maximum wait on data retrieval.
v0.9c	11-02-2025	Add an alternative and much quicker way of counting all available data, and improved minor things.
v0.9d	13-02-2025	Fix error when counting raw data and interrupting.
v1.0a	06-03-2025	Release 1.0, fix counting metadata structures, optimize getting datastructures & conceptschemes, add run indicator and more.
v1.1a	11-03-2025	Update 1.1, increased processing capacity, added option to explicitly count related artifacts, added option for status bar update detail level.
v1.1b	19-04-2025	Update 1.1, add check on other open workbooks (to prevent errors while processing), add reusability factor (for 'one go' function).
v1.2a	04-05-2025	Update 1.2, improve agency report with PDMX-logo, add duration to data count results, made display of data count results of dataflow retrieval, fix minor things.

Contact
If you have any suggestions or are looking for some customization in PDMX, please contact Excelmeer.

Figure 1: animated example of main screen (v0.9a)

Figure 2: example of retrieved codelists

SDMX background
SDMX, which stands for Statistical Data and Metadata eXchange, is an ISO standard designed to describe statistical data and metadata, normalise their exchange, and improve their efficient sharing across statistical and similar organisations.

SDMX is sponsored by eight international organisations including the Bank for International Settlements (BIS), European Central Bank (ECB), Eurostat (Statistical Office of the European Union), International Labour Organization (ILO), International Monetary Fund (IMF), Organisation for Economic Cooperation and Development (OECD), United Nations Statistical Division (UNSD), and World Bank.

The SDMX standard provides an integrated approach to facilitating statistical data and metadata exchange, enabling interoperable implementations within and between systems concerned with the exchange, reporting and dissemination of statistical data and their related meta-information. More info on SDMX can be found at the SDMX website.

Standard SDMX-API parameters

DETAIL-parameter: This attribute specifies the desired amount of information to be returned.

ALLSTUBS : all artifacts should be returned as stubs, containing only identification information, as well as the artifacts' name.
REFERENCESTUBS: referenced artifacts should be returned as stubs, containing only identification information, as well as the artifacts' name.
REFERENCEPARTIAL: referenced item schemes should only include items used by the artifact to be returned. For example, a concept scheme would only contain the concepts used in a DSD, and its ISPARTIAL flag would be set to true. Likewise, if a dataflow has been constrained, then the codelists referenced by the DSD referenced by the dataflow should only contain the codes allowed by the content constraint
ALLCOMPLETESTUBS: all artifacts should be returned as complete stubs, containing identification information, the artifacts' name, description,annotations and isFinal information.
REFERENCECOMPLETESTUBS: referenced artifacts should be returned as complete stubs, containing identification information, the artifacts' name, description, annotations and isFinal information.
FULL (default): all available information for all artifacts should be returned.

REFERENCES-parameter: This attribute instructs the web service to return (or not) the artifacts referenced by the artifact to be returned (for example, the code lists and concepts used by the data structure definition matching the query as well as the artifacts that use the matching artifact (for example, the dataflows that use the data structure definition matching the query).

NONE (default): no references will be returned.
PARENTS: the artifacts that use the artifact matching the query
PARENTSANDSIBLINGS: the artifacts that use the artifact matching the query, as well as the artifacts referenced by these artifacts.
CHILDREN: artifacts referenced by the artifact to be returned.
DESCENDANTS: references of references, up to any level, will also be returned.
ALL: the combination of parentsandsiblings and descendants, in addition, a concrete type of resource may also be used (for example, references=codelist).

Please note, for using PDMX, only the DETAIL attribute can be useful, the REFERENCES attribute is of no use. More info on the SDMX-API can be found here.