
Architecture of a Digital Twin Service

Note: This is Part 3 of a 3-part blog series

What’s required to build and maintain the digital twin?


In part one of this blog series, Andy defined digital twins and their importance for asset operators, and in part two, Sameer gave an overview of the path of digital twin evolution. In this final post of the three-part series, I'm going to discuss the process for building and managing Digital Twins (DTs) throughout their evolution so that they provide value to asset operators. I'll touch on the four main stages for building and managing DTs: Data Collection, Data Pipelines, Data Integrity, and Data Egress.

At Element Analytics, our solution tackles this challenge with a unique architecture that builds digital twins across any enterprise data silos, quickly identifies and responds to changes in those systems, and delivers data in a common data layer that makes it easily accessible for any analytical use case.

The Architecture Required to Manage Digital Twins

Successfully achieving a digital transformation requires a data-centric approach, because data is at the core of any digital effort. Poorly organized and prepared data leads to low-quality analytical results, so continuously applying both context and assurance to your aggregated operational data set is paramount. This architecture provides both while minimizing potential human error in the process, keeping the data always ready for any digital application, letting the data reveal the insights for you, and avoiding the bias of hand-picking the right sensors for an application.


Figure 1: An architecture for building and maintaining the digital twin

Data and Data Collection

The Industrial Digital Transformation begins with building Operational Digital Twins. These digital twins consist of two types of data: model data and time series data.

  • Model data is used to construct the digital representation of real-world things like equipment, processes, and people, and the relationships between those things. We do this by constructing a graph model (the Element Graph) from operational data sources that fall in the purview of asset operators.

  • Time series data represents observations of the state of some physical thing at a given time. It can be either continuous or discrete. Temperature is continuous: even if we didn't observe the temperature at some time, we know there was a temperature at that time. A maintenance log is discrete: even if we see two events on the same day, we cannot draw any conclusions about what happened between those two observations. For example, if we know that Bob replaced a pump in the morning and Christine replaced a pump in the evening, we cannot make any assumption about what happened in the afternoon.

Building these two data types into the DT requires aggregating data from the systems that already contain it: your existing IT and OT investments. All of these systems hold relevant information about your assets and related equipment, and once these independent data stores are aggregated and linked together, they can be used to perform high-value analytics. Linking is achieved through a time series stream ID that serves as a unique identifier for a data stream. The unique identifier allows the Element Graph to refer to a time series and allows searches through both the graph model and the time series model.
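To make the linkage concrete, here is a minimal sketch in Python. The field names and the stream ID format are hypothetical placeholders, not the actual Element Graph schema; the point is that a graph node carries the stream IDs of its time series, so a query over the model resolves directly to the matching data streams.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Tuple

# Hypothetical, simplified structures for illustration only.
@dataclass
class GraphNode:
    name: str                 # e.g. "Pump-101"
    node_type: str            # e.g. "Pump"
    stream_ids: List[str] = field(default_factory=list)  # links to time series streams

# A toy time series store keyed by the same stream IDs used in the graph.
timeseries: Dict[str, List[Tuple[str, float]]] = {
    "hist:site-a:pump-101:outlet-pressure": [("2021-01-01T00:00Z", 42.1)],
}

pump = GraphNode("Pump-101", "Pump", ["hist:site-a:pump-101:outlet-pressure"])

# Traversing from the model to the data: the stream ID is the shared key.
for sid in pump.stream_ids:
    print(sid, timeseries.get(sid, []))
```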

We’ve also developed a highly flexible agent-based system to connect to all the relevant systems and collect their data. Each agent knows how to talk to one of the systems in question, whether it is a Historian, an Enterprise Asset Management system, a Process Hazard system, or a bespoke system developed in house. On a regular schedule, each agent talks to its enterprise system and translates the data into the language of the Element Analytics Ingest API.
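As a rough sketch of the pattern (the class names, the ingest endpoint, and the record fields below are hypothetical placeholders, not the real Element Analytics Ingest API), each agent reads from its source system, translates records into a common ingest format, and submits them on a schedule:

```python
import time
from typing import Dict, Iterable, List

class IngestClient:
    """Placeholder client that would post records to an ingest API (hypothetical)."""
    def __init__(self, endpoint: str):
        self.endpoint = endpoint

    def send(self, records: Iterable[Dict]) -> None:
        for record in records:
            print(f"ingest -> {self.endpoint}: {record}")  # a real agent would POST here

class FakeHistorian:
    """Stand-in for a source system such as a process historian."""
    def read_new_records(self) -> List[Dict]:
        return [{"tag": "U100.P101.PDI", "timestamp": "2021-01-01T00:00Z", "value": 42.1}]

class HistorianAgent:
    """Knows how to talk to one source system and translate its data."""
    def __init__(self, source: FakeHistorian, client: IngestClient, interval_s: int = 300):
        self.source, self.client, self.interval_s = source, client, interval_s

    def translate(self, raw: Dict) -> Dict:
        # Map source-specific fields onto the common ingest schema.
        return {"stream_id": raw["tag"], "ts": raw["timestamp"], "value": raw["value"]}

    def run_once(self) -> None:
        self.client.send(self.translate(r) for r in self.source.read_new_records())

    def run_forever(self) -> None:
        while True:                    # the "regular schedule"
            self.run_once()
            time.sleep(self.interval_s)

HistorianAgent(FakeHistorian(), IngestClient("https://example.invalid/ingest")).run_once()
```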

Data Pipelines

Once the data is collected, the Digital Twin is created using the Element Graph Studio. At this step, the disparate data sources from the Data Collection stage are merged into a single coherent model and exported into the Element Graph. This is done by using a visual programming language to build a data pipeline, which dictates how to merge multiple data sources and transform them into a graph (curious about why we use a graph model? Andrew Soignier discussed why graph models provide the most flexible and resilient way to model asset data in a recent blog post).


Figure 2: A complex data pipeline merging 4 data sources.

Simply connecting all of your IT and OT systems data once is not enough to provide value. To realize true value, you also need to keep those systems in sync and connected to your digital twin as they change. Data Pipelines allow us to perform this activity continuously once the pipeline(s) have been published.
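To illustrate conceptually what the merge step of a pipeline does (the column names, join key, and relationship label below are illustrative, not the actual pipeline language), here is a minimal sketch that joins records from two source systems on a shared equipment identifier and emits graph nodes and edges:

```python
eam_records = [  # from an Enterprise Asset Management export
    {"equipment_id": "P-101", "description": "Feed pump", "area": "Unit 100"},
]
historian_tags = [  # from a historian tag listing
    {"tag": "U100.P101.PDI", "equipment_id": "P-101", "measure": "outlet pressure"},
]

nodes, edges = [], []
for eam in eam_records:
    # Each piece of equipment becomes a node in the graph.
    nodes.append({"id": eam["equipment_id"], "type": "Pump", "area": eam["area"]})
    # Historian tags matched on the shared key become sensor nodes and edges.
    for tag in historian_tags:
        if tag["equipment_id"] == eam["equipment_id"]:
            nodes.append({"id": tag["tag"], "type": "Sensor", "measure": tag["measure"]})
            edges.append({"from": eam["equipment_id"], "to": tag["tag"], "rel": "has_sensor"})

print(nodes)
print(edges)
```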

Data Integrity

In addition to collecting, modeling, and organizing data, we need to establish confidence in the graph model and in the time series data itself. This requires understanding what data is missing, corrupt, or suspicious, and therefore needs attention before it can be trusted for use in analytics. It’s common to have many data integrity issues, from sensor coverage gaps to null or miscalibrated sensors, which can result in an incomplete or inaccurate Digital Twin. Establishing trust in the Digital Twin requires both Data Model Assurance and analyzing the time series data to see if there are any detectable issues.

Data Model Assurance is the process of detecting issues in the Digital Twin structure, accomplished by comparing Digital Twin Templates to the physical devices themselves. For example, think of the ideal form for a pump and what data is represented therein (the Digital Twin Template). It has sensors for measuring temperature, voltage, input pressure, outlet pressure, and vibration. Then there’s the actual physical pump, which doesn’t match up exactly with the ideal form (or the “schema”).


Figure 3: Ideal pump model or schema defining 5 sensors vs an actual pump, missing the temperature and vibration sensors.

As you can see, the actual pump doesn’t match up with the idealized pump model, as it’s missing both the temperature and vibration sensors. The Data Integrity component highlights these kinds of issues, which are common across assets and equipment. There are other, more sophisticated mechanisms that we employ for detecting and highlighting these kinds of structural issues in the data model (e.g., inconsistencies in data types or even units of measure), but they all follow a similar pattern and so don’t bear a complete explanation here.
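A minimal sketch of the template comparison described above, assuming a hypothetical pump template with the five sensors from Figure 3 (the sensor names are illustrative):

```python
# Hypothetical Digital Twin Template: the sensors an ideal pump should report.
PUMP_TEMPLATE = {"temperature", "voltage", "input pressure", "outlet pressure", "vibration"}

def missing_sensors(asset_sensors: set, template: set = PUMP_TEMPLATE) -> set:
    """Return the template sensors that the actual asset does not report."""
    return template - asset_sensors

# An actual pump missing two of the template's sensors.
actual_pump = {"voltage", "input pressure", "outlet pressure"}
print(missing_sensors(actual_pump))   # {'temperature', 'vibration'}
```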

Data Integrity also looks at the actual data streams to detect issues with calibration, connectivity, or even physical issues with the instrumentation that collects the physical data. This is a set of analyses on either univariate or multivariate data, looking for anomalies or simple patterns. These checks start simple, detecting static data (where a sensor is stuck and reports the same value over time), and become progressively more sophisticated, culminating in multivariate unsupervised anomaly detection.
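As an example of the simplest of these checks, here is a minimal sketch of static-data detection, flagging a sensor that is stuck and reports the same value over a window (the window size and tolerance are arbitrary, for illustration only):

```python
def is_static(values, window: int = 12, tolerance: float = 1e-6) -> bool:
    """True if the last `window` readings are effectively identical (a stuck sensor)."""
    recent = values[-window:]
    if len(recent) < window:
        return False
    return max(recent) - min(recent) <= tolerance

healthy = [41.8, 42.1, 41.9, 42.3, 42.0, 41.7, 42.2, 41.9, 42.1, 42.0, 41.8, 42.2]
stuck = [42.0] * 12

print(is_static(healthy))  # False
print(is_static(stuck))    # True
```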


With these two forms of data integrity, asset operators are assured that both the time series data and the Element Graph are correct and representative of reality.

Data Egress

Once the data has been collected, organized into a digital twin, and analyzed to ensure trust in its accuracy, the final stage is making use of the digital twin you’ve built to unlock a whole range of analytics value.

There are two separate paths for exporting data: 1) batch mode, where the Digital Twin is periodically exported to HDFS for later consumption by your systems; and 2) streaming mode, where data flows continuously through the Element Graph Studio. These two modes are suitable for different applications. Batch is more useful for human-scale problems involving analytics that people will use to make decisions (or “decision support”), while streaming supports lower-latency uses suitable for control systems or real-time decision making.

With your digital twin made available in the forms and formats required by analytics tools, for use cases ranging from Deep Learning-based Predictive Maintenance to fast, easy access to relevant data for Health, Safety, and Environmental incident investigation, you now have the data foundation you need to begin your digital transformation and achieve the analytics use cases that will drive improved profitability and reduced risk in your business.

Continuous Readiness

At this point, you’ve built your digital twin and it’s ready for wider use. But what happens when the underlying asset changes? As an asset operator, you know your physical environment is not static; it constantly evolves in response to a changing environment (regulatory, financial, and operational). In turn, the systems that collect data from those assets are not static. In the case of process data historians, teams add new tags when new equipment is installed, change the names of existing tags, and tweak the details of tags that are already being recorded, based on what occurs with the underlying asset. The constant data changes that result from the physical changes are also reflected in the dozens of other systems, outside of the instrumentation data, that most asset operators handle as well. These include everything from laboratory results, to work management and permitting, to safety management systems, ERP, MRP, MES, and perhaps too many other TLAs to mention.

This leads us to the final required piece of the Operational Digital Twin architecture: continuous readiness, the mechanism by which we ensure your digital twin remains ready for use despite these constant and sweeping changes. Continuous readiness is achieved by running each step of the process, from Data Collection to Data Egress, as often as new data becomes available, then notifying the asset operator of any detected issues, whether unrecognized tags in the Pipeline process or new issues with model or time series based Data Integrity.
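As a rough sketch of that loop (the stage functions and the notification call below are hypothetical placeholders), one readiness cycle re-runs each stage and surfaces any detected issues to the operator:

```python
def run_readiness_cycle(collect, run_pipelines, check_integrity, notify):
    """One cycle: collect -> pipelines -> integrity checks -> notify on issues."""
    new_data = collect()                        # Data Collection
    model = run_pipelines(new_data)             # Data Pipelines (refresh the graph)
    issues = check_integrity(model, new_data)   # Data Integrity (model + time series checks)
    if issues:
        notify(issues)                          # e.g. unrecognized tags, stuck sensors
    return model, issues

# Toy usage with stand-in callables, for illustration only.
model, issues = run_readiness_cycle(
    collect=lambda: {"new_tags": ["U100.P102.TI"]},
    run_pipelines=lambda data: {"nodes": 120, "edges": 118},
    check_integrity=lambda model, data: ["unrecognized tag: U100.P102.TI"],
    notify=print,
)
```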

Continuous Readiness closes the loop on digital twin management, allowing you to operate your assets using a digital twin that is always trustworthy and up-to-date.

What’s next?

Through this blog series, we’ve discussed what digital twins are, why they’re important for industrial organizations and asset operators, what their evolution will look like, and how we at Element enable you to create them today.

The digital twin will enable your digital transformation, improve your operation’s profitability and reduce its risk. The architecture outlined here allows you to build an Operational Digital Twin today and realize the value of the digital transformation faster and more reliably than any other solution in the market. It also acts as a stepping stone to the more advanced digital twins of tomorrow, setting you up for realizing sustained value deep into the future.