Process Data Generation to Accelerate Data Modelling

Digitalization of production processes is becoming part of the strategic roadmap for almost all manufacturing plants in every industrial sector. Data analytics and data modelling are key aspects of solutions that aim to optimize production processes for higher productivity, better energy efficiency or lower scrap rates. While multiple data management technologies are often available, representative process data sets are frequently missing or too incomplete to guarantee a reliable result from data modelling algorithms.

Within the CAPRI project, MSI contributes to the development of the different CSS (Cognitive Steel Sensors) by generating reliable datasets. The original data set extracted from the process was selected carefully to guarantee that it was complete and representative. However, from a modelling point of view, a data set covering a limited timeframe doesn't contain enough information for good and robust modelling. MSI used this data set to create a larger one with more process variation by applying a “data simulation process”.

For the process data simulation, an internal tool, the PDG (Process Data Generator), developed in another EU project (ArrowHead Tools), was used. The PDG can start from existing historical data, deterministic data or a combination of both. Data can be repeated in time while being transformed, randomised or disturbed, so that the original process data is enriched to better fit the characteristics of the required dataset. It is developed in Node-RED and is available as OSS in the Node-RED library.
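
The idea of repeating historical data in time while transforming, randomising and disturbing it can be illustrated with a minimal Python sketch. This is not the PDG itself, which is built in Node-RED; the function name, its parameters and the particular transformations (a linear drift per cycle, Gaussian noise and occasional spikes) are assumptions chosen only for illustration.

```python
import random
from datetime import datetime, timedelta


def generate_process_data(history, repeats, period,
                          drift_per_cycle=0.0, noise_std=0.0,
                          spike_prob=0.0, spike_size=0.0, seed=None):
    """Replay `history` `repeats` times, enriching each replayed sample.

    history: list of (timestamp, value) tuples from the real process.
    period:  timedelta by which each replayed cycle is shifted in time.
    All other parameters are hypothetical knobs for this sketch.
    """
    rng = random.Random(seed)  # seeded so a run can be reproduced
    generated = []
    for cycle in range(repeats):
        shift = period * cycle                   # repeat in time
        scale = 1.0 + drift_per_cycle * cycle    # transform: slow drift
        for timestamp, value in history:
            new_value = value * scale
            new_value += rng.gauss(0.0, noise_std)   # randomise
            if rng.random() < spike_prob:            # disturb
                new_value += spike_size
            generated.append((timestamp + shift, new_value))
    return generated


if __name__ == "__main__":
    start = datetime(2022, 1, 1)
    history = [(start + timedelta(minutes=i), 100.0 + i) for i in range(3)]
    data = generate_process_data(history, repeats=2,
                                 period=timedelta(minutes=3),
                                 drift_per_cycle=0.01, noise_std=0.5,
                                 seed=42)
    for timestamp, value in data:
        print(timestamp.isoformat(), round(value, 2))
```

Seeding the random generator keeps a generated dataset reproducible, which is useful when the same synthetic data has to be regenerated for a later modelling iteration.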

Within the CAPRI project, the dataset produced for the CSS is made available to the data consumer in two formats: DIR (Data in Rest) and DIM (Data in Motion). To access the DIR, a REST API has been implemented so that the consumer can obtain time series and SQL data through a single web interface. Based on the parameters supplied, the interface determines how to retrieve the data from the underlying databases. The DIR API is a good option during the engineering phase of the data model, since historical data can be accessed in a transparent way.
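
As an illustration of how a consumer might query such a parameter-driven interface, the following Python sketch builds a request URL and fetches JSON from it. The base URL, the endpoint path and the parameter names (`source`, `name`, `start`, `end`) are hypothetical; the actual DIR API is not described here and may differ.

```python
import json
from urllib.parse import urlencode
from urllib.request import urlopen

# Hypothetical endpoint, used only for this sketch.
BASE_URL = "https://dir.example.invalid/api/v1/data"


def build_dir_url(source, name, start, end, base_url=BASE_URL):
    """Build a query URL; the parameters select the backing database.

    source: assumed selector, e.g. "timeseries" or "sql".
    name:   assumed signal or table name.
    start, end: ISO 8601 timestamps bounding the requested history.
    """
    params = {"source": source, "name": name, "start": start, "end": end}
    return f"{base_url}?{urlencode(params)}"


def fetch_dir_data(url, timeout=10.0):
    """Send the GET request and decode a JSON response body."""
    with urlopen(url, timeout=timeout) as response:
        return json.load(response)


if __name__ == "__main__":
    url = build_dir_url("timeseries", "furnace_temp",
                        "2022-01-01T00:00:00Z", "2022-01-02T00:00:00Z")
    print(url)
    # fetch_dir_data(url) would perform the request against a real server.
```

With a single entry point of this kind, the consumer only changes parameters to switch between time series and SQL data and does not need to know which database serves the request.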

The DIM interface is implemented in two ways: as an OPC UA server and via a Kafka connection. Both allow the data consumer to obtain the synthetic data online and in real time, very close to a real-world experience. This interface improves the training and validation of the mathematical model.
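
On the consumer side, streamed samples can be processed one at a time as they arrive. The sketch below is transport-agnostic and assumes each message is a JSON payload with `timestamp`, `tag` and `value` fields; this schema is hypothetical, as is the rolling mean, which merely stands in for whatever online check a model validation would apply.

```python
import json
from collections import deque


def decode_sample(payload):
    """Decode one message; the JSON field names are assumed for this sketch."""
    record = json.loads(payload.decode("utf-8"))
    return {
        "timestamp": record["timestamp"],
        "tag": record["tag"],
        "value": float(record["value"]),
    }


def rolling_means(payloads, window=3):
    """Yield (timestamp, mean of the last `window` values) per message.

    `payloads` can be any iterable of bytes. With a Kafka client it could
    be fed from the consumer loop, for example the `.value` of each record
    yielded by a kafka-python KafkaConsumer (not shown or required here).
    """
    recent = deque(maxlen=window)  # keeps only the newest samples
    for payload in payloads:
        sample = decode_sample(payload)
        recent.append(sample["value"])
        yield sample["timestamp"], sum(recent) / len(recent)


if __name__ == "__main__":
    stream = [
        json.dumps({"timestamp": f"t{i}", "tag": "furnace_temp",
                    "value": v}).encode("utf-8")
        for i, v in enumerate([1.0, 2.0, 3.0, 4.0])
    ]
    for timestamp, mean in rolling_means(stream, window=2):
        print(timestamp, mean)
```

Because the generator consumes any iterable, the same code can be exercised with a recorded list of messages during development and with a live stream afterwards.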

Having representative process datasets opens multiple opportunities for better digital manufacturing projects, since it makes it possible to gain process control, obtain better process insights and implement process optimizations before “touching” the process.