
Digital transformation in industry: from data acquisition to data processing

For communication between a manufacturing asset and a data acquisition sensor to be effective and deliver benefits, it is crucial to have an in-depth understanding of the equipment itself. This communication enables real-time monitoring, predictive maintenance, process optimization, and automation.

Sensors are the “gateway” for information into the system, measuring various physical or chemical variables of manufacturing assets and the environment. As an example, in the mining sector, some of the variables collected may include:

  • Temperature (physical): overheating of engines in large mining trucks, excavators, crushers, and mills;
  • Flow and rate (physical): monitoring water flow for dust control, or in beneficiation processes such as flotation and leaching;
  • Air quality and gases (chemical): monitoring the concentration of toxic gases (methane, carbon monoxide, carbon dioxide) and oxygen in underground mines.

Flowchart model in a mining operation – Source: ResearchGate

The data collected by sensors can be in analog or digital format. Analog signals vary continuously within a range (for example, 0-10 V voltage or 4-20 mA current loops), while digital signals are discrete (for example, on/off, 0/1).
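
Before an analog reading is useful, it must be scaled to engineering units. Below is a minimal sketch of the linear scaling applied to a 4-20 mA loop signal; the function name and the 0-150 °C calibration range are hypothetical:

def scale_4_20ma(current_ma: float, eng_min: float, eng_max: float) -> float:
    """Linearly map a 4-20 mA loop current to an engineering value."""
    if not 4.0 <= current_ma <= 20.0:
        raise ValueError(f"Loop current out of range: {current_ma} mA")
    return eng_min + (current_ma - 4.0) / 16.0 * (eng_max - eng_min)

# Hypothetical transmitter calibrated for 0-150 °C:
print(scale_4_20ma(12.0, 0.0, 150.0))  # mid-scale current -> 75.0 °C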

The way data is transmitted from the sensor to the manufacturing asset (or to a control/monitoring system) depends on several factors, including distance, industrial environment, required transmission speed, and cost. Physical (wired) connection is the most traditional and robust method, where sensors are connected to controllers using industrial communication protocols such as Modbus, PROFIBUS, or EtherNet/IP, as sketched below.
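
As an illustration, the following is a minimal sketch of reading a sensor value over Modbus TCP, assuming a pymodbus-style client; the PLC address, register layout, and scaling factor are hypothetical:

from pymodbus.client import ModbusTcpClient

# Hypothetical PLC address and register layout
client = ModbusTcpClient("192.168.0.10", port=502)

if client.connect():
    # Read one holding register assumed to hold temperature in tenths of a degree
    result = client.read_holding_registers(address=0, count=1, slave=1)
    if not result.isError():
        temperature_c = result.registers[0] / 10.0  # assumed scaling
        print(f"Engine temperature: {temperature_c} °C")
    client.close()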

Wireless connection, meanwhile, is increasingly adopted due to its flexibility, ease of installation, and reduced cabling costs. The most common types include:

  • Wi-Fi: adapted for industrial use with more robust standards (e.g., Wi-Fi 6 for IIoT);
  • Bluetooth/BLE (Bluetooth Low Energy): for small and portable sensors;
  • Zigbee/Z-Wave: for a large number of distributed sensors;
  • LoRaWAN: for sensors deployed across expansive factory areas;
  • 5G: for mission-critical industrial applications.

From data request to analytics: connecting the industrial ecosystem

Unlike wired systems, where each sensor requires a physical cable to transmit data, wireless communication relies on radio waves to send information. First, the sensor is directly attached to the asset or positioned to monitor a specific variable within its environment.

For example, in a mining environment, a fleet of trucks is equipped with various wireless sensors (engine temperature, tire pressure, fuel level). These sensors communicate with a gateway or access point using wireless technologies (industrial Wi-Fi, LoRaWAN, 5G, BLE).

The most common and robust method for collecting sensor data is through the consumption of RESTful APIs or message protocols such as MQTT. Many IIoT (Industrial Internet of Things) platforms expose equipment telemetry data via REST APIs. The process can be illustrated as follows:

  1. Authentication: obtain credentials (API keys, OAuth 2.0 tokens);
  2. Endpoint: identify the API URL for accessing sensor data;
  3. Request: make HTTP GET requests (for reading) to the endpoint;
  4. Parsing: process the response, typically in JSON or XML format.

import requests

# --- Essential Settings ---
API_URL = "https://api.exemplo-mineracao.com/data/MINER_007/_temperature/latest"
# In a real-world scenario, API_URL would be the exact endpoint that returns the required data.
# Ex: https://api.sua_plataforma.com/v1/assets/MINER_007/sensors/_temperature?limit=1

try:
    # 1. Performs the HTTP GET request
    response = requests.get(API_URL, timeout=10)

    # 2. Checks if the request was successful (status 200 OK)
    response.raise_for_status()  # Raises an exception for HTTP status codes 4xx/5xx

    # 3. Extracts the data from the JSON response
    # We assume the response is a simple JSON such as: {"value": 85.5, "unit": "C", "timestamp": "..."}
    data = response.json()
    temperatura = data.get("value")
    unidade = data.get("unit", "°C")

    # 4. Prints the result
    print(f"Temperature: {temperatura} {unidade}")

except requests.exceptions.RequestException as e:
    print(f"Error connecting to or retrieving data from the API: {e}")
    if 'response' in locals() and response.text:
        print(f"API Response (for debugging): {response.text}")
except Exception as e:
    print(f"An unexpected error occurred: {e}")
Python Code – Sensor Data Collection via API
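
Many IIoT platforms also stream the same telemetry over MQTT, where the client subscribes to a topic instead of polling an endpoint. Below is a minimal sketch assuming the paho-mqtt 1.x client API; the broker address, topic, and payload format are hypothetical:

import paho.mqtt.client as mqtt

# Hypothetical broker and topic for the same truck sensor
BROKER = "broker.exemplo-mineracao.com"
TOPIC = "mine/trucks/MINER_007/temperature"

def on_message(client, userdata, msg):
    # Payload assumed to be a plain numeric string, e.g. b"85.5"
    print(f"{msg.topic}: {float(msg.payload)} °C")

client = mqtt.Client()
client.on_message = on_message
client.connect(BROKER, 1883)
client.subscribe(TOPIC)
client.loop_forever()  # Blocks and dispatches incoming readings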

The Python script simulates retrieving the temperature reading from a mining asset sensor through a REST API using the requests library. In this context, there are several advantages to collecting data from a wireless sensor, such as:

  • Flexibility: easy installation and relocation of sensors;
  • Reduced cost: lower installation and maintenance expenses;
  • Access to remote locations: enables monitoring of assets in hard-to-reach or hazardous areas;
  • Mobility: essential for monitoring mobile equipment such as trucks, excavators, and drilling rigs;
  • Enhanced predictive maintenance: continuous data acquisition improves insight into equipment condition and anticipates failures.

Data cleansing and standardization: critical steps prior to analytics

After data collection, it is always necessary to clean the data, especially when working in industrial environments such as the mining sector. Known as “data cleansing,” this step is crucial for identifying and correcting errors and inconsistencies in raw datasets. This process ensures the quality, accuracy, and reliability of the information that will be used for analysis and decision-making.

Data cleansing involves several techniques, many of which can be implemented with Python:

  • Handling missing data: removal and imputation;
  • Detection and treatment of outliers: statistical thresholds, physical and operational limits, removal or replacement;
  • Noise smoothing: filtering;
  • Removal of duplicates;
  • Standardization and normalization;
  • Consistency checking.

The pandas library is the most popular tool for data cleaning and manipulation in Python. The example below demonstrates some of the techniques mentioned:

import pandas as pd
import numpy as np

# Example data from a temperature sensor
data = {
    'timestamp': pd.to_datetime(['2024-07-24 08:00', '2024-07-24 08:01',
                                 '2024-07-24 08:02', '2024-07-24 08:03',
                                 '2024-07-24 08:04', '2024-07-24 08:05',
                                 '2024-07-24 08:06']),
    'temperature_C': [85.2, 85.5, 150.0, np.nan, 86.1, 86.3, 86.0]
}

# Creating a DataFrame
df = pd.DataFrame(data)
print("--- Raw Data ---")
print(df)

# --- Data Cleaning Process ---

# Step 1: Outlier Detection and Handling (operational thresholds)
# Assuming the engine temperature should never exceed 100°C,
# values above this threshold are replaced with NaN.
df.loc[df['temperature_C'] > 100, 'temperature_C'] = np.nan
print("\n--- After removing the outlier (150°C) ---")
print(df)

# Step 2: Handling Missing Values (NaN)
# Using linear interpolation to impute missing data points.
df['temperature_C'] = df['temperature_C'].interpolate(method='linear')
print("\n--- After interpolating the missing values ---")
print(df)

# Step 3: Data Smoothing (noise filtering)
# Applying a centered 3-point moving average to attenuate minor fluctuations.
df['temperature_C'] = df['temperature_C'].rolling(window=3, min_periods=1, center=True).mean()
print("\n--- Smoothed (cleaned) Data ---")
print(df)

In this case, the raw data contained an outlier (150°C) and a missing value (NaN). Step 1 replaced the outlier with NaN, Step 2 filled the gaps with interpolated values, and Step 3 smoothed the time series to remove noise. Since data cleaning is an iterative process, it requires domain knowledge (understanding what the sensor measures and the expected behavior of the asset) to be performed effectively.
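
The operational limit in Step 1 presumes knowledge of the asset. When such limits are unknown, the statistical thresholds mentioned earlier can flag outliers instead; below is a minimal sketch using Tukey's IQR rule (the 1.5 multiplier is the conventional, but adjustable, choice):

import pandas as pd
import numpy as np

readings = pd.Series([85.2, 85.5, 150.0, 85.8, 86.1, 86.3, 86.0])

# Tukey's rule: values beyond 1.5 * IQR from the quartiles are outliers
q1, q3 = readings.quantile(0.25), readings.quantile(0.75)
iqr = q3 - q1
outliers = (readings < q1 - 1.5 * iqr) | (readings > q3 + 1.5 * iqr)

readings[outliers] = np.nan  # Replace flagged values, then impute as before
print(readings)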

The next step is transforming the data into a more useful and structured format for analysis, modeling, and visualization. This process can also be carried out with the pandas library, as it offers a wide range of functions to clean, format, aggregate, and manipulate time series data:

1. Normalization for analysis: pandas + scikit-learn:

import pandas as pd
from sklearn.preprocessing import MinMaxScaler

# Example DataFrame with temperature and pressure data
data = {'temperature': [85, 82, 88, 85, 87],
        'pressure_psi': [1050, 1065, 1055, 1070, 1060]}
df = pd.DataFrame(data)

# Create a scaler object
scaler = MinMaxScaler()

# Apply scaling to selected columns
df_scaled = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)

print("--- Original Data ---")
print(df)
print("\n--- Scaled Data (range 0 to 1) ---")
print(df_scaled)
Example: normalizing temperature data to a scale from 0 to 1.
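
Standardization (zero mean, unit variance) is the companion technique from the list above. A minimal sketch with scikit-learn's StandardScaler, reusing the same hypothetical DataFrame:

import pandas as pd
from sklearn.preprocessing import StandardScaler

data = {'temperature': [85, 82, 88, 85, 87],
        'pressure_psi': [1050, 1065, 1055, 1070, 1060]}
df = pd.DataFrame(data)

# Rescale each column to zero mean and unit variance (z-scores)
scaler = StandardScaler()
df_standardized = pd.DataFrame(scaler.fit_transform(df), columns=df.columns)

print(df_standardized)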

2. Aggregation: analysis of daily or weekly trends:

import pandas as pd
import numpy as np

# Example DataFrame with timestamps at 1-minute granularity
timestamps = pd.date_range(start='2024-07-25 08:00', periods=60, freq='min')
temperatures = np.random.uniform(80, 90, 60)
df = pd.DataFrame({'timestamp': timestamps, 'temperature': temperatures})
df = df.set_index('timestamp')

# Resampling (aggregation) to compute the mean every 15 minutes
df_aggregated = df.resample('15min').mean()

print("--- Original Data (first 5 rows) ---")
print(df.head())
print("\n--- Aggregated Data (mean every 15 minutes) ---")
print(df_aggregated)
Calculating the average temperature every 15 minutes from data with 1-minute granularity.

3. Feature engineering: producing more informative columns for analysis:

import pandas as pd

# Sample DataFrame with temperature data
data = {'temperature': [85.0, 85.5, 86.1, 87.5, 88.0, 87.2, 86.8]}
df = pd.DataFrame(data)

# Creates a new feature: "Rate of Change"
# The .diff() method computes the difference between a row and its previous row.
df['rate_of_change'] = df['temperature'].diff()

# Creates another feature: 3-point "Moving Average"
# The .rolling(window=3).mean() method calculates the mean across a sliding window of 3 data points.
df['media_movel_3pts'] = df['temperature'].rolling(window=3).mean()

print("--- DataFrame with New Features ---")
print(df)
Calculating the temperature “rate of change” and the “moving average” to smooth out noise.

The examples demonstrate how pandas enables robust and efficient data transformation and preparation, making datasets ready for analysis and modeling. Just as with temperature data in mining operations, this structured process allows you to identify patterns, anticipate risks, and underpin technical and strategic decisions for the industry.

Learn more about ST-One.

Download the full material here and discover how ST-One has already positively impacted partners in more than 23 countries.