\makeatletter
\@ifundefined{HCode}
{\documentclass[screen,CRMECA,Unicode,biblatex,published]{cedram}
\addbibresource{crmeca20250952.bib}
\newenvironment{noXML}{}{}%
\def\tsup#1{$^{{#1}}$}
\def\tsub#1{$_{{#1}}$}
\def\stbfrac#1#2{{(#1)}/{(#2)}}
\def\sbfrac#1#2{{#1}/{(#2)}}
\def\sfrac#1#2{{#1}/{#2}}
\def\thead{\noalign{\relax}\hline}
\def\endthead{\noalign{\relax}\hline}
\def\tbody{\noalign{\relax}\hline}
\def\xsection{}
\def\xtbody{\noalign{\relax}}
\def\tabnote#1{\vskip4pt\parbox{.95\linewidth}{#1}}
\def\xxtabnote#1{\vskip4pt\parbox{.96\linewidth}{#1}}
\usepackage{upgreek}
\usepackage[T1]{fontenc}
\usepackage{multirow} 
\newcounter{runlevel}
\let\MakeYrStrItalic\relax
\def\jobid{crmeca20250952}
\def\nrow#1{\@tempcnta #1\relax%
\advance\@tempcnta by 1\relax%
\xdef\lenrow{\the\@tempcnta}}
\def\morerows#1#2{\nrow{#1}\multirow{\lenrow}{*}{#2}}
\def\textemail#1{\href{mailto:#1}{#1}}
\csdef{Seqnsplit}{\\}
\def\botline{\\\hline}
%\graphicspath{{/tmp/\jobid_figs/web/}} 
\graphicspath{{./figures/}} 
\def\0{\phantom{0}}
\def\mn{\phantom{$-$}}
\def\refinput#1{}
\def\back#1{}
\let\Lbreak\break
\let\rmmu\upmu
\allowdisplaybreaks
\DOI{10.5802/crmeca.350}
\datereceived{2025-11-25}
\daterevised{2026-01-19}
\datererevised{2026-03-18}
\dateaccepted{2026-02-12}
\ItHasTeXPublished
\def\xmorerows#1#2{{#2}}
\def\rmpi{\uppi}
\def\xsubfig{\@ifnextchar[{\@xsubfig}{\@xsubfig[ph!]}}
\def\@xsubfig[#1]#2#3{%
 \global\@namedef{sfig@#2#3}{#2#3}%
 \begin{figure*}[#1]%
 \def\t@the@figure{#2}%
\vspace*{-3pt} 
 \centering\includegraphics{fig#2}
\vspace*{-3pt} 
 \caption{\ffigcaption\vspace*{6pt}Continued on next page.}
\vspace*{-5pt} 
 \end{figure*}}
 \def\figcnt{\def\thefigure{3 (cont.)}}
 \def\xfigcnt{\def\thefigure{4 (cont.)}}
 \def\xcaption#1#2{\caption{#2}}
}
{\documentclass[crmeca]{article}
\usepackage{upgreek}
\usepackage[T1]{fontenc} 
\def\CDRdoi{10.5802/crmeca.350}
\def\centering{}
\def\newline{\unskip\break}
\def\hyperlink#1#2{#2}
\def\hypertarget#1#2{#2}
\def\sfrac#1#2{{#1}/{#2}}
\def\sbfrac#1#2{{#1}/{(#2)}}
\def\stbfrac#1#2{{(#1)}/{(#2)}}
\def\href#1#2{\url[#1]{#2}}
\let\figcnt\relax
\let\xfigcnt\relax
\def\xcaption#1#2{\caption{#1~#2}}
\def\xsubfig[#1]#2#3{}
}
\makeatother

\dateposted{2026-04-15}
\begin{document}

\begin{noXML}

\CDRsetmeta{articletype}{research-article}

\title{A hybrid HMM-RBFNN system using relevant and non-redundant
features for bearing fault classification}

\alttitle{Syst\`{e}me hybride HMM-RBFNN  utilisant des
caract\'{e}ristiques pertinentes et non redondantes pour la
classification des d\'{e}fauts de roulements}

\author{\firstname{Miloud} \lastname{Sedira}\CDRorcid{0009-0007-7221-6045}\IsCorresp}
\address{LMPA, Ferhat Abbas University, Setif 19000, Algeria}
\email[M. Sedira]{miloudsedira@yahoo.fr}

\author{\firstname{Ahmed} \lastname{Felkaoui}\CDRorcid{0009-0002-1000-389X}}
\addressSameAs{1}{LMPA, Ferhat Abbas University, Setif 19000, Algeria}
\email[A. Felkaoui]{a\_felkaoui@yahoo.fr}

\keywords{\kwd{Bearing}
\kwd{Diagnosis}
\kwd{Fisher score}
\kwd{Hidden Markov model (HMM)}
\kwd{Principal component analysis (PCA)}
\kwd{Radial basis function neural network (RBF)}
\kwd{Wavelet packet transform (WPT)}}

\altkeywords{\kwd{Roulement}
\kwd{Diagnostic}
\kwd{Score de Fisher}
\kwd{Mod\`{e}le de Markov cach\'{e} (MMC)}
\kwd{Analyse en composantes principales (ACP)}
\kwd{R\'{e}seau de neurones \`{a} fonction de base radiale (RBF)}
\kwd{Transform\'{e}e en paquets d'ondelettes (WPT)}}

\begin{abstract} 
This paper presents an innovative hybrid approach that combines Hidden
Markov Models (HMM) with Radial Basis Function Neural Networks (RBFNN)
for the automatic classification of mechanical faults in rolling
element bearings using vibration signal analysis. The signals are
sourced from the well-established database
(\url{https://engineering.case.edu/bearingdatacenter/welcome}), 
widely used in
fault diagnosis research. Raw signals are preprocessed to extract
relevant features across time, frequency, and time--frequency domains,
including wavelet packet decomposition. To enhance classification
robustness and reduce computational complexity, dimensionality
reduction is performed using Principal Component Analysis (PCA),
complemented by Fisher score-based feature selection. HMMs are trained
to capture the temporal dynamics of the signals, while RBFNNs leverage
the reduced feature space for fine-grained classification. A
comprehensive performance comparison is conducted between standalone
HMM and RBFNN models, as well as their integration within the hybrid
HMM-RBFNN system. Experimental results demonstrate that the proposed
hybrid method significantly improves classification accuracy,
highlighting its potential for industrial predictive maintenance
applications.
\end{abstract} 

\begin{altabstract} 
Cet article pr\'{e}sente une approche hybride innovante combinant les
mod\`{e}les de Markov cach\'{e}s (HMM) et les r\'{e}seaux de neurones
\`{a} fonction de base radiale (RBFNN) pour la classification
automatique des d\'{e}fauts m\'{e}caniques des roulements \`{a} billes
\`{a} partir de l'analyse des signaux vibratoires. Ces signaux
proviennent de la base de donn\'{e}es
(\url{https://engineering.case.edu/bearingdatacenter/welcome}), 
largement utilis\'{e}e
dans la recherche en diagnostic de d\'{e}fauts. Les signaux bruts sont
pr\'{e}trait\'{e}s afin d'extraire les caract\'{e}ristiques pertinentes
dans les domaines temporel, fr\'{e}quentiel et temps-fr\'{e}quence,
notamment par d\'{e}composition en paquets d'ondelettes. Pour
am\'{e}liorer la robustesse de la classification et r\'{e}duire la
complexit\'{e} de calcul, une r\'{e}duction de dimensionnalit\'{e} est
effectu\'{e}e par analyse en composantes principales (ACP),
compl\'{e}t\'{e}e par une s\'{e}lection de caract\'{e}ristiques
bas\'{e}e sur le score de Fisher. Les HMM sont entra\^{i}n\'{e}s \`{a}
capturer la dynamique temporelle des signaux, tandis que les RBFNN
exploitent l'espace de caract\'{e}ristiques r\'{e}duit pour une
classification fine. Une comparaison de performances exhaustive est
men\'{e}e entre les mod\`{e}les HMM et RBFNN autonomes, ainsi que leur
int\'{e}gration au sein du syst\`{e}me hybride HMM-RBFNN. Les
r\'{e}sultats exp\'{e}rimentaux d\'{e}montrent que la m\'{e}thode
hybride propos\'{e}e am\'{e}liore significativement la pr\'{e}cision de
la classification, soulignant ainsi son potentiel pour les applications
de maintenance pr\'{e}dictive industrielle.
\end{altabstract} 

%\input{CR-pagedemetas}

\editornote{Submitted by invitation following the DTE-AICOMAS 2025
conference, held February 17--21, 2025}
\alteditornote{Soumis sur invitation suite au colloque DTE-AICOMAS
2025, qui s'est tenu du 17 au 21 f\'{e}vrier 2025}

\maketitle

\end{noXML}

\section{Introduction and related work}\label{sec1}
Rotating machines play a key role in industrial production. When a
failure occurs, it can lead to significant losses of time and,
consequently, of cost~\cite{4,20}. Often, these failures are caused by the
deterioration of bearings, which are considered one of the most common
and vulnerable components of rotating  
machines~\cite{27}\footnote{\url{https://engineering.case.edu/bearingdatacenter/welcome}.}. 
Indeed,
many rotating machines use bearings, including aircraft engines,
high-speed trains, helicopters, wind turbines, machine tools,
industrial robots, etc. Notably, bearings are responsible for 30\% or
more of all failures of rotating machines~\cite{1,29}. Condition
monitoring, diagnosis, and
identification of bearing faults constitute an important guarantee to
improve the reliability of rotating machines and industrial 
equipment~\cite{19}. For over two decades, great importance has been
given to research on methodologies for automating the diagnosis of
mechanical faults in rotating machines, based on the analysis of
vibration signals, acquired using well-suited sensors installed near
the mechanical elements to be  monitored~\cite{12,32}. Indeed, fault
detection and diagnosis can be considered a pattern recognition problem
related to the state of rotating equipment; they can be divided into
three phases: data acquisition, feature extraction, and fault
classification, with the latter two being  priorities~\cite{7,45}. Given
that the information contained in vibratory signals is complex and
variable, it is important for diagnosis to accurately extract the
intrinsic characteristics of all fault types~\cite{25}. The signal
actually collected often contains noise, and only by suppressing this
noise can the useful information in the original signal be represented
and exploited effectively~\cite{4,14}. Signal processing is one of the
most commonly used methods for the second stage, typically using
time-domain, frequency domain, or time--frequency  analysis~\cite{29}.
Regarding classification, different algorithms can then be used to
classify the faults according to the predefined  classes~\cite{45}. The
techniques for diagnosing mechanical faults in rotating machines
currently available present various limitations. A robust and effective
method must be sought, and an automated system must be developed for
diagnostic activities~\cite{2}. Recently, intelligent fault diagnosis
approaches have been widely adopted in rotating machine condition
monitoring systems, especially with the advent of Industry 4.0
technologies. Throughout the literature, we notice that the second
stage plays a more important role; it aims to extract representative
features from the original  signals~\cite{11}. Similarly, many studies
have focused on how to extract relevant and effective features from
measured signals in the time domain, based on various signal processing
techniques~\cite{14}. In general, time-domain analysis calculates
statistical parameters such as root mean square (RMS), kurtosis,
structural resonances, etc. Frequency domain analysis is often
advantageous, as it allows for easy isolation and identification of key
frequency components. A commonly used tool is the Fast Fourier
Transform (FFT), as well as FFT-based methods, spectral analysis
methods, etc. Time--frequency analysis extends the capabilities of
frequency analysis to non-stationary vibration signals and includes
methods such as the Short-Time Fourier Transform, Wavelet Transform,
Empirical Mode Decomposition (EMD), the Hilbert--Huang Transform (HHT),
methods associated with EMD, etc. With the development of nonlinear
dynamic theory, many entropy-based estimation methods offer useful
alternative approaches for extracting features related to faults hidden
in vibration signals and applying them to bearing fault 
detection~\cite{19}. Features extracted from vibration signals may
include redundant or insensitive information; therefore, some
dimensionality reduction strategies are applied, such as Principal
Component Analysis  (PCA)~\cite{6}, Linear Discriminant Analysis (LDA),
the distance evaluation technique, and Kullback--Leibler (K--L)
divergence, or class separation tools such as the Fisher Score (FS). As
for classification methods, they implement a mapping of the fault
feature vector using as inputs the features or descriptors established
in the second stage. With the dizzying development of computing means,
this aspect has seen rapid advancement; indeed, researchers and
engineers in the field have striven to implement a plethora of
algorithms and methods ranging from statistical/probabilistic methods
such as Gaussian Mixture Models (GMM), Bayesian inferences, Hidden
Markov Models (HMM) to artificial intelligence methods like Multilayer
Perceptron (MLP), Radial Basis Function Neural Networks (RBFNN),
Support Vector Machines (SVM),\unskip\break  
Deep Learning (DL), etc.

To support what has just been said, we can cite some relevant landmark
work from the last two decades. Ref.~\cite{29} implemented a procedure
for diagnosing bearing faults using three classifiers, namely
Multilayer Perceptron (MLP), Radial Basis Function Neural Networks
(RBFNN), and Probabilistic Neural Networks (PNN); feature selection and
optimization were performed by Genetic Algorithms (GA) on statistical
data extracted from vibration signals in the time domain, and
appreciable results were recorded. Ref.~\cite{34} used statistical
features extracted from the time domain as input data for a feedforward
neural network for bearing fault classification. Ref.~\cite{20}
demonstrated the performance of SVM compared to vector quantization and
self-organizing maps, after a comparison performed for bearing fault
diagnosis; fault features were extracted from vibration signals by
wavelets, and the most relevant features were selected by a statistical
methodology based on the minimum Shannon entropy criterion.
Ref.~\cite{36} conducted a comparison between RBFNN and traditional ANN
on four practical examples and concluded that RBFNN are especially
recommended for function approximation problems, whereas traditional
neural networks can achieve better results for classification problems,
RBFNN learning being more efficient when the amount of data is optimal.
Ref.~\cite{16} presented a method for diagnosing bearing faults from
multiple sources: the Multiple Frequency Energy Spectrum (MFES) was
used for feature extraction, to improve recognition accuracy and reduce
uncertainty, and fault classification was performed by an RBFNN.
Ref.~\cite{44} showed, after a survey on RBFNN, that they have an
appreciable generalization capacity and are recommended for the
approximation and classification of nonlinear functions, which makes
them a good alternative to MLP. Ref.~\cite{5} used the Continuous
Wavelet Transform (CWT) (Meyer wavelet) for extracting bearing fault
features and SVM for the classification task; the results achieved
100\% accuracy. Ref.~\cite{3} presented a bearing fault diagnosis
system based on MLP, using multiple data sources; they introduced a
vibration feature selection based on automatic Bayesian relevance
determination, and MLP were dedicated to the classification task.
Ref.~\cite{12} defined a new approach to characterize bearing
degradations and simplify the prognostic task into a classification
task, using the Weibull distribution for feature selection and
artificial neural networks for fault classification; an application on
bearings was performed to validate this proposal. Ref.~\cite{25}
proposed a feature fusion model based on a Probabilistic Neural Network
(PNN) and entropy: three types of entropy were extracted from vibration
signals, in the time, frequency, and time--frequency domains, and
served as inputs to a PNN-based classification system. Ref.~\cite{6}
proposed a bearing diagnosis system based on SVM and PCA: feature
extraction from vibration signals was performed by the Wavelet Packet
Transform (WPT), feature selection by PCA, and classification by SVM
and by two types of ANN (BP and RBFNN); they concluded that PCA
effectively eliminates redundant features, which ensures the
performance and reliability of SVM for fault classification relative to
the two types of ANN. Ref.~\cite{23} presented a method for diagnosing
bearing faults based on Convolutional Neural Networks (CNN): feature
extraction from the vibration signal in the time--frequency domain was
performed by the Wavelet Packet Transform (WPT), and the result was
transformed into a 2-dimensional space (image) to serve as input to the
CNN for fault classification. Ref.~\cite{2} extracted fault features
from bearing vibration signals by Wavelet Packet Energy Entropy (WPEE)
to form a feature vector, which then served as input to the
classification and severity-level identification system, namely the
multi-class Relevance Vector Machine (mRVM). Ref.~\cite{31} performed a
comparison between Artificial Neural Networks (MLP) and Hidden Markov
Models (HMM) for bearing fault classification, using time-domain,
frequency-domain, and time--frequency (wavelet) features; they
demonstrated that the performance of MLP diminishes as the number of
features increases, while HMM remain very robust, which revealed an
important performance advantage. Ref.~\cite{2} implemented a technique
for diagnosing bearing faults under variable speed conditions, via an
improved demodulation spectrum technique using wavelet packets for
feature extraction and the Self-Organizing Map (SOM) method for
dimensionality reduction. [OO] opted for a new approach for the
diagnosis and prognosis of bearing faults, based on a multi-feature
Hidden Markov Model (HMM): first, time-domain, frequency-domain, and
wavelet packet decomposition features are extracted from bearing
vibration signals; the PCA method is then used to reduce their
dimensionality; the results reveal that the established approach allows
for effective fault diagnosis and estimation of the bearing's remaining
useful life. Ref.~\cite{15} proposed a methodology for diagnosing
bearing and gear faults, capable of identifying nine different health
state categories, healthy and faulty, under variable and noisy load; a
deep neural network is used for fault classification into the various
categories, and robust features such as semi-variance, spectral
kurtosis, and Shannon entropy were used. Ref.~\cite{28} presented a DC
motor diagnosis system based on vibration signals, with three health
states (healthy, incipient fault, and severe fault); feature extraction
was performed by the Continuous Wavelet Transform (CWT), dimensionality
reduction by PCA, and fault classification by K-Nearest Neighbors
(KNN). Ref.~\cite{41} presented a method for diagnosing bearing faults
based on a Convolutional Neural Network (CNN): feature extraction from
the vibration signal in the time--frequency domain was performed by
wavelets, and a transformation to a 2-dimensional space (image) was
performed to compose the input dataset of the CNN-based fault
classifier.

Recently, many researchers and engineers have turned to hybrid
classification approaches, which combine two different classification
methods to improve diagnostic performance. In this context, we can cite
the work of~\cite{14}, who developed a hybrid system using LR-type
fuzzy logic and ANN to diagnose bearing faults; wavelet decomposition
was employed for feature extraction from the vibration signals emitted
by the bearings in question. Ref.~\cite{22} presented a method for
classifying ball bearing faults based on an FMM-RF hybridization
technique, composed of the FMM (Fuzzy Min-Max) neural network and the
RF (Random Forest) model; they tested the model on real data,
consisting of power spectrum and entropy features, and a performance
rate of 99.81\% was achieved. Ref.~\cite{21} developed a hybrid GHMM-CM
method using reduced decomposition features for the recognition and
classification of defect types and severity levels: the vibration
signal containing defect information was decomposed into several modal
components using the VMD method, in which the generalized balancing
parameter provides a concise representation of random and epistemic
uncertainties, and the PCA technique was applied to reduce the
dimensionality of the features; experimental results show that the
proposed hybrid GHMM-CM method is more accurate and reliable, and
validation of this algorithm was performed on the
database (see Footnote~1). Ref.~\cite{40}
proposed a novel hybrid deep learning method (NHDLM), based on extended
deep convolutional neural networks with wide first-layer kernels
(EWDCNN) and Long Short-Term Memory (LSTM), for complex environments,
which improved feature classification performance and offered the best
identification accuracy for diagnosing rotating machinery failures.
Ref.~\cite{29} implemented a new method for diagnosing bearing faults
based on a hybrid model of Convolutional Neural Networks (CNN) and
Multilayer Perceptron (MLP): this model simultaneously processes input
data of different types and consists of two blocks, MLP for processing
numerical inputs and CNN for processing HHT images; they demonstrated
that the proposed hybrid model is superior to CNN and MLP models taken
separately.

In conclusion, we find that a hybrid classification approach performs
better than any single classifier used separately; that RBFNN, thanks
to the simplicity of their topological structure and their elegant
mathematical formulation anchored in classical functional analysis,
have proven their rapid convergence and the convex nature of the
optimizations involved, possess an appreciable generalization capacity,
and are highly recommended for the approximation and classification of
nonlinear functions~\cite{3,25}; and that HMM are very robust for
modeling time series~\cite{27}. This has motivated our contribution,
which consists of implementing a hybrid HMM-RBFNN system for the
classification of bearing faults.

The originality of our contribution lies in the hybrid system that
combines the two previous approaches by exploiting the log-likelihoods
from the HMM as additional descriptors to feed the RBFNN. This strategy
aims to improve the robustness of the classifier by integrating dynamic
information (from HMM) with the statistical features from PCA. To
ensure a fair evaluation, all models were trained and tested on 
the dataset (see Footnote~1),
which has become a reference in the field for
testing bearing diagnosis algorithms.
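The fusion step described above can be sketched in a few lines of
NumPy. This is a minimal illustration, not the authors' implementation:
the diagonal-Gaussian scorers below are toy stand-ins for the
log-likelihood functions of per-class trained HMMs, and all names and
dimensions are illustrative.

```python
import numpy as np

def augment_with_loglik(X, scorers):
    """Append one log-likelihood column per trained class model to the
    feature matrix X (n_samples x n_features)."""
    ll = np.array([[score(x) for score in scorers] for x in X])
    return np.hstack([X, ll])

# Toy stand-ins for trained per-class HMM scorers: diagonal Gaussians.
def make_gaussian_scorer(mu, sigma):
    def loglik(x):
        return float(np.sum(-0.5 * ((x - mu) / sigma) ** 2
                            - np.log(sigma * np.sqrt(2.0 * np.pi))))
    return loglik

rng = np.random.default_rng(0)
X_pca = rng.normal(size=(5, 3))        # stand-in for PCA-reduced features
scorers = [make_gaussian_scorer(np.zeros(3), np.ones(3)),
           make_gaussian_scorer(np.ones(3), np.ones(3))]
X_aug = augment_with_loglik(X_pca, scorers)   # 3 features + 2 HMM scores
```

The augmented matrix (statistical features plus one dynamic score per
class) is what would then feed the RBFNN classifier.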

The remainder of this article is organized as follows: in 
Section~\ref{sec2}, we briefly recall the theoretical foundations of
the tools used, namely, statistical tools, frequency and time--frequency
analysis tools, RBFNN, and HMM. In  Section~\ref{sec3}, we present the
experimental data used.  Section~\ref{sec4} is dedicated to the
extraction of vibration features and the composition of the input data
for the automatic diagnosis system.  Section~\ref{sec5} is reserved for
the design of the diagnosis system, while the discussion of the results
is listed in  Section~\ref{sec6}. A conclusion is presented at the end
of this article.

\section{Apparatus and experimentation}\label{sec2}

\subsection{Test bench}\label{sec2.1}
The validation of our study was based on the exploitation of
experimental data from the Bearing Data Center of Case Western Reserve
University (CWRU).  Figure~\ref{fig1} illustrates the experimental
setup. The bearing vibration signals were collected for the three
elements of the bearing, namely the inner ring (IR), the rolling
element (ball), and the outer ring (OR) (see Footnote~1). This dataset
is a widely recognized benchmark in bearing fault detection research.
It has served as the experimental basis for hundreds of scientific
publications, owing to the diversity of simulated faults and the high
quality of the acquired measurements. 
Indeed, Ref.~\cite{39}, through a detailed analysis using classical
signal processing methods, demonstrated that these data are easily
diagnosable, making them particularly relevant for the comparative
evaluation of various diagnostic and classification methods and
algorithms (see Footnote~1). 

The test bench used for data collection illustrated in 
Figure~\ref{fig1} includes: a 2-horsepower motor (on the left), a
torque sensor/encoder (in the center), a dynamometer (on the right), an
electronic control box (not shown). The test bearings are mounted on the
motor shaft. Localized faults were introduced into the bearings using
electrical discharge machining (EDM) with diameters of 7, 14, 21, 28,
and 40 mils (1~mil ${=}$ 0.001 inch). SKF bearings
(Table~\ref{tab1}) were used for faults of
7, 14, and 21 mils, while NTN bearings were used for the 28 and 40 mils
faults. The defective bearings were then reinstalled on the test bench,
and vibration signals were recorded under various load conditions
ranging from 1 to 3 horsepower, corresponding to rotational speeds
between 1796 and 1720 rpm. Vibration signals were acquired at a
sampling frequency of 12.000~Hz on the fan end and 48.000~Hz on the
drive end. Measurements include both healthy and defective bearings
exhibiting different types of faults (Figure~\ref{fig2}). For more
technical details, readers are referred to the official documentation
(see Footnote~1).

\begin{table}[h!]%t3
\caption{\label{tab1}Geometric characteristics of the SKF 6205-2RS JEM
bearing}
\begin{tabular}{cc}  
\thead
Characteristic & 
Value \\ 
\endthead
Rolling element number & 9 \\ 
Ball diameter $(d)$ & 0.3126~in.\ (7.94~mm) \\ 
Pitch diameter $(D)$ & 1.537~in.\ (39.04~mm) \\ 
Contact angle $(\theta)$ & $0^{\circ}$ 
\botline 
\end{tabular}
\end{table}

\begin{figure}
\includegraphics{fig01}
\caption{\label{fig1}Experimental setup of the rolling bearings for
Case Western Reserve  University (see \mbox{Footnote~1}).}
\end{figure}

\subsection{Data used}\label{sec2.2}
The primary purpose of our contribution is practical; our objective is
to design a system compatible with the main ISO standards used in
vibration monitoring. Specifically, we are addressing fault
classification and alert thresholds for predictive maintenance of
rotating machinery, according to the ISO series, notably ISO 13373 (for
alert thresholds and severity assessment) and the ISO 13374 series (for
the processing and interpretation of diagnostic data). These standards
define four severity zones (A, B, C, D), as shown in 
Table~\ref{tab2}. However, for a more rigorous approach to
condition-based maintenance, ISO 10816 only provides for three classes;
this case is also covered by our study. Indeed, the bearing vibration
signals are annotated according to the type of defect (inner ring,
outer ring, ball) and its severity. Four operating states are defined:
a normal state and three levels of degradation corresponding to
localized defects of increasing size (0.18~mm, 0.36~mm, and 0.53~mm).
This structure allowed for the creation of three distinct datasets,
each dedicated to a defective component. This approach facilitates the
development and evaluation of specialized diagnostic models for each
bearing element.

\begin{table}%t1
\caption{\label{tab2}Alert thresholds and severity assessment}
\begin{tabular}{llcc}
\thead
Zone & 
Machine health status & 
\parbox[t]{3.5cm}{\centering
Presence and evolution or severity of the defect} & 
\parbox[t]{1.5cm}{\centering
Assigned class}
\vspace*{2pt}\\ 
\endthead
A & 
\parbox[t]{6.5cm}{\raggedright
The machine is generally considered new and in good condition} & 
No defects (healthy) & 1 
\vspace*{2pt}\\ 
B & 
\parbox[t]{6.5cm}{\raggedright
Defect zone acceptable for long-term operation} & 
Defect 1 & 2 
\vspace*{2pt}\\ 
C & 
\parbox[t]{6.5cm}{\raggedright
Defect zone not suitable for continuous long-term operation. Corrective
measures must be implemented} & 
Defect 2 & 3 
\vspace*{2pt}\\ 
D & 
\parbox[t]{6.5cm}{\raggedright
Vibration levels are considered severe enough to cause damage to the
machine. Operation must be stopped as soon as possible} & 
Defect 3 & 4
\vspace*{2pt}
\botline
\end{tabular}
\end{table}

The states corresponding to defects of 0.71~mm and 1.02~mm were
intentionally excluded because they were produced on a different
bearing type (NTN), while the first three defects were machined on SKF
bearings to eliminate the potential influence of the bearing type
(technology, materials) on the quality of the vibration signal. This
ensures a more reliable comparison between the classes and also
complies with the aforementioned ISO standards. The analysis focuses
exclusively on signals sampled at 12~kHz, covering all types of bearing
elements (balls, outer ring, inner ring), as shown in Table~\ref{tab3}.

All signals used in this study were recorded under steady-state
conditions at a constant speed of 1797 rpm (approximately 29.95~Hz).
Several signal segments of equal duration were extracted for each class
to create a balanced and homogeneous dataset for analysis and
classification.

\begin{figure}
\includegraphics{fig02}
\caption{\label{fig2}Time signal of the ball in the four states
(healthy, defect 1, defect 2, defect 3).}
\end{figure}

\begin{table}%t2
\caption{\label{tab3}Constitution of data sets}
\begin{tabular}{ccccc}  
\thead
\xmorerows{1}{Class} & \xmorerows{1}{Bearing condition} & 
\multicolumn{3}{c}{Fault location}\\\cline{3-5}
 & & Outer race & Inner race & Rolling element (Ball) \\ 
\endthead
1 & 
Healthy  & 
``OR'' set & 
``IR'' set & 
``Ball'' set \\ 
2 & 
Defect 1 (0.007$^{\prime\prime}$) & 
``OR'' set & 
``IR'' set  & 
``Ball'' set \\ 
3 & 
Defect 2 (0.014$^{\prime\prime}$) & 
``OR'' set & 
``IR'' set  & 
``Ball'' set \\ 
4 & 
Defect 3 (0.021$^{\prime\prime}$) & 
``OR'' set & 
``IR'' set & 
``Ball'' set 
\botline 
\end{tabular}
\end{table}

\section{Features extraction and analysis}\label{sec3}

\subsection{Signal preprocessing}\label{sec3.1}
Before training the classification models, rigorous data preprocessing
is essential to ensure the quality of the results and reduce biases
related to scale or descriptor redundancy. The raw vibration signals
from the database (see Footnote~1)
are first preprocessed by dividing them
into segments of equivalent duration to ensure representativeness and
homogeneity of the samples for each class studied. All signals
considered are sampled at 12~kHz. Each segment constitutes a unique
observation, associated with a class corresponding to the bearing
condition (healthy or with a given defect) (Table~\ref{tab3}).
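The segmentation step can be sketched as follows; this is a minimal
NumPy illustration assuming a 12~kHz record, and the segment length of
2048 samples is an illustrative choice, not a value stated in the
study.

```python
import numpy as np

def segment_signal(x, seg_len):
    """Split a 1-D vibration record into non-overlapping, equal-length
    segments (one observation per row); the incomplete tail is dropped."""
    n_seg = len(x) // seg_len
    return x[: n_seg * seg_len].reshape(n_seg, seg_len)

fs = 12_000                                         # sampling rate (Hz)
x = np.random.default_rng(1).normal(size=10 * fs)   # 10 s surrogate record
segments = segment_signal(x, seg_len=2048)          # one row per observation
```

Each row of `segments` would then be labeled with the bearing condition
of the record it came from.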

\subsection{Features extraction}\label{sec3.2}
To discriminatively model the vibration signals from bearings, 46
features were extracted, divided into three domains: the time domain,
the frequency domain, and the time--frequency domain. The following
equations formally define each indicator.

\subsubsection{Temporal features}\label{sec3.2.1}
An initial signal exploration phase was conducted to gain qualitative
insights into signal behavior across the different defect classes. The
time traces of representative signals (Figure~\ref{fig3}) were examined
to identify visible trends and irregularities. To obtain measurable
indications of this information, we calculated the statistical
indicators below. Consider a discrete signal
$x=\{x_{1},x_{2},\ldots ,x_{N}\}$, where $N$ is the total number of
samples, $x_{i}$ is the signal value at time instant $i$, $\mu$ is the
mean of $x$, and $\sigma$ is its standard deviation.

The statistical indicators considered relevant to describe the
qualitative information contained in the signal $x$ 
are~\cite{18,38}:
{\begin{eqnarray}
&&\mathrm{Mean}{:}\quad
\mu=
\frac{1}{N}
\sum_{i=1}^{N}x_{i}
\label{eq1}\Seqnsplit
&&\mathrm{Variance}{:}\quad
\sigma ^{2}=
\frac{1}{N}
\sum_{i=1}^{N}
(x_{i}-\mu)^{2}
\label{eq2}\Seqnsplit
&&\text{Standard deviation}{:}\quad
\sigma =
\sqrt{\frac{1}{N}
\sum_{i=1}^{N}
(x_{i}-\mu)^{2}}
\label{eq3}\Seqnsplit
&&\text{RMS (Root Mean Square)}{:}\quad
\mathrm{rms}=
\sqrt{\frac{1}{N}
\sum_{i=1}^{N}x_{i}^{2}}
\label{eq4}\Seqnsplit
&&\text{Crest Factor}{:}\quad
\mathrm{Cr}=\frac{\max (| x_{i}| )}{\mathrm{rms}}
\label{eq5}\Seqnsplit
&&\text{Skewness (asymmetry)}{:}\quad
\mathrm{Skewness}=
\frac{\frac{1}{N}\sum_{i=1}^{N}
(x_{i}-\mu)^{3}}{\sigma ^{3}}
\label{eq6}\Seqnsplit
&&\text{Kurtosis (flattening)}{:}\quad
\mathrm{Kurtosis}=
\frac{\frac{1}{N}\sum_{i=1}^{N}
(x_{i}-\mu)^{4}}
{\sigma ^{4}}
\label{eq7}
\end{eqnarray}}\unskip
The moments of order 4 and 5 are:
{\begin{eqnarray}
m_{4} &=&
\frac{1}{N}
\sum_{i=1}^{N}{x}_{i}^{4}
\label{eq8}\Seqnsplit
m_{5} &=&
\frac{1}{N}
\sum_{i=1}^{N}{x}_{i}^{5}
\label{eq9}
\end{eqnarray}}\unskip

Therefore, in the time domain, we calculated nine (9)
statistical indicators for each of the three (3) datasets of the
constituent elements of the bearings (inner race, outer race and
rolling element (ball)).
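As an illustrative sketch (not the authors' original implementation), the nine time-domain indicators of Equations~(\ref{eq1})--(\ref{eq9}) can be computed with NumPy for a one-dimensional signal:

```python
import numpy as np

def time_features(x):
    """Nine time-domain indicators for a 1-D signal (population statistics)."""
    x = np.asarray(x, dtype=float)
    mu = x.mean()                                # mean
    var = np.mean((x - mu) ** 2)                 # variance
    sigma = np.sqrt(var)                         # standard deviation
    rms = np.sqrt(np.mean(x ** 2))               # root mean square
    crest = np.max(np.abs(x)) / rms              # crest factor
    skew = np.mean((x - mu) ** 3) / sigma ** 3   # skewness
    kurt = np.mean((x - mu) ** 4) / sigma ** 4   # kurtosis
    m4 = np.mean(x ** 4)                         # 4th-order moment
    m5 = np.mean(x ** 5)                         # 5th-order moment
    return np.array([mu, var, sigma, rms, crest, skew, kurt, m4, m5])
```

Applied to each vibration record, this function returns one nine-element feature vector per signal.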

\subsubsection{Frequency features}%\label{sec3.2.2}
Subsequently, a frequency-domain analysis was performed using standard
techniques such as Fast Fourier Transform (FFT), envelope spectrum
analysis, and Power Spectral Density (PSD). These methods revealed
distinct frequency components associated with bearing faults. Fault
characteristic frequencies namely, Ball Pass Frequency of Inner Race
(BPFI), Ball Pass Frequency of Outer Race (BPFO) and Ball Spin
Frequency (BSF) were calculated based on the mechanical configuration
of the test bench and overlaid on the spectral plots for 
validation~\cite{17}. These preliminary results reaffirm the
effectiveness and accessibility of classical signal processing methods
for fault diagnosis, in line with prior studies such  as~\cite{40}.
The frequency domain, obtained via the Fourier transform, provides
valuable information on the energy distribution in the spectrum. Five
characteristics are extracted: the dominant frequency, the average
frequency, the bandwidth, the spectral energy, and the spectral
kurtosis. The latter, in particular, is sensitive to abnormal energy
peaks in the spectrum, often associated with localized defects in the
bearings. The discrete Fourier transform (DFT) of the signal $x$ gives
the spectrum $\chi (f_{k})$  for $k=1,2,\ldots ,N_{f}$, where $f_k$ is
the $k$th discrete frequency and $S(f_{k})$  is the power spectral
density defined by~\cite{45}:
{\begin{eqnarray}
S(f_{k}) &=&
| \chi (f_{k})| ^{2}
\label{eq10}\Seqnsplit
\chi (f_{k}) &=&
\sum_{n=0}^{N-1}x
(n)\mathrm{e}^{-\mathrm{j}2\rmpi 
\frac{kn}{N}}
\label{eq11}
\end{eqnarray}}\unskip

\def\ffigcaption{\label{fig3}}
\xsubfig[ph!]{03a}{}
\figcnt
\begin{figure*}[t!]
\includegraphics{fig03}
\xcaption{\label{fig3}\ffigcaption}{Time signals and their  spectra.}
\end{figure*}

\def\figcnt{\setcounter{figure}{3}\def\thefigure{\arabic{figure}}}
\figcnt

The features are~\cite{40}:
{\begin{eqnarray}
&&\text{Dominant frequency}{:}\quad
f_{\mathrm{dom}}=
\mathop{\mathrm{argmax}}_{f_{k}}S(f_{k})
\label{eq12}\Seqnsplit
&&\text{Average frequency}{:}\quad
f_{\mathrm{avr}}=
\frac{\sum_{k=1}^{N_{f}}f_{k}S(f_{k})}
{\sum_{k=1}^{N_{f}}S(f_{k})}
\label{eq13}\Seqnsplit
&&\mathrm{Bandwidth}{:}\quad
\mathrm{BW}=
\frac{\sum_{k=1}^{N_{f}}
(f_{k}-f_{\mathrm{avr}})^{2}S(f_{k})}
{\sum_{k=1}^{N_{f}}S(f_{k})}
\label{eq14}\Seqnsplit
&&\text{Spectral energy}{:}\quad
E=\sum_{k=1}^{N_{f}}S(f_{k})
\label{eq15}\Seqnsplit
&&\text{Spectral Kurtosis}{:}\quad
\mathrm{Kurt}_{f}=
\frac{1}{N_{f}}
\sum_{k=1}^{N_{f}}
\left(\frac{S(f_{k})-\mu_{f}}{\sigma_{f}}\right)^{4}
\label{eq16}
\end{eqnarray}}\unskip
where $\mu_{f}$ and $\sigma_{f}$ are the spectral mean and standard
deviation. In the frequency domain, we thus extract five (5) important
characteristics: the dominant frequency, the average frequency, the
bandwidth, the spectral energy and the spectral kurtosis.
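A minimal Python sketch of the five spectral features, assuming a uniformly sampled signal and using the one-sided FFT as the spectrum estimate:

```python
import numpy as np

def frequency_features(x, fs):
    """Five spectral features from the one-sided periodogram of x sampled at fs."""
    x = np.asarray(x, dtype=float)
    X = np.fft.rfft(x)                             # DFT of the signal
    S = np.abs(X) ** 2                             # power spectral density
    f = np.fft.rfftfreq(x.size, d=1.0 / fs)        # discrete frequencies f_k
    f_dom = f[np.argmax(S)]                        # dominant frequency
    f_avr = np.sum(f * S) / np.sum(S)              # average (centroid) frequency
    bw = np.sum((f - f_avr) ** 2 * S) / np.sum(S)  # bandwidth
    E = np.sum(S)                                  # spectral energy
    kurt_f = np.mean(((S - S.mean()) / S.std()) ** 4)  # spectral kurtosis
    return f_dom, f_avr, bw, E, kurt_f
```

A pure 50~Hz sine sampled at 1~kHz, for instance, yields a dominant frequency of 50~Hz.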

To calculate the fault frequencies of a bearing, formulas that depend
on the geometry of the bearing and its rotational speed must be used.
These frequencies, called characteristic fault frequencies
(Table~\ref{tab4}), are the
outer-race frequency (BPFO), the inner-race frequency (BPFI) and the
rolling-element frequency (BSF), given by the following
formulas~\cite{20}:
{\begin{eqnarray}
&&\text{Inner race defect frequency}{:}\quad
f_{\mathrm{id}}=
\frac{N\cdot f_{\mathrm{r}}}{2}
\left(1+\frac{d}{D}\cos (\theta)\right)
\label{eq17}\Seqnsplit
&&\text{Ball defect frequency}{:}\quad
f_{\mathrm{bd}}=
f_{\mathrm{r}}
\frac{D}{2d}
\left(1-\left(\frac{d}{D}\cos (\theta)\right)^{2}\right)
\label{eq18}\Seqnsplit
&&\text{Outer race defect frequency}{:}\quad
f_{\mathrm{od}}=
\frac{N\cdot f_{\mathrm{r}}}{2}
\left(1-\frac{d}{D}\cos (\theta)\right)
\label{eq19}
\end{eqnarray}}\unskip
where $f_{\mathrm{r}}$ is the shaft rotation frequency,  $d$ is the ball diameter, 
$D$ is the pitch diameter,  $N$ is the number of rolling elements  and
$\theta $ is the contact angle. Applying these formulas to our case, we
found the values below.

\begin{table}%t4
\caption{\label{tab4}Bearing defect frequencies under the rotating
speed of 1796 rpm\vspace*{2pt}}
\begin{tabular}{ccc}  
\thead
Signal & 
Types & 
Values \\ 
\endthead
Healthy signal & 
Shaft rotation frequency $(f_{\mathrm{r}})$ & 
29.93~Hz \\ 
Signal 0.021$^{\prime\prime}$ & 
Inner defect frequency $(f_{\mathrm{id}})$ & 
151~Hz \\ 
Signal 0.021$^{\prime\prime}$  & 
Outer defect frequency $(f_{\mathrm{od}})$ & 
119~Hz \\ 
Signal 0.014$^{\prime\prime}$  & 
Ball defect frequency $(f_{\mathrm{bd}})$ & 
126~Hz
\botline 
\end{tabular}
\vspace*{4pt}
\end{table}
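For illustration, Equations~(\ref{eq17})--(\ref{eq19}) can be evaluated as below; the geometry values in the usage example are hypothetical placeholders, not the test-bench dimensions.

```python
import numpy as np

def bearing_fault_frequencies(fr, n_balls, d, D, theta_deg=0.0):
    """Characteristic defect frequencies from shaft speed and bearing geometry.

    fr: shaft rotation frequency [Hz], n_balls: number of rolling elements,
    d: ball diameter, D: pitch diameter, theta_deg: contact angle [deg]."""
    c = (d / D) * np.cos(np.radians(theta_deg))
    bpfi = (n_balls * fr / 2.0) * (1.0 + c)       # inner race defect frequency
    bsf = (fr * D / (2.0 * d)) * (1.0 - c ** 2)   # rolling element defect frequency
    bpfo = (n_balls * fr / 2.0) * (1.0 - c)       # outer race defect frequency
    return bpfi, bsf, bpfo
```

Note the identity $\mathrm{BPFI}+\mathrm{BPFO}=N\,f_{\mathrm{r}}$, a convenient sanity check on computed values.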

\subsubsection{Time--frequency features}%\label{sec3.2.2}
In order to capture the temporal evolution of the signal's frequency
components, we used the Wavelet Packet Transform (WPT) to obtain an
accurate time--frequency representation of our signals. This method
offers the advantage of providing an optimal joint resolution in time
and frequency. The choice fell on a Daubechies wavelet of order 5
(db5), offering a good compromise between regularity and temporal
support. The decomposition was carried out up to level 5, thus
generating 32 terminal nodes (2$^{5}$), each corresponding to a
specific frequency sub-band. To extract meaningful information and
reduce the dimensionality of the data, we calculated the energy
associated with each terminal node. The energy  $E_i$ of node $i$ is
defined by~\cite{20}:
{\begin{equation}\label{eq20}
E_{i}=
\sum_{k=1}^{N_{i}}
| x_{i}(k)| ^{2},\quad
i=1,2,\ldots ,32
\end{equation}}\unskip
where $x_{i}(k)$ represents the
$k$th wavelet coefficient of node $i$ and $N_{i}$ is the number of
coefficients in that node. This approach yields a vector of 32 energy
features reflecting the time--frequency distribution of the signal,
useful for detecting transient faults.

Finally, we obtained 32 characteristics representing the node
energies, which gives us thirty-two (32) time--frequency
characteristics.
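A sketch of the node-energy extraction, assuming the PyWavelets library as the wavelet packet implementation (the paper does not specify which toolbox was used):

```python
import numpy as np
import pywt

def wpt_energies(x, wavelet="db5", level=5):
    """Energy of each terminal node of a level-5 wavelet packet tree."""
    wp = pywt.WaveletPacket(data=np.asarray(x, dtype=float), wavelet=wavelet,
                            mode="symmetric", maxlevel=level)
    # 2**level terminal nodes, ordered by increasing frequency sub-band
    nodes = wp.get_level(level, order="freq")
    return np.array([np.sum(node.data ** 2) for node in nodes])
```

Each signal thus yields a 32-element energy vector, one value per frequency sub-band.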

The time--frequency analysis was completed by examining spectrograms and
kurtograms. The spectrogram (obtained by short-term Fourier transform,
STFT) allows visualization of the evolution of spectral power over
time, providing a general map of signal activity. The kurtogram, for
its part, is a powerful tool for optimally locating transient and
impulsive frequency bands, characteristic of mechanical defects. The
joint analysis of these two representations aims to detect the presence
of anomalies, identify the frequency band carrying the defect
information and, ultimately, reveal the characteristic frequencies
associated with the damage\break  (Figures~\ref{fig7}--\ref{fig11}).

\subsection{Features engineering}\label{sec3.3}
The feature engineering phase produced a total of 46 descriptors per
signal sample: 9 time parameters, 5 frequency parameters and 32
energies resulting from the wavelet packet decomposition. These
features were structured in a matrix of dimensions  $N\times M$, where 
$N$ is the number of observations and  $M = 46$ the number of
variables (Table~\ref{tab5}). Each observation is also associated with
a class label (the type of defect), a signal identifier and the
rotation speed (Table~\ref{tab5}). This matrix is used as the input dataset for the
classification algorithm. We propose for this task a novel hybrid
model, merging Hidden Markov Models (HMM) with a Radial Basis Function
Neural Network (RBFNN). The validation of this approach was carried out
on three distinct datasets, each corresponding to a specific type of
defect: outer race (OR\_Race), inner race (Inner\_Race) and rolling
element (Ball).

\begin{table}%t5
\caption{\label{tab5}Data table}
\begin{tabular}{cccccc}  
\thead
Signal label & 
\parbox[t]{1.5cm}{\centering
Signal class} & 
\parbox[t]{1.5cm}{\centering
Statistical features}  & 
\parbox[t]{1.5cm}{\centering
Frequency features} & 
\parbox[t]{2.5cm}{\centering
Time--frequency features} & 
Rpm  
\vspace*{2pt}\\ 
\endthead
X97\_ DE\_time & 
1 & 
1, 2, {\ldots}, 10 & 
1, 2, {\ldots}, 5 & 
1, 2, {\ldots}, 32 & 
1796 \\ 
X97\_ FE\_time & 
1 & 
1, 2, {\ldots}, 10 & 
1, 2, {\ldots}, 5 & 
1, 2, {\ldots}, 32 & 
1796 \\ 
{\ldots} & 
{\ldots} & 
{\ldots} & 
{\ldots} & 
{\ldots} & 
{\ldots} \\ 
X105\_BA\_time & 
2 & 
1, 2, {\ldots}, 10 & 
1, 2, {\ldots}, 5 & 
1, 2, {\ldots}, 32 & 
1797 \\ 
{\ldots} & 
{\ldots} &  &  &  &
\botline 
\end{tabular}
\end{table}

\subsection{Data analysis and preparation}\label{sec3.4}

\subsubsection{Class separability---Fisher score}\label{sec3.4.1}
To assess the ability of extracted features to discriminate between
different classes, we use the Fisher score, a classic measure of
inter-class separability. The Fisher score is particularly relevant in
supervised classification because it highlights the proportion of
inter-class variance relative to intra-class variance. For a given
feature $x_{i}$, the Fisher score is defined as  follows~\cite{13,24}:
{\begin{equation}\label{eq21}
F(x_{i})=
\frac{\sum_{c=1}^{C}N_{c}(\mu_{c}-\mu)^{2}}
{\sum_{c=1}^{C}\sum_{j=1}^{N_{c}}(\mu_{c,j}-\mu_{c})^{2}}
\end{equation}}\unskip
where $C$ is the number of classes, $N_{c}$ the number of samples in
class $c$, $\mu_{c}$  the mean of the feature in class $c$ and ${\mu}$
the overall mean.

This score is calculated for each extracted feature. A high Fisher
score indicates better separability of the class concerned by the
feature studied. Thus, the features with the highest Fisher scores are
the most discriminating and therefore the most useful for supervised
learning. These scores can also guide dimensionality reduction.
Indeed, this procedure allowed us to select the
ten (10) features with the highest Fisher scores.
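The Fisher score of Equation~(\ref{eq21}) can be sketched in Python as follows (NumPy assumed; one score per feature column):

```python
import numpy as np

def fisher_score(X, y):
    """Fisher score of each column of X given the class labels y."""
    X, y = np.asarray(X, dtype=float), np.asarray(y)
    mu = X.mean(axis=0)                            # overall feature means
    num = np.zeros(X.shape[1])
    den = np.zeros(X.shape[1])
    for c in np.unique(y):
        Xc = X[y == c]
        mu_c = Xc.mean(axis=0)
        num += Xc.shape[0] * (mu_c - mu) ** 2      # between-class scatter
        den += np.sum((Xc - mu_c) ** 2, axis=0)    # within-class scatter
    return num / den
```

The ten most discriminating features are then obtained with `np.argsort(scores)[::-1][:10]`.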

\subsubsection{Dimensionality reduction}\label{sec3.4.2}
After the initial selection of the ten~(\ref{eq10}) most relevant features via
the Fisher score, a Principal Component Analysis (PCA) was deployed to
address potential multicollinearity issues and achieve a more
aggressive dimensionality reduction. PCA projects the data into a new
orthogonal space of maximum variance, producing decorrelated principal
components  (PCs)~\cite{6}. Only the first $k$ components, explaining a
predefined threshold of the total variance (97\%), were retained 
(Figure~\ref{fig4}).
This step simplifies the model structure, reduces
computational time, and mitigates overfitting by eliminating noise and
redundancy.
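A minimal scikit-learn sketch of this step, run here on a synthetic matrix standing in for the ten Fisher-selected features (the data are illustrative, not the experimental features):

```python
import numpy as np
from sklearn.decomposition import PCA
from sklearn.preprocessing import StandardScaler

# Synthetic stand-in: 150 observations x 10 features, with redundant columns.
rng = np.random.default_rng(0)
X = rng.normal(size=(150, 10))
X[:, 5:] = X[:, :5] + 0.01 * rng.normal(size=(150, 5))  # near-duplicates

Xs = StandardScaler().fit_transform(X)   # PCA is sensitive to feature scale
pca = PCA(n_components=0.97)             # keep PCs explaining 97% of variance
Z = pca.fit_transform(Xs)
```

Passing a float to `n_components` makes scikit-learn retain just enough components to reach that fraction of explained variance, which here collapses the redundant columns.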

\def\ffigcaption{\label{fig4}}
\xsubfig[t!]{04a}{}
\xfigcnt
\begin{figure*}[t!]
\vspace*{-3pt}
\includegraphics{fig04}
\vspace*{-2pt}
\xcaption{\label{fig4}\ffigcaption}{Dimensionality reduction by PCA
(a)~Ball features, (b)~Inner\_race features and (c) Outer\_race
features.}
\end{figure*}

\def\xfigcnt{\setcounter{figure}{4}\def\thefigure{\arabic{figure}}}
\xfigcnt

\section{Theoretical overview of HMMs and RBFNNs}\label{sec4}

\subsection{Concept about HMM}\label{sec4.1}
The Hidden Markov Model was first described in the 1960s and first used
in speech recognition in the 1970s. In the late 1980s, it was applied
to DNA analysis and became an important technology in the field of
biological information. Through continued exploration and application
of this technology, it now finds widespread applications in many fields
such as fault diagnosis, machine learning, automated driving, natural
language processing, and target recognition. The HMM is a statistical
model for describing a hidden-state Markov process. It must first be
trained based on existing  data~\cite{40}.

An HMM is a doubly stochastic process with an underlying Markov process
that is not observable (the states are hidden), but can only be
observed through another set of stochastic processes that are produced
by the Markov process (the observations are probabilistic functions of
states). The observation values are used to determine the model 
parameters~\cite{27}. These values are then analyzed and identified
according to the established model.

HMMs can be defined by the following parameters: number of hidden
states $N$, number of observable states $M$ corresponding to each
hidden state, initial probability distribution $\rmpi $, state transition
probability matrix $A$ and emission probability matrix $B$ or
probability density  function~\cite{19,23}, so we can therefore define
an HMM as: $\lambda =(A,B,\rmpi )$. An HMM can also be represented
graphically by different structures, including the example in
Figure~\ref{fig5} representing the structure used in our case, called
the left-right model or Bakis model. This choice models the
irreversible phenomenon of wear on the bearing elements. Indeed, wear
progresses forward without reversing, with probabilities estimated by
the Markov process described below; therefore, the probabilities of
backward wear are practically zero. The Markov states S1, S2, S3, and
S4 in  Figure~\ref{fig5} represent the classes that can potentially
affect the bearing elements, which is a hidden process, while the
outputs Y1, Y2, Y3, and Y4 represent the bearing's health states,
informing us of the bearing's current state, which is a visible process
based on probabilities estimated by the Markov process.
\looseness=-1

\begin{figure}
\includegraphics{fig05}
\caption{\label{fig5}Structure of the 4-state Markov model
(left-right).}
\end{figure}
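The left-right constraint can be encoded directly in the transition matrix: only self-loops and forward transitions carry probability mass, and the final wear state is absorbing. The values below are illustrative, not the estimated parameters:

```python
import numpy as np

# Left-right (Bakis) transition matrix for the four wear states S1..S4.
# Backward transitions have zero probability; S4 is absorbing.
A = np.array([
    [0.7, 0.3, 0.0, 0.0],
    [0.0, 0.7, 0.3, 0.0],
    [0.0, 0.0, 0.7, 0.3],
    [0.0, 0.0, 0.0, 1.0],
])
assert np.allclose(A.sum(axis=1), 1.0)    # each row is a probability distribution
assert np.allclose(np.tril(A, -1), 0.0)   # no backward (healing) transitions
```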

\subsubsection{Markov hypothesis}  
The future state of a process does not depend on the past state, but
only on the present  state~\cite{23,27,40}.
{\begin{equation}\label{eq22}
P(q_{t+1}| q_{1},q_{2},\ldots ,q_{t})=
P(q_{t+1}| q_{t}),\quad
t=1,\ldots ,T-1
\end{equation}}\unskip

\subsubsection{Immobility hypothesis} 
The output of the system is not related to time~\cite{27,40}.
{\begin{equation}\label{eq23}
P(q_{j+1}| q_{j})=
P(q_{i+1}| q_{i}),\quad
i\neq j,\ i,j\in \{1,\ldots ,t\}
\end{equation}}\unskip

\subsubsection{Output independence hypothesis} 
The output of the system is only related to the current state of the 
system~\cite{27,40}.
{\begin{equation}\label{eq24}
P(o_{1},o_{2},\ldots ,o_{t}| q_{1},q_{2},\ldots ,q_{t})=
\prod_{i=1}^{t}P(o_{i}| q_{i})
\end{equation}}\unskip
In this paper, based on the estimation of model parameters
(Table~\ref{tab6}) and the
determination of the location and severity of fault signals by maximum
likelihood, the forward algorithm, the inverse algorithm, and the
forward-inverse algorithm are adopted. The definition and symbols of
each algorithm are as  follows~\cite{27,40}.

\subsubsection{Forward algorithm} 
The forward variable is defined as the probability of reaching a given
state, given the first $t$ observations of the  sequence~\cite{23,40}:
{\begin{equation}\label{eq25}
\alpha (t,i)=
P(o_{1},o_{2},\ldots ,o_{t}, Q_{t}=
q_{i}| \lambda ),\quad
1\leq t\leq T
\end{equation}}\unskip
where $T$ is the length of observations. The forward recursion is
expressed as~\cite{4,23}
{\begin{equation}\label{eq26}
\alpha _{j}(t+1)=
\left[\sum_{i=1}^{N}\alpha _{i}(t)a_{ij}\right]
B_{j}(o_{t+1})
\end{equation}}\unskip

\subsubsection{Backward algorithm}
The backward variable is defined as the probability of observing the
remaining observations from time $t+1$ onward, given the state at the
starting point $t$~\cite{26,37}:
{\begin{equation}\label{eq27}
\beta (t,i)=
P(o_{t+1},o_{t+2},\ldots ,o_{T}| Q_{t}=
q_{i},\lambda ),\quad
1\leq t\leq T-1
\end{equation}}\unskip

The backward recursion is expressed as~\cite{4,23}
{\begin{equation}\label{eq28}
\beta _{i}(t)=
\sum_{j=1}^{N}a_{ij}B_{j}(o_{t+1})
\beta _{j}(t+1)
\end{equation}}\unskip

\subsubsection{Forward--backward algorithm (Baum--Welch algorithm)}
The forward--backward algorithm obtains a set of forward probabilities
and a set of backward probabilities, which can be used to jointly
acquire the distribution over states at any specific time
$t$~\cite{4,23}:
{\begin{eqnarray}
\xi _{t}(i,j)=
P(Q_{t}=q_{i}, Q_{t+1}=
q_{j}| O,\lambda )
&=&
\frac{\alpha _{t}(i)A_{ij}B_{j}(o_{t+1})\beta _{t+1}(j)}
{P(O| \lambda )}
\nonumber\\
&=&
\frac{\alpha _{t}(i)A_{ij}B_{j}(o_{t+1})\beta _{t+1}(j)}
{\sum_{i=1}^{N}\sum_{j=1}^{N}\alpha _{t}(i)A_{ij}B_{j}
(o_{t+1})\beta _{t+1}(j)}
\nonumber\\
&&
1\leq i,j\leq N,\ 1\leq t\leq T-1
\label{eq29}
\end{eqnarray}}\unskip
HMMs mainly solve three application problems: evaluation, decoding and
learning. The first is to calculate the log-likelihood (LL) when the
parameters and observation sequences are known. The second is to find
the most probable state sequence when the observation sequence is
known. The third is to adjust the parameters to maximize the emission
probability of the observation sequence. The classic methods to solve
these three problems are the forward--backward algorithm, the Viterbi
algorithm and the Baum--Welch algorithm~\cite{24}.
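A minimal sketch of the (scaled) forward algorithm used for the evaluation problem, written here for a discrete-emission HMM; rescaling $\alpha$ at each step avoids numerical underflow and accumulates the log-likelihood directly:

```python
import numpy as np

def forward_log_likelihood(pi, A, B, obs):
    """log P(O | lambda) via the scaled forward recursion.

    pi: (N,) initial distribution, A: (N, N) transition matrix,
    B: (N, M) emission probabilities, obs: sequence of symbol indices."""
    alpha = pi * B[:, obs[0]]          # initialization
    c = alpha.sum()
    alpha /= c                         # rescale to a distribution
    log_like = np.log(c)
    for o in obs[1:]:
        alpha = (alpha @ A) * B[:, o]  # forward recursion step
        c = alpha.sum()
        alpha /= c
        log_like += np.log(c)          # accumulate log of scaling factors
    return log_like
```

For a uniform two-state model where every symbol has probability $0.5$, a length-$T$ sequence gives exactly $T\log 0.5$, a simple correctness check.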

\subsection{Concept about RBFNN}\label{sec4.2}
Radial basis function (RBF) networks are similar to multilayer
perceptron (MLP) networks. They also have direct unidirectional
connections, and each neuron is fully connected to the units in the
next layer. The neurons are organized in a layered topology with direct
dynamics (from inputs to outputs, without loops). RBF networks differ
fundamentally in how they model the relationship between inputs and
outputs. While MLP networks model this relationship in a single step,
an RBF network divides this learning process into two independent and
distinct steps  (Figure~\ref{fig6}). First, using the neurons in the
hidden layer called radial basis functions, RBF networks model the
probability distribution of the input data. Second, the RBF network
learns to relate the input data $x$ to a target variable $t$. Note: unlike
MLP networks, the bias term in an RBF neural network is connected only
to the output neurons. In other words, RBF networks do not have a bias
term linking the inputs to the radial basis  units~\cite{28}.

\begin{figure}
\includegraphics{fig06}
\caption{\label{fig6}Structure of an RBFNN~\cite{28}.}
\end{figure}

An RBFNN is represented by the structure in  Figure~\ref{fig6}; the
outputs are defined by  Equations~(\ref{eq30}) and (\ref{eq31})~\cite{28}.
{\begin{eqnarray}
y_{k}
&=&
\sum_{j=1}^{M}\omega _{j}(k)\cdot
\varphi _{j}(x)+b
\label{eq30}\Seqnsplit
\varphi _{j}(x)
&=&
\mathrm{e}^{-\frac{(x-\mu_{j})^{2}}{\sigma_{j}^{2}}} 
\label{eq31}
\end{eqnarray}}\unskip
where $\omega _{j}(k)$ are the weights optimized by gradient descent
and $b$ is the bias.

Learning in an RBFNN consists of:
\begin{itemize}
\item
choosing the centers $\mu_{j}$ (for example via $k$-means),
\item
determining the widths $\sigma _{j}$,
\item
estimating the weights $\omega _{j}(k)$  (by linear regression or the
pseudo-inverse).
\end{itemize}
Radial basis networks can be used to approximate functions. MATLAB's
\textit{newrb} routine, for instance, adds neurons to the hidden layer
of a radial basis network until it reaches the specified mean squared
error target.
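The three steps above can be sketched as a minimal Python class ($k$-means centers, a fixed spread, pseudo-inverse output weights, and the bias attached to the output layer as in Figure~\ref{fig6}); this is an illustration, not the \textit{newrb} algorithm itself:

```python
import numpy as np
from sklearn.cluster import KMeans

class SimpleRBFNN:
    """Minimal RBF network: k-means centers, fixed widths, least-squares weights."""
    def __init__(self, n_centers=10, spread=1.0):
        self.n_centers, self.spread = n_centers, spread

    def _phi(self, X):
        # Gaussian hidden units, plus a constant column for the output bias.
        d2 = ((X[:, None, :] - self.centers[None, :, :]) ** 2).sum(-1)
        G = np.exp(-d2 / self.spread ** 2)
        return np.hstack([G, np.ones((X.shape[0], 1))])

    def fit(self, X, T):
        # Step 1: centers; Step 2: fixed spread; Step 3: pseudo-inverse weights.
        self.centers = KMeans(self.n_centers, n_init=10,
                              random_state=0).fit(X).cluster_centers_
        self.W = np.linalg.pinv(self._phi(X)) @ T
        return self

    def predict(self, X):
        return np.argmax(self._phi(X) @ self.W, axis=1)  # max rule on outputs
```

On two well-separated Gaussian clusters with one-hot targets, this sketch classifies the training set perfectly.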

\section{Supervised classification methodology for bearing
defects}\label{sec5}
The overall classification methodology is summarized in the flowchart
in  Figure~\ref{fig7}. 
This entire process is independently applied to
each of the three datasets corresponding to localized faults on the
outer race, the inner race, and the rolling element (ball). This
independent treatment allows the capture of specific structural
characteristics related to each type of mechanical defect, ensuring a
more accurate domain-specific modeling.

\begin{figure}
\vspace*{2pt}
\includegraphics{fig07}
\vspace*{4pt}
\caption{\label{fig7}Implementation flowchart of the hybrid HMM-RBFNN
bearing defect classification system.}
\vspace*{5pt}
\end{figure}

\section{Bearing defect classification}\label{sec6}

\subsection{Bearing Defect Classification Using HMMs}\label{sec6.1}
Supervised defect classification is orchestrated by a battery of hidden
Markov models (HMMs), following a so-called ``one-HMM-per-class''
architecture. For each health state (healthy, 0.007$^{\prime\prime}$ 
defect, 0.014$^{\prime\prime}$ defect, 0.021$^{\prime\prime}$ defect) 
of each component (OR, IR, Ball), a separate
HMM, denoted $\lambda_{k}$, is fully learned from its training data.
The fundamental principle of this learning is
maximum log-likelihood estimation. The objective is to adjust the model
parameters $\lambda= (A, B,\uppi)$, respectively the state transition
matrix, the emission (or observation) matrix and the initial
distribution, in such a way that it becomes the most likely generator
of the observation sequences  $O = (o_{1}, o_{2}, o_{3}, o_{4})$
presented to it.

\begin{table}%t6
\caption{\label{tab6}Simulation of classes using HMM parameters}
\begin{tabular}{ccc}  
\thead
Class designation & 
HMM designation  & 
Maximum likelihood \\ 
\endthead
Class 1 & $\lambda _{1}$ & 
Log $(P(O\mid \lambda _{1}))$ \\ 
Class 2 & $\lambda _{2}$ & 
Log $(P(O\mid\lambda _{2}))$ \\ 
Class 3 & $\lambda _{3}$ & 
Log $(P(O\mid\lambda _{3}))$ \\ 
Class 4 & $\lambda _{4}$ & 
Log $(P(O\mid\lambda _{4}))$ 
\botline 
\end{tabular}
\end{table}

Formally, this amounts to solving the following optimization problem
for each class:
{\begin{equation}\label{eq32}
\tilde{\lambda }=
\mathop{\mathrm{argmax}}_{\lambda _{i}}
(\log (P(O\mid\lambda _{i}))),\quad
i=1,2,3,4
\end{equation}}\unskip

In practice, the probability $P (O \mid\lambda)$ itself is extremely
complex and difficult to maximize directly. This is why the Baum--Welch
algorithm, a specific instance of the EM (Expectation--Maximization)
algorithm, is employed. This iterative algorithm actually maximizes the
logarithm of the likelihood, according to relation~(\ref{eq32}), which
is mathematically equivalent but numerically more stable, avoiding
numerical underflow when multiplying many small probabilities.

The E (Expectation) step calculates, given a current estimate of the
parameters $\lambda$, the forward  $(\alpha)$ and backward $(\beta)$
probabilities in order to estimate the expected values of the hidden
state assignments and transitions. The M (Maximization) step then
uses these expectations to update the  parameters $\lambda =(A,B,\rmpi )$
in order to maximize this expected likelihood. These two steps are
repeated until convergence, typically when the increase in the
log-likelihood $\mathcal{L}(\lambda)$ between two iterations falls
below a predefined threshold.

The concrete significance of this process for our application is
profound. For example, by  maximizing $P(O\mid \lambda _{1})$ for the
healthy model, we ``teach'' the HMM the normal dynamic and
probabilistic signature of a bearing. Conversely, by maximizing
$P(O\mid \lambda _{2})$  for defect model 1, we encapsulate in its
parameters (A, B) the unique ``stochastic vibration signature'' of an
incipient 0.007-inch defect, i.e., the probable frequency sequences and
their characteristic temporal concatenation. Thus, each HMM becomes a
compact and probabilistic representation of the vibration personality
of the class it represents. In the testing phase, classifying an
unknown signal with its sequence of observations  ``O'' simply amounts
to calculating the  log-likelihood $\log (P(O\mid\lambda _{k}))$  for
each HMM trained using the forward algorithm. The  class $\tilde{k}$ 
that maximizes this measure, $\tilde{k}=
\mathrm{argmax}(\log(P(O\mid\lambda _{k})))$,
is assigned to the signal, because its
model is the one that best explains, from a probabilistic point of
view, the genesis of the observed sequence.

\subsection{Bearing defect classification using Radial Basis Function
Neural Networks (RBFNN)}\label{sec6.2}

\subsubsection{Label preprocessing}\label{sec6.2.1}
To prepare the data for supervised learning, the discrete class labels
are first transformed into a format suitable for neural networks using
one-hot encoding. This technique represents each class as a binary
vector where a single element is active (value 1), indicating class
membership, while the others are inactive (value 0). This
transformation is crucial because it allows the RBFNN to output a
vector of scores or probabilities for each class, rather than a simple
discrete label.
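A one-hot encoding sketch for the four health states (NumPy assumed; the label values are illustrative):

```python
import numpy as np

labels = np.array([1, 2, 3, 4, 2])   # health-state labels 1..4
one_hot = np.eye(4)[labels - 1]      # each row: a single 1 marking the class
```

Each row of `one_hot` is the binary target vector fed to the RBFNN output layer.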

\subsubsection{Training the RBF architecture}\label{sec6.2.2}
The RBFNN is then trained on the previously normalized training set.
The learning process relies on adjusting several critical
hyperparameters:
\begin{itemize}
\item
The dispersion parameter (spread or sigma) of the radial basis function
(often a Gaussian), which controls the width and influence of each
hidden neuron.
\item
The number of neurons in the hidden layer, which determines the model's
complexity.
\item
The error tolerance, which defines the algorithm's convergence
criterion.
\end{itemize}

A common approach is to build the network incrementally: neurons are
iteratively added to the hidden layer until the reconstruction error on
the training data falls below the defined tolerance threshold, thus
optimizing the model's ability to generalize.

\subsubsection{Prediction phase}\label{sec6.2.3}
During the testing phase, each sample is presented to the network. The
output layer produces a vector of scores (or activations) for all
classes. The decision rule adopted is simple: the predicted class is
the one associated with the highest score (max rule). The performance
of the RBFNN classifier is quantified using standard metrics as
illustrated in the following  section. This evaluation protocol,
identical to that used for Hidden Markov Models (HMM), allows an
objective and rigorous comparison of the performances between the three
approaches; HMM, RBFNN and hybrid HMM-RBFNN.

\subsection{Bearing defect classification using the hybrid HMM-RBFNN
system}\label{sec6.3}
This hybrid system was designed to leverage the complementary strengths
of Hidden Markov Models (HMMs), excellent for modeling temporal
dependencies in sequences, and RBF Neural Networks, powerful for
capturing complex nonlinear relationships in pointwise data. The
central idea is to merge their opinions to make a more robust and
accurate decision.

\subsubsection{Score fusion}\label{sec6.3.1}
For each test sample, the two classifiers independently produce their
own scores.

\medskip\noindent
{\textbullet}~\textbf{HMM Side} 

The model generates a log-likelihood score for each class. Since these
log-likelihood scores are not normalized and can vary considerably in
scale, they are transformed into a probability distribution using the
Softmax function. This produces a probability vector P\_HMM of size 4,
where each element is between 0 and 1 and the sum of the elements is 1.

\medskip\noindent
{\textbullet}~\textbf{RBFNN side}

The neural network directly produces a score vector (or sometimes
probabilities if an activation function like Softmax is used as
output). This vector is denoted S\_RBF for the 4 classes.

The fusion is performed by a weighted average of these two score
vectors, after scaling them  (usually $[0,1]$).

\medskip\noindent
{\textbullet}~\textbf{Fusion formula for class} $k$ 
{\begin{equation}\label{eq33}
\mathrm{Score}_{\mathrm{fusion}}(k)=
\alpha\cdot P_{\mathrm{HMM}}(k)+
(1-\alpha )\cdot 
S_{\mathrm{RBFNN}}(k),\quad
\alpha \in [0,1]
\end{equation}}\unskip

Parameter $\alpha$ controls the relative influence of each model on the
final decision. Several $\alpha$  values, both below and above 0.5,
were experimentally evaluated to analyze their impact on classification
performance. The results show that the value $(\alpha= 0.5)$ offers the
best compromise between the temporal modeling provided by the HMM and
the nonlinear discrimination capability of the RBFNN, thus leading to
the best overall classification performance. This observation confirms
the value of a balanced fusion that complementarily leverages the
strengths of both approaches.
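The fusion rule of Equation~(\ref{eq33}) can be sketched as follows, with Softmax normalization of the HMM log-likelihoods and min--max scaling of the RBFNN scores (the numbers in the test are illustrative, not experimental values):

```python
import numpy as np

def softmax(z):
    z = z - z.max()              # shift for numerical stability
    e = np.exp(z)
    return e / e.sum()

def fused_decision(hmm_loglik, rbf_scores, alpha=0.5):
    """Decision-level fusion: weighted average of normalized class scores."""
    p_hmm = softmax(np.asarray(hmm_loglik, dtype=float))   # log-liks -> probs
    s = np.asarray(rbf_scores, dtype=float)
    s = (s - s.min()) / (s.max() - s.min())                # scale to [0, 1]
    fused = alpha * p_hmm + (1.0 - alpha) * s
    return int(np.argmax(fused))                           # predicted class index
```

With $\alpha=0.5$ both classifiers contribute equally, matching the balanced setting retained in the experiments.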

\paragraph*{Hybrid model training and prediction}
The approach described above is a decision-level fusion (or  ``score
fusion'').

\paragraph*{Creating the fusion inputs}
A new dataset is created. For each training example, the HMM (P\_HMM)
and RBFNN (S\_RBF) score vectors are concatenated to form a new feature
vector of length 8 (4 $+$ 4).

\paragraph*{Retraining}
This new ``fusion'' feature vector becomes the input for a final classifier. Often, a simple model such as logistic regression or RBFNN is used. This hybrid classifier is retrained using the original one-hot labels. Its role is to learn how and when to trust each model to make the best collective decision.

\paragraph*{Prediction}
To leverage the complementary capabilities of the HMM and RBFNN models,
a decision fusion hybridization scheme was implemented. The outputs of
the two classifiers are merged using a weighting coefficient $\alpha$,
according to relation~(\ref{eq33}).

\subsection{Results and discussion} 
This section presents a detailed comparative analysis 
(Table~\ref{tab7}) of the
performance of the three classification architectures, namely Hidden
Markov Models (HMM), Radial Basis Function Neural Networks (RBFNN) and
the hybrid HMM-RBFNN system. For the classification of bearing defects,
the evaluation is conducted separately for the three critical elements:
Ball, Inner Ring (IR) and Outer Ring (OR), aiming to discriminate the
four health states (Class 1: Healthy, Class 2: Defect 1, Class 3:
Defect 2, Class 4: Defect 3).

\subsection{Overall performance}\label{sec6.4}
Table~\ref{tab7} summarizes 
the overall accuracy of each classifier for
each bearing element. A clear and consistent trend emerges from these
results.

The hybrid system achieves perfect or near-perfect performance (100\%
for Ball and IR, 99\% for OR), consistently outperforming both models
used alone. This superiority unequivocally demonstrates the effective
synergy between the HMM, which excels at modeling the temporal
evolution of vibration signatures, and the RBFNN, powerful at capturing
complex nonlinear relationships in the feature space. Their
complementary strengths allow the hybrid system to make a more robust
and accurate decision, correcting the errors that each model makes
individually.

\begin{table}%t7
\caption{\label{tab7}Comparison of overall accuracy (\%)}
\begin{tabular}{cccc}
\thead
\xmorerows{1}{Classifier} & 
\multicolumn{3}{c}{Accuracy (\%)} \\
\cline{2-4}
&  OR &  IR &  Ball \\ 
\endthead
HMM &  91.00 &  \098.66 &  \095.33 \\ 
RBFNN &  95.00 &  100.00 &  \098.66 \\ 
Hybrid HMM-RBFNN  &  99.00 &  100.00 &  100.00
\botline 
\end{tabular}
\end{table}

\subsection{Performance analysis by class and
interpretation}\label{sec6.5}
A granular analysis by class (Table~\ref{tab8}),
reveals the nature of the errors and the
strengths of each approach.

\begin{table}%t8
\fontsize{9.5}{11}\selectfont\tabcolsep2.5pt
\caption{\label{tab8}Detailed performance (Accuracy per class) of HMM,
RBFNN and Hybrid classifiers for each element of the bearing}
\begin{tabular}{cccccccccc}  
\thead
\xmorerows{1}{Class} &
\multicolumn{3}{c}{HMM} &
\multicolumn{3}{c}{RBFNN} & 
\multicolumn{3}{c}{Hybrid HMM-RBFNN} \\
\cline{2-4}\cline{5-7}\cline{8-10}
&  OR &  IR &  Ball &  OR &  IR &  Ball &  OR &  IR &  Ball \\ 
\endthead
\parbox[t]{1.5cm}{\centering
Class 1 (healthy)} & 
100.00\% &  100.00\% &  100.00\% &  100.00\% &  100.00\% &  100.00\% & 
100.00\% &  100.00\% &  100.00\% 
\vspace*{2pt}\\ 
\parbox[t]{1.5cm}{\centering
Class 2 (defect 1)} & 
\080.00\% &  100.00\% &  100.00\% &  \092.00\% &  100.00\% &  100.00\%
&  \096.00\% &  100.00\% &  100.00\% 
\vspace*{2pt}\\ 
\parbox[t]{1.5cm}{\centering
Class 3 (defect 2)} & 
100.00\% &  100.00\% &  \094.70\% &  100.00\% &  100.00\% &  100.00\% &
100.00\% &  100.00\% &  100.00\% 
\vspace*{2pt}\\ 
\parbox[t]{1.5cm}{\centering
Class 4 (defect 3)} & 
\084.00\% &  \094.70\% &  \086.80\% &  \088.00\% &  100.00\% &  \094.70\%
&  100.00\% &  100.00\% &  100.00\%
\vspace*{2pt}
\botline 
\end{tabular}
\end{table}

\subsection{Consistent performance of the healthy state}\label{sec6.6}
Regardless of the classifier or the tested element, Class 1 (Healthy)
is always identified with 100\% accuracy. This crucial result indicates
that none of the models produces a false positive (a healthy part
identified as defective). This characteristic is essential for an
industrial diagnostic system, as it avoids unnecessary and costly
maintenance downtime.

\subsection{Systematic superiority of the hybrid system}\label{sec6.7}
The hybrid model made no errors on the inner ring (IR) and ball (Ball)
test sets, achieving 100\% accuracy across all their defect classes.
For the outer ring (OR) (Figure~\ref{fig8}),
it significantly improved the results of the
individual models, increasing the accuracy to 96\% for Defect 1 and
100\% for Defect~3. This perfect or near-perfect performance
demonstrates that merging the models corrects their respective errors
and creates a more robust and reliable system.

\begin{figure}
\vspace*{2pt}
\includegraphics{fig08}
\vspace*{4pt}
\caption{\label{fig8}OR Confusion matrices.}
\vspace*{6pt}
\end{figure}

\subsection{Comparative analysis of individual models}\label{sec6.8}

\medskip\noindent
{\textbullet}~\textbf{RBFNN vs HMM} 

In every case where the HMM falls short of 100\%, the RBFNN
outperforms it. For example, for Defect 3 on the OR, the RBFNN (88\%)
outperforms the HMM (84\%) by 4 points. This trend confirms the greater
nonlinear modeling power of the RBFNN in capturing complex
relationships in the data, where the sequence-based HMM shows its
limitations.
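The nonlinear modeling power attributed to the RBFNN can be illustrated with a minimal sketch: a Gaussian-RBF hidden layer whose linear readout is fitted by least squares. The centers, width, and toy data below are illustrative assumptions, not the network configuration used in this study.

```python
import numpy as np

def rbf_design(X, centers, sigma):
    # Gaussian RBF activations: phi[i, j] = exp(-||x_i - c_j||^2 / (2 sigma^2))
    d2 = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
    return np.exp(-d2 / (2.0 * sigma ** 2))

def fit_rbfnn(X, y, centers, sigma, n_classes):
    # One-hot targets; the linear readout is solved by least squares
    Phi = rbf_design(X, centers, sigma)
    T = np.eye(n_classes)[y]
    W, *_ = np.linalg.lstsq(Phi, T, rcond=None)
    return W

def predict_rbfnn(X, centers, sigma, W):
    return rbf_design(X, centers, sigma) @ W  # per-class scores

# Toy usage: two well-separated clusters standing in for two fault classes
rng = np.random.default_rng(0)
X = np.vstack([rng.normal(0.0, 0.3, (50, 2)), rng.normal(2.0, 0.3, (50, 2))])
y = np.array([0] * 50 + [1] * 50)
centers = X[::10]  # crude center selection for the sketch
W = fit_rbfnn(X, y, centers, sigma=1.0, n_classes=2)
pred = predict_rbfnn(X, centers, 1.0, W).argmax(axis=1)
print((pred == y).mean())  # training accuracy on this toy set
```

In practice the centers would be chosen by clustering and the width tuned by validation; the least-squares readout is what gives the RBFNN its fast training compared with deep networks.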

\medskip\noindent
{\textbullet}~\textbf{Identified weaknesses of the HMM} 

The weaknesses of the HMM are concentrated on specific defects. OR
Defect 1 (80\%): the HMM's lower performance suggests that the
vibration signature of this particular defect may be less periodic or
noisier, making it difficult to model with a Markov chain.

Ball (86.8\%) and OR (84\%) Defect 3: These results confirm that the
HMM struggles with certain severe defects, likely because they generate
complex signals that encroach on the ``space'' of other classes in the
sequential model.

\medskip\noindent
{\textbullet}~\textbf{The Outer Ring (OR): the most critical element to
diagnose} 

The lowest results, across all classifiers (before hybridization), are
observed for the OR (Defects 1 and 3). This identifies the OR as the
most challenging element to model and classify among the three.
Hybridization provides the most dramatic added value here, fully
addressing the deficiencies of the individual models.

\subsection{Performance conclusion}\label{sec6.9}   
This analysis unequivocally demonstrates that while the individual
models (HMM and RBFNN) are already highly efficient, the hybrid
HMM-RBFNN system is the architecture of choice for reliable and
accurate bearing fault diagnosis. It optimally combines the strengths
of both approaches:

\begin{itemize}
\item
The HMM's ability to model temporal dynamics.
\item
The RBFNN's ability to learn complex nonlinear decision boundaries.
\item
The robustness of the fused system, which eliminates single points of
failure.
\end{itemize}
The hybrid system delivers near-perfect fault detection and
identification, regardless of the faulty component, making it a
cutting-edge solution for predictive maintenance.
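The fusion idea above can be sketched as a weighted combination of the two models' normalized scores. The softmax normalization and the value of the weighting factor $\alpha$ below are assumptions for illustration; they are not the exact fusion rule or weight used in this study.

```python
import numpy as np

def softmax(z):
    z = z - z.max(axis=-1, keepdims=True)  # numerical stability
    e = np.exp(z)
    return e / e.sum(axis=-1, keepdims=True)

def fuse_scores(hmm_loglik, rbf_scores, alpha=0.5):
    """Weighted score-level fusion (alpha is a tunable assumption here).

    hmm_loglik : (n_samples, n_classes) log P(O | lambda_i) per class
    rbf_scores : (n_samples, n_classes) RBFNN output scores
    """
    p_hmm = softmax(hmm_loglik)  # map log-likelihoods to a common scale
    p_rbf = softmax(rbf_scores)  # map network scores to the same scale
    fused = alpha * p_hmm + (1.0 - alpha) * p_rbf
    return fused.argmax(axis=1)

# Toy example: on sample 1 the HMM weakly prefers class 0 but the
# RBFNN strongly prefers class 1; fusion follows the stronger evidence
hmm_ll = np.array([[-100.0, -500.0], [-290.0, -295.0]])
rbf_sc = np.array([[5.0, 1.0], [0.5, 4.0]])
print(fuse_scores(hmm_ll, rbf_sc, alpha=0.4))  # -> [0 1]
```

At $\alpha=1$ the decision reduces to the HMM alone, and at $\alpha=0$ to the RBFNN alone, which is why tuning $\alpha$ on validation data matters.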

\begin{table}%t9
\caption{\label{tab9}Maximum log-likelihood of observation sequences
for rolling element and class-trained HMM models}
\begin{tabular}{cccc}  
\thead
\xmorerows{1}{Class} & 
\multicolumn{3}{c}{Maximum log-likelihood
$\log (P(O\mid\lambda _{i}))$}\\
\cline{2-4}
&  Outer race &  Inner race &  Ball \\ 
\endthead
Class 1 (healthy) & 
${-}$53129.73 & 
${-}$3967.20 & 
${-}$80048.80 \\ 
Class 2 (defect 1) & 
\0${-}$1154.20 & 
\0${-}$353.19 & 
\0\0${-}$946.80 \\ 
Class 3 (defect 2) & 
\0\0\0${-}$94.23 & 
\0${-}$290.34 & 
\0${-}$2031.18 \\ 
Class 4 (defect 3) & 
\0${-}$1131.56 & 
\0${-}$358.32 & 
\0\0${-}$205.50 
\botline 
\end{tabular}
\vspace*{8pt}
\end{table}

Table~\ref{tab9}
presents the maximum log-likelihood
$\log (P(O\mid\lambda _{i}))$ of the test data for the HMM
models specific to each class and rolling element.
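The quantities in Table~\ref{tab9} are log-likelihoods computed by evaluating each observation sequence under the class-trained models. A minimal, self-contained sketch of this evaluation is the log-space forward algorithm; the discrete-observation toy model below is an illustrative assumption, not one of the trained models of this study.

```python
import numpy as np

def logsumexp(v):
    # Numerically stable log(sum(exp(v)))
    m = v.max()
    return m + np.log(np.exp(v - m).sum())

def log_forward(obs, log_pi, log_A, log_B):
    """Log-space forward algorithm: returns log P(O | lambda).

    obs    : list of discrete observation indices
    log_pi : (n_states,) log initial state probabilities
    log_A  : (n_states, n_states) log transition matrix
    log_B  : (n_states, n_symbols) log emission matrix
    """
    alpha = log_pi + log_B[:, obs[0]]
    for o in obs[1:]:
        alpha = np.array([logsumexp(alpha + log_A[:, j]) + log_B[j, o]
                          for j in range(len(log_pi))])
    return logsumexp(alpha)

# Toy 2-state, 2-symbol model
pi = np.array([0.6, 0.4])
A = np.array([[0.7, 0.3], [0.4, 0.6]])
B = np.array([[0.9, 0.1], [0.2, 0.8]])
obs = [0, 1, 0]
ll = log_forward(obs, np.log(pi), np.log(A), np.log(B))

# Cross-check against brute-force enumeration of all state paths
p = sum(pi[i] * B[i, obs[0]] * A[i, j] * B[j, obs[1]] * A[j, k] * B[k, obs[2]]
        for i in (0, 1) for j in (0, 1) for k in (0, 1))
print(np.isclose(ll, np.log(p)))  # True
```

Classification then picks $\arg\max_i \log P(O\mid\lambda_i)$ over the class-specific models $\lambda_i$, which is exactly the comparison Table~\ref{tab9} supports.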

\subsection{Differential performance by element}\label{sec6.10}
Modeling complexity varies considerably depending on the rolling
element. Models for the inner ring (IR) exhibit overall less negative
and more homogeneous log-likelihoods (from ${-}$3967 to ${-}$290) than
those for the outer ring (OR) and ball (Ball), suggesting that the
vibration signatures of IR defects are more distinctive and easier to
model by HMMs. This corroborates the overall high performance of the
classifiers on the IR.

\subsection{Healthy state modeling}\label{sec6.11}
For all three elements, the ``Healthy'' class (Class 1) consistently
exhibits the lowest (most negative) log-likelihood. This indicates that
the vibration signal of a healthy bearing is more complex, noisier, and
exhibits greater variability, making it inherently more difficult for
an HMM to model with high likelihood compared to the periodic and
structured signals generated by localized defects.

\subsection{Identification of the most distinctive
defects}\label{sec6.12}
For each element, one defect stands out with significantly better
modeling performance:

\begin{itemize}
\item
OR---Class 3 (Defect 2): the exceptionally high value (${-}$94.23)
indicates that this defect generates an extremely stereotypical
vibration signature that is perfectly captured by the HMM. This pattern
is so specific that it risks ``attracting'' sequences belonging to
other classes, explaining possible confusion.
\item
IR---Class 3 (Defect 2): shows the best log-likelihood (${-}$290.34),
confirming that it is the most characteristic defect for this element.
\item
Ball---Class 4 (Defect 3): has the highest log-likelihood
(${-}$205.50), corresponding to excellent detection of the most severe
defects on the ball.
\end{itemize}

\subsection{Confusion point prediction}\label{sec6.13}
Defect pairs with close log-likelihoods are potential sources of
confusion for the HMM classifier alone. For example: For OR: Classes 2
and 4 have very similar values (${-}$1154.20 vs.\ ${-}$1131.56), which
suggests a risk of confusion between Defect 1 and Defect 3. For Ball: A
significant gap exists between Class 4 (${-}$205.50) and Class 3
(${-}$2031.18), suggesting that Defect 2 is much more difficult to
correctly identify than Defect 3, which is reflected in the
class-specific accuracy results. 
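The near-tie between OR Classes 2 and 4 suggests a simple safeguard: flag any decision whose top two per-model log-likelihoods fall within a margin. The helper and the margin value below are hypothetical; the example score vector reuses the two competing Table~\ref{tab9} values inside otherwise invented per-sequence scores.

```python
import numpy as np

def classify_with_ambiguity(logliks, margin=50.0):
    """Argmax over per-class log-likelihoods, flagging near-ties.

    logliks : (n_classes,) log P(O | lambda_i) for one sequence
    margin  : hypothetical threshold; a top-2 gap below it marks the
              decision as ambiguous (candidate for RBFNN arbitration)
    """
    order = np.argsort(logliks)[::-1]
    best, second = order[0], order[1]
    gap = logliks[best] - logliks[second]
    return int(best), bool(gap < margin)

# Hypothetical OR sequence: Classes 2 and 4 (indices 1 and 3) nearly tie
or_ll = np.array([-5000.0, -1154.20, -4000.0, -1131.56])
print(classify_with_ambiguity(or_ll))  # -> (3, True): Class 4 wins, flagged
```

Such a flag is one way a hybrid system can route uncertain HMM decisions to the nonlinear classifier instead of committing to a fragile argmax.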

\subsection{Conclusion}\label{sec6.14}
This log-likelihood analysis provides a fundamental explanation for the
performance of the HMM classifier. It highlights the most distinctive
defects, the most difficult classes to model (healthy), and predicts
classification ambiguities. It thus fully justifies the need to use a
non-linear classifier like the RBFNN, and ultimately the hybrid system,
to compensate for these limitations and achieve optimal and robust
classification accuracy.

\section{Multi-base generalization}\label{sec7}
Having demonstrated the effectiveness of the HMM-RBFNN hybrid system on
the CWRU database, widely used as a reference in the literature for
bearing fault diagnosis, it becomes necessary to evaluate the
generalizability of the proposed approach. Indeed, a high-performing
classification method must not only deliver good results on a given
database but also maintain its performance when applied to datasets
from different test benches, operating conditions, and sensors. With
this in mind, the study is extended to the Paderborn and Axial Bearing
databases, which present experimental characteristics and fault
scenarios significantly different from those of the CWRU database.

The Paderborn and Axial Bearing databases were integrated to evaluate
the robustness and generalizability of the HMM-RBFNN hybrid system.
Unlike the CWRU database, which relies primarily on artificially
localized defects and relatively controlled operating conditions, the
Paderborn database includes real defects resulting from progressive
damage processes, as well as greater variability in rotational speeds
and mechanical loads. The Axial Bearing database, on the other hand, is
distinguished by its analysis of vibration signals measured in the
axial direction, thus providing a complementary evaluation framework
based on a different mechanical configuration and vibration dynamics.
Applying the HMM-RBFNN hybrid system to these two databases therefore
allows us to verify its ability to effectively exploit features
extracted from signals exhibiting different statistical distributions,
frequency contents, and temporal dynamics.

\begin{figure}
\includegraphics{fig09}
\caption{\label{fig9}Paderborn test bench (see Footnote~2).}
\end{figure}

\subsection{Paderborn University (PU) database}\label{sec8.1} %7.1
The Paderborn database\footnote{\href{https://mb.uni-paderborn.de/en/kat/research/bearing-datacenter}{https://mb.uni-paderborn.de/en/kat/research/bearing-datacenter}.},
developed by the University of
Paderborn (Germany), is a reference database for studying bearing
defect diagnosis under conditions closely resembling real-world
industrial environments. Unlike many databases based on artificially
created defects, this database includes real defects resulting from
progressive damage processes. Measurements are performed on a modular
test bench (Figure~\ref{fig9})
incorporating an electric motor, a belt drive system, a
bearing-supported shaft, and a controlled loading device. Vibration
signals (Figure~\ref{fig10})
are collected using accelerometers mounted on the bearing
housing at various rotational speeds and load levels. The classes
studied include sound bearings as well as several types of localized
defects, including inner and outer race defects.

Following the procedure established for the CWRU database, the same
analysis process was carried out to obtain the results below.

\subsubsection{Fault classification} 
Table~\ref{tab10} presents the classification performance by class and the
overall accuracy obtained by three approaches: HMM, RBFNN, and the
hybrid HMM-RBFNN system.
The results show that the HMM model achieves perfect accuracy (100\%)
for all classes, confirming its ability to effectively model the
temporal dynamics of vibration signals. The RBFNN model also exhibits
excellent overall performance (99.50\%), although a slight degradation
is observed for class 2 (97.50\%), indicating increased sensitivity to
overlapping features between closely related fault states.

\begin{figure}
\includegraphics{fig10}
\caption{\label{fig10}Paderborn database signal analysis.}
\end{figure}

\begin{table}%t10
\caption{\label{tab10}Classification accuracy (\%)}
\begin{tabular}{cccc}  
\thead
&  HMM &  RBFNN &  Hybrid HMM-RBFNN \\ 
\endthead
Class 1 &  100.00 &  100.00 &  100.00 \\ 
Class 2 &  100.00 &  \097.50 &  100.00 \\ 
Class 3 &  100.00 &  100.00 &  100.00 \\ 
Overall &  100.00 &  \099.50 &  100.00 
\botline 
\end{tabular}
\end{table}

\subsubsection{Conclusion}
The slight degradation observed for the RBFNN on class 2 is corrected
by the hybrid HMM-RBFNN system, which restores 100\% accuracy for all
classes by combining the sequential modeling offered by the HMM with
the strong nonlinear discrimination capabilities of the RBFNN. This
complementarity improves the robustness of the diagnosis, particularly
for classes that are more difficult to separate. These results
demonstrate that the HMM-RBFNN hybridization surpasses the individual
approaches by guaranteeing perfectly stable classification across all
classes; the combined integration of temporal information and nonlinear
generalization capabilities thus constitutes an effective and reliable
strategy for bearing fault diagnosis, reinforcing the relevance of the
hybrid system for real-world industrial applications.

\subsection{Axial bearing database}\label{sec8.2}%7.2
The Axial Bearing database~\cite{10}, is dedicated to the analysis of
bearing defects subjected to axial loads, thus offering a complementary
framework to traditional databases primarily focused on radial
vibrations. The test bench (Figure~\ref{fig11})
consists of an electric motor driving a
shaft supported by ball bearings, on which various axial load
conditions are applied. Spall-type defects are introduced on the
bearing elements to simulate realistic degradation scenarios.

Vibration measurements (Figure~\ref{fig12}) 
are performed using accelerometers positioned in
the axial direction, enabling the capture of specific vibration
signatures often overlooked in conventional diagnostic approaches.

The results obtained (Table~\ref{tab11}) 
show that the hybrid approach maintains high
performance across all the databases studied, thus confirming its
robustness in the face of variations in experimental conditions and
data structures.

\begin{figure}
\includegraphics{fig11}
\caption{\label{fig11}Schematic and photograph of a special bearing
testing machine~\cite{10}.}
\end{figure}

\begin{figure}
\includegraphics{fig12}
\caption{\label{fig12}Axial bearing database ball signal analysis.}
\end{figure}

\begin{table}%t11
\caption{\label{tab11}Classification accuracy (\%)}
\begin{tabular}{cccc}  
\thead
Classifier  &  OR &  IR &  Ball \\ 
\endthead
HMM &  78.67 &  78.67 &  78.67 \\ 
RBFNN  &  100 &  100 &  100 \\ 
Hybrid &  100 &  100 &  100 
\botline 
\end{tabular}
\end{table}

The results in Table~\ref{tab12} confirm that the HMM-RBFNN hybrid
system is not limited to performance specific to a given database but
constitutes a generalizable and robust diagnostic approach, capable of
adapting to varied experimental contexts and different bearing
configurations.

\subsubsection{Recapitulation} 
The summary  Table~\ref{tab12} presents the classification performance
obtained by the three approaches studied (HMM, RBFNN, and Hybrid
HMM-RBFNN) on three reference datasets: CWRU, Paderborn, and Axial
Bearing.

\begin{table}%t12
\caption{\label{tab12}Performance of the HMM-RBFNN hybrid system as a
percentage (\%) for the three databases studied}
\begin{tabular}{cccc} 
\thead
\xmorerows{1}{Classifier} & 
\multicolumn{3}{c}{Database} \\
\cline{2-4}
&  CWRU &  Paderborn &  Axial bearing \\ 
\endthead
HMM  &  94.99 &  98.65  &  78.67 \\ 
RBFNN  &  97.88  &  98.78  &  100 \\ 
Hybrid &  99.66 &  99.33 &  100 
\botline 
\end{tabular}
\end{table}

The results show that HMM offers good performance on the CWRU (94.99\%)
and Paderborn (98.65\%) datasets, confirming its ability to model the
temporal dynamics of vibration signals. However, its accuracy drops
significantly in the Axial Bearing dataset (78.67\%), indicating
increased sensitivity to the variability of experimental conditions and
the complexity of axial 
\mbox{defects.}

RBFNN improves overall performance on all three datasets, particularly
on Axial Bearing where it achieves perfect accuracy (100\%), thanks to
its strong nonlinear feature separation capability. Nevertheless, its
performance remains slightly lower than that of the hybrid system on
the CWRU and Paderborn datasets.

The hybrid HMM-RBFNN system consistently achieves the best or
equivalent maximum performance across all databases, with
classification rates of 99.66\%, 99.33\%, and 100\%, respectively.
These results confirm the robustness and generalizability of the hybrid
model, resulting from the complementarity between the temporal modeling
of HMM and the discriminatory power of RBFNN.

\vspace*{-3pt}

\subsection{Comparison with the state of the art}\label{sec7.1}

\vspace*{-3pt}

The performance of the proposed hybrid system is compared with recent
work in the field.
The results obtained (Table~\ref{tab13})
are competitive with the most recent and efficient
approaches published in the literature. The system achieves accuracy
equivalent to that of hybrid models based on deep convolutional neural
networks (CNNs), which are known to be very data- and
computation-intensive. This fully validates the choice of the
HMM-RBFNN hybridization as an efficient, robust and elegant
architecture for the bearing fault diagnosis task.

\begin{table}[b!]%t13
\caption{\label{tab13}Comparison with hybrid methods in the literature\vspace*{-3pt}}
\tabcolsep=3.5pt\fontsize{9.5}{10.8}\selectfont
\begin{tabular}{cclc}  
\thead
Reference & 
Hybrid method & 
Acronym decoding/description & 
\parbox[t]{2cm}{\centering
Reported accuracy (\%)}
\vspace*{2pt}\\ 
\endthead
\cite{22}& 
Hybrid FMM-RF & 
\parbox[t]{5cm}{\raggedright
Fuzzy Min-Max (FMM) neural network and the Random Forest (RF) model} & 
99.89 
\vspace*{4pt}\\ 
\cite{37} & 
\parbox[t]{2.8cm}{\centering
Hybrid CNN-MLP model} & 
\parbox[t]{5cm}{\raggedright
Convolutional Neural Network-Multi Layer perceptron} & 
98.00 
\vspace*{4pt}\\ 
\cite{42} & 
Hybrid CNN-MLP & 
\parbox[t]{5cm}{\raggedright
Convolutional Neural Network-Multi Layer perceptron} & 
100.00 
\vspace*{4pt}\\ 
\cite{43} & 
PCA $+$ Hybrid NN & 
\parbox[t]{5cm}{\raggedright
PCA: Principal Component Analysis (dimensionality reduction); 
NN: Neural Network} & 
${\approx}$98.70
\vspace*{4pt}\\ 
\cite{33}  & 
\parbox[t]{2.8cm}{\centering
CWT-CNN-BiLSTM-Attention} & 
\parbox[t]{5cm}{\raggedright
CWT: Continuous Wavelet Transform; 
BiLSTM: Bidirectional Long Short-Term Memory; 
Attention: feature weighting} & 
${>}$99.00 
\vspace*{4pt}\\ 
\cite{35}  & 
IHHO-DBN-ELM & 
\parbox[t]{5cm}{\raggedright
IHHO: Improved Harris Hawks Optimization; 
DBN: Deep Belief Network; 
ELM: Extreme Learning Machine} & 
${\approx}$99.50 
\vspace*{4pt}\\ 
\cite{30}& 
\parbox[t]{2.8cm}{\centering
CNN-Dual Feature Selection $+$ SHAP} & 
\parbox[t]{5cm}{\raggedright
SHAP: SHapley Additive explanations (model interpretability)} & 
${>}$99.00 
\vspace*{4pt}\\ 
\parbox[t]{2.8cm}{\centering
Our contribution}  & 
HMM-RBFNN & 
\parbox[t]{5cm}{\raggedright
Hidden Markov Models-Radial Basis Function Neural Network} & 
${\approx}$99.67
\vspace*{4pt} \\
{}\cite{46} & 
\parbox[t]{2.8cm}{\centering Deep CNN $+$ Random Forest} & 
\parbox[t]{5cm}{\raggedright Deep
CNN: Convolutional Neural Network (automatic feature extraction); 
Random Forest: ensemble learning classifier} & ${\approx}$99.20 
\vspace*{2pt} 
\botline 
\end{tabular}
\tabnote{{Note}: A weighted average of the results on Ball, IR
and OR gives an overall accuracy of 99.69\%.}
\end{table}

\subsection{Summary of findings}\label{sec7.2}
Merging the scores of the HMM and RBFNN models proved to be a winning
strategy. It compensates for the individual weaknesses of each model
and leverages their complementary strengths to produce a more robust
and accurate classifier, capable of diagnosing any type of fault on the
various bearing components with an exceptionally high success rate.

\section{General conclusion}\label{sec8}
This comparative study confirms that hybrid diagnostic frameworks
significantly improve bearing fault classification by combining
complementary modeling capabilities. Deep learning-based hybrids,
particularly CNN-based architectures, exhibit remarkable accuracy but
often require large training datasets and substantial computing
resources. Conversely, the proposed HMM-RBFNN approach offers
competitive performance while maintaining reduced complexity and better
adaptability to data-constrained environments. By leveraging the
time-domain modeling power of Hidden Markov Models and the nonlinear
classification capability of radial basis function neural networks, the
hybrid system effectively captures the dynamic and discriminating
characteristics of vibration signals. The consistent performance
achieved on the CWRU, Paderborn, and Axial Bearing datasets underscores
the robustness and generalizability of the proposed method. These
results demonstrate that the HMM-RBFNN hybrid is a reliable and
efficient alternative to complex deep learning solutions for practical
bearing fault diagnostic applications.

\section*{Future work and perspectives}
While this work opens promising avenues, several areas for improvement
and research can be considered:
\begin{itemize}
\item
Automatic hyperparameter optimization: Further exploration, via
Bayesian optimization algorithms or expanded search grids, could
further refine performance, particularly by optimizing the fusion
weighting factor $\alpha$.
\item
Generalization to more varied data: It would be relevant to test the
model's robustness on datasets from different machines, operating under
varied conditions (load, speed, noise levels) to assess its
generalization capacity.
\item
Exploration of other fusion architectures: Testing earlier fusion, at
the feature level rather than at the score level, could potentially
lead to the discovery of even more discriminating patterns. 

\item Towards
an embedded system: The next step would be to optimize the final model
for deployment on an embedded platform, enabling real-time diagnostics
directly on site, which would represent a significant advancement for
industrial predictive maintenance.
\end{itemize}
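The first item above, automatic tuning of the fusion weighting factor $\alpha$, can be sketched as a simple validation grid search. The score matrices and the grid below are synthetic assumptions for illustration only.

```python
import numpy as np

def fused_accuracy(p_hmm, p_rbf, y, alpha):
    # Accuracy of the weighted score fusion for a given alpha
    pred = (alpha * p_hmm + (1.0 - alpha) * p_rbf).argmax(axis=1)
    return (pred == y).mean()

def tune_alpha(p_hmm, p_rbf, y, grid=np.linspace(0.0, 1.0, 21)):
    # Pick the fusion weight that maximizes validation accuracy
    accs = [fused_accuracy(p_hmm, p_rbf, y, a) for a in grid]
    best = int(np.argmax(accs))
    return float(grid[best]), float(accs[best])

# Synthetic validation scores: both models are noisy, the RBFNN sharper
rng = np.random.default_rng(1)
y = rng.integers(0, 3, 200)
onehot = np.eye(3)[y]
p_hmm = onehot * 0.5 + rng.random((200, 3))
p_hmm /= p_hmm.sum(axis=1, keepdims=True)
p_rbf = onehot * 0.8 + rng.random((200, 3))
p_rbf /= p_rbf.sum(axis=1, keepdims=True)

alpha_star, acc_star = tune_alpha(p_hmm, p_rbf, y)
print(alpha_star, acc_star)
```

Because the grid includes $\alpha=0$ and $\alpha=1$, the tuned fusion can never do worse on validation data than either model alone; Bayesian optimization would simply search this one-dimensional space more efficiently.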

In conclusion, this work not only enabled the development of a powerful
classifier; it also highlighted the immense potential of hybrid
architectures to address the complex challenges of Industry 4.0,
combining the strengths of multiple models to create a solution that is
smarter and more reliable than the sum of its parts.

\section*{Declaration of interests}
The authors do not work for, advise, own shares in, or receive funds
from any organization that could benefit from this article, and have
declared no affiliations other than their research organizations.

\back{}

\printbibliography
\refinput{crmeca20250952-reference.tex}

\end{document}
