Mechanical dissimilarity of defects in welded joints via Grassmann manifold and machine learning

Assessing the harmfulness of defects based on images is becoming more and more common in industry. Today these defects can be insert in digital twins that aim to replicated in a mechanical model what is observed on a component. We propose a methodology for defect classification and defect labeling in view of fast prediction of their harmfulness by using hyper-reduced order models. Mechanical models studied here are non parametric. This problematic issue is circumvent by introducing a classification of defect. Then, a local hyper-reduced order model is defined in each class of defect. The proposed methodology for the classification, consists in seeing each defect by its mechanical effects more than by its morphology. Each defect is encoded as a point on a Grassmann manifold. This data encoding uses a projection-based hyper-reduction. We restrict our attention to a mechanical modeling of elastic stresses around defects. We show that the hyper-reduced equations are similar to an oblique projection of the finite element prediction. Hence we obtain an upper bound of the approximation error for displacement that depends on a Chordal distance between points on a Grassmann manifold. From the available data set of defects, we have observed that the Grassmann distance is magnifying the Chordal distance between defects. A k-medoids algorithm is used in order to create clusters of defects from a matrix of Grassmann distance between defects. A simulation-based labeling of defect is proposed. This labeling procedure relies on hyper-reduction and a related error indicator. This error indicator measures the discrepancy between the stresses predicted via hyper-reduction and equilibrated stresses. In spite of the large variety of defects in the data set, we show accurate predictions of stresses for most of defects in the test set, while training the clustering and the hyper-reduced order model on a different training set. These accurate results are obtained by combining two contributions for a reduced basis for displacements : macroscopic modes related to a defect-free mechanical problem and elastic fluctuation modes specific to each class of defect. In this paper, we pay particular attention to the compromise to be made between the accuracy of the numerical approximations and the memory space required to save the reduced bases.


Introduction
Mechanical modeling based on images is becoming increasingly important in material science and in industrial applications for the assessment of harmfulness of defects. The early detection of defects in industrial processes has been studied for more than a decade. In fact, in 2011, scientists tried to improve the efficiency of early fault detection on gears, which are critical in many machinery operations [1]. Moreover, nondestructive inspection techniques are able to detect and locate voids for a wide range of materials and welding processes: resistance seam welding of aluminum, zinc, and galvanized steel [2]; resistance spot welding of ferritic/martensitic steels [3]; electron beam welding of steel to Fe-Al alloy [4]; laser welding of stainless steels [5] and aluminum alloys [6].
Here, machine learning enhances the value of data related to defects observed in the past if these data are available in a memory storage system. In the framework of image-based diagnosis, machine learning aims to consider the following assertion: if two defects are similar, they have equivalent harmfulness. Accounting for this similarity should facilitate the prediction of the harmfulness of new defects by using a training dataset of defects. The purpose of this paper is to sample a training dataset of defects so that a set of representative defects is defined. The representative defects are assumed to be well distributed in the training dataset according to an appropriate metric. This metric classifies defects according to the displacement fluctuations they cause around them. Here, the sets of both experimental data and simulation data, related to the representative defects, define a dictionary of digital twins. This dictionary contains the simulation data for the construction of local reduced order bases [7] in the nonparametric space of the observed defect. Section 2 presents a schematic view of these ideas.
Pure data-driven approaches have been detailed in the literature for defect diagnosis. Provided that a wide range of data is available, machine learning methods can eventually detect defects automatically and classify them into different classes given some prescribed criteria. For instance, supervised machine learning is used for defect classification issues related to a freezing process in [8]. Convolutional neural networks (CNNs) [9], which are very helpful in computer vision [10], have been trained to detect and diagnose defects. In [11], a CNN detects defects on an automotive damper. Another encouraging study published recently uses CNNs to diagnose defects on freight trains with a precision of approximately 80% [12]. An amazing property of CNNs is their ability to learn features, or kernels [10], solely by using labeled data and deep learning. However, in our opinion, using deep learning does not necessarily imply forgetting the available knowledge about the mechanics of materials when it comes to predicting the harmfulness of defects. The purpose of this paper is to couple machine learning and mechanical modeling so that transfer learning [13] is achieved.
In the mechanics of materials, image-based meshing methods [14,15] enable generating complex finite element meshes of digital images obtained via X-ray computed tomography [16]. The finite element models used as digital twins of mechanical components are fed by a huge knowledge on mechanical behavior for various materials, metals [17], composites [18], and concrete [19] for instance. Unfortunately, they cannot be used as a tool to assess the quality of a component in a serial production framework. The required fine meshes of defective components generally lead to prohibitive computational time as explained in [20]. However, the use of mechanical knowledge in digital twins would ensure data continuity between the component design phase and the diagnosis of defect harmfulness [20]. Besides, projection-based model order reduction methods enable reusing knowledge about materials science while using machine learning to generate approximation spaces for a fast solution of partial differential equations (PDEs) [21,22]. Therefore, physics principles and material constitutive equations are preserved in weak formulations of partial derivative equations. In this paper, the machine learning task is restricted to the construction of an approximation space for the purpose of model order reduction. Here, this approximation space is piecewise constant over the space of all possible defects. Its numerical representation is a dictionary of digital twins based on representative defects. Usually, the manifold containing the solution of a PDE can be accurately depicted as residing in a small vector space. Therefore, linear machine learning methods, such as proper orthogonal decomposition [23], singular value decomposition (SVD), and noncentered principal component analysis, aim at learning this reduced vector space from simulation data. This subspace is then used as a single approximation space for the projection-based model order reduction of PDE. In practice, the finite element shape functions are replaced by vectors, or empirical modes, that span the reduced approximation space. In some situations, the solution of the governing PDEs lies in a manifold that cannot be covered by a single vector subspace without increasing its dimension, thus degrading the computational complexity of projection-based model reduction. In these cases, deep learning algorithms are useful for reducing the complexity of approximation spaces by using simulation data. For instance, in [24], physics-informed neural networks are proposed for solving supervised learning tasks while respecting any given laws of physics described by general nonlinear PDEs. These networks are no longer used as projection-based model order reduction schemes. However, such reduced schemes are found in [25] and [22], where a deep classifier recommends a reduced order model depending on input variables having a tensor format (e.g., images). In [25], a CNN recommends a reduced order model related to a loading environment seen on the image of an experimental setup. In [22], a deep classifier using CNN is trained to recommend hyperreduced order models for lifetime prediction depending on a three-dimensional (3D) stochastic temperature field. These two contributions follow the same neural network architecture termed ROM-net [22]. Such ROM-nets are trained by using simulation data encoded as points on a Grassmann manifold, which is a set of vector subspaces of given common dimensions in the same ambient space. This data encoding is quite general when considering simulation data in the framework of projection-based model order reduction. In this paper, it is extended for the classification of voids according to their mechanical effects on mechanical components.
In the current work, observational data are two-dimensional (2D) slices of experimental 3D images of voids. These 3D images have been obtained via X-ray computed tomography by Lacourt [26]. The reduced approximation space for displacements is spanned by two types of vectors: macroscopic modes of an ideal defect-free medium and fluctuation modes around each defect. By following the two-scale machine learning approach proposed in [20], the fluctuation modes are computed by assuming dilution conditions and scale separation for each defect separately. The volume fraction of the defect is negligible when computing the fluctuation modes. In linear elasticity, each defect admits an exact reduced basis of fluctuation modes for strains and displacements. For 2D problems, this ideal reduced basis contains three modes; for 3D problems, it contains six modes. This reduced basis is said to be ideal because its computation requires the finite element solution of an elastic problem, which is specific to the defect. In the proposed approach, such solutions are available only during the training phase of the approximation space. In the test phase, the ideal reduced basis is not available. The methodology, introduced in [22] for ROM-nets, aims to take in a dictionary the reduced approximation space related to a similar defect as a substitute for the ideal reduced basis. Here, this dictionary is called the dictionary of digital twins. This requires defining the following: • a dissimilarity evaluation between defects, • a dictionary of representative defects containing the related digital twins with a small number of items, • a classifier that finds the best item in the dictionary for the construction of an approximation space dedicated to the target digital twin in a test set or for real application.
In this paper, the representative defects are medoids selected by the k-medoids algorithm [27]. The number of medoids has to be prescribed prior to data clustering via the k-medoids algorithm. Too many items in the dictionary make the classifier too complex and would eventually entail no complexity reduction for image-based modeling. In this paper, the best item in the dictionary is approximately selected via an error indicator and not via a CNN as proposed in [22]. This error indicator measures the discrepancy between the stresses predicted via hyper-reduction (HR) for the target digital twin and the equilibrated stresses [28]. A theoretical analysis of the convergence of the hyper-reduced approximation shows that a partial approximation error has an upper bound that scales linearly with the sines of the principal angles between the ideal modes and the modes involved in the approximation space. Identical reduced bases have principal angles equal to zero as well as sines. The convenient space to measure these angles is a Grassmann manifold [29,30]. The sines of these angles are termed chordal distances [31]. Hence, the proposed dissimilarity criterion accounts for the mechanical effect of the defects via model reduction of displacement fluctuations around defects. It is not a direct evaluation of morphology dissimilarity.
For the sake of simplicity, the target problem used for defect diagnosis is similar to the micromechanical problem that defines the fluctuation modes. The reader can find in [20,32,33] more complex target mechanical problems that are solved by the HR method used in this paper. As the finite element approximation space is specific to each defect, we need to design a common ambient space for the computation of Grassmann distances. This is performed through an encoding mesh. We have paid particular attention to the compromise to be made between the accuracy of numerical approximations and the memory space required to save the simulation data related to the encoding mesh.
The present paper is structured as follows. Section 2 is a commented graphical abstract. Section 3 presents the projection-based model-reduction method, an upper bound for partial errors on displacement predictions, and the encoding mesh associated with the Grassmann manifold. Section 4 details the training dataset of defects, the partition of these data by using the k-medoids algorithm in the Grassmann manifold, and some validation results. Section 5 presents the stress prediction results for the test set of defects and the classifier used to select an item from the dictionary of digital twins. Section 6 draws the conclusion of this paper.

Commented graphical abstract
Assume that a training dataset of defects is available with detailed digital twins and all related data. A schematic view of the training dataset is plotted in Figure 1. Training data are plotted as blue points in this figure.
We propose to select representative defects and their data to form a dictionary of digital twins for the purpose of model reduction. Representative defects are assumed to be well distributed in the training dataset according to a dissimilarity measurement. Two representative defects are plotted in red and green in Figure 1. The main novelty is a proper dissimilarity definition between defects so that the representative defects are medoids selected by using the k-medoids algorithm [27].
The original ambient space (Figure 1(a)) that contains the predictions of fluctuations around defects in the training dataset does not account for the linearity of elastic balance equations. We know that displacement fluctuations belong to a small vector subspace spanned by fluctuation modes, which is denoted by V for each defect individually. These vector subspaces are points in a manifold termed Grassmannian. As a result, prior to dissimilarity computation between defects, the training data are placed in a proper ambient space: a Grassmannian. A sketch of the Grassmannian ambient space is shown in Figure 1 Displacement fluctuations or stress fluctuations around defects have magnitudes directly related to the loading magnitude defined in the mechanical target problem. They can be easily computed. This is the reason why fluctuation magnitudes do not matter in similar linear predictions. Then the Grassmannian ambient space is plotted as a unit circle. The geodesic distance on this circle is the angle θ shown in Figure 1(b). One can also consider in Figure 1 the chordal distance, which is the length of the straight line between two points on this circle for small values of θ. This chordal distance is closely related to an upper bound of partial approximation errors.

Figure 2.
On the left, an experimental defect and its close-up. On the right, the horizontal component of ∆u ( j ) for two traction modes related to E (1) and E (2) and one shear mode related to E (3) .
In what follows, the simulation data of the representative defects are the training data used for the HR of digital twins.
Data in the test set cannot be plotted in Figure 1 because their related fluctuation fields have to be predicted via the proposed dictionary-based model-reduction method.

Projection-based hyper-reduction of the target mechanical problem
This section describes the target mechanical problem used as a digital twin of a target defect. In the mechanics of materials, the harmfulness of defects is evaluated through the magnification of the Cauchy stress around each defect. We restrict our attention to the prediction of the Cauchy stress via the approximate solution of PDEs in linear elasticity of isotropic materials. The principal variable of these equations is the displacement field.
The domain occupied by the material, denoted by Ω , is a surrounding box around each defect. We give an example in Figure 2. Formally, Ω has no parameter. Its morphology totally depends on the observation of a defect through a digital image. In the following, the superscript refers to mathematical objects that are specific to the digital image of a defect. This notation emphasizes the variability due to the input image. All mathematical objects with the superscript can be seen as the output of an implicit function depending on Ω without introducing any parameter for Ω . We avoid giving a morphological definition of an input space that contains all possible values of Ω . Hence, the proposed setting is a nonparametric mechanical modeling for digital twins.
The outer boundary of the surrounding box is denoted by ∂ E Ω . The boundary of the defect, which is modeled as a void, is denoted by ∂ V Ω . The local frame follows the principal axes of the second-order moments of the volume distribution in the defect. These principal axes are denoted by e 1 and e 2 . All defects have a circumscribed circle of diameter equal to D. The length of Ω is 30D. The displacement field and the Cauchy stress are denoted by u and σ , respectively.
The fluctuation of the displacement field is denoted by ∆u . It is defined as the counterpart of the homogeneous displacement field such that where E is the macroscopic strain tensor imposed on ∂ E Ω . It is a given symmetric second-order tensor. Here, E · x is the defect-free macroscopic mode for displacements; V is the usual finite element approximation space that accounts for the defect geometry and Dirichlet boundary conditions (2) on ∂ E Ω .
Here, Ω is sufficiently large to fulfill a dilution assumption and the following boundary condition on ∆u : The Hooke tensor for elasticity is denoted by C. Hence, the elastic constitutive equation reads as where ε(u ) is the deformation tensor, that is, the symmetric part of the displacement gradient. The weak form of the elastic equilibrium equation reads as follows: find ∆u ∈ V such that Here, E is a symmetric second-order tensor; it has only three components for 2D problems. Therefore, as the elastic equations are linear, there exists an ideal reduced basis containing three fluctuation modes denoted by (ψ k ) k=1,...,3 such that where γ ∈ R 3 is a vector of exact reduced coordinates. As the fluctuation modes are vectors of the approximation space, one can introduce the reduction matrix V , which contains the finite element coordinates of the fluctuation modes: In what follows, approximate reduced bases for fluctuation modes are denoted by V. In Figure 1(b), V is represented by the blue arrow and V is represented by the red arrow. The parameter V may not be totally specific to Ω . The approximate continuous modes are For a given reduced matrix V, the approximate displacement fluctuation reads as The two reduced bases span two vector spaces of the same dimension N in the same ambient space R N , N < N . Each of these vector spaces is a point in a Grassmann manifold, which is denoted by Gr (N , N ). This manifold is a huge space as it contains all the vector subspaces of dimension N in R N . For instance, it contains also the vector subspaces spanned by all sets of N vibration modes. Such vibration modes are not relevant here. However, the Grassmann manifold is equipped with geodesic metrics, which is very convenient for the definition of the dissimilarity between defects, as explained in Section 3.2, when considering approximation errors due to the HR method.
The HR method [34] aims at computing reduced coordinates γ introduced in (9) by projecting the equilibrium equation on V via a restriction of the domain Ω to a reduced integration domain (RID) denoted by Ω R . By following the empirical interpolation method [35], interpolation points are computed for column vectors in V. We choose the RID Ω R such that it contains the interpolation points related to both the reduced basis for displacement and a reduced basis for stresses. We give more details about the construction of Ω R in Appendix. We define a set of test reduced functions denoted by ψ R j : As explained in [34], these test functions are null on the interface between Ω R and the counterpart of the domain as if Dirichlet boundary conditions were imposed. On this interface, the displacement follows the shape of the modes ψ k . The HR method gives access to reduced coordinates γ that fulfill the following balance equations: The matrix form of the hyper-reduced balance equations reads as follows: find γ ∈ R N such that q HR = Vγ (15) where V[F , :] denotes a row restriction of matrix V to indices in F . We assume that the matrix K HR is of full rank. This assumption is always checked in numerical solutions of hyper-reduced equations. Rank deficiency may appear when the RID construction does not account for the contribution of a reduced basis dedicated to stresses. In this paper, the hyper-reduced prediction is supplemented by the following equilibrium step over the RID: find δu such that where F is the set of all degrees of freedom in Ω R except those belonging to elements connected to the interface between Ω R and its counterpart. During this correction step, displacements are frozen on elements connected to the interface between Ω R and its counterpart. There is no natural boundary condition on this interface. Here, they are forecast by the hyper-reduced prediction. This correction step has been proposed in [36] for the evaluation of contact forces. This is a local correction step over Ω R . The solution u eq = Ex + ∆u + δu is a hybrid solution that weakly couples HR and a finite element approximation over the RID. We refer the reader to [36][37][38] for more details about hybrid HR schemes. The equilibrated stress is quite accurate compared to the stress computed via HR solely. The hyper-reduced stress prediction is denoted by σ HR , while the equilibrated stress is denoted by σ eq . Here, δσ is the correction term for stress predictions: Property 1. If K HR is of full rank, then the hyper-reduced balance equations are equivalent to an oblique projection of the finite element prediction: where Π = K [:, F ]V[F , :]. Hence, the hyper-reduced prediction of the reduced coordinate vector γ is a minimizer for f (β): Here, Π is a projector for elastic stresses in Ω R according to the reduced test functions: where σ is the finite element stress prediction.
The proof is straightforward. Here, K HR = Π T V. The Jacobian matrix for f reads as J = V T ΠΠ T V = (K HR ) T K HR . If K HR is of full rank, then J is symmetric definite positive and J −1 = (K HR ) −1 (K HR ) −T . Then, both the minimization problem and the hyper-reduced equation have a unique solution. The solution of the minimization problem is Since We can note that if Ω R = Ω , F contains all degrees of freedom indices and δu = ∆u − ∆u. However, the correction step has the same computational complexity as the full finite element model. In this case, the size of Ω has been carefully chosen so that the reduction in complexity is not trivial. Here, the most complex operations are indeed the computation of K HR and the solution of the reduced linear system of equations. They scale linearly with card(F )N 2 and N 3 , respectively. Hence, N 3 has to be sufficiently small compared to N if we consider the computational complexity for the solution of sparse linear systems in the finite element method.

Approximation errors
Let us introduce three canonical macroscopic strains, E (1) , E (2) , and E (3) : In what follows, the three finite element solutions obtained for each of the three canonical macroscopic strains E (1) , E (2) , and E (3) are saved in a matrix Q such that the three displacement fields read as All mechanical simulations are run with the Z-set software suite. One can find more information on the software website (http://www.zset-software.com/). The reduced basis V is obtained by using a truncated SVD of Q : where I is the 3 × 3 identity matrix. Because of the linearity of the elastic problem above, C is proportional to the Young modulus. Hence, V does not depend on the value of the Young modulus thanks to the normalization of the modes. Regarding the Poisson coefficient, for the sake of simplicity, we restrict our attention to isotropic materials having a Poisson coefficient equal to 0.3. An example of displacement fluctuations is shown in Figure 2.
Similarly to Céa's lemma, but in the finite dimension, there exists an upper bound for the approximation error observed through the projector Π. The best projection of the exact solution in the approximation space via the 2-norm is denoted by γ P : where q = V γ and K q = F . Hence, γ P = V T V γ .

Property 2.
There exists a stability coefficient c , which does not depend on F (the loading condition), such that the partial approximation error has the following upper bound: where Π T (q − Vγ) 2 is the partial approximation error that does not account for errors in stress predictions outside Ω R . Hence, the smaller the Euclidean distance between the subspace spanned by V and the finite element prediction, the better the prediction of the stress in Ω R according to the projector Π. It is therefore relevant to train V by using a training dataset of defects. When the RID covers the full domain, a certification of the reduced projection can be achieved, where all errors admit an upper bound, by following the constitutive relation error proposed in [39,40].
The proof of the previous property is straightforward in the finite dimension. Let us denote by α an upper bound of the highest singular value of Π. Then, Moreover, as γ is a minimizer for f (·), we obtain and Property 3. If V ∈ R N ×N and V ∈ R N ×N are two orthonormal matrices of the same ambient space R N , then they span subspaces that belong to the same Grassmann manifold. The partial approximation error has an upper bound depending on the chordal distance [31], denoted by d Ch (V , V), between the subspaces spanned by V and V, respectively: Here, the chordal distance [31] uses the principal angles θ ∈ R N , θ k ∈ [0, π/2[ for k = 1, . . . N computed via the following SVD: where · F is the Frobenius norm. Here, cos(θ) and sin(θ) are the cosine and sine diagonal matrices, respectively. In addition, the following property holds when a full SVD is computed: The proof is the following: Hence, For all matrices A ∈ R n×m and B ∈ R m×n , the following property holds: and for a ∈ R n : a F = a 2 . Thus, and Property 3 is a convergence property for hyper-reduced predictions. If θ = 0, then VU = V U and ⇒ q H R = q and σ HR = σ .
Hence, the hyper-reduced prediction is the same as the finite element prediction. However, it uses fewer floating-point operations.
In what follows, we assume that there is a correlation between the approximation error σ HR − σ in Ω R and the upper bound in (40). This correlation is evaluated through a numerical experiment in Section 4.1.

Grassmannian ambient space
As explained in [29,30], Grassmann manifolds are the adequate concept when considering interpolation of reduced order models. Property 3 shows that it is also relevant for data encoding in a proper ambient space and for the classification of voids according to their mechanical effects. We state that the target defect Ω is similar to a defect having V for the ideal reduced basis if In what follows, each defect in the training dataset is encoded by using its ideal reduced basis V as shown in Figure 1(b).
A common Grassmann manifold is defined for all defects by using a common encoding mesh. Each finite element prediction related to the training dataset of defects is transferred on this encoding mesh prior to any machine learning including SVD and dissimilarity computations. To facilitate the transfer of simulation data, the prediction ∆u ( j ) is extended inside the defect by introducing a very small Young modulus in the void (10 −3 E ). Figure 3 shows the encoding mesh.
This data encoding from the reduced mechanical response of the input, has the huge advantage to be universal. This encoding may be applied in other fields. Each data in the dataset is supplemented by a reduced basis V (i ) , where i is the defect index.
Several geodesic distances are available between points in Grassmann manifolds. The chordal distance is one of them. The Grassmann distance is another one, which is more common in the literature. The Grassmann distance between the subspaces spanned by V and V, respectively, is denoted by d Gr (V , V). It reads as In the following numerical experiment, the Grassmann distance magnifies the distance between subspaces compared to the chordal distance similarly to the one-dimensional equation θ ≥ sin(θ) for θ ∈ [0, π/2]. As a result, we choose d Gr for the partition of the training dataset of defects into classes of defects. The encoding mesh must have the number of degrees of freedom N as small as possible to lower the memory use in a storage system, but it must be sufficiently fine to • lower the norm of transfer errors from the original mesh to the encoding mesh; • detect dissimilarities between defects up to a given accuracy.
To highlight the influence of the encoding mesh on transfer errors, we varied the number of elements on each half-side of the middle square that has a regular mesh. Figure 4 presents the convergence of the Grassmann distance between a perfect circle and an isotropic defect or an anisotropic defect. Both isotropic and anisotropic defects are shown in Figure 5. Grassmann distances are calculated for several encoding meshes. The ideal reduced basis for the fluctuation modes around the circular defect is denoted by V (0) . Here, V (1) and V (2) denote the reduced  bases related to the anisotropic and isotropic defects, respectively. The two Grassmann distances are In Figure 4, the Grassmann distances and the encoding mesh successfully distinguish the two defects, and we observe a convergence of the distances when the mesh exceeds four elements on each half-side of the middle square. A difference of 20% in the Grassmann distance between the two defects is consistent as the pilot circular defect and the anisotropic defect are mechanically very different. The gap may be sufficient to distinguish them during the partition of training data. Therefore, we suggest keeping the encoding mesh with 609 nodes (four elements on each halfside).

Numerical results on the training dataset of defects
Three-dimensional images of voids have been obtained by Laurent Lacourt [20] via X-ray computed tomography of welded joints. These 3D images have been cut into slices so that we have more 2D samples. The dataset contains n I = 2,745 samples of 2D images with only one defect per image. However, one defect may involve several voids that have a high mechanical interaction. Eighty percent of these data have been randomly selected as training data. The remaining data belong to the test set.

Data clustering
Clustering methods have been already used for the construction of local reduced order bases in parametric spaces. We refer the reader to [41,42], for instance, and for more recent examples to [43]. Local reduced order bases are known to be more accurate over a global reduced basis [41]. Here, the set of possible defects is not a parametric space. For this reason, k-means is useless because no barycentric coordinates are available here. The k-medoids algorithm [27] circumvents this difficulty in the case of nonparametric modeling via the selection of representative defects.
All available images in the dataset have been converted into finite element predictions via image-based digital twins. Displacement fluctuations have been transferred to the encoding mesh prior to the computation of the related reduced basis V (i ) (i = 1, . . . , n I ). Then, a dissimilarity matrix has been computed, accounting for all the Grassmann distances between defects in the training dataset: A k-medoids clustering algorithm [27] has been used for the partition of data according to their Grassmann distance. The k-medoids algorithm proposed in [27] can be summarized as follows: • Initialization step: select K rows in D Gr as indices of initial medoids (m 1 , . . . , m K ).
• Repeat the following two steps until convergence: -Data assignment step: assign each point of the dataset to the cluster corresponding to its closest medoid: -Medoid update step: for each cluster, update the medoid by finding the point that minimizes the sum of distances to all the points in the cluster: We arbitrarily set the number of clusters to K = 5. The set C k contains the indices of defects in the cluster number k. Hence, the sizes of the clusters are card(C 1 ) = 554, card(C 2 ) = 200, card(C 3 ) = 202, card(C 4 ) = 901, and card(C 5 ) = 339. The defects located in the medoids are shown in Figure 6. These defects have different anisotropies. The most isotropic defect is m 4 . The most anisotropic defects are m 2 and m 3 . These last defects are rather symmetric. In future work, this symmetry should be removed from the dataset.
Defects m 2 and m 3 contain two voids. They are so close to each other that fluctuation modes account for the local interactions between these voids.

Validation of hyper-reduced predictions for the training dataset
The meta-parameters of HR are set up by using the simulation data available in the training dataset.
For each medoid m (k) and for each target image number i , we define an image-specific hyperreduced model denoted by H R(k, i ) without using the encoding mesh here. The hyper-reduced model incorporates a reduced basis for displacement fluctuations and a reduced basis for stresses (V (k,i ) , V σ(k,i ) ). These reduced bases are computed after the transfer of simulation data from the mesh of the medoid onto the mesh of the target problem, which is defect-dependent. The matrix of macroscopic modes related to a defect-free mechanical problem is denoted by V macro ∈ R N ×3 . The column number j in V macro is related to the macroscopic displacement E ( j ) x. The complete reduced basis for displacement reads as The hyper-reduced equilibrium equation is (13) with empirical modes obtained by substituting V (k,i ) for V in (8). A zone of interest is designed automatically around each defect. It contains four layers of elements from the border of the defect ∂ V Ω . The construction of the RID follows the procedure explained in Appendix so that Ω R contains the zone of interest. For each image #i in the training dataset, a hyper-reduced model is built. It is denoted by H R(k, i ) for each medoid (k = 1, . . . , 5). Hyper-reduction predictions are performed for the three macroscopic strains E ( j ) ( j = 1, . . . , 3). The predicted stresses on the RID are denoted by σ HR(k,i , j ) . The finite element prediction of this stress is denoted by σ (i , j ) . The exact error on the stress prediction in Ω (i ) R , i ∈ {1, . . . , n I }, reads as where · Ω R = Ω R · : · dΩ for stress tensors. Figure 7 reports the correlation between the exact error e HR(k,i ) and the chordal distance to the medoid. This plot contains 5 × n I points. The coordinates of these points are (D Ch m k ,i , e HR(k,i ) ). The average chordal distance is 0.7. The average error is 29%.
The correlation between e HR and d Ch is not perfect, but it is sufficient here for the clustering of simulation data. In some situations, e HR and d Ch may not be correlated. A good correlation requires that the target mechanical problem activate all the modes in V (i ) and V (k,i ) , which means that the related reduced coordinates have no null component. This is the case here. If one of the reduced components is null, the upper bound can be obviously simplified without using all the principal angles in d Ch (V (i ) , V (k,i ) ).  For each cluster of data, we have reported in Figures 8-12 the histograms of the density distribution of e HR(k,i ) , for i ∈ C k , inside each cluster separately.
The clustering results are interesting because most of the errors are below 29% (the average error on the whole dataset). However, many points have an error e HR bigger than 20%. The clustering may not perform well for clusters where there is a lack of similar data. Here, clusters C 2 and C 3 have few points with an error lower than 20%.
Some clusters could be rejected for HR in a sense that a full finite element prediction may be preferable for the defects in these clusters. In what follows, we do not reject any cluster, and we include simulation data related to all the medoids in the dictionary of digital twins.

Selection of a reduced digital twin in the dictionary
In this section, we restrict our attention to simulation data in the test set. This test set aims to evaluate the full modeling procedure for the fast prediction of stresses around defects. The average speedup for the solution of linear systems is 0.03/0.003 = 10 for hyper-reduced predictions. The average speedup for the computation of the stresses in this equation is 0.01/0.005 = 2. Speedups of approximately 1000 are obtained for similar 3D problems [20].
An error indicator has been developed for the selection in the dictionary of the medoid that is expected to give the best stress prediction via HR. This error indicator is similar to the error e HR(k,i ) , where the exact stress σ is replaced by the equilibrated stress σ eq . The error indicator  reads as where δσ (k,i , j ) is the stress correction in (23), which is computed by the equilibrium step related to H R(k, i ).
For each defect in the test set, we select a medoid k (i ) for stress prediction such that The best medoid selection aims to lower the error in the prediction of stresses: = argmin k e HR(k,i ) .
(67) Figure 13. Correlation plot between e HR(k,i ) and η (k,i ) for points in the test set. The color dots are related to the cluster index k.
Then, we obtain an automatic labeling of the defects in the test set. The estimated labels are denoted by L η i : L η i = argmin k η (k,i ) . The perfect labels are denoted by L i : L i = argmin k e HR(k,i ) . Figure 13 reports the correlation between e HR(k,i ) and the error indicator η (k,i ) .
There is a strong correlation between e HR(k,i ) and η (k,i ) for low errors: e HR(k,i ) ≈ η (k,i ) for e HR(k,i ) < 40%. This correlation exists for every cluster. As a result, this indicator helps us to find the medoid as if we have the exact finite element prediction.
The better the σ eq approximates the σ , the more accurate the error indicator. The error on equilibrated stresses reads as e eq(k,i ) = 100 (68) Figure 14 shows the correlation between e eq(k,i ) and η (k,i ) . Such a correlation is sufficient to select the same hyper-reduced order model for the predictions with or without an equilibrium step. The range of errors on equilibrated stresses is much smaller than that on σ HR . Therefore, the equilibrated stresses are the simulation outputs of interest for the reduced digital twin.
In Figures 15-19, for each cluster of data, we have reported the histograms of the density distribution of e eq(k,i ) , for L η i = k, inside each cluster separately. Most of the errors on equilibrated stresses are lower than 5%. The modeling procedure is very accurate. Errors related to medoids m 1 , m 4 , and m 5 are more concentrated below 5% than those for medoids m 2 and m 3 . Hyper-reduced predictions attached to medoids m 2 and m 3 are less accurate. Such results have been anticipated during the clustering procedure.

Detailed numerical results on hyper-reduced predictions
In this section, we report local numerical results for defects in the test set. The index of the first image is i = 1664. Its label via the error indicator is L η i = 3. This label is the optimal: L η i = L i . It is intentionally related to a medoid of the small cluster C 3 for which there is certainly a lack of observed defects. Related numerical results are shown in Figure 20. The prediction is globally accurate (e eq(3,1664) = 5%).  Better results are obtained for the test data attached to m 4 , the medoid of the largest cluster. The index of the second image is i = 1987. The global error is e eq(4,1987) = 0.2%. Local predictions of the shear stress for E (3) are reported in Figure 21.
Local predictions are very accurate for this defect certainly because cluster C 4 involves a large number of observed defects.

Conclusion
In this paper, local defects are compared according to a mechanical metric using model order reduction techniques and a Grassmann manifold. Each simulation data related to a defect in a training set is encoded as a point on a Grassmann manifold by using an ideal reduced basis of  displacement fluctuations around each defect. This approach can be easily extended to many types of local defects in homogeneous or locally homogeneous materials.
Five categories of defects are proposed by using a k-medoids algorithm via the Grassmann distance between the reduced bases attached to each defect. Each category of defect (also called a "cluster") has a representative defect in the training set, which is a medoid. It turns out that most defects in the proposed training set are almost spherical. However, some have a very complex shape, including several voids in interactions. The hyper-reduced predictions of the stresses around the defects in a given cluster are compared to the true simulation data. This pertains to the training data only as a validation of the modeling process. Most of the errors are below 29%. More training data would have probably improved the stress prediction by increasing the number of defect categories.    The distances to all the representative defects, for data in the test set or for newly observed defects, are evaluated by a fast error estimator on the hyper-reduced prediction of stress. The fact that the closest representative defect is a defect in the training set facilitates the interpretation of its selection.
The error estimation is restricted to an RID around the input defect. The equilibrated stresses involved in this error estimation are very accurate, with global errors on stress below 5% for data in the test set, when using the best representative defect. Such excellent results can be explained by the following facts: (i) the Grassmann metric is theoretically founded for hyper-reduced predictions as shown in this paper; (ii) both the simulation data of selected representative defects and the hyper-reduced equations enable the prediction of accurate Dirichlet boundary conditions on the boundary of the RID for the computation of equilibrated stresses. The larger the extent of this reduced domain, the more accurate the boundary conditions because they tend toward a given macroscopic value. However, the larger the RID, the higher the computational complexity of the error estimator. This complexity limits the number of defect categories when selecting the best representative defect for stress predictions.
To develop this promising approach to defect classification, more data are needed. Both data acquisition and data augmentation techniques are in development. In the future, more defect categories must be defined. In such a case, the selection of the best representative defect could be carried out by a deep classifier as in ROM-nets.

Appendix. Details about RID construction
Let us introduce two mathematical operators. The first mathematical operator collects the degrees of freedom of a subdomain Ω α : The second mathematical operator aggregates the support of finite element shape functions having their index in a set G : The extension of this subdomain by adding n layers of connected elements reads as The operator L is suitable for displacement fields because they are approximated by finite element shape functions. A similar operator is also introduced for stresses. When collecting simulation data related to stresses, in the matrix Q σ , we store all the stress components at all Gauss points for all elements. Each row of Q σ is related to one component of the stress tensor at a Gauss point in an element. Then, the DEIM algorithm applied to V σ gives a set of indices of components of the stress tensor at some Gauss points in some elements. This set is denoted by P σ . We denote by L σ (P σ ) the support of the elements related to set P σ . Here, L σ (P σ ) is a subdomain of Ω.
In this paper, the RID construction is the following: where Ω ZOI is the zone of interest.