In the past, the data on which science and engineering are based were scarce and were often obtained from experiments designed to test a given hypothesis. Each experiment yielded only very limited data. Today, data are abundant and are collected in large quantities in every single experiment at very low cost. Data-driven modeling and scientific discovery represent a paradigm shift in how many problems in both science and engineering are addressed. Some scientific fields have used artificial intelligence for some time because of the inherent difficulty of obtaining laws and equations that describe certain phenomena. Today, however, data-driven approaches are also flooding fields such as mechanics and materials science, where the traditional approach seemed highly satisfactory. In this paper, we review the application of data-driven modeling and model learning procedures to different fields in science and engineering.
Francisco J. Montáns; Francisco Chinesta; Rafael Gómez-Bombarelli; J. Nathan Kutz
@article{CRMECA_2019__347_11_845_0,
  author = {Francisco J. Mont\'ans and Francisco Chinesta and Rafael G\'omez-Bombarelli and J. Nathan Kutz},
  title = {Data-driven modeling and learning in science and engineering},
  journal = {Comptes Rendus. M\'ecanique},
  pages = {845--855},
  publisher = {Elsevier},
  volume = {347},
  number = {11},
  year = {2019},
  doi = {10.1016/j.crme.2019.11.009},
  language = {en},
}
Francisco J. Montáns; Francisco Chinesta; Rafael Gómez-Bombarelli; J. Nathan Kutz. Data-driven modeling and learning in science and engineering. Comptes Rendus. Mécanique, Data-Based Engineering Science and Technology, Volume 347 (2019) no. 11, pp. 845-855. doi : 10.1016/j.crme.2019.11.009. https://comptes-rendus.academie-sciences.fr/mecanique/articles/10.1016/j.crme.2019.11.009/