Analytical algorithms for ligand cone angles calculations. Application to triphenylphosphine palladium complexes

Michel Petitjean

doi:10.1016/j.crci.2015.04.004

Michel Petitjean ¹

¹ MTi, UMR-S 973, INSERM, Université Paris-Diderot (Paris-7), France

Comptes Rendus. Chimie, Volume 18 (2015) no. 6, pp. 678-684.

Abstract

We define the smallest enclosing cone angle as the Tolman cone angle for null atomic sphere radii. We then provide a simple analytical algorithm to compute the smallest enclosing cone at fixed apex, which also works for unsymmetrical ligands. We apply it to compute ligand cones for a family of triphenylphosphine palladium complexes, and we show that both the angle of the cone and its resulting solid angle correlate strongly with the Tolman cone angle, suggesting that atomic radii are no longer needed. We also define the best cone of fixed apex fitting a population of unit vectors, and we propose a simple analytical algorithm to compute it, which is proved to work in any d-dimensional Euclidean space. We define the conicity index κ to evaluate quantitatively the pertinence of the best fitting cone. We use this best fit cone to define a mean ligand cone, and thus a mean cone angle and a mean cone axis. For our family of triphenylphosphine palladium complexes, the axes of the individual cones deviate from the mean cone axis by at most 13.2°. The observed conicity index is small $(κ = 0.0177)$ , indicating a very good fit for the whole family of complexes.

Metadata

Received: 2015-02-17
Accepted: 2015-04-14
Published: 2015-05-20

Keywords: Tolman cone angle, Ligand cone angle, Minimal enclosing cone, Best fitting d-dimensional cone, Least squares, Conicity index

@article{CRCHIM_2015__18_6_678_0,
     author = {Michel Petitjean},
     title = {Analytical algorithms for ligand cone angles calculations. {Application} to triphenylphosphine palladium complexes},
     journal = {Comptes Rendus. Chimie},
     pages = {678--684},
     publisher = {Elsevier},
     volume = {18},
     number = {6},
     year = {2015},
     doi = {10.1016/j.crci.2015.04.004},
     language = {en},
}

Michel Petitjean. Analytical algorithms for ligand cone angles calculations. Application to triphenylphosphine palladium complexes. Comptes Rendus. Chimie, Volume 18 (2015) no. 6, pp. 678-684. doi : 10.1016/j.crci.2015.04.004. https://comptes-rendus.academie-sciences.fr/chimie/articles/10.1016/j.crci.2015.04.004/

1 Introduction

Ligand cone angles were introduced by Tolman to measure the size of phosphine derivatives and other phosphorus ligands [1]. This size is the solid angle defined by the smallest-angle cone having its apex at 2.28 Å from the phosphorus atom and circumscribing the ligand atoms, which are usually modelled by spheres. The solid angle, expressed in steradians, is $θ = 2 π (1 - \cos α)$ , where α is the angle between a generatrix of the cone and its axis (see general definitions in Section 3.1). In the case of symmetric ligands PR₃, the Tolman cone angle is easy to compute because its axis is in the direction of the mean of the three vectors defined by the PR bonds. Then, given the radius of the spherical ligands, α is obtained by elementary geometry. This approach has to be refined for unsymmetrical ligands such as MPR₁R₂R₃ and more generally for MXR₁R₂R₃, where M = H or is a metal atom, and X = P, N, CH, or any atom having a tetrahedral hybridisation such as sp³ or sd³. It was pointed out that even symmetrically substituted bulky phosphines may adopt unsymmetrical conformations [2]. The difficulty of the calculation arises (i) when the sphere radii are unequal, and (ii) when the XR_i bonds are not symmetrically arranged around the MX axis. Tolman approximated θ as $({\hat{θ}}_{1} + {\hat{θ}}_{2} + {\hat{θ}}_{3}) / 3$ , where ${\hat{θ}}_{i}$ is the acute angle between the directions of the XR_i bond and of the MX bond [3,4]. This method was criticized because the obtained values may not reflect the properties of the ligand, particularly when the substituent groups differ greatly [5]. Few geometric tools seem to be available to measure steric effects in organometallic chemistry, which could explain why ligand cone angles have been so widely used in this field [6–9]. Recently, Bilbrey et al. [10–12] proposed an analytic solution to the ligand cone angle calculation.

It was also proposed to measure the steric size of ligands and substituents by the solid angle generated by the union of the atomic spheres [13], rather than the one generated by their enclosing cone. This approach gives rise to an analytical calculation of the solid angle, provided that the intersections of more than two spheres can be neglected [13]. There is a non-linear relationship between the cone angle and the solid angle, which was measured quantitatively [14]. To evaluate the importance of sphere overlaps, an exact analytical calculation of sphere intersections was done with the ASV freeware [15] using the atomic radii recommended by Gavezzotti [16]. It showed that intersections between 6 or 7 atoms are commonly observed in organic molecules [15,18]. These atomic radii are sometimes slightly larger than those given by Bondi [17], but it is recalled that an increase in the sphere radii does not guarantee an increase of the van der Waals surfaces. Running ASV on a database of 70 diverse ligands showed that neglecting the intersections of more than two atoms induced a mean error on van der Waals surface calculations of 249%, that neglecting the intersections of more than three atoms led to a mean error of 87%, and that neglecting those of more than four atoms led to a mean error of 16%, the maximal observed error in this last case being 37% [18]. Although van der Waals surfaces are not used in ligand solid angle calculations, these numbers show the importance of atomic sphere overlaps.

An improved ligand solid angle algorithm was proposed, which takes into account sphere intersections of orders 3 and 4 [19], but it needs a complex numerical integration. Recently, Bilbrey et al. [12,20] proposed an analytic solution to the solid angle calculation, based on the decomposition of the solid angle contributions between those due to spherical polygon parts and those due to the resulting truncated spherical sector parts. This algorithm, implemented by the authors in their Mathematica FindSolidAngle package, is effective for the simple geometrical arrangements expected to be encountered in chemistry. However, it is not specified how it works in general. For example, the detection of potential multiple connected components obtained by projection onto the surface of the unit sphere is not discussed, and the solid angle subtended by an internal spherical polygon may be non-void and may even be non-unique. Detecting and managing such situations makes the algorithm rather difficult to implement.

The impact of conformational variations was raised early [21], leading to the use of weighted average cone angles [22], while it was considered that this problem was overcome by the use of the solid angle methodology [23]. At the same time, Müller and Mingos also noticed that the Tolman cone angle definition does not take into account the variations due to conformational changes [24,25], and they used the atomic centers of the ligand atoms rather than their van der Waals spheres. They then applied their algorithm to perform statistics on thousands of phosphine structures found in the Cambridge Crystallographic Data Base [26], and observed a variation in cone angles for specific ligands that is much larger than had previously been suspected.

This slight change in the ligand cone angle calculation, which we retain here (see Fig. 1), offers two other major advantages: (i) the cone angle can be generalized to complex polyatomic ligands R_i via the calculation of the fixed apex minimal cone enclosing any desired number of atoms, and (ii) this calculation can be done analytically, as shown in Section 3 of the present paper. We emphasize that this generalization allows us to model molecular shapes and structural fragments with cones, although it is usual to work with spherical models. Although spherical shapes are easy to compute, the spherical model was shown to be unrealistic and a cylindrical model was preferred for drug design applications [27]. Fortunately, minimal height enclosing cylinders and minimal radius enclosing cylinders are computable analytically [28]. However, apart from cylinders and cones, it seems hard to find uses of non-spherical molecular shapes in the literature, which may be due to the lack of simple analytical calculation algorithms.

Fig. 1
(a) Ligand cone, defined by Tolman [1]. (b) Smallest enclosing cone, defined here.

There are several ways to take into account the conformational changes of the ligands. We propose the following one. For each conformer, we know from the present analytical minimal enclosing cone algorithm which ligand atom centers are on the surface of the cone (see Section 3.5). We mark the atomic centers of these ligand atoms. These marked ligand atoms can differ from one conformer to another, even for simple ligands such as Me or Et. Then, assuming a common origin at M, we are left with the problem of finding the best cone of fixed apex fitting all the marked atomic centers. We give in Section 3.3 an analytical solution to this problem, formulated as a least squares one. Then we define in Section 3.4 the conicity index κ, which takes values in the interval [0,1], the value κ=0 meaning that all marked atomic centers lie on the surface of the cone, and the value κ=1 being reached in the worst cases, characterized in Section 3.4. It is emphasized that, compared to our best fitting cone algorithm, computing some mean cone angle such as the arithmetic mean of individual cone angles has drawbacks: such a mean cone angle does not produce a mean axis, and computing some mean axis would not be coherent with the arithmetic mean of the cone angles. Furthermore, such a method would not allow a conicity index to be defined, whereas the latter provides quantitative information about the impact of conformational changes on steric effects. The axis of the best fitting cone is of interest because it gives rise to a second quantitative parameter: the acute planar angle between the axis of the smallest enclosing cone and the axis of the best fitting cone. This parameter indicates how the ligand size of MXR₁R₂R₃ deviates from the mean ligand size of the family. Unlike the well-known RMS (Root Mean Square) deviation, it does not require the knowledge of a mean conformer.

A minor problem is to suppress the impact on a best fit cone calculation of free rotations around the MX axis before aligning the conformers in a common Cartesian coordinate system. When R₁, R₂ and R₃ are different, a 3D rotation performed to optimally superpose each conformer on a common reference conformer solves the problem. It is proposed to set the pivot at M and to restrict this optimal rotation to X and to the respective three atoms of R₁, R₂ and R₃ that are bonded to X, rather than involving more atoms when the R_i are polyatomic. The reason is that extending the optimal superposition to more atoms may give poor alignments in the neighborhood of X in the case of bulky ligands, while for usual applications of cone angles the neighborhood of X is assumed to be more important than the rest of the ligands. Furthermore, the restriction to X and to its neighbours permits potential extensions to superpositions of different molecules MXR₁R₂R₃ rather than of different conformers of a common molecule MXR₁R₂R₃, thus generalizing the definition of the best fitting cone. After translating the M atom of each conformer to the origin, each desired optimal 3D rotation can be found by minimizing the RMS deviation by the least squares method implemented in the ARMS freeware, which is based on quaternions (see appendix in [29], or appendix A.5 in [30] for more general results about optimal rotations). When two or three ligands are identical, there are respectively two or six pairwise correspondences between the ligand atoms bonded to X. In this situation, the one with the smallest minimized RMS is retained.
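The superposition step above can be sketched as follows. This is only an illustration under stated assumptions: it uses Horn's closed-form quaternion formulation (the minimal RMS deviation follows from the largest eigenvalue of a symmetric 4×4 matrix) and is not the ARMS code, whose internals are not reproduced here. The names `sym_eigenvalues`, `rotational_rmsd` and `best_correspondence` are hypothetical, and the eigenvalues are obtained with a plain cyclic Jacobi iteration chosen only to keep the example free of external libraries. Each conformer is given as the list [X, atom of R₁, atom of R₂, atom of R₃], with M already translated to the origin.

```python
import math
from itertools import permutations

def sym_eigenvalues(m, sweeps=30):
    """Eigenvalues of a symmetric matrix by cyclic Jacobi rotations."""
    n = len(m)
    a = [list(row) for row in m]
    total = sum(x * x for row in a for x in row)  # squared Frobenius norm
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(i))
        if off <= 1e-30 * total:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                t = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(t), math.sin(t)
                for k in range(n):  # A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J' A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
    return [a[i][i] for i in range(n)]

def rotational_rmsd(ref, mob):
    """Minimal RMS deviation over all rotations about the origin (pivot at M)."""
    s = [[sum(a[j] * b[k] for a, b in zip(mob, ref)) for k in range(3)]
         for j in range(3)]
    (sxx, sxy, sxz), (syx, syy, syz), (szx, szy, szz) = s
    n4 = [
        [sxx + syy + szz, syz - szy, szx - sxz, sxy - syx],
        [syz - szy, sxx - syy - szz, sxy + syx, szx + sxz],
        [szx - sxz, sxy + syx, -sxx + syy - szz, syz + szy],
        [sxy - syx, szx + sxz, syz + szy, -sxx - syy + szz],
    ]
    lam = max(sym_eigenvalues(n4))
    sq = sum(x * x for p in list(ref) + list(mob) for x in p)
    return math.sqrt(max(0.0, (sq - 2.0 * lam) / len(ref)))

def best_correspondence(ref, mob, identical=(1, 2, 3)):
    """Try every pairing of the identical ligand atoms; keep the smallest RMS."""
    best = None
    for perm in permutations(identical):
        order = list(range(len(mob)))
        for slot, src in zip(identical, perm):
            order[slot] = src
        r = rotational_rmsd(ref, [mob[i] for i in order])
        if best is None or r < best[0]:
            best = (r, tuple(order))
    return best

# Made-up coordinates (in Å, M at the origin), for illustration only.
ref = [(0.0, 0.0, 2.3), (1.8, 0.0, 3.0), (0.0, 1.2, 2.6), (-0.5, -1.0, 3.4)]
quarter_turn = lambda p: (-p[1], p[0], p[2])  # rotation about the z axis
mob = [quarter_turn(ref[0]), quarter_turn(ref[2]),
       quarter_turn(ref[3]), quarter_turn(ref[1])]
rmsd, order = best_correspondence(ref, mob)
print(rmsd, order)
```

With `identical=(1, 2, 3)` the six correspondences are tried, and with `identical=(1, 2)` only two, which mirrors the counts given in the text. Only the RMS value is computed here; recovering the rotation itself would additionally need the eigenvector associated with the largest eigenvalue.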

2 Results and discussion

We exemplify our minimal enclosing cone algorithm using a family of triphenylphosphine palladium complexes (Table 1). The resulting cone is equivalent to the Tolman cone for null atomic radii. The angle values we obtained are in the range 57.7–64.6°, and should be compared with half of the Tolman cone angle values, the latter lying in the interval 150.3–173.6°. This factor of 2 is due to our mathematical definition of the cone angle, which holds in E^d (see Section 3.1).

Table 1
No.	Palladium complex (data from ref. [10])	cos α	α (degrees)	θ (steradians)	Cone angle from ref. [10] (degrees)
1	Pd(PPh₃)	0.441571	63.796	3.509	170.0
2	Pd(PPh₃)₂(SN₂C₃)₂Cl	0.532900	57.798	2.935	150.3
3	Pd(PPh₃)₂(SN₂C₃)₂Cl	0.521229	58.585	3.008	155.4
4	Pd(PPh₃)(P₂OC₁₄H₉)Cl	0.534833	57.667	2.923	151.4
5	Pd(PPh₃)(SN₄O₂C₈H₇)Cl	0.514252	59.053	3.052	155.2
6	Pd(PPh₃)(SN₃C₉H₁₀)Cl	0.501834	59.879	3.130	156.9
7	Pd(PPh₃)(SH)	0.483577	61.081	3.245	160.8
8	Pd(PPh₃)₂(S₃NO₂C₇H₅)	0.489504	60.692	3.208	160.9
9	Pd(PPh₃)₂(S₃NO₂C₇H₅)	0.434539	64.244	3.553	173.6
10	Pd(PPh₃)(SNC₅H₄)	0.474321	61.685	3.303	163.1
11	Pd(PPh₃)(NFC₁₅H₁₅)Cl	0.485598	60.948	3.232	165.9
12	Pd(PPh₃)(SN₃C₁₀H₁₁)	0.458999	62.677	3.399	167.4
13	Pd(PPh₃)(S₂N₃C₈H₉)	0.458199	62.729	3.404	167.7
14	Pd(PPh₃)(SN₄C₈H₁₀)	0.451533	63.158	3.446	170.6
15	Pd(PPh₃)(NFC₁₁H₁₅)Cl	0.429580	64.559	3.584	172.2

The observed correlation coefficient between the ligand cone angle in [10] and our minimum enclosing cone angle α is r_α = 0.9800, and with our solid angle θ it is r_θ = 0.9796, while α and θ are highly correlated (0.99995). Since the ligand cone angles encountered in the literature are almost always used for empirical correlations with physical data, it is simpler to calculate α than the usual ligand cone angles because there is no need for atomic radii. The correlation between α and θ is not surprising because θ is a function of α. The high value of this correlation coefficient indicates that the relationship can be treated as linear for the considered ligands.
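The columns of Table 1 and the correlation coefficients quoted above can be rechecked with a few lines of code. The sketch below is only an illustration: the variable names and the hand-written `pearson` helper are assumptions, while the numbers are the cos α values and the cone angles of ref. [10] copied from Table 1. It recomputes α and $θ = 2 π (1 - \cos α)$ from cos α, then the correlation of each with the published cone angles.

```python
import math

# cos(alpha) and cone angles from ref. [10], copied from Table 1 (complexes 1 to 15)
cos_alpha = [0.441571, 0.532900, 0.521229, 0.534833, 0.514252,
             0.501834, 0.483577, 0.489504, 0.434539, 0.474321,
             0.485598, 0.458999, 0.458199, 0.451533, 0.429580]
tolman = [170.0, 150.3, 155.4, 151.4, 155.2, 156.9, 160.8, 160.9,
          173.6, 163.1, 165.9, 167.4, 167.7, 170.6, 172.2]

def pearson(x, y):
    """Pearson correlation coefficient of two equally long sequences."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / math.sqrt(sxx * syy)

alpha = [math.degrees(math.acos(c)) for c in cos_alpha]  # degrees
theta = [2.0 * math.pi * (1.0 - c) for c in cos_alpha]   # steradians

r_alpha = pearson(alpha, tolman)
r_theta = pearson(theta, tolman)
print(f"r_alpha = {r_alpha:.4f}, r_theta = {r_theta:.4f}")
```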

Published values of ligand cone angles are close to 180° (see Table 1) and can be greater than 180° for some nickel or platinum complexes [10], while 2α is around 120°. This difference is due to the exclusion of the ligand atomic spheres in the calculation of α. On the other hand, modeling the molecular shape by a cone should either take into account all atomic spheres, including that of the metal (estimated at 1.63 Å for Pd [17]), or ignore all atomic spheres. Locating the apex of the cone enclosing all atomic spheres (including the metal one) would lead to a complex algorithm and, worse, would require the knowledge of adequate atomic radii. Ignoring only the atomic sphere of the metal gives large angle cones, close to a half plane, which are not realistic from the molecular shape point of view. Ignoring all atomic spheres leads to a very simple analytical algorithm (Section 3.5): it does not require atomic radii, the angle values are still pertinent for establishing empirical calculations, and the conical molecular shape is physically more realistic than a half plane. For example, for the tetrakis Pd(PPh₃)₄, a solid angle close to 0.5 half space per Pd(PPh₃) part is more realistic than a solid angle around one half space, because for the latter the sum of the four contributions of the Pd(PPh₃) parts is around two full spaces, thus indicating excessive cone intersections.

In order to define a mean cone for the 15 complexes of Table 1 and to evaluate quantitatively the dispersion around this mean cone, we proceeded as follows. The mean cone makes sense only in a common Cartesian coordinate system: we selected Pd(PPh₃) (see Table 1) as the reference complex to perform a 3D superposition of each of the 14 other complexes onto this reference. We set the palladium atom as the common origin for the 15 complexes and we computed the 14 optimal rotations as indicated at the end of Section 1. There was an additional difficulty due to the differences in the atom numbering of the complexes. Thus, to retrieve each of the 14 pairwise correspondences for the phosphorus and its neighboring carbons, we used the CSR freeware, based on an automatic 3D motif recognition [31].

For each of the 15 complexes, we obtained 3 contact points on the surface of its individual smallest enclosing cone, all with apex at the origin. These 45 points are in a common Cartesian coordinate system and thus we computed the best cone fitting these 45 points (see Section 3.3). This best fit cone has an angle $\bar{α} = 60.616 °$ while the smallest cone enclosing the 45 points has an angle of $68.090 °$ . The resulting 15 angles between the axis $\bar{u}$ of the best fit cone and the 15 individual cone axes are given in Table 2. This angle, denoted by γ, indicates how the ligand size deviates from the mean one of the family. Unlike the smallest enclosing cone angle, it is poorly correlated with the Tolman cone angle (correlation coefficient: 0.511).

Table 2
Complex	1	2	3	4	5	6	7	8	9	10	11	12	13	14	15
γ (degrees)	4.4	1.9	6.7	2.2	8.8	6.6	7.2	8.3	10.3	4.6	6.4	8.4	13.2	12.3	5.0

We measured the global dispersion of the directions of the 45 points around the surface of the best fit cone with the conicity index $κ$ (see Section 3.4), which takes values in [0;1]. We found $κ = 0.0177$ . A null value would have meant that all 45 points are on the surface of the cone, although in general only 2 or 3 are expected to be found on the surface (see Section 3.5). This small value indicates that the observed differences of conformations of the phenyl groups in the input structural files have little effect on the cone calculation. The largest angle γ (13.2°, see Table 2) was that of Pd(PPh₃)(S₂N₃C₈H₉).

It is emphasized that the knowledge of the mean cone leads us to define not only a mean angle, but also a mean axis and deviations from this mean axis: that was not possible with usual ligand cone angle approaches.

The minimal angle enclosing cone algorithm and the best fitting d-dimensional cone algorithm were implemented in the freeware CONE. Sources are written in portable f77. Documentation and binaries for Mac OS 10 and 64-bit Intel Linux platforms are available free of charge from a software repository located at http://petitjeanmichel.free.fr/itoweb.petitjean.freeware.html.

Running CONE on Windows platforms can be done through the installation of a Linux emulator such as Cygwin (free). When needed, convex hull calculations (see Section 3.5) can be done with the freeware RADI. The latter can be found on the same software repository as CONE, together with the freeware ARMS and ASV mentioned in Section 1 and the CSR freeware mentioned in Section 2.

3 Appendix: analytical results and algorithms

3.1 Definitions and notations

Definition 1. In the Euclidean space E^d, a cone of apex x₀ is a ruled surface generated by the set of all lines passing through x₀ and making a constant angle α with a given axis containing x₀. Each of these lines is called a generatrix.

This axis is defined by a unit vector u, and we conventionally set $c = \cos α$ to be a non-negative value. The case c=0 corresponds to the plane orthogonal to u and containing x₀. The case c=1 corresponds to a degenerate cone reduced to its axis. Generalizations to non-constant angles (non-circular cones) are not considered here. Thus, a cone in E^d is defined by 2d free parameters: x₀, c, and u. It is the set of points x such that $u^{'} (x - x_{0}) = c ∥x - x_{0}∥$ , where the quote indicates a transposition operation and where the norm of $(x - x_{0})$ is $∥x - x_{0}∥ = \sqrt{(x - x_{0})' (x - x_{0})}$ .

Remark 1: Given the axis defined by u, the word cone applies in some contexts to the points x satisfying the additional constraint $\sin α \geq 0$ , which in fact amounts to considering only one half of the cone. In E³, the latter encompasses a solid angle equal to $2 π (1 - \cos α)$ . Unless otherwise stated, we retain the definition corresponding to a full cone.

Remark 2: In some contexts the cone is defined as the convex set such that $u^{'} (x - x_{0}) \geq ∥x - x_{0}∥ \cos α$ , still with $\cos α \geq 0$ , in which case the half cone in the sense of Definition 1 is the boundary of this convex set. There are other variants. For convenience, we retain Definition 1.

We consider n+1 distinct given points $x_{i}, i = 0, 1, ..., n$ , in E^d. We define the unit vectors $v_{i} = (x_{i} - x_{0}) / ∥x_{i} - x_{0}∥, i = 1,..., n$ , and the matrix W with n rows and d columns, each row i containing $v_{i}^{'}, i = 1,..., n$ .

We use the following notations. I is the identity matrix of order n. 1 is the vector having n components, all equal to 1. $A = I - 1 1' / n$ is the centering operator (thus $A = A^{'} = A^{2}$ ). $T = W^{'} A W$ is the inertia matrix associated with W, i.e. T is n times the covariance matrix of the v_i, $i = 1,..., n$ . $P = I - u u^{'}$ , where I is here the identity matrix of order d, is the projection matrix on the (d – 1)-dimensional subspace generated by the vectors orthogonal to u.

3.2 Calculation of a circumscribed cone

Definition 2. A cone circumscribed to k points is a cone such that the k points lie on its surface.

We assume that the apex x₀ is fixed and we would like to find the cone circumscribed to the points $x_{1}, ..., x_{n}$ . The resulting system has n equations $u^{'} v_{i} = c$ , $i = 1,..., n$ or, in matrix form, $W u = c 1$ . It has d unknowns and thus it is underdetermined when n<d and overdetermined when n>d. Thus we consider the case n=d and we assume that the square matrix W is invertible, i.e. no (d – 1)-dimensional plane through x₀ contains the n points.

Theorem 1. The cone of apex x₀ circumscribed to n=d points in E^d has its axis in the direction of the unit vector $u = W^{- 1} 1 / \sqrt{1' {(W W^{'})}^{- 1} 1}$ and the cosine of its angle is $\cos α = 1 / \sqrt{1' {(W W^{'})}^{- 1} 1}$ .

Proof. W is invertible, thus $u = c W^{- 1} 1$ . Because u is a unit vector and c was conventionally set non-negative, we get $c = 1 / \sqrt{1' {(W W^{'})}^{- 1} 1}$ and $u = W^{- 1} 1 / \sqrt{1' {(W W^{'})}^{- 1} 1}$ .
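For d = 3, Theorem 1 reduces to solving one 3×3 linear system. The sketch below is an illustration, not the CONE implementation: the function name `circumscribed_cone` is hypothetical, the system $W y = 1$ is solved by Cramer's rule, and it uses the identity $1' {(W W^{'})}^{- 1} 1 = {∥W^{- 1} 1∥}^{2}$ , so that $c = 1 / ∥y∥$ and $u = c y$ .

```python
import math

def _det3(a, b, c):
    """Determinant of the 3x3 matrix with rows a, b, c."""
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

def circumscribed_cone(apex, p1, p2, p3):
    """Axis u and cos(alpha) of the cone of given apex through three points (d = 3)."""
    rows = []
    for p in (p1, p2, p3):
        diff = [pi - ai for pi, ai in zip(p, apex)]
        norm = math.sqrt(sum(t * t for t in diff))
        rows.append(tuple(t / norm for t in diff))  # row v_i' of W
    det = _det3(*rows)
    if abs(det) < 1e-12:
        raise ValueError("W is not invertible: the points lie in a plane through the apex")
    y = []
    for j in range(3):  # Cramer's rule: replace column j of W by the vector 1
        replaced = [tuple(1.0 if k == j else r[k] for k in range(3)) for r in rows]
        y.append(_det3(*replaced) / det)
    c = 1.0 / math.sqrt(sum(t * t for t in y))
    return tuple(c * t for t in y), c

# Three points on the coordinate axes, apex at the origin
u, c = circumscribed_cone((0, 0, 0), (2, 0, 0), (0, 1, 0), (0, 0, 5))
print(u, math.degrees(math.acos(c)))
```

In this example W is the identity matrix, so $y = 1$ , $c = 1 / \sqrt{3}$ and the axis is along (1, 1, 1), as expected by symmetry.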

Remark 3: It is checked that the solution indeed satisfies $c^{2} \leq 1$ as follows. The symmetric matrix $W W^{'}$ has non-negative eigenvalues and thus its largest eigenvalue cannot exceed its trace, the latter being equal to n. Then the smallest eigenvalue of ${(W W^{'})}^{- 1}$ cannot be smaller than 1/n, and from the Courant–Fischer minimax theorem [32], the quadratic form $1' {(W W^{'})}^{- 1} 1$ cannot be smaller than $(1' 1 / n) = 1$ , so that $c^{2} \leq 1$ .

Remark 4: There are two particular situations: c = 0 and c = 1. The case c=0 arises if and only if the square matrix W is non-invertible: $W^{'} 1 = 0$ . It is such that $u^{'} v_{i} = 0$ for all $i = 1,..., n$ , which corresponds to a cone which is a plane orthogonal to u and containing x₀. The case c=1 arises if and only if all the n quantities $u^{'} v_{i} = 1$ , which means that $v_{i} = u$ for all $i = 1,..., n$ : all points are aligned on the axis and the cone is degenerated.

3.3 Least squares best fitting cone

We consider the case $n \geq d$ . In general we cannot have the n equalities $W u - c 1 = 0$ all satisfied together, but we can minimize $S = {∥W u - c 1∥}^{2}$ , i.e. we look for the values of (c,u) minimizing $S = {(W u - c 1)}^{'} (W u - c 1)$ .

Theorem 2. The optimal unit vector u is the eigenvector associated with the smallest eigenvalue λ_d of the inertia matrix $T = W^{'} A W$ and the minimized sum of the squared distances of the n points to the surface of the cone is λ_d.

Proof. The solution of the optimization problem above should satisfy $grad [S + L (1 - u^{'} u)] = 0$ , L being the Lagrange multiplier associated with the constraint $u^{'} u = 1$ . It follows that $2 n c - 2 (1^{'} W u) = 0$ and that $2 W^{'} W u - 2 c W^{'} 1 - 2 L u = 0$ .

We get $c = 1' W u / n$ , and then $L = u^{'} W^{'} (I - 1 1' / n) W u$ , i.e. $L = u^{'} T u$ . Then we get $S = u^{'} T u$ and $(I - u u^{'}) W^{'} A W u = 0$ , i.e. $P T u = 0$ and $T u = L u$ . The latter equation is satisfied if and only if Tu is proportional to u, which means that u is an eigenvector of T. Let $λ_{1}, ..., λ_{d}$ be the eigenvalues of T sorted in decreasing order. From the Courant–Fischer minimax theorem, the minimum of S is $S^{*} = λ_{d}$ and the maximum of S is λ₁.
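Theorem 2 can be sketched in code as follows. This is an illustration under stated assumptions rather than the CONE implementation: the names `jacobi_eigh` and `best_fit_cone` are hypothetical, and the eigen-decomposition of T is done with a plain cyclic Jacobi iteration chosen only to keep the example free of external libraries (any symmetric eigensolver would serve). The function builds $T = W^{'} A W$ from the centered unit vectors, takes the eigenvector of the smallest eigenvalue as u, sets $c = 1' W u / n$ , and also returns the conicity index $κ = d \cdot S^{*} / n$ of Section 3.4.

```python
import math

def jacobi_eigh(m, sweeps=30):
    """Eigenvalues and eigenvectors (columns of v) of a symmetric matrix."""
    n = len(m)
    a = [list(row) for row in m]
    v = [[float(i == j) for j in range(n)] for i in range(n)]
    total = sum(x * x for row in a for x in row)  # squared Frobenius norm
    for _ in range(sweeps):
        off = sum(a[i][j] ** 2 for i in range(n) for j in range(i))
        if off <= 1e-30 * total:
            break
        for p in range(n - 1):
            for q in range(p + 1, n):
                if a[p][q] == 0.0:
                    continue
                diff = a[q][q] - a[p][p]
                t = math.pi / 4 if diff == 0.0 else 0.5 * math.atan(2.0 * a[p][q] / diff)
                c, s = math.cos(t), math.sin(t)
                for k in range(n):  # A <- A J
                    akp, akq = a[k][p], a[k][q]
                    a[k][p], a[k][q] = c * akp - s * akq, s * akp + c * akq
                for k in range(n):  # A <- J' A
                    apk, aqk = a[p][k], a[q][k]
                    a[p][k], a[q][k] = c * apk - s * aqk, s * apk + c * aqk
                for k in range(n):  # V <- V J
                    vkp, vkq = v[k][p], v[k][q]
                    v[k][p], v[k][q] = c * vkp - s * vkq, s * vkp + c * vkq
    return [a[i][i] for i in range(n)], v

def best_fit_cone(apex, points):
    """Return (u, cos_alpha, kappa) of the least squares cone of fixed apex."""
    d = len(apex)
    w = []
    for x in points:
        diff = [xi - ai for xi, ai in zip(x, apex)]
        norm = math.sqrt(sum(t * t for t in diff))
        w.append([t / norm for t in diff])  # row v_i' of W
    n = len(w)
    mean = [sum(row[j] for row in w) / n for j in range(d)]
    # T = W' A W, i.e. the sum of outer products of the centered unit vectors
    t_mat = [[sum((r[j] - mean[j]) * (r[k] - mean[k]) for r in w)
              for k in range(d)] for j in range(d)]
    evals, vecs = jacobi_eigh(t_mat)
    i = min(range(d), key=lambda j: evals[j])
    u = [vecs[k][i] for k in range(d)]
    c = sum(mean[k] * u[k] for k in range(d))  # c = 1' W u / n
    if c < 0.0:  # orient the eigenvector so that c is non-negative
        u, c = [-t for t in u], -c
    s_star = max(evals[i], 0.0)  # clip tiny negative rounding noise
    return u, c, d * s_star / n

# n = d = 3: the fit is exact, so kappa is zero up to rounding
u, c, kappa = best_fit_cone((0, 0, 0), [(1, 0, 0), (0, 1, 0), (0, 0, 1)])
print(u, c, kappa)
```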

Remark 5: When n = d, W is a square matrix so that AW is not of full rank because $1^{'} A = 0$ and thus $1^{'} A W = 0$ . Then, $T = W^{'} A W$ , which can be written $T = {(A W)}^{'} (A W)$ because $A^{2} = A$ , cannot be of full rank. We deduce that $λ_{d} = 0$ and we find again that S=0 and $W u = c 1$ .

Remark 6: For any n, and regardless of whether or not u is optimal, it is checked that $c^{2} \leq 1$ as follows. We have $n^{2} c^{2} = (1' W) u u^{'} (W^{'} 1)$ , which is a quadratic form that cannot exceed $(1' W) (W^{'} 1)$ because the largest eigenvalue of $u u^{'}$ is 1. But $(1' W W^{'} 1)$ cannot exceed $1' 1 = n$ times the largest eigenvalue of $W W^{'}$ , the latter being not greater than the trace of $W W^{'}$ , i.e. n. Thus $n^{2} c^{2} \leq n^{2}$ and $c^{2} \leq 1$ . The sense of the eigenvector u is set so as to have a non-negative c value.

Remark 7: Another least squares method would be to minimize the sum of n squared distances of the points x_i to their orthogonal projection on the cone, rather than the sum of n squared distances of the v_i to their orthogonal projection on the cone. In this situation, a numerical minimizer is required.

3.4 Conicity index

Definition 3. The conicity index is $κ = d \cdot S^{*} / n$ .

Theorem 3. κ takes values in [0;1].

Proof. We look for the upper bound of the minimal $S^{*} = λ_{d}$ given d and n. The maximal value for λ_d is λ₁. In this situation all the d eigenvalues of T are equal to $T r (T) / d$ , i.e. $d \cdot S^{*} = T r (W^{'} W) - T r (W^{'} 1 1' W) / n$ . Thus $d \cdot S^{*} = n (1 - \bar{v}' \bar{v})$ , where the mean of the n unit vectors is $\bar{v} = W^{'} 1 / n$ . Since $∥\bar{v}∥ \leq 1, S^{*} \in [0, n / d]$ .

The value κ=0 indicates that all points lie at the surface of the cone. The value κ=1 indicates the worst possible fitting by the cone. It is reached when T is proportional to the identity matrix I and when $\bar{v} = 0$ . This extreme value is reached, e.g., for the vertices of a regular d-simplex with the apex at its center [33].
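As a worked check of the extreme value κ=1 (an illustration added here, using the smallest non-trivial case), take d=2 and the n=3 unit vectors pointing from the center of an equilateral triangle to its vertices, $v_{k} = (\cos (2 π k / 3), \sin (2 π k / 3))'$ , $k = 0, 1, 2$ . Their mean is $\bar{v} = 0$ , so that $T = W^{'} W = \sum_{k} v_{k} v_{k}^{'} = (3 / 2) I$ , since $\sum_{k} \cos^{2} (2 π k / 3) = \sum_{k} \sin^{2} (2 π k / 3) = 3 / 2$ and $\sum_{k} \cos (2 π k / 3) \sin (2 π k / 3) = 0$ . Both eigenvalues of T are equal to 3/2, hence $S^{*} = λ_{2} = 3 / 2$ and $κ = d \cdot S^{*} / n = 2 \cdot (3 / 2) / 3 = 1$ .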

3.5 Smallest enclosing cone

Having a fixed apex x₀ and n input data points $x_{1}, ..., x_{n}$ , we propose in the case d=3 a minimal enclosing cone algorithm in the sense of a minimal angle α. Such a cone has 3 free parameters: the unit vector u and the angle α. A point x_i is enclosed in the cone if $\cos (x_{i} - x_{0}, u) \geq \cos α$ .

The smallest enclosing cone is sought among the minimal cones circumscribed to successively k = 1, 2, and 3 points and containing the n−k other points.

The trivial case k = 1 corresponds to n points aligned with x₀: if it is the case the algorithm stops.

The case k = 2 is solved by enumerating the $n (n - 1) / 2$ pairs of input points and computing for each pair its minimal circumscribed cone (the minimal cone circumscribed to 2 points x_i and x_j is such that its axis bisects the angle $x_{i} - x_{0} - x_{j}$ ). If the latter encloses all points, it is retained. If one or several cones are retained, the one with the smallest angle is the solution and the algorithm stops.

If not, we enumerate the $n (n - 1) (n - 2) / 6$ triplets of input points and we compute for each triplet its circumscribed cone as shown in Section 3.2. Among the cones enclosing all points, the one with the smallest angle is retained and the algorithm terminates.
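The three stages above (k = 1, 2, 3) can be sketched as follows for d = 3. This is an illustrative reading of the algorithm, not the CONE source: the function names, the tolerance `EPS`, and the choice to return `None` when no enclosing cone is found are assumptions. The enclosure test is the one stated above, $\cos (x_{i} - x_{0}, u) \geq \cos α$ , and the triplet stage reuses Theorem 1 through Cramer's rule.

```python
import math
from itertools import combinations

EPS = 1e-12  # numerical tolerance (an assumption of this sketch)

def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))

def _unit(v):
    n = math.sqrt(_dot(v, v))
    return tuple(x / n for x in v)

def _det3(a, b, c):
    return (a[0] * (b[1] * c[2] - b[2] * c[1])
            - a[1] * (b[0] * c[2] - b[2] * c[0])
            + a[2] * (b[0] * c[1] - b[1] * c[0]))

def _circumscribed(v1, v2, v3):
    """Theorem 1 for d = 3; returns (u, cos_alpha) or None if W is singular."""
    rows = (v1, v2, v3)
    det = _det3(*rows)
    if abs(det) < EPS:
        return None
    y = []
    for j in range(3):  # Cramer's rule for W y = 1
        replaced = [tuple(1.0 if k == j else r[k] for k in range(3)) for r in rows]
        y.append(_det3(*replaced) / det)
    c = 1.0 / math.sqrt(_dot(y, y))
    return tuple(c * t for t in y), c

def smallest_enclosing_cone(apex, points):
    """Return (u, cos_alpha) of the minimal angle enclosing cone, or None."""
    vs = [_unit(tuple(p - a for p, a in zip(x, apex))) for x in points]

    def encloses(u, c):
        return all(_dot(v, u) >= c - EPS for v in vs)

    def best_of(candidates):
        ok = [(u, c) for u, c in candidates if encloses(u, c)]
        return max(ok, key=lambda uc: uc[1]) if ok else None  # largest cos = smallest angle

    # k = 1: all points aligned with the apex along the same direction
    if encloses(vs[0], 1.0):
        return vs[0], 1.0
    # k = 2: the axis bisects the angle x_i - x_0 - x_j
    pairs = []
    for a, b in combinations(vs, 2):
        s = tuple(x + y for x, y in zip(a, b))
        if _dot(s, s) > EPS:  # skip opposite directions: no bisector
            u = _unit(s)
            pairs.append((u, _dot(u, a)))
    found = best_of(pairs)
    if found:
        return found
    # k = 3: cones circumscribed to triplets
    triples = [t for t in (_circumscribed(*tr) for tr in combinations(vs, 3)) if t]
    return best_of(triples)

u, c = smallest_enclosing_cone((0, 0, 0), [(1, 0, 0), (0, 1, 0), (0, 0, 1), (1, 1, 1)])
print(u, math.degrees(math.acos(c)))
```

The enumeration costs of order n³ operations per candidate check in the worst case, which is acceptable for the few tens of atoms of a ligand.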

Remark 8: For some applications the minimal enclosing half cone of fixed apex x₀ needs to be considered. If it happens that the algorithm above does not output such a half cone, another algorithm is required, based on circumscribed half cones. If we add at each step of the above algorithm the constraint that all n points are enclosed in the same half cone, either a valid optimal half cone is returned, or no half cone is found. When x₀ is an extreme point of the convex hull of $\{x_{0}, x_{1}, ..., x_{n}\}$ , there is at least one enclosing half cone because a half cone is a convex set, and therefore the minimal enclosing half cone necessarily exists. When x₀ is in the interior of the convex hull of $\{x_{0}, x_{1}, ..., x_{n}\}$ , or equivalently, x₀ is interior to the convex hull of $\{x_{1}, ..., x_{n}\}$ , no enclosing half cone exists.

The convex hull of a finite set of points can be computed by standard methods such as the beneath-beyond method [34]. It is pointed out that, for large n values, the computation of the smallest enclosing half cone can be much faster if it is applied to the vertices of the convex hull of the n points rather than to the n points.

Remark 9: When a finite-volume revolution cone is needed, it is proposed to orthogonally project the n points x_i on the axis, then to close the conic solid by two circular disks orthogonal to the axis and intersecting it at the two extreme projected points. If a half cone was considered, only one disk is needed.
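The closing disks of Remark 9 follow directly from the projections on the axis. The sketch below is only an illustration: the function name `closing_disks` is hypothetical, and it assumes $\cos α > 0$ (for $\cos α = 0$ the cone is a plane and no finite disk radius exists). Each disk is reported as its signed position along the axis, measured from the apex, together with its radius.

```python
import math

def closing_disks(apex, axis, cos_alpha, points):
    """Positions along the axis and radii of the two disks closing the cone."""
    norm = math.sqrt(sum(a * a for a in axis))
    u = [a / norm for a in axis]
    tan_alpha = math.sqrt(1.0 - cos_alpha ** 2) / cos_alpha  # requires cos_alpha > 0
    # signed abscissa of the orthogonal projection of each point on the axis
    heights = [sum(ui * (xi - ai) for ui, xi, ai in zip(u, x, apex)) for x in points]
    return [(t, abs(t) * tan_alpha) for t in (min(heights), max(heights))]

disks = closing_disks((0, 0, 0), (0, 0, 2), math.cos(math.radians(45)),
                      [(1, 0, 1), (0, 2, 2)])
print(disks)
```

For a half cone, only the disk at the largest projection would be kept, the apex closing the other end.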

Acknowledgements

The author is grateful to one of the reviewers for his/her encouraging comments and for a pertinent suggestion.

References

[1] C.A. Tolman J. Am. Chem. Soc., 92 (1970), p. 2956

[2] A. Immirzi; A. Musco Inorg. Chim. Acta, 25 (1977), p. L41

[3] C.A. Tolman; W.C. Seidel; L.W. Gosser J. Am. Chem. Soc., 96 (1974), p. 53

[4] C.A. Tolman Chem. Rev., 77 (1977), p. 313

[5] T.L. Brown; K.J. Lee Coord. Chem. Rev., 128 (1993), p. 89

[6] N.J. Coville; K. du Plooy; W. Pickl Coord. Chem. Rev., 116 (1992), p. 1

[7] N.J. Coville; M.S. Loonat; D. White; L. Carlton Organometallics, 11 (1992), p. 1082

[8] D. White; L. Carlton; N.J. Coville J. Organomet. Chem., 440 (1992), p. 15

[9] D. White; N.J. Coville Adv. Organomet. Chem., 36 (1994), p. 95

[10] J.A. Bilbrey; A.H. Kazez; J. Locklin; W.D. Allen J. Comput. Chem., 34 (2013), p. 1189

[11] J.A. Bilbrey; W.D. Allen Ann. Rep. Comp. Chem., 9 (2013), p. 3

[12] J.A. Bilbrey Doctoral thesis dissertation, sections 2 & 3, Georgia, Athens, 2014

[13] D. White; B. Craig Taverner; P.G.L. Leach; N.J. Coville J. Comput. Chem., 14 (1993), p. 1042

[14] I.A. Guzei; M. Wendt Dalton Trans., 33 (2006), p. 3991

[15] M. Petitjean J. Comput. Chem., 15 (1994), p. 507

[16] A. Gavezzotti J. Am. Chem. Soc., 105 (1983), p. 5220

[17] A. Bondi J. Phys. Chem., 68 (1964), p. 441

[18] M. Petitjean Distance geometry: theory, methods, and applications. Chap. 4 (A. Mucherino; C. Lavor; L. Liberti; N. Maculan, eds.), Springer, 2013, pp. 61-83

[19] B. Craig Taverner J. Comput. Chem., 17 (1996), p. 1612

[20] J.A. Bilbrey; A.H. Kazez; J. Locklin; W.D. Allen J. Chem. Theory Comput., 9 (2013), p. 5734

[21] J.T. DeSanto; J.A. Mosbo; B.N. Storhoff; P.L. Bock; R.E. Bloss Inorg. Chem., 19 (1980), p. 3086

[22] M. Chin; G.L. Durst; S.R. Head; P.L. Bock; J.A. Mosbo J. Organomet. Chem., 470 (1994), p. 73

[23] D. White; B. Craig Taverner; N.J. Coville; P.W. Wade J. Organomet. Chem., 495 (1995), p. 41

[24] T.E. Müller; D.M.P. Mingos Transit. Met. Chem., 20 (1995), p. 533

[25] D.M.P. Mingos Modern coordination chemistry: the legacy of Joseph Chatt. Section 3 (G.J. Leigh; N. Winterton, eds.), RSC, Cambridge, UK, 2002, pp. 69-78

[26] F.H. Allen Acta Cryst. B58 (2002), p. 380

[27] L. Benkaidali; F. André; B. Maouche; P. Siregar; M. Benyettou; F. Maurel; M. Petitjean Bioinformatics, 30 (2014), p. 792

[28] M. Petitjean Appl. Algebra Engrg. Comm. Comput., 23 (2012), p. 151

[29] M. Petitjean J. Math. Phys., 40 (1999), p. 4587

[30] M. Petitjean J. Math. Phys., 43 (2002), p. 4147

[31] M. Petitjean Comput. Chem., 22 (1998), p. 463

[32] P. Lascaux; R. Théodor Analyse numérique appliquée à l’art de l’ingénieur, tome 1, section 1.4.2, Masson, Paris, 1986

[33] M. Petitjean J. Math. Chem., 22 (1997), p. 185

[34] H. Edelsbrunner Algorithms in Combinatorial Geometry (W. Brauer; G. Rozenberg; A. Salomaa, eds.), Section 8.4, Springer-Verlag, Berlin, 1987, p. 147
