Tetrahedral Homonuclear Tetrameric Species: Occurrence, Forms, Structures, Properties, and Perspectives

Tetrahedral homonuclear tetrameric species are neutral or ionic tetrahedra of chemical elements, true molecular tetrahedra in the strict geometrical sense. The aim of this work was to find all true elemental tetrahedra experimentally determined to be stable and capable of being transferred from one medium or condition to another, and to rationalise their structures and properties.


Introduction
In the strict geometrical sense, a tetrahedron is a three-dimensional polyhedron consisting of four triangular faces joined at six edges, which meet at four vertices [1]. Geometry, as a mathematical discipline, is therefore concerned with the shape of a tetrahedron, not with its internal structure; it assumes that the entire tetrahedron is made of homogeneous matter. In the world of small molecules, the closest analogues of a geometrical tetrahedron are tetrahedral homonuclear tetrameric species (neutral molecules or ions), whose four atoms sit at the vertices and whose covalent bonds form the edges of the triangular faces. There are no atoms inside such a tetrahedron, and these molecules can be called true tetrahedral molecules. They are not widely known among chemists. University-level chemistry textbooks usually illustrate the tetrahedral P₄ molecule of white phosphorus [2-4], while the molecules As₄ and Sb₄ are merely mentioned as existing in the vapour phase [2,3], and As₄ as the molecule of yellow arsenic [3]. What is usually called a tetrahedron in chemistry (a tetrahedral structure [5,6]) is a molecule with four atoms at the vertices and a fifth atom at the centre of a fictive geometrical tetrahedron, where chemical bonds exist only inside the tetrahedron, between the central atom and the vertex atoms. In such a case the edges of the tetrahedron do not exist, which is why the tetrahedron is fictive; the edges should instead be imagined as directional non-bonding interactions between two atoms bound to the central atom. Such tetrahedral molecules, starting with the tetrahedral carbon atom, were gradually accepted in chemistry during the 19th and 20th centuries [7-9], especially in organic chemistry and inorganic stereochemistry. The molecules CH₄, CCl₄, and CMe₄ are didactic examples of such tetrahedra [6,10,11].
This work deals with true tetrahedra of chemical elements: it explores their existence and occurrence in various forms, rationalises their structures and properties, and examines whether materials containing such tetrahedra have technological importance. Methodologically, the research follows the sequence database, dataset, analysis, interpretation [12,13]. First, a database of true tetrahedra was compiled from reliable literature sources. An appropriate dataset of molecular descriptors was then generated. Finally, the dataset was analysed by means of various statistical and chemometric methods and interpreted in terms of both statistics and chemistry. An important outcome of the analyses is the set of conditions that chemical elements must satisfy to form stable tetrahedral species, capable of surviving for a significant period as independent species and of being transferred from one medium to another.

Database and dataset formation for tetrahedral homonuclear tetramers
A database of tetrahedral homonuclear tetramers was created through an extensive literature search for known tetrahedral homonuclear tetrameric species (both neutral molecules and ions) that are stable and may exist independently of other species. The existence of such tetrameric species (regular or irregular tetrahedra) has been determined experimentally; in special cases, theoretical calculations were supported by strong experimental evidence indicating the existence of the species in question. In this way, the set of periodic groups of chemical elements was defined as the block of elements whose members may exist as tetrahedral species.
Once the database had been formed, a corresponding dataset was generated from steric and electronic descriptors for the chemical elements of these groups and of two neighbouring groups. The descriptors were taken from recent, relevant literature. The dataset was created to rationalise the existence of stable tetrahedral homonuclear tetramers throughout the periodic table by statistical and chemometric analysis of the relevant descriptors.

Statistical and chemometric analyses
The descriptor dataset was analysed for mutual correlations and their statistical significance using Scilab [14], a programming, numerical computing, and graphics package, and the online statistical tool P Value from Pearson (R) Calculator [15]. The dataset, in matrix form, was normalised by autoscaling programmed in Scilab [14]: the values of each descriptor were centred by subtracting the descriptor's mean and then divided by its standard deviation. The values were additionally divided by the square root of n-1, where n is the number of data, i.e. the number of tetrahedral species. When the transpose of such an autoscaled matrix is multiplied by the matrix itself, the correlation matrix is obtained [16], and this fact was used in further programming in Scilab [14]. Each correlation coefficient r in the correlation matrix was then tested for statistical significance using the online two-tailed t-test [15], with df = n-2 degrees of freedom. The t-parameter is defined by Eq. (1):
$$ t = \frac{r\sqrt{n-2}}{\sqrt{1-r^{2}}} \qquad (1) $$
The online t-test software [15] calculates the t-parameter automatically and then the probability of the null hypothesis, which states that the correlation coefficient is not statistically significantly different from zero.
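The autoscaling trick described above can be checked numerically. The following is a minimal sketch in Python with NumPy (the authors worked in Scilab; the function names and the small data matrix are hypothetical): after centring, scaling by the sample standard deviation and dividing by √(n−1), the product ZᵀZ reproduces Pearson's correlation matrix, and Eq. (1) turns each r into a t-parameter. The p-value itself would still come from a t-distribution with n−2 degrees of freedom, for example from an online calculator as in the text.

```python
import numpy as np

def autoscale_for_correlation(X):
    """Centre each column, divide by its sample standard deviation (ddof=1),
    then divide by sqrt(n - 1) so that Z.T @ Z is the correlation matrix."""
    X = np.asarray(X, dtype=float)
    n = X.shape[0]
    Z = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)
    return Z / np.sqrt(n - 1)

def correlation_matrix(X):
    """m x m correlation matrix of the columns (descriptors) of X."""
    Z = autoscale_for_correlation(X)
    return Z.T @ Z

def t_parameter(r, n):
    """Eq. (1): t = r * sqrt(n - 2) / sqrt(1 - r**2), with df = n - 2."""
    return r * np.sqrt(n - 2) / np.sqrt(1.0 - r ** 2)

# Hypothetical data: 6 samples (rows) x 3 descriptors (columns)
X = np.array([[1.0, 2.1, 0.5],
              [2.0, 3.9, 0.4],
              [3.0, 6.2, 0.9],
              [4.0, 8.1, 0.3],
              [5.0, 9.8, 0.8],
              [6.0, 12.2, 0.6]])
R = correlation_matrix(X)
t01 = t_parameter(R[0, 1], n=X.shape[0])
print(np.round(R, 3))
print(round(float(t01), 2))
```

Dividing by √(n−1) inside the scaling step is what removes the 1/(n−1) factor that would otherwise appear in front of ZᵀZ.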
A sub-dataset in matrix form was made from moderately to highly correlated descriptors: the matrix X had n rows (samples, i.e. tetrahedral species) and m columns (variables, i.e. selected descriptors). It was then processed by hierarchical cluster analysis (HCA) and by principal component analysis (PCA), a data compression method [17,18], both carried out with the multivariate data analysis package Pirouette [19]. The software performs the calculations automatically once methods and conditions are selected, and presents the results as tables and graphics. Autoscaling was selected as the preprocessing method for both HCA and PCA, meaning that each column of the data matrix was mean-centred and divided by its standard deviation. HCA of the samples (tetrahedral species) was conducted with all available connectivity schemes to determine which one yields the best chemical clustering. The autoscaled data matrix defines an m-dimensional space in which the objects (tetrahedral species) are points, and Euclidean distances between them are calculated by Pirouette [19]. The samples are then linked into clusters according to the selected connectivity scheme (linking criterion), starting from two-membered clusters and finishing with all samples in one final cluster. The graphical output, the dendrogram of samples, is based on the similarity index S_AB [17,18] between any pair of clusters A and B, defined by Eq. (2):
$$ S_{AB} = 1 - \frac{d_{AB}}{d_{\max}} \qquad (2) $$
where d_AB is the distance between the two clusters and d_max is the maximum distance among all clusters.
PCA is a data compression method in which the m original variables (columns of the data matrix X) are transformed into m principal components (PCs) through a series of matrix operations. The new variables are linear combinations of the old ones and are arranged in descending order of importance, i.e. of their contribution to the total variance. PCA in Pirouette [19] is based on the SVD (singular value decomposition) and NIPALS (nonlinear iterative partial least squares) algorithms [17]. The final results of these procedures are the variances and percent variances of all PCs, and the coordinates of the old variables (loadings) and of the samples (scores) in the space of the PCs, in tabular and graphical form. Pirouette [19] also generates the matrix X̂ as a reconstruction of the original matrix X using a chosen number k of PCs, so that the residual matrix Ê is defined by Eq. (3):
$$ \hat{\mathbf{E}} = \mathbf{X} - \hat{\mathbf{X}} \qquad (3) $$
Furthermore, the residuals for the i-th sample form a row vector according to Eq. (4):
$$ \hat{\mathbf{e}}_i = \mathbf{x}_i - \hat{\mathbf{x}}_i \qquad (4) $$
The final measure of residuals for all n samples and k PCs is the predicted residual error sum of squares (PRESS), calculated as the sum of the products of these vectors with their respective transposes, as defined by Eq. (5):
$$ \mathrm{PRESS} = \sum_{i=1}^{n} \hat{\mathbf{e}}_i\,\hat{\mathbf{e}}_i^{\mathrm{T}} \qquad (5) $$
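Eqs. (3)-(5) can be illustrated with a short NumPy sketch that performs PCA by singular value decomposition of an autoscaled matrix, reports percent and cumulative percent variance, reconstructs X̂ from the first k PCs, and sums the squared residual rows into PRESS. It is a hypothetical stand-in for Pirouette's SVD/NIPALS routines, not their implementation: the absolute variance scale used here (s²/(n−1)) may differ from Pirouette's, although percent variances do not depend on it, and the random data are purely illustrative.

```python
import numpy as np

def autoscale(X):
    """Mean-centre each column and divide by its sample standard deviation."""
    X = np.asarray(X, dtype=float)
    return (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)

def pca_svd(X):
    """PCA of a preprocessed matrix via SVD, X = U diag(s) Vt."""
    U, s, Vt = np.linalg.svd(X, full_matrices=False)
    scores = U * s                       # sample coordinates in PC space
    loadings = Vt.T                      # variable coordinates (m x n_pc)
    variance = s ** 2 / (X.shape[0] - 1)
    percent = 100.0 * variance / variance.sum()
    return scores, loadings, percent, np.cumsum(percent)

def press(X, k):
    """Eqs. (3)-(5): reconstruct X from k PCs and sum the squared residuals."""
    scores, loadings, _, _ = pca_svd(X)
    X_hat = scores[:, :k] @ loadings[:, :k].T   # reconstruction with k PCs
    E_hat = X - X_hat                           # Eq. (3)
    return float(sum(e @ e for e in E_hat))     # Eq. (5), rows e_i of Eq. (4)

rng = np.random.default_rng(0)
Z = autoscale(rng.normal(size=(12, 4)))    # hypothetical 12 samples x 4 descriptors
_, _, percent, cumulative = pca_svd(Z)
for k in range(Z.shape[1] + 1):
    print(k, round(press(Z, k), 4))
```

With k = 0 the "reconstruction" is a zero matrix, so PRESS equals the total sum of squares of the autoscaled data; with all m PCs it drops to rounding error.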
PRESS and percent variance can be used to determine the optimal number of PCs, but they are not as sensitive as VPRESS (validated predicted residual error sum of squares), an analogue of PRESS. For this purpose, leave-one-out cross-validation was selected in Pirouette [19], and VPRESS values were obtained for all numbers of PCs. Leave-one-out cross-validation is a procedure in which n reconstructions of the matrix X are obtained: the i-th sample is deleted from X, PCA is carried out, and the reconstructed matrix X̂_cv is stored. At the end of leave-one-out cross-validation, the average X̂_cv is calculated, and the residuals for the i-th sample form a row vector obtained using Eq. (6):
$$ \hat{\mathbf{e}}_{\mathrm{cv},i} = \mathbf{x}_i - \hat{\mathbf{x}}_{\mathrm{cv},i} \qquad (6) $$
VPRESS is then obtained in the same way as PRESS, by Eq. (7):
$$ \mathrm{VPRESS} = \sum_{i=1}^{n} \hat{\mathbf{e}}_{\mathrm{cv},i}\,\hat{\mathbf{e}}_{\mathrm{cv},i}^{\mathrm{T}} \qquad (7) $$
HCA and PCA, also known as exploratory data analysis methods [17,18], are very useful for the chemical interpretation of chemometric and statistical results. Examples of their application include quantitative relationships between molecular structure and measured biological activity [20,21] or physicochemical properties [22], structural features of molecules [12,13,23], molecular modelling problems [24], and analysis of statistical results [25], among others.
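Leave-one-out cross-validation can be sketched in the same way. The version below is a common textbook variant assumed for illustration, not necessarily Pirouette's exact procedure (the text mentions averaging reconstructions, and Pirouette may also redo preprocessing for each subset): each sample is removed, loadings are fitted on the remaining rows, the removed sample is projected onto the first k loadings and reconstructed, and its residual row vector (Eq. (6)) is accumulated into VPRESS (Eq. (7)). The function name and the data are hypothetical.

```python
import numpy as np

def loo_vpress(X, k):
    """Leave-one-out VPRESS for a k-component PCA model (sketch of Eqs. (6)-(7))."""
    X = np.asarray(X, dtype=float)
    vpress = 0.0
    for i in range(X.shape[0]):
        X_train = np.delete(X, i, axis=0)        # delete the i-th sample
        mean = X_train.mean(axis=0)              # centre on the remaining rows
        _, _, Vt = np.linalg.svd(X_train - mean, full_matrices=False)
        P = Vt[:k].T                             # m x k loadings
        x = X[i] - mean
        e_cv = x - P @ (P.T @ x)                 # Eq. (6): residual row vector
        vpress += float(e_cv @ e_cv)             # Eq. (7): accumulate e e^T
    return vpress

rng = np.random.default_rng(1)
X = rng.normal(size=(12, 4))                      # hypothetical descriptor data
X = (X - X.mean(axis=0)) / X.std(axis=0, ddof=1)  # autoscaled once beforehand
for k in range(X.shape[1] + 1):
    print(k, round(loo_vpress(X, k), 4))
```

Because the k-loading subspaces are nested, this particular projection variant can only decrease as k grows, so in practice one would look for the number of PCs beyond which VPRESS stops decreasing appreciably.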
Nonlinear relationships between some variables were also analysed by parabolic (second-order polynomial) regression, first tested graphically with the online software Polynomial Regression Data Fit [26], which solves the system of normal equations in matrix form by Gauss-Jordan elimination. This software does not calculate estimated standard deviations for the regression coefficients, which is why the regression analyses with matrix inversion were carefully programmed in Scilab [14]. For each regression, the vector y of the dependent variable and the matrix X of the independent variable (first column: ones; second column: values of the independent variable; third column: squares of these values) were defined. The matrix equation y = Xc, where c is the vector of regression coefficients, was solved by a set of matrix operations, and the predicted ŷ was obtained, using Eqs. (8)-(12):
$$ \mathbf{X}^{\mathrm{T}}\mathbf{X}\,\mathbf{c} = \mathbf{X}^{\mathrm{T}}\mathbf{y} \qquad (8) $$
$$ \mathbf{D} = \left(\mathbf{X}^{\mathrm{T}}\mathbf{X}\right)^{-1} \qquad (9) $$
$$ \mathbf{A} = \mathbf{X}^{\mathrm{T}}\mathbf{y} \qquad (10) $$
$$ \mathbf{c} = \mathbf{D}\mathbf{A} \qquad (11) $$
$$ \hat{\mathbf{y}} = \mathbf{X}\mathbf{c} \qquad (12) $$
Estimated standard deviations of the regression coefficients, σ(c_i), were calculated as the square roots of the diagonal elements of the variance-covariance matrix. This matrix is the product of matrix D and the estimated variance of residuals σ², when the residuals follow a normal distribution [27]. Starting with the column vector of residuals and with k = 2 (the number of variables in parabolic regression), the following expressions were programmed:
$$ \hat{\mathbf{e}} = \mathbf{y} - \hat{\mathbf{y}} \qquad (13) $$
$$ \sigma^{2} = \frac{\hat{\mathbf{e}}^{\mathrm{T}}\hat{\mathbf{e}}}{n-k-1} \qquad (14) $$
$$ \sigma(c_i) = \sqrt{\sigma^{2} D_{ii}} \qquad (15) $$
The coefficient of determination r² and the F-ratio were determined for the F-test of a regression equation, and the t-parameter was calculated for each regression coefficient for the t-test, using Eqs. (16)-(19):
$$ \boldsymbol{\Delta} = \mathbf{y} - \bar{\mathbf{y}} \qquad (16) $$
$$ r^{2} = 1 - \frac{\hat{\mathbf{e}}^{\mathrm{T}}\hat{\mathbf{e}}}{\boldsymbol{\Delta}^{\mathrm{T}}\boldsymbol{\Delta}} \qquad (17) $$
$$ F = \frac{r^{2}/k}{(1-r^{2})/(n-k-1)} \qquad (18) $$
$$ t_i = \frac{c_i}{\sigma(c_i)} \qquad (19) $$
where Δ is the deviation vector for y, and ȳ is the vector containing the mean value of y repeated n times. The statistical significance of the regression equations was checked by the F-test using the online software P-Value from F-Ratio Calculator (ANOVA) [28], with k degrees of freedom for the numerator and n-k-1 for the denominator of the F-ratio. Two-tailed t-tests with n-k-1 degrees of freedom for all regression coefficients were carried out in the online package P Value Calculator (GraphPad) [29]. The significance level for all statistical tests in this work was α = 0.05. In this manner, the null hypothesis that a regression coefficient is not statistically significantly different from zero was tested for each coefficient. This approach is more rigorous than the F-test alone, which tests the entire equation under the null hypothesis that the regression explains none of the variance of y. Failure to reject any of these null hypotheses means that the regression equation is not reliable.
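The matrix route described above can be condensed into one function. What follows is a minimal NumPy sketch, assuming the same design matrix as in the text (ones, x, x²) and k = 2; the function and variable names are hypothetical and the data are made up. The p-values would still be obtained from the F and t distributions, for instance with the online calculators cited, or with scipy.stats.f.sf and scipy.stats.t.sf. The helper vertex() gives the position −b/(2c) of the maximum or minimum of a fitted parabola, the kind of extremum discussed for the Class models.

```python
import numpy as np

def parabolic_regression(x, y):
    """Fit y = a + b*x + c*x**2 through D = (X^T X)^-1 and return the
    coefficients, their standard deviations, t-parameters, r^2 and F."""
    x = np.asarray(x, dtype=float)
    y = np.asarray(y, dtype=float)
    n, k = y.size, 2                              # k = number of variables (x, x^2)
    X = np.column_stack([np.ones(n), x, x ** 2])  # columns: ones, x, x^2
    D = np.linalg.inv(X.T @ X)                    # inverse of the normal-equation matrix
    c = D @ (X.T @ y)                             # regression coefficients a, b, c
    e = y - X @ c                                 # residuals y - y_hat
    sigma2 = (e @ e) / (n - k - 1)                # estimated residual variance
    sd = np.sqrt(sigma2 * np.diag(D))             # sigma(c_i)
    delta = y - y.mean()                          # deviation vector
    r2 = 1.0 - (e @ e) / (delta @ delta)          # coefficient of determination
    F = (r2 / k) / ((1.0 - r2) / (n - k - 1))     # F-ratio, df = (k, n - k - 1)
    t = c / sd                                    # t-parameter per coefficient
    return c, sd, t, r2, F

def vertex(coefficients):
    """x position of the parabola's extremum, -b / (2c)."""
    _, b, c2 = coefficients
    return -b / (2.0 * c2)

# Hypothetical data: 2 + 0.5x - 0.3x^2 plus small perturbations
x = np.array([1.0, 2.0, 3.0, 4.0, 5.0, 6.0, 7.0])
y = np.array([2.25, 1.77, 0.82, -0.84, -2.99, -5.77, -9.22])
c, sd, t, r2, F = parabolic_regression(x, y)
print(np.round(c, 3), np.round(sd, 3), round(float(r2), 4), round(float(vertex(c)), 2))
```

Solving through the explicit inverse D mirrors the text's procedure and gives the variance-covariance matrix for free; for numerically difficult designs a least-squares solver would be the safer choice.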
Results and discussion

Database of stable tetrahedral homonuclear tetramers
Table 1 presents a list of tetrahedral molecules and ions observed for Groups 13, 14, and 15 of the chemical elements, with relevant references. No elements in other groups were found to form stable tetrahedra that could exist independently and be transferred to another medium. Transition metals, for example Mo [80], Ru [81], Pt [82], and Au [83], form tetrahedral units (clusters), but these are parts of more complicated species and cannot be isolated. Europium deposited on graphene exists in the form of separated tetrahedral clusters [84], but these tetrahedra cannot be transferred to other media. Among the elements of Groups 13-15, the smallest atoms (B, C, and N) do not form tetrahedra (Table 1). Therefore, only 12 elements exist in tetrahedral form: in Group 13, Al, Ga, In, and Tl as anions E₄⁸⁻; in Group 14, Si, Ge, Sn, and Pb as anions E₄⁴⁻; and in Group 15, P, As, Sb, and Bi as neutral molecules E₄ or cations E₄⁺.
Tetrahedral species of Group 13 exist as Zintl phase anions in the crystalline state. Tetrahedral anions of Group 14 are more diverse: they exist in crystalline Zintl phases, in ammonia solution (ammoniates), in molten Zintl phases, and in some other forms (Table 1). Fig. 1 shows two examples of Zintl phase anions, based on crystal structure data retrieved from the Access Structures service of the Cambridge Structural Database (CSD) and the Inorganic Crystal Structure Database (ICSD) [85], and presented with the molecular graphics software ViewerLite [86]. Among the Zintl phases cited in Table 1, one compound exhibits semi-metallic properties [34], a few Zintl phases are useful in the synthesis of diverse materials [36-38], and some may serve as precursors for nanocrystalline elements such as Si, Ge, and Pb [37,56]. Zintl phases in general have gained a great deal of attention due to their versatile properties and reactivity, leading to diverse applications [87-90]: synthesis of new materials (compounds, crystal structures, allotropes, and nanostructured materials); new materials obtained through solution chemistry; catalysis; thermoelectric, magnetic, insulating, and semi-metallic materials for various purposes; electrochemistry and photovoltaic technology; energy storage and energy conversion technologies; surface modification of solids; and more.
Tetrahedral species of Group 15 include tetrahedra of Sb and Bi, mainly in the gas phase, and diverse tetrahedra of P and As (Table 1). The molecules P₄ and As₄ exist in several forms besides the gas phase: in the crystalline phase, in thin films, in cage or intercalated compounds, and in some other forms. Capturing P₄ molecules in organic container molecules (supramolecular and polymeric) [61,63,65,66] transforms white phosphorus into non-toxic forms that are air-stable and water-soluble. In other words, this is an efficient way to store white phosphorus, especially for use in chemical syntheses. Encapsulation of P₄ molecules in nanostructures, such as single-wall carbon nanotubes [67] and C₆₀ fullerene molecules [69], is another way to obtain a stable form of phosphorus. Fig. 2 shows two tetrahedral forms of phosphorus in crystal structures obtained from the Access Structures service for the CSD and ICSD databases [85] and presented with ViewerLite [86]: P₄ in white phosphorus [58] and P₄ captured by an organic tetrahedral cage [65], both shown with their coordination spheres within van der Waals contacts.
Fig. 2. Left: a P₄ molecule (orange) surrounded by other P₄ molecules (orange) in the crystal structure of white phosphorus (ICSD: 68326) [58]. Right: a P₄ molecule (orange) surrounded by a polyphenol-polyamine supramolecule (C grey, N blue, H white) in the crystalline state (CSD: PELSOR) [65].
Capturing As₄ molecules of yellow arsenic in organic supramolecular containers [63] resolves the problems of using arsenic in synthesis: toxicity, instability, and rapid polymerisation to grey arsenic, which is unsuitable for chemical reactions. Single-wall carbon nanotubes have also been shown to encapsulate As₄ molecules of yellow arsenic effectively [63]. It is interesting to note that carbon nanotubes act as catalysts in obtaining other, even new, allotropes of arsenic or phosphorus [67,73].
Tetrahedral forms of the twelve chemical elements (Table 1) are of technological importance. Studying them is not merely a matter of theoretical consideration; it implies gaining a better understanding of the structures, properties, and reactivity of materials containing tetrahedral homonuclear species, which can have potential applications in chemical technology. While the octet rule may appear quite reasonable, it is not adequate to rationalise the existence of stable elemental tetrahedra across the periodic table, including Groups 13-15. This is why descriptors of the chemical elements of five groups (Groups 2 and 13-16) were collected, and a dataset was created (Table 2): Q, charge of the tetrahedral species (in electron units); χ, Allen electronegativity [92]; R_at, atomic radius [93] (at the density cutoff of 0.001 e bohr⁻³); χ_V, valence-state electronegativity [94]; D_e, experimental bond dissociation energy of diatomics [94]; R_0, ionisation radius [94]; R_vdW, van der Waals radius [95]; R_cov, covalent radius [96]; χ_P, Pauling electronegativity [97]; and Class, the frequency and diversity class of tetrahedral species (carefully derived from the data in Table 1). Because most of these descriptors are mutually correlated (Table 3), they should be subjected to data compression via dimensionality reduction.

Hierarchical Cluster Analysis
Selecting the best linkage in hierarchical cluster analysis (HCA) is usually a matter of trial and error [98]. Among the various methods available in Pirouette [19] (single, incremental, complete, centroid, median, group average, and flexible), the best clustering pattern of samples (i.e. of the tetrahedra from Table 2) was obtained with complete linkage, also known as the farthest neighbour method (Fig. 3). In complete linkage, the distance between two clusters is the distance between their two farthest samples. Complete linkage has two characteristic features [17]: it generates more compact clusters, and it is maximally sensitive to outliers.
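To make the complete-linkage scheme and the similarity index of Eq. (2) concrete, a minimal sketch in Python/NumPy follows. It is a naive agglomerative loop written for illustration, not Pirouette's implementation; the function names are hypothetical, the rows of X are assumed to be already autoscaled, and d_max is taken as the largest pairwise distance, which for complete linkage equals the height of the final merge.

```python
import numpy as np

def complete_linkage(X):
    """Agglomerative clustering with complete (farthest-neighbour) linkage on
    Euclidean distances; returns merges as (members_a, members_b, distance)."""
    X = np.asarray(X, dtype=float)
    D = np.sqrt(((X[:, None, :] - X[None, :, :]) ** 2).sum(axis=-1))
    clusters = {i: [i] for i in range(len(X))}
    merges = []
    while len(clusters) > 1:
        best = None
        keys = sorted(clusters)
        for ia, a in enumerate(keys):
            for b in keys[ia + 1:]:
                # complete linkage: distance between the two farthest members
                d = max(D[i, j] for i in clusters[a] for j in clusters[b])
                if best is None or d < best[2]:
                    best = (a, b, d)
        a, b, d = best
        merges.append((tuple(clusters[a]), tuple(clusters[b]), d))
        clusters[a] = clusters[a] + clusters.pop(b)   # fuse b into a
    return merges, D.max()

def similarity_index(d_ab, d_max):
    """Eq. (2): S_AB = 1 - d_AB / d_max."""
    return 1.0 - d_ab / d_max

# Hypothetical one-descriptor example with two obvious pairs
merges, d_max = complete_linkage([[0.0], [1.0], [5.0], [6.0]])
for a, b, d in merges:
    print(a, b, round(similarity_index(d, d_max), 3))
```

Tight pairs merge at high similarity indices and the final merge always lands at S = 0, which is how the dendrogram scale in Fig. 3 should be read.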
There are seven main clusters in the HCA dendrogram (clusters A-G, Fig. 3). Red clusters and sub-clusters correspond to elements having true tetrahedral species (Table 1), blue sub-clusters to elements that are not well separated from the red sub-clusters (B, S, Se, Te, Po), and black clusters to the other elements (C, N, O, and Group 2). The blue sub-clusters and black clusters represent fictive tetrahedra.
The more similar the tetrahedra are in terms of their d_XX values (average X-X bond lengths), the more defined the sub-clusters they form, as indicated by higher similarity indices. This is why several two-membered sub-clusters appear at similarity indices above 0.9. Tetrahedra of Group 13 and Group 14 elements are mixed: heavier elements appear in cluster B as sub-clusters B1 (Group 13) and B2 (Group 14), and lighter elements in cluster C as sub-clusters C1 (Group 13) and C2 (Group 14). This agrees with Table 1, in which both groups are known for their Zintl phase anions. Group 15 elements, however, are divided between clusters A and D: sub-cluster A1 contains the heavier elements, while the lighter elements are found in sub-cluster D2.
Why are the blue sub-clusters not well separated from the red ones, contrary to expectations (Fig. 3)? First, an HCA dendrogram is only a one-dimensional representation of similarities and differences among samples, so in some cases distinctions among samples cannot be visualised. Second, there are certain structural similarities between the red and blue sub-clusters within the same cluster. Group 16 contains stable homonuclear tetrameric species: S₄
Clusters A, B, C, and D form a macro-cluster at a similarity index of 0.61 (Fig. 3). The d_XX values for existing tetrahedral species (the red sub-clusters) range from 2.2 to 3.2 Å, which can be considered a condition for the existence of stable true tetrahedral species. A second empirical rule is that small two-membered red sub-clusters have differences in d_XX values ranging from 0 to 0.2 Å, while blue sub-clusters have somewhat greater differences (0.3 and 0.4 Å). The black clusters are well separated from the macro-cluster and from each other (Fig. 3). The smallest d_XX values belong to cluster F, i.e. to the smallest atoms, C, N, and O. Tetrameric homonuclear species are known for these elements: tetracarbon C₄ (square planar and linear) [108,109], C₄⁻ (square planar) [109], tetranitrogen N₄ and N₄⁺ (bent structures) [110], metastable tetraoxygen O₄ and O₄⁺ (square planar structures) [111], and octaoxygen O₈, which appears to consist of two parallel square planar units [112]. A carbon tetrahedron exists within the molecule C₄ᵗBu₄ [113]. Group 2 is divided into cluster E (light elements, hexagonal crystal structures [109]) and cluster G (heavy elements, face-centred and body-centred cubic structures [109]). The only "tetrahedra" that can be discussed in such cases are the numerous imaginary species formed from four atoms belonging to two or three different layers of the crystal structure of the pure metallic element.
Group 17 was not included in this study. Its two heavier elements, Br and I, are known to form tetrameric species: Br₄²⁻ (linear) [114], and I₄²⁺ (square planar) [115,116] and I₄²⁻ (trigonal) [117]. If included in the dataset, fictive tetrahedra of Br and I would probably mix with the red sub-clusters in Fig. 3.

Principal component analysis
Table 4 contains the basic statistics of the principal component analysis (PCA). For each number of principal components (PCs) used, five quantities are shown: variance, percent variance, cumulative percent variance, VPRESS for validation, and PRESS for data reconstruction with the selected number of PCs [22-25]. The three-dimensional scores plot (Fig. 4) shows the tetrahedra coloured by their Class value (Table 2): red (1, for non-existing or fictive tetrahedra), green (2), blue (3), pink (4), and black (5). A higher Class value indicates greater frequency and diversity of the materials in which the tetrahedra were identified. The scores plot differs significantly from the HCA dendrogram (Fig. 3). First, all real tetrahedra (green, blue, pink, and black) are separated from all fictive tetrahedra (red), whereas some mixing occurs in the HCA dendrogram. Second, the space of the real tetrahedra can be contoured by an ellipse-like shape in the plane of the projection; the real tetrahedra lie close to Group 16 and boron. Third, mixing within the ellipse-like shape is minimal: Groups 13 and 14 are not mixed as in HCA, with one exception, Pb, which is positioned among the Group 13 elements.
The ellipse-like shape in Fig. 4 reveals several distinctions among the tetrahedra that are not apparent in HCA and cannot be explained by the simple octet rule. There are three clusters in Fig. 4, separated by straight brown lines: Group 13 with Pb, Group 14 without Pb, and Group 15. Group 15 contains tetrahedra with no charge or a small positive charge (1+), whereas the other two clusters comprise tetrahedra with high negative charges (4− and 8−). With respect to Class values, there is some regularity within the ellipse-like shape: low Class values (green and blue) occupy mainly its central part, while high Class values (pink and black) are divided into left and right branches, with some intermixing of these areas around the species of As and P. Interestingly, the metallic, metalloid, and non-metallic character of the elements can also be traced within the ellipse-like shape and its surroundings. Species of the metalloids As, Sb, Ge, and Si form a diagonal line (purple), which continues to include red Te, Se, and B. To the right of the metalloid species, the non-metallic P species form a continuous space with other red non-metal species. To the left of the metalloid species are the species of the metals Sn, Pb, and Group 13, and outside the ellipse-like shape are the species of the other red metals, those of Group 2 and Po.
The scores plot (Fig. 4) and its chemical meaning confirm that three principal components are indeed required to describe the data from Table 2. How can the PCA scores be understood in terms of the original descriptors? Table 5 contains the loadings matrix for the first three principal components. The loadings of each PC divide the descriptors into two groups: for PC1, electronic (italics) and steric (bold) descriptors; for PC2, valence-state (bold-underlined) and other (plain) descriptors; for PC3, non-bonding (plain-underlined) and bonding-interaction (italics-bold) descriptors. PC1 is a general PC, a stereoelectronic property that takes into account atomic size and electronic content. PC2 is a property that distinguishes valence-state from ground-state descriptors; in other words, PC2 accounts for electron acceptance/release, i.e. reduction/oxidation behaviour.
The meaning of PC3 according to the loadings (Table 5) is rather uncertain, due to its small contribution to the total variance (3.4 %). Two-dimensional scores plots may help in understanding the nature of this PC. The two-dimensional PCA scores plots (Figs. 5-7, coloured as in Fig. 4) illustrate the meanings of the PCs in terms of the positions and grouping of samples (tetrahedral species). Green confidence ellipses represent the 95 % confidence level. Trends observed in Fig. 4 are less clearly visible in Figs. 5-7. In the PC1-PC2 space (Fig. 5), real tetrahedral species are well grouped within a central brown ellipse, clearly separated from all fictive tetrahedra. A detailed analysis of Fig. 5 yields a condition for real tetrahedra, PC2 ≤ 1.25, and another condition derived from the inequality of the ellipse: PC1 + 4PC2 ≤ 6.34. Although the ellipse was drawn arbitrarily, it is useful for qualitative considerations; a more rigorous, quantitative approach would require including more chemical elements in the PCA to construct an exact ellipse. With respect to the coordinate axes, smaller, more electronegative atoms of non-metallic elements are located at high positive values of PC1. The positive end of PC1 is also characterised by positive contributions of the electronegativity descriptors, the charge Q, and the dimer dissociation energy D_e to PC1 (Table 5); in other words, high values of these descriptors describe the elements at the positive end of PC1. In contrast, larger atoms of more electropositive, metallic elements lie at high negative values of PC1, because this end of PC1 is determined by the negative contributions of the radius descriptors (Table 5), so that atoms with large radii are placed there. Practically the same trend along PC1 exists within the central ellipse and is clearly visible, because PC1 contains the largest fraction of the total variance (74.5 %). Tetrahedra of metalloids are positioned in the central and upper part of the small ellipse. Samples (tetrahedral species) at negative values of PC2 are usually of electron-deficient elements that either accept electrons in forming covalent bonds (B, C, Be) or tetrahedral anions (Al, Ga, In), or ionise (Mg, Ca, Sr, and Be as an anomalous sample). This trend is caused by the high negative contribution of the valence-state electronegativity χ_V to PC2 (Table 5); thus, these elements are well described by greater values of χ_V. Tetrahedra at the positive end of PC2 belong to elements that are mainly richer in valence electrons, so that they do not accept electrons to form tetrahedra (Sb, As, P) or other covalent structures (Group 16). The loadings matrix (Table 5) shows positive contributions of most descriptors, especially of the charge Q, to PC2; high values of these descriptors thus characterise the species at the positive end of PC2. PC2 contains a smaller fraction of the total variance (16.5 %), which is why the trends of the samples along it are somewhat less clear than for PC1.
Interesting trends appear when one inspects whether the chemical elements along PC1 and along PC2 follow the same order as in their respective groups of the periodic table, i.e. the order of increasing atomic number. Along PC1, from its positive end toward its negative end, all elements follow the same order as in the periodic table. The same holds for PC2 when the elements are observed from its negative end to its positive end. This arrangement means that Groups 2 and 13-16 are positioned in diagonal directions, almost parallel to each other. Existing tetrahedral species in the PC1-PC3 space (Fig. 6) occupy a retort-shaped region, which is clearly less regular than the ellipse in Fig. 5, and no condition for real tetrahedra can be derived from this figure. In Fig. 7, the samples are situated in a complicated kidney-shaped region, with real and fictive (red) tetrahedra mixed on its right side. Trends along PC2, as described for Fig. 5, are somewhat clearer in Fig. 7, because PC2 is now the horizontal axis defining the direction of the major axis of the confidence ellipse. Notably, the descriptors that account for non-bonding interactions (D_e and R_vdW) show high positive contributions to PC3, while the other descriptors show small positive and negative contributions to this PC (Table 5). PC3 contains a small fraction of the total variance (only 3.4 %), and trends along it are not clear in the scores plots. In a very general sense, PC3 may be said to discriminate elements with respect to the size of the species they form in compounds, since the size of species is always related to the intensity of non-bonding interactions. Samples at high positive values of PC3 represent two types of elements that form larger species (Figs. 6 and 7): very large cations (Ba, Ca, Sr), and complicated molecules with several homonuclear covalent bonds (B, C, N). At negative values of PC3 are elements that form smaller species, i.e. smaller cations (Mg, Tl, Po, In, Pb, Ga). Group 16 elements build less complicated molecules (only two covalent bonds per atom), so the corresponding samples are placed at less negative values of PC3. Contrary to PC1 and PC2, PC3 introduces some distinction among the five groups of chemical elements. Moving from the positive to the negative end of PC3, Groups 13-15 maintain their periodic-table order, with the exception of Sb and As, which swap places. Group 2 and Group 16, however, are oriented contrary to their order in the periodic table, with some irregularities such as Mg and Be swapping places and O appearing between Se and Po. Therefore, PC3 can serve as a discriminating factor between Groups 13-15 (where real tetrahedra exist) and Groups 2 and 16 (where real tetrahedra do not exist).

Parabolic regression modelling
Table 3 shows that the variable Class has no statistically significant correlations with the other nine variables. The relationships between Class and the other descriptors were therefore reinvestigated for possible non-linearities, with special emphasis on Groups 13-15. The same was done for the relationships between Class and the selected PCs. The results, summarised in Table 6, are unexpected. Parabolic (second-order polynomial) regression models y = a + bx + cx², with estimated standard deviations σ_a, σ_b, and σ_c of the regression coefficients, were constructed and showed some statistical significance for the relationships between Class and eight descriptors and PC1, only for Groups 13-15. All nine parabolic models are statistically very or extremely significant in the F-test (Table 6). However, only six of the nine models are statistically very or extremely significant in the t-test for all regression coefficients: the models for the three electronegativity variables and the models for PC1, the atomic radius R_at, and the bond dissociation energy D_e. The models for the other three radius descriptors fail the t-test for only one regression coefficient. These facts indicate that the non-linear relationships between Class and the nine variables are important and cannot be ignored. All radius descriptors as functions of Class have maxima, while the other descriptors and PC1 have minima (Fig. 8). Scilab [14] was used to plot the data points for Class and PC1, together with the fitted quadratic function from the regression analysis. A detailed analysis of Fig. 8 yields a condition on PC1 for real tetrahedra: PC1 ≤ 1.69, which corresponds to Class > 1.
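The empirical bounds reported for PC1, PC2, and the average bond length d_XX can be combined into a simple screening check. The sketch below is hypothetical (the function name is ours) and is only meaningful for PC scores computed with the same autoscaling and loadings as in this work; the limits are approximate, read from Figs. 5 and 8 and from the observed 2.2-3.2 Å range of d_XX, and the bond-length window is treated here as a closed interval. As a usage example, a point with PC1 = 1.69 and PC2 = 1.25 meets the two single bounds but fails the combined one, since 1.69 + 4 × 1.25 = 6.69 > 6.34.

```python
def satisfies_tetrahedron_conditions(pc1, pc2, d_xx=None):
    """Approximate empirical conditions for real elemental tetrahedra:
    PC1 <= 1.69, PC2 <= 1.25, PC1 + 4*PC2 <= 6.34 and, if the average
    bond length d_xx (in angstrom) is given, 2.2 <= d_xx <= 3.2."""
    checks = [
        pc1 <= 1.69,               # from the parabolic Class-PC1 model
        pc2 <= 1.25,               # from the PC1-PC2 scores plot
        pc1 + 4.0 * pc2 <= 6.34,   # linear bound read from the same ellipse
    ]
    if d_xx is not None:
        checks.append(2.2 <= d_xx <= 3.2)
    return all(checks)

print(satisfies_tetrahedron_conditions(0.0, 0.0, d_xx=2.5))
print(satisfies_tetrahedron_conditions(1.69, 1.25))
```

Because the bounds were derived qualitatively from a small dataset, a passing result should be read as "consistent with known tetrahedra", not as a prediction that a tetrahedron exists.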
Class was derived from the data in Table 1 and is therefore literature-based: it measures both the frequency and the diversity of tetrahedral forms reported in the literature. However, its modelled relationships (Table 6) indicate that Class reflects some intrinsic, basic property of Groups 13-15 that is well correlated with the general principal component PC1 and with eight descriptors, but not with PC2 or PC3. The Class-PC1 relationship in fact involves three quantities (Class, the square of Class, and PC1), which again confirms that the real tetrahedra are well described in a three-dimensional space.

Conclusion
This work deals with the identification of chemical elements that occur as stable tetrahedral homonuclear tetramers (neutral molecules and ions) and with their forms of occurrence. Furthermore, the tetrahedral structures and properties are rationalised, and certain limits that distinguish these elements from the other elements are given. A database of tetrahedral homonuclear species was compiled together with a dataset of elemental descriptors. The dataset was then analysed using correlation analysis, hierarchical cluster analysis, principal component analysis, and parabolic regression with appropriate statistical tests. The results can be summarised as follows.
1) In the literature, tetrahedral species have been experimentally identified for 12 chemical elements. These elements belong to Groups 13-15 and Periods 3-6: Al, Ga, In, Tl, Si, Ge, Sn, Pb, P, As, Sb, and Bi.
2) These elements occur in 39 tetrahedral forms (gas phase, solid Zintl phases, thin films, Zintl phase ammoniates, cage compounds, and others).
3) A simple octet rule for p-elements explains why only Groups 13-15, with 3-5 valence electrons, may form such tetrahedra.
4) Hierarchical cluster analysis clearly separates the 12 elements from the Group 2 elements and from C, N, and O, but does not separate them from the Group 16 elements and B.
5) The average tetrahedral bond length should lie in the range 2.2 Å < $d_{XX}$ < 3.2 Å.
6) Principal component analysis with parabolic regression yields approximate conditions on the principal components: PC1 ≤ 1.69, PC2 ≤ 1.25, and PC1 + 4·PC2 ≤ 6.34.
7) Principal component analysis with two and three principal components explains rather well various properties and groupings of the tetrahedra. PC3 discriminates Groups 13-15 from Groups 2 and 16 by the order of the elements relative to their order in the periodic table.
8) The tetrahedra are shown to be a three-dimensional phenomenon, because three quantities are required to describe the set of tetrahedra.
9) The frequency and diversity variable Class is shown to reflect a basic property on which PC1 and eight descriptors depend parabolically; this property should be investigated further.
10) Class, its square, and PC1 reconfirm that real tetrahedral species are a three-dimensional phenomenon.
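The approximate limits in items 5 and 6 can be combined into a simple screening check. The sketch below is illustrative only: the function name is hypothetical, the thresholds are the approximate values stated above, and pc1 and pc2 are assumed to be scores computed with the same standardised PCA model as in this work (which is not reproduced here). Passing the check is at most an indication, not proof, that an element forms real tetrahedra.

```python
def meets_tetrahedron_conditions(d_xx, pc1, pc2):
    """Hypothetical screen using the approximate limits from the conclusions.

    d_xx: average X-X bond length in angstroms.
    pc1, pc2: principal component scores, assumed to come from the same
              PCA model as in this work.
    """
    bond_ok = 2.2 < d_xx < 3.2                       # item 5
    pc_ok = pc1 <= 1.69 and pc2 <= 1.25 and pc1 + 4 * pc2 <= 6.34  # item 6
    return bond_ok and pc_ok


# Invented inputs: each PC limit holds separately, but PC1 + 4*PC2 = 6.5 > 6.34.
print(meets_tetrahedron_conditions(2.5, 1.5, 1.25))
```

The combined condition PC1 + 4·PC2 ≤ 6.34 matters on its own: as the invented example shows, a point can satisfy PC1 ≤ 1.69 and PC2 ≤ 1.25 separately and still be excluded.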
Many of the tetrahedral forms occur in materials of diverse technological importance. A deeper understanding of the structures and properties of these tetrahedra is therefore crucial, and more such studies should be conducted in the future.

Table 1 contains the average tetrahedral $d_{XX}$ bond lengths for each tetrahedral form, based on the references used. The $d_{XX}$ values may help in understanding how the size of the tetrahedra correlates with other properties of the elements across the periodic table. As expected, these values increase down the groups and decrease across the periods, although the trends are not always regular (compare Al and Ga). The $d_{XX}$ values vary slightly among different forms of the same element.

3.2 Dataset of selected molecular descriptors
According to Table 1, the first three groups of the p-block are clearly characterised by tetrahedral species: all post-transition metals (Al, Ga, In, Tl, Sn, Pb, Bi), some metalloids (Si, Ge, As, Sb), and a non-metal (P). What do all these elements have in common? A simple octet rule may explain the need for three chemical bonds per atom in a tetrahedral species: the 5 valence electrons of Group 15 are enough to form neutral tetrahedra, whereas the 4 valence electrons of Group 14 are not sufficient, so 4 more electrons are necessary and anions with charge 4− are formed. Group 13 has 3 valence electrons, meaning that 8 more electrons must be gained to form a tetrahedral anion with charge 8−. What about the neighbouring groups? The right neighbour, Group 16, contains more electronegative elements that would have to lose electrons to form a tetrahedral cation with charge 4+, which is very unlikely. Vernon 91 has shown various relationships in the periodic table. For example, he pointed out the Be-Al diagonal relationship and the placement of Mg directly before Al, and he suggested the possibility of treating the main group elements as one block. Accordingly, Group 2 (the alkaline earth metals, an electropositive group) was considered as the left neighbour of Group 13. This group is not capable of capturing enough electrons to form a highly negative anion with charge 12−.
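The electron count in the octet argument above can be written as one formula. Assuming, as in P4, that each vertex atom forms three X-X bonds and keeps one lone pair, each atom must supply 5 electrons of its own (3 for bonding and 2 for the lone pair), so a homonuclear X4 tetrahedron carries a charge of 4 × (valence electrons − 5). The short sketch below is our restatement of that arithmetic, with a hypothetical function name; it reproduces the charges 0, 4−, 8−, 12−, and 4+ discussed above.

```python
def tetrahedral_charge(valence_electrons):
    """Charge of a homonuclear X4 tetrahedron under a simple octet count.

    Each vertex atom forms three X-X bonds (3 of its own electrons) and keeps
    one lone pair (2 electrons), so it needs 5 valence electrons. Every missing
    electron must be gained (negative charge); every surplus one must be lost
    (positive charge).
    """
    return 4 * (valence_electrons - 5)


for group, valence in [("Group 2", 2), ("Group 13", 3), ("Group 14", 4),
                       ("Group 15", 5), ("Group 16", 6)]:
    print(group, tetrahedral_charge(valence))
```

The growing magnitude of the charge away from Group 15 in either direction mirrors the argument in the text: a 12− anion for Group 2 or a 4+ cation of electronegative Group 16 elements is not expected to be stable.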

Table 2 - Descriptors of tetrahedral species: charges, elemental properties, and frequency and diversity class

The $d_{XX}$ values from Table 1 were used with lower precision in Fig. 3; for the other elements, known bond lengths 99 were added with the same precision.

Table 6 - Parabolic regression modelling of PC1 and descriptors as functions of the variable Class

Models are for Groups 13-15 only; using other PCs or Groups 2 and 16 did not yield sensible non-linear models. Class was not expected to be the independent variable; on the contrary, it was expected to be the modelled variable. F-test parameters: r, Pearson correlation coefficient; F, F-ratio; p, corresponding probability. Student's t-test parameters: $t_a$, $t_b$, and $t_c$ are the t-parameters for a, b, and c, respectively, and $p_a$, $p_b$, and $p_c$ are the corresponding probabilities. In Table 6, values of t-parameters and probabilities are set in bold where statistically significant and in italics where borderline significant.
atomic radius (at density cutoff 0.001 e Bohr⁻³)
i-th row vector of the data matrix X in PCA
$\hat{x}_i$ - i-th row vector of the reconstructed data matrix $\hat{X}$ in PCA
vector containing n times the mean value of y
$\hat{y}$ - vector of predicted values of the dependent (modelled) variable in regression analysis
${}^{cv}X$ - average data matrix from cross-validation in PCA
${}^{cv}\hat{x}_i$ - i-th row vector of the matrix ${}^{cv}X$ from cross-validation in PCA
$x_i$