Skip to main content

Quantitative shape analysis with weighted covariance estimates for increased statistical efficiency



The introduction and statistical formalisation of landmark-based methods for analysing biological shape has made a major impact on comparative morphometric analyses. However, a satisfactory solution for including information from 2D/3D shapes represented by ‘semi-landmarks’ alongside well-defined landmarks into the analyses is still missing. Also, there has not been an integration of a statistical treatment of measurement error in the current approaches.


We propose a procedure based upon the description of landmarks with measurement covariance, which extends statistical linear modelling processes to semi-landmarks for further analysis. Our formulation is based upon a self consistent approach to the construction of likelihood-based parameter estimation and includes corrections for parameter bias, induced by the degrees of freedom within the linear model. The method has been implemented and tested on measurements from 2D fly wing, 2D mouse mandible and 3D mouse skull data. We use these data to explore possible advantages and disadvantages over the use of standard Procrustes/PCA analysis via a combination of Monte-Carlo studies and quantitative statistical tests. In the process we show how appropriate weighting provides not only greater stability but also more efficient use of the available landmark data. The set of new landmarks generated in our procedure (‘ghost points’) can then be used in any further downstream statistical analysis.


Our approach provides a consistent way of including different forms of landmarks into an analysis and reduces instabilities due to poorly defined points. Our results suggest that the method has the potential to be utilised for the analysis of 2D/3D data, and in particular, for the inclusion of information from surfaces represented by multiple landmark points.


The introduction of geometric morphometrics has laid the foundations for a quantitative description of shapes and shape differences, thus revolutionising the century old quest for comparing anatomical features of organisms [1]. It is now also increasingly used to link quantitative descriptions of shape with developmental processes and associated genetic factors [2]. This process generally involves the construction of a parametric model based upon exemplar biological shape specimens, and the most popular of these are linear models. These are used to quantify and predict the correlations in shape variation between and within species. The objectives of this paper are to improve the statistical efficiency of analysis techniques used in the genetic interpretation of shape variation (morphometrics) and to broaden the scope of problems which can be tackled with shape analysis tools. In particular we believe that much shape data is not suitable for use in current approaches, and ‘semi-landmarks’ (those poorly localised in one direction and the majority of measurements for smooth 3D shape) cannot be appropriately utilised [3, 4].

Over a decade during the 70’s, bio-mathematical and biometrical aspects of biological shape studies were treated separately. This early work was later criticised during the 80’s by Bookstein [5], Goodall [6] and Kendall [7]. Later, Bookstein [8] worked towards converging notations from Goodall, Kendall and himself, for the biometric analysis of landmark data in a bio-mathematically interpretable framework of shape. As a consequence of these efforts, the standard method for analysis of variation in landmark position is generally regarded as ‘Procrustes’. It comprises a least-squares alignment of a set of landmark features to a mean shape, and this is often followed by eigenvector analysis of the linear correlations in variation around that mean. While the technique is now very popular the approach has several limitations with regard to the types of variation with which it can deal. One of these limitations is due to the assumption associated with taking least-squares differences and eigenvector summaries of distributions. Though many regard these as simply definitional, and in particular associated with ‘shape’, any statistical interpretation suggests that data are measures with homogeneous noise. On the other hand, the Mantel test [9, 10] has sometimes been used as an alternative to Procrustes distance to compute correlation between distance matrices (usually symmetric). Though many papers have been published in this area, we are aware of no work in this, or any related, area of point distribution modelling that has provided a framework to allow data to be analysed according to a measurement process.

Although landmarks are generally carefully chosen in order to allow accurate measurements of position within the image, problems will occur if ‘semi-landmarks’, measured from smooth curves or surfaces and only accurately localised in one dimension, are input to the analysis. Landmarks with a high degree of variability can act as outliers in the alignment stage, generating correlated compensating shifts and rotations of the other points. As PCA aims to describe the main sources of variation, high levels of such correlated movement will then necessarily contaminate the extraction of eigenvectors [11]. This contamination cannot be considered a generic variation, as it has occurred purely due to the uncertainty in the measurement. This in turn follows from the subjective definition of the landmark leading to the view that problems can be avoided via appropriate definition. The mathematical concept of homology (and mapping) underlies many of the considerations behind much theoretical work that is described with the mathematical formalisms of isomorphism. Because of such restrictions on the definition of landmarks, semi-landmarks were introduced [12] in order to allow inclusion of other points which are not homologous among the specimens. By this we mean that a unique corresponding location can not be defined. Measurement at these locations must be regularised by a constraint, such as bending energy [12, 13], in order to recover the information missing due to the nature of local structure.

From a statistical perspective a homology (in this context) must be augmented by distributions indicative of the extent to which a correspondence can be established. The standard way to deal with inappropriate weighting of data in a least-squares fit is to generalise the least-squares cost to a Mahalanobis distance, computed using measurement covariances. By avoiding the requirement of specifying a unique homologous location, this has the advantage of accommodating varying precision in measured data without having to try to re-create missing data. There have been several attempts in the literature to include measurement errors for landmark points. For example, Fitzpatrick et al. [14] worked on the relationship between localisation error and registration error in rigid-body, point-based registration. Chui and Rangarajan [15] proposed a general framework for non-rigid point matching, where outliers are effectively rejected. Rohlf and Slice [16], and Walker [17] investigated how to estimate measurement covariances in forms. However, Richtsmeier et al. [18], Adams et al. [1] and Rohlf [19] all stated that further research was needed in this area. Also, Walker [17] and Lele [20] concluded that generalised Procrustes analysis (GPA) estimators of the variance-covariance matrix are flawed. Despite the fact that some biologists have noticed these problems, they seem to know of no available alternatives and continue to use GPA to estimate covariances [21].

Text books [22] state that using weighted Procrustes does not lead to a Kendall’s shape space. Claiming that “statistical analysis cannot employ parametric models”, they suggested that resampling-based methods must be used instead. Another reason for rejecting the idea of a weighted Procrustes was said to be a “lack of clear criteria for determining appropriate weighting of semi-landmarks”. These criticisms can only really be interpreted once a method for weighting is specified. Goodall [23] suggested a method in which the same covariance was used for all landmarks. By this we mean there was no separate description of the perturbation of individual landmarks. It has been noted that such a matrix is inestimable [24]. Goodall himself acknowledged that “as a model of measurement error this is a drawback, as the direction of greatest variation may vary considerably between landmarks”. Despite this problem, later work [25] generalised this idea to a Bayesian framework. We believe that it makes sense instead to suggest an approach which can support the process of landmark location as measurement, with a covariance describing the localisation of each landmark separately (see [26, 27] for example). Specifically, Rohr et al. [26] used covariance matrices in a Mahalanobis distance form for non-isotropic data, where covariances were estimated from image data through landmark localisation, i.e. using grey-value information from local pixels around each landmark for matching an image area/volume structure to another through optimisation of a cost function. The minimal localisation uncertainty for each point were estimated using the Cramer-Rao Bound (CRB). Also, smoothness was included as the second term in their functional and controlled using a regularisation parameter. To our knowledge, they have been the first to provide a relatively comprehensive approach for incorporating anisotropic covariances into image registration using splines. However, here we only deal with pre-defined landmark data and, unlike their method (and our recently published method [28]), do not attempt to extract landmarks and their corresponding covariances from image data. Specifically, in [28], we have applied smoothing to local edge data (where information is) prior to optimisation in order to remove the effects of spatial noise and obtain meaningful CRB estimates. However in our current study, the only input data fed to our method are a number of shapes represented by fixed landmark points. Hence, we do not take into account any information about the local structure surrounding each landmark. This way, the task of covariance estimation may be seen even more challenging. We are aware that in biological studies it is now commonly accepted that for point-based shapes, extra information about the local/global pixels in the image plane/volume (for 2D/3D data) is usually available using modern imaging equipment. However, here our observation is that geometric morphometrics should originally be capable of dealing with the study of 2D/3D forms [18] even for non-biological data or cases where information about the local structure around each landmark is missing or difficult to access or process. It is worth mentioning here that one reason why Procrustes still is popular is that apart from the forms (shapes) represented by landmark points it does not require any further data such as images from which the points have been originated. Hence, even though the datasets we use in our experiments are biological and one could also feed in the image data, in this study we chose to start the process from pre-defined landmarks only. Ideally, covariances extracted from image data (using other methods such as ours [28]) could be fed to our current method and be used, for instance, as initial estimates. This is however a subject for future investigation.

There have been further publications on anisotropic weighting, for instance in [29, 30]. Mathematically, these methods are all equivalent to our approach, in that they use a Mahalanobis distance based upon anisotropic distributions of individual points. However, they do not have a well-defined mechanism for the estimation of these distributions. This is a key issue when applying these ideas to shape samples. Our work provides such a mechanism while incorporating corrections for estimation bias [7]. The basic concept can be implemented via a standard technique used in pattern recognition, often referred to as whitening [31]. For instance, in the context of shape analysis, the whitening transform and shape de-correlation were used as a preprocessing step in PCA/ICA analysis [32, 33]. However, there is a difference between using whitening methods to model the signal variation of data (as used in these papers) and using the same technique to better construct a likelihood function that accounts for correlation in measured data (as we do here). Recently, the technique has been applied to the within group biological covariances [34], but again not to the process of noise on measurements. Here, we shall investigate possible generalisations of Procrustes along these lines, and the different ways such a measurement covariance may be estimated. As a key issue here is the computability of these covariances, the stability of the resulting analysis is an important question for investigation. The theory presented here can thus be classified in the same category as both Procrustes based shape analysis [35] and active shape models [36]. The main difference, however, being that our model is for a realisable system and self-consistent estimation of the associated model parameters.

There has been an ongoing discussion in the biology literature regarding appropriate ways to deal with non-homologous landmarks (points defined on smooth curves and surfaces) during statistical analysis. For instance, Klingenberg [37] has objected to Polly’s conclusions [38] regarding the benefits of existing homology-free approaches. He believes that these approaches all depend critically on some sense of homology since they are not really free of assumptions about the correspondence of parts. Oxnard and O’Higgins [39] have recommended that it is biology that has to inform morphometrics in planning the landmark configuration (mainly mathematical landmarks, i.e. those computed using geometric constraints based on the neighbouring true landmarks) in relation to the hypothesis available. The approach to dealing with semi-landmarks in the morphometric analysis of shape currently seems to be divided between two alternatives, both of which aim to adjust the position of these landmarks by optimising a specific metric, before constructing a linear model of variation about the mean. These metrics are bending energy (BE) and Procrustes distance (PD) [3]. Arguments for and against these approaches are based upon specific examples in biology. Although evidence has been reported of utility [40], Slice [41] has stated that the application of the BE approach to biomedical and anthropological problems has been minimal. Vignon and Pierre [4], and Prez et al. [42] have shown concern regarding the observation that different methods for handling semi-landmarks could result in different conclusions in a discriminant analysis study. Gomez-Robles et al. [43] have examined the advantages and disadvantages of different novel methods in geometric morphometric analyses including homology-free approaches, landmark-based approaches, and combinations of both techniques.

Comparison between results from shape analysis and genetics is an important research topic in evolutionary biology. For instance, Frederich et al. [44] have attempted to estimate the statistical correlation between morphological, genetic and geographical distances. We offer an alternative shape analysis method that tackles the existing problem in the literature, so that well defined comparisons become statistically valid and informative.


Suppose that there are K shapes in our data-set and each shape vector w k contains N landmark points, i.e. w k =[w1x,w1y,w2x,w2y,...,w N x ,w N y ] k for the case of 2D data. We then apply a scale s k , a rotation R k and a translation t k to the original data to get an aligned version of the data called z k , where z k =[z1x,z1y,z2x,z2y,...,z N x ,z N y ] k and z k =s k R k (w k t k ).

The mathematical description of the model so far can accommodate any value of scale or orientation for the definition of mean model. We therefore define the orientation of mean shape so that the line between a specified pair of points is horizontal. This also has the benefit that initial estimates of alignment for sample k can be set according to the relative positions of these points. We also use the average distance between these same landmarks to rescale the mean shape at each iteration so that scale remains numerically defined.

For 2D data, we assume a different but fixed 2×2 covariance matrix for each landmark derived from the measurement process. These are composed into the matrix C. This is a tri-diagonal matrix, the diagonal line of which contains data for individual landmarks. Outside of the 2×2 covariances, the off diagonal elements of C are zero, i.e. there are no correlations between landmarks. The use of a fixed data covariance cancels out when taking the weighted mean, to regenerate the conventional formula for the mean;

m= 1 K k = 1 K z k

where m=[m1x,m1y,m2x,m2y,...,m N x ,m N y ]. This definition for mean shape has previously been shown to provide unbiased estimates using Monte-Carlo re-sampling studies [19], which is to be expected for a valid likelihood estimate of parameters.

The points z k do not have uniform independent noise distributions, which is one of the assumptions for the application of PCA. However, this property can be obtained via a whitening transformation. Although transformation of data can be considered as a new space, it can also be interpreted as an affine re-projection. The points obtained by applying a whitening transformation are referred to here as ‘ghost points’. Ghost points are accordingly defined in the original coordinate system and, being scaled projections relative to the shape centroid, are an alternative way to summarise the original measurement relative to the observable structure. This is an important philosophical issue for those who believe that the original co-ordinate system is somehow more meaningful as a description of biological variation than any linear re-projection (see Discussions). The process amplifies the spatial variation in directions which are well measured relative to those which are not so that the resulting locations have isotropic errors (as required). In turn, this allows accurately measured structure to be encoded in the most significant eigenvectors (those with largest eigenvalues) of the linear model. We transform z k to ghost points g k using the matrix W so that g k T =W ( z k m ) T .

By applying singular value decomposition to C−1, i.e. C−1=UTV U, and making it equivalent to WTI W, we find that the required whitening matrix is W=V1/2U. Application of PCA to g k follows for construction of the shape covariance, giving the eigenvectors e j and eigenvalues μ j for the whitened space of ghost points as those which minimise the unexplained variance for fixed J<N, where J is the number of eigenvectors used in the model. Hence, for any specific shape example k, linear factors λ j k = e j .g k can be computed to best approximate z k with the model z k ;

F= k = 1 K g k T g k j = 1 J μ j e j T e j , z k =m+ W 1 j = 1 J λ jk e j

A genuine likelihood should be based upon the variation of the data around the assumed model. Failure to do this results in residuals which cannot be meaningfully interpreted a. Using this argument, if we wish to align to the mean shape we should use a covariance that is consistent with the distribution around the model. In order to find the best R k ,t k ,s k parameters for each k, we minimise a Mahalanobis distance which is given by

log( P k z )= ( z k z k ) T C 1 ( z k z k )

This is simply the likelihood estimate for the location of the shape given the linear model and the assumed measurement covariances and can be interpreted directly as a χ2 statistic. By replacing C with I and z k with m this reduces to the least-squares function for standard Procrustes. We can therefore interpret this as a generalisation of the standard approach. However, we do not wish to generalise further by using for example PPCA (probabilistic principal component analysis) [45], as an additional assumption of a Gaussian distribution over derived variables is generally invalidated in morphometric data sets.

Use of Eq. (3) requires an initial estimate of the model and transformed data z k . By setting the initial estimates of the measurement covariance C to an identity matrix, these parameters are given by the Procrustes result. We can therefore use Procrustes to set up the initial transformation estimates. To reach the best possible alignment using our new method (anisotropic C), we iteratively re-estimate R k , t k and s k using the assumed e j , m, C−1 and W−1. This gives us a new z k , and so a new m and F for construction of e j . For fixed covariances, convergence can be monitored via construction of the total likelihood log(P)= k = 1 K log( P k ). One may use the final estimates of z k and z k to construct the sample covariance;

C = 1 K k = 1 K ( z k z k ) ( z k z k ) T

For a well defined likelihood method this covariance should be consistent with the assumed distribution C. However, the use of free parameters during alignment and model construction introduces biases that must be addressed in an iterative analysis in order to avoid instabilities, which will now be described.

Covariance correction

When attempting to estimate C, the use of free parameters during model fitting reduces the sample covariance obtained from residuals. This mechanism is precisely that identified in [7], whereby the observable variation in any single shape sample is reduced onto a manifold in 2N−4 dimensions for a 2D shape defined for N points. This has generally been considered as a bias in the overall data distribution rather than being associated specifically with a statistical estimation error (as here). A possible outcome of this is the over weighting of landmarks leading to a runaway convergence on one landmark, during iterative estimation of C. However, this bias effect is estimable and therefore correctable, as will be illustrated by Monte-Carlo simulation (below). For a single scale parameter associated with an approximate linear vector f we can use error propagation to estimate the expected average reduction Δ in the covariance for each 2×2 landmark component of the matrix arising from errors in parameter estimates Δ n C as

Δ n C= f n T f n f T C 1 f ,ΔC= n = 1 N Δ n C

Note that the denominator is the change in χ2 expected due to a unit change in f, and f n = D n f is the 2D component of f corresponding to landmark n (where D n is an operator which zeros all but those quantities associated with the n th landmark and is for outer product between two vectors). For an eigenvector e j defined in the whitened ghost space, this would suggest a total correction of

Δ C e j = W 1 e j T W 1 e j W 1 e j T C 1 W 1 e j = W 1 e j T W 1 e j

The known structure of the covariance can be enforced by zeroing relevant off-diagonal terms. The parameters of the linear model, including scale, rotation, translation and linear model weightings can also be treated in this way. If Θ i represents one of the direction vectors of these parameters (with 2N elements for 2D data), it follows that direction vectors corresponding to translation in x and y directions Θ1=[1,0,1,0,...] and Θ2=[0,1,0,1,...] are orthogonal, i.e. Θ12=0. Similarly, direction vectors Θ3=m=[m1x,m1y,m2x,m2y,...] and Θ4=[−m1y,m1x,−m2y,m2x,...] corresponding to scaling and rotation are orthogonal, and so Θ34=0. Note that m is identical to the mean vector defined in Eq. (1).

Strictly, Kendall’s definition of shape explicitly removes aspects of object transformation before model construction. Joint estimation of shape and alignment parameters is potentially unstable as estimated linear shape parameters can correlate with transformation parameters. Here we stabilise this process by removing first order correlations from the data covariance F prior to model construction.

Hence, to orthogonalise the model, we modify ghost points as follows.

g k = g k ( g k . Ŷ i T ) Ŷ i T , Ŷ i T =W Θ ̂ i T

where the unit vector Θ ̂ i is the normalised form of Θ i . The new g k is computed iteratively using each Θ ̂ i so that any variation about the mean that could have been described by an alignment parameter is removed from the correlation matrix F prior to model construction. The corresponding measurement covariance correction term is hence given by

Δ C Θ i = ( Θ ̂ i T C 1 Θ ̂ i ) 1 ( Θ ̂ i T × Θ ̂ i )

Therefore, the measurement covariance is estimated using

C= C + j = 1 J Δ C e j + i = 1 I = 4 Δ C Θ i

Using the above formula, the contribution to the χ2 lost by using a scaling parameter associated with each vector e j and Θ i contributes a value of unity to the χ2 for each additional independent degree of freedom, totalling J+4. Our method for covariance correction is therefore consistent with a degree of freedom correction as described in conventional analysis approaches [46]. As a consequence the covariance estimation process can be considered equivalent to the Expectation-Maximisation (EM) algorithm, both in operation and parameter estimates, so that the conventional proof of convergence is applicable [47].

Extension from 2D to 3D

Here we outline the mechanism we use to extend 2D shape rotation analysis, and the extraction of corrected anisotropic measurement covariances, to 3D. The methods are demonstrated in the analysis of 3D mouse skull data, both as a test of the theory/software implementation and as an illustration of use for the identification of outlier landmarks.

The extension to 3D data is mainly involved with the mechanism of representing and estimating 3D shape rotations. We define a fixed orientation co-ordinate system from a set of 3D data-points based upon a selection of three landmark points. We then represent a rotation matrix in terms of three separate rotations about the co-ordinate axes. Finally we compute the linear vectors which approximate the first order shifts seen in the 3D points due to these rotations. These are then used in the linearised approximation for sample covariance correction, as described earlier. These extensions are enough to support a quantitative analysis of 3D landmark data, for the estimation of landmark accuracy and identification of outlier data. The mathematical model used is described in detail here and in Appendix A. We provide quantitative tests in the Results and discussion section which demonstrate the numerical stability of the algorithms using Monte-Carlo data.

Rotation matrix

Our first task is to define a co-ordinate system for a 3D data-set, from which we can define certain basic properties of orientation for the mean shape, and so that individual data samples can be approximately oriented prior to optimisation during linear model construction. In the 2D case this is done by defining the line between two landmark points in the mean model as horizontal. In 3D, in order to stay consistent with the 2D, we define 2 points to establish a horizontal, and then a third to define the vertical relative to the first two.

Given a 3D shape, we take three points P1, P2, P3, with relatively large distance from each other (Figure 1) to define the orientation plane for the shape. The rotation matrix RT is hence found based on basic vector calculations (see Appendix A).

Figure 1
figure 1

The geometry shows how to define a base-plane in 3D (consistent with the base-line in 2D) using three landmark points.

Roll, pitch and yaw angles

Given the rotation matrix that brings a data set into alignment with the preferred co-ordinate system it is possible to represent the rotation as a sequence of rotations about three orthogonal axes. According to basic 3D rotation formulas, and using α, β, and γ as yaw, pitch, and roll respectively, the 3D rotation matrix is defined as three consecutive rotations around the z, y, and x coordinate axes.

R xyz = R x (γ) R y (β) R z (α)

By making the rotation matrix RT equivalent to R xyz , we find the yaw, pitch, and roll angles (see Appendix A). Thus we can convert easily between the rotation matrix and rotation parameters.

Orientation adjustments

We initialise the rotation angles, by computing the RT matrix for every original shape in the data set based upon the three identified landmark points and extracting the corresponding α, β, and γ angles. These are then further adjusted during iterative alignment via optimisation of the anisotropic measurement-based Mahalanobis distance. We perform orientation adjustment on the mean shape following every iteration over the set of shape samples. In this case the set of yaw, pitch, and roll angles corresponding to the mean shape are subtracted from the corresponding rotation angles for each shape sample, so that the computed mean shape complies with the three-point orientation constraint.

Direction vectors

In order to correct the covariances due to alignment parameters in 3D, we need the approximate linear direction vectors corresponding to translation, rotation and scale. Computing these for translation and scale is straightforward. If m=[m1x,m1y,m1z,m2x,m2y,m2z,...] is the vector corresponding to the 3D mean shape (with 3N elements), then the direction vectors due to translation in x, y and z directions are simply given by Θ1=[1,0,0,1,0,0,...], Θ2=[0,1,0,0,1,0,...] and Θ3=[0,0,1,0,0,1,...]. Also, the direction vector due to scaling is Θ4=m.

For rotation, we compute the direction vector corresponding to each individual rotations R z , R y and R x . For the mean shape m rotated by the yaw angle α around the z axis, we have m=R z (α)m. As α becomes very small, the tangential direction of movement in landmark point n due to this rotation is

u α 0 =[ m ny , m nx ,0]

By applying the same method, one can find the direction vectors due to rotation by the pitch angle β around the y axis and by the roll angle γ around the x axis (see Appendix A). Hence we have

u β 0 = [ m nz , 0 , m nx ]
u γ 0 = [ 0 , m nz , m ny ]

It follows that Θ5=[−m1y,m1x,0,−m2y,m2x,0,...], Θ6=[m1z,0,−m1x,m2z,0,−m2x,...] and Θ7=[0,−m1z,m1y,0,−m2z,m2y,...]. The set of vectors Θ1, Θ2 and Θ3 on one hand, and the set of vectors Θ5, Θ6 and Θ7 on the other hand are mutually orthogonal and orthogonal to the vector Θ4 due to scaling. These direction vectors now constitute the linearised parameterisations needed for corrections to the sample covariance (where I=7 in Eq. 9).


Here, in Table 1, we provide the step-by-step procedure for our new shape analysis method that involves linear model construction, data alignment and anisotropic covariance estimation and correction. Note that there is an arbitrary order for the application of the transformation parameters which remains consistent throughout the whole process. In fact, whatever this order, the net effect of the covariance correction (Eqs. 7-9) is to subtract the same total linear subspace.

Table 1 The algorithmic procedure for our new method

Model selection

A method is needed to select appropriate linear model order based upon the outputs from our analysis. If the linear model is valid then estimated measurement covariances will combine two processes of statistical fluctuation. The first of these will be measurement precision σ r (our ability to define homologous points reliably), and the second will be due to random (unmodellable) biological variation σ b . So that the observed statistical variation seen in a given direction v for any landmark σ v is

σ v 2 = σ r 2 + σ b 2

Unfortunately we cannot know the expected value of σ b in advance. However, the first of these terms can be estimated via reproducibility experiments and compared to the measured directional covariances, using the observation that σ v σ r . Thus if we observe individual estimates of measurement covariance which begin to surpass the limiting accuracy known to be set by reproducibility tests, then the model must be over-fitting the data and therefore has too many parameters. We check that for a given model order this inequality is satisfied within statistical limits by considering the principle axes of each landmark measurement distribution. We use a 1% confidence level to set the hypothesis test for over-fitting. This test is expected to be most reliable for the largest variances.

Monte-Carlo tests and outlier identification

As our method is based on likelihood, we require that the assumed distribution matches the corrected covariance. The standard way to validate this is through generating Monte-Carlo (MC) data using the known distributions. In what follows we experiment with MC data and display a number of informative scatter plots for two forms of test; Test A: When applying our method to the MC data, the mean shape, eigenvectors and measurement covariances used are identical to the ones used when generating the simulated data. Test B: All parameters are estimated using the MC data in order to compare the measurement covariances estimated using the simulated data with those expected, i.e. the ones assumed when generating the MC data.

For Test A the covariances estimated using our method are expected to be within statistical sampling limits of the ones used when generating the MC data. Failure to do so is taken as an indication of a problem with the data sample (i.e. outliers). Outliers can be identified at early stages of analysis as those points which have the largest normalised residual errors.

We use 2.8 standard deviations of the error on the sample variance (or being allowed to have 1% of data falling outside the limits), where the error on the standard deviation σ is σ/ 2 ( K 1 ) with K being the number of samples [48]. Additional variance is expected for Test B (beyond that seen in Test A), where the linear model must also be estimated. Therefore, having excluded the possibility of outliers using Test A, we can interpret variations beyond the statistical limits as due to instability in linear model construction (specifically the mean and eigenvectors).

χ2 test

A test is needed to confirm the equivalence of measurement covariances computed during repeatability experiments, in order to confirm that our methods generate estimates which are consistent. This can also be done by splitting the data into two separate groups if there are a sufficient number of samples. We perform a modified χ2 test based upon the construction of corrected covariances on one data set and then used for the calculation of χ2 for the second set. For large numbers of samples ( K>30) the resulting statistic when applied to each 2D landmark is expected to be approximately Gaussian with mean 2K and variance 4K. We set the statistical test for significant difference on the basis of an allowable range of χ2/D o F corresponding to ±2.8 S.D., i.e. [0.8, 1.2] for 200 samples. The corresponding plot would confirm the stability of the method if 99% of the χ2/D o F values fall inside the range expected.

Fisher information

Fisher information (FI) is a concept for quantifying the constraint on an estimated value associated with data. It has the useful property that the amount of estimated information is linear in the quantity of data. It is generally defined according to the second derivative of a log-likelihood function, but from the association of this function and the CRB we can also observe that, for good model fits, it is proportional to the inverse variance. An empirical estimate of the FI contained in data, and associated with a particular model, can therefore be obtained from the residual distributions following parameter estimation.

We use this idea here to summarise the amount of information that has been extracted from data for a specific analysis. As this quantity scales linearly with the quantity of data it allows us to make comparative statements regarding the statistical efficiency associated with the estimation process. For example, if the FI is seen to double on the same dataset when applying an alternative analysis then this is statistically equivalent to having four times as much data to begin with. A poor analysis method might need a lot more data to reach the same level of statistical equivalence in a hypothesis test than a good method.


The animal datasets used in this paper have been approved according to German ethical standards. They were registered under number V312-72241.123-34 (97-8/07) and approved by the ethics commission of the Ministerium für Landwirtschaft, Umwelt und ländliche Räume on 27.12.2007.

Results and discussion

We have used example datasets to investigate the stability of covariance weighted shape analysis and to compare quantitative performance figures to the standard approach using Procrustes. We have selected several datasets in order to demonstrate behaviour with different quantities of data, data dimensionality (i.e. 2D and 3D) and model order.

As standard methods, even those including landmark weighting, are not conventionally used in a way that would support estimation of landmark variability we have made some assumptions regarding what would be the most straightforward approach. As mentioned earlier, in this paper we are interested in analysing point-based shape datasets without seeking to obtain extra knowledge about local structures surrounding each landmark. Hence, in conjunction with our method, we have not used methods that estimate localisation errors from the original image data such as those described in [26, 28]. For Procrustes we use the residuals from the fitted models to make an estimate of landmark measurement error (although this is widely concluded in the literature not to work [22]). For methods that would support anisotropic weighting, we use a variation of our own method (incorporating iterative re-weighted alignment) to estimate the resulting residuals during iterative analysis. The difference between this and our preferred method is the lack of correction for degree of freedom biases, we therefore refer to this as the “uncorrected” method.


We experiment with two 2D data sets of manual mark-ups (Figure 2). The first data set, called MM1, corresponds to mouse mandible micro-CT images and consists of 337 samples with 14 landmarks per sample. We also have a repeat data set, called MM2, for which same mandible images have been used to mark-up the points.

Figure 2
figure 2

Typical landmarks corresponding to sample images of fly wings (left) and mouse mandibles (right); for the fly wing data, landmarks 1-15 correspond to the original data sets FL1, FL2, FR1 and FR2, while landmarks 16-19 were added later (to FL1) in order to experiment with semi-landmarks (P-FL1).

Next, we use some fly wing data in order to test the performance of our method on semi-landmarks and also to test the statistical stability of our method. There are four original data sets available from left and right wings(L and R) of 200 female flies, called FL1, FL2, FR1 and FR2 [49]. Two images of each wing were taken from slightly different viewing positions (1 and 2), and used for marking-up in order to perform reproducibility tests [49]. Each of these four data sets has 200 samples with 15 landmarks per sample. Further, as we had access to the fly wing images, we have added four semi-landmarks to each sample of the original data set FL1. Once finished, we removed 5% outliers and stored 189 samples with 19 landmarks per sample. This resulting data set, which is called P-FL1, plays an important role in our experiments with semi-landmarks. In order to be able to test the repeatability with these added semi-landmarks, we have repeated the marking-up process only for the four new landmark points and using a subset of the left fly wing images.

We also experiment with the mouse skull (MS) 3D data of semi-automatic mark-ups (Figure 3) produced based on training examples and the corresponding micro-CT images. This makes a typical 3D data set of interest in evolutionary biology research. We have used our automatic tool to localise landmarks on these mouse skulls based on few given manual mark-up examples [50, 51]. This is based on landmark localisation technique recently described in [28] (where more details may be found). The automatic tool also identifies outliers for manual correction and so we do not expect any outlier in this data set. The mark-up data set obtained this way (MS) consists of 42 samples with 50 landmarks per sample. Further, there are two sub-sets of repeat data based on manual mark-ups (on the mouse skulls) each consisting of 12 samples to be used in repeatability tests.

Figure 3
figure 3

Typical landmarks for a sample volume (top-left) from the 3D mouse skull (MS) data when projected on the xy (top-right), zy (bottom-left) and xz (bottom-right) planes (the 50 points are too close to display full numbering).

Model selection

In Figures 4, 5 and 6, we plot the eigenvalues corresponding to the errors estimated against those computed from the repeat data. These are the magnitudes of the errors in the direction of major eigenvectors. It can be seen that while for the fly wing data the errors are comparable (Figure 4), for the mouse mandibles there are several landmarks for which the error estimates are much larger than expected (Figure 5). We cannot argue for an increased model order as this then reduces other values to well below the observed repeatability (over fitting). As the additional variance seen is due to the inability of the model to predict correlations in the data, our conclusion must be that either this data is not well described by a linear model, or the repeatability estimate systematically underestimates the true accuracy with which points can be meaningfully located. This can happen if local image features (which are themselves not well biologically related to the main structures, such as the brightest pixel) are used to identify locations. The plot for the more complex mouse skull data (Figure 6) suggests that 14 components is about the number needed by the linear model. Hence, when experimenting with the 3D MS data we use 14 model components, while using 6 components with the 2D MM data and 2-3 components with the 2D FL and P-FL data. The 1% allowable range is set in accordance with 12 repeat data samples.

Figure 4
figure 4

Fly wing data (P-FL1): major eigenvalues of the error using our 1, 2 and 3 component models against those computed using a repeatability test on four new semi-landmarks placed manually on a subset of data; the 2 component model gives closest agreement to the expected localisation values; the two dashed lines show the ±2.8σ range.

Figure 5
figure 5

Mouse mandible data: major eigenvalues of the error estimated using our 5, 6 and 7 component models on MM1 data against those computed using the corresponding repeatability test (MM1 and MM2); the two dashed lines show the ±2.8σ range.

Figure 6
figure 6

Mouse skull data: major eigenvalues of the error estimated using our 8, 10, 12 and 14 component models for the 3D MS data against those computed using the corresponding repeatability test; the two dashed lines show the ±2.8σ range.

Monte-Carlo tests

We show the Monte-Carlo plots for the Test A in Figures 7, 8 and 9, while the Figures 10, 11 and 12 show those for the Test B. The results for the Test A on 2D data sets (Figures 7 and 8) indicate very little difference for the low parameter fly wing data, and a more obvious systematic underestimate of covariance (as expected) for the 6 dimensional mouse mandible data (prior to correction).

Figure 7
figure 7

Fly wing data (P-FL1): error eigenvalues estimated using the Monte-Carlo data (where mean shape, eigenvectors, and measurement covariances are identical to the model which generated the simulated data) against the expected ones (Test A); using 2 model components; there is only marginal evidence of estimation bias before correction;the two dashed lines show the ±2.8σ range.

Figure 8
figure 8

Mouse mandible data (MM1): error eigenvalues estimated using the Monte-Carlo data (where mean shape, eigenvectors, and measurement covariances are identical to the 6-component model which generated the simulated data) against the expected ones (Test A); for this number of parameters there is now evidence of a systematic underestimate of covariance (prior to correction); the two dashed lines show the ±2.8σ range.

Figure 9
figure 9

Mouse skull data (MS): error eigenvalues estimated using the Monte-Carlo data (where mean shape, eigenvectors, and measurement covariances are identical to the 14-component model which generated the simulated data) against the expected ones (Test A); the two dashed lines show the ±2.8σ range.

Figure 10
figure 10

Fly wing data (P-FL1): error eigenvalues estimated using the Monte-Carlo data against the expected ones (estimated using the original data) which were used when generating the simulated data; independent models (Test B); using 2 model components; the two dashed lines show the ±2.8σ range.

Figure 11
figure 11

Mouse mandible data (MM1): error eigenvalues estimated using the Monte-Carlo data against the expected ones (estimated using the original data) which were used when generating the simulated data; independent 6-component models (Test B); for this number of linear model components there is considerable error in the uncorrected estimates; the two dashed lines show the ±2.8σ range.

Figure 12
figure 12

Mouse skull data (MS): error eigenvalues estimated using the Monte-Carlo data against the expected ones (estimated using the original data) which were used when generating the simulated data; independent 14-component models (Test B); the two dashed lines show the ±2.8σ range.

Further, the results for the Test B (Figures 10 and 11) indicate that even for the mouse mandible data, the values of covariance are significantly different, due to the amplification of initial estimation bias during the process of iterative linear model estimation. The correction process now removes these instabilities bringing estimated covariances back close to the expected sampling limits and symmetrically around the expected correlation line.

Turning to the 3D MS data, for the Test A (Figure 9) the eigenvalues fall inside the allowable range (dashed lines). However for the Test B (Figure 12), the eigenvalues appear to fall under the lower bound. The under-estimation seen is in accordance with a correction factor based upon the number of samples and model complexity (KJ)/K. Unlike the earlier biases this under-estimation does not destabilise the analysis, as a common multiplicative change on all variance estimates leaves the estimated model parameters unaffected.

Note that the equivalent residual distributions estimated here from the conventional Procrustes analysis have no associated correction process and (along with uncorrected estimates from our own algorithm) are probably indicative of anything which could be attempted based upon estimating sample covariances for existing weighted methods.

Here, we compare the results obtained using Procrustes (Figures 13, 14 and 15) to those shown earlier (Figures 10, 11 and 12) using our likelihood-based method. To produce such quantitative results, we apply Procrustes to the same Monte-Carlo data which were generated based on our corrected covariances. Following Procrustes alignment, eigenvalues are computed using the remaining error residuals. These eigenvalues are then plotted against the expected ones, where values on horizontal axis are identical to those used in Figures 1012. Clearly, (and in contrast to Figures 1012) in all plots corresponding to Procrustes the measured values are not within the predicted statistical limits (dashed lines). By inference, the linear model vectors constructed using Procrustes are contaminated by random errors associated with poorly measured landmarks, as expected. When compared to the plots from our weighted method with the correction process switched off (e.g. Figure 13 compared to Figure 10), eigenvalues extracted from Procrustes residuals are further away from the expected values. As seen in the figures, changing the number of degrees of freedom of the model is also not sufficient to correct this issue. We can conclude that Procrustes generates a linear model which is a less efficient description of the true information contained in the data.

Figure 13
figure 13

Fly wing data (P-FL1): error eigenvalues computed using the residuals after Procrustes alignment on the Monte-Carlo data, against the expected ones which were used when generating the simulated data; for 2, 3 and 4 model components; the two dashed lines show the ±2.8σ range.

Figure 14
figure 14

Mouse mandible data (MM1): error eigenvalues computed using the residuals after Procrustes alignment on the Monte-Carlo data, against the expected ones which were used when generating the simulated data; for 4, 6 and 8 model components; the two dashed lines show the ±2.8σ range.

Figure 15
figure 15

Mouse skull data (MS): error eigenvalues computed using the residuals after Procrustes alignment on the Monte-Carlo data, against the expected ones which were used when generating the simulated data; for 12, 14 and 16 model components; the two dashed lines show the ±2.8σ range.

Shape analysis

In Figures 16, 17 and 18, we show the anisotropic error bars computed using the eigenvectors and eigenvalues of the 2×2 covariance matrices for the 2D data sets. All error bars are rescaled for visualisation purposes (see captions). Error bars for each landmark show the extent of an elliptical (non-isotropic) distribution around the corresponding point in the mean shape. Such distributions estimated using our method show exactly why we cannot assume isotropic distributions for the data as assumed in Procrustes. The P-FL1 data used in Figure 16 consists of 19 landmarks (15 + 4) while the FL1 data used in Figure 17 consists of 15 common landmarks only. Using these plots, one can see that the semi-landmarks have anisotropic covariances which match the expected localisation stability. Also we can see how after adding the 4 semi-landmarks the anisotropic errors estimated using our method remain stable, while with Procrustes some change both in orientation and in size, e.g. landmark points 11 and 15 (see Figure 2 for landmark numbers). These error bars are shown again in Figures 19 and 20 with the corresponding aligned data superimposed. In these figures, the extent of error distributions illustrated by the error bars are not expected to match to those illustrated by the alignment, as general biological shape variation and measurement error are independent processes. Although, localisation is determined by local shape characteristics and measurement accuracy plays a role in the overall distribution of landmarks around the mean shape. As a consequence poorly measured landmarks may have a variation about the mean that is dominated by noise.

Figure 16
figure 16

Fly wing data (P-FL1): error bars ( ×20 ) estfc(left), and computed from the residuals left using Procrustes (right); 2-component models.

Figure 17
figure 17

Fly wing data (FL1): error bars ( ×20 ) estimated using our method (left), and computed from the residuals left using Procrustes (right); 2-component models.

Figure 18
figure 18

Mouse mandible data (MM1): error bars ( ×20 ) estimated using our method (left), and computed from the residuals left using Procrustes (right); 6-component models.

Figure 19
figure 19

Fly wing data (P-FL1): error bars ( ×20 ) estimated using our method (left), and computed from the residuals left using Procrustes (right), with the corresponding aligned data superimposed; 2-component models.

Figure 20
figure 20

Mouse mandible data (MM1): error bars ( ×20 ) estimated using our method (left), and computed from the residuals left using Procrustes (right), with the corresponding aligned data superimposed; 6-component models.

Turning to 3D data, in Figure 21, we have shown the anisotropic error bars estimated using our method (sub-figures on the top row) and those computed using Procrustes residuals (sub-figures on the bottom row). In Figure 22, we have shown the corresponding aligned data using our method only, as these dense aligned data are visually quite similar for the two methods. In order to display the 3D results we have used their projections on three 2D planes in the original coordinate system.

Figure 21
figure 21

Mouse skull data: error bars ( ×30 ) estimated using our covariance-based method (top); and those computed using Procrustes residuals (bottom); 14-component models; projection planes: zy (left), xy (middle) and xz (right).

Figure 22
figure 22

Mouse skull data: aligned data obtained using a 14-component model based on our covariance-based method; projection planes: zy (left), xy (middle) and xz (right).

In both Figures 21 and 22, from the left to the right, we show the projected results on zy, xy and xz planes respectively. Using the mouse skull volume shown in Figure 3, one can see how these viewing planes (zy, xy and xz) correspond to the coronal, sagittal and transverse planes respectively. In these 42 data sets, five had a marked asymmetry of the nasal bones (affecting landmark 1), three had a partially open frontal suture (affecting landmark 3), and one exhibited both of these effects. In Figure 21, one can observe that the largest error bars estimated using our method are for the landmarks 3 and 1. This is consistent with the data clouds corresponding to these landmarks in Figure 22 where in each case some points stand away from the main cloud due to the deformations mentioned above. This is not the case for Procrustes where the error residuals left after alignment for landmarks 3 and 1 show severe underestimation. This is due to the fact that Procrustes translates strong shifts in one landmark position into smaller shifts in all landmarks. However, in this example the observed variation is largely restricted to deformations of the nasal bones (landmark 1) and partially open frontal suture (landmark 3) without displaying noticeable shape changes in other parts of the skull. Hence the larger error bars of our method give in this case a more accurate representation of the observed biological variation. This is in agreement with the results shown earlier in Figure 15 where for two landmarks Procrustes residuals are much smaller than the expected error values (standard deviations) with which the Monte-Carlo data were generated. For our method, however, estimated errors are all comparable to the expected ones as shown earlier in Figure 12. In order to compare the magnitude of errors estimated using our method to those suggested by the repeat data, one should revisit Figure 6. The figure again suggests comparable error estimations. Finally, it is clear from the zy and xz projection planes that expected symmetry is achieved to a large extent in orientation and size for most corresponding error bars (in either method).

Now we turn to further comparing our method to Procrustes in a quantitative manner. The inconsistency observed earlier in error bars corresponding to the residuals left after applying Procrustes to the fly wing data (Figures 16 and 17) is displayed more clearly using a scatter plot in Figure 23. Here we plot the eigenvalues corresponding to the 15 common landmarks after the 4 semi-landmarks are added against those without any additional landmarks. We plot this for both the likelihood and Procrustes methods. It is clear here that there are departures from the permitted scatter region when Procrustes is used. This indicates a significant change in the unexplained variance following linear model construction, which itself implies differences in the linear model itself, i.e. the Procrustes model is unstable following the addition of poorly measured landmarks.

Figure 23
figure 23

Fly wing data: error eigenvalues estimated using the likelihood method and those computed using the residuals after Procrustes alignment, when each method is applied to P-FL1 (4 semi-landmarks (16-19) added to FL1) against those when applied to FL1; the plot is for the 15 common landmarks (1-15); the Procrustes results have many points which are well beyond the expected limits, suggesting that model parameters are not consistently determined upon inclusion of semi-landmarks; the two dashed lines show the ±2.8σ range.

Further, we performed a χ2 test based upon the construction of corrected covariances on one data set (FL1/FR1) and then used for the calculation of χ2 for a second data set (FL2/FR2). The corresponding plot for χ2 test in Figure 24 confirms the stability of our method, as all χ2/D o F values fall in the range expected. Further χ2 tests (not shown here) with different numbers of data samples and combinations of data sets indicate the appropriateness of the assumed linear model for the fixed number of components.

Figure 24
figure 24

Fly wing data (15 original landmarks): the χ2/DoF ratios when our method is applied to two sets of repeat fly wing data, FL2 and FR2, using a 3-component model and fixed covariances (Figure 16: top-left) estimated earlier from the FL1 and FR1 data sets respectively; the two dashed lines show the ±2.8σ range.Fly wing data (15 original landmarks): the χ2/DoF ratios when our method is applied to two sets of repeat fly wing data, FL2 and FR2, using a 3-component model and fixed covariances (Figure16: top-left) estimated earlier from the FL1 and FR1 data sets respectively; the two dashed lines show the ±2.8σ range.

In Table 2, we list the Fisher Information (FI) value for the two methods and the three data sets studied. Again, for Procrustes, variances used to compute the FI value are obtained from the residuals left after alignment between the data and the simulated linear model. Our method, which is based on likelihood and measurement covariance, gives FI values roughly between two and four times those obtained using Procrustes. The largest difference corresponds to the 3D MS data with 14 model components. As FI is proportional to the quantity of data, this demonstrates that the changes away from the isotropic assumption inherent to Procrustes/PCA has a significant effect on the efficacy of the model, equivalent to having defined only a third as many landmarks from the outset. We can also see in this table the effect of adding 4 semi-landmarks to the 15 original landmarks. The numbers in the parentheses show the contributions from the 4 added landmarks to the total FI values. The reason for the decrease in the FI values after adding 4 landmarks is that we are using the same number of degrees of freedom (DoF) to describe correlations between more points. Also these values are computed for uncorrected covariance estimates, because correction is not available when using Procrustes.

Table 2 Fisher information (FI) values: listed for the Procrustes and our method when applied to the fly wing data (P-FL1), mouse mandible data (MM1) and 3D mouse skull data (MS)

Finally, the PCA analysis shows that in fly wing data 3 components can account for about 65% of variance, while for mouse mandibles 6 components are needed to achieve the same level. In both cases, the model order preferred by our analysis is significantly less than the heuristic limit of 90% used by some researchers.


Our analysis approach has been driven by the requirements of statistical estimation, quantitation and self consistency, i.e. distributions assumed during likelihood construction match the data and estimated parameters match those generating the data. From a more philosophical standpoint we can consider what we are doing when we identify landmark locations and attempt to compare them between sets (shapes). We do not expect that biology manipulates the locations of our chosen landmarks directly, they simply appear to move around as the net effect of distributed developmental and evolutionary influences. Recent considerations of biology have introduced the phrase “palimpsest” [52], as an analogy with repeatedly erasing and rewriting text in an ancient parchment, to describe the way that structures develop. Notice that the initial choice of landmarks is subjective, not only in terms of the features selected but also how we chose to define their locations. A landmark is the result of a localisation procedure (partly influenced by multiple biological considerations) which has an associated positional uncertainty. In this work we have associated the problems of working with semi-landmarks in biological shape analysis as being a consequence of the statistical assumptions implicit to analysis techniques such as Procrustes/PCA. We have implemented a new method which takes appropriate account of measurement and landmark localisation stability in order to obtain a new form of analysis which is consistent with a likelihood-based definition of the alignment and model building tasks. This method can be equivalently interpreted as a redefinition of the landmark location as ghost points.

The conventional interpretation of Procrustes is that the resulting linear model is a pure shape description which can be directly associated with biological processes. Some may argue that extending the approach to weight data, even to accommodate semi-landmarks, breaks with this tradition. However, it is our belief that any distinction between the original landmark and our definition of a ghost point, as locations which are somehow true measurements of biology in one case but not the other, is arbitrary. Re-weighting of data using a covariance is statistically equivalent to modifying the information available by changing the specified set of landmarks. Use of a least-squares measure (which assumes isotropic errors) does not introduce some absolute measurement of biology. Both approaches need to be calibrated using known samples with identifiable biological cause in order to make any scientific interpretation.

Now that we have a specific definition for how to weight landmark data, we can see that using ghost points does not invalidate use of Kendall’s statistics as suggested in [22]. The use of these approaches follows due to scale normalisation of the shape data, it is not an intrinsic property of the use of the original landmarks co-ordinates per-se. We can also re-project scaled (whitened) shapes onto the tangent space defined in the transformed ghost space if we wish, in order to remove local curvature arising from scale normalisation.

Far from there being no objective way to define these covariances [22, 24], there are at least three; a) one can estimate them directly from repeatability of measurements (e.g. see [48]); b) they can be directly estimated via conventional statistical means when using likelihood-based landmark location (CRB) (e.g. see [25, 26, 28]); c) they can be estimated as the unexplained stochastic variation (residuals) in fitted data (as in this paper and e.g. [27]). For the latter, when estimated using residuals of the fitted shape model, we will see contributions additional to the measurement process; this is the stochastic (therefore unmodellable) behaviour of the biology itself. Our results indicate that measurement covariances can be reliably estimated in our data for sample sizes at least as small as 40.

Our result indicate that the new method summarises the information content of the measured data better (improved FI scores), and with more stability than Procrustes/PCA (consistent models are generated following the addition of new points). Although we have not provided empirical evidence in this paper, the expected theoretical advantages of this approach are several; a) as all landmarks of fixed local structure have an associated measurement covariance, the approach described provides a consistent way of incorporating qualitatively different forms of landmark (type I, type II, semi-landmarks, geometric landmarks, etc.) into the analysis; b) provided that landmark stability is well described by a Gaussian distribution, our method removes the instabilities inherent in the analysis due to poorly determined points; c) as the parameters for the linear model are now self-consistently estimated for an identifiable generative scheme (embodied here via Monte-Carlo simulation) it affords the application of an eigenvector analysis statistical rigour; d) it offers the possibility of interpreting the linear modelling process as a statistical approximation, with consequent interpretations of the requirement for the number of linear model components; e) finally, generalisation of the approach would seem to be possible which would support the analysis of dense landmarks on surfaces and curves.

We have also demonstrated how linear model order selection can be performed by comparing baseline reproducibility errors with those estimated from the model. Finally, we have shown how the use of repeated analysis on matched samples can be used to confirm the stability of the estimated anisotropic error. We believe that these tools are sufficient to allow use of this technique in biological studies. More study is needed in order to develop an understanding of the value of our new technique in a greater range of biological analyses.

The methods described in this paper are freely available from the TINA web site [51] via the Geometric Morphometric toolkit, as a system for quality assessment and validation of output data.


aBookstein [53]: “Wherever there is partial registration the true value of a (vector deformation) is inaccessible.”

Appendix A

Rotation matrix

Based on the geometry shown in Figure 1, we first calculate the vectors v ̂ a , v ̂ b and v ̂ c .

v ̂ a = P 2 P 1 P 2 P 1 , v b = ( P 3 P 1 ) [ ( P 3 P 1 ) . v ̂ a ] v ̂ a , v ̂ c = v ̂ a × v b v b

The rotation matrix RT is hence given by:

R T = v ̂ ax v ̂ ay v ̂ az v ̂ bx v ̂ by v ̂ bz v ̂ cx v ̂ cy v ̂ cz

Roll, pitch and aw angles

The multiplication of the rotation matrices R x (γ), R y (β) and R z (α) gives


R xyz = cos β cos α cos β sin α sin β cos γ sin α + sin γ sin β cos α cos γ cos α sin γ sin β sin α sin γ cos β sin γ sin α cos γ sin β cos α sin γ cos α + cos γ sin β sin α cos γ cos β

Hence by enforcing RT=R x y z , it is straightforward to find the rotation angles α, β and γ.

Direction vectors

At each landmark point n with m n x , m n y and m n z as the mean coordinates, the rotated vector by angle α around the z axis is

m = ( cos α ) m nx ( sin α ) m ny ( sin α ) m nx + ( cos α ) m ny m nz

The first derivatives of this vector with respect to α gives

u α = ( sin α ) m nx ( cos α ) m ny ( cos α ) m nx ( sin α ) m ny 0


  1. Adams DC, Rohlf FJ, Slice DE: Geometric morphometrics: ten years of progress following the ‘revolution’. Ital J. Zool. 2004, 71: 5-16. 10.1080/11250000409356545.

    Article  Google Scholar 

  2. Klingenberg CP: Evolution and development of shape: integrating quantitative approaches. Nat Rev Gen. 2010, 11: 623-635.

    CAS  Google Scholar 

  3. Mitteroecker P, Gunz P: Advances in geometric morphometrics. Evol Biol. 2009, 36 (2): 235-247. 10.1007/s11692-009-9055-x.

    Article  Google Scholar 

  4. Vignon M, Sasal P: The use of geometric morphometrics in understanding shape variability of sclerotized haptoral structures of monogeneans (Platyhelminthes) with insights into biogeographic variability. Parasitol Int. 2010, 59 (2): 183-191. 10.1016/j.parint.2010.01.006.

    Article  PubMed  Google Scholar 

  5. Bookstein FL: Tensor biometrics for changes in cranial shape. Ann Human Biol. 1984, 11: 413-437. 10.1080/03014468400007321.

    Article  CAS  Google Scholar 

  6. Goodall CR: The Statistical Analysis of Growth in Two Dimensions. 1983, USA. Harvard University: Department of Statistics

    Google Scholar 

  7. Kendall DG: Shape-manifolds, procrustean metrics, and complex projective spaces. Bull London Math Soc. 1984, 16 (2): 81-121. 10.1112/blms/16.2.81.

    Article  Google Scholar 

  8. Bookstein FL: Size and shape spaces for landmark data in two dimensions. Stat Sci. 1986, 1: 181-242. 10.1214/ss/1177013696.

    Article  Google Scholar 

  9. Mantel NA: The detection of disease clustering and a generalized regression approach. Cancer Res. 1967, 27: 209-220.

    CAS  PubMed  Google Scholar 

  10. Peres-Neto PR, Jackson DA: How well do multivariate data sets match?The advantages of a procrustean superimposition approach over the mantel test. Oecologia. 2001, 129: 169-178. 10.1007/s004420100720.

    Article  Google Scholar 

  11. Hubert M, Rousseeuw PJ, Branden K: ROBPCA: A new approach to robust principal component analysis. Technometrics. 2005, 47: 64-79. 10.1198/004017004000000563.

    Article  Google Scholar 

  12. Gunz P, Mitteroecker P, Bookstein FL: Semilandmarks in three dimensions. Modern Morphometrics in Physical Anthropology. Edited by: Slice DE. 2005, New York: Kluwer Academic/Plenum Publishers, 73-98.

    Chapter  Google Scholar 

  13. Bookstein FL: Landmark methods for forms without landmarks: morphometrics of group differences in outline shape. Med Image Anal. 1997, 1 (2): 225-243.

    Article  CAS  PubMed  Google Scholar 

  14. Fitzpatrick JM, West JB, Maurer CR: Predicting error in rigid-body point-based registration. IEEE Trans Med Imaging. 1998, 17 (5): 694-702. 10.1109/42.736021.

    Article  CAS  PubMed  Google Scholar 

  15. Chui H, Rangarajan A: A new point matching algorithm for non-rigid registration. Comput Vision Image Underst. 2003, 89 (2-3): 114-141. 10.1016/S1077-3142(03)00009-2.

    Article  Google Scholar 

  16. Rohlf FJ, Slice DE: Extensions of the procrustes method for the optimal superimposition of landmarks. Syst Zool. 1990, 39: 40-59. 10.2307/2992207.

    Article  Google Scholar 

  17. Walker JA: Ability of geometric morphometric methods to estimate a known covariance matrix. Syst Biol. 2000, 49 (4): 686-696. 10.1080/106351500750049770.

    Article  CAS  PubMed  Google Scholar 

  18. Richtsmeier JT, Deleon VB, Lele SR: The promise of geometric morphometrics. Yearbook Phys Anthropol. 2002, 45: 63-91.

    Article  Google Scholar 

  19. Rohlf FJ: Bias and error in estimates of mean shape in geometric morphometrics. J Human Evol. 2003, 44: 665-683. 10.1016/S0047-2484(03)00047-2.

    Article  Google Scholar 

  20. Lele S: Euclidean distance matrix analysis (EDMA): estimation of mean form and mean form difference. Math Geol. 1993, 25 (5): 573-602. 10.1007/BF00890247.

    Article  Google Scholar 

  21. Martinez-Abadias N, Heuze Y, Wang Y, Jabs EW, Aldridge K, Richtsmeier JT: FGF/FGFR signaling coordinates skull development by modulating magnitude of morphological integration: evidence from Apert syndrome mouse models. PLoS One. 2011, 6 (10): 1-9.

    Article  Google Scholar 

  22. Zelditch ML, Swiderski DL, Sheets HD, Fink WL: Geometric Morphometrics for Biologists, A Primer. 2004, New York: Elsevier Academic Press

    Google Scholar 

  23. Goodall CR: Procrustes methods in the statistical analysis of shape. J R Stat Soc, Ser B (Methodological). 1991, 53 (2): 285-339.

    Google Scholar 

  24. Lele S, Richtsmeier JT: Statistical models in morphometric: are they realistic?. Syst Zool. 1990, 39: 60-69. 10.2307/2992208.

    Article  Google Scholar 

  25. Theobald DL, Wuttke DS: Empirical bayes hierarchical models for regularizing maximum likelihood estimation in the matrix gaussian procrustes problem. Proc Natl Acad Sci USA. 2006, 103 (49): 18521-18527. 10.1073/pnas.0508445103.

    Article  PubMed Central  CAS  PubMed  Google Scholar 

  26. Rohr K, Stiehl HS, Sprengel R, Buzug TM, Weese J, Kuhn MH: Landmark-based elastic registration using approximating thin-plate splines. IEEE Trans Medical Imaging. 2001, 20 (6): 526-534. 10.1109/42.929618.

    Article  CAS  PubMed  Google Scholar 

  27. Ragheb H: Morphometric shape analysis with measurement covariance estimates. Proc. British Machine Vision Conference. 2011, Dundee, UK

    Google Scholar 

  28. Ragheb H, Thacker NA: Quantitative localisation of manually defined Landmarks. Proc. Medical Image Understanding and Analysis. 2011, London, UK., 221-225.

    Google Scholar 

  29. Balachandran R: Iterative solution for rigid-body point-based registration with anisotropic weighting. Proc. of SPIE (Medical Imaging),. vol. 7261. 2009, 72613D-1–72613D-10.

    Google Scholar 

  30. Beinat A, Crosilla F: A generalized factored stochastic model for the optimal global registration of LIDAR range images. Proc. of IAPRS,. Vol. 34. 2002, 36-39.

    Google Scholar 

  31. Hyvarinen A, Karhunen J, Oja E: Independent Component Analysis. 2001, New York: John Wiley & Sons

    Book  Google Scholar 

  32. Ruto A, Lee M, Buxton B: Comparing principal and independent modes of variation in 3D human torso shape using PCA and ICA. ICA Research Network International Workshop. 2006, Liverpool, UK, 1-4.

    Google Scholar 

  33. Uzumcu M, Frangi RF, Reiber JHC, Lelieveldt BPF: Independent component analysis in statistical shape models. SPIE Med Imaging,. Volume 5032. 2003, 375-383.

    Google Scholar 

  34. Mitteroecker P, Bukstein F: Linear discrimination, ordination, and the visualization of selection gradients in Modern Morphometrics. Evol Biol. 2011, 38: 100-114. 10.1007/s11692-011-9109-8.

    Article  Google Scholar 

  35. Luo B, Hancock ER: Iterative procrustes alignment with the EM algorithm. Image Vision Comput. 2002, 20 (5): 377-396.

    Article  Google Scholar 

  36. Cootes TF, Taylor CJ, Cooper DH, Graham J: Active shape models-their training and application. Comput Vision Image Underst. 1995, 61: 38-59. 10.1006/cviu.1995.1004.

    Article  Google Scholar 

  37. Klingenberg CP: Novelty and “homology-free” Morphometrics: What’s in a name?. Evol Biol. 2008, 35 (3): 186-190. 10.1007/s11692-008-9029-4.

    Article  Google Scholar 

  38. Polly PD: Developmental dynamics and G-Matrices: Can morphometric spaces be used to model phenotypic evolution?. Evol Biol. 2008, 35 (2): 83-96. 10.1007/s11692-008-9020-0.

    Article  Google Scholar 

  39. Oxnard C, O’Higgins P: Biology clearly needs morphometrics. Does morphometrics need biology?. Biol Theory. 2009, 4: 84-97. 10.1162/biot.2009.4.1.84.

    Article  Google Scholar 

  40. Bookstein FL, Slice DE, Gunz P, Mitteroecker P: Anthropology Takes Control of Morphometrics. 2004, Vienna, Austria: Institute for Anthropology: University of Vienna

    Google Scholar 

  41. Slice DE: Geometric morphometrics. Annual Rev Anthropol. 2007, 36: 261-281. 10.1146/annurev.anthro.34.081804.120613.

    Article  Google Scholar 

  42. Perez SI, Bernal V, Gonzalez PN: Differences between sliding semi-landmark methods in geometric morphometrics, with an application to human craniofacial and dental cariation. J Anat. 2006, 208 (6): 769-784. 10.1111/j.1469-7580.2006.00576.x.

    Article  PubMed  Google Scholar 

  43. Gomez-Robles A, Olejniczak AJ, Martinon-Torres M, Prado-Simon L, de Castro JMB: Evolutionary novelties and losses in geometric morphometrics: a Practical approach through Hominin molar morphology. Evolution. 2011, 65 (6): 1772-1790. 10.1111/j.1558-5646.2011.01244.x.

    Article  PubMed  Google Scholar 

  44. Frederich B, Liu SYV, Dai CF: Morphological and genetic divergences in a coral reef damselfish, pomacentrus coelestis. Evol Biol. 2012, 39: 359-370. 10.1007/s11692-011-9158-z.

    Article  Google Scholar 

  45. Tipping ME, Bishop CM: Probabilistic principal component analysis. J R Stat Soc B. 1999, 61 (3): 611-622. 10.1111/1467-9868.00196.

    Article  Google Scholar 

  46. Akaike H: A new look at the statistical model identification. IEEE Trans Automatic Control. 1974, 19 (6): 716-723. 10.1109/TAC.1974.1100705.

    Article  Google Scholar 

  47. Dempster AP, Laird NM, Rubin DB: Maximum likelihood from incomplete data via the EM algorithm. J R Stat Soc B (Methodological). 1977, 39: 1-38.

    Google Scholar 

  48. Barlow RJ: Statistics: A Guide to the Use of Statistical Methods in the Physical Sciences. 1989, Wiley-Blackwell

    Google Scholar 

  49. Fly Wing Data. Klingenberg Lab, The University of Manchester. []

  50. Schunke AC: TINA manual landmarking tool: software for the precise digitization of 3D landmarks. Front Zool. 2012, 9 (6):

  51. Bromiley PA: The TINA Geometric Morphometrics Toolkit. []

  52. Halgrimsson B, Lieberman DE, Young NM, Parsons T, Wat S: Evolution of covariance in mammalian skull. Novartis Foundation Symposium-Tinkering: The Microevolution of Development,Volume 284. Edited by: Hall BK, Lieberman DE. 2007, New York: Wiley-Liss, 164-184.

    Google Scholar 

  53. Bookstein FL: Registration error and functional image analysis. Workshop on Biomedical Statistics. 2001, Leeds: UK

    Google Scholar 

Download references


This work was funded by institutional resources of the Max-Planck Society. The authors would like to thank Chris Klingenberg (at the University of Manchester) and Louis Boell (at the Max-Planck Institute for Evolutionary Biology) for providing the fly wing data and the mouse mandible data respectively.

Author information

Authors and Affiliations


Corresponding author

Correspondence to Hossein Ragheb.

Additional information

Competing interests

The authors declare that they have no competing interests.

Authors’ contributions

HR undertook software and methods development, performed experiments, and produced the final version of the manuscript and responses to reviews. NAT conceived the new statistical methods for shape analysis and provided technical project coordination. PAB developed the automatic landmarking software and provided maintenance of software libraries, web pages and infrastructure. DT provided overall scientific management and coordination of the project. ACS participated in acquisition of datasets including manual/automatic landmark identification. All authors read and approved the final manuscript.

Authors’ original submitted files for images

Below are the links to the authors’ original submitted files for images.

Authors’ original file for figure 1

Authors’ original file for figure 2

Authors’ original file for figure 3

Authors’ original file for figure 4

Authors’ original file for figure 5

Authors’ original file for figure 6

Authors’ original file for figure 7

Authors’ original file for figure 8

Authors’ original file for figure 9

Authors’ original file for figure 10

Authors’ original file for figure 11

Authors’ original file for figure 12

Authors’ original file for figure 13

Authors’ original file for figure 14

Authors’ original file for figure 15

Authors’ original file for figure 16

Authors’ original file for figure 17

Authors’ original file for figure 18

Authors’ original file for figure 19

Authors’ original file for figure 20

Authors’ original file for figure 21

Authors’ original file for figure 22

Authors’ original file for figure 23

Authors’ original file for figure 24

Rights and permissions

This article is published under license to BioMed Central Ltd. This is an Open Access article distributed under the terms of the Creative Commons Attribution License (, which permits unrestricted use, distribution, and reproduction in any medium, provided the original work is properly cited.

Reprints and permissions

About this article

Cite this article

Ragheb, H., Thacker, N.A., Bromiley, P.A. et al. Quantitative shape analysis with weighted covariance estimates for increased statistical efficiency. Front Zool 10, 16 (2013).

Download citation

  • Received:

  • Accepted:

  • Published:

  • DOI: