### Implementation of shape based assignment

Landmark-based geometric morphometric methods were used to capture information about shape, by obtaining the x and y coordinates of homologous landmarks in the configuration shown in Figure 3. Differences among specimen in the sets of coordinates due to scaling, rotation and translation were removed using the typical geometric morphometric approach [22–25, 12] of placing the specimens in Partial Procrustes Superimposition [24–26] on the iteratively estimated mean reference form, using the Generalized Procrustes Analysis procedure. This procedure places the shapes of specimens in a linear tangent space to Kendall's shape space [27], allowing the use of linear multivariate statistical methods [23, 28, 24, 12]. After superimposition, the data were converted from Cartesian coordinate form into components along the eigenvectors of the bending energy matrix (Principal Warp axes) of the thin-plate spline model of deformations of the reference [29, 22] and along the uniform axes of deformation due to shear and dilation [30]. Use of these linearly transformed variables (referred to as Partial Warp plus Uniform Component scores), produces a convenient set of variables (using a basis set called the Principal Warp axes) for use with standard multivariate statistical methods, since the Partial Warp and Uniform component scores have the same number of variables per specimen as degrees of freedom. No information is lost during this linear transformation of variables.

A canonical variates analysis (CVA) is then used to determine the set of axes which best discrimate among pre-defined groups of specimens, by determining the linear combinations of the original variables which display the greatest variance between groups relative to the variance within the groups [31, 12]. Fisher's linear discriminant function was used, which makes no particular assumption about the parametric form of the distribution of the data used, but simply determines the linear combination of the original variables that results in the greatest ratio of the between groups sum of squares to the within groups sum of squares [32]. A simple distance-based approach is then used to determine which group each specimen belongs to, based on the canonical variate scores. The predicted group membership of each specimen based on the CVA scores is determined by assigning each specimen to the group whose mean is closest (measured as the square root of summed squared distances along the CV axes, see [32]) to the specimen. To obtain a measure of the quality of the assignment of each specimen to a group, an assignment test was developed. The CVA axes can always be used to assign any given specimen to some group, since a minimum distance can always be found but a measure of whether the quality of the assignment is similar to that expected for specimens known to be in that particular group is desirable. The assignment test presented here is modeled on the genetic distance-based assignment test [33, 18]. The distribution of distances produced by a Monte Carlo simulation (see discussion in [34]) is used to determine if the observed distance of a given specimen is consistent with the null model of random variation around the mean of the group to which the specimen is assigned to. The distance from a specimen to a group mean can then be assigned a p-score which describes how likely it would be for a specimen from the original population to be as far from the mean specimen as the observed specimen is (under the null model used in the Monte-Carlo simulation). If the p score is smaller than 5%, then we can assert that there is a less than 5% chance that random variation could have produced a distance as large as that of the particular specimen from the group mean, and hence that the assignment of that specimen to the group is in doubt.

It should be noted that in a study with many specimens, a number of them will have low p-scores by chance, and so to assess the validity of the assignments of the set as a whole, the researcher should assess the number of specimens expected to have p values less than 5%. It will then be possible to determine if the observed number of low p values exceeds that expected by chance. The model used in the Monte Carlo simulation of the distribution of distances of specimens within a group around the group mean (the average specimen within the group) is based on a normal model of the distribution of the CVA scores of each group about the mean of that group. For a given group, it appears probable that the CVA scores along each CVA axes for the specimens within the group are correlated, thus there exists within each group a covariance structure to the CVA scores of specimens within the group. In carrying out the Monte-Carlo simulation of the distribution of specimens within the group about the mean, it is necessary to preserve this covariance structure in order to produce a valid model of the distribution. An eigenvalue decomposition of the variance-covariance matrix of within group CVA scores is used to find the principal component axes of the within group variance. This yields the same number of variables as the CVA scores, but now with uncorrelated axes (the eigenvectors), each of which has a variance given by the corresponding eigenvalue. The model used for the distribution of the CVA scores of the specimens assumes the group has an independent random normal distribution along each of these eigenvectors (principle component axes), with amplitudes given by the square roots of the eigenvalues (the eigenvalues are the variances of the group along the corresponding eigenvectors), so that the square root of the eigenvalue is the standard deviation of the population along that eigenvector.

An independent, normal distribution with known amplitude (the square root of the eigenvalue) is assumed along each eigenvector. This allows generation of a Monte Carlo population of specimens, assuming the independent normal distribution along these principal component axes. Each simulated specimen is generated using a random number generator to compute locations along the eigenvectors, which are then translated back into CVA axes scores (a simple linear translation of basis vectors). The Monte Carlo generated CVA axis scores will have the same mean and variance-covariance structure as the original population did. Using an independent multinomial distribution model of the CVA axes scores in the Monte-Carlo simulation (by using the random number generater to directly generate the CVA scores) would not preserve any covariation structure in the data. If there is no significant covariation structure to the CVA scores, the use of the PCA axes is not necessary, but will not induce covariations.

The distance from the group mean is then calculated for each simulated specimen. The Monte Carlo distribution of distances of specimens about the group mean under the null model of random variation can then be used as an estimate of the distribution of distances about the group mean in the original data. Based on the estimated distributions of distances about the group means expected for specimens in the group, p-values may be determined for the assignment of specimens, with either initially known or unknown group affinities. Based on an alpha level of p = 0.05, all assignments of specimens to groups can be scored as either statistically significant or not.

As a test of the performance of the assignment test, a cross validation or jackknife procedure [34, 35] was implemented. Unlike a standard jackknife where only one specimen at a time is omitted, a "delete-d" jackknife [35] was used in which d specimens at a time were omitted. Under the delete-d jackknife, a variable percentage of individuals from a dataset are left out during the CVA procedure, and then assigned to groups as "unknown" specimens. The specimens treated as unknowns during the jackknife procedure are also assigned an assignment p-value during this procedure. High success rates under the delete-d jackknife resampling indicate that the differentiation among the involved groups is sufficient to be diagnostic. This implies that the discriminant axes capture enough information to assign individuals of the given groups, and form a reasonable estimate of the distribution of distances based on the Monte Carlo procedure. The jackknife procedure also allows estimation of the number of individuals needed to obtain meaningful CVA axes, and distance distribution estimates.

### The sculpin data set

Here we employ the methods outlined above to study the differentiation in shape that occurs among divergent populations and their naturally occurring hybrids. Population affinity and hybrid status are independently derived from genetic markers. Note that the specimen are taken from natural populations and occur syntopically in the studied hybrid zones (Table 3). Sculpins were sampled across an area of secondary contact of invasive and stream sculpins (*C. perifretum* and *C. rhenanus*) in the Lower Rhine basin, which is situated at the confluence of the Stream Broel with the River Sieg (Table 3). An extra population of stream sculpins was sampled from the stream Naaf (also a tributary to the River Sieg drainage).

All specimens were genotyped for 45 microsatellite loci [36]. The loci were chosen for their particularly high information content for our study system following the approach of Shriver et al. [37] using Whichloci [38]. We have used a preliminary genetic map of *Cottus* [39], to verify that our set of microsatellite markers does not contain pairs of loci that are tightly physically linked. The genotypic data allow to unambiguously classify individuals to belong to pure populations or to identify them as hybrids with a mixed ancestry using the methods outlined in Falush et al. [16]. The program Structure 2.1 [40] yielded consistent results in independent runs (burnin: 20000; sampling iterations 100000, correlated allele frequency model allowing for an individual alpha and different F_{ST} for each subpopulation) according to which the genetic ancestry of individual could be determined (see Additional file: 1, for genotypic data of those specimen included here). The classification based on genotypic data was highly congruent with data from distribution and morphology.

Two independent populations of stream sculpins confined to separate streams (Stream Naaf, Stream Broel) both being devoid of skin prickling were recovered. A third population can be identified, which represents the invasive sculpin. Invasive sculpins generally occur within the main channel of the River Sieg and display pronounced skin prickling [10]. Hybrids among Stream Broel sculpins with the invasive sculpins were only found at the confluence where the Broel merges with the River Sieg (Table 3). A detailed study of these hybrid zones, particularly on the geographic extension, is currently in preparation (Nolte *et al.* in prep). Of particular relevance in this context is the fact that hybrid sculpins occur syntopically with their parental populations within the hybrid ones (Table 3; Sites 2, 3, 4). For the morphometric analysis we grouped specimens that were found to belong to pure populations from the genotypic data into those corresponding to the invasive sculpins (invasive) and to the two stream sculpin populations (Streams Naaf and Broel). To allow for some uncertainty in the estimates we used those specimens that were found to be at least 97% pure in the structure analysis. These populations are represented here by 117 Stream Broel sculpins, 76 Stream Naaf sculpins and 40 Invasive sculpins (Table 3). In contrast, hybrids represent a somewhat inhomogeneous group consisting of various degrees of ancestry. To restrict this analysis to those specimens that have a pronounced hybrid genotype and to exclude later generation backcrosses that might have been subject to repeated rounds of natural selection (as this could blur possible transgressive effects) we decided to exclude hybrids with less than 25% ancestry in one of the ancestral populations. Based on genotypic data, we were able to identify 62 BI hybrids (mixed Stream Broel/Invasive ancestries, less than 75% pure ancestry).

Images of specimens were taken with a digital camera fixed on a stage so that the midsagittal body plane was as much as possible perpendicular to the image plane. Fourteen morphological landmarks were used to capture the shape of each individual (suppl. Table 2) using TPSdig [41]. The positions of the tip of the nasal (#1), nares (#2), interorbital pores (#3), dorsal fin I origin (#4), dorsal fin II origin (#5), dorsal fin II end (#6), upper caudal fin origin (#7), lower caudal fin origin (#8), anal fin end (#9), anal fin origin (#10), ventral fin origin (#11), upper origin of the gill opening (#12), opercular spine (#13) and posterior end of the maxilla (#14) were used as landmarks (Figure 3). The morphometric analysis was conducted using the IMP package according to the methods outlined above [42]. Shape based assignment tests were conducted with CVAgen6N (part of IMP). In order to estimate possible confounding effects of allometric growth and sexual dimorphism these variables were determined individually. A scale bar was photographed besides each specimen as a size reference and sex was determined for individuals larger than 45 mm by examination of the genital papilla (see Additional file: 2).