Program DISTPCOA
Pierre Legendre, Département de sciences biologiques, Université de Montréal, C.P. 6128, succursale Centre-ville, Montréal, Québec H3C 3J7, Canada
Marti J. Anderson, Centre for Research on Ecological Impacts of Coastal Cities and School of Biological Sciences, Marine Ecology Laboratories A11, University of Sydney, Sydney, NSW 2006, Australia. MJAnders@bio.usyd.edu.au
- Lingoes method: $d'(i,j)^2 = d(i,j)^2 + 2c_1$ for $i \neq j$, where $c_1$ is the absolute value of the largest negative eigenvalue of the first PCoA run (on the uncorrected distance matrix). The diagonal is not modified: $d'(i,i) = 0$.
- Cailliez method: $d'(i,j) = d(i,j) + c_2$ for $i \neq j$, where $c_2$ is the largest eigenvalue of a special non-symmetric matrix. The diagonal is not modified: $d'(i,i) = 0$. The eigenvalues of the special matrix are found with the QR algorithm for real upper Hessenberg matrices. The program uses the subroutines BALANC (balancing), ELMHES (reduction to upper Hessenberg form) and HQR (QR iteration) from Chapter 11 of Numerical Recipes (Press et al., 1986).
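Once the constant is known, either correction is an element-wise operation on the off-diagonal entries. The following minimal Python sketch assumes two things:

- the distance matrix is held as a list of lists;
- c1 or c2 has already been obtained from the eigen-analyses described above (not shown here).

`lingoes_correct` and `cailliez_correct` are hypothetical names, not routines of DISTPCOA.

```python
import math
from typing import List

Matrix = List[List[float]]


def lingoes_correct(d: Matrix, c1: float) -> Matrix:
    """Lingoes: d'(i,j) = sqrt(d(i,j)**2 + 2*c1) for i != j; diagonal kept at 0.

    c1 = |largest negative eigenvalue| of the first PCoA (computed elsewhere).
    """
    n = len(d)
    return [[0.0 if i == j else math.sqrt(d[i][j] ** 2 + 2.0 * c1)
             for j in range(n)]
            for i in range(n)]


def cailliez_correct(d: Matrix, c2: float) -> Matrix:
    """Cailliez: d'(i,j) = d(i,j) + c2 for i != j; diagonal kept at 0.

    c2 = largest eigenvalue of the special non-symmetric matrix (computed elsewhere).
    """
    n = len(d)
    return [[0.0 if i == j else d[i][j] + c2 for j in range(n)]
            for i in range(n)]


if __name__ == "__main__":
    d = [[0.0, 0.3, 0.4],
         [0.3, 0.0, 0.5],
         [0.4, 0.5, 0.0]]
    # Illustrative constants only; real values come from the eigen-analyses.
    for row in lingoes_correct(d, 0.08):
        print(["%.4f" % v for v in row])
    for row in cailliez_correct(d, 0.1):
        print(["%.4f" % v for v in row])
```

The two methods differ in what they act on. Lingoes adds a constant to the squared distances, while Cailliez adds a constant to the distances themselves.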
- Input data file: a square distance or similarity matrix, or a raw data file.
  - Raw data file: objects (e.g. sites, replicates) are rows and variables (e.g. species or other descriptors) are columns. There is no identifier of any sort at the beginning of the file, and neither rows nor columns may carry labels. The program asks the user how many objects and variables there are before reading the file.
  - Square distance or similarity matrix computed with some other program, diagonal included. There is no identifier of any sort at the beginning of the file or at the beginning of the rows; the file contains only the distance (or similarity) values. The program asks the user how many objects there are in the matrix before reading the file.
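The file carries neither labels nor a header. Reading it therefore reduces to collecting the numbers in order and cutting them into rows of the length supplied by the user. This is a minimal Python sketch resting on two assumptions:

- values are separated by blanks, tabs or line breaks (the exact free-format rules of the FORTRAN program are not described here);
- a mismatch between the expected and actual number of values should be treated as an error.

`read_values` is a hypothetical name.

```python
from typing import List


def read_values(text: str, n_rows: int, n_cols: int) -> List[List[float]]:
    """Parse an unlabelled, whitespace-separated table of numbers, row by row.

    Raw data file: n_rows = number of objects, n_cols = number of variables.
    Square distance/similarity matrix: n_rows = n_cols = number of objects
    (the diagonal is included in the file).
    """
    values = [float(tok) for tok in text.split()]
    expected = n_rows * n_cols
    if len(values) != expected:
        # Assumption: a count mismatch is treated as an error.
        raise ValueError(f"expected {expected} values, found {len(values)}")
    # Consecutive slices of n_cols values form the rows, in file order.
    return [values[i * n_cols:(i + 1) * n_cols] for i in range(n_rows)]


if __name__ == "__main__":
    raw = "3 4 5\n3 2 5\n3 6 4\n"
    print(read_values(raw, 3, 3))
    # For a file on disk: read_values(open(path).read(), n_objects, n_variables)
```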
- A variety of preliminary data transformations are available for raw data files, if desired: square root ($y' = y^{1/2}$), double square root ($y' = y^{1/4}$), and four logarithmic transformations: $y' = \ln(y)$, $y' = \ln(y + 1)$, $y' = \log_{10}(y)$ and $y' = \log_{10}(y + 1)$.
- For raw input data files, users may choose to compute one of the following distances:
  - Bray-Curtis distance
  - sqrt(Bray-Curtis distance)
  - Chi-square distance
  - Hellinger distance
  - Euclidean distance
- Correction for negative eigenvalues:
  - Method 1 (Lingoes)
  - Method 2 (Cailliez)
  - No correction (yields coordinates corresponding to positive eigenvalues only)
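The transformations and distances listed above can be sketched in Python using the usual textbook definitions:

- Bray-Curtis and chi-square distances as in Legendre & Legendre (1998);
- Hellinger distance as the Euclidean distance between the square roots of the row profiles.

This sketch assumes these definitions match the program; it has not been checked against the FORTRAN source. It does not guard against rows or columns summing to zero, or against ln/log10 of zero values. All names are hypothetical; the option numbers follow the program's menus.

```python
import math
from typing import Callable, Dict, List

Row = List[float]
Matrix = List[List[float]]

# Preliminary transformations, keyed by the program's menu numbers (0-6).
# ln(y) and log10(y) fail on zeros; the "+ 1" versions do not.
TRANSFORMS: Dict[int, Callable[[float], float]] = {
    0: lambda y: y,                      # no transformation
    1: lambda y: y ** 0.5,               # square root
    2: lambda y: y ** 0.25,              # double square root
    3: math.log,                         # ln(y)
    4: lambda y: math.log(y + 1.0),      # ln(y + 1)
    5: math.log10,                       # log10(y)
    6: lambda y: math.log10(y + 1.0),    # log10(y + 1)
}


def transform(data: Matrix, option: int) -> Matrix:
    f = TRANSFORMS[option]
    return [[f(y) for y in row] for row in data]


def bray_curtis(a: Row, b: Row) -> float:
    # sum_j |a_j - b_j| / sum_j (a_j + b_j)
    return sum(abs(x - y) for x, y in zip(a, b)) / sum(x + y for x, y in zip(a, b))


def hellinger(a: Row, b: Row) -> float:
    # Euclidean distance between the square roots of the row profiles a_j / a_+.
    sa, sb = sum(a), sum(b)
    return math.sqrt(sum((math.sqrt(x / sa) - math.sqrt(y / sb)) ** 2
                         for x, y in zip(a, b)))


def euclidean(a: Row, b: Row) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def chi_square_matrix(data: Matrix) -> Matrix:
    # d(i,k) = sqrt( y_++ * sum_j (y_ij/y_i+ - y_kj/y_k+)**2 / y_+j )
    col_tot = [sum(col) for col in zip(*data)]
    grand = sum(col_tot)
    prof = [[y / sum(row) for y in row] for row in data]
    n = len(data)
    return [[math.sqrt(grand * sum((p - q) ** 2 / c
                                   for p, q, c in zip(prof[i], prof[k], col_tot)))
             for k in range(n)]
            for i in range(n)]


def distance_matrix(data: Matrix, option: int) -> Matrix:
    """option: 1 Bray-Curtis, 2 sqrt(Bray-Curtis), 3 chi-square,
    4 Hellinger, 5 Euclidean (same numbering as the program's menu)."""
    if option == 3:
        return chi_square_matrix(data)
    pair: Dict[int, Callable[[Row, Row], float]] = {
        1: bray_curtis,
        2: lambda a, b: math.sqrt(bray_curtis(a, b)),
        4: hellinger,
        5: euclidean,
    }
    f = pair[option]
    return [[f(a, b) for b in data] for a in data]
```

The chi-square distance is computed for the whole table at once because it depends on the column totals of the entire data matrix. The other four distances depend only on the two objects being compared.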
Legendre, P. & M. J. Anderson. 1998b. Program DISTPCOA. Département de sciences biologiques, Université de Montréal. 10 pages.

Technical notes

The program is distributed in a variety of forms:
- FORTRAN source code for Macintosh (file DISTPCOA.f), which can be compiled with a FORTRAN compiler. The user may modify the Parameter statement at the beginning of the program, which sets the size (pmax) of the largest data matrix that can be analysed.
- FORTRAN source code for DOS or Windows (file DISTPCOA.FOR), which can be compiled with a FORTRAN compiler. The user may modify the Parameter statement at the beginning of the program, which sets the size (pmax) of the largest data matrix that can be analysed.
- Compiled version for Macintosh computers with PowerPC processors (file DISTPCOA/PPC). The maximum size of the data matrix is 400 objects and 400 variables. The program requires 8.4 MB of RAM.
- Compiled versions for Macintosh computers with 68xxx processors, with or without a floating-point co-processor (FPU) (files DISTPCOA/68k and DISTPCOA/FPU). The maximum size of the data matrix is 250 objects and 250 variables. The program requires 3.4 MB of RAM.
- Compiled version for IBM-compatible PCs (file DISTPCOA.EXE). The maximum size of the data matrix is 400 objects and 400 variables. The program was compiled for 32-bit operating systems (Windows 95 or Windows NT) and requires 8 MB of RAM in most situations. 16 MB of RAM is preferable for calculations on very large matrices.
References

Gower, J. C. 1966. Some distance properties of latent root and vector methods used in multivariate analysis. Biometrika 53: 325-338.
Gower, J. C. & P. Legendre. 1986. Metric and Euclidean properties of dissimilarity coefficients. Journal of Classification 3: 5-48.
Legendre, P. & M. J. Anderson. 1999. Distance-based redundancy analysis: testing multi-species responses in multi-factorial ecological experiments. Ecological Monographs 69 (1): 1-24.
Legendre, P. & L. Legendre. 1998. Numerical ecology, 2nd English edition. Elsevier Science BV, Amsterdam. xv + 853 pages.
Press, W. H., B. P. Flannery, S. A. Teukolsky & W. T. Vetterling. 1986. Numerical recipes: the art of scientific computing. Cambridge University Press, Cambridge. xx + 818 pages.

Appendix: Test runs

Consider the following input data matrix, called “test,7x3”. It has 7 rows (sites) and 3 columns (species):
    3 4 5
    3 2 5
    3 6 4
    7 5 7
    6 8 9
    3 6 3
    4 5 7
For the Bray-Curtis distance and correction method 1 (Lingoes), the output in the dialogue window is the following:
    Principal coordinate analysis with correction for negative eigenvalues, if any.
    Maximum size of matrix: 400 objects and descriptors
    Do you have a file with
    (1) a square Distance or Similarity matrix, or
    (2) raw data ?
    (Type -1 or -2 to get intermediate matrices printed.)
    2
    Name of input file with raw data?
    (in which columns are variables and rows are replicates)
    Input file name (raw data): test,7x3
    How many objects? 7
    How many variables? 3
    Transform the raw data before computing distances?
    (0) No transformation
    (1) y’ = sqrt(y), i.e. y’ = y^0.5
    (2) y’ = double sqrt(y), i.e. y’ = y^0.25
    (3) y’ = ln(y)
    (4) y’ = ln(y + 1)
    (5) y’ = log10(y)
    (6) y’ = log10(y + 1)
    0
    Options:
    (1) Bray-Curtis distance
    (2) sqrt(Bray-Curtis)
    (3) Chi-square distance
    (4) Hellinger distance
    (5) Euclidean distance
    1
    Correction for negative eigenvalues, if any:
    1) Method 1 (Lingoes): d’(i,j) = sqrt(d(i,j)**2 + 2*c1)
    2) Method 2 (Cailliez): d’(i,j) = d(i,j) + c2
    3) No correction: yields coordinates corresponding to positive eigenvalues only
    1
    18:02:17
    *** Results of PCoA on the original distance matrix ***
    Trace of Gower-centred matrix = 0.15814
    PCoA eigenvalues
    0.10936 0.04657 0.00673 0.00017 0.00000 -0.00152 -0.00318
    The largest negative eigenvalue is -0.0031792355
    Sum of computed eigenvalues = 0.15814
    *** Results of PcoA on corrected distance matrix ***
    Trace of Gower-centred matrix = 0.17721
    PCoA eigenvalues
    0.11254 0.04975 0.00991 0.00335 0.00166 0.00000 0.00000
    Sum of computed eigenvalues = 0.17721
    The number of non-zero eigenvalues is: 5
    Non-zero Principal coordinates have been written to output file: “Pcoord.txt”
    18:02:18
    Real time spent: 0.13 seconds
    End of program.

File PCOORD.TXT contains the new coordinates of the 7 sites in 5 dimensions:
    -0.09732   0.03677   0.01757  -0.00996  -0.02045
    -0.16516   0.11596  -0.03876   0.00110   0.00089
    -0.06308  -0.08861  -0.02175  -0.01839   0.02695
     0.13589   0.06345   0.05297  -0.02800   0.00535
     0.21189  -0.02103  -0.05983   0.00077  -0.01204
    -0.07513  -0.14534   0.02514   0.00929  -0.01342
     0.05291   0.03880   0.02464   0.04518   0.01272
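The figures in this run can be reproduced independently. The following Python sketch is not part of DISTPCOA; it requires Python 3.8+ for `math.dist`. It recomputes three things:

- the Bray-Curtis distances of the test data;
- the trace of the Gower-centred matrix before and after the Lingoes correction, using c1 = 0.0031792355 as printed by the program;
- the largest difference between the Euclidean distances among the sites in PCOORD.TXT and the corrected distances sqrt(d² + 2c1).

The data and coordinates are copied from this appendix; the helper names are hypothetical.

```python
import math
from typing import List, Sequence

Matrix = List[List[float]]

# Test data "test,7x3": 7 sites (rows) x 3 species (columns).
DATA = [[3, 4, 5], [3, 2, 5], [3, 6, 4], [7, 5, 7],
        [6, 8, 9], [3, 6, 3], [4, 5, 7]]

# PCOORD.TXT for Bray-Curtis + Lingoes correction: 7 sites x 5 axes.
PCOORD = [
    [-0.09732, 0.03677, 0.01757, -0.00996, -0.02045],
    [-0.16516, 0.11596, -0.03876, 0.00110, 0.00089],
    [-0.06308, -0.08861, -0.02175, -0.01839, 0.02695],
    [0.13589, 0.06345, 0.05297, -0.02800, 0.00535],
    [0.21189, -0.02103, -0.05983, 0.00077, -0.01204],
    [-0.07513, -0.14534, 0.02514, 0.00929, -0.01342],
    [0.05291, 0.03880, 0.02464, 0.04518, 0.01272],
]

C1 = 0.0031792355  # |largest negative eigenvalue|, as printed by the program


def bray_curtis(a: Sequence[float], b: Sequence[float]) -> float:
    return sum(abs(x - y) for x, y in zip(a, b)) / sum(x + y for x, y in zip(a, b))


def gower_trace(d: Matrix) -> float:
    """Trace of the Gower-centred matrix of A = [-0.5 * d(i,j)**2]."""
    n = len(d)
    a = [[-0.5 * d[i][j] ** 2 for j in range(n)] for i in range(n)]
    row_mean = [sum(r) / n for r in a]  # A is symmetric: row means = column means
    grand_mean = sum(row_mean) / n
    # Diagonal of the centred matrix: a_ii - mean_i - mean_i + grand mean
    return sum(a[i][i] - 2 * row_mean[i] + grand_mean for i in range(n))


n = len(DATA)
d = [[bray_curtis(DATA[i], DATA[j]) for j in range(n)] for i in range(n)]
d_lingoes = [[0.0 if i == j else math.sqrt(d[i][j] ** 2 + 2 * C1)
              for j in range(n)] for i in range(n)]

trace_orig = gower_trace(d)          # program prints 0.15814
trace_corr = gower_trace(d_lingoes)  # program prints 0.17721

# Largest gap between inter-site distances in PCOORD.TXT and the corrected distances.
max_err = max(abs(math.dist(PCOORD[i], PCOORD[j]) - d_lingoes[i][j])
              for i in range(n) for j in range(i + 1, n))

print(f"trace before correction = {trace_orig:.5f}")
print(f"trace after correction  = {trace_corr:.5f}")
print(f"max distance error      = {max_err:.1e}")
```

The corrected matrix is Euclidean, so the five principal coordinates should reproduce it exactly. Any remaining discrepancy should come only from the 5-decimal rounding of PCOORD.TXT.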
For the Bray-Curtis distance and correction method 2 (Cailliez), the output in the dialogue window is the following:
    Principal coordinate analysis with correction for negative eigenvalues, if any.
    Maximum size of matrix: 400 objects and descriptors
    Do you have a file with
    (1) a square Distance or Similarity matrix, or
    (2) raw data ?
    (Type -1 or -2 to get intermediate matrices printed.)
    2
    Name of input file with raw data?
    (in which columns are variables and rows are replicates)
    Input file name (raw data): test,7x3
    How many objects? 7
    How many variables? 3
    Transform the raw data before computing distances?
    (0) No transformation
    (1) y’ = sqrt(y), i.e. y’ = y^0.5
    (2) y’ = double sqrt(y), i.e. y’ = y^0.25
    (3) y’ = ln(y)
    (4) y’ = ln(y + 1)
    (5) y’ = log10(y)
    (6) y’ = log10(y + 1)
    0
    Options:
    (1) Bray-Curtis distance
    (2) sqrt(Bray-Curtis)
    (3) Chi-square distance
    (4) Hellinger distance
    (5) Euclidean distance
    1
    Correction for negative eigenvalues, if any:
    1) Method 1 (Lingoes): d’(i,j) = sqrt(d(i,j)**2 + 2*c1)
    2) Method 2 (Cailliez): d’(i,j) = d(i,j) + c2
    3) No correction: yields coordinates corresponding to positive eigenvalues only
    2
    18:10:21
    *** Results of PCoA on the original distance matrix ***
    Trace of Gower-centred matrix = 0.15814
    PCoA eigenvalues
    0.10936 0.04657 0.00673 0.00017 0.00000 -0.00152 -0.00318
    Sum of computed eigenvalues = 0.15814
    *** Create Special matrix and find its largest eigenvalue ***
    The largest eigenvalue of the Special matrix is 0.0380438751
    *** Results of PcoA on corrected distance matrix ***
    Trace of Gower-centred matrix = 0.21088
    PCoA eigenvalues
    0.13191 0.06090 0.01325 0.00351 0.00131 0.00000 0.00000
    Sum of computed eigenvalues = 0.21088
    The number of non-zero eigenvalues is: 5
    Non-zero Principal coordinates have been written to output file: “Pcoord.txt”
    18:10:21
    Real time spent: 0.15 seconds
    End of program.

File PCOORD.TXT contains the new coordinates of the 7 sites in 5 dimensions:
    -0.10669   0.04391  -0.01393   0.01163  -0.02278
    -0.17486   0.13057   0.04492  -0.00032   0.00661
    -0.07177  -0.10046   0.01498   0.00893   0.02273
     0.14993   0.06591  -0.05857   0.03115   0.00733
     0.22728  -0.02391   0.07344  -0.00071  -0.00801
    -0.08399  -0.15847  -0.02214  -0.00253  -0.00993
     0.06009   0.04243  -0.03869  -0.04815   0.00405
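The same kind of check works for the Cailliez run. Instead of building the Gower-centred matrix G explicitly, this Python sketch uses the identity trace(G) = (1/n) Σ_{i<j} d(i,j)², which holds for any distance matrix. The value c2 = 0.0380438751 is copied from the output above; the names are hypothetical, and `math.dist` needs Python 3.8+.

```python
import math
from itertools import combinations

# Test data "test,7x3": 7 sites (rows) x 3 species (columns).
DATA = [[3, 4, 5], [3, 2, 5], [3, 6, 4], [7, 5, 7],
        [6, 8, 9], [3, 6, 3], [4, 5, 7]]

# PCOORD.TXT for Bray-Curtis + Cailliez correction: 7 sites x 5 axes.
PCOORD = [
    [-0.10669, 0.04391, -0.01393, 0.01163, -0.02278],
    [-0.17486, 0.13057, 0.04492, -0.00032, 0.00661],
    [-0.07177, -0.10046, 0.01498, 0.00893, 0.02273],
    [0.14993, 0.06591, -0.05857, 0.03115, 0.00733],
    [0.22728, -0.02391, 0.07344, -0.00071, -0.00801],
    [-0.08399, -0.15847, -0.02214, -0.00253, -0.00993],
    [0.06009, 0.04243, -0.03869, -0.04815, 0.00405],
]

C2 = 0.0380438751  # largest eigenvalue of the special matrix, as printed


def bray_curtis(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / sum(x + y for x, y in zip(a, b))


n = len(DATA)
pairs = list(combinations(range(n), 2))  # all (i, j) with i < j
d = {(i, j): bray_curtis(DATA[i], DATA[j]) for i, j in pairs}

# trace(G) = (1/n) * sum_{i<j} d(i,j)**2 for the Gower-centred matrix G.
trace_orig = sum(v ** 2 for v in d.values()) / n         # printed: 0.15814
trace_corr = sum((v + C2) ** 2 for v in d.values()) / n  # printed: 0.21088

# Inter-site distances in PCOORD.TXT should equal d(i,j) + c2 (up to rounding).
max_err = max(abs(math.dist(PCOORD[i], PCOORD[j]) - (d[(i, j)] + C2))
              for i, j in pairs)

print(f"trace before correction = {trace_orig:.5f}")
print(f"trace after correction  = {trace_corr:.5f}")
print(f"max distance error      = {max_err:.1e}")
```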
For the Bray-Curtis distance without any correction for negative eigenvalues (option 3), the output in the dialogue window is the following:
    Principal coordinate analysis with correction for negative eigenvalues, if any.
    Maximum size of matrix: 400 objects and descriptors
    Do you have a file with
    (1) a square Distance or Similarity matrix, or
    (2) raw data ?
    (Type -1 or -2 to get intermediate matrices printed.)
    2
    Name of input file with raw data?
    (in which columns are variables and rows are replicates)
    Input file name (raw data): test,7x3
    How many objects? 7
    How many variables? 3
    Transform the raw data before computing distances?
    (0) No transformation
    (1) y’ = sqrt(y), i.e. y’ = y^0.5
    (2) y’ = double sqrt(y), i.e. y’ = y^0.25
    (3) y’ = ln(y)
    (4) y’ = ln(y + 1)
    (5) y’ = log10(y)
    (6) y’ = log10(y + 1)
    0
    Options:
    (1) Bray-Curtis distance
    (2) sqrt(Bray-Curtis)
    (3) Chi-square distance
    (4) Hellinger distance
    (5) Euclidean distance
    1
    Correction for negative eigenvalues, if any:
    1) Method 1 (Lingoes): d’(i,j) = sqrt(d(i,j)**2 + 2*c1)
    2) Method 2 (Cailliez): d’(i,j) = d(i,j) + c2
    3) No correction: yields coordinates corresponding to positive eigenvalues only
    3
    18:13:09
    *** Results of PCoA on the original distance matrix ***
    Trace of Gower-centred matrix = 0.15814
    PCoA eigenvalues
    0.10936 0.04657 0.00673 0.00017 0.00000 -0.00152 -0.00318
    The negative eigenvalues, if any, are being ignored in this analysis.
    Sum of computed eigenvalues = 0.15814
    The number of positive eigenvalues is: 4
    Principal coordinates corresponding to positive eigenvalues only have been written to output file: “Pcoord.txt”
    18:13:09
    Real time spent: 0.08 seconds
    End of program.

File PCOORD.TXT contains the new coordinates of the 7 sites in the 4 dimensions corresponding to the positive eigenvalues:
    -0.09594   0.03558  -0.01448   0.00225
    -0.16281   0.11219   0.03194  -0.00025
    -0.06219  -0.08574   0.01792   0.00416
     0.13396   0.06139  -0.04366   0.00633
     0.20888  -0.02034   0.04930  -0.00017
    -0.07406  -0.14062  -0.02072  -0.00210
     0.05216   0.03754  -0.02031  -0.01022
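Without correction, the coordinates can still be checked against the printed eigenvalues. Each axis of PCOORD.TXT is centred, and its sum of squares equals the corresponding positive eigenvalue.

The sketch also illustrates what is lost by discarding the axes with negative eigenvalues. The squared distance d(i,j)² is the sum over all axes of λ_k(u_ik − u_jk)², where u_k are the unit-length eigenvectors. Dropping the negative terms can only lengthen the distances, so the distances among sites in PCOORD.TXT should never be shorter than the original Bray-Curtis distances.

This is an illustrative Python sketch (Python 3.8+ for `math.dist`). The data are copied from this appendix, and the names are hypothetical.

```python
import math
from itertools import combinations

# Test data "test,7x3": 7 sites (rows) x 3 species (columns).
DATA = [[3, 4, 5], [3, 2, 5], [3, 6, 4], [7, 5, 7],
        [6, 8, 9], [3, 6, 3], [4, 5, 7]]

# PCOORD.TXT for Bray-Curtis without correction: 7 sites x 4 axes.
PCOORD = [
    [-0.09594, 0.03558, -0.01448, 0.00225],
    [-0.16281, 0.11219, 0.03194, -0.00025],
    [-0.06219, -0.08574, 0.01792, 0.00416],
    [0.13396, 0.06139, -0.04366, 0.00633],
    [0.20888, -0.02034, 0.04930, -0.00017],
    [-0.07406, -0.14062, -0.02072, -0.00210],
    [0.05216, 0.03754, -0.02031, -0.01022],
]

# Positive eigenvalues as printed by the program.
EIGENVALUES = [0.10936, 0.04657, 0.00673, 0.00017]


def bray_curtis(a, b):
    return sum(abs(x - y) for x, y in zip(a, b)) / sum(x + y for x, y in zip(a, b))


axes = range(len(EIGENVALUES))
col_sums = [sum(row[k] for row in PCOORD) for k in axes]     # ~0: axes are centred
col_ss = [sum(row[k] ** 2 for row in PCOORD) for k in axes]  # ~ eigenvalues

# Smallest (coordinate distance - Bray-Curtis distance) over all pairs of sites;
# it should not be negative beyond rounding error.
shortfall = min(math.dist(PCOORD[i], PCOORD[j]) - bray_curtis(DATA[i], DATA[j])
                for i, j in combinations(range(len(DATA)), 2))

for k in axes:
    print(f"axis {k + 1}: sum = {col_sums[k]:+.5f}, "
          f"sum of squares = {col_ss[k]:.5f}, eigenvalue = {EIGENVALUES[k]:.5f}")
print(f"min(distance in PCOORD - Bray-Curtis) = {shortfall:.5f}")
```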