Algorithms: Spectroscopic Redshift and Type Determination

Emission-Line Redshifts
Cross-Correlation Redshifts
Final Redshifts and Spectrum Classification (from lines)
Spectrum Classification Using Eigenspectra (eClass)
Manual Inspection of Spectra

The spectro1d pipeline analyzes the combined, merged spectra output by spectro2d and determines object classifications (galaxy, quasar, star, or unknown) and redshifts; it also provides various line measurements and warning flags. The code attempts to measure an emission and absorption redshift independently for every targeted (nonsky) object. That is, to avoid biases, the absorption and emission codes operate independently, and they both operate independently of any target selection information.

The spectro1d pipeline performs a sequence of tasks for each object spectrum on a plate: The spectrum and error array are read in, along with the pixel mask. Pixels with mask bits set to FULLREJECT, NOSKY, NODATA, or BRIGHTSKY are given no weight in the spectro1d routines. The continuum is then fitted with a fifth-order polynomial, with iterative rejection of outliers (e.g., strong lines). The fit continuum is subtracted from the spectrum. The continuum-subtracted spectra are used for cross-correlating with the stellar templates.

Emission-Line Redshifts

Emission lines (peaks in the one-dimensional spectrum) are found by carrying out a wavelet transform of the continuum-subtracted spectrum f_c(

where g(x; a, b) is the wavelet (with complex conjugate

) with translation and scale parameters a and b. We apply the à trous wavelet (Starck, Siebenmorgen, & Gredel 1997). For fixed wavelet scale b, the wavelet transform is computed at each pixel center a; the scale b is then increased in geometric steps and the process repeated. Once the full wavelet transform is computed, the code finds peaks above a threshold and eliminates multiple detections (at different b) of a given line by searching nearby pixels. The output of this routine is a set of positions of candidate emission lines.

This list of lines with nonzero weights is matched against a list of common galaxy and quasar emission lines, given in this line list, many of which were measured from the composite quasar spectrum of Vanden Berk et al.(2001; because of velocity shifts of different lines in quasars, the wavelengths listed do not necessarily match their rest-frame values). Each significant peak found by the wavelet routine is assigned a trial line identification from the common list (e.g., Mg II) and an associated trial redshift. The peak is fitted with a Gaussian, and the line center, width, and height above the continuum are stored in HDU 2 of the spSpec*.fits files as parameters wave, sigma, and height, respectively. If the code detects close neighboring lines, it fits them with multiple Gaussians. Depending on the trial line identification, the line width it tries to fit is physically constrained. The code then searches for the other expected common emission lines at the appropriate wavelengths for that trial redshift and computes a confidence level (CL) by summing over the weights of the found lines and dividing by the summed weights of the expected lines. The CL is penalized if the different line centers do not quite match. Once all of the trial line identifications and redshifts have been explored, an emission-line redshift is chosen as the one with the highest CL and stored as z in the spSpec*.fits emission line HDU. The exact expression for the emission-line CL has been tweaked to match our empirical success rate in assigning correct emission-line redshifts, based on manual inspection of a large number of spectra from the EDR.

The "measured lines" HDU 2 also gives the errors, continuum, equivalent width, 2, spectral index, and significance of each line. We caution that the emission-line measurement for H should only be used if < 2.5. The "found" lines in HDU1 denote only those lines used to measure the emission-line redshift, while "measured" lines in HDU2 are all lines in the emission-line list measured at the redshifted positions appropriate to the final redshift assigned to the object.

A separate routine searches for high-redshift (z > 2.3) quasars by identifying spectra that contain a Ly forest signature: a broad emission line with more fluctuation on the blue side than on the red side of the line. The routine outputs the wavelength of the Ly emission line; while this allows a determination of the redshift, it is not a high-precision estimate, because the Ly line is intrinsically broad and affected by Ly absorption. The spectro1d pipeline stores this as an additional emission-line redshift.

If the highest CL emission-line redshift uses lines only expected for quasars (e.g., Ly, C IV, C III], then the object is provisionally classified as a quasar. These provisional classifications will hold up if the final redshift assigned to the object (see below) agrees with its emission redshift.

Cross-Correlation Redshift

The spectra are cross-correlated with stellar, emission-line galaxy, and quasar template spectra to determine a cross-correlation redshift and error. The cross-correlation templates are obtained from SDSS commissioning spectra of high signal-to-noise ratio and comprise roughly one for each stellar spectral type from B to almost L, a nonmagnetic and a magnetic white dwarf, an emission-line galaxy, a composite LRG spectrum, and a composite quasar spectrum (from Vanden Berk et al. 2001). The composites are based on co-additions of 2000 spectra each. The template redshifts are determined by cross-correlation with a large number of stellar spectra from SDSS observations of the M67 star cluster, whose radial velocity is precisely known. See the cross-correlation templates.

When an object spectrum is cross-correlated with the stellar templates, its found emission lines are masked out, i.e., the redshift is derived from the absorption features. The cross-correlation routine follows the technique of Tonry & Davis (1979): the continuum-subtracted spectrum is Fourier-transformed and convolved with the transform of each template. For each template, the three highest cross-correlation function (CCF) peaks are found, fitted with parabolas, and output with their associated confidence limits. The corresponding redshift errors are given by the widths of the CCF peaks. The cross-correlation CLs are empirically calibrated as a function of peak level based on manual inspection of a large number of spectra from the EDR. The final cross-correlation redshift is then chosen as the one with the highest CL from among all of the templates.

The cross-correlation redshift is stored as z in the cross-correlation redshift HDU.

Final Redshifts and Spectrum Classification

The spectro1d pipeline assigns a final redshift to each object spectrum by choosing the emission or cross-correlation redshift with the highest CL and stores this as z in the spSpec*.html primary header. A redshift status bit mask (zStatus) and a redshift warning bit mask (zWarning) are stored. The CL is stored in zConf spSpec*.fits primary header. Objects with redshifts determined manually (see below) have CL set to 0.95 (MANUAL_HIC set in zStatus), or 0.4 or 0.65 (MANUAL_LOC set in zStatus). Rarely, objects have the entire red or blue half of the spectrum missing; such objects have their CLs reduced by a factor of 2, so they are automatically flagged as having low confidence, and the mask bit Z_WARNING_NO_BLUE or Z_WARNING_NO_RED is set in zWarning as appropriate.

All objects are classified as either a quasar, high-redshift quasar, galaxy, star, late-type star, or unknown. If the object has been identified as a quasar by the emission-line routine, and if the emission-line redshift is chosen as the final redshift, then the object retains its quasar classification. If the object has a final redshift z > 2.3 (so that Ly is or should be present in the spectrum), and the Ly redshift extimator agrees on this, then it is classified as a high-z quasar. If the object has a redshift cz < 450 km s^-1, then it is classified as a star. If the final redshift is obtained from one of the late-type stellar cross-correlation templates, it is classified as a late-type star. If the object has a cross-correlation CL < 0.25, it is classified as unknown.

There exist among the spectra a small number of composite objects. Most common are bright stars on top of galaxies, but there are also galaxy-galaxy pairs at distinct redshifts, and at least one galaxy-quasar pair, and one galaxy-star pair. Most of these have the zWarning flag set, indicating that more than one redshift was found.

The zWarning bit mask mentioned above records problems that the spectro1d pipeline found with each spectrum. It provides compact information about the spectra for end users, and it is also used to trigger manual inspection of a subset of spectra on every plate. There is a zWarning bits table. Users should particularly heed warnings about parts of the spectrum missing, low signal-to-noise ratio in the spectrum, significant discrepancies between the various measures of the redshift, and especially low confidence in the redshift determination. In addition, redshifts for objects with zStatus = FAILED should not be used.

Spectral Classification Using Eigenspectra

In addition to spectral classification based on measured lines, galaxies are classified by a Principal Component Analysis (PCA), using cross-correlation with eigentemplates constructed from SDSS spectroscopic data. The 5 eigencoefficients and a classification number are stored in eCoeff and eClass, respectively, in the spSpec files. eClass, a single-parameter classifier based on the expansion coefficients (eCoeff1-5), ranges from about -0.35 to 0.5 for early- to late-type galaxies.

A number of changes to eClass have occurred since the EDR. The galaxy spectral classification eigentemplates for DR1 are created from a much larger sample of spectra than were used in the Stoughton et al. EDR paper, and now number approximately 200,000. The eigenspectra used in DR1 are an early version of those created by Yip et al. (in prep). The sign of the second eigenspectrum has been reversed with respect to that of EDR; therefore we recommend using the expression
atan(-eCoeff2/eCoeff1)
rather than eClass as the single-parameter classifier.

Manual Inspection of Spectra

A small percentage of spectra on every plate are inspected manually, and if necessary, the redshift, classification, zStatus, and CL are corrected. We inspect those spectra that have zWarning or zStatus indicating that there were multiple high-confidence cross-correlation redshifts, that the redshift was high (z > 3.2 for a quasar or z > 0.5 for a galaxy), that the confidence was low, that signal-to-noise ratio was low in r, or that the spectrum was not measured. All objects with zStatus = EMLINE_HIC or EMLINE_LOC, i.e., for which the redshift was determined only by emission lines, are also examined. If, however, the object has a final CL > 0.98 and zStatus of either XCORR_EMLINE or EMLINE_XCORR, then despite the above, it is not manually checked. All objects with either specClass = SPEC_UNKNOWN or zStatus = FAILED are manually inspected.

Roughly 8% of the spectra in the EDR were thus inspected, of which about one-eighth, or 1% overall, had the classification, redshift, zStatus, or CL manually corrected. Such objects are flagged with zStatus changed to MANUAL_HIC or MANUAL_LOC, depending on whether we had high or low confidence in the classification and redshift from the manual inspection. Tests on the validation plates, described in the next section, indicate that this selection of spectrafor manual inspection successfully finds over 95% of the spectra for which the automated pipeline assigns an incorrect redshift.

Last modified: Fri Apr 4 13:04:54 CST 2003

Send your questions to sdss-helpdesk@fnal.gov