Overview of Pipeline Integration at Fermilab

  • Tools and mechanisms for integration
    1. Data model and FITS/ASCII parameter file standard -- Web page with exact:
      • Header keywords
      • Table fields and format
      • Documentation on meaning of keywords and field contents
      • File naming conventions
      • Usage (who creates the file, who uses the file)
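      • A minimal sketch (assuming Python with astropy; the keyword names are illustrative placeholders, not the real data-model list) of checking a file against such a web page:

            # Verify that a FITS file carries the header keywords required by the
            # data-model web page.  The keyword names below are illustrative
            # placeholders, not the authoritative list.
            from astropy.io import fits

            REQUIRED_KEYWORDS = ["RUN", "CAMCOL", "FIELD", "VERSION"]

            def check_data_model(path):
                """Return the required keywords missing from the primary header."""
                with fits.open(path) as hdul:
                    header = hdul[0].header
                    return [key for key in REQUIRED_KEYWORDS if key not in header]

            missing = check_data_model("tsObj-example.fit")
            if missing:
                print("File violates the data model; missing keywords:", missing)
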
    2. Source code control software: "CVS"
      • Public domain -- multi-platform support
      • Complete logging of changes -- ability to revert if bugs introduced
      • Branching capability -- however, branches should be used only in certain situations, since it appears to take expert users talking to each other to use them without confusion
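      • A minimal sketch of the disciplined branch workflow implied above, driven from Python for consistency with the other examples here (module and tag names are hypothetical):

            # Tag a tested release, then create a branch reserved for bug fixes;
            # day-to-day development stays on the trunk.  Tag names are hypothetical,
            # and the commands assume the current directory is a CVS working copy.
            import subprocess

            def cvs(*args):
                """Run a cvs command in the current working copy, failing loudly."""
                subprocess.run(["cvs", *args], check=True)

            cvs("tag", "photo_v5_4")              # immutable tag on the tested state
            cvs("tag", "-b", "photo_v5_4_br")     # branch tag for bug fixes only
            cvs("update", "-r", "photo_v5_4_br")  # switch this working copy to the branch
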
    3. UPS/UPD binary distribution control
      • Controls naming of executables on all supported platforms
      • Allows easy distribution of executables on all supported platforms
      • Initial setup/configuration of UPS/UPD system itself is somewhat difficult
    4. Tags in headers / filenames indicate version of software used and software dependencies
      • Which version of software was run on this input/output file
      • Example -- Apply Calibrations needs inputs from photo, astrom, and mtpipe; outputs: tsObj files
      • The 'rerun' tag
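      • A minimal sketch of stamping software versions and the rerun number into an output file (the keyword names and the exact file-name pattern are illustrative; the data-model page defines the real convention):

            # Compose a tsObj-style file name and record which pipeline versions
            # produced it.  VERS_PHO/VERS_AST/VERS_MT and the name pattern are
            # illustrative placeholders.
            from astropy.io import fits

            def ts_obj_name(run, camcol, rerun, field):
                return f"tsObj-{run:06d}-{camcol:d}-{rerun:d}-{field:04d}.fit"

            def stamp_versions(header, photo_ver, astrom_ver, mtpipe_ver, rerun):
                header["VERS_PHO"] = (photo_ver, "photo version used")
                header["VERS_AST"] = (astrom_ver, "astrom version used")
                header["VERS_MT"] = (mtpipe_ver, "mtpipe version used")
                header["RERUN"] = (rerun, "processing rerun number")

            hdu = fits.PrimaryHDU()
            stamp_versions(hdu.header, "v5_4", "v3_2", "v8_6", 20)
            hdu.writeto(ts_obj_name(756, 1, 20, 100), overwrite=True)
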
    5. Interface change mechanism
    6. Multi-platform UNIX code compilation / execution
    7. Bug Database 'Gnats'
      • Excellent overall control of local bugs and enhancement requests
      • Difficult to track 'cross system bugs' or 'bugs of unknown origin'
      • Old change requests and non-critical bugs tend to pile up
    8. Regression Tests -- Testbed data
      • Ensures that when pipelines are updated things don't break
      • As bugs are fixed a test can be added to ensure that they stay fixed
      • Hard work to add them -- so not always done
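      • A minimal sketch of such a test: rerun the pipeline on fixed testbed input and compare with a stored reference (paths, the column name, and the tolerance are placeholders):

            # Regression test: outputs from a fresh run on the testbed data must
            # match the stored reference to within a tolerance.
            import numpy as np
            from astropy.io import fits

            def test_testbed_photometry():
                with fits.open("testbed/reference/tsObj-000756-1-20-0100.fit") as ref, \
                     fits.open("testbed/current/tsObj-000756-1-20-0100.fit") as cur:
                    ref_counts = ref[1].data["psfCounts"]
                    cur_counts = cur[1].data["psfCounts"]
                # A difference above the tolerance means an update changed the
                # photometry and must be investigated before release.
                assert np.allclose(ref_counts, cur_counts, atol=1e-6)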

  • How Processed Data Gets into the OPDB
    1. Upstream Imaging Processing: Ops-prepare, MTPIPE, Stamp Collection, Astrometry, PSP, Frames, and Photometric Calibration pipelines run; individual pipeline Q/A checks pass.
    2. Outputs of the pipelines (but not corrected frames, atlas images, binned sky, masks) are stuffed into the OPDB. (1.5 days per night's data)
    3. Merge with existing overlapping runs (2 hours per night's data)
    4. Cross run Q/A checks run (2 hours)
    5. Completed rectangular chunks on sky 'resolved' for Target Selection (4 hours)
    6. Target Selection Run -- handed to plates (6 hours)
    7. Imaging data files exported for import into SX. (6 hours)
    8. Upstream Spectroscopy Processing: Ops-prepare, Spectro-2D, and Spectro-1D pipelines run
    9. Links made in OPDB between each object spectrum and the corresponding imaging object (2 hours) -- see the positional-match sketch after this list
    10. Spectro objects exported to SX (science archive).
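
    The linking in step 9 is essentially a positional match; a minimal sketch using astropy (the 1'' match radius is an assumed value, not the operational one):

        # Link each object spectrum to the nearest imaging object on the sky,
        # rejecting matches farther than an assumed 1 arcsec radius.
        import astropy.units as u
        from astropy.coordinates import SkyCoord

        def link_spectra_to_imaging(spec_ra, spec_dec, img_ra, img_dec,
                                    radius_arcsec=1.0):
            """Return, per spectrum, the index of its imaging counterpart or None."""
            spec = SkyCoord(ra=spec_ra, dec=spec_dec, unit="deg")
            img = SkyCoord(ra=img_ra, dec=img_dec, unit="deg")
            idx, sep2d, _ = spec.match_to_catalog_sky(img)
            return [i if s < radius_arcsec * u.arcsec else None
                    for i, s in zip(idx, sep2d)]

        links = link_spectra_to_imaging([185.0], [15.0],
                                        [185.0002, 200.0], [15.0001, 10.0])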

  • Human Intervention Steps
    1. Q/A at end of each pipeline
    2. During final photometric calibrations / MTpipe reductions (intervention we hope to reduce)
    3. Q/A during overlapping runs
    4. Reprocessing of old data which no longer conforms to current data model
    5. Feedback to mountain
      • File/Tape format problems, iop
      • Calibration files
      • Confirmation of 'done' for imaging stripe or plate -- There is a lag
    6. Feedback to pipeline developers
      • Bug reports, wish requests, file format problems
    7. Feedback to future observing schedulers
      • Which data are really good and which need to be redone
      • Imaging: Which sections of which stripes ok -- Time Lag
      • Spectroscopy: Which plates are complete -- Time Lag

  • How Data is determined to be Good
    1. Imaging -- 1.5'' seeing and matchups between camera columns (but in practice everything up to 3'' FWHM is put through; processing crashes if the sky is too variable) -- see the sketch after this list
    2. Spectroscopy -- on-mountain S/N per fiber plot
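
    A minimal sketch of applying the imaging criteria above per field (the grade labels are illustrative):

        # Grade imaging fields against the seeing criteria: 1.5'' is the nominal
        # target, but in practice fields up to 3'' FWHM are pushed through.
        def grade_field(psf_fwhm_arcsec):
            if psf_fwhm_arcsec <= 1.5:
                return "good"        # meets the nominal survey criterion
            if psf_fwhm_arcsec <= 3.0:
                return "acceptable"  # processed anyway, candidate for re-observation
            return "redo"            # beyond what is pushed through in practice

        print([grade_field(f) for f in (1.2, 2.4, 3.5)])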

  • Move data to the output 'Data Products'
    1. Creation of preliminary 'calibrated object files' (tsObj files) for early science and early problem diagnosis on FNAL machines
    2. SCP/FTP shove of tsObj files to collaboration members (issues: data volume, disk on the remote end, version control) -- see the transfer sketch after this list
    3. Loading of SX with preliminary 'calibrated object catalog' -- No atlas yet -- calibrated version control
    4. Success judged by the number of quality science papers written and by feedback of important bugs to pipeline developers
    5. Creation of 'final' 'calibrated object files'
    6. SCP shove of final calibrated object files
    7. Loading of SX with final calibrated object catalog, access to atlas images and related spectra
    8. Creation of general public distribution CD-ROMs and/or Internet access site.
      • Example: large area color GIF image maps -- also used internally
    9. Corrected frames moved to tape and/or tape robot, as are data which cannot be kept spinning for lack of disk space
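
    A minimal sketch of the transfer in step 2 above, making the data-volume and version bookkeeping explicit (the host, paths, and manifest format are hypothetical):

        # Push tsObj files to a collaboration site with scp, first writing a
        # manifest of file sizes and the rerun so the remote end can check the
        # volume and keep versions apart.  Host and paths are hypothetical.
        import os
        import subprocess

        def push_tsobj(files, rerun, dest="user@remote.site:/data/sdss/tsObj/"):
            total_bytes = sum(os.path.getsize(f) for f in files)
            print(f"Transferring {len(files)} files, {total_bytes / 1e9:.1f} GB")
            with open("MANIFEST.txt", "w") as m:
                m.write(f"rerun {rerun}\n")
                for f in files:
                    m.write(f"{os.path.basename(f)} {os.path.getsize(f)}\n")
            subprocess.run(["scp", "MANIFEST.txt", *files, dest], check=True)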