Algorithms: Tiling of spectroscopy plates

Tiling is the process by which the spectroscopic plates are designed and placed relative to each other. This procedure involves optimizing both the placement of fibers on individual plates, as well as the placement of plates (or tiles) relative to each other.

Introduction

Spectroscopy Survey Overview

What is Tiling?

Fiber Placement

Dealing with Fiber Collisions

Tile Placement

Technical Details

Introduction

Because of large-scale structure in the galaxy distribution (which form the bulk of the SDSS targets), a naive covering of the sky with equally-spaced tiles does not yield uniform sampling. Thus, we present a heuristic for perturbing the centers of the tiles from the equally-spaced distribution to provide more uniform completeness. For the SDSS sample, we can attain a sampling rate of >92% for all targets, and >99% for the set of targets which do not collide with each other, with an efficiency >90% (defined as the fraction of available fibers assigned to targets).

Much of the content of this page can be found as a preprint on astro-ph.

The Spectroscopic Survey

The spectroscopic survey is performed using two multi-object fiber spectrographs on the same telescope. Each spectroscopic fiber plug plate, referred to as a ``tile,'' has a circular field-of-view with a radius of 1.49 degrees, and can accommodate 640 fibers, 48 of which are reserved for observations of blank sky and spectrophotometric standards.Because of the finite size of the fiber plugs, the minimum separation of fiber centers is 55''. If, for example, two objects are within 55'' of each other, both of them can be observed only if they lie in the overlap between two adjacent tiles. The goal of the SDSS is to observe 99% of the maximal set of targets which has no such collisions (about 90% of all targets).

What is Tiling?

Around 2,000 tiles will be necessary to provide fibers for all the targets in the survey. Since each tile which must be observed contributes to the cost of the survey (due both to the cost of production of the plate and to the cost of observing time), we desire to minimize the number of tiles necessary to observe all the desired targets. In order to maximize efficiency (defined as the fraction of available fibers assigned to tiled targets) when placing these tiles and assigning targets to each tile, we need to address two problems. First, we must be able to determine, given a set of tile centers, how to optimally assign targets to each tile --- that is, how to maximize the number of targets which have fibers assigned to them. Second, we must determine the most efficient placement of the tile centers, which is non-trivial because the distribution of targets on the sky is non-uniform, due to the well-known clustering of galaxies on the sky. We find the exact solution to the first problem and use a heuristic method developed by Lupton et al. (1998) to find an approximate solution to the second problem (which is NP-complete). The code which implements this solution is designed to run on a patch of sky consisting of a set of rectangles in a spherical coordinate system, known in SDSS parlance as a tiling region.

NOTE: the term ``chunk'' or ``tiling chunk'' is sometimes used to denote a tiling region. To avoid confusion with the correct use of the term chunk, we use ``tiling region'' here.

Fiber Placement

First, we discuss the allocation of fibers given a set of tile centers, ignoring fiber collisions for the moment. Figure 1 shows at the left a very simple example of a distribution of targets and the positions of two tiles we want to use to observe these targets. Given that for each tile there is a finite number of available fibers, how do we decide which targets get allocated to which tile? This problem is equivalent to a network flow problem, which computer scientists have been kind enough to solve for us already.


Figure 1: Simplified Tiling and Network Flow View

The basic idea is shown in the right half of Figure 1, which shows the appropriate network for the situation in the left half. Using this figure as reference, we here define some terms which are standard in combinatorial literature and which will be useful here:

node: The nodes are the solid dots in the figure; they provide either sources/sinks of objects for the flow or simply serve as junctions for the flow. For example, in this context each target and each tile corresponds to a node.
arc: The arcs are the lines connecting the nodes. They show the paths along which objects can flow from node to node. In Figure 1, it is understood that the flow along the arc proceeds to the right. For example, the arcs traveling from target nodes to tile nodes express which tiles each target may be assigned to.
capacity: The minimum and maximum capacity of each arc is the minimum and maximum number of objects that can flow along it. For example, because each tile can accommodate only 592 target fibers, the capacities of the arcs traveling from the tile nodes to the sink node is 592.
cost: The cost per object along each arc is exacted for allowing objects to flow down a particular arc; the total cost is the summed cost of all the arcs. In this paper, the network is designed such that the minimum total cost solution is the desired solution.

Imagine a flow of 7 objects entering the network at the source node at the left. We want the entire flow to leave the network at the sink node at the right for the lowest possible cost. The objects travel along the arcs, from node to node. Each arc has a maximum capacity of objects which it can transport, as labeled. (One can also specify a minimum number, which will be useful later). Each arc also has an associated cost, which is exacted per object which is allowed to flow across that arc. Arcs link the source node to a set of nodes corresponding to the set of targets. Each target node is linked by an arc to the node of each tile it is covered by. Each tile node is linked to the sink node by an arc whose capacity is equal to the number of fibers available on that tile. None of these arcs has any associated cost. Finally, an ``overflow'' arc links the source node directly to the sink node, for targets which cannot be assigned to tiles. The overflow arc has effectively infinite capacity; however, a cost is assigned to objects flowing on the overflow arc, guaranteeing that the algorithm fails to assign targets to tiles only when it absolutely has to. This network thus expresses all the possible fiber allocations as well as the constraints on the numbers of fibers in each tile. Finding the minimum cost solution then maximizes the number of targets which are actually assigned to tiles.

Dealing with Fiber Collisions

As described above, there is a limit of 55'' to how close two fibers can be on the same tile. If there were no overlaps between tiles, these collisions would make it impossible to observe ~10% of the SDSS targets. Because the tiles are circular, some fraction of the sky will be covered with overlaps of tiles, allowing some of these targets to be recovered. In the presence of these collisions, the best assignment of targets to the tiles must account for the presence of collisions, and strive to resolve as many as possible of these collisions which are in overlaps of tiles. We approach this problem in two steps, for reasons described below. First, we apply the network flow algorithm of the above section to the set of ``decollided'' targets --- the largest possible subset of the targets which do not collide with each other. Second, we use the remaining fibers and a second network flow solution to optimally resolve collisions in overlap regions.

Figure 2: Fiber Collisions

The "decollided" set of targets is the maximal subset of targets which are all greater than 55'' from each other. To clarify what we mean by this maximal set, consider Figure 2. Each circle represents a target; the circle diameter is 55'', meaning that overlapping circles are targets which collide. The set of solid circles is the ``decollided'' set. Thus, in the triple collision at the top, it is best to keep the outside two rather than the middle one.

This determination is complicated slightly by the fact that some targets are assigned higher priority than others. For example, as explained in the Targetting pages, QSOs are given higher priority than galaxies by the SDSS target selection algorithms. What we mean here by ``priority'' is that a higher priority target is guaranteed never to be eliminated from the sample due to a collision with a lower priority object. Thus, our true criterion for determining whether one set of assignments of fibers to targets in a group is more favorable than another is that a greater number of the highest priority objects are assigned fibers.

Once we have identified our set of decollided objects, we use the network flow solution to find the best possible assignment of fibers to that set of objects.

After allocating fibers to the set of decollided targets, there will usually be unallocated fibers, which we want to use to resolve fiber collisions in the overlaps. We can again express the problem of how best to perform the collision resolution as a network, although the problem is a bit more complicated in this case. In the case of binaries and triples, we design a network flow problem such that the network flow solution chooses the tile assignments optimally. In the case of higher multiplicity groups, our simple method for binaries and triples does not work and we instead resolve the fiber collisions in a random fashion; however, fewer than 1% of targets are in such groups, and the difference between the optimal choice of assignments and the random choices made for these groups is only a small fraction of that.

We refer the reader to the tiling algorithm paper for more details, including how the fiber collision network flow is designed and caveats about what aspects of the method may need to be changed under different circumstances.

Tile Placement

Once one understands how to assign fibers given a set of tile centers, one can address the problem of how best to place those tile centers. Our method first distributes tiles uniformly across the sky and then uses a cost-minimization scheme to perturb the tiles to a more efficient solution.

In most cases, we set initial conditions by simply laying down a rectangle of tiles. To set the centers of the tiles along the long direction of the rectangle, we count the number of targets along the stripe covered by that tile. The first tile is put at the mean of the positions of target 0 and target N_t, where N_t is the number of fibers per tile (592 for the SDSS). The second tile is put at the mean between target N_t and 2N_t, and so on. The counting of targets along adjacent stripes is offset by about half a tile diameter in order to provide more complete covering.

The method is of perturbing this uniform distribution is iterative. First, one allocates targets to the tiles, but instead of limiting a target to the tiles within a tile radius, one allows a target to be assigned to further tiles, but with a certain cost which increases with distance (remember that the network flow accommodates the assignment of costs to arcs). One uses exactly the same fiber allocation procedure as above. What this does is to give each tile some information about the distribution of targets outside of it. Then, once one has assigned a set of targets to each tile, one changes each tile position to that which minimizes the cost of its set of targets. Then, with the new positions, one reruns the fiber allocation, perturbs the tiles again, and so on. This method is guaranteed to converge to a minimum (though not necessarily a global minimum), because the total cost must decrease at each step.

In practice, we also need to determine the appropriate number of tiles to use. Thus, using a standard binary search, we repeatedly run the cost-minimization to find the minimum number of tiles necessary to satisfy the SDSS requirements, namely that we assign fibers to 99% of the decollided targets.

In order to test how well this algorithm works, we have applied it both to simulated and real data. These results are discussed in the Tiling paper.

Technical Details

There are a few technical details which may be useful to mention in the context of SDSS data. Most importantly, we will describe which targets within the SDSS are ``tiled'' in the manner described here, and how such targets are prioritized. Second, we will discuss the method used by SDSS to deal with the fact that the imaging and spectroscopy are performed within the same five-year time period. Third, we will describe the tiling outputs which the SDSS tracks as the survey progresses. Throughout, we refer to the code which implements the algorithm described above as tiling.

Only some of the spectroscopic target types identified by the target selection algorithms in the SDSS are ``tiled.'' These types (and their designations in the primary and secondary target bitmasks) are described in the Targetting pages). They consist of most types of QSOs, main sample galaxies, LRGs, hot standard stars, and brown dwarfs. These are the types of targets for which tiling is run and for which we are attempting to create a well-defined sample. Once the code has guaranteed fibers to all possible ``tiled targets,'' remaining fibers are assigned to other target types by a separate code.

All of these target types are treated equivalently, except that they assigned different ``priorities,'' designated by an integer. As described above, the tiling code uses them to help decide fiber collisions. The sense is that a higher priority object will never lose a fiber in favor of a lower priority object. The priorities are assigned in a somewhat complicated way for reasons immaterial to tiling, but the essence is the following: the highest priority objects are brown dwarfs and hot standards, next come QSOs, and the lowest priority objects are galaxies and LRGs. QSOs have higher priority than galaxies because galaxies are higher density and have stronger angular clustering. Thus, allowing galaxies to bump QSOs would allow variations in galaxy density to imprint themselves into variations in the density of QSOs assigned to fibers, which we would like to avoid. For similar reasons, brown dwarfs and hot standard stars (which have extremely low densities on the sky) are given highest priority.

Each tile, as stated above, is 1.49 degrees in radius, and has the capacity to handle 592 tiled targets. No two such targets may be closer than 55'' on the same tile.

The operation of the SDSS makes it impossible to tile the entire 10,000 square degrees simultaneously, because we want to be able to take spectroscopy during non-pristine nights, based on the imaging which has been performed up to that point. In practice, periodically a ``tiling region'' of data is processed, calibrated, has targets selected, and is passed to the tiling code. During the first year of the SDSS, about one tiling region per month has been created; as more and more imaging is taken and more tiles are created, we hope to decrease the frequency with which we need to make tiling regions, and to increase their size.

A tiling region is defined as a set of rectangles on the sky (defined in survey coordinates). All of these rectangles cover only sky which has been imaged and processed. However, in the case of tiling, targets may be missed near the edges of a tiling region because that area was not covered by tiles. Thus, tiling is actually run on a somewhat larger area than a single tiling region, so the areas near the edges of adjacent tiling regions are also included. This larger area is known as a tiling region. Thus, in general, tiling regions overlap.

The first tiling region which is ``supported'' by the SDSS is denoted Tiling Region 4. The first tiling region for which the version of tiling described here was run is Tiling Region 7. Tiling regions earlier than Tiling Region 7 used a different (less efficient) method of handling fiber collisions. The earlier version also had a bug which artificially created gaps in the distribution of the fibers. The locations of the known gaps are given in the EDR paper for Tiling Region 4 as the overlaps between plates 270 and 271, plates 312 and 313, and plates 315 and 363 (also known as tiles 118 and 117, tiles 76 and 75, and tiles 73 and 74).

Tiling Window

In order to interpret the spectroscopic sample, one needs to use the information about how targets were selected, how the tiles were placed, and how fibers were assigned to targets. We refer to the geometry defined by this information as the "tiling window" and describe how to use it in detail elsewhere. As we note below, for the purposes of data release users it is also important to understand what the photometric imaging window which is released (including, if desired, masks for image defects and bright stars) and which plates have been released.

Last modified: Tue Apr 15 13:05:55 CDT 2003