IBM 1130 User Manual page 600

Computing system
Section 75   Subsections 30 20   Page 02
Key (Tag) Sorting
In general, key sorting consists of extracting the
control key from each record and adding the record
number to form a key-tag pair. These pairs,
rather than the original records, are then sorted.
(Sorting is done with the key; the record number is
merely moved about so as to remain with its asso-
ciated key.) After sorting is completed, the pairs
provide an index for later retrieval of the data rec-
ords in proper sequence. The obvious advantage of
key sorting is the more rapid processing of the key-
tag pairs, rather than the much longer original re-
cords. During internal sorting, more pairs can be
sorted into strings; thus, fewer strings and, prob-
ably, fewer merge passes will result. The even-
tual retrieval of the data records (if needed) from
external storage is done using the final sorted key-
tag file.
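The idea can be sketched in a few lines of Python (a modern
illustration, not 1130 code). The record layout and the names
records, key_of, and key_tag_sort are assumptions made for this
example only.

```python
def key_tag_sort(records, key_of):
    """Return record numbers (tags) arranged in sorted key order."""
    # Pair each control key with its record number (the tag).
    pairs = [(key_of(rec), tag) for tag, rec in enumerate(records)]
    # Sort on the key only; the tag merely travels with its key.
    pairs.sort(key=lambda pair: pair[0])
    return [tag for _key, tag in pairs]


# Hypothetical records: (control key, remaining data).
records = [("SMITH", "data-0"), ("ADAMS", "data-1"), ("JONES", "data-2")]
index = key_tag_sort(records, key_of=lambda rec: rec[0])

# The sorted tags act as an index for retrieving records in sequence.
in_sequence = [records[tag] for tag in index]
```

Only the short pairs are moved during the sort; the full records
are touched once, when they are finally retrieved through the index.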
A typical key sort with disk storage proceeds in
either two or three phases, depending upon whether
final retrieval of the data records is necessary.
Phase 1 is an internal key sort; phase 2 merges the
internally formed strings of key records; and phase
3, if required, retrieves the input records in
proper sequence. The approximate procedure dur-
ing each phase is described below.
Phase 1 (Internal Sort) consists of the following
steps:
1. Place input records on the disk file in order
of occurrence.
2. Form key-tag pairs by lifting the control
field(s) from each input record and adding to it
(them) the disk record number.
3. Read G key-tag pairs at a time into core
storage and sort them internally (by any standard
method) into strings of length G. (G refers to the
number of items that can be contained at one time in
internal core storage.)
4. Write the strings of G pairs successively
onto the disk file, using as many sectors or files as
necessary (usually no more than five files of strings).
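The four steps above can be sketched as follows. This is a
minimal Python illustration, assuming the disk file is modeled as
an in-memory list; form_strings, key_of, and g are hypothetical
names, with g standing for G.

```python
def form_strings(records, key_of, g):
    """Phase 1 sketch: build sorted strings of at most g key-tag pairs."""
    # Step 2: lift the control key and attach the record number.
    pairs = [(key_of(rec), tag) for tag, rec in enumerate(records)]
    strings = []
    for start in range(0, len(pairs), g):
        chunk = pairs[start:start + g]  # step 3: bring g pairs into "core"
        chunk.sort()                    # any standard internal sort will do
        strings.append(chunk)           # step 4: write the string out
    return strings


# Hypothetical input: each record is just its own key.
strings = form_strings([5, 3, 9, 1, 7], key_of=lambda rec: rec, g=2)
```

Each string is in sequence internally, but the strings are not yet
in sequence with respect to one another; the last string may be
shorter than g when the record count is not a multiple of g.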
Phase 2 (Merge). The merge phase of the sort
consists of merging the strings of pairs from the
separate files on disk. The merge is completed
when a single sequence of key-tag pairs has been
written onto disk. During the final merge pass, the
control keys are stripped from the key-tag pairs,
leaving only the disk record numbers or tags.
These record numbers then serve as an index for
placing the input records in sequence. At your
option, sorting can end at this point.
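A rough Python sketch of the merge follows. It assumes all strings
can be merged in one pass, using the standard library's
heapq.merge; an actual disk sort may need several passes over a
limited number of files. The name merge_strings and the sample
data are hypothetical.

```python
import heapq


def merge_strings(strings):
    """Phase 2 sketch: merge sorted (key, tag) strings and keep only tags."""
    merged = heapq.merge(*strings)  # single multiway merge pass
    # Strip the control keys; the record numbers alone form the index.
    return [tag for _key, tag in merged]


# Hypothetical strings of (key, tag) pairs, each already in sequence.
strings = [[(3, 1), (5, 0)], [(1, 3), (9, 2)], [(7, 4)]]
tags = merge_strings(strings)
```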
Phase 3 (Record Retrieval). This phase is
necessary if the data records are to be retrieved
from the disk file in their sorted order. (Remember,
only the tags have been sorted, not the records
themselves.) The manner in which this is done has
a greater effect on overall timing than phases 1 and
2 combined. The simplest way (also the slowest)
consists of retrieving the records one by one in the
order indicated by the successive disk record num-
bers. If the original input records constitute a
large file extending over several cylinders of the
disk, the probability is high that a seek must be
executed for the retrieval of each record. This will
add considerably to the time required, since the
seek time necessary to retrieve the records will
probably dominate the overall sort time.
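The effect can be illustrated with a small Python count of arm
movements. This is a simplified model, not a timing formula from
the manual: it assumes a fixed number of records per cylinder and
counts one seek whenever the cylinder changes (including the first
access). The names count_seeks and records_per_cylinder, and the
tag values, are made up for the example.

```python
def count_seeks(tags, records_per_cylinder):
    """Count cylinder changes when records are read in the given tag order."""
    seeks = 0
    current = None
    for tag in tags:
        cylinder = tag // records_per_cylinder
        if cylinder != current:  # arm must move to another cylinder
            seeks += 1
            current = cylinder
    return seeks


# Hypothetical tags as left by phase 2 (sorted-key order).
key_order = [9, 0, 5, 1, 8, 4]
naive = count_seeks(key_order, records_per_cylinder=4)
ascending = count_seeks(sorted(key_order), records_per_cylinder=4)
```

For these six tags, reading in key order changes cylinder on every
access, while reading in ascending record-number order changes
cylinder only three times.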
A number of ways have been devised to minimize
this seek time during the retrieval of records in
phase 3. One method consists of bringing the disk
record numbers (from phase 2 of the sort) into
internal storage in some multiple of the output
blocking factor. The disk record numbers are then
sorted internally in ascending sequence, thereby
reducing the seek time between records. The
input records are read initially from the disk in
ascending record number sequence; blocks of re-
cords are then placed in proper sequence (in ac-
cordance with the original sequences of disk record
numbers); and the sorted records are finally written
back onto the disk file. The method reduces seek
time substantially, at the expense of more complex
programming.
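A minimal Python sketch of this method is given below. It assumes
a caller-supplied read_record function standing in for the disk
read, and a block_size chosen as some multiple of the output
blocking factor; both names, and the sample data, are hypothetical.

```python
def retrieve_in_blocks(tags, read_record, block_size):
    """Yield records in sorted order, reading each block in ascending tag order."""
    for start in range(0, len(tags), block_size):
        block = tags[start:start + block_size]
        # Read in ascending record-number order to shorten arm movement.
        fetched = {tag: read_record(tag) for tag in sorted(block)}
        # Restore the sequence given by the original order of the tags.
        for tag in block:
            yield fetched[tag]


# Hypothetical disk file and a reader that logs the order of accesses.
disk = ["r0", "r1", "r2", "r3", "r4"]
reads = []


def read_record(tag):
    reads.append(tag)
    return disk[tag]


output = list(retrieve_in_blocks([3, 1, 0, 4, 2], read_record, block_size=3))
```

The output follows the sorted-key order, while the accesses within
each block run in ascending record-number order.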
Another method of modifying the key sort con-
sists of blocking the sorted keys so that the number
of items in each block plus an equal number of
original records just fills the core working area.
The items in each block are then sorted again to
place the disk record numbers in ascending se-
quence. As before, the records indicated in each
block can then be retrieved sequentially from the
file and sorted internally into the proper sequence.
It will be found, however, that in most cases, and
for large files in particular, these methods of re-
ducing the seek time still result in a greater overall
sort time than might have been required to perform
a complete record sort.
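For the second method above, the block size follows from simple
arithmetic: n tags plus n records must fit in the core working
area. The Python sketch below assumes sizes measured in words; the
function name and the figures used are illustrative assumptions,
not values taken from the manual.

```python
def items_per_block(core_words, record_words, tag_words=1):
    """Largest n such that n tags plus n records fit in the working area."""
    return core_words // (record_words + tag_words)


# Hypothetical figures: 4000-word working area, 40-word records, 1-word tags.
n = items_per_block(core_words=4000, record_words=40)
words_used = n * (40 + 1)
```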
