Considerations For Sample; Sample Rows; Cluster Sampling - HP Neoview SQL Reference Manual

Hide thumbs Also See for Neoview SQL:
Table of Contents

Advertisement

CLUSTERS OF number-blocks BLOCKS
specifies cluster sampling. You can use the CLUSTERS clause for a base table only if
there is no WHERE clause in the SELECT statement. First, a cluster is chosen randomly,
and then all rows in the selected cluster are added to the sample. The size of the cluster
is determined by number-blocks. This process is repeated until the sample size is
generated. See
BALANCE
If you specify a BALANCE expression, Neoview SQL performs stratified sampling.
The intermediate result table is divided into disjoint strata based on the WHEN
conditions. Each stratum is sampled independently by using the sampling size. For
a given row, the stratum to which it belongs is determined by the first WHEN
condition that is true for that row—if there is a true condition. If there is no true
condition, the row belongs to the ELSE stratum.
FIRST rows-size [SORT BY colname [ASC[ENDING] | DESC[ENDING]] [,colname
[ASC[ENDING] | DESC[ENDING]]]...]
directs Neoview SQL to choose the first rows from the result table. You can specify the order
of the rows to sample. Otherwise, Neoview SQL chooses an arbitrary order. The sampling
size is determined by the rows-size, defined as:
number-rows ROWS | BALANCE WHEN condition THEN number-rows ROWS [WHEN
condition THEN number-rows ROWS]... [ELSE number-rows ROWS] END
specifies the value of the size for FIRST sampling by using the number of rows intended
in the sample. The value number-rows must be an integer literal.
You can determine the actual size of the sample. Suppose that there are N rows in the
intermediate result table. If the size s of the sample is specified as a number of rows, the
actual size of the resulting sample is the minimum of s and N.
PERIODIC rows-size EVERY number-rows ROWS [SORT BY colname [ASC[ENDING] |
DESC[ENDING]] [,colname [ASC[ENDING] | DESC[ENDING]]]...]
directs Neoview SQL to choose the first rows from each block (or period) of contiguous rows.
This sampling method is equivalent to a separate FIRST sampling for each period, and the
rows-size is defined as in FIRST sampling.
The size of the period is specified as a number of rows. You can specify the order of the rows
to sample. Otherwise, Neoview SQL chooses an arbitrary order.
You can determine the actual size of the sample. Suppose that there are N rows in the
intermediate result table. If the size s of the sample is specified as a number of rows and the
size p of the period is specified as a number of rows, the actual size of the resulting sample
is calculated as:
FLOOR (N/p) * s + minimum (MOD (N, p), s)
Note that minimum in this expression is used simply as the mathematical minimum of two
values.

Considerations for SAMPLE

Sample Rows

In general, when you use the SAMPLE clause, the same query returns different sets of rows for
each execution. The same set of rows is returned only when you use the FIRST and PERIODIC
sampling methods with the SORT BY option, where there are no duplicates in the specified
column combination for the sort.

Cluster Sampling

Cluster sampling is an option supported by the SAMPLE RANDOM clause in a SELECT statement.
A cluster, in this sense, is a logically contiguous set of disk blocks in the file in which a table is
"Cluster Sampling" (page
261).
SAMPLE Clause
261

Advertisement

Table of Contents
loading

Table of Contents