Sample Clause - HP Neoview SQL Reference Manual

SAMPLE Clause

"Considerations for SAMPLE"
"Examples of SAMPLE"
The SAMPLE clause of the SELECT statement specifies the sampling method used to select a
subset of the intermediate result table of a SELECT statement. The intermediate result table
consists of the rows returned by a WHERE clause or, if there is no WHERE clause, the FROM
clause. See "SELECT Statement" (page 164).
SAMPLE is a Neoview SQL extension.

SAMPLE sampling-method

sampling-method is:
     RANDOM percent-size
   | FIRST rows-size
        [SORT BY colname [ASC[ENDING] | DESC[ENDING]]
           [,colname [ASC[ENDING] | DESC[ENDING]]]...]
   | PERIODIC rows-size EVERY number-rows ROWS
        [SORT BY colname [ASC[ENDING] | DESC[ENDING]]
           [,colname [ASC[ENDING] | DESC[ENDING]]]...]

percent-size is:
     percent-result PERCENT [ROWS
        | {CLUSTERS OF number-blocks BLOCKS}]
   | BALANCE WHEN condition
        THEN percent-result PERCENT [ROWS]
        [WHEN condition THEN percent-result PERCENT [ROWS]]...
        [ELSE percent-result PERCENT [ROWS]] END

rows-size is:
     number-rows ROWS
   | BALANCE WHEN condition THEN number-rows ROWS
        [WHEN condition THEN number-rows ROWS]...
        [ELSE number-rows ROWS] END

RANDOM percent-size
   directs Neoview SQL to choose rows randomly (each row having an unbiased probability
   of being chosen) without replacement from the result table. The sampling size is determined
   by the percent-size, defined as:

   percent-result PERCENT [ROWS | {CLUSTERS OF number-blocks BLOCKS}] |
   BALANCE WHEN condition THEN percent-result PERCENT [ROWS] [WHEN
   condition THEN percent-result PERCENT [ROWS]]... [ELSE percent-result
   PERCENT [ROWS]] END
      specifies the value of the size for RANDOM sampling by using a percent of the result
      table. The value percent-result must be a numeric literal.
      You can determine the actual size of the sample. Suppose that there are N rows in the
      intermediate result table. Each row is picked with a probability of r%, where r is the
      sample size in PERCENT. Therefore, the actual size of the resulting sample is
      approximately r% of N. The number of rows picked follows a binomial distribution with
      mean equal to r * N/100.
      If you specify a sample size greater than 100 PERCENT, Neoview SQL returns all the
      rows in the result table plus duplicate rows. The duplicate rows are picked from the result
      table according to the specified sampling method. This technique is referred to as
      oversampling and is not allowed with cluster sampling.

   ROWS
      specifies row sampling. Row sampling is the default if you specify neither ROWS nor
      CLUSTERS.
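
The syntax above can be illustrated with a few statements. This is only a sketch: the table orders and its columns are hypothetical, the literal values are arbitrary, and the comments on FIRST and PERIODIC are inferred from the syntax rather than stated in this section.

```sql
-- Hypothetical table: orders(order_id, region, amount)

-- RANDOM with row sampling (the default): each row that passes the
-- WHERE clause is kept with probability 10%.
SELECT AVG(amount)
FROM orders
WHERE amount > 0
SAMPLE RANDOM 10 PERCENT;

-- RANDOM with cluster sampling. Oversampling (more than 100 PERCENT)
-- is not allowed in this form.
SELECT AVG(amount)
FROM orders
SAMPLE RANDOM 5 PERCENT CLUSTERS OF 2 BLOCKS;

-- RANDOM with BALANCE: a separate percentage for each condition.
SELECT region, amount
FROM orders
SAMPLE RANDOM BALANCE
   WHEN region = 'EAST' THEN 20 PERCENT
   WHEN region = 'WEST' THEN 5 PERCENT
   ELSE 10 PERCENT END;

-- FIRST with SORT BY: presumably the first 50 rows after sorting
-- by amount in descending order.
SELECT order_id, amount
FROM orders
SAMPLE FIRST 50 ROWS SORT BY amount DESC;

-- PERIODIC: presumably 2 rows taken from every 100 rows of the
-- intermediate result table.
SELECT order_id, amount
FROM orders
SAMPLE PERIODIC 2 ROWS EVERY 100 ROWS;
```

As a worked instance of the sizing rule, suppose 10,000 rows pass the WHERE clause in the first statement. RANDOM 10 PERCENT then returns about 10 * 10000/100 = 1,000 rows, and under the binomial model the standard deviation is sqrt(10000 * 0.10 * 0.90) = 30 rows, so the exact count varies from run to run.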
