keywords:
Bookmark and Share



Front Back
Generally describe how proc sql can be used to combine data from multiple tables vertically.
When you need to select data from multiple tables and combine the tables vertically, Proc SQL can be an efficient alternative to using other SAS procedures or the data step.  In a Proc SQL set operation, you use on for four set operators (except, intersect, union, and outer union) to combine tables (and views) vertically by combining the results of two queries:
proc sql ;
select * from a
set-operator <all> <corr>
select * from b ;
quit ;

Optionally, one or both of the keywords ALL or CORR (corresponding) can be used.
Describe how PROC SQL evaluates a select statement with one set operator.
Proc SQL evaluates a SELECT statement with one set operation as follows:
  • Each query is evaluated to produce an intermediate (internal) result table
  • Each intermediate result table then becomes an operand linked with a set operator to form an expression, for example, Table1 UNION Table2.
  • Proc SQL evalutes the entire expression to produce a single output result set.
A single SELECT statement can contain only one set operation.  True or false?
False.  A single select statement can contain more than one set operation, for example:

proc sql ;
select * from table1
set-operator
select * from table2
set-operator
select * from table3
;
Describe the processing steps when Proc SQL evaluates a select statement that contains multiple set operations. For example:
proc sql;
select * from table1
set-operator
select * from table2
set-operator
select* from table3 ;
quit ;
When Proc SQL evaluates a select statement that contains multiple set operations, the following processing steps are required (step 3 is added due to multiple set operators):
  • each query is evaluated to produce an internal, intermediate result
  • each intermediate results becomes an operand linked with a set operator to form an expression, for example, table1 union table2
  • if the set operation contains more than two queries, then the result from the first two queries becomes an operand for the next set operator and operand
  • proc sql evaulates the entire expression to produce a single output resultant.
When processing set operators, Proc SQL follows a default order of precendence unless this order is overridden by parenthesis in the expression(s).  By default intersect is evaluated first.  Outer union, union, and except all have the same level of precedence.
Except set operator - selects unique rows from the first table that are not found in the second table.  Displays one occurrence of a given row in output.

Overlays columns based on their position in the select clause without regard to the individual column names.
Intersect set operator - selects unique rows that are common to both tables.  Displays one occurrence of a given row in output.

Overlays columns based on their position in the select clause without regard to the individual column names.
Union set operator - selects unique rows from one or both tables.

Overlays columns based on their position in the select clause without regard to the individual column names.
Outer union set operator - selects all rows from both tables.  The outer union operator concatenates the results of the queries.  Displays one occurrence of a given row in output.

Does not overlay columns.
How many passes through the data does a set operation that displays only unique rows make?  Discribe the pass(es).
Two.  When processing a set operation that displays only unique rows (except, intersect, or union), Proc SQL makes two passes through the data by default:
  • Proc SQL eliminates the duplicate rows in the tables
  • Proc SQL selects the rows that meet the criteria and (where requested) overlays the columns.
Set operations that display both unique and duplicate rows only make one pass through the data.
How many set operator key words are there?  Name it/them and their syntax.
Two set operator key words exist: ALL and CORR (or corresponding).  To modify set operator behavior, use immediately following the set operator:
proc sql ;
select * from table1
set-operator all corr
select * from table2 ;
quit ;
Describe set operator ALL behavior and when it is used.
The ALL keyword makes only one pass through the data and does not remove duplicate rows. 

ALL is used when:
  • You do not care if there are duplicates
  • Duplicates are not possible
  • ALL cannot be used with outer union.
Describe set operator CORR behavior and when it is used.
The CORR keyword compares and overlays columns by name instead of by position:
  • when used with except, intersect, and union, removes any column(s) that do not have the same name in both tables.
  • when used with outer union, overlays same-named columns and displays columns that have nonmatching names without overlaying.
  • If an alias is assigned to a column in the select clause, corr will use the alias instead of the permanent column name.
CORR is used when two tables have some or all columns in column, but the columns are not in the same order.
Describe the results of this:

Proc SQL ;
select * from one
except
select * from two ;
quit ;
Displays the unique rows in table one that are not found in table two.
Describe the results of this:

Proc SQL ;
select * from one
except all
select * from two ;
quit ;
Displays all rows in the first table (both unique and duplicate) that do not have a matching row in the second table.

If duplicates are not wanted and are not possible, the keyword all can be added to the except set operator to prevent Proc SQL from making an extra pass through the data, thereby speeding up query processing.
Describe the results of this:

Proc SQL ;
select * from one
except corr
select * from two ;
quit ;
Select only columns that have the same name AND all unique rows in the first table that do not appear in the second table.
Describe the results of this:

Proc SQL ;
select * from one
except all corr
select * from two ;
quit ;
Select all unique and duplicate rows in the first table that do not appear in the second table AND will overlay and display only columns that have the same name.
Describe the results of this:

Proc SQL ;
select * from one
intersect
select * from two ;
quit ;
Select all unique rows that are common to both tables, overlaying columns by column position.
Describe the results of this:

Proc SQL ;
select * from one
intersect all
select * from two ;
quit ;
select all unique and duplicate rows that are common to both tables, overlaying columns by column position.
Describe the results of this:

Proc SQL ;
select * from one
intersect corr
select * from two ;
quit ;
select all unique rows that are common to the two tables based upon the column name instead of the column position.
Describe the results of this:

Proc SQL ;
select * from one
intersect all corr
select * from two ;
quit ;
select all unqiue and duplicate rows that are in common to the two tables based on columns that have the same names.  Columns that do not exist in both tables are not displayed.
Describe the results of this:

Proc SQL ;
select * from one
union
select * from two ;
quit ;
concatenates and sorts all rows from the two tables and then eliminates any duplicate rows.  Columns are overlaid by position.
Describe the results of this:

Proc SQL ;
select * from one
union all
select * from two ;
quit ;
Displays all rows from both tables both unique and duplicates.  Columns are overlaid by position.
Describe the results of this:

Proc SQL ;
select * from one
union corr
select * from two ;
quit ;
Displays all rows from tables one and two that are unique to the combined set and based upon columns that have the same name rather than the same position.  Only columns that are common to both tables will appear.
Describe the results of this:

Proc SQL ;
select * from one
union all corr
select * from two ;
quit ;
Displays all rows in the two tables both unique and duplicate based on the columns that have the same names. Only columns that are common to both tables will appear.
Describe the results of this:

proc sql ;
select * from one
outer union
select * from two ;
quit ;
Displays all rows from both tables without overlaying columns - even if they have the same name.

ALL keyword is not used with outer union because the operator's default behavior is to include all rows in the output.
Describe the results of this:

proc sql ;
select * from one
outer union corr
select * from two ;
quit ;
Displays all rows from both tables and overlays columns that have the same name.
x of y cards