Oracle 分析函数
Contents
1. 分析函数前传之聚合函数
1.1 聚合函数定义
Aggregate functions return a single result row based on groups of rows
Aggregate functions return a single result row based on groups of rows, rather than on single rows. Aggregate functions can appear in select lists and in ORDER BY and HAVING clauses. They are commonly used with the GROUP BY clause in a SELECT statement, where Oracle Database divides the rows of a queried table or view into groups. In a query containing a GROUP BY clause, the elements of the select list can be aggregate functions, GROUP BY expressions, constants, or expressions involving one of these. Oracle applies the aggregate functions to each group of rows and returns a single result row for each group.
If you omit the GROUP BY clause, then Oracle applies aggregate functions in the select list to all the rows in the queried table or view. You use aggregate functions in the HAVING clause to eliminate groups from the output based on the results of the aggregate functions, rather than on the values of the individual rows of the queried table or view.
1.2 SQL举例
--根据部门分组,返回每个部门的总工资
--SUM-->COUNT MAX MIN AVG
SELECT department_id,SUM(t.salary) salary
FROM employees t GROUP BY department_id;
查询结果
department_id | salary |
---|---|
100 | 70000 |
90 | 249000 |
............................
............................
1.3 高级分组
SELECT T.JOB_ID, T.DEPARTMENT_ID, SUM(T.SALARY) SALARY
FROM EMPLOYEES T
GROUP BY CUBE(T.JOB_ID, T.DEPARTMENT_ID);
会根据JOB_ID DEPARTMENT_ID 自由组合进行分组
JOB_ID 参与分组0,不参与分组1
DEPARTMENT_ID 参与分组0, 不参与分组1
最终的组合 00 01 10 11
1.4 识别参与分组的列信息
GROUPING 函数
SELECT T.JOB_ID, T.DEPARTMENT_ID,
SUM(T.SALARY),
GROUPING(job_id)||GROUPING(t.department_id) groupcode,
CASE GROUPING(job_id)||GROUPING(t.department_id)
WHEN '11' THEN
'总工资'
WHEN '10' THEN
'根据department_id分组'
WHEN '01' THEN
'根据job_id分组'
WHEN '00' THEN
'根据根据department_id分组和job_id分组' END groupresult
FROM EMPLOYEES T
GROUP BY CUBE(T.JOB_ID, T.DEPARTMENT_ID)
ORDER BY GROUPING(job_id)||GROUPING(t.department_id);
根据groupcode可过滤不需要的分组
1.5 自定义分组
根据GROUPING SETS定义要分组的列
SELECT t.job_id, t.department_id,
GROUPING(t.job_id),GROUPING(t.department_id),
SUM(t.salary) FROM employees t
GROUP BY GROUPING SETS((t.job_id),(t.department_id),())
1.6 聚合函数的缺点
无法获取非分组的列
SELECT DEPARTMENT_ID, SUM(T.SALARY),t.first_name
FROM EMPLOYEES T
GROUP BY DEPARTMENT_ID;
ORA-00979: 不是 GROUP BY 表达式
2. 分析函数
2.1 分析函数定义
Analytic functions compute an aggregate value based on a group of rows. They differ from aggregate functions in that they return multiple rows for each group.
Analytic functions compute an aggregate value based on a group of rows. They differ from aggregate functions in that they return multiple rows for each group. The group of rows is called a window and is defined by the analytic_clause. For each row, a sliding window of rows is defined. The window determines the range of rows used to perform the calculations for the current row. Window sizes can be based on either a physical number of rows or a logical interval such as time.
Analytic functions are the last set of operations performed in a query except for the final ORDER BY clause. All joins and all WHERE, GROUP BY, and HAVING clauses are completed before the analytic functions are processed. Therefore, analytic functions can appear only in the select list or ORDER BY clause.
Analytic functions are commonly used to compute cumulative, moving, centered, and reporting aggregates.
2.2 分析函数的语法图
analytic_function([ arguments ]) OVER (analytic_clause)
[ query_partition_clause ] [ order_by_clause [ windowing_clause ] ]
2.3 一个有名无实的分析函数
SELECT t.employee_id,
SUM(t.salary)
OVER( ) FROM employees t;
因为OVER() 没有写ORDER BY,没办法进行处理,所有的数据就被分到了一组。汇总的也就是所有员工的工资总和
2.4 上个例子的改进版(使用默认窗口)
SELECT t.employee_id,t.salary,
SUM(t.salary)
OVER(ORDER BY t.salary) FROM employees t;
查询结果就是先根据工资排序,然后计算从第一行到当前行的累计工资
每一样都会被处理,每个行都有单独的窗口,且可以自定义。
2.5 窗口之前的预备知识 RANGE VS ROWS
ROWS 代表物理行
RANGE 逻辑行最显著的区别就是当排序列存在重复值时,RANGE会合并成一行处理,ROWS按多行处理
- range
SELECT t.employee_id,t.salary,
SUM(t.salary)
OVER(ORDER BY t.salary RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM employees t;
- rows
SELECT t.employee_id,t.salary,
SUM(t.salary)
OVER(ORDER BY t.salary ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM employees t;
2.6 PARTITION BY
在分析函数中使用,主要目的是将表进行逻辑分区,根据分区键将表拆分成独立的多个表。
2.7 使用 BETWEEN AND 同时自定义上界和下界
- RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW.
SELECT
t.employee_id,
t.salary,
SUM(t.salary) OVER(ORDER BY t.employee_id RANGE BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW)
FROM employees t
ORDER BY t.employee_id;
- RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
SELECT
t.employee_id,
t.salary,
SUM(t.salary) OVER(ORDER BY t.employee_id RANGE BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING)
FROM employees t
ORDER BY t.employee_id;
- RANGE BETWEEN CURRENT ROW AND CURRENT ROW
SELECT
t.employee_id,
t.salary,
SUM(t.salary) OVER(ORDER BY t.employee_id RANGE BETWEEN CURRENT ROW AND CURRENT ROW)
FROM employees t
ORDER BY t.employee_id;
- RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
SELECT
t.employee_id,
t.salary,
SUM(t.salary) OVER(ORDER BY t.employee_id RANGE BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING)
FROM employees t
ORDER BY t.employee_id;
- RANGE BETWEEN N PRECEDING | FOLLOWING and PRECEDING | FOLLOWING
SELECT
t.employee_id,
t.salary,
SUM(t.salary) OVER(ORDER BY t.employee_id RANGE BETWEEN 1 PRECEDING AND 1 FOLLOWING)
FROM employees t
ORDER BY t.employee_id;
2.8 只定义上界
没有显示的提供下界,下界为当前行
- UNBOUNDED PRECEDING
SELECT
t.employee_id,
t.salary,
SUM(t.salary) OVER(ORDER BY t.employee_id RANGE UNBOUNDED PRECEDING)
FROM employees t
ORDER BY t.employee_id;
- CURRENT ROW
SELECT
t.employee_id,
t.salary,
SUM(t.salary) OVER(ORDER BY t.employee_id RANGE CURRENT ROW)
FROM employees t
ORDER BY t.employee_id;
- N PRECEDING
SELECT
t.employee_id,
t.salary,
SUM(t.salary) OVER(ORDER BY t.employee_id RANGE 1 PRECEDING )
FROM employees t
ORDER BY t.employee_id;
2.9 常用的分析函数
2.9.1 LAG VS LEAD
Lag 函数可以将某列进行整体偏移,做为一个新列,不影响原来的列
语法图LAG(value_expr [, offset ] [, default ])
OVER ([ query_partition_clause ] order_by_clause)
将每个部门分组,根据员工ID排序,然后向后移动一行
SELECT
t.employee_id,
lag(t.employee_id,1,t.employee_id)
over(PARTITION BY t.department_id ORDER BY t.employee_id) lag_employee_id,
t.department_id
FROM employees t
ORDER BY t.department_id;
Lag函数的其中一个作用就是可以查找不连续的数字
因为偏移量不能为负数,所以Oralce提供和一个LEAD函数和LAG函数组成一对,既可以向下偏移,也可以向上偏移。
2.9.2 FIRST_VALUE VS LAST_VALUE
FIRST_VALUE 获取某个分组,排序后的第一个值
语法图FIRST_VALUE (expr [ IGNORE NULLS ])
OVER (analytic_clause)
SELECT
t.employee_id,
t.salary,
first_value(t.employee_id) over(ORDER BY t.salary) min_salary,
t.*
FROM employees t;
SELECT
t.employee_id,
t.salary,
t.department_id,
first_value(t.employee_id) over(PARTITION BY t.department_id
ORDER BY t.salary) min_salary,
t.*
FROM employees t;
FIRST_VALUE 加入窗口语句未生效
SELECT t.salary,FIRST_VALUE(salary)
OVER (ORDER BY salary
--rows BETWEEN UNBOUNDED PRECEDING AND UNBOUNDED FOLLOWING
-- ROWS BETWEEN 1 PRECEDING AND 1 FOLLOWING
--rows BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
--rows BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
--第三个窗口语句无效
)
AS min_salary
FROM employees t ORDER BY t.salary;
LAST_VALUE
SELECT
t.employee_id,
t.salary,
last_value(t.employee_id)
over(ORDER BY t.salary
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING
) max_salary,
t.*
FROM employees t;
SELECT
t.employee_id,
t.salary,
t.department_id,
last_value(t.employee_id)
over(PARTITION BY t.department_id
ORDER BY t.salary
ROWS BETWEEN CURRENT ROW AND UNBOUNDED FOLLOWING) min_salary,
t.*
FROM employees t;
2.9.3 LISTAGG
语法图
LISTAGG(measure_expr [, 'delimiter'])
WITHIN GROUP (order_by_clause) [OVER query_partition_clause]
将多行根据指定的分隔符合并成一行
SELECT LISTAGG(last_name, ';') WITHIN GROUP
(ORDER BY hire_date)
FROM employees t;
支持分区,所以可以根据部门分组
SELECT LISTAGG(last_name, '; ') WITHIN GROUP
(ORDER BY hire_date)
OVER (PARTITION BY t.department_id)
FROM employees t;
3.结束
本例中的employees及其数据来源于Oracle自带的hr用户下的表