I'm evaluating various options for running a bunch of high-performance queries against a single temporary data set in Oracle. In T-SQL, I'd probably use in-memory temporary tables, but Oracle doesn't have an exact equivalent of that feature.
I'm currently seeing these options:
1. Global temporary tables
CREATE GLOBAL TEMPORARY TABLE test_temp_t (
n NUMBER(10),
s VARCHAR2(10)
) ON COMMIT DELETE ROWS; -- Other configurations are possible, too
DECLARE
n NUMBER(10);
BEGIN
-- Replace this with the actual temporary data set generation
INSERT INTO test_temp_t
SELECT MOD(level, 10), '' || MOD(level, 12)
FROM dual
CONNECT BY level < 1000000;
-- Replace this example query with more interesting statistics
SELECT COUNT(DISTINCT t.n)
INTO n
FROM test_temp_t t;
DBMS_OUTPUT.PUT_LINE(n);
END;
Plan:
----------------------------------------------------
| Id | Operation | A-Rows | A-Time |
----------------------------------------------------
| 0 | SELECT STATEMENT | 1 |00:00:00.27 |
| 1 | SORT AGGREGATE | 1 |00:00:00.27 |
| 2 | VIEW | 10 |00:00:00.27 |
| 3 | HASH GROUP BY | 10 |00:00:00.27 |
| 4 | TABLE ACCESS FULL| 999K|00:00:00.11 |
----------------------------------------------------
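For reference, execution statistics like the A-Rows/A-Time columns above can be captured with the GATHER_PLAN_STATISTICS hint plus DBMS_XPLAN.DISPLAY_CURSOR (standard Oracle API; this is a sketch of one way to obtain such plans, not necessarily how the ones shown were produced):

```sql
-- Run the query with runtime statistics collection enabled
SELECT /*+ GATHER_PLAN_STATISTICS */ COUNT(DISTINCT t.n)
FROM test_temp_t t;

-- Then display the last cursor's actual row counts and timings
SELECT *
FROM TABLE(DBMS_XPLAN.DISPLAY_CURSOR(NULL, NULL, 'ALLSTATS LAST'));
```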
2. Unnesting of PL/SQL table type variables
CREATE TYPE test_o AS OBJECT (n NUMBER(10), s VARCHAR2(10));
CREATE TYPE test_t AS TABLE OF test_o;
DECLARE
t test_t;
n NUMBER(10);
BEGIN
-- Replace this with the actual temporary data set generation
SELECT test_o(MOD(level, 10), '' || MOD(level, 12))
BULK COLLECT INTO t
FROM dual
CONNECT BY level < 1000000;
-- Replace this example query with more interesting statistics
SELECT COUNT(DISTINCT n)
INTO n
FROM TABLE(t) t;
DBMS_OUTPUT.PUT_LINE(n);
END;
Plan:
------------------------------------------------------------------
| Id | Operation | A-Rows | A-Time |
------------------------------------------------------------------
| 0 | SELECT STATEMENT | 1 |00:00:00.68 |
| 1 | SORT GROUP BY | 1 |00:00:00.68 |
| 2 | COLLECTION ITERATOR PICKLER FETCH| 999K|00:00:00.22 |
------------------------------------------------------------------
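One caveat with the TABLE(t) approach: the optimizer has no statistics on a collection and falls back to a fixed default cardinality estimate (often 8168 rows with 8 KB blocks), which can distort plans once the collection is joined to real tables. A possible workaround, sketched here with the widely used but undocumented CARDINALITY hint (the row count is whatever you expect the collection to hold):

```sql
-- Inside the PL/SQL block above, hint the collection's expected size
SELECT /*+ CARDINALITY(t 1000000) */ COUNT(DISTINCT n)
  INTO n
  FROM TABLE(t) t;
```

A supported alternative is the DYNAMIC_SAMPLING hint, at the cost of a sampling pass at parse time.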
3. Materialised views
I'm ruling them out for this use-case, because the temporary data set in question is rather complex and the implications on updating the materialised view would be too significant.
Real data considerations
The above are examples of what I'm trying to do. The real data sets involve:
- The temp data is denormalised from around 15 joined tables.
- It is produced around 2-20 times per second.
- The actual number of rows per temp data set is around 10-200 (not nearly as big as in the above example).
- Each user of the system has their own temp data set (1M users overall, 10k concurrent users).
- Once the data set is established, around 10-50 analytical queries should be run against it.
- These analyses must be run online, i.e. they cannot be deferred to a batch job.
Questions
Intuitively, the temp table query "should" be slower because it (probably) involves I/O and disk access, whereas the PL/SQL collection query is a purely in-memory solution. But in my trivial benchmark this is not the case: the temp table query beats the PL/SQL collection query by a factor of 3. Why is this the case? Is there some PL/SQL <-> SQL context switch happening?
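For what it's worth, the trivial benchmark was along these lines; a minimal timing harness, assuming the test objects from above (DBMS_UTILITY.GET_TIME returns elapsed time in hundredths of a second):

```sql
DECLARE
  t0 NUMBER;
  n  NUMBER(10);
BEGIN
  t0 := DBMS_UTILITY.GET_TIME;

  -- The statement under test; swap in the TABLE(t) variant to compare
  SELECT COUNT(DISTINCT t.n)
    INTO n
    FROM test_temp_t t;

  DBMS_OUTPUT.PUT_LINE(
    'Elapsed: ' || (DBMS_UTILITY.GET_TIME - t0) || ' cs');
END;
```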
Do I have other options for fast (yet extensive) "in-memory" data analysis on a well-defined temporary data set? Are there any significant publicly available benchmarks comparing the various options?