psql 

=> \c db19
You are now connected to database "db19" as user "postgres".

=> set work_mem = '128MB';
SET


Первая попытка
~~~~~~~~~~~~~~


=> explain analyze
=> select gen.d::date date_ordered, sum(i.amount) amount
=> from generate_series(now(), now() - interval '9 day', interval '-1 day') gen(d)
=> left join orders o on (o.date_ordered::date = gen.d::date)
=> left join items i on (i.order_id = o.id)
=> group by gen.d::date
=> order by gen.d::date;
                                                                        QUERY PLAN                                                                         
-----------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=288604.12..288604.62 rows=200 width=12) (actual time=1461.214..1461.216 rows=10 loops=1)
   Sort Key: ((gen.d)::date)
   Sort Method: quicksort  Memory: 17kB
   ->  HashAggregate  (cost=288593.48..288596.48 rows=200 width=12) (actual time=1461.191..1461.197 rows=10 loops=1)
         Group Key: (gen.d)::date
         ->  Hash Right Join  (cost=54353.48..238593.48 rows=10000000 width=12) (actual time=274.614..1457.014 rows=5381 loops=1)
               Hash Cond: (i.order_id = o.id)
               ->  Seq Scan on items i  (cost=0.00..41740.00 rows=2000000 width=8) (actual time=0.028..459.177 rows=2000000 loops=1)
               ->  Hash  (cost=41853.48..41853.48 rows=1000000 width=12) (actual time=273.343..273.343 rows=537 loops=1)
                     Buckets: 1048576  Batches: 1  Memory Usage: 4113kB
                     ->  Merge Left Join  (cost=21843.48..41853.48 rows=1000000 width=12) (actual time=272.605..273.084 rows=537 loops=1)
                           Merge Cond: (((gen.d)::date) = ((o.date_ordered)::date))
                           ->  Sort  (cost=59.84..62.34 rows=1000 width=8) (actual time=1.008..1.009 rows=10 loops=1)
                                 Sort Key: ((gen.d)::date)
                                 Sort Method: quicksort  Memory: 17kB
                                 ->  Function Scan on generate_series gen  (cost=0.01..10.01 rows=1000 width=8) (actual time=0.982..0.988 rows=10 loops=1)
                           ->  Sort  (cost=21783.64..22283.64 rows=200000 width=12) (actual time=167.507..224.304 rows=200000 loops=1)
                                 Sort Key: ((o.date_ordered)::date)
                                 Sort Method: quicksort  Memory: 11909kB
                                 ->  Seq Scan on orders o  (cost=0.00..4174.00 rows=200000 width=12) (actual time=0.011..82.036 rows=200000 loops=1)
 Planning time: 1.303 ms
 Execution time: 1462.151 ms
(22 rows)


У запроса явная проблема с оценкой кардинальности, и источником ее является
функция generate_series. Как и для любой другой функции, возвращающей
набор строк, планировщик считает, что кардинальность равна 1000.


Вторая попытка
~~~~~~~~~~~~~~

Один из вариантов исправления состоит в том, чтобы создать свою функцию
с указанием нужной кардинальности:

=> create or replace function gen10()
=> returns setof date
=> rows 10
=> as $$
=>   select d::date from generate_series(now(), now() - interval '9 day', interval '-1 day') gen(d);
=> $$ language sql;
CREATE FUNCTION

Здесь конструкция "rows 10" определяет оценку кардинальности.
Разумеется, можно определить функцию с параметром, но оценка, увы,
должна быть константой.

=> explain analyze
=> select gen.d date_ordered, sum(i.amount) amount
=> from gen10() gen(d)
=> left join orders o on (o.date_ordered::date = gen.d::date)
=> left join items i on (i.order_id = o.id)
=> group by gen.d::date
=> order by gen.d::date;
                                                                    QUERY PLAN                                                                    
--------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=18301.17..18301.19 rows=10 width=8) (actual time=192.591..192.593 rows=10 loops=1)
   Sort Key: gen.d
   Sort Method: quicksort  Memory: 17kB
   ->  HashAggregate  (cost=18300.88..18301.00 rows=10 width=8) (actual time=192.575..192.580 rows=10 loops=1)
         Group Key: gen.d
         ->  Nested Loop Left Join  (cost=0.90..17800.88 rows=100000 width=8) (actual time=0.764..185.261 rows=5381 loops=1)
               ->  Hash Right Join  (cost=0.47..5274.48 rows=10000 width=8) (actual time=0.739..130.060 rows=537 loops=1)
                     Hash Cond: ((o.date_ordered)::date = gen.d)
                     ->  Seq Scan on orders o  (cost=0.00..4174.00 rows=200000 width=12) (actual time=0.003..43.036 rows=200000 loops=1)
                     ->  Hash  (cost=0.35..0.35 rows=10 width=4) (actual time=0.132..0.132 rows=10 loops=1)
                           Buckets: 1024  Batches: 1  Memory Usage: 5kB
                           ->  Function Scan on gen10 gen  (cost=0.25..0.35 rows=10 width=4) (actual time=0.127..0.129 rows=10 loops=1)
               ->  Index Scan using items_order_id_idx on items i  (cost=0.43..1.14 rows=11 width=8) (actual time=0.024..0.095 rows=10 loops=537)
                     Index Cond: (order_id = o.id)
 Planning time: 0.249 ms
 Execution time: 192.639 ms
(16 rows)


Оценка кардинальности исправлена, однако теперь видна вторая проблема:
из-за преобразования типа невозможно использовать индекс по дате
на таблице заказов.


Третья попытка
~~~~~~~~~~~~~~

Создадим функциональный индекс:

=> create index on orders ((date_ordered::date));
CREATE INDEX

И проанализируем таблицу, чтобы собралась статистика:

=> analyze orders;
ANALYZE

=> explain analyze
=> select gen.d date_ordered, sum(i.amount) amount
=> from gen10() gen(d)
=> left join orders o on (o.date_ordered::date = gen.d::date)
=> left join items i on (i.order_id = o.id)
=> group by gen.d::date
=> order by gen.d::date;
                                                                         QUERY PLAN                                                                         
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=2516.01..2516.03 rows=10 width=8) (actual time=17.165..17.167 rows=10 loops=1)
   Sort Key: gen.d
   Sort Method: quicksort  Memory: 17kB
   ->  HashAggregate  (cost=2515.72..2515.84 rows=10 width=8) (actual time=17.149..17.154 rows=10 loops=1)
         Group Key: gen.d
         ->  Nested Loop Left Join  (cost=5.52..2488.32 rows=5480 width=8) (actual time=0.141..14.128 rows=5381 loops=1)
               ->  Nested Loop Left Join  (cost=5.10..1801.87 rows=548 width=8) (actual time=0.134..1.239 rows=537 loops=1)
                     ->  Function Scan on gen10 gen  (cost=0.25..0.35 rows=10 width=4) (actual time=0.095..0.102 rows=10 loops=1)
                     ->  Bitmap Heap Scan on orders o  (cost=4.85..179.60 rows=55 width=12) (actual time=0.021..0.092 rows=54 loops=10)
                           Recheck Cond: ((date_ordered)::date = gen.d)
                           Heap Blocks: exact=528
                           ->  Bitmap Index Scan on orders_date_ordered_idx1  (cost=0.00..4.83 rows=55 width=0) (actual time=0.013..0.013 rows=54 loops=10)
                                 Index Cond: ((date_ordered)::date = gen.d)
               ->  Index Scan using items_order_id_idx on items i  (cost=0.43..1.14 rows=11 width=8) (actual time=0.008..0.019 rows=10 loops=537)
                     Index Cond: (order_id = o.id)
 Planning time: 0.330 ms
 Execution time: 17.214 ms
(17 rows)


Теперь запрос выполняется достаточно эффективно.


Еще вариант
~~~~~~~~~~~

Если все-таки запрос будет выполняться для разного числа дней, как можно
обеспечить правильную оценку кардинальности?
Есть вариант с использованием временной таблицы, хотя он и подразумевает
известные накладные расходы.

=> create temporary table gen(d date) on commit preserve rows;
CREATE TABLE

=> insert into gen(d) select gen.d::date from generate_series(now(), now() - interval '9 day', interval '-1 day') gen(d);
INSERT 0 10

=> analyze gen;
ANALYZE

Теперь планировщик будет знать, сколько строк в таблице:

=> explain analyze
=> select gen.d::date date_ordered, sum(i.amount) amount
=> from gen
=> left join orders o on (o.date_ordered::date = gen.d::date)
=> left join items i on (i.order_id = o.id)
=> group by gen.d::date
=> order by gen.d::date;
                                                                         QUERY PLAN                                                                         
------------------------------------------------------------------------------------------------------------------------------------------------------------
 Sort  (cost=2516.76..2516.78 rows=10 width=8) (actual time=16.243..16.246 rows=10 loops=1)
   Sort Key: gen.d
   Sort Method: quicksort  Memory: 17kB
   ->  HashAggregate  (cost=2516.47..2516.59 rows=10 width=8) (actual time=16.228..16.230 rows=10 loops=1)
         Group Key: gen.d
         ->  Nested Loop Left Join  (cost=5.27..2489.07 rows=5480 width=8) (actual time=0.033..13.874 rows=5381 loops=1)
               ->  Nested Loop Left Join  (cost=4.85..1802.62 rows=548 width=8) (actual time=0.025..1.138 rows=537 loops=1)
                     ->  Seq Scan on gen  (cost=0.00..1.10 rows=10 width=4) (actual time=0.003..0.008 rows=10 loops=1)
                     ->  Bitmap Heap Scan on orders o  (cost=4.85..179.60 rows=55 width=12) (actual time=0.017..0.087 rows=54 loops=10)
                           Recheck Cond: ((date_ordered)::date = gen.d)
                           Heap Blocks: exact=528
                           ->  Bitmap Index Scan on orders_date_ordered_idx1  (cost=0.00..4.83 rows=55 width=0) (actual time=0.010..0.010 rows=54 loops=10)
                                 Index Cond: ((date_ordered)::date = gen.d)
               ->  Index Scan using items_order_id_idx on items i  (cost=0.43..1.14 rows=11 width=8) (actual time=0.005..0.019 rows=10 loops=537)
                     Index Cond: (order_id = o.id)
 Planning time: 0.345 ms
 Execution time: 16.287 ms
(17 rows)


=> \q
