Как выбрать диапазон дат * и * записи непосредственно перед и после этого диапазона?
Я работаю с таблицей, в которой каждая строка имеет метку времени, и эта метка времени уникальна для данного набора других значений столбца:
CREATE TEMPORARY TABLE time_series (
id SERIAL PRIMARY KEY,
created TIMESTAMP WITH TIME ZONE NOT NULL,
category TEXT,
value INT
);
CREATE UNIQUE INDEX ON time_series (created, category);
INSERT INTO time_series (created, category, value)
VALUES ('2000-01-01 00:00:00Z', 'foo', 1),
('2000-01-01 06:00:00Z', 'bar', 5),
('2000-01-01 12:00:00Z', 'bar', 5),
('2000-01-02 00:00:00Z', 'bar', 5),
('2000-01-02 12:34:45Z', 'bar', 2),
('2000-01-03 00:00:00Z', 'bar', 3),
('2000-01-04 00:00:00Z', 'bar', 3),
('2000-01-04 11:11:11Z', 'foo', 4),
('2000-01-04 22:22:22Z', 'bar', 5),
('2000-01-04 23:23:23Z', 'bar', 4),
('2000-01-05 00:00:00Z', 'foo', 1),
('2000-01-05 23:23:23Z', 'bar', 4);
Метки времени не расположены равномерно. Моя задача, учитывая произвольную дату начала и время окончания, состоит в том, чтобы получить записи между этими датами и записями непосредственно перед и после этого диапазона. В принципе, как мне упростить этот запрос:
(SELECT created, value
FROM time_series
WHERE category = 'bar'
AND created < '2000-01-02 06:00:00Z'
ORDER BY created DESC
LIMIT 1)
UNION
(SELECT created, value
FROM time_series
WHERE category = 'bar'
AND created >= '2000-01-02 06:00:00Z'
AND created < '2000-01-04 12:00:00Z')
UNION
(SELECT created, value
FROM time_series
WHERE category = 'bar'
AND created >= '2000-01-04 12:00:00Z'
ORDER BY created
LIMIT 1)
ORDER BY created;
created value
2000-01-02 00:00:00+00 5
2000-01-02 12:34:45+00 2
2000-01-03 00:00:00+00 3
2000-01-04 00:00:00+00 3
2000-01-04 22:22:22+00 5
Вариант использования - получение точек данных для отображения графика: я знаю время левого и правого краев графика, но в целом они не будут точно соответствовать created
datetimes, поэтому для отображения графика до самого края мне нужны точки данных по обе стороны диапазона.
Не решения:
- Я не могу просто выделить весь диапазон, потому что он может быть огромным.
- Я не могу выбрать произвольно длительный период за пределами данного диапазона, потому что этот набор данных может снова быть огромным или какой бы период я ни выбрал, может оказаться недостаточно для получения следующих показаний.
1 ответ
ИЗМЕНИТЬ:
Вы можете комбинировать UNION ALL
с ORDER BY
а также LIMIT
и некоторые границы статьи.
Что-то вроде этого:
ПОДХОД 1:
SELECT created,
value
FROM (SELECT created, value
FROM time_series
WHERE category = 'bar'
AND created < '2000-01-02 06:00:00Z'
ORDER BY created ASC LIMIT 1
) AS ub
UNION ALL
SELECT created, value
FROM time_series
WHERE category = 'bar'
AND created >= '2000-01-02 06:00:00Z'
AND created < '2000-01-04 12:00:00Z'
UNION ALL
SELECT created,
value
FROM (SELECT created, value
FROM time_series
WHERE category = 'bar'
AND created >= '2000-01-04 12:00:00Z'
ORDER BY created DESC LIMIT 1
) AS lb
ORDER BY 1;
EXPLAIN ANALYZE
из подхода 1:
QUERY PLAN
-----------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=3.60..3.61 rows=3 width=12) (actual time=0.228..0.237 rows=5 loops=1)
Sort Key: time_series.created
Sort Method: quicksort Memory: 25kB
-> HashAggregate (cost=3.55..3.58 rows=3 width=12) (actual time=0.182..0.195 rows=5 loops=1)
Group Key: time_series.created, time_series.value
-> Append (cost=1.16..3.53 rows=3 width=12) (actual time=0.073..0.163 rows=5 loops=1)
-> Limit (cost=1.16..1.16 rows=1 width=12) (actual time=0.070..0.073 rows=1 loops=1)
-> Sort (cost=1.16..1.16 rows=1 width=12) (actual time=0.065..0.067 rows=1 loops=1)
Sort Key: time_series.created DESC
Sort Method: quicksort Memory: 25kB
-> Seq Scan on time_series (cost=0.00..1.15 rows=1 width=12) (actual time=0.026..0.035 rows=2 loops=1)
Filter: ((created < '2000-01-02 06:00:00+00'::timestamp with time zone) AND (category = 'bar'::text))
Rows Removed by Filter: 8
-> Seq Scan on time_series time_series_1 (cost=0.00..1.18 rows=1 width=12) (actual time=0.007..0.016 rows=3 loops=1)
Filter: ((created >= '2000-01-02 06:00:00+00'::timestamp with time zone) AND (created < '2000-01-04 12:00:00+00'::timestamp with time zone) AND (category = 'bar'::text))
Rows Removed by Filter: 7
-> Limit (cost=1.16..1.16 rows=1 width=12) (actual time=0.051..0.054 rows=1 loops=1)
-> Sort (cost=1.16..1.16 rows=1 width=12) (actual time=0.047..0.049 rows=1 loops=1)
Sort Key: time_series_2.created
Sort Method: quicksort Memory: 25kB
-> Seq Scan on time_series time_series_2 (cost=0.00..1.15 rows=1 width=12) (actual time=0.009..0.016 rows=2 loops=1)
Filter: ((created >= '2000-01-04 12:00:00+00'::timestamp with time zone) AND (category = 'bar'::text))
Rows Removed by Filter: 8
Planning time: 0.388 ms
Execution time: 0.438 ms
(25 rows)
Можно использовать другой аналогичный подход.
ПОДХОД 2:
SELECT created, value
FROM time_series
WHERE category = 'bar'
AND created >= (SELECT created
FROM time_series
WHERE category = 'bar'
AND created < '2000-01-02 06:00:00Z'
ORDER BY created ASC LIMIT 1)
AND created < (SELECT created
FROM time_series
WHERE category = 'bar'
AND created >= '2000-01-04 12:00:00Z'
ORDER BY created DESC LIMIT 1
)
EXPLAIN ANALYZE
из подхода 2:
--------------------------------------------------------------------------------------------------------------------------------------
Seq Scan on time_series (cost=2.33..3.50 rows=1 width=12) (actual time=0.143..0.157 rows=6 loops=1)
Filter: ((created >= $0) AND (created < $1) AND (category = 'bar'::text))
Rows Removed by Filter: 4
InitPlan 1 (returns $0)
-> Limit (cost=1.16..1.16 rows=1 width=8) (actual time=0.066..0.069 rows=1 loops=1)
-> Sort (cost=1.16..1.16 rows=1 width=8) (actual time=0.061..0.062 rows=1 loops=1)
Sort Key: time_series_1.created
Sort Method: quicksort Memory: 25kB
-> Seq Scan on time_series time_series_1 (cost=0.00..1.15 rows=1 width=8) (actual time=0.008..0.015 rows=2 loops=1)
Filter: ((created < '2000-01-02 06:00:00+00'::timestamp with time zone) AND (category = 'bar'::text))
Rows Removed by Filter: 8
InitPlan 2 (returns $1)
-> Limit (cost=1.16..1.16 rows=1 width=8) (actual time=0.041..0.044 rows=1 loops=1)
-> Sort (cost=1.16..1.16 rows=1 width=8) (actual time=0.038..0.039 rows=1 loops=1)
Sort Key: time_series_2.created DESC
Sort Method: quicksort Memory: 25kB
-> Seq Scan on time_series time_series_2 (cost=0.00..1.15 rows=1 width=8) (actual time=0.007..0.013 rows=2 loops=1)
Filter: ((created >= '2000-01-04 12:00:00+00'::timestamp with time zone) AND (category = 'bar'::text))
Rows Removed by Filter: 8
Planning time: 0.392 ms
Execution time: 0.288 ms
Поскольку вы используете ограничение, запрос будет выполняться быстро.
ПОДХОД 3:
WITH a as (
SELECT created,
value,
lag(created, 1) OVER (ORDER BY created desc) AS ub,
lag(created, -1) OVER (ORDER BY created desc) AS lb
FROM time_series
WHERE category = 'bar'
) SELECT created,
value
FROM a
WHERE ub>='2000-01-02 06:00:00Z'
AND lb<'2000-01-04 12:00:00Z'
ORDER BY created
EXPLAIN ANALYZE
из подхода 3:
QUERY PLAN
--------------------------------------------------------------------------------------------------------------------------------------------
Sort (cost=1.19..1.20 rows=1 width=12) (actual time=0.174..0.181 rows=5 loops=
1)
Sort Key: a.created
Sort Method: quicksort Memory: 25kB
CTE a
-> WindowAgg (cost=1.14..1.16 rows=1 width=28) (actual time=0.075..0.107 rows=7 loops=1)
-> Sort (cost=1.14..1.14 rows=1 width=12) (actual time=0.056..0.067 rows=7 loops=1)
Sort Key: time_series.created DESC
Sort Method: quicksort Memory: 25kB
-> Seq Scan on time_series (cost=0.00..1.12 rows=1 width=12) (actual time=0.018..0.030 rows=7 loops=1)
Filter: (category = 'bar'::text)
Rows Removed by Filter: 3
-> CTE Scan on a (cost=0.00..0.03 rows=1 width=12) (actual time=0.088..0.131 rows=5 loops=1)
Filter: ((ub >= '2000-01-02 06:00:00+00'::timestamp with time zone) AND (lb < '2000-01-04 12:00:00+00'::timestamp with time zone))
Rows Removed by Filter: 2
Planning time: 0.175 ms
Execution time: 0.247 ms
(16 rows)