Back

Lower search and indexes

I found a really slow Postgres query in a project I was working on. Here are the steps I took to fix it.

TLDR

A unique index on the raw email column can't be used by queries that filter on LOWER(email). Replacing it with a unique index on lower(email) took the query from around 1500ms to under 0.1ms, and also fixed a case-sensitivity flaw in the uniqueness check.

Scope

Our application has a users table with a unique email column.

Problem

Speed: a single query was taking around 1500ms.

We often need to query this table by email for core system functionality such as logging in and registration.

EXPLAIN ANALYZE SELECT  "users".* FROM "users" WHERE (LOWER(email) = 'smith') ORDER BY "users"."email" ASC LIMIT 20 OFFSET 10;
--------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=247.16..740.63 rows=20 width=355) (actual time=1567.649..1567.649 rows=0 loops=1)
   ->  Index Scan using index_users_on_email on users  (cost=0.42..116903.19 rows=4738 width=355) (actual time=1567.645..1567.645 rows=0 loops=1)
         Filter: (lower((email)::text) = 'smith'::text)
         Rows Removed by Filter: 829683
 Planning time: 0.275 ms
 Execution time: 1567.695 ms
(6 rows)
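The existing index_users_on_email was presumably a plain unique index on the raw column, something like this (a sketch; the exact original definition isn't shown above):

```sql
-- Hypothetical original definition: a unique btree on the raw email values
CREATE UNIQUE INDEX index_users_on_email ON users (email);
```

A btree over the raw, case-sensitive email values can't answer an equality test on lower(email), so the planner can only walk the index in email order and apply the LOWER comparison as a filter on every row, which is why the plan shows 829,683 rows removed by the filter.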

Path to Solution

First, I wanted to work out where the slowdown was coming from, so I removed the LOWER to see if that made any difference.

Without Lower - 0.091ms

EXPLAIN ANALYZE SELECT  "users".* FROM "users" WHERE ((email) = 'smith') ORDER BY "users"."email" ASC LIMIT 20 OFFSET 10;
-----------------------------------------------------------------------------------------------------------------------------------
Limit  (cost=8.44..16.46 rows=1 width=355) (actual time=0.047..0.047 rows=0 loops=1)
  ->  Index Scan using index_users_on_email on users  (cost=0.42..8.44 rows=1 width=355) (actual time=0.045..0.045 rows=0 loops=1)
        Index Cond: ((email)::text = 'smith'::text)
Planning time: 0.309 ms
Execution time: 0.091 ms
(5 rows)

Woah! 0.091ms. The LOWER is definitely what's causing it.

Next I tried creating an index on the lowercased column, which I'd read Postgres supports via expression indexes. This would hopefully speed things up.

CREATE INDEX index_users_on_email_lower ON public.users USING btree (lower((email)::text) varchar_pattern_ops);


EXPLAIN ANALYZE SELECT  "users".* FROM "users" WHERE (LOWER(email) = 'smith') ORDER BY "users"."email" ASC LIMIT 20 OFFSET 10;
--------------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=247.16..740.63 rows=20 width=355) (actual time=1567.649..1567.649 rows=0 loops=1)
   ->  Index Scan using index_users_on_email on users  (cost=0.42..116903.19 rows=4738 width=355) (actual time=1567.645..1567.645 rows=0 loops=1)
         Filter: (lower((email)::text) = 'smith'::text)
         Rows Removed by Filter: 829683
 Planning time: 0.275 ms
 Execution time: 1567.695 ms
(6 rows)

The speed seems the same, but I noticed in the query plan that the lower index is not being used. A different index, index_users_on_email, is being used instead. That is the unique index which ensures emails are unique in our system.
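One way to confirm that the new lower(email) index was actually created, and to see its definition alongside the unique index the planner is choosing, is to query the pg_indexes system view:

```sql
-- List every index defined on the users table with its full definition
SELECT indexname, indexdef
FROM pg_indexes
WHERE tablename = 'users';
```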

I dropped both indexes and tried again:

DROP INDEX "index_users_on_email";
DROP INDEX "index_users_on_email_lower";


EXPLAIN ANALYZE SELECT  "users".* FROM "users" WHERE (LOWER(email) = 'smith') ORDER BY "users"."email" ASC LIMIT 20 OFFSET 20;
---------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=27348.42..27350.76 rows=20 width=355) (actual time=361.960..361.960 rows=0 loops=1)
   ->  Gather Merge  (cost=27346.09..27806.72 rows=3948 width=355) (actual time=361.957..361.957 rows=0 loops=1)
         Workers Planned: 2
         Workers Launched: 2
         ->  Sort  (cost=26346.06..26351.00 rows=1974 width=355) (actual time=339.556..339.556 rows=0 loops=3)
               Sort Key: email
               Sort Method: quicksort  Memory: 25kB
               ->  Parallel Seq Scan on users  (cost=0.00..26238.02 rows=1974 width=355) (actual time=339.378..339.378 rows=0 loops=3)
                     Filter: (lower((email)::text) = 'smith'::text)
                     Rows Removed by Filter: 315894
 Planning time: 0.624 ms
 Execution time: 369.545 ms
(12 rows)

Around 360ms without any indexes, using a parallel sequential scan! We still need that unique index though, otherwise duplicate emails could be stored in the system.

Let's add the unique index back in, but this time on lower(email):

CREATE UNIQUE INDEX index_users_on_email ON users (lower(email));

EXPLAIN ANALYZE SELECT  "users".* FROM "users" WHERE (lower(email) = 'smith') ORDER BY "users"."email" ASC LIMIT 20 OFFSET 0;
------------------------------------------------------------------------------------------------------------------------------------------
 Limit  (cost=8.45..8.46 rows=1 width=355) (actual time=0.047..0.047 rows=0 loops=1)
   ->  Sort  (cost=8.45..8.46 rows=1 width=355) (actual time=0.046..0.046 rows=0 loops=1)
         Sort Key: email
         Sort Method: quicksort  Memory: 25kB
         ->  Index Scan using index_users_on_email on users  (cost=0.42..8.44 rows=1 width=355) (actual time=0.039..0.039 rows=0 loops=1)
               Index Cond: (lower((email)::text) = 'smith'::text)
 Planning time: 0.154 ms
 Execution time: 0.071 ms
(8 rows)

0.071ms, down from 1500ms, is a great speed improvement. This exercise also highlighted a huge flaw in the way the unique index was originally created on this table: it was on the raw email column rather than on lower(email), which meant the uniqueness check was case sensitive. For an email column that is almost certainly not what you want.
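To make the flaw concrete, here is a sketch of what the old index allowed (assuming a minimal users table where only email is required):

```sql
-- With the original case-sensitive unique index on the raw column,
-- both of these inserts succeed, storing what is effectively a duplicate:
INSERT INTO users (email) VALUES ('smith@example.com');
INSERT INTO users (email) VALUES ('Smith@example.com');

-- With the unique index on lower(email), the second insert is rejected:
-- ERROR:  duplicate key value violates unique constraint "index_users_on_email"
```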