Back

Fixing a slow query

We found that a certain number of queries were running really slow. In this article I want to explore some of the findings I've had when speeding up some of the queries.

Tables

Below are the tables that are involved in this query

When designing the data model for this platform I really tried to follow good normalisation practices. That is, trying to split up the responsibilities of the tables as much as possible. I'd always heard though that it was a balance between normalisation and performance, and I believe that one of the reasons we have issues is joining large tables.

Query

The query in question that was running slow was quite a basic one. A query that allows for businesses to be able to search for a user, fetching data from all of the tables.

SELECT people.id,
details.first_name,
details.last_name,
details.dob, users.email, 
validations.valid

FROM people
INNER JOIN users ON people.user_id = users.id
INNER JOIN details ON users.personal_detail_id = details.id
LEFT JOIN validations ON people.validation_id = validations.id
INNER JOIN business_people ON business_people.student_id = people.id
WHERE (business_people.business_id = 1 AND business_people.active is TRUE) AND
(
   (LOWER(details.first_name) = 'smith' OR LOWER(details.last_name) = 'smith') OR
   LOWER(users.email) = 'smith'
)
ORDER BY "details"."last_name" ASC LIMIT 20 OFFSET 0 ;

So there is a fair bit going on here.

In all honesty there was a whole lot more wrong before I implemented some emergency changes to the tables in question

We got a bit of a speed increase by making these changes but not a whole lot.

Optimisation Process

My goal here is get the time down to less than 10ms

Test Results

Test Run 1 Run 2 Run 3
Original 5506ms 5214ms 6082ms
2 - QUERY ~* instead of LOWER 2484ms 3911ms 2677ms
3 - MVIEW / Query with LOWER 605ms 489ms 489ms
3 - MVIEW / Query without LOWER 399ms 389ms 399ms
4 - MVIEW / 3x QUERY UNION ALL 6ms 10ms 6ms

Test 2

*Test 3 -- Materialized View (MVIEW) *

    CREATE INDEX on user_list(last_name);
    CREATE INDEX on user_list(email);
    CREATE INDEX on user_list(business_id);
    CREATE UNIQUE INDEX on user_list(id, business_id);

So with the new MVIEW my query ended up being:

SELECT * FROM user_list
WHERE (business_id = 1) AND 
(LOWER(first_name) = 'smith' OR (LOWER(last_name) = 'smith') OR (LOWER(email) = 'smith'))
ORDER BY "last_name" ASC 
LIMIT 20 OFFSET 0;

TOTAL TIME ~ 500ms

Not bad, still pretty slow for a query. I knew that the LOWER was really costing me because we were asking SQL to lower the result set before doing the comparison.

Test 3

I moved the LOWER into the view because I figured there is no point in running this expensive function everytime we query.

So i dropped the view and started again - but this time adding a LOWER on firstname and lastname

CREATE MATERIALIZED VIEW user_list
AS

  SELECT people.id,
  lower(details.first_name) as first_name,
  lower(details.last_name) as last_name,
  details.dob,
  users.email, 
  validations.valid

  FROM people
  INNER JOIN users ON people.user_id = users.id
  INNER JOIN details ON users.personal_detail_id = details.id
  LEFT JOIN validations ON people.validation_id = validations.id
  INNER JOIN business_people ON business_people.student_id = people.id
  WHERE business_people.active is TRUE

WITH NO DATA;

This means I could remove the LOWER out of my query and end up with

SELECT * FROM rental_by_category
WHERE (business_id = 1) AND (first_name = 'smith' OR last_name = 'smith' OR email = 'smith')
ORDER BY "last_name" ASC
LIMIT 20 OFFSET 0;

TOTAL TIME ~380ms

Test 4

I had read about OR clauses being quite slow, and so I experimented with removing the different OR statements and saw a huge performance boost. How could I achieve an OR type of filter without actually doing one? Enter UNION ALL

EXPLAIN ANALYZE

  SELECT * FROM user_list
  WHERE (business_id = 1) AND (first_name = 'smith')

  UNION ALL

  SELECT * FROM user_list
  WHERE (business_id = 1) AND (last_name = 'smith')

  UNION ALL

  SELECT * FROM user_list
  WHERE (business_id = 1) AND (email = "smith")

ORDER BY "last_name" ASC
LIMIT 20 OFFSET 0;

TOTAL TIME ~ 7ms

That's within parameters so this is the solution I'm going to go with

Extra Experimentation

Refreshing the MVIEW

So the MVIEW is cached, thats what makes it so fast.

We need to run REFRESH MATERIALIZED VIEW CONCURRENTLY user_list; whenever we want to refresh the data. How often really depends on your needs.

There are ways to do this automatically using TRIGGERS however I didn't really want to refresh the MVIEW everytime something changed, I'm ok with doing it every 30 mins for instance.