Barefoot Development

I'm Glad Rails Loves SQL

It's a good thing that the core team for Ruby on Rails has chosen to embrace SQL, and easily expose it to developers. Other object-relational mapping layers often try to hide SQL as 'evil'. However, there are times when nothing will substitute for being able to talk directly to the database, in its own language.

Yesterday was such a time. I have an application that displays articles divided into hierarchies of categories. Articles are rendered differently on the site depending on their category.

I needed to change a list of articles from one top-level category (let's call it food) so that some articles in one of its sub-categories (call it expert reviews) only appeared one time per sub-category. So, while articles from other sub-categories of food would all appear in the list, only the latest article from any expert reviewer would appear in that same list.

I wanted to handle this in the model using a custom finder so that controllers could just call something like Category.published_article_list(options) and the details would be irrelevant.

Ok, enough background, here's the method I added to the Category model:

def published_article_list(options = {})
if (food?)
# Special query that returns only the latest article for each 'reviewer' category
sql = "mid_category_id != #{REVIEW_CAT} OR articles.id IN ( "
sql += "SELECT SUBSTRING( MAX( CONCAT( published_on, id ) ), 11 ) AS ra_id "
sql += "FROM articles "
sql += "WHERE mid_category_id = #{REVIEW_CAT} AND "
sql += " #{Article.conditions_published} "
sql += "GROUP BY category_id ) ) "
options[:conditions] = sql
end
self.published_root_articles.find(:all, options)
end

That SQL is a little serious, so here's what it does in English. It starts by removing any articles that have a mid-level category of "review". It then adds the articles from that category that we want using a subquery.

The meat of the subquery is in this part of the SELECT clause: SUBSTRING( MAX( CONCAT( published_on, id ) ), 11 ) AS ra_id. I needed to get the ID of the latest article, grouped by category_id (the individual reviewer). So, this code concatenates the published_on date to the article ID, finds the max of all those values, then uses SUBSTRING to chop the concatenated date off, leaving only the id of the latest article. So, for a list of articles with IDs and dates like this:

id date category_id
101 2007-10-01 1
102 2007-09-30 1
103 2007-10-02 2
104 2007-10-01 2

When the concatenated dates and IDs are sorted, they end up like this:

2007-10-02103
2007-10-01104
2007-10-01101
2007-09-30102

Using this method, the subquery accurately adds the IDs of articles 101 and 103 to the IN clause above.

The only downside to this method is that the CONCAT function requires a full table scan, so indexes won't be used in the subquery. Since this page is cached, it wasn't a problem for this application, but if performance becomes an issue other optimization could be done while still using this strategy.

Thanks for loving SQL, Rails!

Doug Smith, Senior Developer, Barefoot

0 comments

Post a Comment

« Home