Login

Stored Procedures and Drupal 7

This article is for PHP programmers and those folks who are comfortable working directly with relational databases.

Stored procedures are not a general purpose solution. However, as Drupal is adopted by wider audiences, you may find your self in a situation where a dedicated solution is required (e.g. making calls to stored procedures).

For example, a customer wants Drupal to report the result of a tried and true scientific calculation. The calculation resides in a stored procedure. Drupal may be new to the customer, but the customer's core business processes are well established (i.e. the stored procedure is established).

This article provides a simple proof-of-concept example to get you started. This article does not represent a comprehensive study of Drupal 7 and stored procedures. To summarize, I found this to be an interesting topic.

Using a stored procedure, record selection was 14 times faster (versus a prepared statement via db_select).

Background:

The main advantage of stored procedures is performance.

The main disadvantage is, store procedures are usually vendor specific. Often a stored procedure written for one database (e.g. MySql) can not be run in another database (e.g. Postgres).

A database administrator (DBA) often writes the stored procedure. An application programmer writes code that calls the stored procedure.

The term stored procedure refers to "application logic" stored directly in the database (as opposed to application logic stored in PHP files). Application logic is thus run (and stored) within the database, reduces network traffic.

Case Study Details.

We are going to write 2 simple functions. We'll perform some crude timings and look at the results.

Both methods simply select "titles" from all nodes of a specific content type and publishing state. Not a real world query, but enough to demo performance difference. I am using MySql version 5.5.9 running under Arch Linux 2.6.37. I am using the current Drupal 7.0 release. The database server, web server and Drupal installation all resided on the same host (my laptop).

The first method uses the Drupal API method db_select(), the second method calls a stored procedure. Other than that, both methods are the same. Both methods take the database result, capture a timing value and display everything using theme_table().

Timing results (measured in seconds):

db_select() call stored procedure Difference Magnitude x Faster (stored procedure)
0.0125620365 0.000674963 0.0118870735 18.61
0.0096518993 0.0007021427 0.0089497566 13.75
0.0115540028 0.0007169247 0.0108370781 16.12
0.0083408356 0.0006780624 0.0076627731 12.30
0.0120971203 0.000742197 0.0113549232 16.30
0.0085980892 0.0006630421 0.0079350471 12.97
0.0078208447 0.0006899834 0.0071308613 11.33
0.0103640556 0.0006439686 0.0097200871 16.09
0.0084490776 0.0007648468 0.0076842308 11.05
0.0082781315 0.0006940365 0.007584095 11.93
Total      
0.0977160931 0.0069701672 0.0907459259 14.02

How to call a stored procedure.

Drupal 7 uses the PHP PDO data-access abstraction layer. A PDOStatement provides the methods to bind input parameters (e.g. content type and publishing state) and call a stored procedure.

Use a database connection to prepare a PDOStatement statement. Therefore we use the Drupal 7 API method Database::getConnection() to obtain a database connection. We save the connection's default statement class. We set the statement class of the connection to PDOStatement:

  $conn = Database::getConnection();
  $saved_class = $conn->getAttribute(PDO::ATTR_STATEMENT_CLASS);
  $conn->setAttribute(PDO::ATTR_STATEMENT_CLASS, array('PDOStatement'));
.

We prepare the statement and bind the input parameters:

  $statement = $conn->prepare("Call GetNodeList(?,?)");
 
  $op_status = $statement->bindParam(1, $node_type, PDO::PARAM_STR | PDO::PARAM_INPUT_OUTPUT, 25);
  $op_status = $statement->bindParam(2, $publish_state, PDO::PARAM_INT | PDO::PARAM_INPUT_OUTPUT);

We execute the statement and then set connection back to it's default statement type:

  $exec_result = $statement->execute();
  $conn->setAttribute(PDO::ATTR_STATEMENT_CLASS, $saved_class);

Note! Full source located at the bottom of this post.

Creating a Stored Procedure.

Let's look at our stored procedure. The following adds the stored procedure to your MySql database.

DELIMITER @@
DROP PROCEDURE GetNodeList @@
CREATE PROCEDURE GetNodeList
(IN node_type VARCHAR(25),IN publish_status INTEGER)
BEGIN    
 
  SELECT title as title from node where type = node_type and status = publish_status;     
 
END @@ 
DELIMITER ; 

Application code.

Here are our two methods. First the method using db_select() :

function test_select() {
 
  $mtime = microtime();
  $mtime = explode(" ", $mtime);
  $mtime = $mtime[1] + $mtime[0];
  $starttime = $mtime;
 
  $query = db_select('node', 'n')->fields('n', array('title'))
     ->condition('type', 'article', '=')
     ->condition('status', 1, '=');
 
  $result = $query->execute();
  $rows = array();
  foreach ($result as $row) {
    $rows[]['data'] = array($row->title);
  }
  $vars = array();
  $vars['header'] = $header;
  $vars['rows'] = $rows;
 
  $mtime = microtime();
  $mtime = explode(" ", $mtime);
  $mtime = $mtime[1] + $mtime[0];
  $endtime = $mtime;
  $totaltime = ($endtime - $starttime);
 
  $vars['caption'] = "This page was created in " . $totaltime . " seconds";
 
  return theme('table', $vars);
}

Call to Stored Procedure.

function test_stored_procedure() {
 
  $mtime = microtime();
  $mtime = explode(" ", $mtime);
  $mtime = $mtime[1] + $mtime[0];
  $starttime = $mtime;
 
  $header = array('title');
 
  $conn = Database::getConnection();
  $saved_class = $conn->getAttribute(PDO::ATTR_STATEMENT_CLASS);
 
  $conn->setAttribute(PDO::ATTR_STATEMENT_CLASS, array('PDOStatement'));
 
  $statement = $conn->prepare("Call GetNodeList(?,?)");
 
  $op_status = $statement->bindParam(1, $node_type, PDO::PARAM_STR | PDO::PARAM_INPUT_OUTPUT, 25);
  $op_status = $statement->bindParam(2, $publish_state, PDO::PARAM_INT | PDO::PARAM_INPUT_OUTPUT);
 
  $node_type = 'article';
  $publish_state = 1;
 
  $exec_result = $statement->execute();
  $conn->setAttribute(PDO::ATTR_STATEMENT_CLASS, $saved_class);
 
  $rows = array();
  while ($row = $statement->fetchColumn(0)) {
    $rows[]['data'] = array($row);
  }
 
  $vars = array();
  $vars['header'] = $header;
  $vars['rows'] = $rows;
 
  $mtime = microtime();
  $mtime = explode(" ", $mtime);
  $mtime = $mtime[1] + $mtime[0];
  $endtime = $mtime;
  $totaltime = ($endtime - $starttime);
  $vars['caption'] = "This page was created in " . $totaltime . " seconds";
 
  return theme('table', $vars);
}

A few thoughts.

I am not a database administrator, however I've programmed in various environments for over 30 years. During that time I have written stored procedures.

Starting this article, my initial guess was, there would be no noticeable performance difference.  The logical select is minor. The number of rows in the test database is tiny (under 30 total nodes exist in my whole Drupal 7 installation).

To my surprise, the stored procedure was 14 times faster.

Each database vendor provides a specific set of performance enhancements. I would not assume that same performance gain using a different vendor (E.G Postgres). Some vendors have optimized their prepared statements to such a degree, that a totally different set of results might be obtained. In other words, the performance difference between stored procedures and prepared statements may differ by vendor.

Thus, you really need to analyze and benchmark each specific situation.

I hope you found this interesting.

Comments

Mh, mh... - You want to

Mh, mh...

- You want to compare it with a static db_query(). What you are comparing here is Drupal's query builder with a prepared statements. It's quite obvious that db_select() is a lot slower. It's built for creating dynamic queries and should only be used when necessary.

- Your code for calling the prepared statement is imho way too complex than it needs to be. A simple "$result = db_query('Call GetNodeList(:type,:state)', array(':type' => 'article', ':state' => 1));" should be all you need.

- There are some handy helper methods for fetching data which can replace your fetch loop. "$rows = $result->fetchCol()" should be all you need. The 'data' key is only necessary when you have other things you want to apply like classes.

- This is a bad example to start with, because when querying {node} tables, you should *always* use $query->addTag('node_access') for security reasons so that node access is applied correctly.

Proof of Concept

Hi,
See my comment below titled Network Traffic. In the real world a legacy scientific calculation that's been in place to 15 years is not going to be written with the Drupal node table? Perhaps the author used the node table because most Drupal installations have one. The article is not about how to select nodes??? It's as stated, a proof-of-concept?

I tried your example and it was slower than the stored procedure. Have you tried your db_query mechanism with a stored procedure that contains out parameters? Are you certain that db_query binds the parameters like a PDOStatement? A stored procedure is pre-compiled executable in the database. The bind process is there for a reason.

Devel Generate

Try running Devel Generate (http://drupal.org/project/devel) to create a whole boatload of nodes and see what performance is like with different sized sites.

I suspect the difference will be much smaller for large data sets.

Network Traffic

Often a business method is a mix of application language code and calls to the database. It's not uncommon to have your web server and database server running on different hosts. Thus a stored procedure that performs all of the business logic in the database, by definition, reduces network traffic. A reduction is network traffic is a performance boost. More records in a typical complex proc would produce a much wider performance gap.

The stored procedure used in this example doesn't represent a typical "proc". As alluded in the article, stored procedures are normally complex processes that have many steps. I'm guessing the example used in the article was used just as a succinct example. Something that folks could easily paste in to the command line.

I've worked in places where stored procedures are created by a team of DBA's. You don't have a choice, they require you to use their centralized procs.