Problem Statement


In relational database management systems, stored procedures are subroutines that allow developers to execute prepared SQL statements directly inside the database.  Most commercial and many open source RDBMSs provide a stored procedure implementation; in fact, many of the procedural elements of stored procedures were incorporated into the ISO SQL standard in the early 2000s.
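
To make this concrete, the sketch below shows one common shape of a stored procedure, defined and invoked from application code.  It is a minimal illustration only, assuming a PostgreSQL database and the psycopg2 Python driver; the connection string, table, and procedure names are hypothetical.

    import psycopg2

    # Hypothetical connection; any RDBMS with stored procedure support would do.
    conn = psycopg2.connect("dbname=shop user=app")
    with conn, conn.cursor() as cur:
        # The procedural logic is stored and executed inside the database itself.
        cur.execute("""
            CREATE OR REPLACE PROCEDURE apply_discount(pct numeric)
            LANGUAGE SQL
            AS $$
                UPDATE orders SET total = total * (1 - pct / 100);
            $$;
        """)
        # A single call executes the prepared logic server-side.
        cur.execute("CALL apply_discount(%s);", (10,))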

Stored procedures have a great many benefits.  The ability to execute application logic directly in the database layer can be extremely efficient and desirable in many cases, for a variety of reasons.  For example, stored procedures can avoid much of the network traffic between the application and the database, since the logic runs where the data lives.  Data consistency and integrity become much easier to manage, because business logic is centralized in a single location.  In addition, managing data access security can be greatly simplified.
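
To illustrate the network traffic point, consider the sketch below, which reuses the hypothetical schema and apply_discount procedure from the previous example.  The client-side loop makes one round trip per row, while the procedure call performs the equivalent work in a single round trip.

    import psycopg2

    conn = psycopg2.connect("dbname=shop user=app")
    with conn, conn.cursor() as cur:
        # Chatty client-side approach: one network round trip per row.
        cur.execute("SELECT id, total FROM orders;")
        for order_id, total in cur.fetchall():
            cur.execute("UPDATE orders SET total = %s WHERE id = %s;",
                        (float(total) * 0.9, order_id))

        # Stored procedure approach: the same logic in a single round trip.
        cur.execute("CALL apply_discount(%s);", (10,))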

The use of stored procedures over time, however, can have significant drawbacks.  Stored procedure languages are highly procedural, and creating unit tests can be extremely problematic, if not impossible, because of the difficulty of separating concerns.  Dependency injection is almost impossible, although some niche products exist to assist with this problem.  More importantly, the use of stored procedures inhibits application scalability.  Storing business logic in the database implies that only the database can execute it, and it can be prohibitively expensive to vertically scale a database.  Vertical scaling involves upgrading the database server itself, usually by adding RAM, disk space, or CPU capacity.  In most business applications, these servers are typically high-end enterprise machines, where upgraded components can be extremely expensive and server licensing costs can be astronomically high.  For example, it is not uncommon for databases to use RAID arrays, which provide disk redundancy in case of drive failure.  For the purposes of reliable data storage, this is appropriate.  For performing many types of computations, however, there is an opportunity cost to using unnecessarily expensive hardware to execute these tasks.  For cost reasons, it makes sense to offload computations onto cheaper servers that do not require expensive hardware or licensing.

Conventional approaches to application scalability mainly focus on horizontal scaling: distributing processing across multiple computers or virtual machines.  Unlike vertically scaling a database server, horizontal scaling can use far less expensive, expendable machines.  In horizontal scaling, the failure of a single computer (node) in a cluster is acceptable, and the damage can be easily mitigated.  Horizontal scaling also allows for efficient parallel processing in some cases.  Unfortunately, this type of scaling is not possible with stored procedures.

Stored procedures can become particularly problematic if they perform a large number of computations and therefore take a very long time to execute.  Long execution times can lead to database unavailability and performance issues, leaving fewer resources for the database to achieve its primary mission: storing and retrieving data.  Databases are expensive resources; a cost savings could be achieved by finding a way to “break up” these large procedures so that cheaper servers could be used to perform the computations.

Such an approach would not be trivial.  There is a cost to offloading data to another server to perform computations, even if the machines are co-located.  In some cases the approach might not be feasible at all: if the amount of data involved in the calculation is extremely large, any benefit achieved would quickly erode due to network latency.  A technique (or tool) that could decide the optimal way to offload calculations (a simple cost model for this decision is sketched after the list below) could achieve a number of goals:

1)      Increase code testability by moving key pieces of logic outside of the stored procedure.  Improvements can be measured by examining unit test code coverage and cyclomatic complexity.

2)      Reduce the workload on the database by moving computations onto a different, less expensive server.  Improvements can be measured in:

a)   overall computation time

b)   changes in database server CPU, memory, and disk performance

c)    network latency, which the offloading process will inevitably introduce
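
The cost model mentioned above might, as a first approximation, look like the sketch below.  Every figure in it is an assumption chosen for illustration (the bandwidth, latency, and timing inputs would need to be measured in practice), and should_offload is a hypothetical helper, not part of any existing tool.

    def should_offload(data_bytes, db_compute_s, worker_compute_s,
                       bandwidth_bytes_per_s=125_000_000,  # assumed ~1 Gbps link
                       round_trip_latency_s=0.001):        # assumed 1 ms RTT
        """Estimate whether moving a computation to a cheaper worker
        is faster than running it inside the database."""
        # Data must travel to the worker, and results must travel back.
        transfer_s = 2 * (data_bytes / bandwidth_bytes_per_s) + round_trip_latency_s
        return worker_compute_s + transfer_s < db_compute_s

    # Example: a CPU-heavy calculation over 10 MB of input data.
    print(should_offload(data_bytes=10_000_000,
                         db_compute_s=5.0,
                         worker_compute_s=2.0))  # True: offloading wins here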

Note that this approach does not necessarily imply that the stored procedures themselves should be disposed of entirely; such an approach could simply combine smaller, faster-executing stored procedures derived from the original, orchestrated by an external process.
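
A minimal sketch of that orchestration idea, under the same assumptions as the earlier examples, appears below: the heavy computation moves into an ordinary function that can be unit tested in isolation, and an external process sequences the remaining small stored procedures around it.  The transform function and the finalize_orders procedure are hypothetical.

    import psycopg2

    def transform(rows):
        # The CPU-heavy logic extracted from the original procedure; as
        # ordinary application code, it is now unit-testable in isolation.
        return [(order_id, float(total) * 0.9) for order_id, total in rows]

    conn = psycopg2.connect("dbname=shop user=app")
    with conn, conn.cursor() as cur:
        # Step 1: pull the staged input rows out of the database.
        cur.execute("SELECT id, total FROM orders;")
        results = transform(cur.fetchall())  # runs on a cheap worker machine

        # Step 2: write the results back, then let a small remaining stored
        # procedure (hypothetical) finish the transactional work.
        for order_id, new_total in results:
            cur.execute("UPDATE orders SET total = %s WHERE id = %s;",
                        (new_total, order_id))
        cur.execute("CALL finalize_orders();")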