Stored Procedures: A Seemingly Nice Tool with Hidden Problems


Abstract

The stored procedure is as widely used in database computing as the controversy around the technique is long-standing. By analyzing its two recognized merits, we try to locate the potential risks it poses and its application scenarios.

 

Source

https://www.datasciencecentral.com/profiles/blogs/stored-procedures-a-seemingly-nice-tool-with-hidden-problems

 

Summary:

A brief article that discusses 2 important aspects of the use of stored procedures:  The use of stored procedures to seperate business logic and UI logic, and stored procedure computational performance.  The author claims that "According to tests, using stored procedures to prepare data gives a higher performance compared with retrieving data out via SQL and then manipulating it outside of the database", but he does not talk about which specific tests he's referring to.  The author claims that the overhead of using a database connectivity tool such as JDBC is significant, due to network latency and slow data access, also arguing that set-based operations (specifically aggregate functions) perform better when run inside the database.  I get the sense that the author is comparing the difference between simply using primitive "select" operations to retrieve data, and then using a different language to perform processing.

 

Building on the approach:

Since the author does not document how he arrived at the conclusions that stored procedures have better performance, and doesn't really quantify what "performance" means in this case, it's difficult to validate his claims.  He does mention that "If there are too many stored procedures in a database for computation, the database will be further burdened and the already slow stored procedure will become slower when performing concurrent operations", which sounds like "using a database as a service instead of a repository can cause performance issues", which seems like a fairly obvious conclusion.  One interesting note: it might be a good idea to benchmark database connectivity libraries, to see if there are significant performance differences that might impact results.

 

 

Relevant content:

 

Does the stored procedure have high computational performance?

According to tests, using stored procedures to prepare data gives a higher performance compared with retrieving data out via SQL and then manipulating it outside of the database. But why the stored procedure performs better?

According to some views on the internet, stored procedures are fast because they are precompiled, while the SQL statements need to be compiled as they are executed. In fact, compared with the time spent in computing the data, the time taken to compile SQL statements can be ignored. A SQL statement that needs to be repeatedly executed with different parameters can also be precompiled. Some programmers embed different parameters in one SQL and send a different SQL to the database each time for execution. Then the compilation time can be quite long.

The truth is that stored procedures are fast because data is computed in the database. An application outside of the database must use the database interface to access the data. Most of the database interfaces, however, perform poor, particularly the JDBC interface intended for the JAVA applications. The JDBC interface will be called each time when a SQL statement is sent to the database for execution, which brings down the execution speed. There is network latency if the application and the database are not in the same computer, but it isn’t more serious than the poor performance of an interface. For external storage computing, the time taken to retrieve data from the database is usually longer than the data computing itself.

Moreover, stored procedures have a bad execution performance. According to tests on a top commercial database, the stored procedure is nearly one order of magnitude slower by retrieving data out row by row when performing a certain computation (such as a sum over a big table’s  field), while one SQL statement can get it done. Even a high-level language like Java is much faster than the stored procedure when performing the same computation by retrieving data from the file system. It’s easier to write parallel code for the external storage computing to make full use of the multiple CPUs of modern servers. But generally stored procedures don’t support the parallel processing. If there are too many stored procedures in a database for computation, the database will be further burdened and the already slow stored procedure will become slower when performing concurrent operations.

 

 

If there’s a convenient computing engine independent of the database, we can get away from the stored procedure’s problems by moving computations that are volatile and closely connected with the user interface outside of the database to integrate with the application. This way the maintenance cost can be reduced. We can even perform computations that are more stable outside of the database to address the issues of multi-database and non-database sources.  An application structure without stored procedures is a more reasonable one.