SQL (Structured Query Language) has been with us for more than half a century and it’s not going away anytime soon. Popular in both traditional relational databases and newer NoSQL databases technologies, SQL is widely used for data analytics, Big Data processing, coding languages, and more.
I’m a fan of SQL. It’s a powerful, yet simple language. In this post, I’ll share:
- The power of SQL and why we’re using it
- Common pitfalls and best practices
What is SQL and Why You Should Use it?
SQL was initially developed by IBM in the early 1970s, adopted by Oracle later that decade, and has been used in relational databases ever since. In 1986, SQL was adopted as a standard by the American National Standards Institute (ANSI).
As the volume of data has grown over time, new databases and technologies have emerged. One of these is Big Data, for data sets too complex for traditional data-processing applications to deal with. Common in Big Data, NoSQL databases aren’t based on the relational model. They’re sometimes called Not Only SQL, to emphasize that they may support an SQL-like query language. Some examples of newer non-relational databases and libraries that support SQL include:
- Apache Spark Analytics Engine
- Google Big Query – managed analytics data warehouse
- Facebook Presto and AWS Athena
- Elastic search
- Python Data Analysis Library (pandasql)
SQL is widely used both in OLTP (Transaction Processing) and OLAP (Analytic Processing), despite the two being very different use cases:
Snippet 1: pandasql
|Transaction Processing||Analytic Processing|
|Common usage||Relational databases||Big Data databases|
|SQL usage||Insert, update, and queries by many users concurrently||Select queries. Few Queries at a time|
|Scale of computation||Small||Large – entire tables|
|Query duration||Fast||Can be hours or more|
|Performance related features||Indices and keys||Partitions, redundant data (per use-case)|
|Analytic functions||Infrequently used||Common|
Before diving into SQL, here’s an example of SQL usage outside of databases — Python’s Data Analysis Library:
Name COUNT(DISTINCT ID) 0 James 1 1 John 2 2 Ted 1
As these examples show, SQL is alive and well in the newest and most popular technologies.
Let’s take a closer look at its power.
SQL Functions Examples
The strongest part of SQL is its functions. Here are the three types of functions SQL supports:
|Type||Scope||Reduction of records||Examples|
|Aggregated (group by)||Group of records||Yes||
|Analytic||Group of records||No||
Recommended SQL Coding Conventions (Naming and More)
To gain the power of SQL we highly recommend using coding conventions. This will improve both writing and reading, and improve team work. Here’s a short list of the more important ones:
|Names and case||Use underscores where you would naturally include a space in the name (last name -> last_name)
Upper case for keywords and functions (SELECT, FROM, WHERE, IN, SUM…)
Lower case for everything else: variables, table names, columns etc.
|Explicitly use keywords||In joins use keywords: CROSS, INNER… ON, LEFT… ON, RIGHT… ON
Avoid joins as part of the WHERE clause: SELECT… FROM table_a, table_b WHERE…
Aliasing – Always include the AS keyword
|Avoid nested forms, use WITH instead||Avoid SELECT … FROM (SELECT …)
Use WITH clauses! This way you can avoid complex queries, and create step-by-step readable queries
Avoid IN / NOT IN – Use JOIN.
|Line Spacing||Before AND, OR After each keyword definition|
|Include comments||Both /* and closing */ and — are acceptable|
|Do not use SELECT *||Yes, it’s compact and easy, but new columns are added and included to your query result without a need. Write explicitly the column name you need|
SQL Query Examples: Good & Bad
To show you how important the coding conventions are, please look at these good and bad examples.
|Flow||Two simple queries||One complex query|
|Keywords||Uppercase||Lowercase, like other text|
|Join||Explicit – easy to understand||Implicit – with other WHERE predicates|
|Indentation||Line spacing before OR helps in knowing its scope||No line spacing before OR. Larger condition will be much hard to understand|
Snippet 4: Bad example
select emp.department, sum(exp.amount) expenses from employees emp, expenses exp where emp.department in (select department from employees where department like '%development%' or department like '%management%' group by department having count(distinct id) > 50) and emp.id = exp.employee_id group by emp.department
Snippet 5: Good example
-- filter: Development and Management departments with more than 50 employees WITH dept AS (SELECT department FROM employees WHERE department LIKE '%Development%' OR department LIKE '%Management%' GROUP BY department HAVING Count(DISTINCT id) > 50) -- find expenses for each of the filtered departments SELECT emp.department, SUM(exp.amount) AS expenses FROM employees AS emp INNER JOIN dept ON emp.department = dept.department INNER JOIN expenses AS exp ON emp.id = exp.employee_id GROUP BY emp.department
Here’s a nice online tool for SQL formatting that you can use: https://sqlformat.org/
SQL Problems Troubleshooting – Mistakes & Pitfalls
Like any other language, SQL has its own pitfalls which you should be aware of to save yourself time and prevent bugs in your application.
The first cause for SQL errors & bugs – Null values
Snippet 6: NULL query result
SELECT NULL = NULL --> NULL SELECT NULL != NULL --> NULL SELECT NULL IS NULL --> true SELECT 1 + NULL --> NULL SELECT SUM(num_column) WHERE FALSE --> NULL
To overcome this issue you should use coalesce whenever there’s a danger:
Snippet 7: Overcome Null in joins
FROM t1 INNER JOIN t2 ON COALESCE(t1.name, '') = COALESCE(t2.name, '')
SQL Indices and Partitions Are Not Used
We spend a lot of time building our database schema by adding indices and declaring partitions. But we won’t get the performance that we want if we stop there — we have to make sure our database engine uses them.
For example, let’s assume we have an index on the first_name column:
|Index not used|
There may be changes from one implementation to another, but the concept is valid in any SQL implementation.
How do you make sure your indices and partitions are being used? Analyze your query execution plan. In some SQL implementations, you can use the EXPLAIN keyword. In Oracle, for example, it’s called EXPLAIN PLAN.
Analytical Functions Pitfalls in SQL – Evaluation Order
This one is important if you’re using an analytic function, as misunderstanding the evaluation order may lead to bugs. For example:
Snippet 8: Analytic function with filter
SELECT first_name, last_name, department, COUNT(1) OVER (PARTITION BY department) AS size_of_department FROM employees WHERE last_name LIKE 'A%'
We’ll get the number of employees in the department with a last name beginning with A, instead of all the employees in the department. Remember that analytic functions are executed last.
Date and time syntax, NULLs, string concatenation, comparison case sensitivity and other aspects of SQL vary from one vendor to another – making it hard to move an application between vendors.
Try using standard SQL only. For example – use COALESCE instead of ISNULL (SQL Server) or NVL(Oracle).
Prevent SQL Injection (SQLi) Attacks
A well known security pitfall in web applications is the SQL injection attack – a code injection technique that attacks applications by inserting statements into a field for execution. Such attacks are usually used to steal data, change data, or change permissions in the attacked database.
Preventing SQL Injection attacks starts during development, and should be followed by the use of scanners and web application firewalls to protect your application. Please refer to the SQL Injection blog post for examples and ways to prevent such attacks.
SQL is great. I personally prefer SQL-based language to any other query language or API. However, you need to use it correctly to make your work easier. Start by building your schema correctly and later use coding conventions and other options, like the WITH keyword and window functions.
Don’t forget to keep the known pitfalls in mind. And, If you’re writing an application, then protect it from SQL Injection.