SQL – from past to present

DataAdmin — Fri, 06 Jun 2025 15:04:55 +0000

The importance of SQL and relational databases has been widely debated, especially in the last 20 years with the ever-growing volume of data. Now, in the AI age, both remain fundamental to most companies. Yet for everything that is said about SQL, little is mentioned about its history: why it came to be and how it is evolving. For those reasons, this post covers both in depth.

Origins

Databases have been around since the 1960s. However, the languages then available to manipulate data could not handle a large amount of data at once, and they required the user to specify exactly how the data should be handled. In trying to solve these problems, Donald D. Chamberlin (LinkedIn page and IEEE Xplore page) and Raymond F. Boyce developed a language that could manipulate multiple data points at once and that let the user write what was needed rather than the steps to obtain it (declarative vs. imperative language).
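The declarative versus imperative distinction can be sketched with a small example. This is only an illustration, assuming Python's built-in sqlite3 module as the SQL engine; the sales table, its rows and both function names are hypothetical and do not come from the original SEQUEL work.

```python
import sqlite3

# Hypothetical sample data: (city, amount) pairs invented for illustration.
SALES = [("New York", 50.0), ("San Francisco", 120.0), ("New York", 200.0)]


def total_imperative(rows, city):
    """Imperative style: spell out every step of the computation."""
    total = 0.0
    for row_city, amount in rows:
        if row_city == city:
            total += amount
    return total


def total_declarative(rows, city):
    """Declarative style: state the result wanted; the engine chooses the steps."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE sales (city TEXT, amount REAL)")
    conn.executemany("INSERT INTO sales VALUES (?, ?)", rows)
    (total,) = conn.execute(
        "SELECT SUM(amount) FROM sales WHERE city = ?", (city,)
    ).fetchone()
    conn.close()
    return total


print(total_imperative(SALES, "New York"))
print(total_declarative(SALES, "New York"))
```

Both functions return the same total, but only the first one describes how to walk through the rows.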

Published at the 1974 ACM SIGFIDET (now SIGMOD) Workshop on Data Description, Access and Control and originally called SEQUEL, this language uses the relational data model developed by Edgar F. Codd (more of his biography on the IBM page and the ACM page). Codd’s paper, A Relational Model of Data for Large Shared Data Banks, can be found at this link.

Figure 1: Edgar F. Codd, Donald D. Chamberlin and Raymond F. Boyce, respectively.

Codd also presented a set of 12 rules, published later in 1985, describing what a system must do to follow the relational model and the features a computing language should have in order to interact with it. SQL in a Nutshell gives a detailed explanation of all the rules and how they complement each other. Basically, they are grouped into 6 sets: one covers data structures and how data should be abstracted and accessed, one covers NULLs, one covers table metadata, one covers the operations on these tables, one covers how the language should behave, and one covers views. A summary of the rules and the group each belongs to can be seen in Figure 2.

Figure 2: Categories of Codd’s rules.

The evolution

Far from being a dead language, SQL is in constant evolution, adapting to changing times and the evolving needs of the community. Since its first standard version in 1986/1987, as shown in the timeline in Figure 3, it has kept adding features and evolving its specification, with important advancements such as control-flow expressions, window functions and JSON support.

The latest standard, ISO/IEC 9075:2023, or SQL:2023 for short, adds a JSON data type and property graph queries. Some important highlights and an overview of how to use them are given below. For those interested in deepening their knowledge, all the standards can be found on the ISO page or the ANSI page.

Figure 3: SQL Standard Timeline.

User-defined types

User-defined types (UDTs) enable you to create custom data types by extending existing ones. This is achieved through the CREATE TYPE and CREATE DOMAIN statements. UDTs can be utilized to create distinct types, structured types, reference types, arrays, rows, or cursors.

User-defined types are significant in SQL because they enhance data modeling, allowing developers to establish types that more accurately reflect business concepts. This results in a more readable and semantic database schema, promoting the reusability of validation or complex structures.

Many prominent modern relational database management systems (DBMSs) and object-relational database systems implement and utilize UDTs, with notable examples including PostgreSQL, SQL Server, Oracle Database, and DuckDB. MySQL and MariaDB lack native support for UDTs comparable to what databases like Oracle or PostgreSQL offer.

Listing 1 shows an example of how to create the types “color” and “body_type”, and also the table “cars”. When we try to insert values allowed by the types, the insertion works, but when we try to insert a value outside the allowed values, it returns an error, as can be seen in Listing 2.

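-- PostgreSQL syntax: a domain restricts an existing type with a CHECK constraint
CREATE DOMAIN body_type VARCHAR(5)
    CHECK (VALUE IN ('van', 'sedan', 'hatch'));
-- An ENUM type only accepts the listed labels
CREATE TYPE color AS ENUM ('red', 'green', 'blue');

CREATE TABLE cars (
    type body_type,
    color color
);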

Listing 1: Creation of domain body_type and ENUM type color.

INSERT INTO cars (type, color) VALUES ('van', 'blue');
-- Query returned successfully in 117 msec.
INSERT INTO cars (type, color) VALUES ('LL', 'blue');
-- ERROR:  value for domain body_type violates check constraint "body_type_check"

Listing 2: Inserting a valid and an invalid value into the cars table.
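In engines that do not offer CREATE DOMAIN or CREATE TYPE, a similar restriction can be approximated with column-level CHECK constraints. The following is a minimal sketch of that idea, assuming SQLite through Python's built-in sqlite3 module (neither is used in the listings above); the helper try_insert is hypothetical and exists only to show which rows are accepted.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Same allowed values as the domain and ENUM above, expressed as CHECK constraints.
conn.execute(
    """
    CREATE TABLE cars (
        type  TEXT CHECK (type  IN ('van', 'sedan', 'hatch')),
        color TEXT CHECK (color IN ('red', 'green', 'blue'))
    )
    """
)


def try_insert(body_type, color):
    """Return True if the row is accepted, False if a CHECK constraint rejects it."""
    try:
        conn.execute(
            "INSERT INTO cars (type, color) VALUES (?, ?)", (body_type, color)
        )
        return True
    except sqlite3.IntegrityError:
        return False


print(try_insert("van", "blue"))
print(try_insert("LL", "blue"))
```

Unlike a domain or a named type, the constraint has to be repeated in every table that needs it, which is the reusability that UDTs add.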

Control-flow

Control-flow functions in SQL are special functions that allow you to add conditional logic to your queries. Instead of simply retrieving data based on static conditions, these functions let you execute different actions or return different values depending on whether certain conditions are met. Think of them as the “if-then-else” statements of SQL.

Control-flow functions are important because they let you create more detailed and granular reports by categorizing data or displaying customized messages based on specific criteria, for example classifying sales as “High”, “Medium” or “Low” based on their values. Consider the transactions in New York and San Francisco shown in Table 1.

The vast majority of modern relational database management systems (DBMSs) support control-flow functions. Functions such as CASE, IF, IFNULL, NULLIF, and COALESCE are essential for adding conditional logic to your queries and stored procedures. Each DBMS has its own implementation, however; for more information, see the documentation: SQL Server, MySQL, PostgreSQL, MariaDB.

id	date	city	amount
1	Nov 1, 2020	San Francisco	42.065
2	Nov 1, 2020	New York	112.985
3	Nov 2, 2020	San Francisco	221.325
4	Nov 2, 2020	New York	49.900
5	Nov 2, 2020	New York	98.030
6	Nov 3, 2020	San Francisco	87.260
7	Nov 3, 2020	San Francisco	345.225
8	Nov 3, 2020	New York	56.335
9	Nov 4, 2020	New York	184.310
10	Nov 4, 2020	San Francisco	170.500

Table 1: Table of transactions in New York City and San Francisco.

SELECT
    id,
    date,
    city,
    amount,
    CASE
        WHEN amount > 150 THEN 'High'
        WHEN amount BETWEEN 80 AND 150 THEN 'Medium'
        ELSE 'Low'
    END AS transaction_classification
FROM transactions;

id	date	city	amount	transaction_classification
1	Nov 1, 2020	San Francisco	42.065	Low
2	Nov 1, 2020	New York	112.985	Medium
3	Nov 2, 2020	San Francisco	221.325	High
4	Nov 2, 2020	New York	49.900	Low
5	Nov 2, 2020	New York	98.030	Medium
6	Nov 3, 2020	San Francisco	87.260	Medium
7	Nov 3, 2020	San Francisco	345.225	High
8	Nov 3, 2020	New York	56.335	Low
9	Nov 4, 2020	New York	184.310	High
10	Nov 4, 2020	San Francisco	170.500	High

Listing 3: How to use the CASE flow control clause to classify transactions into “low”, “medium” and “high”.
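Besides CASE, the text mentions COALESCE, NULLIF and IFNULL. A minimal sketch of what they return, assuming SQLite through Python's built-in sqlite3 module (other engines may name or implement some of these differently), could look like this; the literal values are arbitrary and chosen only for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

row = conn.execute(
    """
    SELECT
        COALESCE(NULL, 'unknown'),  -- first argument that is not NULL
        NULLIF(10, 10),             -- NULL when both arguments are equal
        NULLIF(10, 20),             -- otherwise the first argument
        IFNULL(NULL, 0)             -- two-argument form of COALESCE
    """
).fetchone()

# SQL NULL is returned to Python as None.
print(row)
```

A common use of NULLIF is to guard a division, since dividing by NULLIF(x, 0) yields NULL instead of a division-by-zero error or an undefined result.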

Window Function

Window functions have been available in SQL since SQL:2003 and are supported by all major SQL database systems. Notable among them are DuckDB, MySQL, PostgreSQL, Oracle Database, SQL Server, and DB2.

While traditional aggregate functions (such as SUM, AVG, COUNT), when used with GROUP BY, collapse multiple rows into a single summarized row per group, window functions operate on a set of rows relative to the current row, called the window frame, without grouping them. They use the OVER() clause to define the “window” (the set of rows) over which the function should operate, as shown in Figure 4.

They are helpful for tasks such as calculating moving averages, calculating cumulative statistics, or accessing row values relative to the current row’s position (ranking values). In Listing 4, for example, we use the window function ROW_NUMBER to rank the data in Table 1 by amount within each city.

It is worth noting that we can also use traditional aggregate functions with the OVER clause. In Listing 5, for example, we use the SUM function to calculate each transaction’s percentage of its city’s total amount. The ROUND function is used to round the final results to two decimal places.

Figure 4: Difference between aggregation and window functions.

SELECT
    id,
    date,
    city,
    amount,
    ROW_NUMBER() OVER (PARTITION BY city ORDER BY amount DESC) AS ranking
FROM transactions;

id	date	city	amount	ranking
9	Nov 4, 2020	New York	184.310	1
2	Nov 1, 2020	New York	112.985	2
5	Nov 2, 2020	New York	98.030	3
8	Nov 3, 2020	New York	56.335	4
4	Nov 2, 2020	New York	49.900	5
7	Nov 3, 2020	San Francisco	345.225	1
3	Nov 2, 2020	San Francisco	221.325	2
10	Nov 4, 2020	San Francisco	170.500	3
6	Nov 3, 2020	San Francisco	87.260	4
1	Nov 1, 2020	San Francisco	42.065	5

Listing 4: Use of ROW_NUMBER in a window to rank transactions by amount in each city.

SELECT
    id,
    date,
    city,
    amount,
    ROUND((amount / SUM(amount) OVER (PARTITION BY city)) * 100.0, 2) AS percentage
FROM transactions;

id	date	city	amount	percentage
2	Nov 1, 2020	New York	112.985	22.53
4	Nov 2, 2020	New York	49.900	9.95
8	Nov 3, 2020	New York	56.335	11.23
9	Nov 4, 2020	New York	184.310	36.75
5	Nov 2, 2020	New York	98.030	19.55
10	Nov 4, 2020	San Francisco	170.500	19.68
3	Nov 2, 2020	San Francisco	221.325	25.55
6	Nov 3, 2020	San Francisco	87.260	10.07
7	Nov 3, 2020	San Francisco	345.225	39.85
1	Nov 1, 2020	San Francisco	42.065	4.86

Listing 5: Using the SUM function as a window function
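The cumulative statistics and moving averages mentioned earlier follow the same pattern. Below is a minimal sketch, assuming SQLite 3.25 or later (the first release with window functions) through Python's built-in sqlite3 module. The rows are copied from Table 1; the ISO date strings, the ordering by id and the column aliases running_total and moving_avg are assumptions made for this example.

```python
import sqlite3

# Rows copied from Table 1; the ISO date strings are an assumption of this sketch.
TRANSACTIONS = [
    (1, "2020-11-01", "San Francisco", 42.065),
    (2, "2020-11-01", "New York", 112.985),
    (3, "2020-11-02", "San Francisco", 221.325),
    (4, "2020-11-02", "New York", 49.900),
    (5, "2020-11-02", "New York", 98.030),
    (6, "2020-11-03", "San Francisco", 87.260),
    (7, "2020-11-03", "San Francisco", 345.225),
    (8, "2020-11-03", "New York", 56.335),
    (9, "2020-11-04", "New York", 184.310),
    (10, "2020-11-04", "San Francisco", 170.500),
]

QUERY = """
SELECT
    id,
    city,
    amount,
    -- Cumulative sum of all rows up to and including the current one
    SUM(amount) OVER (PARTITION BY city ORDER BY id) AS running_total,
    -- Average of the current row and the one before it
    AVG(amount) OVER (
        PARTITION BY city ORDER BY id
        ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
    ) AS moving_avg
FROM transactions
ORDER BY city, id
"""

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE transactions (id INTEGER, date TEXT, city TEXT, amount REAL)"
)
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?, ?)", TRANSACTIONS)
rows = conn.execute(QUERY).fetchall()

for row in rows:
    print(row)
```

The ROWS BETWEEN clause is what narrows the window frame to two rows; without it, the ordered window runs from the start of the partition to the current row, which is what produces the running total.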

JSON Support

JavaScript Object Notation (JSON) is a widely used format for storing and exchanging data, especially in web applications. Its format accommodates various data types, including arrays, integers, strings, nulls, and more, as illustrated in Figure 5. Many databases today support a JSON datatype, such as DuckDB, MySQL, PostgreSQL, Oracle Database, and SQL Server. Others, like SQLite, lack a JSON type but offer functions for handling JSON data.

JSON support typically involves accessing the fields within the JSON using dot notation or a variation of arrow notation. Additionally, it allows for merging different JSON objects, verifying whether JSON is correctly formatted, transforming it into an array, and writing or deleting data. For example, Listing 6 demonstrates how to retrieve a value from a JSON object by specifying the desired key.

Figure 5: JSON example.

SELECT JSON_EXTRACT('{"id": 14, "name": "Aztalan"}', '$.name');

+---------------------------------------------------------+
| JSON_EXTRACT('{"id": 14, "name": "Aztalan"}', '$.name') |
+---------------------------------------------------------+
| "Aztalan"                                               |
+---------------------------------------------------------+

Listing 6: Command to extract a value given a JSON key, and its result.
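Extraction and validation can also be tried without a database server. The sketch below assumes SQLite with its JSON functions available (built in by default in recent releases) through Python's built-in sqlite3 module, and it reuses the JSON object from Listing 6; the truncated string used for the invalid case is made up for illustration.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

doc = '{"id": 14, "name": "Aztalan"}'
broken = '{"id": 14,'  # deliberately malformed JSON

name, doc_id, doc_is_valid, broken_is_valid = conn.execute(
    """
    SELECT
        json_extract(?, '$.name'),  -- value stored under the "name" key
        json_extract(?, '$.id'),    -- value stored under the "id" key
        json_valid(?),              -- 1 when the text is well-formed JSON
        json_valid(?)               -- 0 otherwise
    """,
    (doc, doc, doc, broken),
).fetchone()

print(name, doc_id, doc_is_valid, broken_is_valid)
```

Note that SQLite returns the extracted string without the surrounding double quotes, whereas the output in Listing 6 keeps them, one of the small differences between implementations.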

Multidimensional Arrays

According to the paper SQL Support for Multidimensional Arrays, in some situations it is necessary to go beyond the 2D representation of data and use what are called multidimensional arrays (MDAs) to represent more complex data. Examples include modeling weather data, image processing and time series data. MDAs are basically arrays whose elements are other arrays, starting with 3-dimensional arrays and going up to any arbitrary number of dimensions. A representation of the 3D case is shown in Figure 6.

Although the extent of support for MDAs in most databases is not clear, DuckDB, Oracle Database and PostgreSQL have documentation on how to create them based on the widely adopted array data type. One example for DuckDB is shown in Listing 7.

Figure 6: Illustration of 3d MDA.

SELECT array_value(array_value(1, 2), array_value(3, 4), array_value(5, 6));

Listing 7: Creation of MDA, called nested array in DuckDB.
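The idea of arrays whose elements are themselves arrays can be illustrated outside any database with plain nested lists. This is only a sketch in Python: the helper shape and the cube values are hypothetical, while matrix holds the same values as Listing 7, which form three rows of two elements.

```python
def shape(array):
    """Return the size of each dimension of a rectangular nested list."""
    dims = []
    while isinstance(array, list):
        dims.append(len(array))
        array = array[0] if array else None
    return tuple(dims)


# Same values as Listing 7: three rows of two elements (2 dimensions).
matrix = [[1, 2], [3, 4], [5, 6]]

# Hypothetical 3-dimensional example: the matrix plus a second layer.
cube = [matrix, [[7, 8], [9, 10], [11, 12]]]

print(shape(matrix))
print(shape(cube))
print(cube[1][2][0])  # layer 1, row 2, column 0
```

Each extra level of nesting adds one dimension, and an element is addressed with one index per dimension.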

Present

Currently there are many implementations of SQL. Although there is an official SQL standard, each vendor must certify its own compliance. This has been the case since 1996, when the National Institute of Standards and Technology (NIST) stopped certifying SQL DBMSs. The list of vendors that support SQL is ever growing; some worth mentioning are MySQL and MariaDB, Oracle Database, PostgreSQL and Microsoft SQL Server (see Figure 7).

Even with SQL’s popularity, there are a number of other languages that aim to make use of the relational database model. Some aim to tackle different use cases, such as hierarchical data and object-oriented programming. The technologies below are alternatives to the SQL language:

Figure 7: List of some of the most popular relational databases.

Bibliography

[1] Wikipedia Contributors, “SQL,” Wikipedia, Dec. 11, 2018. https://en.wikipedia.org/wiki/SQL

[2] AWS, “What is SQL (Structured Query Language)?”, AWS. https://aws.amazon.com/what-is/sql/?nc1=h_ls

[3] Alura, “banco dados mysql executando procedures”, Alura. https://www.alura.com.br/conteudo/banco-dados-mysql-executando-procedures

[4] IONOS editorial team, “Learn SQL – A tutorial with examples”, IONOS, Nov. 16, 2024. https://www.ionos.com/digitalguide/server/configuration/sql-introduction-with-examples/

[5] “A Brief History of SQL and its Usefulness”, COGINITI. https://www.coginiti.co/tutorials/introduction/what-is-sql/

[6] “8.15. Arrays”. Available at: https://www.postgresql.org/docs/17/arrays.html

[7] “Creating and Populating Multi-Dimensional Arrays”. Available at: https://docs.oracle.com/cd/E92519_02/pt856pbr3/eng/pt/tpcr/task_CreatingandPopulatingMulti-DimensionalArrays-071663.html?pli=ul_d38e251_tpcr

[8] “Array Type”. Available at: https://duckdb.org/docs/stable/sql/data_types/array.html

[9] “Array Functions”. Available at: https://duckdb.org/docs/stable/sql/functions/array.html

[10] MIŠEV, Dimitar and BAUMANN, Peter. “SQL Support for Multidimensional Arrays”. Available at: https://www.ifis.uni-luebeck.de/~moeller/Lectures/WS-19-20/NDBDM/12-Literature/Misev-Baumann-SQL-MDA.pdf

[11] “JSON data in SQL Server”. Available at: https://learn.microsoft.com/en-us/sql/relational-databases/json/json-data-sql-server?view=sql-server-ver17

[12] “JSON functions (Transact-SQL)”. Available at: https://learn.microsoft.com/en-us/sql/t-sql/functions/json-functions-transact-sql?view=sql-server-ver17

[13] “9.16. JSON Functions and Operators”. Available at: https://www.postgresql.org/docs/current/functions-json.html

[14] SAXON, Chris. “How to Store, Query, and Create JSON Documents in Oracle Database”. Available at: https://blogs.oracle.com/sql/post/how-to-store-query-and-create-json-documents-in-oracle-database

[15] “8.14. JSON Types”. Available at: https://www.postgresql.org/docs/current/datatype-json.html

[16] “13.5 The JSON Data Type”. Available at: https://dev.mysql.com/doc/refman/8.4/en/json.html

[17] “JSON Overview”. Available at: https://duckdb.org/docs/stable/data/json/overview.html

[18] “JSON Functions And Operators”. Available at: https://www.sqlite.org/json1.html

[19] ETI-MFON, Ime. “Control Flow Functions in SQL”. Available at: https://medium.com/@etimfonime/control-flow-functions-in-sql-9b66f8830da5

[20] “Control-of-Flow”. Available at: https://learn.microsoft.com/en-us/sql/t-sql/language-elements/control-of-flow?view=sql-server-ver17

[21] “15.6.5 Flow Control Statements”. Available at: https://dev.mysql.com/doc/refman/8.4/en/flow-control-statements.html

[22] “9.18. Conditional Expressions”. Available at: https://www.postgresql.org/docs/current/functions-conditional.html

[23] “Manipulate user-defined type (UDT) data”. Available at: https://learn.microsoft.com/en-us/sql/relational-databases/clr-integration-database-objects-user-defined-types/working-with-user-defined-types-manipulating-udt-data?view=sql-server-ver16

[24] “36.13. User-Defined Types”. Available at: https://www.postgresql.org/docs/current/xtypes.html
