SQL – from past to present

By Felipe Fernandes Lopes, Data in Depth (https://dataindepth.site). Published Fri, 06 Jun 2025. https://dataindepth.site/2025/06/06/sql-from-past-to-present/

The importance of SQL and relational databases has been widely debated, especially over the last 20 years with the ever-growing volume of data. Now, in the AI age, both remain fundamental to most companies. Yet for everything that is said about SQL, little is mentioned about its history: why it came to be and how it is evolving. For those reasons, we are going to look at it in depth.

Origins

Databases have been around since the 1960s. However, the languages available to manipulate data did not allow handling a large amount of data at once, and they required the user to specify exactly how the data should be handled. In trying to find solutions for that, Donald D. Chamberlin (LinkedIn page and IEEE Xplore page) and Raymond F. Boyce developed a language that could not only manipulate multiple data points at once, but also let users write what they needed rather than the steps to perform the task (declarative vs. imperative language).
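The declarative vs. imperative distinction can be sketched as follows. This is a minimal illustration, assuming Python's built-in sqlite3 module as a stand-in database; the employees table, its columns, and the sample rows are hypothetical and not taken from the original SEQUEL paper.

```python
import sqlite3

# Hypothetical sample data, used only for illustration.
rows = [("Ana", 52000), ("Bruno", 38000), ("Carla", 61000)]

# Imperative style: spell out every step (loop, test, collect).
imperative = []
for name, salary in rows:
    if salary > 50000:
        imperative.append(name)

# Declarative style: describe the desired result and let the engine
# decide how to scan and filter the data.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE employees (name TEXT, salary INTEGER)")
conn.executemany("INSERT INTO employees VALUES (?, ?)", rows)
declarative = [
    name
    for (name,) in conn.execute(
        "SELECT name FROM employees WHERE salary > 50000 ORDER BY name"
    )
]
conn.close()

print(imperative)
print(declarative)
```

Both approaches produce the same names; the difference is that the SQL version states only the condition, while the loop also dictates the order of operations.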

Published in the 1974 ACM SIGFIDET (now SIGMOD) Workshop on Data Description, Access and Control and originally called SEQUEL, this language uses the relational data model developed by Edgar F. Codd (more of his biography on the IBM page and the ACM page). Codd's paper, A Relational Model of Data for Large Shared Data Banks, can be found at this link.

Figure 1: Edgar F. Codd, Donald D. Chamberlin and Raymond F. Boyce, respectively.

Building on the relational model, Codd later (in 1985) presented a set of 12 rules describing what a system must provide to be considered relational and the features a language should have in order to interact with it. SQL in a Nutshell gives a detailed explanation of all the rules and how they complement each other. In short, they fall into 6 groups: one covers data structures and how data should be abstracted and accessed, one covers nulls, one covers table metadata, one covers the operations on these tables, one covers how the language in use should behave, and one covers views. A summary of the rules and the group each belongs to can be viewed in Figure 2.

Figure 2: Categories of Codd’s rules.

The evolution 

Far from being a dead language, SQL is in constant evolution, adapting to changing times and the evolving needs of the community. Since its first standard version in 1986/1987, as shown in the timeline in Figure 3, it has been adding new features and evolving its specification with important advancements like control-flow expressions, window functions and JSON support.

The latest standard, ISO/IEC 9075:2023, or SQL:2023 for short, adds a JSON data type and property graph queries. Some important highlights and an overview of how to use them are given below. For those interested in deepening their knowledge, all the standards can be found on the ISO page or the ANSI page.

Figure 3: SQL Standard Timeline.

User-defined types

User-defined types (UDTs) enable you to create custom data types by extending existing ones. This is achieved through the CREATE TYPE and CREATE DOMAIN statements. UDTs can be utilized to create distinct types, structured types, reference types, arrays, rows, or cursors.

User-defined types are significant in SQL because they enhance data modeling, allowing developers to establish types that more accurately reflect business concepts. This results in a more readable and semantic database schema, promoting the reusability of validation or complex structures.

Many prominent modern relational database management systems (DBMSs) and object-relational database systems implement UDTs, with notable examples including PostgreSQL, SQL Server, Oracle Database, and DuckDB. MySQL and MariaDB, by contrast, lack native UDT support comparable to what Oracle or PostgreSQL offer.

Listing 1 shows an example of how to create the types “color” and “body_type”, and also the table “cars”. When we try to insert values allowed by the types, the insertion works, but when we try to insert a value outside the allowed set, an error is returned, as can be seen in Listing 2.

CREATE DOMAIN body_type VARCHAR(5)
     CHECK (VALUE IN ('van', 'sedan', 'hatch'));
CREATE TYPE color AS ENUM ('red', 'green', 'blue');

CREATE TABLE cars (
 type body_type,
 color color
);

Listing 1: Creation of domain body_type and ENUM type color.

INSERT INTO cars (type, color) VALUES ('van', 'blue');
-- Query returned successfully in 117 msec.
INSERT INTO cars (type, color) VALUES ('LL', 'blue');
-- ERROR:  value for domain body_type violates check constraint "body_type_check"

Listing 2: Inserting a valid and an invalid value into the cars table.
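For readers without a PostgreSQL server at hand, the accept/reject behaviour of Listing 2 can be approximated in SQLite. Note the swap in technique: SQLite has neither CREATE DOMAIN nor ENUM types, so this sketch uses column-level CHECK constraints instead, which is an assumption of this example rather than an equivalent of a true UDT.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# CHECK constraints stand in for the body_type domain and the color enum.
conn.execute(
    """
    CREATE TABLE cars (
        type  TEXT CHECK (type  IN ('van', 'sedan', 'hatch')),
        color TEXT CHECK (color IN ('red', 'green', 'blue'))
    )
    """
)

# A value inside the allowed set is accepted.
conn.execute("INSERT INTO cars (type, color) VALUES ('van', 'blue')")

# A value outside the allowed set violates the constraint.
try:
    conn.execute("INSERT INTO cars (type, color) VALUES ('LL', 'blue')")
    rejected = False
except sqlite3.IntegrityError as exc:
    rejected = True
    print("rejected:", exc)

row_count = conn.execute("SELECT COUNT(*) FROM cars").fetchone()[0]
conn.close()
print(row_count)
```

Unlike a domain or an enum, a CHECK constraint is tied to one column of one table, so the rule has to be repeated wherever it is needed; reusability is precisely what the UDT approach adds.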

Control-flow

Control flow functions in SQL are special functions that allow you to add conditional logic to your queries. Instead of simply retrieving data based on static conditions, these functions let you execute different actions or return different values depending on whether certain conditions are met. Think of them as the “if-then-else” statements of SQL.

Control flow functions are important for several reasons. You can create more detailed and granular reports by categorizing data or displaying customized messages based on specific criteria, for example by classifying sales as “High”, “Medium” or “Low” based on their values. Consider the transactions in New York and San Francisco shown in Table 1, which Listing 3 classifies in this way.

The vast majority of modern relational database management systems (DBMSs) support control flow functions. Functions such as CASE, IF, IFNULL, NULLIF, and COALESCE are essential for adding conditional logic to your queries and stored procedures. Each DBMS has its own implementation, however; for more information, see the documentation for SQL Server, MySQL, PostgreSQL, and MariaDB.

| id | date        | city          | amount  |
|----|-------------|---------------|---------|
| 1  | Nov 1, 2020 | San Francisco | 42.065  |
| 2  | Nov 1, 2020 | New York      | 112.985 |
| 3  | Nov 2, 2020 | San Francisco | 221.325 |
| 4  | Nov 2, 2020 | New York      | 49.900  |
| 5  | Nov 2, 2020 | New York      | 98.030  |
| 6  | Nov 3, 2020 | San Francisco | 87.260  |
| 7  | Nov 3, 2020 | San Francisco | 345.225 |
| 8  | Nov 3, 2020 | New York      | 56.335  |
| 9  | Nov 4, 2020 | New York      | 184.310 |
| 10 | Nov 4, 2020 | San Francisco | 170.500 |

Table 1: Table of transactions in New York City and San Francisco.

SELECT
 id,
 date,
 city,
 amount,
 CASE
 WHEN amount > 150 THEN 'High'
 WHEN amount BETWEEN 80 AND 150 THEN 'Medium'
 ELSE 'Low'
 END AS transaction_classification
FROM transactions;

| id | date        | city          | amount  | transaction_classification |
|----|-------------|---------------|---------|----------------------------|
| 1  | Nov 1, 2020 | San Francisco | 42.065  | Low                        |
| 2  | Nov 1, 2020 | New York      | 112.985 | Medium                     |
| 3  | Nov 2, 2020 | San Francisco | 221.325 | High                       |
| 4  | Nov 2, 2020 | New York      | 49.900  | Low                        |
| 5  | Nov 2, 2020 | New York      | 98.030  | Medium                     |
| 6  | Nov 3, 2020 | San Francisco | 87.260  | Medium                     |
| 7  | Nov 3, 2020 | San Francisco | 345.225 | High                       |
| 8  | Nov 3, 2020 | New York      | 56.335  | Low                        |
| 9  | Nov 4, 2020 | New York      | 184.310 | High                       |
| 10 | Nov 4, 2020 | San Francisco | 170.500 | High                       |

Listing 3: How to use the CASE flow control clause to classify transactions into “low”, “medium” and “high”.
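Besides CASE, the COALESCE and NULLIF functions named earlier in this section are part of the SQL standard and behave the same way in most engines. The following is a minimal sketch, assuming Python's built-in sqlite3 module as the engine; the literal values are hypothetical.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

row = conn.execute(
    """
    SELECT
        COALESCE(NULL, 'unknown') AS filled,  -- first non-NULL argument
        NULLIF(0, 0)              AS nulled,  -- NULL, both arguments are equal
        NULLIF(5, 0)              AS kept     -- 5, the arguments differ
    """
).fetchone()
conn.close()

print(row)
```

A common use of NULLIF is guarding a division: dividing by NULLIF(divisor, 0) yields NULL instead of a division-by-zero error or an engine-specific result.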

Window Function

Window functions have been available in SQL since SQL:2003 and are supported by all major SQL database systems. Notable among them are DuckDB, MySQL, PostgreSQL, Oracle Database, SQL Server, and DB2.

While traditional aggregate functions (such as SUM, AVG, COUNT), when used with GROUP BY, collapse multiple rows into a single summarized row per group, window functions operate on a set of rows relative to the current row, called the window frame, without grouping them. They use the OVER() clause to define the “window” (the set of rows) over which the function should operate, as shown in Figure 4.

They are helpful for tasks such as calculating moving averages, calculating cumulative statistics, or accessing row values relative to the current row’s position (ranking values). In Listing 4, for example, we use the window function ROW_NUMBER to rank the data in Table 1.

It is worth noting that we can also use traditional aggregation functions with the OVER clause. In Listing 5, for example, we use the SUM function to calculate each transaction's percentage of the total amount for its city. The ROUND function is used to round off the final results.

Figure 4: Difference between aggregation and window functions.

SELECT
    id,
    date,
    city,
    amount,
    ROW_NUMBER() OVER (PARTITION BY city ORDER BY amount DESC) AS ranking
FROM transactions;

| id | date        | city          | amount  | ranking |
|----|-------------|---------------|---------|---------|
| 9  | Nov 4, 2020 | New York      | 184.310 | 1       |
| 2  | Nov 1, 2020 | New York      | 112.985 | 2       |
| 5  | Nov 2, 2020 | New York      | 98.030  | 3       |
| 8  | Nov 3, 2020 | New York      | 56.335  | 4       |
| 4  | Nov 2, 2020 | New York      | 49.900  | 5       |
| 7  | Nov 3, 2020 | San Francisco | 345.225 | 1       |
| 3  | Nov 2, 2020 | San Francisco | 221.325 | 2       |
| 10 | Nov 4, 2020 | San Francisco | 170.500 | 3       |
| 6  | Nov 3, 2020 | San Francisco | 87.260  | 4       |
| 1  | Nov 1, 2020 | San Francisco | 42.065  | 5       |

Listing 4: Use of ROW_NUMBER in a window to rank transactions by amount within each city.

SELECT
 id,
 date,
 city,
 amount,
 ROUND((amount / SUM(amount) OVER (PARTITION BY city)) * 100.0, 2) AS  percentage
FROM transactions;

| id | date        | city          | amount  | percentage |
|----|-------------|---------------|---------|------------|
| 2  | Nov 1, 2020 | New York      | 112.985 | 22.53      |
| 4  | Nov 2, 2020 | New York      | 49.900  | 9.95       |
| 8  | Nov 3, 2020 | New York      | 56.335  | 11.23      |
| 9  | Nov 4, 2020 | New York      | 184.310 | 36.75      |
| 5  | Nov 2, 2020 | New York      | 98.030  | 19.55      |
| 10 | Nov 4, 2020 | San Francisco | 170.500 | 19.68      |
| 3  | Nov 2, 2020 | San Francisco | 221.325 | 25.55      |
| 6  | Nov 3, 2020 | San Francisco | 87.260  | 10.07      |
| 7  | Nov 3, 2020 | San Francisco | 345.225 | 39.85      |
| 1  | Nov 1, 2020 | San Francisco | 42.065  | 4.86       |

Listing 5: Using the SUM function as a window function
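The moving averages and cumulative statistics mentioned in this section rely on an explicit frame inside the OVER clause. The sketch below is one way to try this out, assuming Python's built-in sqlite3 module with SQLite 3.25 or newer (the first release with window functions). The rows are copied from Table 1 with the date column omitted; the column aliases running_total and moving_avg are names chosen for this example.

```python
import sqlite3

# (id, city, amount) rows copied from Table 1; dates omitted for brevity.
transactions = [
    (1, "San Francisco", 42.065), (2, "New York", 112.985),
    (3, "San Francisco", 221.325), (4, "New York", 49.900),
    (5, "New York", 98.030), (6, "San Francisco", 87.260),
    (7, "San Francisco", 345.225), (8, "New York", 56.335),
    (9, "New York", 184.310), (10, "San Francisco", 170.500),
]

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE transactions (id INTEGER, city TEXT, amount REAL)")
conn.executemany("INSERT INTO transactions VALUES (?, ?, ?)", transactions)

query = """
SELECT
    id,
    city,
    amount,
    -- cumulative sum per city, in id order
    SUM(amount) OVER (
        PARTITION BY city ORDER BY id
        ROWS BETWEEN UNBOUNDED PRECEDING AND CURRENT ROW
    ) AS running_total,
    -- average of the current and the previous transaction in the same city
    AVG(amount) OVER (
        PARTITION BY city ORDER BY id
        ROWS BETWEEN 1 PRECEDING AND CURRENT ROW
    ) AS moving_avg
FROM transactions
ORDER BY city, id
"""
results = conn.execute(query).fetchall()
conn.close()

for row in results:
    print(row)
```

Every input row is still present in the output, which is the key difference from GROUP BY: the frame only decides which neighbouring rows feed each calculation.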

JSON Support

JavaScript Object Notation (JSON) is a widely used format for storing and exchanging data, especially in web applications. Its format accommodates various data types, including arrays, integers, strings, nulls, and more, as illustrated in Figure 5. Many databases today support a JSON datatype, such as DuckDB, MySQL, PostgreSQL, Oracle Database, and SQL Server. Others, like SQLite, lack a JSON type but offer functions for handling JSON data.

JSON support typically involves accessing the fields within the JSON using dot notation or a variation of arrow notation. Additionally, it allows for merging different JSON objects, verifying whether JSON is correctly formatted, transforming it into an array, and writing or deleting data. For example, Listing 6 demonstrates how to retrieve a value from a JSON object by specifying the desired key.

Figure 5: JSON example.

SELECT JSON_EXTRACT('{"id": 14, "name": "Aztalan"}', '$.name');

+---------------------------------------------------------+
| JSON_EXTRACT('{"id": 14, "name": "Aztalan"}', '$.name') |
+---------------------------------------------------------+
| "Aztalan"                                               |
+---------------------------------------------------------+

Listing 6: Command to extract a value given a JSON key, and its result.
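The same extraction, plus the validity check mentioned above, can be tried without a database server. This is a sketch assuming Python's built-in sqlite3 module and an SQLite build that includes the JSON functions (the default in recent releases). SQLite's json_extract returns the string without surrounding quotes, unlike the quoted result in Listing 6.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
doc = '{"id": 14, "name": "Aztalan"}'

name, doc_id, ok, broken = conn.execute(
    """
    SELECT
        json_extract(?, '$.name'),  -- value stored under the "name" key
        json_extract(?, '$.id'),    -- value stored under the "id" key
        json_valid(?),              -- 1 when the text is well-formed JSON
        json_valid('{"id": 14')     -- 0 for this truncated document
    """,
    (doc, doc, doc),
).fetchone()
conn.close()

print(name, doc_id, ok, broken)
```

Extracted values come back as native SQL types (text and integer here), so they can be filtered, joined, or aggregated like any other column.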

Multidimensional Arrays

According to SQL Support for Multidimensional Arrays, in some situations it is necessary to go beyond the 2D representation of data and use what are called multidimensional arrays (MDAs) to represent more complex data. Examples include modeling weather data, image processing, and time series data. MDAs are basically arrays whose elements are other arrays, starting with 3-dimensional arrays and going up to any arbitrary number of dimensions. A representation of the 3D case is shown in Figure 6.

Although the extent of support for MDAs in most databases is not clear, DuckDB, Oracle Database and PostgreSQL have documentation on how to create them based on the widely adopted array data type. One example is shown in Listing 7 for DuckDB.

Figure 6: Illustration of 3d MDA.

SELECT array_value(array_value(1, 2), array_value(3, 4), array_value(5, 6));

Listing 7: Creation of MDA, called nested array in DuckDB.
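To make the idea of arrays whose elements are other arrays concrete outside of any particular database, here is a small sketch in plain Python. The data is hypothetical: 2 time steps by 2 rows by 3 columns of temperature readings, and the variable names are chosen only for this example.

```python
# Hypothetical 3-D array: 2 time steps x 2 rows x 3 columns of temperatures.
temperatures = [
    [[20.1, 20.4, 19.8], [21.0, 21.3, 20.7]],  # time step 0
    [[22.5, 22.9, 22.1], [23.0, 23.4, 22.8]],  # time step 1
]

# Each nesting level is one dimension.
shape = (
    len(temperatures),        # time steps
    len(temperatures[0]),     # rows per time step
    len(temperatures[0][0]),  # columns per row
)

# One index per dimension selects a single cell.
reading = temperatures[1][0][2]

# Fixing row and column while varying time gives a small time series.
cell_over_time = [step[0][2] for step in temperatures]

print(shape, reading, cell_over_time)
```

This is the access pattern a database array type has to support: one index per dimension, and slices that hold some dimensions fixed while others vary.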

Present

Currently there are many implementations of SQL. Although there is an official SQL standard, each vendor must self-certify its compliance; this has been the case since 1996, when the National Institute of Standards and Technology (NIST) stopped certifying SQL DBMSs. The list of vendors that support SQL is ever growing; some worth mentioning are MySQL and MariaDB, Oracle Database, PostgreSQL and Microsoft SQL Server (see Figure 7).

Even with SQL's popularity, there are a number of languages that aim to make use of the relational database model. Some aim to tackle different use cases such as hierarchical data, object-oriented programming and others. The technologies below are alternatives to the SQL language:

Figure 7: List of some of the most popular relational databases.

Bibliography

[1] Wikipedia Contributors, “SQL,” Wikipedia, Dec. 11, 2018. https://en.wikipedia.org/wiki/SQL

[2] AWS, “What is SQL (Structured Query Language)?”, AWS. https://aws.amazon.com/what-is/sql/?nc1=h_ls

[3] Alura, “banco dados mysql executando procedures”, Alura. https://www.alura.com.br/conteudo/banco-dados-mysql-executando-procedures

[4] IONOS editorial team, “Learn SQL – A tutorial with examples”, IONOS, Nov. 16, 2024. https://www.ionos.com/digitalguide/server/configuration/sql-introduction-with-examples/

[5] “A Brief History of SQL and its Usefulness”, COGINITI. https://www.coginiti.co/tutorials/introduction/what-is-sql/

[6] “8.15. Arrays”. Available at: https://www.postgresql.org/docs/17/arrays.html

[7] “Creating and Populating Multi-Dimensional Arrays”. Available at: https://docs.oracle.com/cd/E92519_02/pt856pbr3/eng/pt/tpcr/task_CreatingandPopulatingMulti-DimensionalArrays-071663.html?pli=ul_d38e251_tpcr

[8] “Array Type”. Available at: https://duckdb.org/docs/stable/sql/data_types/array.html

[9] “Array Functions”. Available at: https://duckdb.org/docs/stable/sql/functions/array.html

[10] MIŠEV, Dimitar and BAUMANN, Peter. “SQL Support for Multidimensional Arrays”. Available at: https://www.ifis.uni-luebeck.de/~moeller/Lectures/WS-19-20/NDBDM/12-Literature/Misev-Baumann-SQL-MDA.pdf

[11] “JSON data in SQL Server”. Available at: https://learn.microsoft.com/en-us/sql/relational-databases/json/json-data-sql-server?view=sql-server-ver17

[12] “JSON functions (Transact-SQL)”. Available at: https://learn.microsoft.com/en-us/sql/t-sql/functions/json-functions-transact-sql?view=sql-server-ver17

[13] “9.16. JSON Functions and Operators”. Available at: https://www.postgresql.org/docs/current/functions-json.html

[14] SAXON, Chris. “How to Store, Query, and Create JSON Documents in Oracle Database”. Available at: https://blogs.oracle.com/sql/post/how-to-store-query-and-create-json-documents-in-oracle-database

[15] “8.14. JSON Types”. Available at: https://www.postgresql.org/docs/current/datatype-json.html

[16] “13.5 The JSON Data Type”. Available at: https://dev.mysql.com/doc/refman/8.4/en/json.html

[17] “JSON Overview”. Available at: https://duckdb.org/docs/stable/data/json/overview.html

[18] “JSON Functions And Operators”. Available at: https://www.sqlite.org/json1.html

[19] ETI-MFON, Ime. “Control Flow Functions in SQL”. Available at: https://medium.com/@etimfonime/control-flow-functions-in-sql-9b66f8830da5

[20] “Control-of-Flow”. Available at: https://learn.microsoft.com/en-us/sql/t-sql/language-elements/control-of-flow?view=sql-server-ver17

[21] “15.6.5 Flow Control Statements”. Available at: https://dev.mysql.com/doc/refman/8.4/en/flow-control-statements.html

[22] “9.18. Conditional Expressions”. Available at: https://www.postgresql.org/docs/current/functions-conditional.html

[23] “Manipulate user-defined type (UDT) data”. Available at: https://learn.microsoft.com/en-us/sql/relational-databases/clr-integration-database-objects-user-defined-types/working-with-user-defined-types-manipulating-udt-data?view=sql-server-ver16

[24] “36.13. User-Defined Types”. Available at: https://www.postgresql.org/docs/current/xtypes.html
