Schema

DatabaseSchema: The setup information for your database is stored in a special file called the “Schema”, and this is updated whenever you make changes to the structure of your database. Think of the schema as saying “here’s our database and it’s got a couple tables. The first table is ‘users’ and it’s got columns for ‘ID’ (which is an integer), ‘name’ (which is a bunch of characters), ‘email’ (which is a bunch of characters)

Indexes

CREATE INDEXES Create indexes, which basically do all the hard work of sorting your table ahead of time, for columns that you’ll likely be using to search on later (like username)… it will make your database much faster. Effects:

  • Speed up queries (SELECT, JOIN, WHERE, ORDER BY).
  • Reduce disk I/O and improve efficiency in large tables.
  • Ensure data integrity with unique indexes.
  • Must be used wisely as too many indexes can slow down INSERT, UPDATE and DELETE.

Tips

  1. SQL likes semicolons at the end of lines and using single quotes ' instead of double quotes ".

Statement structure

Every CRUD command in SQL contains a few parts The action statement, the table it should run on, and theclauses. // If you just do an action on a table without specifying conditions, it will apply to the whole database and you’ll probably break something. For example, to update a row in a table

UPDATE usersSET name='barfoo', email='bar@foo.com'WHERE email='foo@bar.com';

Syntax and commands

SELECT

Fetches data from a database Syntax like so:

    SELECT column1, column2, ...
    FROM table_name;
  • To select all columns from a table: SELECT * FROM table_name
  • The DISTINCT clause is used to return only distinct (different) values: SELECT DISTINCT Country FROM Customers;
    Projection: determines which columns from a table are returned by the query (according to this webpage)

Sub Queries

These can be placed anywhere a Table is expected

WHERE

This statement is used to filter records. It is used to extract only those records that fulfill a specified condition. Example:

SELECT * FROM Customers
WHERE Name="Bob";

Logical operators

These include the run of the mil <, <=, >, and >=.

  • Equality is = for numeric values and LIKE for string values
  • The IN operator does it exactly what it sounds like, it’s used to state a value is in a collection (a, b, c,...) or in a sub-query.
  • For and and or, it’s just and and or.
  • The BETWEEN operator is for selecting values in a given range (inclusive.
SELECT * FROM Products  
WHERE Price BETWEEN 10 AND 20;

UPDATE

Updates data in a database. Syntax example:

UPDATE table_name
SET column1 = value1,
		column2 = value2, ...
WHERE condition;

GROUP BY and ORDER BY

The GROUP BY statement is used with Aggregate functions, which are placed in the SELECT statement along with the columns to group by. The ORDER BY sorts the resulting set in ascending or descending order using a specific column.

General syntax example:

SELECT column_name(s)  
FROM table_name  
WHERE condition
GROUP BY column_name(s)  
ORDER BY column_name(s) ASC|DESC;

Example:

SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country;

This groups the counts by by their respective countries

Aggregate Functions

These include

  • COUNT()
  • MAX()
  • MIN()
  • SUM()
  • AVG()

HAVING

This clause was added later in SQL’s life to be able to use aggregate functions to conditional statements, as we cannot use them with WHERE. Syntax:

SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
HAVING condition
ORDER BY column_name(s); 

For example:

SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country
HAVING COUNT(CustomerID) > 5;

This query returns the number of customers per country but only the ones with more than five customers… We use the aggregate function in the HAVING clause.

Aliases

Are a way to give a table or column a temporary name, they make queries more readable. To assign an alias we use the AS keyword like so

SELECT CustomerName AS name
FROM customers
WHERE customers.name LIKE "BOB %";
 

More

  • DELETE - deletes data from a database

  • INSERT INTO - inserts new records (tables) into a database

  • CREATE DATABASE - creates a new database

  • ALTER DATABASE - used to add, remove, or modify a database’s configuration and constraints.

  • CREATE TABLE - creates a new table

  • ALTER TABLE - used to add, remove, or modify a table’s columns Or change constraints.

  • DROP TABLE - deletes a table

  • CREATE INDEX - creates an index (search key)

  • DROP INDEX - deletes an index

  • We can signal out a primary key for a column using the keyword PRIMARY KEY

    • The feature AUTO_INCREMENT allows SQL to handle unique IDs and therefore don’t need include a primary key when adding a new tuple
  • To enforce no null values for a column we use the NOT NULL keyword

  • And the UNIQUE keyword ensures that for a particular column, all the values are unique and we have no repetition

  • TEXT is different from VARCHAR(n) where the latter is used to short strings (a few words), the prior is used to store longer strings such as whole sentences and paragraphs and whatnot.

Common examples

Inserting a tuple into a table

INSERT INTO tableName (column1, column2, column3)
VALUES (....,....,....);
 
--AUTO_INCREMENT handling the primary key in this example

Limit results when selecting from a table

We can limit the resulting table we get from a SELECT query by using the keyword LIMIT n which limits the results to n rows

SELECT stdName, stdGPA
FROM   Students
LIMIT  10
WHERE  Students.stdGPA > 3.5;
 
-- this will only return 10 students even if there are more with GPAs over 3.5

Ordering the query

We can order the resulting table using the keywords ORDER BY <ordType> Such that ordType can ASC or DESC

Searching for specific string values

The keyword LIKE is basically a full on comparator of strings Example:

SELECT stdName, stdGPA
FROM   Students
LIMIT  2
WHERE  Students.stdGPA > 3.5 AND Students.stdName LIKE "BoB%";

Views

Views are virtual tables that are returned by a query. Views are just like real tables with columns and rows, but act as a window into the larger data set or table. They don’t take physical memory space and are handy for more efficient, focused retrieval.

“You can think of a view as a query result rather than way to store data, they are mostly used to display often used relationships between tables or often used queries Also views do not take up physical memory and need to be manually updated by recreating it.”

Views need to be updated by recreating them, as updating the actual data doesn't reflect onto the view automatically.

Uses

  • Better for privacy when needing to share certain data with an outsider or 3rd party.
  • Can be used wherever a table’s expected.

Creating Views

CREATE VIEW view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE condition; 

For example:

CREATE VIEW [Products Above Average Price] AS  
SELECT ProductName, Price  
FROM Products  
WHERE Price > (SELECT AVG(Price) FROM Products);

We can then query these views as we do with any other table

SELECT * FROM [Products Above Average Price];

Updating Views

To update views we need to recreate them as mentioned before. This is done using CREATE OR REPLACE Example:

CREATE OR REPLACE VIEW [Brazil Customers] AS  
SELECT CustomerName, ContactName, City  
FROM Customers  
WHERE Country = 'Brazil';

RELATIONAL TABLES

Have a primary key and a foreign key referencing another relation (table) and here is an example of how you create such a table

CREATE TABLE tableName(
id     INT AUTO_INCREMENT
prop1  VARCHAR(5)
prop2  INT
prop3  INT NOT NULL
PRIMARY KEY (id)
FOREIGN KEY (prop3) REFERENCES otherTableName(id)
 

JOIN

Used to query tables that have relationships i.e. merging tables for a query. Tthere are four types, and the type used determines the data returned from the query.

JOINs combine the columns from two tables together into a returned result, these new rows are referred to as tuples.

A JOIN needs two columns of the same type to work, preferably a primary key in one table and the foreign key in another table

Syntax: <JOIN TYPE> JOIN table2 ON condition For example:

SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
FROM Orders
INNER JOIN Customers ON Orders.CustomerID=Customers.CustomerID

When specifying conditions with JOINS we use the keyword ON instead of WHERE

INNER JOIN

The default JOIN in SQL. Only returns the tuples in the left and right that do have the relationship AND do reference one another.

Example:

SELECT *
FROM Customer
JOIN event
ON Customer.customer_id = event.customer_id;

// If the data in the columns your joining isn’t unique, the resulting table will contain duplicates.

LEFT JOIN

Left here refers to the first table. A LEFT JOIN will return all the tuples in Left even if no relation to tuples in Right (will probably get a few null values).

SELECT *
FROM Customer
LEFT JOIN event
ON Customer.customer_id = event.customer_id;

RIGHT JOIN

RIGHT refers to the table being joined on. A RIGHT JOIN will return all the tuples in Right even if no relation to tuples in Left. This is great for when we want to get all the rows in the table we’re joining on and

OUTER JOIN

Combines both the left and right tables together, sometimes referred to as a FULL JOIN. Combines columns from all the tables based 1+ common dimensions (columns). // Outer Join isn’t supported in all flavors of SQL, for instance MySQL doesn't support it.

UNION JOIN

Combines two datasets together, mashing columns that have the same name and type together To do this we use the UNION keyword followed by another select. Example:

SELECT age
FROM teacher
UNION
SELECT age
FROM student;
  • UNION ALL will not ignore duplicate fields.

CROSS JOIN

Doesn’t search for matches between the two datasets. Instead, Each row in the first table will be attached to a row in the second table.

SELECT
	teacher.name, student.name
FROM teacher
CROSS JOIN student;

This example will join every teacher’s name with every student’s name.

Normalization

The process of fixing the problems that data sets will often suffer from, failures of data-integrity. These include null values where there shouldn’t be any, duplicate values, double values for columns that should have a single value, non-unique identifiers, etc. Normal forms are basically criteria sets, 1NF, 2NF, 3NF, etc.

Normalized Tables are:

  • Better structured leading to easier understanding of the data.
  • Extendable.
  • Protected from:
    • Insertion anomalies
    • Update anomalies
    • deletion anomalies

Anomalies:

// TODO

  1. Insertion
  2. Deletion
  3. Update

First Normal Form - 1NF

The first normal form, 1NF is the bare minimum.

Rules:

  1. Row order must not be used to convey information.
  2. Columns must contain only one data type, i.e. all values in a column are the same type.
  3. Each row, and by extension an entire table, must have a primary key.
  4. Each cell must include atomic values, no data grouped together.

Violation examples:

  1. Row Order is used to convey information. // Don’t confuse this with sorting! If the order is important then use a column with relevant data to convey that information.
  2. Mixing Data Types in the same column.
  3. Not having a unique PKs for each row
  4. When groups of repeating data are stored together in a single row.

Second Normal Form - 2NF

This form is concerned with how the non-key columns relate to the PK of the table, and is useful for preventing anomalies. In 2NF, each non-key column must depend on the entire PK, this accounts for Composite keys.

Rules:

  1. All the 1NF rules.
  2. Each attribute (column) is functionally dependent on the entire PK.

Violation Examples

  1. Having an attribute that depends on a part of the PK.

Third Normal Form - 3NF

The 3NF is as far as I’m concerned the extent of normalization, but there are other forms.

Rules

  1. All of the 2NF rules.
  2. Every non-key attribute must depend on the key, the whole key, and nothing but the key.

Violation Examples


References