Schema
DatabaseSchema: The setup information for your database is stored in a special file called the “Schema”, and this is updated whenever you make changes to the structure of your database. Think of the schema as saying “here’s our database and it’s got a couple tables. The first table is ‘users’ and it’s got columns for ‘ID’ (which is an integer), ‘name’ (which is a bunch of characters), ‘email’ (which is a bunch of characters)
Indexes
CREATE INDEXES
Create indexes, which basically do all the hard work of sorting your table ahead of time, for columns that you’ll likely be using to search on later (like username)… it will make your database much faster.
Effects:
- Speed up queries (SELECT, JOIN, WHERE, ORDER BY).
- Reduce disk I/O and improve efficiency in large tables.
- Ensure data integrity with unique indexes.
- Must be used wisely as too many indexes can slow down INSERT, UPDATE and DELETE.
Tips
- SQL likes semicolons at the end of lines and using single quotes
'
instead of double quotes"
.
Statement structure
Every CRUD command in SQL contains a few parts
The action statement
, the table
it should run on, and theclauses
.
// If you just do an action on a table without specifying conditions, it will apply to the whole database and you’ll probably break something.
For example, to update a row in a table
UPDATE usersSET name='barfoo', email='bar@foo.com'WHERE email='foo@bar.com';
Syntax and commands
SELECT
Fetches data from a database Syntax like so:
SELECT column1, column2, ...
FROM table_name;
- To select all columns from a table:
SELECT * FROM table_name
- The
DISTINCT
clause is used to return only distinct (different) values:SELECT DISTINCT Country FROM Customers;
Projection: determines which columns from a table are returned by the query (according to this webpage)
Sub Queries
These can be placed anywhere a Table is expected
WHERE
This statement is used to filter records. It is used to extract only those records that fulfill a specified condition. Example:
SELECT * FROM Customers
WHERE Name="Bob";
Logical operators
These include the run of the mil <,
<=
, >
, and >=
.
- Equality is
=
for numeric values andLIKE
for string values - The
IN
operator does it exactly what it sounds like, it’s used to state a value is in a collection(a, b, c,...)
or in a sub-query. - For and and or, it’s just
and
andor
. - The
BETWEEN
operator is for selecting values in a given range (inclusive.
SELECT * FROM Products
WHERE Price BETWEEN 10 AND 20;
UPDATE
Updates data in a database. Syntax example:
UPDATE table_name
SET column1 = value1,
column2 = value2, ...
WHERE condition;
GROUP BY and ORDER BY
The GROUP BY
statement is used with Aggregate functions, which are placed in the SELECT
statement along with the columns to group by.
The ORDER BY
sorts the resulting set in ascending or descending order using a specific column.
General syntax example:
SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
ORDER BY column_name(s) ASC|DESC;
Example:
SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country;
This groups the counts by by their respective countries
Aggregate Functions
These include
COUNT()
MAX()
MIN()
SUM()
AVG()
HAVING
This clause was added later in SQL’s life to be able to use aggregate functions to conditional statements, as we cannot use them with WHERE
.
Syntax:
SELECT column_name(s)
FROM table_name
WHERE condition
GROUP BY column_name(s)
HAVING condition
ORDER BY column_name(s);
For example:
SELECT COUNT(CustomerID), Country
FROM Customers
GROUP BY Country
HAVING COUNT(CustomerID) > 5;
This query returns the number of customers per country but only the ones with more than five customers…
We use the aggregate function in the HAVING
clause.
Aliases
Are a way to give a table or column a temporary name, they make queries more readable.
To assign an alias we use the AS
keyword like so
SELECT CustomerName AS name
FROM customers
WHERE customers.name LIKE "BOB %";
More
-
DELETE
- deletes data from a database -
INSERT INTO
- inserts new records (tables) into a database -
CREATE DATABASE
- creates a new database -
ALTER DATABASE
- used to add, remove, or modify a database’s configuration and constraints. -
CREATE TABLE
- creates a new table -
ALTER TABLE
- used to add, remove, or modify a table’s columns Or change constraints. -
DROP TABLE
- deletes a table -
CREATE INDEX
- creates an index (search key) -
DROP INDEX
- deletes an index -
We can signal out a primary key for a column using the keyword
PRIMARY KEY
- The feature
AUTO_INCREMENT
allows SQL to handle unique IDs and therefore don’t need include a primary key when adding a new tuple
- The feature
-
To enforce no null values for a column we use the
NOT NULL
keyword -
And the
UNIQUE
keyword ensures that for a particular column, all the values are unique and we have no repetition -
TEXT
is different fromVARCHAR(n)
where the latter is used to short strings (a few words), the prior is used to store longer strings such as whole sentences and paragraphs and whatnot.
Common examples
Inserting a tuple into a table
INSERT INTO tableName (column1, column2, column3)
VALUES (....,....,....);
--AUTO_INCREMENT handling the primary key in this example
Limit results when selecting from a table
We can limit the resulting table we get from a SELECT query by using the keyword LIMIT n
which limits the results to n
rows
SELECT stdName, stdGPA
FROM Students
LIMIT 10
WHERE Students.stdGPA > 3.5;
-- this will only return 10 students even if there are more with GPAs over 3.5
Ordering the query
We can order the resulting table using the keywords ORDER BY <ordType>
Such that ordType
can ASC
or DESC
Searching for specific string values
The keyword LIKE is basically a full on comparator of strings Example:
SELECT stdName, stdGPA
FROM Students
LIMIT 2
WHERE Students.stdGPA > 3.5 AND Students.stdName LIKE "BoB%";
Views
Views are virtual tables that are returned by a query. Views are just like real tables with columns and rows, but act as a window into the larger data set or table. They don’t take physical memory space and are handy for more efficient, focused retrieval.
“You can think of a view as a query result rather than way to store data, they are mostly used to display often used relationships between tables or often used queries Also views do not take up physical memory and need to be manually updated by recreating it.”
Views need to be updated by recreating them, as updating the actual data doesn't reflect onto the view automatically.
Uses
- Better for privacy when needing to share certain data with an outsider or 3rd party.
- Can be used wherever a table’s expected.
Creating Views
CREATE VIEW view_name AS
SELECT column1, column2, ...
FROM table_name
WHERE condition;
For example:
CREATE VIEW [Products Above Average Price] AS
SELECT ProductName, Price
FROM Products
WHERE Price > (SELECT AVG(Price) FROM Products);
We can then query these views as we do with any other table
SELECT * FROM [Products Above Average Price];
Updating Views
To update views we need to recreate them as mentioned before.
This is done using CREATE OR REPLACE
Example:
CREATE OR REPLACE VIEW [Brazil Customers] AS
SELECT CustomerName, ContactName, City
FROM Customers
WHERE Country = 'Brazil';
RELATIONAL TABLES
Have a primary key and a foreign key referencing another relation (table) and here is an example of how you create such a table
CREATE TABLE tableName(
id INT AUTO_INCREMENT
prop1 VARCHAR(5)
prop2 INT
prop3 INT NOT NULL
PRIMARY KEY (id)
FOREIGN KEY (prop3) REFERENCES otherTableName(id)
JOIN
Used to query tables that have relationships i.e. merging tables for a query. Tthere are four types, and the type used determines the data returned from the query.
JOINs combine the columns from two tables together into a returned result, these new rows are referred to as tuples.
A JOIN needs two columns of the same type to work, preferably a primary key in one table and the foreign key in another table
Syntax: <JOIN TYPE> JOIN table2 ON condition
For example:
SELECT Orders.OrderID, Customers.CustomerName, Orders.OrderDate
FROM Orders
INNER JOIN Customers ON Orders.CustomerID=Customers.CustomerID
When specifying conditions with JOINS we use the keyword ON
instead of WHERE
INNER JOIN
The default JOIN in SQL. Only returns the tuples in the left and right that do have the relationship AND do reference one another.
Example:
SELECT *
FROM Customer
JOIN event
ON Customer.customer_id = event.customer_id;
// If the data in the columns your joining isn’t unique, the resulting table will contain duplicates.
LEFT JOIN
Left here refers to the first table. A LEFT JOIN
will return all the tuples in Left even if no relation to tuples in Right (will probably get a few null values).
SELECT *
FROM Customer
LEFT JOIN event
ON Customer.customer_id = event.customer_id;
RIGHT JOIN
RIGHT refers to the table being joined on. A RIGHT JOIN
will return all the tuples in Right even if no relation to tuples in Left.
This is great for when we want to get all the rows in the table we’re joining on and
OUTER JOIN
Combines both the left and right tables together, sometimes referred to as a FULL JOIN. Combines columns from all the tables based 1+ common dimensions (columns). // Outer Join isn’t supported in all flavors of SQL, for instance MySQL doesn't support it.
UNION JOIN
Combines two datasets together, mashing columns that have the same name and type together
To do this we use the UNION
keyword followed by another select.
Example:
SELECT age
FROM teacher
UNION
SELECT age
FROM student;
UNION ALL
will not ignore duplicate fields.
CROSS JOIN
Doesn’t search for matches between the two datasets. Instead, Each row in the first table will be attached to a row in the second table.
SELECT
teacher.name, student.name
FROM teacher
CROSS JOIN student;
This example will join every teacher’s name with every student’s name.
Normalization
The process of fixing the problems that data sets will often suffer from, failures of data-integrity. These include null values where there shouldn’t be any, duplicate values, double values for columns that should have a single value, non-unique identifiers, etc. Normal forms are basically criteria sets, 1NF, 2NF, 3NF, etc.
Normalized Tables are:
- Better structured leading to easier understanding of the data.
- Extendable.
- Protected from:
- Insertion anomalies
- Update anomalies
- deletion anomalies
Anomalies:
// TODO
- Insertion
- Deletion
- Update
First Normal Form - 1NF
The first normal form, 1NF is the bare minimum.
Rules:
- Row order must not be used to convey information.
- Columns must contain only one data type, i.e. all values in a column are the same type.
- Each row, and by extension an entire table, must have a primary key.
- Each cell must include atomic values, no data grouped together.
Violation examples:
- Row Order is used to convey information. // Don’t confuse this with sorting! If the order is important then use a column with relevant data to convey that information.
- Mixing Data Types in the same column.
- Not having a unique PKs for each row
- When groups of repeating data are stored together in a single row.
Second Normal Form - 2NF
This form is concerned with how the non-key columns relate to the PK of the table, and is useful for preventing anomalies. In 2NF, each non-key column must depend on the entire PK, this accounts for Composite keys.
Rules:
- All the 1NF rules.
- Each attribute (column) is functionally dependent on the entire PK.
Violation Examples
- Having an attribute that depends on a part of the PK.
Third Normal Form - 3NF
The 3NF is as far as I’m concerned the extent of normalization, but there are other forms.
Rules
- All of the 2NF rules.
- Every non-key attribute must depend on the key, the whole key, and nothing but the key.