The schema of a database is the logical structure that defines the organization of data.
The schema defines the structure (sometimes called the intention) of the database. It is relatively static and changes only through explicit schema modification. The actual data stored at any point in time is the database instance (or extension), which must conform to the schema.
A database instance (or extension) describes the content stored in a database at a given point in time. If the database has a fixed schema, every database instance must conform to it — that is, every tuple in the instance must respect the domains, key constraints, and functional dependencies defined by the schema.
An entity-relationship diagram is a method of conceptually describing the schema of a database in a human viewable format.
The entities of an entity-relationship diagram describes a "thing" that is capable of an independent existence that can be uniquely identified. An entity is a thing that exists either physically or logically.
An entity may be a physical object such as a house or a car (they exist physically), an event such as a house sale or a car service, or a concept such as a customer transaction or order (they exist logically as a concept).
Entities can typically be thought of as nouns. Examples include a computer, an employee, a song, or a mathematical theorem.
The relationships of an entity-relationship diagram captures how entities are related to each other. Relationships can be thought of as verbs linking two or more nouns.
Examples include an owns relationship between a company and a computer, a supervises relationship between an employee and a department, a performs relationship between an artist and a song, and a proves relationship between a mathematician and a conjecture.
The cardinality of a relationship describes the number of relationships that an entity can participate in. There are generally 4 types of cardinality:
The participation of a relationship describes whether an entity must be involved in that relationship in order to exist. There are generally 2 types of participation:
When designing an entity-relationship diagram, the entities are drawn as squares with their descriptive name (typically a noun) in the centre, these are then connected to the attributes using a line which are drawn as circles with their respective attribute keys in the middle.
Relationships are drawn as a vertical rhombus (resembling a kite) with a descriptive name in the centre (typically a verb) in the centre. These relationships are then linked to their constituting entities.
The lines to relationships can be "plain" or have arrows, these combinations of arrows an plane lines describes the cardinality of the relationship.
Two plane lines indicate a many-to-many relationship
Example of a many-to-many relationship between actors who work on movies.
One plane line and an arrow indicate a one-to-many or a many-to-one relationship
Example of a one-to-many relationship between directors who direct movies.
Two arrows indicate a one-to-one relationship
Example of a one-to-one relationship between a country and their capital.
When an entity requires total participation it's line to the relationship is drawn bolded.
Example of a one-to-many relationship between directors who direct movies, where movies are required to have a relationship to a director.
A weak entity is an entity that cannot be uniquely identified by it's own attributes alone. It instead relies on a relationship with another entity (known as it's owner entity), to uniquely identify it.
Weak entities must participate in a one-to-many relationship with it's owner entity, called it's identifying relationship (typically a belongs to relationship). Weak entities must have total participation in their identifying relationships sets meaning every weak entity must be related to an owner.
The identifying key of a weak entity is a combination of it's own partial key (discriminator) and the primary key of the owner entity. In a entity-relationship diagram the identifying key is underlined with a dashed line, and weak entities are drawn with thick lines and the same with their identifying relationship.
In the same way as object oriented languages represent inheritance attributes can be inherited through generalization. This is a done through a IsA relationship between two or more entities.
In a entity-relationship diagram we show this be drawing a relationship with a triangle with the label IsA and then connecting it to the relevant entities. On corner (the "top") is connected to the "parent" entity (the one which we inherit attributes from) and then the opposite face is connected to any entities which inherit it's attributes.
Specialization is a method of dividing entities into their sub-entities based on it's characteristics. This is a done through a IsA relationship between two or more entities.
In a entity-relationship diagram we show this be drawing a relationship with an upside down triangle with the label IsA and then connecting it to the relevant entities. On the upper face (the "top") is connected to the "parent" entity (the one which we specialize from) and then the opposite faces are connected to any entities which are specializations of that entity.
Aggregation is a method of modelling relationships between other relationships which is typically not allowed in entity-relationship diagrams. This is most useful for adding descriptive attributes or relationships to existing associations.
We show aggregation in a entity-relationship diagram by surrounding the entities to be aggregated in a dashed box and then drawing a line from the relationships.
SQL is a language for representing operations and relationships in a relational database. SQL was original based on relational algebra and can be broken down into two sub languages DDL and DML.
DDL is a sub language of SQL which describes the syntax of the handling of schemas.
Tables are created using the CREATE TABLE directive, and then specifying the name of the table as well as the schema of it's fields.
CREATE TABLE person (
person_id SMALLINT UNSIGNED PRIMARY KEY,
first_name VARCHAR(20) NOT NULL,
last_name VARCHAR(20) NOT NULL
);
Create a table
personwith a primary key and first and last name fields.
Tables can be modified using the ALTER TABLE directive with the table name and then a set of operations:
ADD <column_name> <datatype> - Add a new column with the name column_name and of type datatype.DROP <column_name> - Removes the column of column_name from the table.ALTER TABLE person
ADD middle_name VARCHAR(20) NOT NULL;
Tables can be removed using the DROP TABLE or DROP TABLE IF EXISTS directives the IF EXISTS variant prevents an error if the table doesn't exist.
DROP TABLE IF EXISTS person;
Remove the table
personif it was created already.
Table constraints are used to specify rules for certain columns in a table they are placed after the column datatype when creating or modifying a table.
CREATE TABLE person (
person_id SMALLINT UNSIGNED PRIMARY KEY,
-- ^–------––– Constraint
first_name VARCHAR(20) NOT NULL,
-- ^–––––––– Constraint
last_name VARCHAR(20) NOT NULL
-- ^–––––––– Constraint
);
List of Constraints
NOT NULL - Ensures the column is not empty (null value).UNIQUE - All values in the column are different.PRIMARY KEY - Specifies that it's the unique identifier for each row in a table and is a combination of NOT NULL and UNIQUE.FOREIGN KEY - Prevents actions that would destroy the links between tables (must reference a primary key of another row - typically from another table)CHECK - Ensures the column satisfies a given boolean expression (for example CHECK (age >= 18))DEFAULT - Provides a default value if no value is specified.Some constraints like PRIMARY KEY, FOREIGN KEY can be declared out of line and can accept a combination of multiple columns.
CREATE TABLE enrollment (
student_id INT UNSIGNED NOT NULL,
course_id INT UNSIGNED NOT NULL,
enrolled_on DATE NOT NULL,
PRIMARY KEY (student_id, course_id),
FOREIGN KEY (student_id) REFERENCES student(student_id),
FOREIGN KEY (course_id) REFERENCES course(course_id)
);
Views are virtual tables created from queries on one or more base tables. A view does not store data; instead, it displays data stored in other tables dynamically. Views are useful for simplifying complex queries and restricting access to specific data.
Views are created using the CREATE VIEW directive, followed by the view name and the AS keyword with a SELECT query.
CREATE VIEW person_details AS
SELECT
person_id,
first_name,
last_name
FROM person
WHERE person_id > 0;
Create a view
person_detailsthat displays only active person records.
Views can be removed using the DROP VIEW or DROP VIEW IF EXISTS directives. The IF EXISTS variant prevents an error if the view doesn't exist.
DROP VIEW IF EXISTS person_details;
Remove the view
person_detailsif it exists.
To change a view's definition, use the CREATE OR REPLACE VIEW directive, which will either create the view if it doesn't exist or replace its definition.
CREATE OR REPLACE VIEW person_details AS
SELECT
person_id,
first_name,
last_name,
middle_name
FROM person
WHERE person_id > 0;
Replace the
person_detailsview with an updated definition that includes the middle name field.
DML is a sub language of SQL which describes the syntax of the retrieving and manipulating of database records.
DML consists of several core statements for working with data:
Retrieve data from tables. The SELECT statement includes:
Modify existing records in tables.
Combine rows from multiple tables based on related columns. Includes:
Combine results from multiple queries using Set theory operations:
Perform calculations on multiple rows:
Functional dependence describes a relationship between attributes of a relation. If the attribute Y is functionally dependent on X (X
For example consider a relation with the attributes PostalCode, City, and Province. If PostalCode City, Province, then for all tuples in the relation, if they have equal PostalCode values, they must have the same City and Province.
The
Functional dependence can indicate to us that we're storing redundant data and can help us decompose relations without losing information.
Given a set of functional dependences for a relation we can derive additional functional dependences through the following methods:
The following axioms (known as Armstrong's Axioms) are sound and complete — they can derive all and only the functional dependencies that are logically implied by a given set. A set of axioms can be used to transform functional dependences
City, Major City).SID City, then SID, Major City, Major).SID City and City AreaCode, then SID AreaCode)These axioms can then be used to derive a few additional rules for ease of use
SID AreaCode and SID City, then SID AreaCode, City)SID AreaCode, City then SID AreaCode and SID City)Alternatively we can use the "closure method", given a set of attributes X we denote the closure of it as X
The algorithm is expressed as
Let Closure = X
Until Closure doesn't change do
if a
Consider the schema SupplierPart(supplierName, city, status, partNumber, partName, quantity) and the functional dependences
supplierName citycity statuspartNumber partNamesupplierName, partNumber quantityFind the closure of supplierName, partNumber, supplierName, and partNumber.
(supplierName, partNumber)supplierName, partNumber}
supplierName city is a functional dependence and {supplierName} is a subset of (supplierName, partNumber)city to (supplierName, partNumber)
(supplierName, partNumber)supplierName, partNumber, city}
city status is a functional dependence and {city} is a subset of (supplierName, partNumber)status to (supplierName, partNumber)
(supplierName, partNumber)supplierName, partNumber, city, status}
partNumber partName is a functional dependence and {partNumber} is a subset of (supplierName, partNumber)partName to (supplierName, partNumber)
(supplierName, partNumber)supplierName, partNumber, city, status, partName}
supplierName, partNumber quantity is a functional dependence and {supplierName, partNumber} is a subset of (supplierName, partNumber)quantity to (supplierName, partNumber)
(supplierName, partNumber)supplierName, partNumber, city, status, partName, quantity}
We've exhausted all function dependences so this is our final closure. This final closure contains all attributes of the schema so we know (supplierName, partNumber) is a super key.
supplierNamesupplierName}
supplierName city is a functional dependence and {supplierName} is a subset of supplierNamecity to supplierName
supplierNamesupplierName, city}
city status is a functional dependence and {city} is a subset of supplierNamestatus to supplierName
supplierNamesupplierName, city, status}
We've exhausted all functional dependencies which are contained in the closure so this is our final closure.
partNumberpartNumber}
partNumber partName is a functional dependence and {partNumber} is a subset of partNumberpartName to partNumber
partNumberpartNumber, partName}
We've exhausted all functional dependencies which are contained in the closure so this is our final closure.
Using those three closures we can see that supplierName, partNumber is a candidate key as reducing either supplierName or partNumber means the key is no longer a super key.
A super key is a set of one or more attributes in a schema that uniquely identifies a tuple.
Formally, a set of attributes
Equivalently,
Every relation always contains at least one super key: the trivial key, which is the set of all its attributes.
Consider a table Employee with the following attributes
EmployeeID: Unique ID assigned by the companySSN: Social Security NumberName: Name of the employeeDepartment: The department the employee works atSome examples of super keys are the following sets:
{EmployeeID}: Is unique for every Employee{SSN}: Like the ID is unique for every Employee{EmployeeID, Name}: Via reflexivity as Name is redundant.{EmployeeID, SSN, Name, Department}: The trivial super key which already contains all columns.
A candidate key is any super key which cannot be further reduced by removing attributes — that is, no proper subset of it is also a super key.
Formally, a set of attributes
Every relation has at least one candidate key. All candidate keys of a relation have the same property of uniquely identifying tuples — the choice of which one to designate as the primary key is a design decision.
Consider a table Employee with the following attributes
EmployeeID: Unique ID assigned by the companySSN: Social Security NumberName: Name of the employeeDepartment: The department the employee works atSome examples of it's candidate keys are the following sets:
{EmployeeID}: Is unique for every Employee{SSN}: Like the ID is unique for every EmployeeConversely non-candidate keys which are still super keys are the following
{EmployeeID, SSN}: The SSN or EmployeeID are redundant keys to identify a record.{EmployeeID, Name}: The Name attribute is redundant.
A primary key is a designated candidate key that is primarily used to identify records.
Consider a table Employee with the following attributes
EmployeeID: Unique ID assigned by the companySSN: Social Security NumberName: Name of the employeeDepartment: The department the employee works atSome examples of it's candidate keys are the following sets:
{EmployeeID}: Is unique for every Employee{SSN}: Like the ID is unique for every `EmployeeGiven this list of candidate keys we can choose the primary key to be either EmployeeID or SSN. Typically we would choose EmployeeID to be our primary key as we would probably be generating it whereas SSN would be entered by the user.
A normal form is a property of a relation schema that characterizes it in terms of the redundancy it permits. Normal forms are defined with respect to functional dependencies (and in higher forms, multivalued and join dependencies) that hold on the schema.
Normal forms are ordered by strictness. Each successive normal form imposes additional constraints that eliminate further classes of redundancy:
That is, every BCNF relation is in 3NF, every 3NF relation is in 2NF, and every 2NF relation is in 1NF.
| Normal Form | Condition |
|---|---|
| 1NF | All attributes are atomic |
| 2NF | 1NF + no partial dependencies on the primary key |
| 3NF | 2NF + no transitive dependencies on the primary key |
| BCNF | Every non-trivial FD has a super key as its determinant |
The goal of applying successively stricter normal forms is to minimize update anomalies (insertion, deletion, and modification anomalies) that arise from redundant data storage without losing information. The process of restructuring a schema to satisfy a normal form is called database normalization.
First normal form is the most basic normal form. A relation is in first normal form if and only if every attribute contains only atomic (indivisible) values — that is, no attribute holds a set, list, or nested relation as its value.
A relation schema
This is essentially a requirement of the relational model itself. E.F. Codd's original definition of a relation requires that all attribute values are atomic, so any valid relation is in 1NF by definition.
A relation violates 1NF when an attribute contains:
PhoneNumbers attribute storing {555-1234, 555-5678}.FullName attribute storing a structured value with sub-components FirstName and LastName.Orders attribute containing an entire table of order records within a single tuple.To bring a relation into 1NF:
Most modern relational database systems enforce atomicity at the storage level, making 1NF violations uncommon in practice. However, the theoretical distinction remains important as 1NF is the foundation upon which all higher normal forms are built.
Second normal form is a normal form used in database normalization. Second normal form requires that a schema is in 1NF and that every non-key attribute is fully functionally dependent on the entire primary key, meaning no non-key attribute depends on only part of a composite key.
A schema is in second normal form if:
Consider a table Toothbrush with the following attributes
ManufacturerManufacturerCountryand the following functional dependencies
Manufacturer, Model ManufacturerCountryManufacturer ManufacturerCountryIs Toothbrush in 2NF? Justify why or why not and if it isn't decompose it such that it is.
Toothbrush is not in 2NF because ManufacturerCountry is only dependent on Manufacturer not on Model. To reconcile this we decompose it into two new tables
Toothbrush with the following attributes
Manufacturerand Manufacturer
ManufacturerManufacturerCountry
Third normal form is a normal form used in database normalization. Third normal form eliminates transitive dependencies of non-key attributes on the primary key.
3NF is strictly weaker than BCNF. The difference is that 3NF permits a non-trivial FD
The advantage of 3NF over BCNF is that it is always possible to find a decomposition into 3NF that is both loseless and dependency-preserving, which is not always achievable for BCNF.
A schema is in third normal form if for every non-trivial functional dependency
Equivalently, a schema is in 3NF if it is in 2NF and no non-prime attribute is transitively dependent on any candidate key.
A transitive dependency exists when
Consider a schema Student(StudentID, Department, DepartmentHead) with the following functional dependencies:
StudentID DepartmentDepartment DepartmentHeadThe only candidate key is {StudentID}.
This schema is in 2NF (the key is a single attribute, so partial dependencies are impossible), but it is not in 3NF because DepartmentHead is transitively dependent on StudentID through Department:
Here Department DepartmentHead violates 3NF because Department is not a super key and DepartmentHead is not a prime attribute.
To bring it into 3NF, we decompose into:
Student(StudentID, Department)Department(Department, DepartmentHead)
Boyce-Codd normal form is a normal form used in database normalization and is a slightly stricter version of 3NF. Boyce-Codd normal form removes all redundancies based on functional dependence.
If a schema is in Boyce-Codd normal form all of it's functional dependencies X
If a relational schema is in BCNF, then it is automatically also in 3NF because BCNF is a stricter form of 3NF.
Consider a table Address with the following attributes
HouseNumberStreetCityProvincePostalCodeand the following functional dependencies
HouseNumber, Street, PostalCode → CityHouseNumber, Street, PostalCode → ProvincePostalCode → CityPostalCode → ProvinceIs Address in BCNF? Justify why or why not and if it isn't decompose it such that it is.
Address is not in BCNF.
The candidate key is {HouseNumber, Street, PostalCode} (its closure yields all attributes). The dependencies PostalCode → City and PostalCode → Province violate BCNF because PostalCode alone is not a super key for Address.
To bring it into BCNF, we decompose using the violating dependency PostalCode → City, Province:
PostalArea(PostalCode, City, Province): Here PostalCode is a super key, so all functional dependencies satisfied.Address(HouseNumber, Street, PostalCode): No non-trivial functional dependencies remain that violate BCNF.This decomposition is loseless because the common attribute PostalCode is a super key for PostalArea.
Relational algebra is a procedural query language that defines a set of operations on relations. Each operation takes one or more relations as input and produces a new relation as output. This closure property means operations can be composed arbitrarily. Relational algebra serves as the theoretical foundation for query languages such as Structured Query Language.
The six fundamental operations are sufficient to express any relational algebra query. All other operations can be derived from these.
Selects tuples from a relation that satisfy a given predicate.
where
Example:
Selects specified attributes from a relation, removing duplicates from the result.
Example:
Returns all tuples that appear in either of two union-compatible relations.
Requires
Returns all tuples in the first relation that are not in the second.
Combines every tuple of one relation with every tuple of another.
where
Renames either the relation or its attributes.
renames relation
These operations can be expressed in terms of the fundamental operations but are used so frequently that they have their own notation.
Returns tuples present in both relations.
Combines tuples from two relations that agree on all common attributes, then projects out the duplicate columns. The natural join is central to the theory of lossless-join decomposition.
Returns tuples in
Assigns the result of a relational algebra expression to a temporary relation variable. This is purely notational convenience for breaking complex expressions into steps.
| Property | Operations |
|---|---|
| Commutative | |
| Associative | |
| Not commutative |
Selection is idempotent:
Selections can be cascaded:
Projections can be cascaded:
SQL is a language for representing operations and relationships in a relational database. SQL was original based on relational algebra and can be broken down into two sub languages DDL and DML.
DDL is a sub language of SQL which describes the syntax of the handling of schemas.
Tables are created using the CREATE TABLE directive, and then specifying the name of the table as well as the schema of it's fields.
CREATE TABLE person (
person_id SMALLINT UNSIGNED PRIMARY KEY,
first_name VARCHAR(20) NOT NULL,
last_name VARCHAR(20) NOT NULL
);
Create a table
personwith a primary key and first and last name fields.
Tables can be modified using the ALTER TABLE directive with the table name and then a set of operations:
ADD <column_name> <datatype> - Add a new column with the name column_name and of type datatype.DROP <column_name> - Removes the column of column_name from the table.ALTER TABLE person
ADD middle_name VARCHAR(20) NOT NULL;
Tables can be removed using the DROP TABLE or DROP TABLE IF EXISTS directives the IF EXISTS variant prevents an error if the table doesn't exist.
DROP TABLE IF EXISTS person;
Remove the table
personif it was created already.
Table constraints are used to specify rules for certain columns in a table they are placed after the column datatype when creating or modifying a table.
CREATE TABLE person (
person_id SMALLINT UNSIGNED PRIMARY KEY,
-- ^–------––– Constraint
first_name VARCHAR(20) NOT NULL,
-- ^–––––––– Constraint
last_name VARCHAR(20) NOT NULL
-- ^–––––––– Constraint
);
List of Constraints
NOT NULL - Ensures the column is not empty (null value).UNIQUE - All values in the column are different.PRIMARY KEY - Specifies that it's the unique identifier for each row in a table and is a combination of NOT NULL and UNIQUE.FOREIGN KEY - Prevents actions that would destroy the links between tables (must reference a primary key of another row - typically from another table)CHECK - Ensures the column satisfies a given boolean expression (for example CHECK (age >= 18))DEFAULT - Provides a default value if no value is specified.Some constraints like PRIMARY KEY, FOREIGN KEY can be declared out of line and can accept a combination of multiple columns.
CREATE TABLE enrollment (
student_id INT UNSIGNED NOT NULL,
course_id INT UNSIGNED NOT NULL,
enrolled_on DATE NOT NULL,
PRIMARY KEY (student_id, course_id),
FOREIGN KEY (student_id) REFERENCES student(student_id),
FOREIGN KEY (course_id) REFERENCES course(course_id)
);
Views are virtual tables created from queries on one or more base tables. A view does not store data; instead, it displays data stored in other tables dynamically. Views are useful for simplifying complex queries and restricting access to specific data.
Views are created using the CREATE VIEW directive, followed by the view name and the AS keyword with a SELECT query.
CREATE VIEW person_details AS
SELECT
person_id,
first_name,
last_name
FROM person
WHERE person_id > 0;
Create a view
person_detailsthat displays only active person records.
Views can be removed using the DROP VIEW or DROP VIEW IF EXISTS directives. The IF EXISTS variant prevents an error if the view doesn't exist.
DROP VIEW IF EXISTS person_details;
Remove the view
person_detailsif it exists.
To change a view's definition, use the CREATE OR REPLACE VIEW directive, which will either create the view if it doesn't exist or replace its definition.
CREATE OR REPLACE VIEW person_details AS
SELECT
person_id,
first_name,
last_name,
middle_name
FROM person
WHERE person_id > 0;
Replace the
person_detailsview with an updated definition that includes the middle name field.
DML is a sub language of SQL which describes the syntax of the retrieving and manipulating of database records.
DML consists of several core statements for working with data:
Retrieve data from tables. The SELECT statement includes:
Modify existing records in tables.
Combine rows from multiple tables based on related columns. Includes:
Combine results from multiple queries using Set theory operations:
Perform calculations on multiple rows:
The SELECT statement is used to retrieve data from a table, allowing you to query one or more columns.
SELECT first_name, last_name
FROM person;
Select the first and last names from the person table.
To select all columns, use the * wildcard:
SELECT * FROM person;
Use DISTINCT to return only unique values, eliminating duplicate rows:
SELECT DISTINCT last_name
FROM person;
Select all unique last names from the person table.
You can use DISTINCT with multiple columns:
SELECT DISTINCT first_name, last_name
FROM person;
Returns unique combinations of first and last names.
Use the WHERE clause to filter rows based on conditions:
SELECT first_name, last_name
FROM person
WHERE person_id = 5;
The IN operator checks if a value matches any value in a list or subquery result:
SELECT first_name, last_name
FROM person
WHERE person_id IN (1, 2, 3);
Select persons with IDs 1, 2, or 3.
Using IN with a subquery:
SELECT name
FROM student
WHERE student_id IN (
SELECT student_id
FROM enrollment
WHERE course_id = 101
);
Select students enrolled in course 101.
The NOT IN operator checks if a value does not match any value in a list or subquery result:
SELECT first_name, last_name
FROM person
WHERE person_id NOT IN (1, 2, 3);
Select all persons except those with IDs 1, 2, or 3.
Using NOT IN with a subquery:
SELECT name
FROM student
WHERE student_id NOT IN (
SELECT student_id
FROM enrollment
);
Select students who are not enrolled in any course.
Subqueries can be nested within other queries in various clauses:
In SELECT clause:
SELECT first_name, last_name,
(SELECT COUNT(*)
FROM enrollment
WHERE enrollment.student_id = student.student_id) AS course_count
FROM student;
Select students with their enrollment count.
In WHERE clause with comparison operators:
SELECT name
FROM student
WHERE student_id = (
SELECT student_id
FROM enrollment
WHERE course_id = 101
LIMIT 1
);
Using EXISTS:
SELECT name
FROM student
WHERE EXISTS (
SELECT 1
FROM enrollment
WHERE enrollment.student_id = student.student_id
AND course_id = 101
);
Select students who are enrolled in course 101.
Using NOT EXISTS:
SELECT name
FROM student
WHERE NOT EXISTS (
SELECT 1
FROM enrollment
WHERE enrollment.student_id = student.student_id
);
Select students who have no enrollments.
Set comparison operators allow you to compare a value against a Set of values returned by a subquery.
The ANY operator compares a value to any value in a Set. The condition is true if the comparison is true for at least one value in the Set.
SELECT name, grade
FROM student
WHERE grade > ANY (
SELECT grade
FROM student
WHERE course_id = 101
);
Select students whose grade is greater than at least one grade in course 101.
Common uses of ANY:
= ANY is equivalent to IN> ANY means greater than the minimum value< ANY means less than the maximum valueThe ALL operator compares a value to all values in a Set. The condition is true only if the comparison is true for every value in the Set.
SELECT name, grade
FROM student
WHERE grade > ALL (
SELECT grade
FROM student
WHERE course_id = 101
);
Select students whose grade is greater than all grades in course 101.
Common uses of ALL:
!= ALL (or <> ALL) is equivalent to NOT IN> ALL means greater than the maximum value< ALL means less than the minimum valueExample comparing ANY and ALL:
-- Students with salary higher than ANY engineer (at least one)
SELECT name, salary
FROM employee
WHERE salary > ANY (
SELECT salary FROM employee WHERE department = 'Engineering'
);
-- Students with salary higher than ALL engineers (every one)
SELECT name, salary
FROM employee
WHERE salary > ALL (
SELECT salary FROM employee WHERE department = 'Engineering'
);
Use ORDER BY to sort results in ascending (ASC) or descending (DESC) order:
SELECT first_name, last_name
FROM person
ORDER BY last_name ASC, first_name ASC;
Use LIMIT to restrict the number of rows returned:
SELECT first_name, last_name
FROM person
LIMIT 10;
The INSERT INTO statement is used to add new records to a table.
INSERT INTO person (person_id, first_name, last_name)
VALUES (1, 'John', 'Doe');
Insert a new person with ID 1.
Multiple rows can be inserted at once:
INSERT INTO person (person_id, first_name, last_name)
VALUES
(2, 'Jane', 'Smith'),
(3, 'Bob', 'Johnson');
Insert multiple records in a single statement.
The UPDATE statement is used to modify existing records in a table.
UPDATE person
SET first_name = 'Jonathan'
WHERE person_id = 1;
Update the first name of person with ID 1.
Multiple columns can be updated simultaneously:
UPDATE person
SET first_name = 'Jane', last_name = 'Doe'
WHERE person_id = 2;
Update multiple fields in a single statement.
The DELETE FROM statement is used to remove records from a table.
DELETE FROM person
WHERE person_id = 3;
Delete the person with ID 3.
To delete all rows from a table (use with caution):
DELETE FROM person;
This removes all records from the table while keeping the table structure intact.
The JOIN clause combines rows from multiple tables based on related columns.
SELECT student.name, course.title
FROM enrollment
INNER JOIN student ON enrollment.student_id = student.student_id
INNER JOIN course ON enrollment.course_id = course.course_id;
Join three tables together to get student names and their enrolled courses.
INNER JOIN returns only matching rows from both tables.
SELECT orders.order_id, customers.customer_name
FROM orders
INNER JOIN customers ON orders.customer_id = customers.customer_id;
Returns only orders that have a matching customer.
LEFT JOIN returns all rows from the left table and matching rows from the right.
SELECT customers.customer_name, orders.order_id
FROM customers
LEFT JOIN orders ON customers.customer_id = orders.customer_id;
Returns all customers, including those without orders.
RIGHT JOIN returns all rows from the right table and matching rows from the left.
SELECT customers.customer_name, orders.order_id
FROM customers
RIGHT JOIN orders ON customers.customer_id = orders.customer_id;
Returns all orders, including those without a matching customer.
FULL OUTER JOIN returns all rows when there's a match in either table.
SELECT customers.customer_name, orders.order_id
FROM customers
FULL OUTER JOIN orders ON customers.customer_id = orders.customer_id;
Returns all customers and all orders, matching them where possible.
Set operations combine the results of two or more queries into a single result Set. The queries must have compatible column structures (same number of columns with compatible data types).
The UNION operator combines the results of two queries, returning all unique rows from both result sets. This corresponds to the Union operation in set theory.
SELECT first_name, last_name FROM person_table_1
UNION
SELECT first_name, last_name FROM person_table_2;
Returns all unique names from both tables.
Use UNION ALL to include duplicate rows:
SELECT first_name, last_name FROM person_table_1
UNION ALL
SELECT first_name, last_name FROM person_table_2;
Returns all names from both tables, including duplicates.
The INTERSECT operator returns only the rows that appear in both query results. This corresponds to the Intersection operation in set theory.
SELECT first_name, last_name FROM person_table_1
INTERSECT
SELECT first_name, last_name FROM person_table_2;
Returns names that appear in both tables.
The EXCEPT operator (or MINUS in some databases) returns rows from the first query that do not appear in the second query. This corresponds to the Set Difference operation in set theory.
SELECT first_name, last_name FROM person_table_1
EXCEPT
SELECT first_name, last_name FROM person_table_2;
Returns names that are in the first table but not in the second.
Aggregate functions perform calculations on multiple rows and return a single value.
SELECT COUNT(*) FROM person;
Count the total number of rows in the person table.
COUNT() - Counts the number of rows.SUM() - Calculates the sum of values.AVG() - Calculates the average value.MIN() - Finds the minimum value.MAX() - Finds the maximum value.SELECT COUNT(*), AVG(grade), MAX(grade), MIN(grade)
FROM student;
Get count, average, maximum, and minimum grades from the student table.
SELECT SUM(salary)
FROM employee
WHERE department = 'Engineering';
Calculate total salary for engineering department.
Use GROUP BY to group rows and apply aggregate functions to each group:
SELECT course_id, COUNT(*) AS enrollment_count
FROM enrollment
GROUP BY course_id;
Count enrollments for each course.
Use HAVING to filter grouped results (similar to WHERE but for aggregated data):
SELECT course_id, COUNT(*) AS enrollment_count
FROM enrollment
GROUP BY course_id
HAVING COUNT(*) > 10;
Show only courses with more than 10 enrollments.
Difference between WHERE and HAVING:
WHERE filters rows before groupingHAVING filters groups after aggregation