23. Data Analytics - Analyze Data to Answer Questions - Week 4
Learning functions in spreadsheet and SQL.
Learning how to use PIVOTTABLES.
Learn to check and recheck as part of data validation process to make sure everything is accurate.
Whenever do calculations, recheck with another simple check process for accuracy test.
See if both processes have the same results.
Data validation process is a form of data cleaning.
Learn how to use temporary tables in SQL.
Definition:
Conditional formatting (Spreadsheet) // spreadsheet tool that changes how cells appear when values meet specific conditions
Array // a collection of values in cells
Profit margin // percentage that indicates how many cents of profit has been generated for each dollar of sale
Pivot tables (Spreadsheet) // let you view data in multiple ways to find insights and trends
Calculated Field (Spreadsheet Pivot Table) // a new field within a pivot table that carries out certain calculations based on the values of other fields
Operator (SQL) // a symbol that names the type of operation or calculation to be performed in a formula
Modulo // an operator (%) returns the remainder when one number is divided by another
Underscores // line used to underline words and connect text characters
Data validation process / checking and rechecking the quality of your data so that it is complete, accurate, secure and consistent
Temporary table // database table that is created and exists temporarily on a database server
Spreadsheet Calculation Functions:
- AVERAGE() // finds average
- MIN or MAX() // finds min and max
- COUNTIF // returns a total number of cells that matches a specified value. put conditions in
parenthesis. SYNTAX: COUNTIF(RANGE,"CONDITION")
- COUNTIFS // like COUNTIF but can include multiple conditions
- SUMIF // a function that adds numeric data based on one condition
SYNTAX: SUMIF(RANGE,"CONDITION",2ndRANGE)
- SUMIFS // like SUMIF but can include multiple conditions
- AVERAGEIF // like SUMIF but averages instead of summing
- MAXIFS // finds max value in a range with conditions
SYNTAX: MAXIFS(rangetofindmaxof, rangeofcriteria, "criteria")
- SUMPRODUCT // function that multiplies arrays and returns the sum of those products.
SYNTAX: SUMPRODUCT(array1,array2, . . .)
- MOD // modulus operator
SQL Temporary Tables:
*Temp tables are deleted after SQL session ends. They can hold results of different queries, this is
called staging.
Global temp tables are made available to all database users and dropped when all uses close it.
Local temp tables are made available to one user.
- WITH // a clause that is a type of temporary table that you can query from multiple times
SYNTAX:
WITH temptablename AS (
SELECT *
FROM tablename
WHERE condition )
// Now can call this temp table as a normal tablename
- INTO // copies data from one table to another but doesnt add the new table to database
SYNTAX: SELECT>INTO>FROM>WHERE
*Not recognized in BigQuery
- CREATE TABLE // creates a new table to database so anyone can use
SYNTAX: CREATE TABLE newtablename AS (SELECT>FROM>WHERE)
- CREATE TEMP TABLE // an alternative syntax of CREATE TABLE
- DROP TABLE // removes the table from database.
SYNTAX: DROP TABLE table_name
SQL Calculation Syntax:
- Addition:
SELECT column a, column b, column a + column b AS column c
FROM table_name
- Subtraction/Multiplication/Division:
Change the operator "+" to something else or use parentheses to use more operators
- Modulo Operator:
2 % 2 // this gives a remainder if it has any
- EXTRACT // lets us pull one part of a given date type data to use
SYNTAX: EXTRACT(YEAR FROM column) AS year,
- GROUP BY // command that group rows that have the same values from a table into
summary rows. Sums every related row into a single row.
SYNTAX: SELECT > FROM > WHERE > GROUP BY
- ORDER BY // lets us sort SQL data
SYNTAX: SELECT > FROM > WHERE > GROUP BY>ORDER BY
Types of data validation:
- Validation process makes sure the data is correct and have integrity before working on the data.
1. Data type
- Purpose // check the data matches the data type defined for a field. Format should match.
- EX: a date should be a date format
2. Data range
- Purchase // check that the data falls within an acceptable range of values defined for a field.
- EX: 3 is not in range of 1-12
3. Data constraints
- Purpose // check that the data meets certain conditions/criteria for a field.
- EX: ints and not floats
4. Data consistency
- Purpose // check that the data makes sense in the context of other related data.
- EX: product shipping data can't be earlier than production(made) date.
5. Data structure
- Purpose // check that the data follows or conforms to a set structure
- EX: web pages must follow a prescribed structure to be displayed properly
6. Code validation
- Purpose // check that the application code systematically performs any of the previously
mentioned validations during user data input.
- EX: a system accepting a password that went against the specified limits/constraints
Additional Resources:
https://exceljet.net/excel-functions/excel-ifs-function
https://exceljet.net/formula/vlookup-with-multiple-criteria
https://exceljet.net/formula/index-and-match-with-multiple-criteria
https://support.microsoft.com/en-us/office/using-if-with-and-or-and-not-functions-d895f58c-b36c-419e-b1f2-5c193a236d97
Pivot Tables in Excel:
https://support.microsoft.com/en-us/office/calculate-values-in-a-pivottable-11f41417-da80-435c-a5c6-b0185e59da77
https://exceljet.net/pivot-table/pivot-table-calculated-field-example
https://powerspreadsheets.com/pivottable-calculated-fields/
https://support.microsoft.com/en-us/office/sort-data-in-a-pivottable-or-pivotchart-e41f7107-b92d-44ef-861f-24430830450a
https://www.tutorialspoint.com/excel_pivot_tables/excel_pivot_tables_sorting_data.htm
https://exceljet.net/lessons/how-to-sort-a-pivot-table-by-value
https://support.microsoft.com/en-us/office/filter-data-in-a-pivottable-cc1ed287-3a97-4e95-b377-ddfafe79fa8f
https://www.dummies.com/software/microsoft-office/excel/how-to-filter-excel-pivot-table-data/
https://support.microsoft.com/en-us/office/design-the-layout-and-format-of-a-pivottable-a9600265-95bf-4900-868e-641133c05a80
Temp Tables
https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#temporary_tables
https://www.pascallandau.com/bigquery-snippets/use-temporary-tables-with-named-subquery/?utm_source=blog&utm_medium=rss&utm_campaign=development-feed
https://codingsight.com/introduction-to-temporary-tables-in-sql-server/
https://www.sqlservertutorial.net/sql-server-basics/sql-server-temporary-tables/
https://www.red-gate.com/hub/product-learning/sql-prompt/choosing-table-variables-temporary-tables
Comments
Post a Comment