23. Data Analytics - Analyze Data to Answer Questions - Week 4

Learning functions in spreadsheet and SQL. 

Learning how to use PIVOTTABLES.

Learn to check and recheck as part of data validation process to make sure everything is accurate.

        Whenever do calculations, recheck with another simple check process for accuracy test.

        See if both processes have the same results.

Data validation process is a form of data cleaning.

Learn how to use temporary tables in SQL.


Definition:

Conditional formatting (Spreadsheet) // spreadsheet tool that changes how cells appear when values meet specific conditions

Array // a collection of values in cells

Profit margin // percentage that indicates how many cents of profit has been generated for each dollar of sale

Pivot tables (Spreadsheet) // let you view data in multiple ways to find insights and trends

Calculated Field (Spreadsheet Pivot Table) // a new field within a pivot table that carries out certain calculations based on the values of other fields

Operator (SQL) // a symbol that names the type of operation or calculation to be performed in a formula

Modulo // an operator (%) returns the remainder when one number is divided by another

Underscores // line used to underline words and connect text characters

Data validation process / checking and rechecking the quality of your data so that it is complete, accurate, secure and consistent

Temporary table // database table that is created and exists temporarily on a database server


Spreadsheet Calculation Functions:

        - AVERAGE() // finds average

        - MIN or MAX() // finds min and max

        - COUNTIF // returns a total number of cells that matches a specified value. put conditions in 

                                parenthesis. SYNTAX: COUNTIF(RANGE,"CONDITION")

        - COUNTIFS // like COUNTIF but can include multiple conditions

        - SUMIF // a function that adds numeric data based on one condition

                            SYNTAX: SUMIF(RANGE,"CONDITION",2ndRANGE)

        - SUMIFS // like SUMIF but can include multiple conditions

        - AVERAGEIF // like SUMIF but averages instead of summing

        - MAXIFS // finds max value in a range with conditions

                                SYNTAX: MAXIFS(rangetofindmaxof, rangeofcriteria, "criteria")

        - SUMPRODUCT // function that multiplies arrays and returns the sum of those products.

                                            SYNTAX: SUMPRODUCT(array1,array2, . . .)

        - MOD // modulus operator


SQL Temporary Tables:

        *Temp tables are deleted after SQL session ends. They can hold results of different queries, this is

        called staging.

        Global temp tables are made available to all database users and dropped when all uses close it.

        Local temp tables are made available to one user.


        - WITH // a clause that is a type of temporary table that you can query from multiple times

                SYNTAX

                        WITH temptablename AS (

                                SELECT *

                                FROM tablename

                                WHERE condition )

                        // Now can call this temp table as a normal tablename

        - INTO // copies data from one table to another but doesnt add the new table to database

                    SYNTAX: SELECT>INTO>FROM>WHERE

                    *Not recognized in BigQuery

        - CREATE TABLE // creates a new table to database so anyone can use

                    SYNTAX: CREATE TABLE newtablename AS (SELECT>FROM>WHERE)

        - CREATE TEMP TABLE // an alternative syntax of CREATE TABLE

        - DROP TABLE // removes the table from database.

                    SYNTAX: DROP TABLE table_name


SQL Calculation Syntax:

        - Addition:

                SELECT column a, column b, column a + column b AS column c

                FROM table_name

        - Subtraction/Multiplication/Division:

                Change the operator "+" to something else or use parentheses to use more operators

        - Modulo Operator:

                2 % 2 // this gives a remainder if it has any

        - EXTRACT // lets us pull one part of a given date type data to use

                SYNTAX: EXTRACT(YEAR FROM column) AS year,

        - GROUP BY // command that group rows that have the same values from a table into 

                                    summary rows. Sums every related row into a single row.

                SYNTAX: SELECT > FROM > WHERE > GROUP BY

        - ORDER BY // lets us sort SQL data

                SYNTAX: SELECT > FROM > WHERE > GROUP BY>ORDER BY


Types of data validation:

        - Validation process makes sure the data is correct and have integrity before working on the data.


        1. Data type

                - Purpose // check the data matches the data type defined for a field. Format should match.

                - EX: a date should be a date format

        2. Data range

                - Purchase // check that the data falls within an acceptable range of values defined for a field.

                - EX: 3 is not in range of 1-12

        3. Data constraints

                - Purpose // check that the data meets certain conditions/criteria for a field. 

                - EX: ints and not floats

        4. Data consistency

                - Purpose // check that the data makes sense in the context of other related data.

                - EX: product shipping data can't be earlier than production(made) date.

        5. Data structure

                - Purpose // check that the data follows or conforms to a set structure

                - EX: web pages must follow a prescribed structure to be displayed properly

        6. Code validation

                - Purpose // check that the application code systematically performs any of the previously 

                                    mentioned validations during user data input.

                - EX: a system accepting a password that went against the specified limits/constraints


Additional Resources:

https://exceljet.net/excel-functions/excel-ifs-function

https://exceljet.net/formula/vlookup-with-multiple-criteria

https://exceljet.net/formula/index-and-match-with-multiple-criteria

https://support.microsoft.com/en-us/office/using-if-with-and-or-and-not-functions-d895f58c-b36c-419e-b1f2-5c193a236d97

Pivot Tables in Excel:

https://support.microsoft.com/en-us/office/calculate-values-in-a-pivottable-11f41417-da80-435c-a5c6-b0185e59da77

https://exceljet.net/pivot-table/pivot-table-calculated-field-example

https://powerspreadsheets.com/pivottable-calculated-fields/

https://support.microsoft.com/en-us/office/sort-data-in-a-pivottable-or-pivotchart-e41f7107-b92d-44ef-861f-24430830450a

https://www.tutorialspoint.com/excel_pivot_tables/excel_pivot_tables_sorting_data.htm

https://exceljet.net/lessons/how-to-sort-a-pivot-table-by-value

https://support.microsoft.com/en-us/office/filter-data-in-a-pivottable-cc1ed287-3a97-4e95-b377-ddfafe79fa8f

https://www.dummies.com/software/microsoft-office/excel/how-to-filter-excel-pivot-table-data/

https://support.microsoft.com/en-us/office/design-the-layout-and-format-of-a-pivottable-a9600265-95bf-4900-868e-641133c05a80

Temp Tables

https://cloud.google.com/bigquery/docs/reference/standard-sql/data-definition-language#temporary_tables

https://www.pascallandau.com/bigquery-snippets/use-temporary-tables-with-named-subquery/?utm_source=blog&utm_medium=rss&utm_campaign=development-feed

https://codingsight.com/introduction-to-temporary-tables-in-sql-server/

https://www.sqlservertutorial.net/sql-server-basics/sql-server-temporary-tables/

https://www.red-gate.com/hub/product-learning/sql-prompt/choosing-table-variables-temporary-tables

Comments

Popular posts from this blog

2. FreeCodeCamp - Dynamic Programming - Learn to Solve Algorithmic Problems & Coding Challenges

20. Data Analytics - Analyze Data to Answer Questions - Week 1

3. Algorithms - Selection Sort