12. Data Analytics - Prepare Data for Exploration - Week 3

Definitions:

Database // a collection of data stored in a computer system

Meta // referring to itself or to the conventions of its genre

Metadata // data about data

Relational database // a database that contains a series of related tables that can be connected via their relationships

        For tables to be related, one or more items must exist in both tables. Usually these are "keys".

Primary Key // an identifier that references a column in which each value is unique. A table only has one.

        - Used to ensure data in a specific column is unique

        - Uniquely identifies a record in a relational database table

        - Only one primary key is allowed in a table

        - Cannot contain null or blank values

Foreign key // a field within a table that is a primary key in another table.

        It is how a table connects to another.

        - A column or group of columns in a relational database table that provides a link between the data in two tables.

        - Refers to the field in a table that's the primary key of another table

        - More than one foreign key is allowed to exist in a table
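The primary key and foreign key rules above can be sketched with Python's built-in sqlite3 module. This is only an illustration: the customers and orders tables, their columns, and the sample rows are invented here, and SQLite only enforces foreign keys once the PRAGMA is switched on.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# SQLite checks foreign keys only when this setting is enabled.
conn.execute("PRAGMA foreign_keys = ON")

# Hypothetical tables: customer_id is the primary key of customers
# and a foreign key inside orders.
conn.execute("""
    CREATE TABLE customers (
        customer_id INTEGER PRIMARY KEY,
        name        TEXT NOT NULL
    )
""")
conn.execute("""
    CREATE TABLE orders (
        order_id    INTEGER PRIMARY KEY,
        customer_id INTEGER NOT NULL REFERENCES customers(customer_id),
        item        TEXT NOT NULL
    )
""")

conn.executemany("INSERT INTO customers VALUES (?, ?)", [(1, "Ana"), (2, "Ben")])
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(10, 1, "laptop"), (11, 1, "mouse"), (12, 2, "desk")],
)

# The shared key is what lets the two tables be connected.
rows = conn.execute("""
    SELECT c.name, o.item
    FROM orders AS o
    JOIN customers AS c ON c.customer_id = o.customer_id
    ORDER BY o.order_id
""").fetchall()

# A repeated primary key value is rejected.
try:
    conn.execute("INSERT INTO customers VALUES (1, 'Duplicate')")
    duplicate_rejected = False
except sqlite3.IntegrityError:
    duplicate_rejected = True

# An order pointing at a customer that does not exist is rejected.
try:
    conn.execute("INSERT INTO orders VALUES (13, 99, 'chair')")
    orphan_rejected = False
except sqlite3.IntegrityError:
    orphan_rejected = True

print(rows, duplicate_rejected, orphan_rejected)
```

Other database engines enforce foreign keys by default, so the PRAGMA line is specific to SQLite.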

Composite Key // a primary key constructed using multiple columns of a table
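A small sketch of a composite key, again using sqlite3. The enrollments table is a made-up example: neither student_id nor course_id is unique on its own, but the pair is.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table: the primary key is built from two columns.
conn.execute("""
    CREATE TABLE enrollments (
        student_id INTEGER NOT NULL,
        course_id  TEXT NOT NULL,
        grade      TEXT,
        PRIMARY KEY (student_id, course_id)
    )
""")

conn.execute("INSERT INTO enrollments VALUES (1, 'SQL101', 'A')")
conn.execute("INSERT INTO enrollments VALUES (1, 'PY201', 'B')")   # same student, different course
conn.execute("INSERT INTO enrollments VALUES (2, 'SQL101', 'C')")  # same course, different student

# Repeating the full (student_id, course_id) pair violates the key.
try:
    conn.execute("INSERT INTO enrollments VALUES (1, 'SQL101', 'F')")
    pair_rejected = False
except sqlite3.IntegrityError:
    pair_rejected = True

row_count = conn.execute("SELECT COUNT(*) FROM enrollments").fetchone()[0]
print(pair_rejected, row_count)
```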

SQL // structured query language, a query language that lets analysts communicate with a database

Metadata repository // a database specifically created to store metadata

        - Makes it easier and faster to bring together multiple sources for data analysis by:

            - Describing the state and location of the metadata

            - Describing the structure of the tables inside

            - Describing how the data flows through the repository

            - Keeping track of who accesses the metadata and when
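SQLite is not a metadata repository, but its built-in catalog gives a rough feel for "describing the structure of the tables inside". The sketch below assumes a made-up sales table and reads its structure back from sqlite_master and PRAGMA table_info.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Hypothetical table used only to have something to describe.
conn.execute(
    "CREATE TABLE sales (sale_id INTEGER PRIMARY KEY, region TEXT, amount REAL)"
)

# sqlite_master is SQLite's own catalog: data about the database's tables.
table_names = [
    row[0]
    for row in conn.execute(
        "SELECT name FROM sqlite_master WHERE type = 'table' ORDER BY name"
    )
]

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
columns = [
    (name, col_type, bool(pk))
    for _cid, name, col_type, _notnull, _default, pk
    in conn.execute("PRAGMA table_info(sales)")
]

print(table_names)
print(columns)
```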

Data governance // a process to ensure the formal management of a company's data assets

Data lake // a pool of data from a variety of sources to be used for analysis

Openness (open data) // free access, usage, and sharing of data

CSV (comma-separated values) // a file format that saves data in a table format. It uses plain text, and the values are separated by a character such as a comma.

Delineator (more commonly called a delimiter) // a character that indicates a boundary or separation between two values
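A minimal sketch of how the separating character works, using Python's csv module. The sample text is invented; the same table is written once with commas and once with tabs, and only the delimiter argument changes.

```python
import csv
import io

# The same made-up table, separated by commas and by tabs.
comma_text = "name,city\nAna,Lisbon\nBen,Oslo\n"
tab_text = "name\tcity\nAna\tLisbon\nBen\tOslo\n"

# DictReader uses the first row as column names; comma is the default separator.
comma_rows = list(csv.DictReader(io.StringIO(comma_text)))
tab_rows = list(csv.DictReader(io.StringIO(tab_text), delimiter="\t"))

print(comma_rows)
print(comma_rows == tab_rows)
```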

Sorting data // arranging data into a meaningful order to make it easier to understand, analyze, and visualize

Filtering // showing only the data that meets specific criteria while hiding the rest
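Sorting and filtering can be sketched in plain Python. The sales rows below are made up; in a spreadsheet or SQL the same ideas appear as sort ranges, filters, ORDER BY, and WHERE.

```python
# Hypothetical rows, as they might come out of a spreadsheet.
sales = [
    {"region": "West", "amount": 250},
    {"region": "East", "amount": 900},
    {"region": "West", "amount": 700},
    {"region": "North", "amount": 400},
]

# Sorting: arrange every row by amount, largest first.
sorted_sales = sorted(sales, key=lambda row: row["amount"], reverse=True)

# Filtering: keep only the rows that meet the criterion; the rest are hidden, not deleted.
west_only = [row for row in sales if row["region"] == "West"]

print([row["amount"] for row in sorted_sales])
print(west_only)
```

Both operations return new lists, so the original data stays untouched.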

Data Manipulation Language (DML) // the part of SQL used to add, change, and remove data in tables
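A minimal sketch of the three core data manipulation statements (INSERT, UPDATE, DELETE), run through Python's sqlite3 module. The inventory table and its values are invented for this example.

```python
import sqlite3

conn = sqlite3.connect(":memory:")

# Creating the table is data definition, not data manipulation.
conn.execute("CREATE TABLE inventory (sku TEXT PRIMARY KEY, qty INTEGER NOT NULL)")

# INSERT adds rows.
conn.executemany(
    "INSERT INTO inventory VALUES (?, ?)",
    [("A1", 5), ("B2", 0), ("C3", 12)],
)

# UPDATE changes existing rows.
conn.execute("UPDATE inventory SET qty = qty + 10 WHERE sku = ?", ("A1",))

# DELETE removes rows.
conn.execute("DELETE FROM inventory WHERE qty = 0")

remaining = conn.execute("SELECT sku, qty FROM inventory ORDER BY sku").fetchall()
print(remaining)
```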

Two types of data:

    Internal (primary): data that lives within a company's own systems

    External (secondary): data that lives and is generated outside an organization


Metadata:

Used in database management to help data analysts interpret the contents of the data within the database.

Metadata creates a single source of truth by keeping things consistent and uniform.

Metadata also makes data more reliable by making sure it's accurate, precise, relevant, and timely.

Metadata is stored in a single, central location, and gives the company standardized information about all of its data.

Elements: title, description, tags, categories, creator, last modified, who can access


Types of Metadata:

    - descriptive

            Metadata that describes a piece of data and can be used to identify it at a later point in time

            EX: ISBN

    - structural

            Metadata that indicates how a piece of data is organized and whether it is part of one, or more than one, data collection.

            EX: how pages are put together to create a chapter

    - administrative

            Metadata that indicates the technical source of a digital asset

            EX: size, date modified, type, image info, etc
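Administrative metadata such as size, date modified, and type can be read straight from the file system. The sketch below writes a small throwaway file (its name and contents are arbitrary) and collects those three fields with os.stat.

```python
import os
import tempfile
from datetime import datetime, timezone

# Create a throwaway file so there is something to inspect.
# newline="" keeps line endings identical on every platform.
with tempfile.NamedTemporaryFile("w", suffix=".csv", newline="", delete=False) as handle:
    handle.write("name,city\nAna,Lisbon\n")
    path = handle.name

info = os.stat(path)

# None of these values are inside the file; they describe the file.
admin_metadata = {
    "size_bytes": info.st_size,
    "modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
    "type": os.path.splitext(path)[1],
}

os.remove(path)  # clean up the temporary file
print(admin_metadata)
```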


Google BigQuery Sandbox

    - Log in with a Google account.

    - Upload your own data or use a public dataset.

    - Create a new query to start writing SQL.


Additional Resources:

https://www.thedataschool.co.uk/anna-prosvetova/web-scraping-made-easy-import-html-tables-or-lists-using-google-sheets-and-excel

https://support.google.com/docs/answer/3093339?hl=en

https://cloud.google.com/public-datasets

https://www.kaggle.com/datasets?utm_medium=paid&utm_source=google.com+search&utm_campaign=datasets&gclid=CjwKCAiAt9z-BRBCEiwA_bWv-L6PpACh6RzmrJjQjmNGCCE7kky1FCtc6Jf1qld-4NwDMYL0WsUyxBoCdwAQAvD_BwE

https://dev.mysql.com/doc/mysql-getting-started/en/

https://docs.microsoft.com/en-us/sql/relational-databases/tutorial-getting-started-with-the-database-engine?view=sql-server-ver15

https://www.postgresql.org/docs/10/tutorial-start.html

https://www.sqlite.org/quickstart.html
