12. Data Analytics - Prepare Data for Exploration - Week 3
Definitions:
Database // a collection of data stored in a computer system
Meta // referring to itself or to the conventions of its genre
Metadata // data about data
Relational database // a database that contains a series of related tables that can be connected via their relationship
For tables to be relational, one or more items must exist in both tables. Usually these are "keys".
Primary Key // an identifier that references a column in which each value is unique. A table only has one.
- Used to ensure data in a specific column is unique
- Uniquely identifies a record in a relational database table
- Only one primary key is allowed in a table
- Cannot contain null or blank values
Foreign key // a field within a table that is a primary key in another table.
It is how a table connects to another.
- A column or group of columns in a relational database table that provides a link between the data
in two tables.
- Refers to the field in a table that's the primary key of another table
- More than one foreign key is allowed to exist in a table
Composite Key // a primary key constructed using multiple columns of a table
SQL // structured query language, a query language that lets analysts communicate with a database
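The key definitions above can be tried out with a small sketch using Python's built-in sqlite3 module. The table and column names (customers, orders, order_items) are made up for illustration and do not come from the course. Note that SQLite only enforces foreign keys once the foreign_keys pragma is switched on.

```python
import sqlite3

# In-memory database; all table and column names are hypothetical.
conn = sqlite3.connect(":memory:")
conn.execute("PRAGMA foreign_keys = ON")  # SQLite enforces foreign keys only when enabled

conn.executescript("""
CREATE TABLE customers (
    customer_id INTEGER PRIMARY KEY,   -- primary key: unique, one per table
    name        TEXT NOT NULL
);
CREATE TABLE orders (
    order_id    INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(customer_id),  -- foreign key
    total       REAL
);
CREATE TABLE order_items (
    order_id    INTEGER NOT NULL REFERENCES orders(order_id),
    line_no     INTEGER NOT NULL,
    product     TEXT,
    PRIMARY KEY (order_id, line_no)    -- composite key: two columns together
);
""")

conn.execute("INSERT INTO customers VALUES (1, 'Ada')")
conn.execute("INSERT INTO orders VALUES (100, 1, 25.0)")
conn.execute("INSERT INTO order_items VALUES (100, 1, 'notebook')")
conn.execute("INSERT INTO order_items VALUES (100, 2, 'pen')")

# Connect the two tables through the key they share.
rows = conn.execute("""
    SELECT c.name, o.order_id, o.total
    FROM orders AS o
    JOIN customers AS c ON o.customer_id = c.customer_id
""").fetchall()


def violates(sql):
    """Return True if running the statement breaks a key constraint."""
    try:
        conn.execute(sql)
        return False
    except sqlite3.IntegrityError:
        return True


duplicate_pk = violates("INSERT INTO customers VALUES (1, 'Grace')")      # repeated primary key
missing_parent = violates("INSERT INTO orders VALUES (101, 99, 5.0)")     # no customer 99 exists
duplicate_composite = violates("INSERT INTO order_items VALUES (100, 1, 'ink')")  # same (order_id, line_no)
```

Each of the three rejected inserts corresponds to one rule above: a primary key value must be unique, a foreign key must point at an existing primary key, and a composite key is unique as a combination of columns.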
Metadata repository // a database specifically created to store metadata
- Makes it easier and faster to bring together multiple sources for data analysis by:
- Describing the state and location of metadata
- Describing the structure of the tables inside
- Describing how the data flows through the repository
- Keeping track of who accesses the metadata and when
Data governance // a process to ensure the formal management of a company's data assets
Data lake // a pool of data from a variety of sources to be used for analysis
Openness (open data) // free access, usage, and sharing of data
CSV (comma-separated values) // a file that saves data in a table format. CSV files use plain text and are delineated by characters, such as commas.
Delimiter // a character, such as a comma, that indicates a boundary or separation between two things
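As a rough illustration of how the separator character marks the column boundaries in a CSV file, here is a minimal sketch using Python's standard csv module. The sample rows are invented, and `delimiter` is the csv module's name for the separator character.

```python
import csv
import io

# Made-up sample data; in practice this text would be read from a .csv file.
raw = "name,city,sales\nAda,London,120\nGrace,New York,95\n"

# Each comma marks the boundary between two columns.
records = list(csv.DictReader(io.StringIO(raw), delimiter=","))

# The same table separated by semicolons instead of commas.
raw_semicolon = raw.replace(",", ";")
records_semicolon = list(csv.DictReader(io.StringIO(raw_semicolon), delimiter=";"))
```

Both reads produce the same rows. Every value comes back as plain text, so a number such as sales still has to be converted before doing arithmetic on it.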
Sorting data // arranging data into a meaningful order to make it easier to understand, analyze, and visualize
Filtering // showing only the data that meets specific criteria while hiding the rest
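Sorting and filtering as defined above might look like this in plain Python. The rows and the threshold of 100 are made up for illustration.

```python
# Hypothetical sales records.
rows = [
    {"name": "Ada", "sales": 120},
    {"name": "Grace", "sales": 95},
    {"name": "Linus", "sales": 150},
]

# Sorting: arrange every row in a meaningful order (highest sales first).
sorted_rows = sorted(rows, key=lambda r: r["sales"], reverse=True)

# Filtering: keep only the rows that meet the criterion, hide the rest.
filtered_rows = [r for r in rows if r["sales"] >= 100]
```

Sorting keeps every row and only changes the order, while filtering keeps the order and drops the rows that fail the test.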
Data Manipulation Language //
Two types of data:
Internal (primary): data that lives within a company's own systems
External (secondary): data that lives and is generated outside an organization
Metadata:
Used in database management to help data analysts interpret the contents of the data within the database.
Metadata creates a single source of truth by keeping things consistent and uniform.
Metadata also makes data more reliable by making sure it's accurate, precise, relevant, and timely.
Metadata is stored in a single, central location, and gives the company standardized information about all of its data.
Elements: title, description, tags, categories, creator, last modified, who can access
Types of Metadata:
- descriptive
Metadata that describes a piece of data and can be used to identify it at a later point in time
EX: ISBN
- structural
Metadata that indicates how a piece of data is organized and whether it is part of one, or more
than one, data collection.
EX: how pages are put together to create a chapter
- administrative
Metadata that indicates the technical source of a digital asset
EX: size, date modified, type, image info, etc
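The administrative examples above (size, date modified, type) can be read straight from the file system. This is a minimal sketch with Python's standard library. The throwaway file and the dictionary keys are assumptions chosen for illustration.

```python
import os
import tempfile
from datetime import datetime, timezone

# Create a small throwaway file so the example is self-contained.
with tempfile.NamedTemporaryFile("wb", suffix=".csv", delete=False) as f:
    f.write(b"name,sales\nAda,120\n")
    path = f.name

info = os.stat(path)  # the operating system's record about the file

# Hypothetical key names; the values are metadata, not the file's contents.
metadata = {
    "size_bytes": info.st_size,
    "last_modified": datetime.fromtimestamp(info.st_mtime, tz=timezone.utc),
    "type": os.path.splitext(path)[1],
}

os.remove(path)  # clean up the temporary file
```

None of these values is part of the data inside the file. They describe the file itself, which is what makes them metadata.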
Google BigQuery Sandbox:
- Log in with a Google account.
- Upload your own data or use a public dataset.
- Create a new query to start entering SQL queries.
Additional Resources:
https://www.thedataschool.co.uk/anna-prosvetova/web-scraping-made-easy-import-html-tables-or-lists-using-google-sheets-and-excel
https://support.google.com/docs/answer/3093339?hl=en
https://cloud.google.com/public-datasets
https://www.kaggle.com/datasets?utm_medium=paid&utm_source=google.com+search&utm_campaign=datasets&gclid=CjwKCAiAt9z-BRBCEiwA_bWv-L6PpACh6RzmrJjQjmNGCCE7kky1FCtc6Jf1qld-4NwDMYL0WsUyxBoCdwAQAvD_BwE
https://dev.mysql.com/doc/mysql-getting-started/en/
https://docs.microsoft.com/en-us/sql/relational-databases/tutorial-getting-started-with-the-database-engine?view=sql-server-ver15
https://www.postgresql.org/docs/10/tutorial-start.html
https://www.sqlite.org/quickstart.html