10. Data Analytics - Prepare Data for Exploration - Week 1
Definitions:
First-party data // data collected by an individual or group using their own resources
Second-party data // data collected by a group directly from its audience and then sold
Third-party data // data collected from outside sources who did not collect it directly
Population // all possible data values in a certain dataset
Sample // a part of a population that is representative of the population
Data model // a model that is used for organizing data elements and how they relate to one another
Data elements // pieces of information, such as people's names, account numbers, and addresses
Data Type // a specific kind of data attribute that tells what kind of value the data is
Text or String Data Type // a sequence of characters and punctuation that contains textual information
Boolean Data Type // a data type with only two possible values, true or false
Wide Data // data in which every data subject has a single row with multiple columns to hold the values of various attributes of the subject. Example: splitting data into multiple columns for organization.
Long Data // data in which each row is one time point per subject, so each subject will have data in multiple rows. Example: squeezing data into as few columns as possible.
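To make the wide vs. long definitions above concrete, here is a minimal sketch that reshapes wide rows into long rows. The data and the helper name `wide_to_long` are made up for illustration; in a spreadsheet or a dataframe library this would normally be done with a built-in reshape feature.

```python
# Hypothetical wide data: one row per country, one column per year.
wide_rows = [
    {"country": "A", "2019": 10, "2020": 12},
    {"country": "B", "2019": 7, "2020": 9},
]


def wide_to_long(rows, id_field, variable_name, value_name):
    """Turn every non-id column of a wide row into its own long row."""
    long_rows = []
    for row in rows:
        for column, value in row.items():
            if column == id_field:
                continue  # the id column is repeated on each long row instead
            long_rows.append(
                {id_field: row[id_field], variable_name: column, value_name: value}
            )
    return long_rows


long_rows = wide_to_long(wide_rows, "country", "year", "population")
for r in long_rows:
    print(r)
```

Each country had one row in the wide form and has one row per year in the long form.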
Boolean Data Type:
- Logical operators: AND, OR, NOT
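A small sketch of how AND, OR, and NOT combine Boolean values when filtering records. The customer records and field names are invented for illustration.

```python
# Hypothetical records with two Boolean fields each.
customers = [
    {"name": "Ana", "is_member": True, "has_coupon": False},
    {"name": "Ben", "is_member": True, "has_coupon": True},
    {"name": "Cy", "is_member": False, "has_coupon": True},
]

# AND: both conditions must be true.
member_and_coupon = [c["name"] for c in customers if c["is_member"] and c["has_coupon"]]

# OR: at least one condition must be true.
member_or_coupon = [c["name"] for c in customers if c["is_member"] or c["has_coupon"]]

# NOT: flips true to false and false to true.
not_member = [c["name"] for c in customers if not c["is_member"]]

print(member_and_coupon, member_or_coupon, not_member)
```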
Data Modeling:
- Creating diagrams that represent data that is organized and structured.
- These diagrams are called data models.
Types of Data Modeling:
- Conceptual: High-level view of the data structure. Often used to define the business requirements for a database.
- Logical: Focuses on the technical details of a database, such as relationships, attributes, and entities.
- Physical: Depicts how the database operates. Defines all entities and attributes used.
Data modeling techniques:
- Entity Relationship Diagram (ERD)
A visual way of understanding the relationships between entities in a data model.
- Unified Modeling Language (UML)
Very detailed diagrams that describe the structure of a system by showing the system's entities, attributes, operations, and their relationships.
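The entities, attributes, and relationships that an ERD or UML diagram draws can also be sketched in code. This is only an illustration with two hypothetical entities, `Customer` and `Order`, linked by a shared `customer_id`; it is not taken from the course material.

```python
from dataclasses import dataclass


@dataclass
class Customer:
    # Entity "Customer" with two attributes.
    customer_id: int
    name: str


@dataclass
class Order:
    # Entity "Order"; customer_id points back to the Customer who placed it,
    # which models a one-to-many relationship (one customer, many orders).
    order_id: int
    customer_id: int
    total: float


customers = [Customer(1, "Ana"), Customer(2, "Ben")]
orders = [Order(100, 1, 25.0), Order(101, 1, 40.0), Order(102, 2, 15.5)]


def orders_for(customer, all_orders):
    """Follow the relationship from one customer to that customer's orders."""
    return [o for o in all_orders if o.customer_id == customer.customer_id]


ana_orders = orders_for(customers[0], orders)
print([o.order_id for o in ana_orders])
```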
Types of Data:
Qualitative Data // subjective and explanatory measures of qualities and characteristics
Quantitative Data // specific and objective measures of numerical facts
Discrete Data // data that is counted and has a limited number of values.
Example: whole-number counts (integers, not floats). There is no value in between 1 and 2.
Continuous Data // data that is measured and can have almost any numeric value
Nominal Data // a type of qualitative data that is categorized without a set order
Example: yes, no, not sure. These don't have a particular order.
Ordinal Data // a type of qualitative data with a set order or scale
Example: ranking movies
Internal Data // data that lives within a company's own systems.
External Data // data that lives and is generated outside of an organization
Structured Data // data organized in a certain format such as rows and columns
Example: expense reports, tax returns, quantitative data
Sources: spreadsheets, databases
Easy to search, organize, analyze. Stored in relational databases and data warehouses.
Unstructured Data // data that is not organized in any easily identifiable manner
Example: social media posts, emails, videos, photos, audio, qualitative data
Varied data types.
Difficult to search, but more freedom for analysis.
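A short sketch of how some of the data types above behave differently in practice: nominal values can only be counted, ordinal values can be sorted by their scale, discrete values are whole-number counts, and continuous values are measurements. All values and the three-step rating scale are hypothetical.

```python
from collections import Counter

# Nominal: categories without a set order, so counting makes sense but ranking does not.
answers = ["yes", "no", "not sure", "yes", "yes"]
answer_counts = Counter(answers)

# Ordinal: categories with a set order (hypothetical scale, lowest to highest).
scale = ["bad", "okay", "great"]
ratings = ["great", "bad", "okay", "great"]
sorted_ratings = sorted(ratings, key=scale.index)

# Discrete: counted, whole numbers only.
tickets_sold = 42

# Continuous: measured, can take in-between values.
runtime_minutes = 98.6

print(answer_counts.most_common(1))
print(sorted_ratings)
```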
Data Types in Spreadsheet:
- Number, String, Boolean
- Columns = Fields
- Rows = Records
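The same ideas show up when a spreadsheet is exported as CSV: the header row names the fields, and every following row is one record. The sketch below uses made-up data and converts each value to a number, string, or Boolean by hand, assuming the Boolean column is written as TRUE/FALSE.

```python
import csv
import io

# Hypothetical spreadsheet export: first line = fields, remaining lines = records.
raw = "name,age,is_member\nAna,34,TRUE\nBen,27,FALSE\n"

reader = csv.DictReader(io.StringIO(raw))
fields = reader.fieldnames  # column headers

records = []
for row in reader:
    records.append(
        {
            "name": row["name"],                      # String
            "age": int(row["age"]),                   # Number
            "is_member": row["is_member"] == "TRUE",  # Boolean
        }
    )

print(fields)
print(records)
```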
Software that stores and organizes data:
- Spreadsheets
- Relational databases
Preparing data correctly:
- Understand different types of data and data structures
- What type of data is right for the question you're answering
- Practical skills on how to extract, use, organize and protect your data
How data is collected:
- Interviews
- Observations: through experiments
- Forms, Questionnaires, Surveys, Cookies
Data collection considerations:
- How the data is collected
First-party? Second-party? Third-party?
- Choose data sources
Second- or third-party data providers.
- Decide what data to use
- How much data to collect
Sample or population
- Select the right data type
Data metrics, formats, etc
- Determine the time frame
If an answer is needed immediately, use historical data that already exists, since there may be no time to collect new data.
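For the "sample or population" decision in the list above, a rough sketch of drawing a random sample and comparing it to the full population. The ages are randomly generated stand-ins, and the sample size of 500 is an arbitrary choice for illustration.

```python
import random
import statistics

rng = random.Random(0)  # fixed seed so the sketch is repeatable

# Hypothetical population: ages of 10,000 people.
population = [rng.randint(18, 80) for _ in range(10_000)]

# Sample: a randomly chosen subset, much cheaper to collect than the whole population.
sample = rng.sample(population, 500)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean: {population_mean:.1f}")
print(f"sample mean:     {sample_mean:.1f}")
```

If the sample is drawn at random, its mean should land close to the population mean, which is the sense in which a sample is "representative".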
Additional Resources:
https://www.1keydata.com/datawarehousing/data-modeling-levels.html
https://dataedo.com/blog/basic-data-modeling-techniques