10. Data Analytics - Prepare Data for Exploration - Week 1

Definitions:

First-party data // data collected by an individual or group using their own resources

Second-party data // data collected by a group directly from its audience and then sold

Third-party data // data collected from outside sources who did not collect it directly

Population // all possible data values in a certain dataset

Sample // a part of a population that is representative of the population

Data model // a model that is used for organizing data elements and how they relate to one another

Data elements // pieces of information, such as people's names, account numbers, and addresses

Data Type // a specific kind of data attribute that tells what kind of value the data is

Text or String Data Type // a sequence of characters and punctuation that contains textual information

Boolean Data Type // a data type with only two possible values, true or false

Wide Data // data in which every data subject has a single row with multiple columns to hold the values of various attributes of the subject. Example: splitting data into multiple columns for organization.

Long Data // data in which each row is one time point per subject, so each subject will have data in multiple rows. Example: squeezing data into as few columns as possible.
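The wide and long definitions above can be sketched with a small conversion. This is only an illustration using the standard library; the data, the column names, and the helper wide_to_long are hypothetical and not part of the course material.

```python
# Hypothetical wide-format data: one row per subject (country),
# with one column per year holding that year's value.
wide_rows = [
    {"country": "A", "2019": 10, "2020": 12},
    {"country": "B", "2019": 7, "2020": 9},
]


def wide_to_long(rows, id_field, key_name="year", value_name="value"):
    """Turn every non-id column into its own row: (id, key, value)."""
    long_rows = []
    for row in rows:
        for key, value in row.items():
            if key == id_field:
                continue  # the id column identifies the subject, it is not a measurement
            long_rows.append({id_field: row[id_field], key_name: key, value_name: value})
    return long_rows


# Long format: each subject now appears in several rows, one per time point.
long_rows = wide_to_long(wide_rows, id_field="country")
for record in long_rows:
    print(record)
```

The wide table has 2 rows and 3 columns; the long version has 4 rows and the same 3 column names no matter how many years are added.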

Boolean Data Type:

    - Boolean operators: AND, OR, NOT
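As a minimal sketch of how AND, OR, and NOT combine true/false values, here is a Python example. The order record and its field names are made up for illustration.

```python
# Hypothetical record used only for illustration.
order = {"total": 120.0, "is_member": True, "is_refunded": False}

# A comparison produces a Boolean value (True or False).
big_order = order["total"] > 100

# AND: true only when both conditions are true.
member_discount = big_order and order["is_member"]

# OR: true when at least one condition is true.
needs_review = big_order or order["is_refunded"]

# NOT: flips True to False and False to True.
is_final = not order["is_refunded"]

print(member_discount, needs_review, is_final)
```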

    

Data Modeling:

    - Creating diagrams that represent data that is organized and structured. 

    - These diagrams are called data models.


Types of Data Modeling:

    - Conceptual: High-level view of the data structure. Often used to define the business requirements for a database.

    - Logical: Focuses on the technical details of a database, such as relationships, attributes, and entities.

    - Physical: Depicts how a database operates. Defines all entities and attributes used.

Data modeling techniques:

    - Entity Relationship Diagram (ERD)

            A visual way of understanding the relationships between entities in a data model.

    - Unified Modeling Language (UML)

            Very detailed diagrams that describe the structure of a system by showing the system's entities, attributes, operations, and their relationships.
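To make "entities, attributes, and relationships" concrete, here is a rough sketch of what a tiny model might look like in code. The Customer and Order entities are hypothetical examples, not taken from the course; a real ERD or UML diagram would be drawn rather than coded.

```python
from dataclasses import dataclass, field
from typing import List


@dataclass
class Order:
    """Entity with two attributes."""
    order_id: int
    amount: float


@dataclass
class Customer:
    """Entity with attributes plus a one-to-many relationship to Order."""
    customer_id: int
    name: str
    orders: List[Order] = field(default_factory=list)


# One customer (entity) related to two orders (entities).
customer = Customer(customer_id=1, name="Ada")
customer.orders.append(Order(order_id=100, amount=25.0))
customer.orders.append(Order(order_id=101, amount=40.0))

# The relationship lets us answer questions that span both entities.
total_spent = sum(order.amount for order in customer.orders)
print(total_spent)
```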


Types of Data:

Qualitative Data // subjective and explanatory measures of qualities and characteristics

Quantitative Data // specific and objective measures of numerical facts

Discrete Data // data that is counted and has a limited number of values.

        Example: counts are whole numbers (integers), not decimals. There is no value between 1 and 2.

Continuous Data // data that is measured and can have almost any numeric value

Nominal Data // a type of qualitative data that is categorized without a set order

        Example: yes, no, not sure. These don't have a particular order.

Ordinal Data // a type of qualitative data with a set order or scale

        Example: ranking movies

Internal Data // data that lives within a company's own systems.

External Data // data that lives and is generated outside of an organization

Structured Data // data organized in a certain format such as rows and columns

        Example: expense reports, tax returns, quantitative data

        Sources: spreadsheets, databases

        Easy to search, organize, analyze. Stored in relational databases and data warehouses.

Unstructured Data // data that is not organized in any easily identifiable manner

        Example: social media posts, emails, videos, photos, audio, qualitative data

        Varied data types.

        Difficult to search, but more freedom for analysis.
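The nominal/ordinal and discrete/continuous distinctions above can be sketched in a few lines. All values here are invented for illustration, and the rating scale is an assumed example of a "set order".

```python
from collections import Counter

# Nominal: categories with no set order, so counting is meaningful but sorting is not.
answers = ["yes", "no", "not sure", "yes", "yes"]
counts = Counter(answers)
most_common_answer = counts.most_common(1)[0][0]

# Ordinal: categories with a set order, so they can be ranked.
rating_order = ["bad", "okay", "good", "great"]  # assumed scale, lowest to highest
ratings = ["good", "bad", "great", "okay"]
sorted_ratings = sorted(ratings, key=rating_order.index)

# Discrete: counted, whole-number values.
ticket_count = 3
# Continuous: measured, can take almost any numeric value.
runtime_minutes = 101.5

print(most_common_answer, sorted_ratings, ticket_count, runtime_minutes)
```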


Data Types in Spreadsheet:

    - Number, String, Boolean

    - Columns = Fields

    - Rows = Records
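As a small illustration of fields, records, and the three spreadsheet data types, the sketch below reads a made-up CSV export. The column names and the TRUE/FALSE text convention are assumptions for this example.

```python
import csv
import io

# Hypothetical spreadsheet exported as CSV: the header row names the fields (columns).
csv_text = "name,age,is_member\nAda,36,TRUE\nGrace,45,FALSE\n"

reader = csv.DictReader(io.StringIO(csv_text))
fields = reader.fieldnames  # columns = fields

records = []  # rows = records
for row in reader:
    records.append({
        "name": row["name"],                       # String
        "age": int(row["age"]),                    # Number
        "is_member": row["is_member"] == "TRUE",   # Boolean
    })

print(fields)
for record in records:
    print(record)
```

The csv module reads every cell as text, so the number and Boolean columns are converted explicitly.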


Software that stores and organizes data:

    - Spreadsheets

    - Relational databases


Preparing data correctly:

    - Understand different types of data and data structures

    - Decide what type of data is right for the question you're answering

    - Build practical skills for extracting, using, organizing, and protecting your data


How data is collected:

    - Interviews

    - Observations: through experiments

    - Forms, Questionnaires, Surveys, Cookies


Data collection considerations:

    - How the data is collected

            First-party? Second-party? Third-party?

    - Choose data sources

            Second-party or third-party data providers.

    - Decide what data to use

    - How much data to collect

            Sample or population

    - Select the right data type

            Data metrics, formats, etc.

    - Determine the time frame

            If you need an answer immediately, you may have to rely on historical data.
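For the "sample or population" consideration above, here is a minimal sketch of drawing a simple random sample and comparing it to the full population. The population is simulated (made-up ages), and the sample size of 500 and the seed are arbitrary choices for illustration.

```python
import random
import statistics

# Seeded generator so the sketch is repeatable.
rng = random.Random(42)

# Simulated population: 10,000 made-up ages between 18 and 80.
population = [rng.randint(18, 80) for _ in range(10_000)]

# Simple random sample without replacement: every member has an equal chance.
sample = rng.sample(population, k=500)

population_mean = statistics.mean(population)
sample_mean = statistics.mean(sample)

print(f"population mean: {population_mean:.2f}")
print(f"sample mean:     {sample_mean:.2f}")
```

A random sample like this tends to be representative, so its mean should land close to the population mean while using only a fraction of the data.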


Additional Resources:

https://www.1keydata.com/datawarehousing/data-modeling-levels.html

https://dataedo.com/blog/basic-data-modeling-techniques
