11. Data Analytics - Prepare Data for Exploration - Week 2

Definitions:

Bias // a preference in favor of or against a person, group of people, or thing

Data bias // a type of error that systematically skews results in a certain direction

Sampling bias // when a sample isn't representative of the population as a whole

Unbiased sampling // when a sample is representative of the population being measured

Ethics // well-founded standards of right and wrong that prescribe what humans ought to do, usually in terms of rights, obligations, benefits to society, fairness, or specific virtues

Data ethics // well-founded standards of right and wrong that dictate how data is collected, shared, and used

GDPR // General Data Protection Regulation of the European Union

Data anonymization // process of protecting people's private or sensitive data by eliminating personally identifiable information (PII). Techniques: blanking, hashing, masking, or replacing values with codes or altered text.

De-identification // process used to wipe data clean of all personally identifying information

Data interoperability // ability of data systems and services to openly connect and share data. Commonly used in the healthcare industry.


What data should be anonymized?

    - healthcare and financial data

    - phone numbers, names, license plates and license numbers, Social Security numbers (SSN), IP addresses, medical records, email addresses, photos, etc.
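
The blanking, masking, and hashing techniques named in the definitions can be sketched in a few lines of Python. This is only an illustration: the record, its field names, the helper functions, and the "demo-salt" value are all made up for the example, and a real project would follow its own privacy policy and legal requirements.

```python
import hashlib

def blank(value):
    """Blanking: drop the value entirely."""
    return ""

def mask(value, visible=4, fill="*"):
    """Masking: hide everything except the last `visible` characters."""
    if len(value) <= visible:
        return fill * len(value)
    return fill * (len(value) - visible) + value[-visible:]

def hash_value(value, salt):
    """Hashing: replace the value with a salted SHA-256 hex digest."""
    return hashlib.sha256((salt + value).encode("utf-8")).hexdigest()

# Hypothetical record containing PII (all values are invented).
record = {
    "name": "Jane Doe",
    "email": "jane@example.com",
    "ssn": "123-45-6789",
    "visit_reason": "checkup",
}

anonymized = {
    "name": blank(record["name"]),
    "email": hash_value(record["email"], salt="demo-salt"),  # assumed salt
    "ssn": mask(record["ssn"]),
    "visit_reason": record["visit_reason"],  # not PII, kept as-is
}

print(anonymized)
```

Hashing keeps the same input mapped to the same code, so rows can still be joined or counted without showing the original value. On its own it may not be enough for values that are easy to guess, so treat it as one layer of protection.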


Aspects of data ethics

    - ownership

            Individuals own the raw data they provide and they have primary control over its usage,

            how it's processed, and how it's shared.

    - transaction transparency

            All data-processing activities and algorithms should be completely explainable and understood

            by the individual who provides their data.

    - consent

            An individual's right to know explicit details about how and why their data will be used

            before agreeing to provide it.

    - currency

            Individuals should be aware of financial transactions resulting from the use of their personal

            data and the scale of these transactions.

    - privacy

            Preserving a data subject's information and activity any time a data transaction occurs. This includes:

                - Protection from unauthorized access to our private data.

                - Freedom from inappropriate use of our data.

                - The right to inspect, update, or correct our data.

                - Ability to give consent to use our data.

                - Legal right to access the data.

    - openness (or open data)

            Free access, usage, and sharing of data

                - Availability and access // available and accessible

                - Reuse and redistribution // allows reuse and redistribution of data

                - Universal participation // no restrictions on who can use the data


Identifying good data:

    R = reliable. Not biased.

    O = original. First-party data.

    C = comprehensive. Contains all the information needed to answer the question or find the solution.

    C = current. The usefulness of data decreases as time passes.

    C = cited. Citing the source makes the information credible.

    *Good data ROCCCs!

Identifying bad data:

    The opposite of ROCCC: unreliable, not original, incomplete, outdated, or uncited.

    Every good solution is found by avoiding bad data.

    

Types of data bias:

    - sampling bias 

            When a sample isn't representative of the population as a whole

    - observer bias (experimenter bias/research bias)

            The tendency for different people to observe things differently 

    - interpretation bias

            The tendency to always interpret ambiguous situations in a positive or negative way

    - confirmation bias

            The tendency to search for or interpret information in a way that confirms pre-existing beliefs


Bias Occurs:

    - during data collection

    - during planning, for example by not being inclusive

    - subconsciously or consciously

Solution:

    - select the sample at random from the population, so every member has an equal chance of being chosen
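
The random-selection solution above can be compared with a biased sample in a short Python sketch. The population, the "region" field, the 80/20 split, and the seed are invented for illustration, and the standard-library random module stands in for whatever sampling tool is actually used.

```python
import random

# Hypothetical population: 1000 people, 80% in the north, 20% in the south.
population = [
    {"id": i, "region": "north" if i < 800 else "south"}
    for i in range(1000)
]

def share(rows, region):
    """Fraction of rows that belong to the given region."""
    return sum(1 for r in rows if r["region"] == region) / len(rows)

# Biased (convenience) sample: just take the first 100 rows.
biased = population[:100]

# Random sample: every member has an equal chance of being picked.
rng = random.Random(42)  # fixed seed so the run is repeatable
sample = rng.sample(population, k=100)

print("population north share:", share(population, "north"))
print("biased sample north share:", share(biased, "north"))
print("random sample north share:", share(sample, "north"))
```

The first 100 rows contain only the north region, so that sample says nothing about the south. The random sample should land much closer to the population's 80/20 split, although any single draw will vary a little.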


Ensuring Data Integrity Process:

    - Analyze data for bias and credibility

    - Good vs. bad data

    - Data ethics, privacy, and access


Additional Resources:

https://www.data.gov/

https://www.census.gov/data.html

https://www.opendatanetwork.com/

https://cloud.google.com/public-datasets

https://datasetsearch.research.google.com/
