24. Data Analytics - Share Data Through the Art of Visualization - Week 1

Two ways to work with data visuals:

        - looking at visuals in order to understand and draw conclusions about data

        - creating visuals from raw data to tell a story

Quick rule for creating visualizations:

        - Audience should know what they're looking at within the first five seconds

        - Audience should then understand the conclusion the visualization is making 5 seconds after that

Data visualization is a great tool to fit a lot of information into a small space. 

        Steps: Organize and structure your thoughts > Think about patterns in the data and key findings


        The four elements of effective data visualization are the information (data), the story (concept), the goal (function), and the visual form (metaphor); a successful data visualization must have all four elements.


Definitions:

Data visualization // graphic representation and presentation of data

Causation // occurs when an action directly leads to an outcome

Correlation // measure of the degree to which two variables move in relationship to each other. EX: temperature goes up, ice cream sales go up.

            - Positive correlation  // when one factor goes up and the other goes up

            - Negative/Inverse correlation // when one factor goes down and the other goes up

            - No correlation // when one factor goes up/down and the other does nothing

            * CORRELATION DOES NOT MEAN CAUSATION
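
The three cases above can be checked numerically. A minimal sketch, assuming the Pearson coefficient as the measure (these notes do not name one), made-up temperature and sales figures, and an arbitrary 0.3 cutoff for calling something "no correlation":

```python
import math

def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length sequences."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    return cov / math.sqrt(var_x * var_y)

def describe(r, cutoff=0.3):
    """Label a coefficient; the cutoff is an illustrative assumption."""
    if r >= cutoff:
        return "positive"
    if r <= -cutoff:
        return "negative"
    return "none"

# Made-up data for illustration only
temps = [18, 21, 24, 27, 30]
ice_cream_sales = [100, 130, 160, 190, 220]  # rises with temperature
heating_costs = [50, 40, 30, 20, 10]         # falls as temperature rises

print(describe(pearson(temps, ice_cream_sales)))
print(describe(pearson(temps, heating_costs)))
```

A high coefficient only says the two series move together; it says nothing about which one, if either, causes the other.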

Static visualizations // do not change over time unless they're edited. Useful when you want to control your story or dataset. EX: Charts and graphs made in a spreadsheet.

Dynamic visualizations // visualizations that are interactive or change over time. Users have control over what they see and you have less control over data and the story.

Tableau // a business intelligence and analytics platform that helps people see, understand, and make decisions with data

Decision tree // a decision-making tool that leads you to a choice through a series of key questions you ask yourself. Like a pathway or a binary search tree.

Data composition // combining the individual parts in a visualization and displaying them together as a whole

Design thinking // process used to solve complex problems in a user-centric way

Headlines // a line of words printed in large letters at the top of the visualization to communicate what data is being presented. Attention grabber, keep it bold and simple and above the chart.

Subtitle // supports the headline by adding more context and description

Labels // identify data in relation to other data. EX: legends and keys

Legends (keys) // identify the meaning of various elements in a data visualization

Annotation // briefly explains data or helps focus the audience on a particular aspect of the data in a visualization.

Alternative text // provides a textual alternative to non-text content


Ways to make data visualizations accessible for everyone:

        * Thinking about everyone who might access the data and what obstacles they might run into.

        - Labeling // make sure labeling is not confusing

        - Text alternatives // add more alternative ways to read the data (voice-overs, translations, etc)

        - Text-based format

        - Distinguishing // use foreground and background colors that contrast well with each other

        - Simplify // keep the visualization from becoming overly complicated
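
The "Distinguishing" point can be made measurable. A minimal sketch, assuming the WCAG 2.x contrast-ratio formula and its 4.5:1 guideline for normal text (neither is mentioned in these notes), with colors given as 8-bit RGB tuples:

```python
def _linear(channel):
    """Convert an 8-bit sRGB channel to linear light (WCAG 2.x definition)."""
    c = channel / 255
    return c / 12.92 if c <= 0.03928 else ((c + 0.055) / 1.055) ** 2.4

def relative_luminance(rgb):
    """Relative luminance of an (R, G, B) color, from 0 (black) to 1 (white)."""
    r, g, b = (_linear(v) for v in rgb)
    return 0.2126 * r + 0.7152 * g + 0.0722 * b

def contrast_ratio(foreground, background):
    """Contrast ratio between two colors, from 1 (identical) to 21 (black on white)."""
    lum_a = relative_luminance(foreground)
    lum_b = relative_luminance(background)
    lighter, darker = max(lum_a, lum_b), min(lum_a, lum_b)
    return (lighter + 0.05) / (darker + 0.05)

black, white, light_gray = (0, 0, 0), (255, 255, 255), (200, 200, 200)
print(round(contrast_ratio(black, white), 1))       # strongest possible contrast
print(contrast_ratio(light_gray, white) >= 4.5)     # light gray on white is hard to read
```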


How to highlight data visualizations:

        - Headlines, subtitles, labels

        


Visualization components (guidelines and style checks for each):

Headlines

        - Guidelines // Content: briefly describe the data. Length: usually the width of the data frame. Position: above the data.

        - Style checks // Use brief language. Don't use all caps, italics, acronyms, abbreviations, humor, or sarcasm.

Subtitles

        - Guidelines // Content: clarify context for the data. Length: same as or shorter than the headline. Position: directly below the headline.

        - Style checks // Use a smaller font size than the headline. Don't use undefined words, all caps, bold, italics, acronyms, or abbreviations.

Labels

        - Guidelines // Content: replace the need for legends. Length: usually fewer than 30 characters. Position: next to the data, or below or beside the axes.

        - Style checks // Use only a few words. Use thoughtful color-coding. Use callouts to point to the data. Don't use all caps, bold, or italics.

Annotations

        - Guidelines // Content: draw attention to certain data. Length: varies, limited by open space. Position: immediately next to the data annotated.

        - Style checks // Don't use all caps, bold, italics, or rotated text. Don't distract viewers from the data.
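
A couple of the label style checks are mechanical enough to automate. A minimal sketch, where `check_label` is a hypothetical helper that covers only the "fewer than 30 characters" and "no all caps" rules:

```python
def check_label(text, max_len=30):
    """Return a list of style problems found in a chart label (empty if none)."""
    problems = []
    if len(text) >= max_len:
        problems.append(
            f"label has {len(text)} characters; aim for fewer than {max_len}"
        )
    letters = [ch for ch in text if ch.isalpha()]
    if letters and all(ch.isupper() for ch in letters):
        problems.append("label is in all caps")
    return problems

print(check_label("Q1 revenue"))
print(check_label("TOTAL SALES"))
```

A check like this would also flag a label that is nothing but an acronym, so treat its output as a prompt to look again rather than as a verdict.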


Five phases of the design process for design thinking:

        - Empathize // think about emotions and needs of target audience, is the visualization appropriate?

        - Define // define the audience's needs and problems, and your insights

        - Ideate // generate data visualization ideas, brainstorm how to formulate a visualization

        - Prototype // put charts/visualizations together, or create and list potential final chart choices

        - Test // test the visualization by showing to team members


Elements for effective visuals:

        - Clear meaning // the message is clear and the visualization is easy to understand

        - Sophisticated use of contrast // knowing how to emphasize the message

        - Refined execution // deep attention to detail using visual elements 


        David McCandless's Venn Diagram:

                - Information (Data) // data is needed to create a story and to communicate new ideas/findings

                - Story (Concept) // data needs a story; information alone is boring

                - Goal (Function) // the goal of the visualization makes the data useful and usable

                - Visual Form (Metaphor) // visual elements give the visualization structure and make it beautiful


Principles of design:

        1. Balance // when key elements of a visualization, such as color and spacing, are distributed evenly

        2. Emphasis // a focal point for the audience to concentrate on; the visualization should emphasize what is important

        3. Movement // the path the viewer's eye travels as they look at the visualization

        4. Pattern // patterns can be shown with colors and shapes, and other elements

        5. Repetition // repeating patterns/elements can add to effectiveness of visualization

        6. Proportion // using color and size can help emphasize the importance of specific data in a visualization

        7. Rhythm // creating a sense of flow or movement in the visualization.

        8. Variety // variety in chart types, shapes, and other elements

        9. Unity // final visualization should be cohesive


Elements of art:

        - Line // can be curve/straight, thick/thin, vertical/diagonal, etc

        - Shape // should always be two-dimensional; good for size contrast

        - Color // hue, intensity, value. A shade adds dark values to a color; a tint adds light values.

        - Space // area between, around, and in objects

        - Movement // create a sense of flow/action in a visualization
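
Tint and shade from the Color entry above can be expressed as mixing a color toward white or black. A minimal sketch, assuming 8-bit RGB tuples and simple linear blending; `mix`, `tint`, and `shade` are hypothetical helper names:

```python
def mix(color, target, amount):
    """Blend an RGB color toward target; amount=0 keeps color, amount=1 gives target."""
    return tuple(round(c + (t - c) * amount) for c, t in zip(color, target))

def tint(color, amount):
    """Add light values: blend toward white."""
    return mix(color, (255, 255, 255), amount)

def shade(color, amount):
    """Add dark values: blend toward black."""
    return mix(color, (0, 0, 0), amount)

blue = (0, 100, 200)
print(tint(blue, 0.2))   # a lighter blue
print(shade(blue, 0.5))  # a darker blue
```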


Types of visualizations:

Bar graphs // use size contrast to compare two or more values. Has X-axis categories and Y-axis scale of values but can switch them. Effectively shows data that can be ranked.

Line graph // helps your audience understand shifts or changes in your data. Shows change over a period of time. Has an X-axis and a Y-axis. (Like a stock market chart)

Pie charts // show how much each part of something makes up the whole. Shows proportion differences

Maps // help organize data geographically

Histogram // a chart that shows how often data values fall into certain ranges. Sometimes resembles a bell curve.

Correlation charts // show relationships among data

Column charts // bar charts drawn with vertical bars; use height to compare two or more values

Heatmap // uses color to compare categories in a dataset. Mainly used to show the relationship between two variables, with a system of color-coding to represent different values.

Scatter plots // show the relationship between different variables. Typically used for two variables in a set of data.

Bubble chart // shows size comparison

Distribution graph // displays the spread of various outcomes in a dataset
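
The histogram entry above comes down to counting how many values land in each range. A minimal sketch with made-up scores and equal-width ranges; `histogram_counts` is a hypothetical helper, not a library function:

```python
def histogram_counts(values, bin_width, start=0):
    """Count how many values fall into each [low, low + bin_width) range."""
    counts = {}
    for v in values:
        low = start + ((v - start) // bin_width) * bin_width
        counts[low] = counts.get(low, 0) + 1
    return dict(sorted(counts.items()))

# Made-up test scores for illustration only
scores = [52, 67, 71, 74, 78, 81, 85, 88, 93]
counts = histogram_counts(scores, bin_width=10, start=50)

# Print a rough text histogram, one row per range
for low, n in counts.items():
    print(f"{low}-{low + 9}: {'#' * n}")
```

Changing `bin_width` changes the shape of the result, which is why the choice of ranges deserves a second look before presenting a histogram.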


Patterns:

        - Change // a trend or instance of observations that become different over time. Line or column chart.

        - Clustering // a collection of data points with similar or different values. Distribution graph

        - Relativity // observations considered in relation or in proportion to something else. Pie chart.

        - Ranking // position in a scale of achievement or status. Column chart.

        - Correlation // shows mutual relationship or connection between two or more things. Scatterplot
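
The pattern-to-chart pairings above can be written down as a simple lookup. A minimal sketch; `suggest_chart` is a hypothetical helper, and the table only restates the pairings in these notes:

```python
CHART_FOR_PATTERN = {
    "change": ["line chart", "column chart"],
    "clustering": ["distribution graph"],
    "relativity": ["pie chart"],
    "ranking": ["column chart"],
    "correlation": ["scatter plot"],
}

def suggest_chart(pattern):
    """Return the chart types these notes pair with a pattern name."""
    key = pattern.strip().lower()
    if key not in CHART_FOR_PATTERN:
        raise ValueError(f"unknown pattern: {pattern!r}")
    return CHART_FOR_PATTERN[key]

print(suggest_chart("Change"))
print(suggest_chart("correlation"))
```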


Organization Frameworks

        - The McCandless Method

                https://www.informationisbeautiful.net/visualizations/what-makes-a-good-data-visualization/

                INFORMATION > STORY > GOAL > VISUAL FORM

        - Kaiser Fung's Junk Charts Trifecta Checkup

                https://junkcharts.typepad.com/junk_charts/junk-charts-trifecta-checkup-the-definitive-guide.html

                Checkup Questions:

                        - What is the practical question?

                        - What does the data say?

                        - What does the visual say?


Marks and Channels in Data Visualizations:

        * Pre-attentive attributes // elements of a data visualization that people recognize automatically without conscious effort.

        Marks // basic visual objects like points, lines, shapes.

                - Position, Size, Shape, Color

        Channels // visual aspects or variables that represent characteristics of the data.

                - Accuracy // are the channels accurate at estimating the values represented?

                - Popout // are the values easily distinguished from one another?

                - Grouping // how good is a channel at communicating groups that exist in the data?


Design Principles:

(Principle // Description)

Choose the right visual // One of the first things you have to decide is which visual will be the most effective for your audience: simple or complex.

Optimize the data-ink ratio // Data-ink is the part of the visual that is essential to understanding the point of the chart. Minimize non-data ink, like boxes around legends or shadows, to optimize the data-ink ratio.

Use orientation effectively // Make sure the written components of the visual, like the labels on a bar chart, are easy to read. Change orientation if necessary.

Color // Use color consciously and meaningfully, staying consistent throughout your visuals, being considerate of what colors mean to different people, and using inclusive color scales that make sense for everyone viewing them.

Numbers of things // Think about how many elements you include in any visual. If your visualization uses lines, try to plot five or fewer. If that isn't possible, use color or hue to emphasize important lines. Also, when using visuals like pie charts, try to keep the number of segments to fewer than seven, since too many elements can be distracting.


Avoid misleading or deceptive charts:

(What to avoid // Why)

Cutting off the y-axis // Changing the scale on the y-axis can make the differences between groups in your data seem more dramatic, even if the difference is actually quite small.

Misleading use of a dual y-axis // Using a dual y-axis without clearly labeling it in your data visualization can create extremely misleading charts.

Artificially limiting the scope of the data // If you only consider the part of the data that confirms your analysis, your visualizations will be misleading because they don't take all of the data into account.

Problematic choices in how data is binned or grouped // Make sure that the way you group data doesn't misrepresent it or disguise important trends and insights.

Using part-to-whole visuals when the totals do not sum up appropriately // If you are using a part-to-whole visual like a pie chart to explain your data, the individual parts should add up to 100%. If they don't, your data visualization will be misleading.

Hiding trends in cumulative charts // Creating a cumulative chart can disguise more insightful trends by making the scale of the visualization too large to track any changes over time.

Artificially smoothing trends // Adding smooth trend lines between points in a scatter plot can make the plot easier to read, but replacing the points with just the line can make it appear that the points are more connected over time than they actually were.
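
Two of the problems above are easy to check with arithmetic: how much a cut-off y-axis exaggerates a difference, and whether part-to-whole values really add up to 100%. A minimal sketch with made-up numbers; both function names and the 0.5 rounding tolerance are illustrative assumptions:

```python
import math

def apparent_ratio(a, b, axis_start=0):
    """Ratio of drawn bar heights for values a and b when the y-axis starts at axis_start."""
    return (b - axis_start) / (a - axis_start)

def parts_sum_to_whole(percentages, tolerance=0.5):
    """True if part-to-whole percentages add up to 100 within a rounding tolerance."""
    return math.isclose(sum(percentages), 100, abs_tol=tolerance)

# Made-up values: 95 vs 100 is about a 5% difference
print(round(apparent_ratio(95, 100), 2))          # axis starts at 0
print(apparent_ratio(95, 100, axis_start=90))     # axis cut off at 90

print(parts_sum_to_whole([40, 35, 25]))
print(parts_sum_to_whole([40, 35, 30]))
```

With the axis cut off at 90, the second bar is drawn twice as tall as the first even though the underlying values differ by only about 5%.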


Additional Resources:

https://www.ted.com/talks/david_mccandless_the_beauty_of_data_visualization?language=en#t-150183

https://artscience.blog/home/the-mccandless-method-of-data-presentation

https://informationisbeautiful.net/

https://www.amazon.com/Street-Journal-Guide-Information-Graphics/dp/0393072959

https://visme.co/blog/best-data-visualizations/

https://www.tableau.com/learn/articles/best-data-visualization-blogs

https://datastudio.google.com/gallery?category=visualization

https://towardsdatascience.com/correlation-is-not-causation-ae05d03c1f53

https://www.data-to-viz.com/

https://www.youtube.com/watch?v=C07k0euBpr8

https://dataconomy.com/2019/05/three-critical-aspects-of-design-thinking-for-big-data-solutions/

https://www.enginess.io/insights/data-and-design-thinking
