28. Data Analytics - Data Analysis with R Programming - Week 1
- Introduction to programming.
Definitions:
Computer programming // giving instructions to a computer to perform an action or set of actions
Programming languages // words and symbols we use to write instructions for computers to follow
Syntax // the predetermined structure of a language that shows how to arrange words and symbols so the instructions make sense to the computer
Coding // writing instructions to a computer in the syntax of a specific programming language
R // programming language frequently used for statistical analysis, visualization, and other data analysis. Based on the S language.
Open source // code that is freely available and may be modified and shared by the people who use it
Integrated Development Environment (IDE) // software application that brings together all the tools you may want to use in a single place
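To make the terms syntax and coding concrete, here is a minimal sketch in R. The variable names (hours_studied, average_hours) and the numbers are made up for illustration and do not come from the course material.

```r
# Assign values to a variable with the assignment operator <-
# c() combines individual values into a vector
hours_studied <- c(2, 4, 6)

# Call a function: the function name, then its arguments in parentheses
average_hours <- mean(hours_studied)

# Show the result in the console
print(average_hours)
```

Each line follows R syntax: a name, the assignment operator, then an expression. Typing these lines into an IDE such as RStudio and running them is what the notes call coding.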
R Features:
- R packages // add-on packages, often called libraries
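As a rough sketch of how packages are used, the example below loads stats, a package that ships with R, so it runs without downloading anything. The commented-out lines show the usual pattern for an add-on package from CRAN; ggplot2 is only an example name chosen here, not something these notes cover.

```r
# Load a package that ships with R (no download needed)
library(stats)

# An add-on package from CRAN is installed once, then loaded in each session:
# install.packages("ggplot2")   # requires an internet connection
# library(ggplot2)

# Use a function provided by the loaded package
middle_value <- median(c(3, 1, 2))
print(middle_value)
```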
R uses for data analysis:
- Reproducing your analysis // R can reproduce every step of your analysis
- Processing lots of data // like SQL, R can handle large datasets
- Creating data visualizations
Also:
- Take a specific analysis step and perform it across many different groups of data
- Create flexible visualizations
- Automatically create an output of summary statistics
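The uses listed above can be sketched with base R alone. This example assumes the built-in mtcars sample dataset as stand-in data; the choice of columns (mpg, cyl, wt) is purely illustrative.

```r
# mtcars is a sample data frame that ships with R
data(mtcars)

# Perform one analysis step (the mean of mpg) across many groups
# (cars grouped by their number of cylinders)
mpg_by_cyl <- aggregate(mpg ~ cyl, data = mtcars, FUN = mean)
print(mpg_by_cyl)

# Automatically create an output of summary statistics
mpg_summary <- summary(mtcars$mpg)
print(mpg_summary)

# Create a simple visualization: car weight against fuel efficiency
plot(mtcars$wt, mtcars$mpg,
     xlab = "Weight (1000 lbs)",
     ylab = "Miles per gallon",
     main = "Fuel efficiency by weight")
```

Because every step is written down as code, rerunning the script reproduces the same tables and plot, which is the reproducibility point made in the list.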
Programming languages:
- R
- Python
- JavaScript
- SAS, Scala, Julia
Why work with R:
- Accessible // easy to use for beginners
- Data-centric // specifically designed to make data analysis easier
- Open source // freely available and ready to be modified
- Community // has an active community
Benefits of using programming with data:
- Clarify the steps of your analysis
- Save time
- Reproduce and share your work
Key question | Spreadsheets | SQL | R |
---|---|---|---|
What is it? | A program that uses rows and columns to organize data and allows for analysis and manipulation through formulas, functions, and built-in features | A database programming language used to communicate with databases to conduct an analysis of data | A general-purpose programming language used for statistical analysis, visualization, and other data analysis |
What is a primary advantage? | Includes a variety of visualization tools and features | Allows users to manipulate and reorganize data as needed to aid analysis | Provides an accessible language to organize, modify, and clean data frames, and create insightful data visualizations |
Which datasets does it work best with? | Smaller datasets | Larger datasets | Larger datasets |
What is the source of the data? | Entered manually or imported from an external source | Accessed from an external database | Loaded with R when installed, imported from your computer, or loaded from external sources |
Where is the data from my analysis usually stored? | In a spreadsheet file on your computer | Inside tables in the accessed database | In an R file on your computer |
Do I use formulas and functions? | Yes | Yes | Yes |
Can I create visualizations? | Yes | Yes, by using an additional tool like a database management system (DBMS) or a business intelligence (BI) tool | Yes |
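The R column of the table names three data sources. The sketch below illustrates them under some assumptions: iris stands in for data loaded with R, a small CSV file is written to a temporary path (csv_path, a made-up name) so there is something to import, and the URL in the last comment is a placeholder rather than a real dataset.

```r
# 1. Data loaded with R when installed: built-in sample datasets
data(iris)
n_builtin_rows <- nrow(iris)
print(n_builtin_rows)

# 2. Data imported from your computer: create a small CSV file in a
#    temporary location, then read it back into a data frame
csv_path <- tempfile(fileext = ".csv")
write.csv(data.frame(id = 1:3, score = c(90, 85, 77)),
          csv_path, row.names = FALSE)
scores <- read.csv(csv_path)
print(scores)

# 3. Data loaded from an external source: read.csv() also accepts a URL
# remote_data <- read.csv("https://example.com/data.csv")  # placeholder URL
```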
Additional Resources:
https://medium.com/analytics-and-data/r-vs-python-a-comprehensive-guide-for-data-professionals-321e8dead598
https://blog.rstudio.com/2019/12/17/r-vs-python-what-s-the-best-for-language-for-data-science/
https://www.rstudio.com/solutions/r-and-python/
https://www.r-project.org/
https://cran.r-project.org/manuals.html
https://ourcodingclub.github.io/tutorials.html
https://cran.r-project.org/doc/contrib/Paradis-rdebuts_en.pdf
https://docs.python.org/3/tutorial/
https://lgatto.github.io/2017_11_09_Rcourse_Jena/before-we-start.html
https://www.theanalysisfactor.com/the-advantages-of-rstudio/
https://community.rstudio.com/