29. Data Analytics - Data Analysis with R Programming - Week 2

Programming using RStudio. Learning syntax and packages.

Pipes make a sequence of code easier to work with and read.


Definitions:

Functions (R) // a body of reusable code used to perform specific tasks in R

Argument (R) // information that a function in R needs in order to run

Variable (R) // representation of a value in R that can be stored for use later during programming

Vectors (R) // a group of data elements of the same type, stored in a sequence in R

Pipe (R) // a tool in R for expressing a sequence of multiple operations, represented with "%>%"

        Basically takes the output of one statement and uses it as the input for the next

Data frames // a collection of columns, similar to a spreadsheet or SQL table. Each column has a name at the top that represents a variable, and each row holds one observation.

Operator // a symbol that names the type of operation or calculation to be performed in a formula

Assignment operators // used to assign values to variables and vectors

Packages (R) // units of reproducible R code

        - Includes Reusable R functions

        - Documentation about the functions

        - Sample datasets

        - Tests for checking your code

Base R // basic package built into R

CRAN (Comprehensive R Archive Network) // online archive with R packages, source code, manuals, and documentation

Tidyverse (R) // a system of packages in R with a common design philosophy for data manipulation, exploration, and visualization

        - tidyverse_update() // to update tidyverse

Factors (R) // store categorical data in R where the data values are limited and usually based on a finite group like country or year

Nested // describes code that performs a particular function and is contained within code that performs a broader function

Nested function // function that is completely contained within another function
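A minimal sketch of a nested function call in base R. The numbers are made up for illustration.

```r
# Hypothetical values, made up for illustration.
scores <- c(2, 3, 6)

# mean() is nested inside round(): R evaluates the inner call first,
# then passes its result to the outer call as an argument.
rounded_mean <- round(mean(scores), 1)

print(rounded_mean)
```

Reading nested calls means working from the inside out, which is the readability problem pipes are meant to ease.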


8 Core Tidyverse Analytics Packages:

        - ggplot2 (essential) // use for data visualization

                - Create a variety of data viz by applying different visual properties to the data variables

        - dplyr (essential) // use for data manipulation

                - Offers a consistent set of functions that help you complete common data manipulation tasks

                - Has the select() function to keep only the columns you need

        - tidyr (essential) // clean data

                - A package used for data cleaning to make tidy data

        - readr (essential) // importing and reading data

        - purrr // works with functions and vectors

        - tibble // works with data frames

        - stringr // includes string functions

        - forcats // provides tools to solve common problems with factors


Arithmetic operators:

        - used to complete math calculations

        - Add +, subtract -, multiply *, divide /
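A quick sketch of the four operators at the console. The variable names a, b, and results are placeholders chosen for this example.

```r
a <- 12
b <- 4

# Collect each result in a named vector so they print together.
results <- c(
  sum        = a + b,   # addition
  difference = a - b,   # subtraction
  product    = a * b,   # multiplication
  quotient   = a / b    # division
)

print(results)
```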


Installing Packages:

        - install.packages("packagename")

Loading Packages:

        - library(packagename)


Fundamentals of programming using R in RStudio:

        Basic Concepts of R:

                - Functions

                - Comments

                - Variables

                - Data types

                - Vectors

                - Pipes


Basic R Stuff:

        Commenting:

            # comment 

        Variables:

            - variablename <- "hello"

            - variablename = "hello"

        Vectors: // like an array, but every element has the same data type. Created with c(value, value, etc)

            - vectorname = c(1,2,3)

        Lists:

            - listname = list(var,var,etc)
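A small sketch of how variables, vectors, and lists behave, using base R only. All names and values here are placeholders for illustration.

```r
greeting <- "hello"              # assignment with <- (the conventional style)
numbers  <- c(1, 2, 3)           # vector: every element is a double
mixed    <- c(1, "two", 3)       # mixing types coerces everything to character
record   <- list(1, "two", TRUE) # a list keeps each element's own type

typeof(numbers)         # type of the numeric vector
typeof(mixed)           # type after coercion
sapply(record, typeof)  # type of each list element
```

The coercion in `mixed` is the practical difference between a vector and a list: a vector silently converts its elements to one common type, while a list does not.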

        Functions:

            - print() // prints

            - today() // gets today's date (from the lubridate package)

            - now() // gets the current date and time (from the lubridate package)

            - data.frame(columnname = c(observations), column2 = c(etc)) // creates a dataframe

            - dir.create("destination_folder") // creates a new folder/directory

            - file.create("newfile.txt") // creates a new, empty file

            - file.copy("file.txt", "destination_folder") // copies a file into an existing folder

            - unlink("somefile.txt") // deletes a file

            - matrix(c(values), nrow = n) // a 2D array of elements of a single data type. Set nrow or ncol to the number of rows or columns

            - head(data) // previews the column names and the first few rows

            - str(data) or glimpse(data) // summarizes each column (name, type, first values), one column per line

            - colnames(data) // returns the column names of a dataset as a character vector

            - rename(data, newcolumnname = oldcolumnname) // renames data variables

            - summarize(data, column = mean(column)) // summarizes data

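            - update.packages() // updates all of your installed packages

            - install.packages("packagename") // installs a package, or reinstalls it to get the latest version

            - browseVignettes("packagename") // opens the vignettes (long-form documentation) for a package

            - data("dataname") // loads a dataset that ships with R or an installed package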

            - View(dataname) // views data

            - filter(data, columnname == 0.5) // filter function to return a filtered version of the data

            - arrange(data, columnname) // sorts a dataset by a column. Default is ascending; wrap the column in desc() for descending.

            - group_by(data, column) // groups rows by the values in a column, usually followed by summarize()
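A minimal sketch that puts several of the functions above together. It assumes the dplyr package is installed; the data frame and its column names are made up for illustration.

```r
library(dplyr)

# Hypothetical data, made up for illustration.
scores <- data.frame(
  name  = c("Ana", "Ben", "Cy", "Dee"),
  team  = c("red", "blue", "red", "blue"),
  score = c(90, 72, 85, 64)
)

head(scores)      # preview the first rows
str(scores)       # one line per column: name, type, first values
colnames(scores)  # column names as a character vector

high_scores <- filter(scores, score > 80)      # keep rows that match a condition
sorted      <- arrange(scores, desc(score))    # sort by score, highest first
renamed     <- rename(scores, points = score)  # new name = old name
```

Each dplyr function takes the data frame as its first argument and returns a new data frame, so `scores` itself is left unchanged.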

        Plotting Functions:

            - ggplot(data = dataname, aes(x=datax,y=datay)) + geom_point() // plots data with basic points
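A short sketch of the plotting call on a real dataset, assuming the ggplot2 package is installed. mtcars ships with R; the choice of wt and mpg as the two columns is just for illustration.

```r
library(ggplot2)

# Map car weight to the x axis and miles per gallon to the y axis,
# then add a layer that draws one point per row.
scatter <- ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point()

print(scatter)  # draws the plot in the active graphics device
```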


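Data Types in R:

        Type        Description                           Example

        Logical     True/False                            TRUE

        Integer     Positive and negative whole values    3L

        Double      Decimal values                        101.175

        Character   String/character values               "Coding"

        Note: the L suffix marks an integer. A plain 3 typed at the console is stored as a double.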
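The four types listed above can be checked at the console with typeof(), a base R function. A brief sketch (the variable name types is a placeholder):

```r
types <- c(
  logical   = typeof(TRUE),
  integer   = typeof(3L),       # the L suffix requests an integer
  plain_3   = typeof(3),        # without L, whole numbers are doubles
  double    = typeof(101.175),
  character = typeof("Coding")
)

print(types)
```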

Date Types in R:

  • A date ("2016-08-16")

  • A time within a day ("20:11:59 UTC")

  • And a date-time. This is a date plus a time ("2018-03-31 18:15:48 UTC")

Converting String to Date or Time or Both:

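            - ymd("datestring") // converts a string to a date (lubridate). Reorder the letters to match the order in the string: mdy() for month-day-year, dmy() for day-month-year

            - ymd_hms("date time") // converts a string to a date-time (lubridate). Use mdy_hms() or dmy_hms() if the date part is ordered differently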
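A minimal sketch of these conversions, assuming the lubridate package is installed. The date strings are examples chosen for illustration; all three describe the same day in a different order.

```r
library(lubridate)

d1 <- ymd("2021-01-20")               # year, month, day
d2 <- mdy("January 20th, 2021")       # month, day, year
d3 <- dmy("20-Jan-2021")              # day, month, year
dt <- ymd_hms("2021-01-20 20:11:59")  # date plus time, UTC by default

as_date(dt)  # drop the time and keep only the date
```

The function name states the order of the parts in the input string; the returned value is always a standard Date or date-time object.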


Logical Operators:

        - AND & (element-wise) or && (single values)

        - OR | (element-wise) or || (single values)

        - NOT !

        Example: x > 5 & x < 10 returns TRUE if x is greater than 5 and less than 10
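A small sketch of these operators on a single value and on a vector, in base R. Here | is the element-wise OR; the values of x and v are arbitrary.

```r
x <- 7

between  <- x > 5 & x < 10   # both conditions must hold
outside  <- x < 5 | x > 10   # at least one condition must hold
not_over <- !(x > 5)         # negates the condition

# On a vector, & compares element by element and returns one result per element.
v <- c(3, 7, 12)
v_between <- v > 5 & v < 10

print(c(between, outside, not_over))
print(v_between)
```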

IF/ELSE:

        if (condition) {

        } else if (condition){

        } else {

        }
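A filled-in version of the template above, as a sketch. The thresholds and labels are made up for illustration.

```r
x <- 7

# R checks the conditions from top to bottom and runs only the first
# branch whose condition is TRUE.
if (x > 10) {
  size <- "large"
} else if (x > 5) {
  size <- "medium"
} else {
  size <- "small"
}

print(size)
```

Note that `else` has to sit on the same line as the closing brace of the previous branch; otherwise R treats the if statement as finished when the code is run line by line.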


Using Pipes:

        Put %>% at the end of every step except the last
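A minimal sketch of a piped sequence, assuming the dplyr package is installed (loading it also makes %>% available). It uses mtcars, which ships with R; the result name and the choice of columns are for illustration only.

```r
library(dplyr)

# Each %>% passes the result on its left to the function on the next line
# as that function's first argument.
mean_hp_by_cyl <- mtcars %>%
  filter(mpg > 20) %>%                   # keep fuel-efficient cars
  group_by(cyl) %>%                      # one group per cylinder count
  summarize(mean_hp = mean(hp)) %>%      # average horsepower per group
  arrange(desc(mean_hp))                 # last step: no pipe after it

print(mean_hp_by_cyl)
```

Without pipes the same steps would be written as nested calls and read from the inside out; with pipes they read top to bottom in the order they run.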



Additional Resources:

https://r4ds.had.co.nz/vectors.html#vectors

https://lubridate.tidyverse.org/index.html

https://rawgit.com/rstudio/cheatsheets/master/lubridate.pdf

https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/files

http://statseducation.com/Introduction-to-R/modules/getting%20data/data-wrangling/

https://www.datacamp.com/community/tutorials/conditionals-and-control-flow-in-r

https://support.rstudio.com/hc/en-us/articles/201057987-Quick-list-of-useful-R-packages

https://cran.r-project.org/web/views/

https://rstudio.com/

https://www.r-bloggers.com/2015/12/how-to-learn-r-2/#h.y5b98o9o2h1r
