29. Data Analytics - Data Analysis with R Programming - Week 2
Programming using RStudio. Learning syntax and packages.
Pipes make a sequence of code easier to work with and read.
Definitions:
Functions (R) // a body of reusable code used to perform specific tasks in R
Argument (R) // information that a function in R needs in order to run
Variable (R) // representation of a value in R that can be stored for use later during programming
Vectors (R) // a group of data elements of the same type stored in a sequence in R
Pipe (R) // a tool in R for expressing a sequence of multiple operations, represented with "%>%"
        Takes the output of one statement and uses it as the input for the next
Data frames // a collection of columns, similar to a spreadsheet or SQL table. Each column has a name at the top that represents a variable, and each row holds one observation.
Operator // a symbol that names the type of operation or calculation to be performed in a formula
Assignment operators // used to assign values to variables and vectors
Packages (R) // units of reproducible R code
        - Includes Reusable R functions
        - Documentation about the functions
        - Sample datasets
        - Tests for checking your code
Base R // basic package built into R
CRAN (comprehensive R archive network) // online archive with R packages, source code, manuals, and documentation
Tidyverse (R) // a system of packages in R with a common design philosophy for data manipulation, exploration, and visualization
        - tidyverse_update() // to update tidyverse
Factors (R) // store categorical data in R where the data values are limited and usually based on a finite group like country or year
Nested // describes code that performs a particular function and is contained within code that performs a broader function
Nested function // function that is completely contained within another function
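A minimal sketch of a nested function in base R. The values in scores are made up for illustration. R runs the inner call first and hands its result to the outer call.

```r
# Hypothetical values, for illustration only
scores <- c(1.5, 2.5, 4)

# Step-by-step version: store the mean, then round it
avg <- mean(scores)
rounded_step <- round(avg, 1)

# Nested version: mean() is completely contained within round()
rounded_nested <- round(mean(scores), 1)

print(rounded_nested)
```

Both versions give the same result; the nested form just skips the intermediate variable.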
8 Core Tidyverse Analytics Packages:
        - ggplot2 (essential) // data visualization
                - Creates a variety of data viz by applying different visual properties to the data variables
        - dplyr (essential) // data manipulation
                - Offers a consistent set of functions that help you complete common data manipulation tasks
                - Has the select() function to keep only the relevant columns
        - tidyr (essential) // data cleaning
                - Used to clean data and make it tidy
        - readr (essential) // importing and reading data
        - purrr // works with functions and vectors
        - tibble // works with data frames
        - stringr // includes string functions
        - forcats // provides tools to solve common problems with factors
Arithmetic operators:
        - used to complete math calculations
        - Add +, subtract -, multiply *, divide /
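A quick sketch of the four arithmetic operators in base R. The variable names a and b are placeholders.

```r
# Placeholder values for illustration
a <- 10
b <- 4

total      <- a + b   # addition
difference <- a - b   # subtraction
product    <- a * b   # multiplication
quotient   <- a / b   # division (always returns a decimal-capable double)

print(c(total, difference, product, quotient))
```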
Installing Packages:
        - install.packages("packagename")
Loading Packages:
        - library(packagename)
Fundamentals of programming using R in RStudio:
        Basic Concepts of R:
                - Functions
                - Comments
                - Variables
                - Data types
                - Vectors
                - Pipes
Basic R Stuff:
        Commenting:
            # comment 
        Variables:
            - variablename <- "hello"
            - variablename = "hello"
        Vectors: // like an array, but every element must be the same type. Created with c(value, value, etc)
            - vectorname = c(1,2,3)
        Lists:
            - listname = list(var,var,etc)
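A small base R sketch of variables, vectors, and lists. All names here are placeholders. It also shows what happens when types are mixed: a vector converts everything to one type, while a list keeps each element as it is.

```r
# A variable holding a single character value
greeting <- "hello"

# A vector: every element has the same type
numbers <- c(1, 2, 3)

# Mixing types in c() does not fail; R converts everything to character
mixed_vector <- c(1, "two", 3)
print(typeof(mixed_vector))

# A list can hold elements of different types side by side
mixed_list <- list(1, "two", TRUE)
print(typeof(mixed_list[[1]]))
print(typeof(mixed_list[[3]]))
```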
        Functions:
            - print() // prints
            - today() // gets today's date (from the lubridate package)
            - now() // gets the current date and time (from the lubridate package)
            - data.frame(columnname = c(observations), column2 = c(etc)) // creates a dataframe
            - dir.create("destination_folder") // creates a new folder/directory
            - file.create("newfile.txt") // creates a new file
            - file.copy("file.txt", "destination_folder") // copies a file to another folder
            - unlink("somefile.txt") // deletes a file
            - matrix(c(values), nrow = #) // a 2D array of elements of one data type. Use nrow or ncol to set the shape
            - head(data) // previews the column names and the first few rows
            - str(data) or glimpse(data) // previews the structure of the data horizontally
            - colnames(data) // returns the column names of the dataset
            - rename(data, newcolumnname = oldcolumnname) // renames data variables
            - summarize(data, column = mean(column)) // summarizes data
            - update.packages() // updates all of your packages
            - install.packages("packagename") // reinstalls a package, which updates that individual package
            - browseVignettes("packagename") // pull up documents on a package
            - data("dataname") // loads a dataset that ships with R or with an installed package
            - View(dataname) // views data
            - filter(data, columnname == 0.5) // filter function to return a filtered version of the data
            - arrange(data, columnname) // sorts a dataset by a column. Default is ascending; use desc(columnname) for descending
            - group_by(data, columnname) // groups data by a column
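A rough sketch of a few of the functions above working together, assuming the dplyr package is installed. The sales data frame and its columns are made up for illustration. Each step is saved to its own variable so the result of every function is visible.

```r
library(dplyr)

# Hypothetical data, for illustration only
sales <- data.frame(
  region = c("East", "West", "East", "West"),
  amount = c(100, 250, 300, 50)
)

# Keep only rows where amount is at least 100
big_sales <- filter(sales, amount >= 100)

# Sort the remaining rows by amount, ascending
sorted <- arrange(big_sales, amount)

# Group by region, then compute one mean per group
by_region <- group_by(sorted, region)
totals <- summarize(by_region, mean_amount = mean(amount))

print(head(totals))
print(colnames(totals))
```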
Plotting Functions:
            - ggplot(data = dataname, aes(x=datax,y=datay)) + geom_point() // plots data with basic points
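A minimal scatter plot sketch, assuming the ggplot2 package is installed. The heights data frame is invented for the example; swap in any data frame with two numeric columns.

```r
library(ggplot2)

# Hypothetical data, for illustration only
heights <- data.frame(
  age = c(5, 8, 11, 14),
  height_cm = c(110, 128, 144, 163)
)

# Map age to the x axis and height to the y axis, then draw one point per row
p <- ggplot(data = heights, aes(x = age, y = height_cm)) +
  geom_point()

print(p)
```

Note that ggplot2 layers are joined with +, not with the pipe.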
Data Types in R:
| Type | Description | Example |
|---|---|---|
| Logical | True/False | TRUE |
| Integer | Positive and negative whole values (written with an L suffix) | 3L |
| Double | Decimal values | 101.175 |
| Character | String/character values | "Coding" |
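One way to check these types is with base R's typeof(). A detail worth knowing: a plain whole number such as 3 is stored as a double, and the L suffix is what makes it an integer.

```r
# Check the type of one example value per row of the table
types <- c(
  logical_value   = typeof(TRUE),
  integer_value   = typeof(3L),
  plain_number    = typeof(3),
  double_value    = typeof(101.175),
  character_value = typeof("Coding")
)

print(types)
```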
Date Types in R:
- A date ("2016-08-16") 
- A time within a day ("20:11:59 UTC")
- And a date-time. This is a date plus a time ("2018-03-31 18:15:48 UTC") 
Converting String to Date or Time or Both:
- ymd("datestring") // lubridate function. Rearrange the letters y, m, and d to match the order of the date string, e.g. mdy() for month-day-year
- ymd_hms("date time") // same idea, with hours, minutes, and seconds added after the date
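A short sketch of these conversions, assuming the lubridate package is installed. The date strings are examples only. The function name follows the order of the parts in the string.

```r
library(lubridate)

# Same calendar day written three different ways
d1 <- ymd("2016-08-16")   # year, month, day
d2 <- mdy("08/16/2016")   # month, day, year
d3 <- dmy("16-08-2016")   # day, month, year

# A date plus a time; lubridate uses UTC unless told otherwise
dt <- ymd_hms("2018-03-31 18:15:48")

print(d1)
print(dt)
```

All three date calls return the same Date value, which makes it easy to compare or sort them afterwards.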
Logical Operators:
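        - AND & or && (& compares element by element, && works on single values)
        - OR | or || (| compares element by element, || works on single values)
        - NOT !
        Example: x > 5 & x < 10 returns TRUE if x is greater than 5 and less than 10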
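A base R sketch of the operators above. The values are placeholders. The last part shows & applied to a whole vector, where it checks each element separately.

```r
x <- 7

both_true <- x > 5 & x < 10    # AND: TRUE only if both sides are TRUE
either    <- x < 5 | x > 10    # OR: TRUE if at least one side is TRUE
negated   <- !(x > 5)          # NOT: flips TRUE to FALSE

# On a vector, & checks every element on its own
values <- c(3, 7, 12)
in_range <- values > 5 & values < 10

print(both_true)
print(in_range)
```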
IF/ELSE:
if (condition) {
} else if (condition){
} else {
}
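The skeleton above filled in with a simple example. The thresholds and labels are made up. R checks each condition from top to bottom and runs only the first branch that matches.

```r
x <- 7

if (x > 10) {
  label <- "large"
} else if (x > 5) {
  label <- "medium"
} else {
  label <- "small"
}

print(label)
```

Keep each else on the same line as the closing brace before it; otherwise R treats the if statement as finished.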
Using Pipes:
Add %>% at the end of each step, except the last one
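A sketch of a piped sequence, assuming the dplyr package is installed (it provides %>%). The sales data frame is invented for the example. Each line passes its result to the next line, so no intermediate variables are needed.

```r
library(dplyr)

# Hypothetical data, for illustration only
sales <- data.frame(
  region = c("East", "West", "East", "West"),
  amount = c(100, 250, 300, 50)
)

totals <- sales %>%
  filter(amount >= 100) %>%                 # keep rows with amount of at least 100
  arrange(amount) %>%                       # sort ascending by amount
  group_by(region) %>%                      # one group per region
  summarize(mean_amount = mean(amount))     # last step, so no pipe after it

print(totals)
```

Read it top to bottom as "take sales, then filter, then arrange, then group, then summarize".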
Additional Resources:
https://r4ds.had.co.nz/vectors.html#vectors
https://lubridate.tidyverse.org/index.html
https://rawgit.com/rstudio/cheatsheets/master/lubridate.pdf
https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/files
http://statseducation.com/Introduction-to-R/modules/getting%20data/data-wrangling/
https://www.datacamp.com/community/tutorials/conditionals-and-control-flow-in-r
https://support.rstudio.com/hc/en-us/articles/201057987-Quick-list-of-useful-R-packages
https://cran.r-project.org/web/views/
https://rstudio.com/
https://www.r-bloggers.com/2015/12/how-to-learn-r-2/#h.y5b98o9o2h1r