29. Data Analytics - Data Analysis with R Programming - Week 2
Programming using RStudio. Learning syntax and packages.
Pipes make a sequence of code easier to work with and read.
Definitions:
Functions (R) // a body of reusable code used to perform specific tasks in R
Argument (R) // information that a function in R needs in order to run
Variable (R) // representation of a value in R that can be stored for use later during programming
Vectors (R) // a group of data elements of the same type stored in a sequence in R
Pipe (R) // a tool in R for expressing a sequence of multiple operations, represented with "%>%"
- Takes the output of one statement and uses it as the input for the next
Data frames // a collection of columns, similar to a spreadsheet or SQL table. Each column has a name at the top that represents a variable, and each row holds one observation.
Operator // a symbol that names the type of operation or calculation to be performed in a formula
Assignment operators // used to assign values to variables and vectors
Packages (R) // units of reproducible R code
- Includes Reusable R functions
- Documentation about the functions
- Sample datasets
- Tests for checking your code
Base R // basic package built into R
CRAN (Comprehensive R Archive Network) // online archive with R packages, source code, manuals, and documentation
Tidyverse (R) // a system of packages in R with a common design philosophy for data manipulation, exploration, and visualization
- tidyverse_update() // to update tidyverse
Factors (R) // store categorical data in R where the data values are limited and usually based on a finite group like country or year
Nested // describes code that performs a particular function and is contained within code that performs a broader function
Nested function // function that is completely contained within another function
8 Core Tidyverse Analytics Packages:
- ggplot2 (essential) // use for data visualization
- Create a variety of data viz by applying different visual properties to the data variables
- dplyr (essential) // data manipulation
- Offers a consistent set of functions that help you complete common data manipulation tasks
- Has a select() function to keep only the relevant columns
- tidyr (essential) // clean data
- A package used for data cleaning to make tidy data
- readr (essential) // importing and reading data
- purrr // works with functions and vectors
- tibble // works with data frames
- stringr // includes string functions
- forcats // provides tools to solve common problems with factors
Arithmetic operators:
- used to complete math calculations
- Add +, subtract -, multiply *, divide /
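A quick sketch of the four arithmetic operators in base R. The variable names and values are made up for illustration.

```r
# Hypothetical values chosen for illustration.
a <- 12
b <- 4

sum_ab        <- a + b   # addition
difference_ab <- a - b   # subtraction
product_ab    <- a * b   # multiplication
quotient_ab   <- a / b   # division

print(c(sum_ab, difference_ab, product_ab, quotient_ab))
```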
Installing Packages:
- install.packages("packagename")
Loading Packages:
- library(packagename)
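A minimal sketch of installing and then loading a package, using dplyr as the example. The requireNamespace() guard and the mirror URL are my own additions (assumptions, not from the course); they just avoid reinstalling a package that is already there.

```r
# Install dplyr only if it is not already installed.
# The repos URL is an assumption: any CRAN mirror should work.
if (!requireNamespace("dplyr", quietly = TRUE)) {
  install.packages("dplyr", repos = "https://cloud.r-project.org")
}

# Load the package for the current session.
library(dplyr)

# Show which version got loaded.
print(packageVersion("dplyr"))
```

Installing happens once per machine; library() has to be called again in every new R session.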
Fundamentals of programming using R in RStudio:
Basic Concepts of R:
- Functions
- Comments
- Variables
- Data types
- Vectors
- Pipes
Basic R Stuff:
Commenting:
# comment
Variables:
- variablename <- "hello"
- variablename = "hello"
Vectors: // like an array, but holds a single type of data. Created with c(var, var, etc)
- vectorname = c(1,2,3)
Lists:
- listname = list(var,var,etc)
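A small sketch of variables, vectors, and lists together. All names and values are placeholders I picked for illustration.

```r
# Assign values with <- (the more common style) or =.
greeting <- "hello"
count = 3

# A vector holds elements of one type.
numbers <- c(1, 2, 3)

# Mixing types in c() coerces everything to one common type (character here).
mixed <- c(1, "two", TRUE)

# A list can hold elements of different types side by side.
things <- list("a", 1L, TRUE)

print(class(numbers))
print(mixed)
print(length(things))
```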
Functions:
- print() // prints
- today() // gets today's date (from the lubridate package)
- now() // gets the current date and time (from the lubridate package)
- data.frame(columnname = c(observations), column2 = c(etc)) // creates a dataframe
- dir.create("destination_folder") // creates a new folder/directory
- file.create("newfile.txt") // creates a new file
- file.copy("file.txt", "destination_folder") // copies a file into another folder
- unlink("somefile.txt") // deletes a file
- matrix(c(vectors), nrow = #) // a 2D array of elements of 1 data type. use nrow or ncol
- head(data) // previews the column names and the first few rows
- str(data) or glimpse(data) // previews data horizontally
- colnames(data) // returns the column names of a dataset
- rename(data, newcolumnname = oldcolumnname) // renames data variables
- summarize(data, column = mean(column)) // summarizes data
- update.packages() // updates all of your packages
- install.packages("packagename") // installs or updates an individual package
- browseVignettes("packagename") // pull up documents on a package
- data("dataname") // loads installed data
- View(dataname) // views data
- filter(data, columnname == 0.5) // filter function to return a filtered version of the data
- arrange(data, columnname) // sorts a dataset by column. Default is ascending; wrap the column in desc() for descending.
- group_by(data, columnname) // groups data by the values in a column
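To see several of the functions above working together, here is a rough sketch on a tiny data frame. The data frame and its column names are hypothetical, made up for illustration; rename(), filter(), arrange(), group_by(), and summarize() come from dplyr, so that package is assumed to be installed.

```r
library(dplyr)

# Hypothetical example data: names and values are invented for illustration.
scores <- data.frame(
  student = c("Ana", "Ben", "Cal", "Dee"),
  group   = c("a", "b", "a", "b"),
  score   = c(90, 72, 85, 64)
)

# Preview the data.
print(head(scores))
str(scores)

# rename(data, newname = oldname)
scores <- rename(scores, team = group)
print(colnames(scores))

# Keep only rows that match a condition.
high <- filter(scores, score >= 80)

# Sort ascending by score.
sorted <- arrange(scores, score)

# Group by team, then compute one summary value per group.
by_team <- summarize(group_by(scores, team), mean_score = mean(score))
print(by_team)
```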
Plotting Functions:
- ggplot(data = dataname, aes(x=datax,y=datay)) + geom_point() // plots data with basic points
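A minimal scatter plot sketch following the pattern above. I am using the built-in mtcars dataset as a stand-in (my choice, not from the course notes), with car weight on x and miles per gallon on y.

```r
library(ggplot2)

# mtcars ships with R: wt is car weight, mpg is miles per gallon.
p <- ggplot(data = mtcars, aes(x = wt, y = mpg)) +
  geom_point()

# Printing the plot object draws it.
print(p)
```

The `+` goes at the end of the line, so R knows the plot definition continues on the next line.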
Type | Description | Example |
---|---|---|
Logical | True/False | TRUE |
Integer | Positive and negative whole values | 3L |
Double | Decimal values | 101.175 |
Character | String/character values | "Coding" |
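One way to check these types is typeof() from base R. A short sketch: note that a whole number needs the L suffix to be stored as an integer, otherwise R stores it as a double.

```r
# One example value per data type.
values <- list(
  logical   = TRUE,
  integer   = 3L,        # the L suffix makes this an integer
  double    = 101.175,
  character = "Coding"
)

# Ask R for the storage type of each value.
types <- sapply(values, typeof)
print(types)

# A plain whole number without L is stored as a double.
plain_three <- typeof(3)
print(plain_three)
```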
Date Types in R:
A date ("2016-08-16")
A time within a day ("20:11:59 UTC")
And a date-time. This is a date plus a time ("2018-03-31 18:15:48 UTC")
Converting String to Date or Time or Both:
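- ymd("datestring") // converts a string to a date; reorder the letters y, m, and d to match the order of year, month, and day in the string, e.g. mdy() or dmy()
- ymd_hms("date time") // converts a string to a date-time; the date part can be reordered the same way, e.g. mdy_hms()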
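A short sketch of the lubridate parsers, assuming the lubridate package is installed. The date strings reuse the sample values from these notes, written in different orders.

```r
library(lubridate)

# The function name mirrors the order of the parts in the string.
d1 <- ymd("2016-08-16")   # year, month, day
d2 <- mdy("08/16/2016")   # month, day, year
d3 <- dmy("16-08-2016")   # day, month, year

# Date plus time; the result is in UTC unless a time zone is given.
dt <- ymd_hms("2018-03-31 18:15:48")

print(d1)
print(dt)
```

All three date calls produce the same Date value, since only the input format differs.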
Logical Operators:
- AND & or &&
- OR | or ||
- NOT !
Example: x > 5 & x < 10 returns TRUE if x is between 5 and 10
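A small sketch of the logical operators in base R, with made-up values. The single-character forms (& and |) work element by element on vectors, while the doubled forms (&& and ||) compare single values.

```r
x <- 7  # hypothetical value

between_5_and_10 <- x > 5 & x < 10    # AND: both sides must be TRUE
outside_range    <- x < 5 | x > 10    # OR: at least one side must be TRUE
not_between      <- !between_5_and_10 # NOT: flips TRUE and FALSE

# & applied to a vector checks each element separately.
values   <- c(3, 7, 12)
in_range <- values > 5 & values < 10

# && is meant for single TRUE/FALSE values, as in an if() condition.
single_check <- (x > 5) && (x < 10)

print(between_5_and_10)
print(in_range)
```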
IF/ELSE:
if (condition) {
} else if (condition){
} else {
}
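A filled-in version of the skeleton above, as a sketch. The variable x and the labels are hypothetical.

```r
x <- -3  # hypothetical value

if (x > 0) {
  label <- "positive"
} else if (x < 0) {
  label <- "negative"
} else {
  label <- "zero"
}

print(label)
```

The `else` has to sit on the same line as the closing brace before it, otherwise R treats the if statement as finished.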
Using Pipes:
Put %>% at the end of each step except the last one
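A sketch comparing a nested call with the same steps written as a pipe. It assumes dplyr is installed (loading it makes %>% available) and uses the built-in ToothGrowth dataset, which is my choice of example data.

```r
library(dplyr)

# Nested version: has to be read from the inside out.
nested <- arrange(filter(ToothGrowth, dose == 0.5), len)

# Piped version: reads top to bottom, one step per line.
piped <- ToothGrowth %>%
  filter(dose == 0.5) %>%
  arrange(len)

# Both versions give the same result.
print(identical(nested, piped))

# A longer pipeline: filter, group, then summarize.
summary_by_supp <- ToothGrowth %>%
  filter(dose == 0.5) %>%
  group_by(supp) %>%
  summarize(mean_len = mean(len))

print(summary_by_supp)
```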
Additional Resources:
https://r4ds.had.co.nz/vectors.html#vectors
https://lubridate.tidyverse.org/index.html
https://rawgit.com/rstudio/cheatsheets/master/lubridate.pdf
https://www.rdocumentation.org/packages/base/versions/3.6.2/topics/files
http://statseducation.com/Introduction-to-R/modules/getting%20data/data-wrangling/
https://www.datacamp.com/community/tutorials/conditionals-and-control-flow-in-r
https://support.rstudio.com/hc/en-us/articles/201057987-Quick-list-of-useful-R-packages
https://cran.r-project.org/web/views/
https://rstudio.com/
https://www.r-bloggers.com/2015/12/how-to-learn-r-2/#h.y5b98o9o2h1r