4.3 out of 5
485 reviews on Udemy

Data Science Masterclass With R 8 Case Studies + 4 Projects

Data Science by an IITian: Data Science + R Programming, Data Analysis, Data Visualization, Data Pre-processing, etc.
Instructor:
Up Degree
6,692 students enrolled
English [Auto]
Learn what Data Science is and how it is helping the modern world!
What are the benefits of Data Science and Machine Learning?
Solve Data Science related problems with the help of R programming
Why R is a must-have for Data Science, AI and Machine Learning!
The right guidance on the path if you want to be a Data Scientist, plus a Data Science interview preparation guide
How to switch your career into Data Science
R Data Structures - Matrix, Array, Data Frame, Factor, List
Work with R’s conditional statements, functions, and loops
Systematically explore data in R
Data Science packages: dplyr, ggplot2
Index, slice, and subset data
Get your data in and out of R - CSV, Excel, Database, Web, Text Data
Data Visualization: plot different types of data and draw insights using Line Chart, Bar Plot, Pie Chart, Histogram, Density Plot, Box Plot, 3D Plot and Mosaic Plot
Data Manipulation - apply functions, mutate(), filter(), arrange(), summarise(), group_by(), dates in R
Statistics - a must-have for Data Science
Hypothesis Testing
Have fun with real-life data sets

Are you planning to build your career in Data Science this year?

Did you know the average salary of a Data Scientist is $100,000/yr?

Did you know that over 10 million new jobs will be created in the Data Science field in just the next 3 years?

If you are a student, a job holder, or a job seeker, then it is the right time for you to go for Data Science!

Did you know that Data Science was the “hottest” job globally in 2018 – 2019?

>> 30+ Hours Video

>> 4 Capstone Projects

>> 8+ Case Studies

>> 24×7 Support

                >>ENROLL TODAY & GET DATA SCIENCE INTERVIEW PREPARATION COURSE FOR FREE <<

What Projects Are We Going to Cover in the Course?

Project 1 – Titanic Case Study, based on a classification problem.

Project 2 – E-commerce Sales Data Analysis, based on regression.

Project 3 – Customer Segmentation, based on unsupervised learning.

Final Project – Market Basket Analysis, based on association rule mining.

What Students Are Saying:

A great course to kick-start a journey in Machine Learning. It gives a clear contextual overview of most areas of Machine Learning. The effort in explaining the intuition of algorithms is especially useful.

– John Doe, Co-Founder, Impressive LLC

I simply love this course and I definitely learned a ton of new concepts.

Nevertheless, I wish there were some real-life examples at the end of the course. A few homework problems and solutions would’ve been good enough.

– Brain Dee, Data Scientist

It was an amazing experience. I really liked the course. The way the trainers explained the concepts was very good. The only thing which I thought was missing was more real-world datasets and applications in the course. Overall it was a great experience. The course will really help beginners to gain knowledge. Cheers to the team.

– Devon Smeeth, Software Developer

Above, we give you just a few examples of why you should move into Data Science and test the most in-demand job market ever created!

The good news is that from this hands-on Data Science and Machine Learning in R course you will learn all the knowledge you need to be a MASTER in Data Science.

Why Is Data Science a MUST HAVE Nowadays?

A full answer to why Data Science is a must-have nowadays would take a long time to explain. Instead, let’s look at the companies that are using Data Science and Machine Learning. Then you will get an idea of how it can BOOST your salary if you have in-depth knowledge of Data Science & Machine Learning!

Here we list just a few companies:

  • Google – ad serving, ad targeting, self-driving cars, supercomputers, Google Home, etc. Google uses Data Science + ML + AI to make decisions.

  • Apple: uses Data Science in different places such as Siri and Face Detection.

  • Facebook: Data Science, Machine Learning and AI are used in graph algorithms for finding friends, photo tagging, advertising targeting, chatbots, face detection, etc.

  • NASA: uses Data Science for many different purposes.

  • Microsoft: amplifying human ingenuity with Data Science.

So from this list of companies you can see that everyone, from the big giants to very small startups, is chasing Data Science and Artificial Intelligence, and that is the opportunity for you!

Why Choose This Data Science with R Course?

  • We cover not only “HOW” to do it but also “WHY” to do it

  • Theory explained with hands-on examples!

  • 30+ hours long Data Science course

  • 100+ study materials on each and every topic of Data Science!

  • Code templates are ready to download, saving you a lot of time!

What You Will Learn From The Data Science MASTERCLASS Course:

  • Learn what Data Science is and how Data Science is helping the modern world!

  • What are the benefits of Data Science, Machine Learning and Artificial Intelligence?

  • Solve Data Science related problems with the help of R programming

  • Why R is a must-have for Data Science, AI and Machine Learning!

  • Right Guidance of the Path if You want to be a Data Scientist + Data Science Interview Preparation Guide

  • How to switch your career into Data Science

  • R Data Structure – Matrix, Array, Data Frame, Factor, List

  • Work with R’s conditional statements, functions, and loops

  • Systematically explore data in R

  • Data Science packages: dplyr, ggplot2

  • Index, slice, and Subset Data

  • Get your data in and out of R – CSV, Excel, Database, Web, Text Data

  • Data Science – Data Visualization : plot different types of data & draw insights like: Line Chart, Bar Plot, Pie Chart, Histogram, Density Plot, Box Plot, 3D Plot, Mosaic Plot

  • Data Science – Data Manipulation – apply functions, mutate(), filter(), arrange(), summarise(), group_by(), dates in R

  • Statistics – A Must have for Data Science

  • Data Science – Hypothesis Testing

  • Business Use Case Understanding 

  • Data Pre-processing 

  • Supervised Learning

  • Logistic Regression 

  • K-NN 

  • SVM 

  • Naive Bayes 

  • Decision Tree 

  • Random Forest

  • K-Means Clustering

  • Hierarchical Clustering

  • DBSCAN Clustering

  • PCA (Principal Component Analysis)

  • Association Rule Mining 

  • Model Deployment 

>> 30+ Hours Video

>> 4 Capstone Projects

>> 8+ Case Studies

>> 24×7 Support

                >>ENROLL TODAY & GET DATA SCIENCE INTERVIEW PREPARATION COURSE FOR FREE <<

Meet Your Instructor

1. Meet Your Instructor
2. Course Curriculum Overview

INTRODUCTION TO DATA SCIENCE

1. Introduction to Business Analytics
2. Introduction to Business Analytics
3. Introduction to Machine Learning
4. Introduction to Machine Learning
5. Introduction To Data Scientist
6. Introduction To Data Scientist
7. How to switch your career into ML
8. How to switch your career into ML
9. How to switch your career into ML
10. How to switch your career into ML Part #2

Course Curriculum Overview

1. What We are Going to Discuss Over the Course

INTRODUCTION TO R

1. Introduction to R
2. Introduction to R
3. Setting up R

R Programming

1. R Programming - R Operator
2. R Conditional Statement & Loop
3. R Conditional Statement & Loop Study Note
4. R Programming - R Function
5. R Programming - Function in R Study Note #1
6. R Programming - R Function #2
7. R Programming - R Function #2
8. R Programming - R Function #3
9. R Programming - R Function #3
10. All Codes: R Programming Study Note

R Data Structure

1. R Data Structure - Vector
2. Vector Study Note
3. All Code - Vector
4. Matrix, Array and Data Frame
5. Matrix, Array and Data Frame - Study Note
6. CODES - Matrix, Array and Data Frame
7. Code - Data Frame Part #2
8. A Deep Dive into the R Data Frame
9. A Deep Dive into the R Data Frame - Study Note
10. R Data Structure - Factor
11. R Data Structure - Factor Study Notes
12. Codes - Factor
13. R Data Structure - List
14. List - Study Note
15. Code - List
16. All Code: R Data Structure

Import and Export in R

1. Import CSV Data in R

Import and Export in R

You might find that loading data into R can be quite frustrating. Almost every single type of file that you want to get into R seems to require its own function, and even then you might get lost in the functions’ arguments. In short, you might agree that it can be fairly easy to mix up things from time to time, whether you are a beginner or a more advanced R user.

Types of files that we‘ll import

  • Importing CSV file

  • Importing Text file

  • Importing Excel file

  • Importing files from Database

  • Importing files from Web

  • Importing files from Statistical Tool

And lastly, exporting the data. (The notes below cover CSV, text, and Excel in detail; a brief sketch of the database and statistical-tool imports follows just below.)
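
The database and statistical-tool imports listed above are not walked through in the notes that follow, so here is a minimal, hedged sketch only. The DBI, RSQLite, and haven packages, the my_database.sqlite file, the my_table table, and the survey.sav file are illustrative assumptions, not part of the course materials.

# Hedged sketch only: database and statistical-tool imports.
# The file names and the table name below are placeholders.

# From a database: DBI + RSQLite
install.packages(c("DBI", "RSQLite"))
library(DBI)
con <- dbConnect(RSQLite::SQLite(), "my_database.sqlite")   # placeholder database file
dbListTables(con)                                           # list the tables in the database
db_df <- dbReadTable(con, "my_table")                       # pull one table as a data frame
dbDisconnect(con)

# From a statistical tool: haven reads SPSS, SAS and Stata files
install.packages("haven")
library(haven)
spss_df <- read_sav("survey.sav")                           # placeholder SPSS file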

Importing CSV file

The utils package, which is automatically loaded in your R session on startup, can import CSV files with the read.csv() function. 

                                                                                                

                                                                                                                 

Use read.csv() to import a data frame 



Now use these commands to import CSV files:


#Importing csv file

# read.csv()

titanic_train<- read.csv(file.choose())

class(titanic_train)

titanic <- read.csv("titanic_train.csv")

str(titanic)

#Using readr package

install.packages("readr")

library(readr)

titanic <- read_csv("titanic_train.csv")

titanic


All the code used in this video is given at the end of this chapter. The CSV files used here are available in the resource section of this lecture.

This brings an end to this post. I encourage you to re-read the post to understand it completely if you haven’t, and THANK YOU.

2. Import CSV Data in R study note
3. Code - Import CSV Data in R
4. Import Text Data in R

Importing Text File   

The utils package, which is automatically loaded in your R session on startup, can import text files with the read.table() function.

Use read.table() to import a data frame.

If you have a .txt or tab-delimited text file, you can easily import it with the basic R function read.table(). Use these commands to import it, as follows:


# Importing a table/text file with read.table()
# Import the hotdogs.txt file: hotdogs
?read.table
hotdogs <- read.table("hotdogs.txt", sep = "\t", header = TRUE)

# Call head() on hotdogs
head(hotdogs)

All the code used in this video is given at the end of this chapter. The text files used here are available in the resource section of this lecture.

This brings an end to this post. I encourage you to re-read the post to understand it completely if you haven’t, and THANK YOU.

5. Import Text Data in R STUDY NOTE
6. CODE - Import Text Data in R
7. Import Excel, Web Data in R

Importing Of Excel Files

As most of you know, Excel is a spreadsheet application developed by Microsoft. It is an easily accessible tool for organizing, analyzing and storing data in tables and has widespread use in many different application fields all over the world. It shouldn't come as a surprise that R has implemented some ways to read, write and manipulate Excel files (and spreadsheets in general).

How To Import Excel Files

Before you start thinking about how to load your Excel files and spreadsheets into R, you first need to make sure that your data is well prepared to be imported. If you neglect to do this, you might experience problems when using the R functions.

The readxl package, once installed and loaded, can import Excel files with the read_excel() function.

Use read_excel() to import a data frame.

Using these commands you can import an Excel file in R:

#Importing xls file using readxl package - read_excel()

# Install the readxl package

install.packages("readxl")

# Load the readxl package

library(readxl)

# Print out the names of both spreadsheets

excel_sheets("urbanpop.xlsx")

# Read the sheets, one by one

pop_1 <- read_excel("urbanpop.xlsx", sheet = 1)

pop_2 <- read_excel("urbanpop.xlsx", sheet = 2)

pop_3 <- read_excel("urbanpop.xlsx", sheet = 3)

# Put pop_1, pop_2 and pop_3 in a list: pop_list

pop_list <- list(pop_1,pop_2,pop_3)

# Display the structure of pop_list

str(pop_list)

# Explore other packages - XLConnect, xlsx, gdata

 

All the code used in this video is given at the end of this chapter. This brings an end to this post. I encourage you to re-read the post to understand it completely if you haven’t, and THANK YOU.
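
The lecture title also mentions web data, which the note above does not show. As a minimal, hedged sketch only (the URLs are placeholders and the rvest package is an assumption, not part of the course materials): read.csv() accepts a URL directly, and rvest can pull HTML tables.

# Hedged sketch only: importing data from the web. The URLs below are placeholders.

# A CSV hosted on the web can be read directly by URL
web_df <- read.csv("https://example.com/data.csv")
str(web_df)

# HTML tables can be scraped with the rvest package
install.packages("rvest")
library(rvest)
page <- read_html("https://example.com/page-with-table.html")
tables <- html_table(page)   # a list of data frames, one per HTML <table>
str(tables[[1]])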

8. Import Excel, Web Data in R STUDY NOTE
9. Export Data in R - Text

Export Data in R - Text,CSV,Excel - Text Study Note

In this tutorial, we will learn how to export data from R environment to different formats.

To export data to the hard drive, you need the file path and an extension. First of all, the path is the location where the data will be stored.

Exporting Text File

You can export text files with the write.table(mydata, "Path../../mydata.txt", sep = "\t") function.


Now use these commands to export text files:

# Export data to a text file
write.table(hotdogs, "D:/Rajib Backup/Project/Innovation/Analytics/Machine Learning/Tutorial/EduCBA/Chap5 -Import and Export/NewHotdog.txt", sep = "\t")


Exporting CSV File

You can export CSV files with the write.csv(mydata, "Path../../mydata.csv") function.


Now use these commands to export CSV files:

# Export data to CSV
write.csv(my_df, "D:/Rajib Backup/Project/Innovation/Analytics/Machine Learning/Tutorial/EduCBA/Chap5 -Import and Export/my_df.csv")

Exporting Excel File

You can export Excel files with the write_xlsx(mydata, "Path../../mydata.xlsx") function from the writexl package.


Now use these commands to export Excel files:

# Export data to Excel
install.packages("writexl")
library(writexl)
my_df <- mtcars[1:3, ]
write_xlsx(my_df, "D:/Rajib Backup/Project/Innovation/Analytics/Machine Learning/Tutorial/EduCBA/Chap5 -Import and Export/Newmtcars.xlsx")


All the code used in this video is given at the end of this chapter. The text, CSV and Excel files used here are available in the resource section of this lecture.

This brings an end to this post. I encourage you to re-read the post to understand it completely if you haven’t, and THANK YOU.

10. Export Data in R - CSV & Excel


11. Export Data in R - Text, CSV & Excel Study Note
12. All Code: Import and Export in R

Data Manipulation

1. Data Manipulation - Apply Function

Data Manipulation

The apply() family of functions forms the basis of more complex combinations and helps to perform operations with very few lines of code. More specifically, the family is made up of:

  • apply()

  • lapply()

  • sapply()

  • tapply()

  • by()

How To Use apply() in R

Let’s start with apply(), which operates on arrays and matrices.

The R base manual tells you that it’s called as follows: apply(X, MARGIN, FUN)

where:

  • X is an array, or a matrix if the dimension of the array is 2;

  • MARGIN is a variable defining how the function is applied: with MARGIN = 1 it applies over rows, whereas with MARGIN = 2 it works over columns;

  • FUN is the function that you want to apply to the data. It can be any R function, including a User Defined Function (UDF).

Use these commands to try the apply() function:

# Topic 1: Apply function
###################################################################################
# apply() helps to apply a function to the rows or columns of a matrix and returns a vector, array or list
# Syntax: apply(x, margin, function), where margin indicates whether the function is applied to rows or columns
# margin = 1 indicates that the function is applied to each row
# margin = 2 indicates that the function is applied to each column
# function can be any function, such as mean or sum

m <- matrix(c(1, 2, 3, 4), 2, 2)
m
apply(m, 1, sum)
apply(m, 2, sum)
apply(m, 1, mean)
apply(m, 2, mean)




The lapply() Function

You want to apply a given function to every element of a list and obtain a list as a result. When you execute ?lapply, you see that the syntax looks like that of the apply() function.

The difference is that:

It can be used for other objects like data frames, lists or vectors; and

the output returned is a list (which explains the “l” in the function name), which has the same number of elements as the object passed to it.

Use these commands to try the lapply() function:

################################################
# Using sapply and lapply
################################################
# lapply() function
# lapply is similar to apply, but it takes a list as input and returns a list as output.
# syntax: lapply(list, function)

# example 1:
data <- list(x = 1:5, y = 6:10, z = 11:15)
data
lapply(data, FUN = median)

# example 2:
data2 <- list(a = c(1, 1), b = c(2, 2), c = c(3, 3))
data2
lapply(data2, sum)
lapply(data2, mean)




The sapply() Function

The sapply() function works like lapply(), but it tries to simplify the output to the most elementary data structure possible. Indeed, sapply() is a ‘wrapper’ function for lapply().

An example may help to understand this: where lapply() would give you a list of single values, sapply() simplifies that result to a vector, unless you pass simplify = FALSE, in which case a list is still returned.

Use these commands to try the sapply() function:

# sapply() function
# sapply is the same as lapply, but returns a vector (or matrix) instead of a list where possible.
# syntax: sapply(list, function)

# example 1:
data <- list(x = 1:5, y = 6:10, z = 11:15)
data
lapply(data, FUN = sum)
lapply(data, FUN = median)
unlist(lapply(data, FUN = median))

sapply(data, FUN = sum)
sapply(data, FUN = median)
# Note: if the results are all scalars, then a vector is returned;
# if the results are all of the same size (> 1), then a matrix is returned;
# otherwise, the result is returned as a list.
sapply(data, FUN = range)

The vapply() Function

And lastly, the vapply() function. It works like sapply(), but you explicitly supply a template for the type and length of the value each call should return.

Arguments

  • X: a vector or list.

  • FUN: the function to be applied.

  • FUN.VALUE: a (generalized) vector; a template for the return value from FUN.

  • ...: optional arguments to FUN.

  • USE.NAMES: logical; if TRUE and if X is character, use X as names for the result unless it had names already.


Use these commands to try the vapply() function:

# vapply() function
# vapply() is similar to sapply(), but it explicitly specifies the type of return value (integer, double, character).
vapply(data, sum, FUN.VALUE = double(1))
vapply(data, range, FUN.VALUE = double(2))




Use these commands to try the tapply() and mapply() functions:

################################################
# Using tapply() and mapply()
################################################
# tapply() works on a vector;
# it applies the function by grouping factors inside the vector.
# syntax: tapply(x, factor, function)

# example 1:
age <- c(23, 33, 28, 21, 20, 19, 34)
gender <- c("m", "m", "m", "f", "f", "f", "m")
f <- factor(gender)
f
tapply(age, f, mean)
tapply(age, gender, mean)

# example 2:
# load the datasets package
library(datasets)
# you can view all the datasets
data()
View(mtcars)
class(mtcars)

mtcars$wt
mtcars$cyl
f <- factor(mtcars$cyl)
f
tapply(mtcars$wt, f, mean)

##############################################################################
# mapply() - mapply is a multivariate version of sapply. It applies the specified function
# to the first element of each argument first, followed by the second element, and so on.
# syntax: mapply(function, ...)

# example 1
# create a list:
rep(1, 4)
rep(2, 3)
rep(3, 2)
rep(4, 1)
a <- list(rep(1, 4), rep(2, 3), rep(3, 2), rep(4, 1))
a

# We can see that we are calling the same function (rep) where the first argument
# varies from 1 to 4 and the second argument varies from 4 to 1.
# Instead, we can use the mapply function:
b <- mapply(rep, 1:4, 4:1)
b
#####################################################################################


This brings an end to this post. I encourage you to re-read the post to understand it completely if you haven’t, and THANK YOU.

2. Data Manipulation - Apply Function STUDY NOTE
3. Data Manipulation - select


4. Data Manipulation - mutate

Selecting columns using select()

select() keeps only the variables you mention


Use This Command To Perform The Above Mentioned Function

#######################################
#select(): Select specific columns from a tbl
#######################################
# The examples in this note use dplyr and the hflights dataset (from the hflights package)
library(dplyr)
library(hflights)

tbl <- select(hflights, ActualElapsedTime, AirTime, ArrDelay, DepDelay)
glimpse(tbl)

#starts_with("X"): every name that starts with "X",
#ends_with("X"): every name that ends with "X",
#contains("X"): every name that contains "X",
#matches("X"): every name that matches "X", where "X" can be a regular expression,
#num_range("x", 1:5): the variables named x01, x02, x03, x04 and x05,
#one_of(x): every name that appears in x, which should be a character vector.

#Example: print out only the UniqueCarrier, FlightNum, TailNum, Cancelled, and CancellationCode columns of hflights

select(hflights, ends_with("Num"))
select(hflights, starts_with("Cancel"))
select(hflights, UniqueCarrier, ends_with("Num"), starts_with("Cancel"))


Create new columns using mutate()

mutate() is the second of five data manipulation functions you will get familiar with in this course. mutate() creates new columns which are added to a copy of the dataset.


Use This Command To Perform The Above Mentioned Function

#######################################
#mutate():  Add columns from existing data
#######################################
g2 <- mutate(hflights, loss = ArrDelay - DepDelay)
g2

g1 <- mutate(hflights, ActualGroundTime = ActualElapsedTime - AirTime)
g1

#hflights$ActualGroundTime <- hflights$ActualElapsedTime - hflights$AirTime

#######################################


Selecting rows using filter()

Filtering data is one of the most basic operations when you work with data. You may want to remove a part of the data that is invalid or that you’re simply not interested in. Or you may want to zero in on a particular part of the data you want to know more about. dplyr has the filter() function to do such filtering, and more: with dplyr you can do kinds of filtering that would be hard to perform or complicated to construct with tools like SQL and traditional BI tools, in a simpler and more intuitive way.

R comes with a set of logical operators that you can use inside filter():
• <
• <=
• ==
• !=
• >=
• >


Use This Command To Perform The Above Mentioned Function


#filter() : Filter specific rows which matches the logical condition
#######################################
#R comes with a set of logical operators that you can use inside filter():

#x < y, TRUE if x is less than y
#x <= y, TRUE if x is less than or equal to y
#x == y, TRUE if x equals y
#x != y, TRUE if x does not equal y
#x >= y, TRUE if x is greater than or equal to y
#x > y, TRUE if x is greater than y
#x %in% c(a, b, c), TRUE if x is in the vector c(a, b, c)

# All flights that traveled 3000 miles or more
long_flight <- filter(hflights, Distance >= 3000)
View(long_flight)
glimpse(long_flight)

# All flights where taxing took longer than flying
long_journey <- filter(hflights, TaxiIn + TaxiOut > AirTime)
View(long_journey)

# All flights that departed before 5am or arrived after 10pm
All_Day_Journey <- filter(hflights, DepTime < 500 | ArrTime > 2200)

# All flights that departed late but arrived ahead of schedule
Early_Flight <- filter(hflights, DepDelay > 0, ArrDelay < 0)
glimpse(Early_Flight)

# All flights that were cancelled after being delayed
Cancelled_Delay <- filter(hflights, Cancelled == 1, DepDelay > 0)

#How many weekend flights flew a distance of more than 1000 miles but 
#had a total taxiing time below 15 minutes?

w <- filter(hflights, DayOfWeek == 6 |DayOfWeek == 7, Distance >1000, TaxiIn + TaxiOut <15)
nrow(w)

y <- filter(hflights, DayOfWeek %in% c(6,7), Distance > 1000, TaxiIn + TaxiOut < 15)
nrow(y)

#######################################


Arrange or re-order rows using arrange()

To arrange (or re-order) rows by a particular column, list the name of the column you want to arrange the rows by.


Use This Command To Perform The Above Mentioned Function


#######################################
#arrange(): reorders the rows according to single or multiple variables,
#######################################
dtc <- filter(hflights, Cancelled == 1, !is.na(DepDelay)) #Delay not equal to NA
glimpse(dtc)

# Arrange dtc by departure delays
d <- arrange(dtc, DepDelay)

# Arrange dtc so that cancellation reasons are grouped
c <- arrange(dtc,CancellationCode )

#By default, arrange() arranges the rows from smallest to largest. 
#Rows with the smallest value of the variable will appear at the top of the data set. 
#You can reverse this behavior with the desc() function. 

# Arrange according to carrier and decreasing departure delays
des_Flight <- arrange(hflights, desc(DepDelay))

# Arrange flights by total delay (normal order).
arrange(hflights, ArrDelay + DepDelay)

#######################################

5. Data Manipulation - filter


6. Data Manipulation - arrange


7. mutate(), filter(), arrange() Function Study Note
8. Data Manipulation - Pipe Operator

Create summaries of the data frame using summarise()

The summarise() function will create summary statistics for a given column in the data frame such as finding the mean.


Use This Command To Perform The Above Mentioned Function

#######################################
#summarise(): reduces each group to a single row by calculating aggregate measures.
#######################################
#summarise(), follows the same syntax as mutate(), 
#but the resulting dataset consists of a single row instead of an entire new column in the case of mutate()

#min(x) - minimum value of vector x.
#max(x) - maximum value of vector x.
#mean(x) - mean value of vector x.
#median(x) - median value of vector x.
#quantile(x, p) - pth quantile of vector x.
#sd(x) - standard deviation of vector x.
#var(x) - variance of vector x.
#IQR(x) - Inter Quartile Range (IQR) of vector x.
#diff(range(x)) - total range of vector x.

# Print out a summary with variables 
# min_dist, the shortest distance flown, and max_dist, the longest distance flown
summarise(hflights, max_dist = max(Distance),min_dist = min(Distance))

# Print out a summary of hflights with max_div: the longest Distance for diverted flights.
# Print out a summary with variable max_div
div <- filter(hflights, Diverted ==1 )
summarise(div, max_div = max(Distance))

summarise(filter(hflights, Diverted == 1), max_div = max(Distance))

###########################################################


Pipe operator: %>%

Before we go any further, let’s introduce the pipe operator: %>%. dplyr imports this operator from another package (magrittr). This operator allows you to pipe the output from one function to the input of another function. Instead of nesting functions (reading from the inside to the outside), the idea of piping is to read the functions from left to right.

Use This Command To Perform The Above Mentioned Function

#######################################
#Chaining function using Pipe Operators
#######################################
hflights %>%
  filter(DepDelay>240) %>%
  mutate(TaxingTime = TaxiIn + TaxiOut) %>%
  arrange(TaxingTime)%>%
  select(TailNum )

# Write the 'piped' version of the English sentences.
# Use dplyr functions and the pipe operator to transform the following English sentences into R code:

# Take the hflights data set and then ...
# Add a variable named diff that is the result of subtracting TaxiIn from TaxiOut, and then ...
# Pick all of the rows whose diff value does not equal NA, and then ...
# Summarise the data set with a value named avg that is the mean diff value.

hflights %>%
  mutate(diff = TaxiOut - TaxiIn) %>%
  filter(!is.na(diff)) %>%
  summarise(avg = mean(diff))

# mutate() the hflights dataset and add two variables:
# RealTime: the actual elapsed time plus 100 minutes (for the overhead that flying involves) and
# mph: calculated as Distance / RealTime * 60, then
# filter() to keep observations that have an mph that is not NA and that is below 70, finally
# summarise() the result by creating four summary variables:
# n_less, the number of observations,
# n_dest, the number of destinations,
# min_dist, the minimum distance and
# max_dist, the maximum distance.

# Chain together mutate(), filter() and summarise()
hflights %>%
  mutate(RealTime = ActualElapsedTime + 100, mph = Distance / RealTime * 60) %>%
  filter(!is.na(mph), mph < 70) %>%
  summarise(n_less = n(), 
            n_dest = n_distinct(Dest), 
            min_dist = min(Distance), 
            max_dist = max(Distance))

#######################################

9. Pipe operator Study Note
10. Data Manipulation - group by
11. Group By Function Study Note
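
There is no study-note text for the group-by lecture on this page. As a minimal sketch in the spirit of the other notes (assuming the same dplyr and hflights setup used above), group_by() splits the rows into groups that summarise() then collapses, one row per group.

#######################################
# group_by(): groups rows so that summarise() works per group (minimal sketch)
#######################################
library(dplyr)
library(hflights)

# Mean arrival delay and number of flights per carrier,
# sorted from worst to best average delay
hflights %>%
  group_by(UniqueCarrier) %>%
  summarise(n_flights = n(),
            avg_arr_delay = mean(ArrDelay, na.rm = TRUE)) %>%
  arrange(desc(avg_arr_delay))
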
12. Data Manipulation - Date

Date with R

Dates can be imported from character, numeric formats using the as.Date function from the base package.

If your data were exported from Excel, they will possibly be in numeric format. Otherwise, they will most likely be stored in character format. If your dates are stored as characters, you simply need to provide as.Date with your vector of dates and the format they are currently stored in

There are a number of different formats you can specify, here are a few of them:

  • %Y: 4-digit year (1982)

  • %y: 2-digit year (82)

  • %m: 2-digit month (01)

  • %d: 2-digit day of the month (13)

  • %A: weekday (Wednesday)

  • %a: abbreviated weekday (Wed)

  • %B: month (January)

  • %b: abbreviated month (Jan)


Use This Command To Perform The Above Mentioned Function


####################################################################################
# Lesson 6:
# Topic 3: Date in R
###################################################################################
# Today's date
today <- Sys.Date()
today
class(today)

# Creating a date from a character string
character_date <- "1957-03-04"
class(character_date)

# Convert into the Date class with the as.Date function
sp500_birthday <- as.Date(character_date)
sp500_birthday
class(sp500_birthday)

# Date format
# default - ISO 8601 standard: year-month-day
as.Date("2017-01-28")

# Alternative form: year/month/day
as.Date("2017/01/28")

# Fails: month/day/year
as.Date("01/28/2017")

# Explicitly tell R the format
as.Date("01/28/2017", format = "%m/%d/%Y")

# Date format codes
# %d - Day of the month (01-31)
# %m - Month (01-12)
# %y - Year without century (00-99)
# %Y - Year with century (0-9999)
# %b - Abbreviated month name
# %B - Full month name
# "/" "-" "," Common separators

# Example: September 15, 2008
as.Date("September 15, 2008", format = "%B %d, %Y")

# Extract the weekdays
dates <- as.Date(c("2017-01-02", "2017-05-03", "2017-08-04", "2017-10-17"))
dates
weekdays(dates)

# Extract the months
months(dates)

# Extract the quarters
quarters(dates)

13. Data Manipulation - Date with R STUDY NOTE
14. All Code: Data Manipulation

Data Visualization

1. Introduction to Data Visualization & Scatter Plot

Data Visualization


Basic Visualization


Scatter Plot

Line Chart

Bar Plot

Pie Chart

Histogram

Density plot

Box Plot

Advanced Visualization


Mosaic Plot

Heat Map

3D charts

Correlation Plot

Word Cloud

Scatter Plot


Scatterplots use a collection of points placed using Cartesian coordinates to display values from two variables. By displaying one variable on each axis, you can detect whether a relationship or correlation between the two variables exists.


                                                 

                                                                 

Use This Command To Perform Above Mentioned Function:


######################################################################

# Lesson 7

# Topic 1: Types of Graphic in R

######################################################################

#########################################################################

#########################################################################

#Following are the basic types of graphs, which can be chosen based on

#the situation and the data available.

# Basic Visualization

# Scatter Plot

# Line Chart

# Bar Plot

# Pie Chart

# Histogram

# Density plot

# Box Plot

# Advanced Visualization

# Mosaic Plot

# Heat Map

# 3D charts

# Correlation Plot

# Word Cloud

#########################################################################

# Basic plot - Scatter Plot

# Example -1

x <- c (1, 2, 3, 4, 5)

y <- c (1, 5, 3, 2, 0)

plot (x, y)

# Example -2

dose <- c(20, 30, 40, 50, 60)

drugA <- c(16, 20, 27, 40, 60)

drugB <- c(40, 31, 25, 18, 12)

plot(dose, drugA)

plot(dose, drugB)

help(plot)

#type argument

#"p" for points,

#"l" for lines,

#"b" for both,

#"c" for the lines part alone of "b",

#"o" for both 'overplotted',

#"h" for 'histogram' like (or 'high-density') vertical lines,

#"s" for stair steps,

#"S" for other steps, see 'Details' below,

#"n" for no plotting.

#Different types of plot

plot(dose, drugA, type="p")

plot(dose, drugA, type="l")

plot(dose, drugA, type="b")

plot(dose, drugA, type="c")

plot(dose, drugA, type="o")

plot(dose, drugA, type="h")

plot(dose, drugA, type="s")

plot(dose, drugA, type="n")

#Example 3

# Load the MASS package

library(MASS)

str(mtcars)

# https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/mtcars.html

########################################################

#[, 1] mpg Miles/(US) gallon

#[, 2] cyl Number of cylinders

#[, 3] disp Displacement (cu.in.)

#[, 4] hp Gross horsepower

#[, 5] drat Rear axle ratio

#[, 6] wt Weight (1000 lbs)

#[, 7] qsec 1/4 mile time

#[, 8] vs Engine (0 = V-shaped, 1 = straight)

#[, 9] am Transmission (0 = automatic, 1 = manual)

#[,10] gear Number of forward gears

#[,11] carb Number of carburetors

########################################################

summary(mtcars)

plot(mtcars$hp, mtcars$mpg)

plot(mtcars$hp, mtcars$mpg, xlab = "Horsepower", ylab = "Gas mileage")

plot(mtcars$hp, mtcars$mpg, xlab = "Horsepower", ylab = "Gas mileage", main = "MPG vs Horsepower")

# Compute max_hp

max_hp <- max(mtcars$hp)

# Compute max_mpg

max_mpg <- max(mtcars$mpg)

plot(mtcars$hp, mtcars$mpg,type = "p",

     xlim = c(0, max_hp),

     ylim = c(0, max_mpg), xlab = "Horsepower",

     ylab = "Miles per gallon", main = "Horsepower vs Mileage")

#################################################################################

2. Data Visualization - Scatter Plot Study Note
3. Data Visualization - mfrow

Data Visualization – mfrow

Create a multi-paneled plotting window. The par(mfrow) function is handy for creating a simple multi-paneled plot, while layout should be used for customized panel plots of varying sizes.


Use This Command To Perform Above Mentioned Function:

# Adding details with the par function
#########################################################################
# par function
# View current settings
par()

# Assign the return value from the par() function to plot_pars
plot_pars <- par()

# Display the names of the par() function's list elements
names(plot_pars)

# Display the number of par() function list elements
length(plot_pars)

#########################################################################
# mfrow = c(rows, cols)
# Creating a plot array with the mfrow parameter
# Set up a two-by-two plot array
par(mfrow = c(2, 2))

# Plot y1 vs. x1
plot(anscombe$x1, anscombe$y1)

# Plot y2 vs. x2
plot(anscombe$x2, anscombe$y2)

# Plot y3 vs. x3
plot(anscombe$x3, anscombe$y3)

# Plot y4 vs. x4
plot(anscombe$x4, anscombe$y4)

# Define common x and y limits for the four plots
xmin <- min(anscombe$x1, anscombe$x2, anscombe$x3, anscombe$x4)
xmax <- max(anscombe$x1, anscombe$x2, anscombe$x3, anscombe$x4)
ymin <- min(anscombe$y1, anscombe$y2, anscombe$y3, anscombe$y4)
ymax <- max(anscombe$y1, anscombe$y2, anscombe$y3, anscombe$y4)

# Set up a two-by-two plot array
par(mfrow = c(2, 2))

# Plot y1 vs. x1 with common x and y limits, labels & title
plot(anscombe$x1, anscombe$y1,
     xlim = c(xmin, xmax),
     ylim = c(ymin, ymax),
     xlab = "x value", ylab = "y value",
     main = "First dataset")

# Do the same for the y2 vs. x2 plot
plot(anscombe$x2, anscombe$y2,
     xlim = c(xmin, xmax),
     ylim = c(ymin, ymax),
     xlab = "x value", ylab = "y value",
     main = "Second dataset")

# Do the same for the y3 vs. x3 plot
plot(anscombe$x3, anscombe$y3,
     xlim = c(xmin, xmax),
     ylim = c(ymin, ymax),
     xlab = "x value", ylab = "y value",
     main = "Third dataset")

# Do the same for the y4 vs. x4 plot
plot(anscombe$x4, anscombe$y4,
     xlim = c(xmin, xmax),
     ylim = c(ymin, ymax),
     xlab = "x value", ylab = "y value",
     main = "Fourth dataset")
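par(mfrow) gives equal-sized panels; for panels of varying sizes the note above mentions layout(). Here is a minimal sketch (using the same anscombe data, not part of the original course code):

#########################################################################
# layout() takes a matrix saying which plot goes in which cell;
# heights/widths control the relative size of each row and column.
layout(matrix(c(1, 1,    # plot 1 spans the whole first row
                2, 3),   # plots 2 and 3 share the second row
              nrow = 2, byrow = TRUE),
       heights = c(2, 1))
layout.show(3)   # optional: preview the panel arrangement

plot(anscombe$x1, anscombe$y1, main = "First dataset")
plot(anscombe$x2, anscombe$y2, main = "Second dataset")
plot(anscombe$x3, anscombe$y3, main = "Third dataset")

# Reset to a single panel afterwards
par(mfrow = c(1, 1))
#########################################################################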

Data Visualization - pch

Different plotting symbols are available in R. The graphical argument used to specify point shapes is pch.

The following commands demonstrate this:

#######################################################################
library(MASS)
data("mtcars")

# pch
# Axis limits (computed earlier in the scatter-plot example; repeated
# here so this block runs on its own)
max_hp <- max(mtcars$hp)
max_mpg <- max(mtcars$mpg)

# Create an empty plot with type = "n"
plot(mtcars$hp, mtcars$mpg,
     type = "n", xlim = c(0, max_hp),
     ylim = c(0, max_mpg), xlab = "Horsepower",
     ylab = "Miles per gallon")

# Add solid squares to the plot
points(mtcars$hp, mtcars$mpg, pch = 15)

# Add open circles to the plot
points(mtcars$hp, mtcars$mpg, pch = 1)

# Add open triangles to the plot
points(mtcars$hp, mtcars$mpg, pch = 2)

# Create another empty plot using type = "n"
plot(mtcars$hp, mtcars$mpg,
     type = "n", xlim = c(0, max_hp),
     ylim = c(0, max_mpg), xlab = "Horsepower",
     ylab = "Miles per gallon")

# Add points with shapes determined by the cylinder number
points(mtcars$hp, mtcars$mpg, pch = mtcars$cyl)

# Create a second empty plot
plot(mtcars$hp, mtcars$mpg, type = "n",
     xlab = "Horsepower", ylab = "Gas mileage")

# Add points using the cylinder count as the plotting character
points(mtcars$hp, mtcars$mpg,
       pch = as.character(mtcars$cyl))

# Adjusting text position, size, and font
# Create another empty plot
plot(mtcars$hp, mtcars$mpg, type = "n",
     xlab = "Horsepower", ylab = "Gas mileage")

# Create index6, pointing to 6-cylinder cars
index6 <- which(mtcars$cyl == 6)

# Highlight 6-cylinder cars as solid circles
points(mtcars$hp[index6],
       mtcars$mpg[index6],
       pch = 19)

# Add car names, offset from the points, with larger bold-italic text
text(mtcars$hp[index6],
     mtcars$mpg[index6],
     labels = rownames(mtcars)[index6],
     adj = -0.2, cex = 1.2, font = 4)

#################################################################
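For reference, the standard point shapes are pch = 0 through 25; a minimal sketch (not part of the original course code) that draws them all:

#######################################################################
# Draw all 26 base plotting symbols with their pch numbers above them
plot(0:25, rep(1, 26), pch = 0:25, cex = 1.5, ylim = c(0.5, 1.5),
     xlab = "pch value", ylab = "", yaxt = "n",
     main = "Base R plotting symbols (pch 0-25)")
text(0:25, rep(1.2, 26), labels = 0:25, cex = 0.7)
#######################################################################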


Data Visualization – Color

Data visualization, or the visual communication of data, is the study and creation of visual representations of data. A good graph is easy to read; the goal is to convey information clearly and concisely. Color is one of the most prominent features of most data visualizations: it sets the mood, guides the viewer's eye, and draws attention to the part of the story you want to tell.

In data visualization with base R:

  • There are 657 built-in color names (run colors() to list them).

  • R also accepts hexadecimal strings (e.g. "#FF0000") to represent colors.

  • You can create vectors of contiguous colors using rainbow(n), heat.colors(n), terrain.colors(n), topo.colors(n) and cm.colors(n).
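A minimal sketch (not part of the original course code) illustrating these three points:

###############################################################################
# Built-in color names
length(colors())      # 657 named colors
head(colors())        # "white", "aliceblue", ...

# Colors can also be given as hexadecimal RGB strings
barplot(1:5, col = c("#FF0000", "#00FF00", "#0000FF", "#FFA500", "#800080"))

# Palette functions return vectors of n contiguous colors
rainbow(5)
barplot(1:7, col = heat.colors(7))
barplot(1:7, col = terrain.colors(7))
###############################################################################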

Data Visualization - Line Chart

Line charts display information as a series of data points connected by straight line segments on an X-Y axis. They are best used to track changes over time, using equal intervals of time between each data point.


Characteristics of a good line chart

  • Include a zero baseline if possible.

  • Don't plot more than four lines on one chart.

  • Use solid lines only.

  • Use an appropriate chart height (aspect ratio) so trends are neither exaggerated nor flattened.

  • Label the lines directly.


When to use a line chart

  • Line graphs are useful in that they show data variables and trends very clearly.

  • It helps to make predictions about the results of data not yet recorded. If seeing the trend of your data is the goal, then this is the chart to use.

  • Line charts show time-series relationships using continuous data.

  • They allow a quick assessment of acceleration (lines curving upward), deceleration (lines curving downward), and volatility (up/down frequency).

  • They are excellent for tracking multiple data sets on the same chart to see any correlation in trends.

  • They can also be used to display several dependent variables against one independent variable.

  • Line charts are great visualizations to see how a metric changes over time. For example, the exchange rate for GBP to USD.


The following commands demonstrate this:

#################################################################################
# Line Chart
plot(AirPassengers,type="l")  #Simple Line Plot

#Example 2
# Create the data for the chart.
v <- c(7,12,28,3,41)

# Plot the line chart
plot(v,type = "o")

# Add color, axis labels, and a title
plot(v,type = "o", col = "red", xlab = "Month", ylab = "Rain fall",
     main = "Rain fall chart")

#Multiple lines
# More than one line can be drawn on the same chart by using the lines() function
# Create the data for the second series
t <- c(14,7,6,19,3)

lines(t, type = "o", col = "blue")

#################################################################################
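The characteristics above recommend labelling the lines; here is a minimal sketch that adds a legend, re-using the v and t vectors from the example above ("City A" and "City B" are made-up labels for illustration):

#################################################################################
plot(v, type = "o", col = "red", ylim = range(c(v, t)),
     xlab = "Month", ylab = "Rain fall", main = "Rain fall chart")
lines(t, type = "o", col = "blue")
# "City A" / "City B" are hypothetical series names
legend("topleft", legend = c("City A", "City B"),
       col = c("red", "blue"), lty = 1, pch = 1)
#################################################################################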


This brings us to the end of this post. If anything is unclear, I encourage you to re-read it. Thank you.

Data Visualization - Histogram 

  • A Histogram visualizes the distribution of data over a continuous interval or certain time period. Each bar in a histogram represents the tabulated frequency at each interval/bin.

  • Histograms help give an estimate as to where values are concentrated, what the extremes are and whether there are any gaps or unusual values.

  • They are also useful for giving a rough view of the probability distribution.

  • A histogram is a common way to present the distribution of a single continuous variable by dividing its range into bins and counting the observations in each.

The following commands demonstrate this:

 

###############################################################################

###############################################################################

#Histogram

#Simple histogram

hist(mtcars$mpg)

#Colored histogram

?hist

#The number and width of the bars can be controlled with the breaks argument.

hist(mtcars$mpg, breaks = 4, col = "lightblue", xlab = "mpg", ylab = "freq")

hist(mtcars$mpg, breaks = 15, col=rainbow(7), xlab = "mpg", ylab = "freq")

#Histogram of a different dataset

hist(AirPassengers, col=rainbow(7))

#Histogram of the AirPassengers dataset with 5 breakpoints

hist(AirPassengers, breaks=5)

# If you want to have more control over the breakpoints between bins,

# you can enrich the breaks argument by giving it a vector of breakpoints.

# You can do this by using the c() function:

# Compute a histogram for the data values in AirPassengers,

# and set the bins such that they run from 100 to 300, 300 to 500 and 500 to 700.

hist(AirPassengers, breaks= c(100, 300, 500, 700))

# We can use the seq(x, y, z) function instead of c()

# x = begin number of the x-axis,

# y = end number of the x-axis

# z = the interval in which these numbers appear.

hist(AirPassengers, breaks= seq(100, 700, 100))

# Note that you can also combine the two functions:

# Make a histogram for the AirPassengers dataset: a first bin from 100 to 200,

# 150-wide bins from 200 to 500, and a final bin from 500 to 700

hist(AirPassengers, breaks=c(100, seq(200,600, 150), 700))

###############################################################################
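The notes above mention that histograms give a rough view of the probability distribution; a minimal sketch (assuming the mtcars data used earlier, not part of the original code) with density scaling and a kernel-density overlay:

###############################################################################
# freq = FALSE scales the bars so that their total area is 1
hist(mtcars$mpg, freq = FALSE, col = "lightblue",
     xlab = "mpg", main = "mpg with density overlay")
# Overlay a kernel density estimate
lines(density(mtcars$mpg), lwd = 2, col = "darkblue")
###############################################################################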

 

This brings us to the end of this post. If anything is unclear, I encourage you to re-read it. Thank you.


Data Visualization - Box Plot

A Box Plot is a convenient way of visually displaying the data distribution through their quartiles.

Box Plots can be drawn either vertically or horizontally.

Although Box Plots may seem primitive in comparison to a Histogram or Density Plot, they have the advantage of taking up less space, which is useful when comparing distributions between many groups or data sets.


What you can read from a Box Plot:

  • The key values, such as the median, the quartiles, and the minimum and maximum.

  • Whether there are any outliers and what their values are.

  • Whether the data is symmetrical.

  • How tightly the data is grouped.

  • Whether the data is skewed and, if so, in which direction.

Two of the most commonly used variations of the Box Plot are:

  • Variable-width Box Plots (see varwidth = TRUE in the code below)

  • Notched Box Plots (see the sketch after the code block below)

The following commands demonstrate this:

###############################################################################

# Boxplot

vec <- c(3,2,5,6,4,8,1,2,3,2,4,30,36)

?boxplot

boxplot(vec)

boxplot(vec, varwidth = TRUE)

# Boxplot of MPG by Car Cylinders

# a formula, such as y ~ grp, where y is a numeric vector of data values

# to be split into groups according to the grouping variable grp (usually a factor).

boxplot(mpg~cyl, data = mtcars)

boxplot(mpg~cyl,data=mtcars, main="Car Milage Data",

        xlab="Number of Cylinders", ylab="Miles Per Gallon",col=(c("gold","darkgreen","Blue")))

###############################################################################

#########################################################################
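The code above shows variable-width boxes; here is a minimal sketch of a notched boxplot (not from the original code, assuming the same mtcars data):

###############################################################################
# Notched boxplot: if the notches of two boxes do not overlap,
# that is strong evidence their medians differ.
# (R may warn that some notches extend beyond the hinges for small groups.)
boxplot(mpg ~ cyl, data = mtcars, notch = TRUE,
        xlab = "Number of Cylinders", ylab = "Miles Per Gallon",
        main = "Notched boxplot of MPG by cylinder count")
###############################################################################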

This brings us to the end of this post. If anything is unclear, I encourage you to re-read it. Thank you.


Data Visualization - Mosaic Plot

Mosaic plots were introduced by Hartigan and Kleiner in 1981 and expanded on by Friendly in 1994. Mosaic plots are also called Mekko charts due to their resemblance to a Marimekko print.

  • A mosaic plot summarizes the conditional proportions of co-occurring values of categorical variables; the area of each tile is proportional to the corresponding cell count.

  • A mosaic plot is a graphical method for visualizing data from two or more qualitative variables.

  • It is the multidimensional extension of the spine plot, which graphically displays the same information for only one variable.

  • It gives an overview of the data and makes it possible to recognize relationships between different variables. For example, independence is suggested when the tiles across categories all have the same proportions.


Data Visualization - Heat Map

  • Heatmaps visualize data through variations in coloring.

  • Heatmaps are useful for cross-examining multivariate data, through placing variables in the rows and columns and coloring the cells within the table.

  • Heatmaps are good for showing variance across multiple variables, revealing any patterns, displaying whether any variables are similar to each other, and for detecting if any correlations exist in-between them.

  • Heatmaps can also be used to show changes in data over time if one of the rows or columns is set to time intervals.

  • Heatmaps are better suited to giving a generalized overview of numerical data than to reading off exact values.


The following commands demonstrate this:

###############################################################################

#########################################################################

# Mosaic Plot

data(HairEyeColor)

mosaicplot(HairEyeColor)

?mosaicplot

###############################################################################

# Heatmap

# A heat map uses a color gradient to make comparisons;

# use one when you want to compare categories across two dimensions.

library(MASS)

mtcars

heatmap(as.matrix(mtcars))

?heatmap

heatmap(as.matrix(mtcars), Rowv = NA, Colv = NA, scale = "column", col = cm.colors(256),

        xlab = "Attributes", main = "heatmap")

#########################################################################
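Since the notes say independence can be read off a mosaic plot, here is a minimal sketch (not from the original code) that shades tiles by their Pearson residuals, so departures from independence stand out:

#########################################################################
# shade = TRUE colors each tile by its Pearson residual under an
# independence model; over- and under-represented cells stand out.
mosaicplot(HairEyeColor, shade = TRUE, main = "Hair and eye color")
#########################################################################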


This brings us to the end of this post. If anything is unclear, I encourage you to re-read it. Thank you.


Data Visualization - 3D Plot

A 3D plot is useful where a 2D plot cannot show the relationship among three variables at once.

  • We use the lattice package, which provides high-level functions for visualizing multivariate data.

  • Simply install and load the lattice package.

  • Use the cloud() function.

Plotly is a platform for data analysis, graphing, and collaboration. You can also make 3D plots with it; below we show how to make 3D plots with Plotly's R API.


The following commands demonstrate this:

 

#########################################################################

#3D graph with lattice package

library(lattice)

attach(mtcars)

# Change am column to factor as "Automatic" and "Manual"

mtcars$am[which(mtcars$am == 0)] <- 'Automatic'

mtcars$am[which(mtcars$am == 1)] <- 'Manual'

mtcars$am <- as.factor(mtcars$am)

#3d scatterplot by factor level

cloud(hp~mpg*wt, data = mtcars)

cloud(hp~mpg*wt, data = mtcars, main = "3D Scatterplot")

cloud(hp~mpg*wt, data = mtcars, main = "3D Scatterplot", col = cyl)

cloud(hp~mpg*wt, data = mtcars, main = "3D Scatterplot", col = cyl, pch = 17)

cloud(hp~mpg*wt|am, data = mtcars, main = "3D Scatterplot", col = cyl, pch = 17)

?cloud

##############################################################

# 3D graph with the plotly package

install.packages("plotly")

library(plotly)

data(mtcars)

# Basic 3D Scatter Plot

plot_ly(mtcars, x = ~wt, y = ~hp, z = ~qsec)

# Basic 3D Scatter Plot with Color

plot_ly(mtcars, x = ~wt, y = ~hp, z = ~qsec, color = ~am, colors = c('#BF382A', '#0C4B8E')) %>%

  add_markers() %>%

  layout(scene = list(xaxis = list(title = 'Weight'),

                      yaxis = list(title = 'horsepower'),

                      zaxis = list(title = 'qsec')))

#3D Scatter Plot with color scaling

plot_ly(mtcars, x = ~wt, y = ~hp, z = ~qsec,

             marker = list(color = ~mpg, colorscale = c('#FFE1A1', '#683531'), showscale = TRUE)) %>%

  add_markers() %>%

  layout(scene = list(xaxis = list(title = 'Weight'),

                      yaxis = list(title = 'horsepower'),

                      zaxis = list(title = 'qsec')),

         annotations = list(

           x = 1.13,

           y = 1.05,

           text = 'Miles/(US) gallon',

           xref = 'paper',

           yref = 'paper',

           showarrow = FALSE

         ))

       

# Load the `plotly` library

library(plotly)

# Your volcano data

str(volcano)

volcano

# The 3d surface map

plot_ly(z = ~volcano, type = "surface")

#########################################################################

 

 

This brings us to the end of this post. If anything is unclear, I encourage you to re-read it. Thank you.
