Are you planning to build your career in Data Science this year?
Did you know the average salary of a Data Scientist is $100,000/yr?
Did you know over 10 million new jobs will be created in the Data Science field in just the next 3 years?
If you are a student, a job holder, or a job seeker, then now is the right time for you to go for Data Science!
Did you ever wonder why Data Science was the “hottest” job globally in 2018–2019?
>> 30+ Hours Video
>> 4 Capstone Projects
>> 8+ Case Studies
>> 24×7 Support
>> ENROLL TODAY & GET THE DATA SCIENCE INTERVIEW PREPARATION COURSE FOR FREE <<
What Projects Are We Going to Cover in the Course?
Project 1 – Titanic Case Study, which is based on a classification problem.
Project 2 – E-commerce Sales Data Analysis, based on regression.
Project 3 – Customer Segmentation, which is based on unsupervised learning.
Final Project – Market Basket Analysis, based on association rule mining.
Why Is Data Science a Must-Have Nowadays?
A full answer to why Data Science is a must-have nowadays would take a long time to explain. Instead, let's look at the companies that are using Data Science and Machine Learning. Then you will get an idea of how deep knowledge of Data Science and Machine Learning can boost your salary!
What Students Are Saying:
“A great course to kick-start a journey in Machine Learning. It gives a clear contextual overview of most areas of Machine Learning. The effort in explaining the intuition of the algorithms is especially useful.”
– John Doe, Co-Founder, Impressive LLC
I simply love this course and I definitely learned a ton of new concepts.
Nevertheless, I wish there were some real-life examples at the end of the course. A few homework problems and solutions would have been good enough.
– Brain Dee, Data Scientist
It was an amazing experience. I really liked the course. The way the trainers explained the concepts was very good. The only thing I thought was missing was more real-world datasets and applications in the course. Overall it was a great experience. The course will really help beginners gain knowledge. Cheers to the team!
– Devon Smeeth, Software Developer
Above, we gave you just a few examples of why you should move into Data Science and test the hottest job market ever created!
The good news is that in this hands-on Data Science and Machine Learning in R course, you will learn all the knowledge you need to become a MASTER in Data Science.
Here are just a few of the companies using Data Science and Machine Learning:
Google – ad serving, ad targeting, self-driving cars, supercomputers, Google Home, etc. Google uses Data Science + ML + AI to make decisions.
Apple – uses Data Science in different places, like Siri and Face Detection.
Facebook – Data Science, Machine Learning, and AI are used in the graph algorithms behind friend suggestions, photo tagging, ad targeting, chatbots, face detection, etc.
NASA – uses Data Science for many different purposes.
Microsoft – amplifying human ingenuity with Data Science.
So from this list of companies you can see that everyone, from the biggest giants to the smallest startups, is chasing Data Science and Artificial Intelligence – and that is your opportunity!
Why Choose This Data Science with R Course?
We cover not only “how” to do it but also “why” to do it!
Theory explained with hands-on examples!
30+ Hours Long Data Science Course
100+ Study Materials on Each and Every Topic of Data Science!
Code templates are ready to download – save a lot of time!
What You Will Learn From The Data Science MASTERCLASS Course:
Learn what Data Science is and how it is helping the modern world!
The benefits of Data Science, Machine Learning, and Artificial Intelligence
Be able to solve Data Science problems with the help of R programming
Why R is a must-have for Data Science, AI, and Machine Learning!
The right guidance on the path to becoming a Data Scientist + a Data Science interview preparation guide
How to switch your career into Data Science
R Data Structure – Matrix, Array, Data Frame, Factor, List
Work with R’s conditional statements, functions, and loops
Systematically explore data in R
Data Science packages: dplyr, ggplot2
Index, slice, and subset data
Get your data in and out of R – CSV, Excel, Database, Web, Text Data
Data Science – Data Visualization: plot different types of data and draw insights, e.g. line chart, bar plot, pie chart, histogram, density plot, box plot, 3D plot, mosaic plot
Data Science – Data Manipulation: the apply family of functions, mutate(), filter(), arrange(), summarise(), group_by(), dates in R
Statistics – A Must have for Data Science
Data Science – Hypothesis Testing
Business Use Case Understanding
Data Pre-processing
Supervised Learning
Logistic Regression
K-NN
SVM
Naive Bayes
Decision Tree
Random Forest
K-Means Clustering
Hierarchical Clustering
DBScan Clustering
PCA (Principal Component Analysis)
Association Rule Mining
Model Deployment
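To give a flavor of the data visualization topic listed above, here is a minimal sketch of three of the plot types mentioned (histogram, bar plot, box plot). It uses only base R graphics and the built-in mtcars dataset, and renders to a temporary PNG file so it runs anywhere; the file path is just for illustration.

```r
# A minimal sketch of three plot types from the list above,
# using only base R graphics and the built-in mtcars dataset.
out <- tempfile(fileext = ".png")   # temporary file so the sketch runs anywhere
png(out, width = 900, height = 300)
par(mfrow = c(1, 3))                # three panels side by side

hist(mtcars$mpg, main = "Histogram", xlab = "Miles per gallon")
barplot(table(mtcars$cyl), main = "Bar plot", xlab = "Cylinders")
boxplot(mpg ~ cyl, data = mtcars, main = "Box plot",
        xlab = "Cylinders", ylab = "Miles per gallon")

dev.off()
file.exists(out)                    # TRUE: the chart was written to disk
```

The same plots can of course be drawn with ggplot2, which the course covers separately.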
Import and Export in R
You might find that loading data into R can be quite frustrating. Almost every single type of file that you want to get into R seems to require its own function, and even then you might get lost in the functions’ arguments. In short, you might agree that it can be fairly easy to mix up things from time to time, whether you are a beginner or a more advanced R user.
Types of files that we'll import
Importing CSV file
Importing Text file
Importing Excel file
Importing files from Database
Importing files from Web
Importing files from Statistical Tool
And lastly Exporting the Data
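Before walking through each file type, here is a minimal, self-contained round trip using only base R: write the built-in mtcars data frame to a temporary CSV file, then read it straight back in. The temporary path is just for illustration; the function names are standard base R.

```r
# Write a built-in data frame out to a temporary CSV file...
path <- tempfile(fileext = ".csv")
write.csv(mtcars, path, row.names = FALSE)

# ...and read it straight back in with base R's read.csv()
df <- read.csv(path)
str(df)                       # same 11 columns as mtcars
nrow(df) == nrow(mtcars)      # TRUE: all 32 rows survived the round trip
```

The sections below cover each file type, and the readr/readxl alternatives, in detail.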
Importing CSV file
The utils package, which is automatically loaded in your R session on startup, can import CSV files with the read.csv() function.
Use read.csv() to import a data frame
Now use these commands to import CSV files:
#Importing csv file
# read.csv()
titanic_train<- read.csv(file.choose())
class(titanic_train)
titanic <- read.csv("titanic_train.csv")
str(titanic)
#Using readr package
install.packages("readr")
library(readr)
titanic <- read_csv("titanic_train.csv")
titanic
All the code used in this video is given at the end of this chapter. The CSV files used here are available in the resources section of this lecture.
This brings us to the end of this post. If anything is still unclear, I encourage you to re-read the post. Thank you!
Importing Text File
The utils package, which is automatically loaded in your R session on startup, can import text files with the read.table() function.
Use read.table() to import a data frame.
Now use these commands to import text files.
If you have a .txt file or a tab-delimited text file, you can easily import it with the basic R function read.table():
# Importing table/text
# read.table ()
# Import the hotdogs.txt file: hotdogs
?read.table
hotdogs <- read.table("hotdogs.txt", sep = "\t", header = TRUE)
# Call head() on hotdogs
head(hotdogs)
All the code used in this video is given at the end of this chapter. The text files used here are available in the resources section of this lecture.
This brings us to the end of this post. If anything is still unclear, I encourage you to re-read the post. Thank you!
Importing Of Excel Files
As most of you know, Excel is a spreadsheet application developed by Microsoft. It is an easily accessible tool for organizing, analyzing, and storing data in tables, and it is in widespread use in many different application fields all over the world. It should come as no surprise, then, that R has implemented ways to read, write, and manipulate Excel files (and spreadsheets in general).
How To Import Excel Files
Before you start thinking about how to load your Excel files and spreadsheets into R, you need to first make sure that your data is well prepared to be imported.
The readxl package can import Excel files with the read_excel() function. Unlike utils, it is not loaded automatically on startup, so you need to install and load it first.
Use read_excel() to import a data frame.
If you neglect to prepare your data, you might experience problems when using the R functions.
Using these commands you can import an Excel file into R:
#Importing xls file using readxl package - read_excel()
# install the readxl package
install.packages("readxl")
# Load the readxl package
library(readxl)
# Print out the names of both spreadsheets
excel_sheets("urbanpop.xlsx")
# Read the sheets, one by one
pop_1 <- read_excel("urbanpop.xlsx", sheet = 1)
pop_2 <- read_excel("urbanpop.xlsx", sheet = 2)
pop_3 <- read_excel("urbanpop.xlsx", sheet = 3)
# Put pop_1, pop_2 and pop_3 in a list: pop_list
pop_list <- list(pop_1,pop_2,pop_3)
# Display the structure of pop_list
str(pop_list)
# Explore other packages - XLConnect, xlsx, gdata
All the code used in this video is given at the end of this chapter. This brings us to the end of this post. If anything is still unclear, I encourage you to re-read the post. Thank you!
Export Data in R - Text,CSV,Excel - Text Study Note
Section 7, Lecture 45
In this tutorial, we will learn how to export data from R environment to different formats.
To export data to the hard drive, you need a file path and an extension. The path is the location where the data will be stored.
Exporting Text File
You can export text files with the write.table(mydata, "path/to/mydata.txt", sep = "\t") function.
Now use these commands to export text files:
# Export data in a text file
write.table(hotdogs, "D:/Rajib Backup/Project/Innovation/Analytics/Machine Learning/Tutorial/EduCBA/Chap5 -Import and Export/NewHotdog.txt", sep = "\t")
Exporting CSV File
You can export CSV files with the write.csv(mydata, "path/to/mydata.csv") function.
Now use these commands to export CSV files:
#Export data in csv
write.csv(my_df, "D:/Rajib Backup/Project/Innovation/Analytics/Machine Learning/Tutorial/EduCBA/Chap5 -Import and Export/my_df.csv")
Exporting Excel File
You can export Excel files with the write_xlsx(mydata, "path/to/mydata.xlsx") function from the writexl package.
Now use these commands to export Excel files:
# Export data in excel
install.packages("writexl")
library(writexl)
my_df <- mtcars[1:3,]
write_xlsx(my_df, "D:/Rajib Backup/Project/Innovation/Analytics/Machine Learning/Tutorial/EduCBA/Chap5 -Import and Export/Newmtcars.xlsx")
All the code used in this video is given at the end of this chapter. The text, CSV, and Excel files used here are available in the resources section of this lecture.
This brings us to the end of this post. If anything is still unclear, I encourage you to re-read the post. Thank you!
Data Manipulation
The apply() family of functions forms the basis of more complex combinations and helps to perform operations with very few lines of code. More specifically, the family is made up of the
apply()
lapply()
sapply()
vapply()
tapply()
mapply()
functions.
How To Use apply() in R
Let’s start with the apply(), which operates on arrays.
The R base manual tells you that it's called as follows: apply(X, MARGIN, FUN)
where:
X is an array, or a matrix if the dimension of the array is 2;
MARGIN is a variable defining how the function is applied: when MARGIN = 1, it applies over rows, whereas with MARGIN = 2, it works over columns;
FUN is the function that you want to apply to the data. It can be any R function, including a user-defined function (UDF).
Use these commands to try the apply() function:
# Topic 1: Apply Function
###################################################################################
# apply() applies a function to the rows or columns of a matrix and returns a vector, array, or list
# Syntax: apply(X, MARGIN, FUN), where MARGIN indicates whether the function is applied to rows or columns
# MARGIN = 1 applies the function to each row
# MARGIN = 2 applies the function to each column
# FUN can be any function, such as mean, median, or sum
m <- matrix(c(1,2,3,4),2,2)
m
apply(m, 1, sum)
apply(m, 2,sum)
apply(m, 1, mean)
apply(m, 2, mean)
The lapply() Function
You want to apply a given function to every element of a list and obtain a list as result. When you execute ?lapply, you see that the syntax looks like the apply() function.
The differences are that:
it can be used for other objects like data frames, lists, or vectors; and
the output returned is a list (which explains the “l” in the function name) with the same number of elements as the object passed to it.
Use these commands to try the lapply() function:
################################################
#Using sapply and lapply
################################################
#Lapply() function
#lapply is similar to apply, but it takes a list as an input, and returns a list as the output.
# syntax is lapply(list, function)
#example 1:
data <- list(x = 1:5, y = 6:10, z = 11:15)
data
lapply(data, FUN = median)
#example 2:
data2 <- list(a=c(1,1), b=c(2,2), c=c(3,3))
data2
lapply(data2, sum)
lapply(data2, mean)
The sapply() Function
The sapply() function works like lapply(), but it tries to simplify the output to the most elementary data structure that is possible. And indeed, sapply() is a ‘wrapper’ function for lapply().
An example may help to understand this: let's repeat the operations from the last example, but now with sapply() so the results are simplified.
Applying the lapply() function gives us a list, while sapply() simplifies it to a vector; if you pass simplify = FALSE to sapply(), a list will be returned instead.
Use these commands to try the sapply() function:
#Sapply function
# sapply is the same as lapply, but returns a vector instead of a list.
# syntax is sapply(list, function)
#example 1 :
data <- list(x = 1:5, y = 6:10, z = 11:15)
data
lapply(data, FUN = sum)
lapply(data, FUN = median)
unlist(lapply(data, FUN = median))
sapply(data, FUN = sum)
sapply(data, FUN = median)
# Note: if the results are all scalars, then a vector is returned;
# if the results all have the same length (> 1), then a matrix is returned; otherwise, the result is returned as a list
sapply(data, FUN = range)
The vapply() Function
And lastly, the vapply() function. Its arguments are shown below.
Arguments
X: a vector or list.
FUN: the function to be applied.
FUN.VALUE: a (generalized) vector; a template for the return value from FUN.
...: optional arguments to FUN.
USE.NAMES: logical; if TRUE and if X is character, use X as names for the result unless it had names already.
Use these commands to try the vapply() function:
#vapply function
# vapply() is similar to sapply(), but it explicitly specifies the type of return value (integer, double, character)
vapply(data,sum, FUN.VALUE = double(1))
vapply(data,range, FUN.VALUE = double(2))
Use these commands to try the tapply() and mapply() functions:
################################################
# Using tapply() and mapply()
################################################
# tapply() works on a vector;
# it applies the function by grouping on a factor inside the vector.
# syntax: tapply(x, factor, function)
#example 1:
age <- c(23,33,28,21,20,19,34)
gender <- c("m" , "m", "m" , "f", "f", "f" , "m")
f <- factor(gender)
f
tapply(age, f, mean)
tapply(age, gender, mean)
#example number 2
#load the datasets
library(datasets)
#you can view all the datasets
data()
View(mtcars)
class(mtcars)
mtcars$wt
mtcars$cyl
f <- factor(mtcars$cyl)
f
tapply(mtcars$wt, f, mean)
##############################################################################
# mapply() - mapply is a multivariate version of sapply. It will apply the specified function
# to the first element of each argument first, followed by the second element, and so on.
# syntax is mapply(function...)
## example number 1
# create a list:
rep(1,4)
rep(2,3)
rep(3,2)
rep(4,1)
a <- list(rep(1,4), rep(2,3), rep(3,2), rep(4,1))
a
# We can see that we are calling the same function (rep) where the first argument
# varies from 1 to 4 and the second argument varies from 4 to 1.
# Instead, we can use the mapply function:
b <- mapply(rep, 1:4, 4:1)
b
#####################################################################################
####################################################################################
This brings us to the end of this post. If anything is still unclear, I encourage you to re-read the post. Thank you!
Selecting columns using select()
select() keeps only the variables you mention
Use these commands to try select():
#######################################
#select(): Select specific columns from a tbl
#######################################
library(dplyr)      # provides select(), glimpse(), and the other verbs
library(hflights)   # provides the hflights dataset
tbl <- select(hflights, ActualElapsedTime, AirTime, ArrDelay, DepDelay)
glimpse(tbl)
#starts_with("X"): every name that starts with "X",
#ends_with("X"): every name that ends with "X",
#contains("X"): every name that contains "X",
#matches("X"): every name that matches "X", where "X" can be a regular expression,
#num_range("x", 1:5): the variables named x01, x02, x03, x04 and x05,
#one_of(x): every name that appears in x, which should be a character vector.
#Example: print out only the UniqueCarrier, FlightNum, TailNum, Cancelled, and CancellationCode columns of hflights
select(hflights, ends_with("Num"))
select(hflights, starts_with("Cancel"))
select(hflights, UniqueCarrier, ends_with("Num"), starts_with("Cancel"))
Create new columns using mutate()
mutate() is the second of five data manipulation functions you will get familiar with in this course. mutate() creates new columns which are added to a copy of the dataset.
Use these commands to try mutate():
#######################################
#mutate(): Add columns from existing data
#######################################
g2 <- mutate(hflights, loss = ArrDelay - DepDelay)
g2
g1 <- mutate(hflights, ActualGroundTime = ActualElapsedTime - AirTime)
g1
#hflights$ActualGroundTime <- hflights$ActualElapsedTime - hflights$AirTime
#######################################
Selecting rows using filter()
Filtering data is one of the most basic operations when you work with data: you want to remove the part of the data that is invalid or that you are simply not interested in, or you want to zero in on a particular part of the data you want to know more about. Of course, dplyr has the filter() function to do such filtering, but there is even more. With dplyr you can do kinds of filtering that could be hard to perform or complicated to construct with tools like SQL and traditional BI tools, in a simple and more intuitive way.
R comes with a set of logical operators that you can use inside filter():
• <
• <=
• ==
• !=
• >=
• >
Use these commands to try filter():
#filter() : Filter specific rows which matches the logical condition
#######################################
#R comes with a set of logical operators that you can use inside filter():
#x < y, TRUE if x is less than y
#x <= y, TRUE if x is less than or equal to y
#x == y, TRUE if x equals y
#x != y, TRUE if x does not equal y
#x >= y, TRUE if x is greater than or equal to y
#x > y, TRUE if x is greater than y
#x %in% c(a, b, c), TRUE if x is in the vector c(a, b, c)
# All flights that traveled 3000 miles or more
long_flight <- filter(hflights, Distance >= 3000)
View(long_flight)
glimpse(long_flight)
# All flights where taxiing took longer than flying
long_journey <- filter(hflights, TaxiIn + TaxiOut > AirTime)
View(long_journey)
# All flights that departed before 5am or arrived after 10pm
All_Day_Journey <- filter(hflights, DepTime < 500 | ArrTime > 2200)
# All flights that departed late but arrived ahead of schedule
Early_Flight <- filter(hflights, DepDelay > 0, ArrDelay < 0)
glimpse(Early_Flight)
# All flights that were cancelled after being delayed
Cancelled_Delay <- filter(hflights, Cancelled == 1, DepDelay > 0)
#How many weekend flights flew a distance of more than 1000 miles but
#had a total taxiing time below 15 minutes?
w <- filter(hflights, DayOfWeek == 6 |DayOfWeek == 7, Distance >1000, TaxiIn + TaxiOut <15)
nrow(w)
y <- filter(hflights, DayOfWeek %in% c(6,7), Distance > 1000, TaxiIn + TaxiOut < 15)
nrow(y)
#######################################
Arrange or re-order rows using arrange()
To arrange (or re-order) rows by a particular column, list the name of the column you want to arrange the rows by.
Use these commands to try arrange():
#######################################
#arrange(): reorders the rows according to single or multiple variables,
#######################################
dtc <- filter(hflights, Cancelled == 1, !is.na(DepDelay)) #Delay not equal to NA
glimpse(dtc)
# Arrange dtc by departure delays
d <- arrange(dtc, DepDelay)
# Arrange dtc so that cancellation reasons are grouped
c <- arrange(dtc,CancellationCode )
#By default, arrange() arranges the rows from smallest to largest.
#Rows with the smallest value of the variable will appear at the top of the data set.
#You can reverse this behavior with the desc() function.
# Arrange according to carrier and decreasing departure delays
des_Flight <- arrange(hflights, desc(DepDelay))
# Arrange flights by total delay (normal order).
arrange(hflights, ArrDelay + DepDelay)
#######################################
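Two dplyr verbs from the course outline, group_by() and summarise(), are not demonstrated above. Here is a minimal sketch using the built-in mtcars dataset instead of hflights, assuming dplyr is installed:

```r
library(dplyr)

# group_by() + summarise(): one summary row per cylinder group
by_cyl <- mtcars %>%
  group_by(cyl) %>%
  summarise(mean_mpg = mean(mpg),  # average fuel economy per group
            n        = n())        # number of cars per group
by_cyl
```

group_by() changes the unit of analysis from individual rows to whole groups; summarise() then collapses each group into a single row.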
Selecting columns using select()
select() keeps only the variables you mention
Use This Command To Perform The Above Mentioned Function
#######################################
#select(): Select specific column from tbl
#######################################
tbl <- select (hflights, ActualElapsedTime, AirTime, ArrDelay, DepDelay )
glimpse(tbl)
#starts_with("X"): every name that starts with "X",
#ends_with("X"): every name that ends with "X",
#contains("X"): every name that contains "X",
#matches("X"): every name that matches "X", where "X" can be a regular expression,
#num_range("x", 1:5): the variables named x01, x02, x03, x04 and x05,
#one_of(x): every name that appears in x, which should be a character vector.
#Example: print out only the UniqueCarrier, FlightNum, TailNum, Cancelled, and CancellationCode columns of hflights
select(hflights, ends_with("Num"))
select(hflights, starts_with("Cancel"))
select(hflights, UniqueCarrier, ends_with("Num"), starts_with("Cancel"))
Create new columns using mutate()
mutate() is the second of five data manipulation functions you will get familiar with in this course. mutate() creates new columns which are added to a copy of the dataset.
Use This Command To Perform The Above Mentioned Function
#######################################
#mutate(): Add columns from existing data
#######################################
g2 <- mutate(hflights, loss = ArrDelay - DepDelay)
g2
g1 <- mutate(hflights, ActualGroundTime = ActualElapsedTime - AirTime)
g1
#hflights$ActualGroundTime <- hflights$ActualElapsedTime - hflights$AirTime
#######################################
Selecting rows using filter()
Filtering data is one of the very basic operation when you work with data. You want to remove a part of the data that is invalid or simply you’re not interested in. Or, you want to zero in on a particular part of the data you want to know more about. Of course, dplyr has ’filter()’ function to do such filtering, but there is even more. With dplyr you can do the kind of filtering, which could be hard to perform or complicated to construct with tools like SQL and traditional BI tools, in such a simple and more intuitive way.
R comes with a set of logical operators that you can use inside filter():
• <
• <=
• ==
• !=
• !=
• >
Use This Command To Perform The Above Mentioned Function
#filter() : Filter specific rows which matches the logical condition
#######################################
#R comes with a set of logical operators that you can use inside filter():
#x < y, TRUE if x is less than y
#x <= y, TRUE if x is less than or equal to y
#x == y, TRUE if x equals y
#x != y, TRUE if x does not equal y
#x >= y, TRUE if x is greater than or equal to y
#x > y, TRUE if x is greater than y
#x %in% c(a, b, c), TRUE if x is in the vector c(a, b, c)
# All flights that traveled 3000 miles or more
long_flight <- filter(hflights, Distance >= 3000)
View(long_flight)
glimpse(long_flight)
# All flights where taxiing took longer than flying
long_journey <- filter(hflights, TaxiIn + TaxiOut > AirTime)
View(long_journey)
# All flights that departed before 5am or arrived after 10pm
All_Day_Journey <- filter(hflights, DepTime < 500 | ArrTime > 2200)
# All flights that departed late but arrived ahead of schedule
Early_Flight <- filter(hflights, DepDelay > 0, ArrDelay < 0)
glimpse(Early_Flight)
# All flights that were cancelled after being delayed
Cancelled_Delay <- filter(hflights, Cancelled == 1, DepDelay > 0)
#How many weekend flights flew a distance of more than 1000 miles but
#had a total taxiing time below 15 minutes?
w <- filter(hflights, DayOfWeek == 6 | DayOfWeek == 7, Distance > 1000, TaxiIn + TaxiOut < 15)
nrow(w)
y <- filter(hflights, DayOfWeek %in% c(6,7), Distance > 1000, TaxiIn + TaxiOut < 15)
nrow(y)
#######################################
Arrange or re-order rows using arrange()
To arrange (or re-order) rows by a particular column, pass arrange() the data and the name of the column you want to order the rows by.
Use the following commands to perform the function described above:
#######################################
#arrange(): reorders the rows according to single or multiple variables,
#######################################
dtc <- filter(hflights, Cancelled == 1, !is.na(DepDelay)) #Delay not equal to NA
glimpse(dtc)
# Arrange dtc by departure delays
d <- arrange(dtc, DepDelay)
# Arrange dtc so that cancellation reasons are grouped
c <- arrange(dtc, CancellationCode)
#By default, arrange() arranges the rows from smallest to largest.
#Rows with the smallest value of the variable will appear at the top of the data set.
#You can reverse this behavior with the desc() function.
# Arrange according to carrier and decreasing departure delays
des_Flight <- arrange(hflights, UniqueCarrier, desc(DepDelay))
# Arrange flights by total delay (normal order).
arrange(hflights, ArrDelay + DepDelay)
#######################################
Create summaries of the data frame using summarise()
The summarise() function will create summary statistics for a given column in the data frame such as finding the mean.
Use the following commands to perform the function described above:
#######################################
#summarise(): reduces each group to a single row by calculating aggregate measures.
#######################################
#summarise() follows the same syntax as mutate(),
#but instead of adding a new column it reduces the dataset to a single row
#min(x) - minimum value of vector x.
#max(x) - maximum value of vector x.
#mean(x) - mean value of vector x.
#median(x) - median value of vector x.
#quantile(x, p) - pth quantile of vector x.
#sd(x) - standard deviation of vector x.
#var(x) - variance of vector x.
#IQR(x) - Inter Quartile Range (IQR) of vector x.
#diff(range(x)) - total range of vector x.
# Print out a summary with variables
# min_dist, the shortest distance flown, and max_dist, the longest distance flown
summarise(hflights, max_dist = max(Distance), min_dist = min(Distance))
# Print out a summary of hflights with max_div: the longest Distance for diverted flights.
# Print out a summary with variable max_div
div <- filter(hflights, Diverted == 1)
summarise(div, max_div = max(Distance))
summarise(filter(hflights, Diverted == 1), max_div = max(Distance))
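The comment above notes that summarise() "reduces each group to a single row", but the examples so far summarise the whole data frame at once. A minimal sketch of per-group summaries with dplyr's group_by(), shown here on the built-in mtcars dataset so it runs even without hflights loaded:

```r
# Per-group summaries: group_by() + summarise() collapse each group to one row.
# mtcars is built into R, so this sketch does not need the hflights data.
library(dplyr)

per_cyl <- mtcars %>%
  group_by(cyl) %>%                 # one group per cylinder count: 4, 6, 8
  summarise(avg_mpg = mean(mpg),    # mean mileage within each group
            n = n())                # number of cars in each group
per_cyl                             # one row per group (3 rows)
```

The same pattern applies to hflights, for example grouping by UniqueCarrier before summarising delays.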
###########################################################
Pipe operator: %>%
Before we go any further, let’s introduce the pipe operator: %>%. dplyr imports this operator from another package (magrittr). It allows you to pipe the output of one function into the input of another. Instead of nesting functions (reading from the inside out), the idea of piping is to read the functions from left to right.
Use the following commands to perform the function described above:
#######################################
#Chaining function using Pipe Operators
#######################################
hflights %>%
  filter(DepDelay > 240) %>%
  mutate(TaxingTime = TaxiIn + TaxiOut) %>%
  arrange(TaxingTime) %>%
  select(TailNum)
# Write the 'piped' version of the English sentences.
# Use dplyr functions and the pipe operator to transform the following English sentences into R code:
# Take the hflights data set and then ...
# Add a variable named diff that is the result of subtracting TaxiIn from TaxiOut, and then ...
# Pick all of the rows whose diff value does not equal NA, and then ...
# Summarise the data set with a value named avg that is the mean diff value.
hflights %>%
  mutate(diff = TaxiOut - TaxiIn) %>%
  filter(!is.na(diff)) %>%
  summarise(avg = mean(diff))
# mutate() the hflights dataset and add two variables:
# RealTime: the actual elapsed time plus 100 minutes (for the overhead that flying involves) and
# mph: calculated as Distance / RealTime * 60, then
# filter() to keep observations that have an mph that is not NA and that is below 70, finally
# summarise() the result by creating four summary variables:
# n_less, the number of observations,
# n_dest, the number of destinations,
# min_dist, the minimum distance and
# max_dist, the maximum distance.
# Chain together mutate(), filter() and summarise()
hflights %>%
  mutate(RealTime = ActualElapsedTime + 100, mph = Distance / RealTime * 60) %>%
  filter(!is.na(mph), mph < 70) %>%
  summarise(n_less = n(),
            n_dest = n_distinct(Dest),
            min_dist = min(Distance),
            max_dist = max(Distance))
#######################################
Date with R
Dates can be imported from character or numeric formats using the as.Date() function from the base package.
If your data were exported from Excel, the dates will probably be in numeric format; otherwise, they will most likely be stored as characters. If your dates are stored as characters, you simply need to provide as.Date() with your vector of dates and the format they are currently stored in.
There are a number of different formats you can specify, here are a few of them:
%Y: 4-digit year (1982)
%y: 2-digit year (82)
%m: 2-digit month (01)
%d: 2-digit day of the month (13)
%A: weekday (Wednesday)
%a: abbreviated weekday (Wed)
%B: month (January)
%b: abbreviated month (Jan)
Use the following commands to perform the function described above:
####################################################################################
####################################################################################
# Lesson 6:
# Topic 3: Date in R
###################################################################################
# Today's date
today <- Sys.Date()
today
class(today)
#Creating date from character
character_date <- "1957-03-04"
class(character_date)
# Convert into date class by as.Date function
sp500_birthday <- as.Date(character_date)
sp500_birthday
class(sp500_birthday)
# Date format
#default - ISO 8601 Standard: year-month-day
as.Date("2017-01-28")
# Alternative form: year/month/day
as.Date("2017/01/28")
#Fails: month/day/year
as.Date("01/28/2017")
# Explicitly tell R the format
as.Date("01/28/2017", format = "%m/%d/%Y")
#Date format
# %d - Day of the month (01-31)
# %m - Month (01-12)
# %y - Year without century (00-99)
# %Y - Year with century (0-9999)
# %b - Abbreviated month name
# %B - Full month name
# "/" "-" "," Common separators
# Example: September 15, 2008
as.Date("September 15, 2008", format = "%B %d, %Y")
# Extract the Weekdays
dates <- as.Date(c("2017-01-02", "2017-05-03", "2017-08-04", "2017-10-17"))
dates
weekdays(dates)
# Extract the months
months(dates)
# Extract the quarters
quarters(dates)
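Going the other way is also useful: base R's format() turns a Date back into a character string using the same % codes listed above, and Date objects support simple arithmetic. A short sketch:

```r
# Convert a Date back to character with format(), reusing the % codes above
d <- as.Date("2008-09-15")
format(d, "%d/%m/%Y")    # "15/09/2008"
format(d, "%Y-%m-%d")    # "2008-09-15"

# Date objects support arithmetic: adding days and taking differences
d + 30                                          # 30 days later: "2008-10-15"
as.Date("2017-01-28") - as.Date("2017-01-01")   # difference of 27 days
```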
Data Visualization
Basic Visualization
Scatter Plot
Line Chart
Bar Plot
Pie Chart
Histogram
Density plot
Box Plot
Advanced Visualization
Mosaic Plot
Heat Map
3D charts
Correlation Plot
Word Cloud
Scatter Plot
Scatterplots use a collection of points placed using Cartesian Coordinates to display values from two variables. By displaying a variable in each axis, you can detect if a relationship or correlation between the two variables exists.
Use the following commands to perform the function described above:
######################################################################
# Lesson 7
# Topic 1: Types of Graphic in R
######################################################################
#########################################################################
#########################################################################
#Following are the basic types of graphs, which can be chosen based on
#the situation and the data available.
# Basic Visualization
# Scatter Plot
# Line Chart
# Bar Plot
# Pie Chart
# Histogram
# Density plot
# Box Plot
# Advanced Visualization
# Mosaic Plot
# Heat Map
# 3D charts
# Correlation Plot
# Word Cloud
#########################################################################
# Basic plot - Scatter Plot
# Example -1
x <- c(1, 2, 3, 4, 5)
y <- c(1, 5, 3, 2, 0)
plot(x, y)
# Example -2
dose <- c(20, 30, 40, 50, 60)
drugA <- c(16, 20, 27, 40, 60)
drugB <- c(40, 31, 25, 18, 12)
plot(dose, drugA)
plot(dose, drugB)
help(plot)
#type argument
#"p" for points,
#"l" for lines,
#"b" for both,
#"c" for the lines part alone of "b",
#"o" for both 'overplotted',
#"h" for 'histogram' like (or 'high-density') vertical lines,
#"s" for stair steps,
#"S" for other steps, see 'Details' below,
#"n" for no plotting.
#Different types of plot
plot(dose, drugA, type="p")
plot(dose, drugA, type="l")
plot(dose, drugA, type="b")
plot(dose, drugA, type="c")
plot(dose, drugA, type="o")
plot(dose, drugA, type="h")
plot(dose, drugA, type="s")
plot(dose, drugA, type="n")
#Example 3
# Load the MASS package (note: mtcars itself ships with base R's datasets package)
library(MASS)
str(mtcars)
# https://stat.ethz.ch/R-manual/R-devel/library/datasets/html/mtcars.html
########################################################
#[, 1] mpg Miles/(US) gallon
#[, 2] cyl Number of cylinders
#[, 3] disp Displacement (cu.in.)
#[, 4] hp Gross horsepower
#[, 5] drat Rear axle ratio
#[, 6] wt Weight (1000 lbs)
#[, 7] qsec 1/4 mile time
#[, 8] vs Engine (0 = V-shaped, 1 = straight)
#[, 9] am Transmission (0 = automatic, 1 = manual)
#[,10] gear Number of forward gears
#[,11] carb Number of carburetors
########################################################
summary(mtcars)
plot(mtcars$hp, mtcars$mpg)
plot(mtcars$hp, mtcars$mpg, xlab = "Horsepower", ylab = "Gas mileage")
plot(mtcars$hp, mtcars$mpg, xlab = "Horsepower", ylab = "Gas mileage", main = "MPG vs Horsepower")
# Compute max_hp
max_hp <- max(mtcars$hp)
# Compute max_mpg
max_mpg <- max(mtcars$mpg)
plot(mtcars$hp, mtcars$mpg,type = "p",
xlim = c(0, max_hp),
ylim = c(0, max_mpg), xlab = "Horsepower",
ylab = "Miles per gallon", main = "Horsepower vs Mileage")
#################################################################################
Data Visualization – mfrow
Create a multi-paneled plotting window. The par(mfrow) parameter is handy for creating a simple multi-paneled plot, while layout() should be used for customized panel plots of varying sizes.
Use the following commands to perform the function described above:
# Adding details with par function
#########################################################################
# par function
#View current setting
par()
# Assign the return value from the par() function to plot_pars
plot_pars <- par()
# Display the names of the par() function's list elements
names(plot_pars)
# Display the number of par() function list elements
length(plot_pars)
#########################################################################
#mfrow =c(row,col)
# Creating plot array with mfrow parameter
# Set up a two-by-two plot array
par(mfrow = c(2, 2))
# Plot y1 vs. x1
plot(anscombe$x1, anscombe$y1)
# Plot y2 vs. x2
plot(anscombe$x2, anscombe$y2)
# Plot y3 vs. x3
plot(anscombe$x3, anscombe$y3)
# Plot y4 vs. x4
plot(anscombe$x4, anscombe$y4)
# Define common x and y limits for the four plots
xmin <- min(anscombe$x1, anscombe$x2, anscombe$x3, anscombe$x4)
xmax <- max(anscombe$x1, anscombe$x2, anscombe$x3, anscombe$x4)
ymin <- min(anscombe$y1, anscombe$y2, anscombe$y3, anscombe$y4)
ymax <- max(anscombe$y1, anscombe$y2, anscombe$y3, anscombe$y4)
# Set up a two-by-two plot array
par(mfrow = c(2, 2))
# Plot y1 vs. x1 with common x and y limits, labels & title
plot(anscombe$x1, anscombe$y1,
xlim = c(xmin, xmax),
ylim = c(ymin, ymax),
xlab = "x value", ylab = "y value",
main = "First dataset")
# Do the same for the y2 vs. x2 plot
plot(anscombe$x2, anscombe$y2,
xlim = c(xmin, xmax),
ylim = c(ymin, ymax),
xlab = "x value", ylab = "y value",
main = "Second dataset")
# Do the same for the y3 vs. x3 plot
plot(anscombe$x3, anscombe$y3,
xlim = c(xmin, xmax),
ylim = c(ymin, ymax),
xlab = "x value", ylab = "y value",
main = "Third dataset")
# Do the same for the y4 vs. x4 plot
plot(anscombe$x4, anscombe$y4,
xlim = c(xmin, xmax),
ylim = c(ymin, ymax),
xlab = "x value", ylab = "y value",
main = "Fourth dataset")
Data Visualization - pch
Different plotting symbols are available in R. The graphical argument used to specify point shapes is pch.
Use the following commands to perform the function described above:
#######################################################################
library(MASS)
data("mtcars")
# pch
# Create plot with type = "n"
plot(mtcars$hp, mtcars$mpg,
type = "n", xlim = c(0, max_hp),
ylim = c(0, max_mpg), xlab = "Horsepower",
ylab = "Miles per gallon")
# Add solid squares to plot
points(mtcars$hp, mtcars$mpg,pch = 15)
# Add open circles to plot
points(mtcars$hp, mtcars$mpg, pch = 1)
# Add open triangles to plot
points(mtcars$hp, mtcars$mpg,pch = 2)
# Create an empty plot using type = "n"
plot(mtcars$hp, mtcars$mpg,
type = "n", xlim = c(0, max_hp),
ylim = c(0, max_mpg), xlab = "Horsepower",
ylab = "Miles per gallon")
# Add points with shapes determined by cylinder number
points(mtcars$hp, mtcars$mpg, pch = mtcars$cyl)
# Create a second empty plot
plot(mtcars$hp, mtcars$mpg, type = "n",
xlab = "Horsepower", ylab = "Gas mileage")
# Add points with shapes as cylinder characters
points(mtcars$hp, mtcars$mpg,
pch = as.character(mtcars$cyl))
# Adjusting text position, size, and font
# Create a second empty plot
plot(mtcars$hp, mtcars$mpg, type = "n",
xlab = "Horsepower", ylab = "Gas mileage")
# Create index6, pointing to 6-cylinder cars
index6 <- which(mtcars$cyl == 6)
# Highlight 6-cylinder cars as solid circles
points(mtcars$hp[index6],
mtcars$mpg[index6],
pch = 19)
# Add car names (the row names of mtcars), offset from the points, in larger bold text
text(mtcars$hp[index6],
     mtcars$mpg[index6],
     labels = rownames(mtcars)[index6],
     adj = -0.2, cex = 1.2, font = 4)
#################################################################
Data Visualization – Color
Data visualization, or the visual communication of data, is the study or creation of data represented visually. A good graph is easy to read, and a goal when creating data visualizations is to convey information in a clear and concise way. One of the most prominent features of most data visualizations is color. Color matters because it sets the mood and guides the viewer’s eye, drawing attention to particular elements and thereby telling a story. Both aspects are important for data visualizations.
In data visualization:
There are 657 built-in color names (see colors()).
R uses hexadecimal strings to represent colors.
You can create color vectors with rainbow(n), heat.colors(n), terrain.colors(n), topo.colors(n) and cm.colors(n).
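The color helpers listed above can be explored directly from the console; a small sketch using only base R:

```r
# Built-in color names and palette helpers in base R
length(colors())    # 657 built-in color names
head(colors())      # first few names, starting with "white"
rainbow(5)          # 5 colors as hexadecimal strings
heat.colors(3)      # a short heat palette
# Palettes can be passed straight to plotting functions, e.g.:
hist(mtcars$mpg, col = heat.colors(5))
```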
Data Visualization - Line Chart
Line charts display information as a series of data points connected by straight line segments on an X-Y axis. They are best used to track changes over time, using equal intervals of time between each data point.
CHARACTERISTICS
INCLUDE A ZERO BASELINE IF POSSIBLE
DON’T PLOT MORE THAN 4 LINES
USE SOLID LINES ONLY
USE THE RIGHT HEIGHT
LABEL THE LINES DIRECTLY
When to use a line chart
Line graphs are useful in that they show data variables and trends very clearly.
They help to make predictions about data not yet recorded. If seeing the trend of your data is the goal, then this is the chart to use.
Line charts show time-series relationships using continuous data.
They allow a quick assessment of acceleration (lines curving upward), deceleration (lines curving downward), and volatility (up/down frequency).
They are excellent for tracking multiple data sets on the same chart to see any correlation in trends.
They can also be used to display several dependent variables against one independent variable.
Line charts are great visualizations to see how a metric changes over time. For example, the exchange rate for GBP to USD.
Use the following commands to produce the plots described above:
#################################################################################
# Line Chart
plot(AirPassengers, type = "l") #Simple Line Plot
#Example 2
# Create the data for the chart.
v <- c(7,12,28,3,41)
# Plot the line chart.
plot(v,type = "o")
# Plot the line chart with color and axis labels.
plot(v, type = "o", col = "red", xlab = "Month", ylab = "Rain fall",
     main = "Rain fall chart")
#Multiple Lines
# More than one line can be drawn on the same chart by using the lines() function
# Create the data for the chart.
t <- c(14,7,6,19,3)
lines(t, type = "o", col = "blue")
#################################################################################
This brings an end to this post. I encourage you to re-read it to understand it completely if you haven’t, and THANK YOU.
Data Visualization - Histogram
A Histogram visualizes the distribution of data over a continuous interval or certain time period. Each bar in a histogram represents the tabulated frequency at each interval/bin.
Histograms help give an estimate as to where values are concentrated, what the extremes are and whether there are any gaps or unusual values.
They are also useful for giving a rough view of the probability distribution.
The histogram is one of the most common chart types for presenting the distribution of a single continuous variable.
Use the following commands to produce the plots described above:
###############################################################################
###############################################################################
#Histogram
#Simple histogram
hist(mtcars$mpg)
#Colored histogram
?hist
#The number and width of the bars (bins) can be controlled with the breaks argument.
hist(mtcars$mpg, breaks = 4, col = "lightblue", xlab = "mpg", ylab = "freq")
hist(mtcars$mpg, breaks = 15, col=rainbow(7), xlab = "mpg", ylab = "freq")
#Change of bin
hist(AirPassengers, col=rainbow(7))
#Histogram of the AirPassengers dataset with 5 breakpoints
hist(AirPassengers, breaks=5)
# If you want to have more control over the breakpoints between bins,
# you can enrich the breaks argument by giving it a vector of breakpoints.
# You can do this by using the c() function:
# Compute a histogram for the data values in AirPassengers,
# and set the bins such that they run from 100 to 300, 300 to 500 and 500 to 700.
hist(AirPassengers, breaks= c(100, 300, 500, 700))
# We can use the seq(x, y, z) function instead of c()
# x = begin number of the x-axis,
# y = end number of the x-axis
# z = the interval in which these numbers appear.
hist(AirPassengers, breaks= seq(100, 700, 100))
# Note that you can also combine the two functions:
# Make a histogram for the AirPassengers dataset, start at 100 on the x-axis,
# and from values 200 to 700, make the bins 150 wide
hist(AirPassengers, breaks=c(100, seq(200,600, 150), 700))
###############################################################################
Data Visualization - Box Plot
A Box Plot is a convenient way of visually displaying the data distribution through their quartiles.
Box Plots can be drawn either vertically or horizontally.
Although Box Plots may seem primitive in comparison to a Histogram or Density Plot, they have the advantage of taking up less space, which is useful when comparing distributions between many groups or data sets.
The types of observations from viewing a Box Plot:
· What the key values are, such as the median and the upper and lower quartiles.
· If there are any outliers and what their values are.
· Whether the data is symmetrical.
· How tightly the data is grouped.
· If the data is skewed and, if so, in what direction.
Two of the most commonly used variations of the Box Plot are:
Variable-width Box Plots
Notched Box Plots.
Use the following commands to produce the plots described above:
###############################################################################
# Boxplot
vec <- c(3,2,5,6,4,8,1,2,3,2,4,30,36)
?boxplot
boxplot(vec)
boxplot(vec, varwidth = TRUE)
# Boxplot of MPG by Car Cylinders
# a formula, such as y ~ grp, where y is a numeric vector of data values
# to be split into groups according to the grouping variable grp (usually a factor).
boxplot(mpg ~ cyl, data = mtcars)
boxplot(mpg ~ cyl, data = mtcars, main = "Car Mileage Data",
        xlab = "Number of Cylinders", ylab = "Miles Per Gallon",
        col = c("gold", "darkgreen", "blue"))
###############################################################################
#########################################################################
Data Visualization - Mosaic Plot
Mosaic plots were introduced by Hartigan and Kleiner in 1981 and expanded on by Friendly in 1994. Mosaic plots are also called Mekko charts due to their resemblance to a Marimekko print.
A mosaic plot summarizes the conditional probabilities of co-occurrence of the categorical values in a set of records, where each column represents a categorical variable.
A mosaic plot is a graphical method for visualizing data from two or more qualitative variables.
It is the multidimensional extension of spine plots, which graphically display the same information for only one variable.
It gives an overview of the data and makes it possible to recognize relationships between different variables. For example, independence is shown when the boxes across categories all have the same areas.
Data Visualization - Heat Map
Heatmaps visualize data through variations in coloring.
Heatmaps are useful for cross-examining multivariate data, through placing variables in the rows and columns and coloring the cells within the table.
Heatmaps are good for showing variance across multiple variables, revealing any patterns, displaying whether any variables are similar to each other, and for detecting if any correlations exist in-between them.
Heatmaps can also be used to show the changes in data over time if one of the rows or columns are set to time intervals.
Heatmaps are better suited to displaying a generalized view of numerical data.
Use the following commands to produce the plots described above:
###############################################################################
#########################################################################
# Mosaic Plot
data(HairEyeColor)
mosaicplot(HairEyeColor)
?mosaicplot
###############################################################################
# Heatmap
# A heat map uses a color gradient to make comparisons;
# use one when you want to compare different categories across two dimensions.
library(MASS)
mtcars
heatmap(as.matrix(mtcars))
?heatmap
heatmap(as.matrix(mtcars), Rowv = NA, Colv = NA, scale = "column", col = cm.colors(256),
xlab = "Attributes", main = "heatmap")
#########################################################################
Data Visualization - 3D Plot
3D plots are used where a 2D plot cannot adequately display the data.
We use the lattice package, a high-level graphics system for R.
Simply install and load the lattice package, then use the cloud() function.
Plotly is a platform for data analysis, graphing, and collaboration, and it can also produce 3D plots. In this post we will show how to make 3D plots with Plotly's R API.
Use the following commands to produce the plots described above:
#########################################################################
#3D graph with lattice package
library(lattice)
attach(mtcars)
# Change am column to factor as "Automatic" and "Manual"
mtcars$am[which(mtcars$am == 0)] <- 'Automatic'
mtcars$am[which(mtcars$am == 1)] <- 'Manual'
mtcars$am <- as.factor(mtcars$am)
#3d scatterplot by factor level
cloud(hp~mpg*wt, data = mtcars)
cloud(hp~mpg*wt, data = mtcars, main = "3D Scatterplot")
cloud(hp~mpg*wt, data = mtcars, main = "3D Scatterplot", col = cyl)
cloud(hp~mpg*wt, data = mtcars, main = "3D Scatterplot", col = cyl, pch = 17)
cloud(hp~mpg*wt|am, data = mtcars, main = "3D Scatterplot", col = cyl, pch = 17)
?cloud
##############################################################
# 3D graph with the plotly package
install.packages("plotly")
library(plotly)
data(mtcars)
# Basic 3D Scatter Plot
plot_ly(mtcars, x = ~wt, y = ~hp, z = ~qsec)
# Basic 3D Scatter Plot with Color
plot_ly(mtcars, x = ~wt, y = ~hp, z = ~qsec, color = ~am, colors = c('#BF382A', '#0C4B8E')) %>%
add_markers() %>%
layout(scene = list(xaxis = list(title = 'Weight'),
yaxis = list(title = 'horsepower'),
zaxis = list(title = 'qsec')))
#3D Scatter Plot with color scaling
plot_ly(mtcars, x = ~wt, y = ~hp, z = ~qsec,
marker = list(color = ~mpg, colorscale = c('#FFE1A1', '#683531'), showscale = TRUE)) %>%
add_markers() %>%
layout(scene = list(xaxis = list(title = 'Weight'),
yaxis = list(title = 'horsepower'),
zaxis = list(title = 'qsec')),
annotations = list(
x = 1.13,
y = 1.05,
text = 'Miles/(US) gallon',
xref = 'paper',
yref = 'paper',
showarrow = FALSE
))
# Load the `plotly` library
library(plotly)
# Your volcano data
str(volcano)
volcano
# The 3d surface map
plot_ly(z = ~volcano, type = "surface")
#########################################################################