This is the third post in the series Data Visualization With R. In the previous post, we learned how to add a title, subtitle and axis labels. We also learned how to modify the range of the axes. In this post, we will learn how to create scatter plots and how to:
- add color to the points
- modify the shape of the points
- modify the size of the points
Libraries, Code & Data
Let us recreate the plot that we had created in the first post by using the mtcars data set. We will use the disp (displacement) and mpg (miles per gallon) variables. disp will be on the X axis and mpg will be on the Y axis.
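As a minimal sketch, and assuming the basic plot is the default plot() call with no extra arguments, it can be drawn like this:

```r
# mtcars ships with base R, so no packages need to be loaded.
# disp goes on the X axis and mpg on the Y axis; everything else is left at its default.
plot(mtcars$disp, mtcars$mpg)
```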
We have created a very basic plot, and anyone looking at it for the first time will be confused by the axis labels mtcars$disp and mtcars$mpg. Let us put into practice what we learned in the second post: add a title to the plot and make the axis labels more meaningful.
plot(mtcars$disp, mtcars$mpg, main = 'Displacement vs Miles Per Gallon', xlab = 'Displacement', ylab = 'Miles Per Gallon')
The plot now clearly communicates that it shows the relationship between the displacement and mileage of cars. The points are black by default. Some of us may agree that black is beautiful, but not all of us will like it. As a first step in enhancing the way our plot looks, let us change the shape of the points.
The shape of the points can be specified using the pch argument. Integer values between 0 and 25 select the built-in symbols. Below is an example:
# point shape
plot(mtcars$disp, mtcars$mpg, pch = 6)
Let us check out a few of the other shapes:
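One way to see the available shapes side by side is to draw them in a grid. The sketch below is an illustration, not the code behind the original figures; the names shapes, grid_x and grid_y are made up for this example.

```r
# Draw the 26 built-in symbols (pch 0 to 25) in a grid, each labelled with its code
shapes <- 0:25
grid_x <- shapes %% 7         # column position, 0 to 6
grid_y <- 3 - shapes %/% 7    # row position, first row at the top

plot(grid_x, grid_y, pch = shapes, cex = 2,
     col = 'blue', bg = 'red',            # bg is only visible for shapes 21 to 25
     axes = FALSE, xlab = '', ylab = '',
     xlim = c(-0.5, 6.5), ylim = c(-0.5, 3.5),
     main = 'Point shapes (pch 0 to 25)')

# Print the pch code just below each symbol
text(grid_x, grid_y - 0.35, labels = shapes, cex = 0.8)
```

Only the last five symbols pick up the red fill, which previews the difference between border and background color.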
We can specify the shape based on a third (categorical) variable as well. In the plots below, the shape is based on the categorical variable cyl (number of cylinders) from the mtcars data set:
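# shape based on number of levels of a third variable
# (nlevels() returns 3 here, so every point is drawn with shape 3)
plot(mtcars$disp, mtcars$mpg, pch = nlevels(factor(mtcars$cyl)))

# shape based on a third categorical variable
# (cyl takes the values 4, 6 and 8, which are used directly as shape codes)
plot(mtcars$disp, mtcars$mpg, pch = unclass(mtcars$cyl))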
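If you would rather choose the shape for each group yourself, one possible approach is to index a vector of shape codes by the factor and add a legend. This is a sketch under assumptions: the codes 15, 16 and 17 and the names cyl_factor, shape_codes and point_shapes are picked for illustration only.

```r
cyl_factor   <- factor(mtcars$cyl)        # levels: 4, 6, 8
shape_codes  <- c(15, 16, 17)             # hypothetical choice: square, circle, triangle
point_shapes <- shape_codes[cyl_factor]   # a factor indexes by its integer codes 1, 2, 3

plot(mtcars$disp, mtcars$mpg, pch = point_shapes,
     main = 'Displacement vs Miles Per Gallon',
     xlab = 'Displacement', ylab = 'Miles Per Gallon')

# The legend ties each shape back to its number of cylinders
legend('topright', legend = levels(cyl_factor), pch = shape_codes,
       title = 'Cylinders')
```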
The size of the points in the scatter plot can be specified using the cex argument in the plot() function. The default value for cex is 1.
# point size
plot(mtcars$disp, mtcars$mpg, cex = 1.5)
The plots below show the size of the points for cex values relative to 1.
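A comparison of this kind can be sketched by drawing the same scatter plot several times in a grid. The cex values and the names sizes and old_par below are assumptions chosen for illustration, not necessarily the ones used in the original figures.

```r
sizes   <- c(0.5, 1, 1.5, 2)          # hypothetical cex values around the default of 1
old_par <- par(mfrow = c(2, 2))       # 2 x 2 grid of plots; remember the old settings

for (s in sizes) {
  plot(mtcars$disp, mtcars$mpg, cex = s,
       main = paste('cex =', s),
       xlab = 'Displacement', ylab = 'Miles Per Gallon')
}

par(old_par)                          # restore the previous layout
```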
We can specify a border color for the points using the col argument and a background color using the bg argument. The background color can be specified only for points whose pch argument takes values between 21 and 25. Let us look at some examples to understand this distinction between border and background color.
# shape between 0 and 20
plot(mtcars$disp, mtcars$mpg, pch = 5, col = 'blue', bg = 'red')
You can observe that although we have specified a background color using the bg argument, we do not see the red background color, because the value specified for the pch (shape) argument is not between 21 and 25. In the next example, we will use a value between 21 and 25 so that the bg argument takes effect.
# shape between 21 and 25
plot(mtcars$disp, mtcars$mpg, pch = 24, col = 'red', bg = 'blue')
The color of the points can be specified using the levels of a categorical variable as well. In the next example, we will use the cyl variable to specify the color of the points.
# color based on a third variable
plot(mtcars$disp, mtcars$mpg, pch = 5, col = factor(mtcars$cyl))
Since cyl is a categorical variable with 3 levels, the points now have 3 different colors. This method is useful when you want to distinguish the points in a scatter plot based on a third variable.
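Passing a factor to col picks colors from the current palette in order. If you want to control the colors and show which one belongs to which group, a possible sketch is shown below; the color names and the variables cyl_factor, cyl_colors and point_colors are assumptions made for this example.

```r
cyl_factor   <- factor(mtcars$cyl)              # levels: 4, 6, 8
cyl_colors   <- c('red', 'darkgreen', 'blue')   # hypothetical color for each level
point_colors <- cyl_colors[cyl_factor]          # one color per car

plot(mtcars$disp, mtcars$mpg, pch = 19, col = point_colors,
     main = 'Displacement vs Miles Per Gallon',
     xlab = 'Displacement', ylab = 'Miles Per Gallon')

# The legend maps each color back to its number of cylinders
legend('topright', legend = levels(cyl_factor), col = cyl_colors,
       pch = 19, title = 'Cylinders')
```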
In this post, we learned how to
- create scatter plots
- add color to the points
- modify shape of the points
- modify size of the points