


Scatter plots are an awesome way to display two-variable data (that is, data with only two variables) and make predictions based on the data. These types of plots show individual data values, as opposed to histograms and box-and-whisker plots. Here's a scatter plot of the amount of money Mateo earned each week working at his father's store:

The weeks are plotted on the x-axis, and the amount of money he earned for that week is plotted on the y-axis. In general, the independent variable (the variable that isn't influenced by anything) is on the x-axis, and the dependent variable (the one that is affected by the independent variable) is plotted on the y-axis. Using this plot, we can see that in week 2 Mateo earned about $125, and in week 18 he earned about $165. For example, with this dataset, it is clear that Mateo is earning more each week. Maybe his father is giving him more hours per week or more responsibilities.

With scatter plots we often talk about how the variables relate to each other. There are three types of correlation: positive, negative, and none (no correlation).

Positive Correlation: as one variable increases, so does the other. Height and shoe size are an example: as one's height increases, so does the shoe size.
Negative Correlation: as one variable increases, the other decreases. Time spent studying and time spent on video games are negatively correlated: as your time studying increases, time spent on video games decreases.
No Correlation: there is no apparent relationship between the variables. Video game scores and shoe size appear to have no correlation: as one increases, the other one is not affected.

Mateo's scatter plot has a pretty strong positive correlation: as the weeks increase, his paycheck does too.

We use a "line of best fit" to make predictions based on past data. There are many complicated statistical formulas we could use to find this line, but for now, we will just estimate it. The line we draw through the points on the graph just needs to look like it fits the trend of the data. When drawing the line, you want to make sure that the line fits with most of the data. If there is a point that is much higher or lower than the rest (an outlier), the line doesn't need to pass through it. Using this line, we can predict how much money Mateo will earn in his 20th week of work (assuming he continues this pattern).

Based on this line, Mateo will earn approximately $157 in week 20.

The scatter plot matrix in Figure 16 shows density ellipses in each individual scatter plot. The red circles contain about 95% of the data. It's possible to explore the points outside the circles to see if they are multivariate outliers. In Figure 16, the single blue circle that is an outlier in the Weight by Turning Circle scatter plot has been selected. This point is also an outlier in some of the other scatter plots but not all of them. In the Displacement by Horsepower plot, this point is highlighted in the middle of the density ellipse.

By deselecting the point, all points will appear with the same brightness, as shown in Figure 17. From the density ellipse for the Displacement by Horsepower scatter plot, the reason for the possible outliers appears in the histogram for Displacement. There are several points outside the ellipse at the right side of the scatter plot. The colors reveal that all these points are from cars made in the US, while the markers reveal that the cars are either sporty, medium, or large. Annotations explaining the colors and markers could further enhance the matrix.

For your data, you can use a scatter plot matrix to explore many variables at the same time.

Scatter plots and types of data
Continuous data: appropriate for scatter plots
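
As a rough illustration of the correlation and line-of-best-fit ideas from Mateo's example, here is a minimal Python sketch. The weekly figures are made up for illustration (they are not Mateo's actual data), and the helper names `pearson_r` and `least_squares_line` are hypothetical. The text estimates the line by eye. This sketch swaps in the standard least-squares formula instead, so its week-20 prediction will not necessarily match the hand-drawn estimate.

```python
from statistics import mean

# Hypothetical (week, dollars earned) values; NOT Mateo's actual data.
weeks = [2, 4, 6, 8, 10, 12, 14, 16, 18]
earnings = [125, 122, 130, 134, 133, 141, 146, 149, 165]


def pearson_r(xs, ys):
    """Pearson correlation: near +1 is strongly positive,
    near -1 is strongly negative, near 0 is no linear correlation."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    syy = sum((y - my) ** 2 for y in ys)
    return sxy / (sxx * syy) ** 0.5


def least_squares_line(xs, ys):
    """Return (slope, intercept) of the least-squares line
    y = slope * x + intercept."""
    mx, my = mean(xs), mean(ys)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sxx = sum((x - mx) ** 2 for x in xs)
    slope = sxy / sxx
    intercept = my - slope * mx
    return slope, intercept


r = pearson_r(weeks, earnings)
slope, intercept = least_squares_line(weeks, earnings)

# Extrapolate to week 20, assuming the same trend continues.
week_20 = slope * 20 + intercept

print(f"correlation r = {r:.2f}")
print(f"line: earnings = {slope:.2f} * week + {intercept:.2f}")
print(f"predicted earnings in week 20: ${week_20:.0f}")
```

A value of `r` close to +1 corresponds to the "pretty strong positive correlation" described for Mateo's plot. A single unusually high or low week would pull a least-squares line toward it, which is one reason to check for outliers before trusting the prediction.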
