Correlations and lines of best fit are Embedded Insights available in the scatter plot chart type in One Model Storyboards.
Overview
Correlation coefficients and lines of best fit are two ways to measure the relationship between two metrics in One Model. Both are available, along with explanations of the results, in all scatter plot charts in Storyboards. When correlations are enabled, an explanation of statistical significance is also provided. When a line of best fit is drawn, its equation is also displayed in the chart header. Correlations and the line of best fit can be enabled together or separately.
Definitions
A correlation is a statistical measure of the relationship between two variables.
When one variable increases as the other increases, the correlation is positive; when one decreases as the other increases, the correlation is negative.
A line of best fit (the result of a linear regression) is a line drawn through a scatter plot of data points that best expresses the relationship between those points.
For the similarities and differences between correlation and linear regression, see: https://www.graphpad.com/support/faq/what-is-the-difference-between-correlation-and-linear-regression/
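A line of best fit is commonly computed by ordinary least squares, which picks the slope and intercept that minimize the squared vertical distances between the points and the line. The sketch below assumes that method; this article does not state how One Model fits the line or formats the equation in the chart header, so `best_fit_line` is a hypothetical helper for illustration only.

```python
def best_fit_line(xs, ys):
    """Ordinary least-squares line y = slope * x + intercept (illustrative sketch)."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Spread of x around its mean; zero means every x value is identical
    sxx = sum((x - mean_x) ** 2 for x in xs)
    if sxx == 0:
        raise ValueError("x values must not all be the same")
    # How x and y deviate from their means together
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                     # change in y per one-unit change in x
    intercept = mean_y - slope * mean_x   # the line passes through (mean_x, mean_y)
    return slope, intercept


slope, intercept = best_fit_line([0, 1, 2, 3], [1, 3, 5, 7])
print(f"y = {slope}x + {intercept}")  # y = 2.0x + 1.0
```

The slope tells you how much the Y metric changes, on average, for each one-unit change in the X metric.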
Running Correlations and Drawing a Line of Best Fit
To run correlations or draw a line of best fit, click the lightbulb icon on a scatter plot chart in a Storyboard. The dropdown menu shows two options: Run Correlations and Draw Line of Best Fit. Click either option to enable it. With both features enabled, the chart displays:
- The equation of the line of best fit (linear regression)
- A statement describing the correlation between the two metrics
- A statement describing the statistical significance of the correlation
- A More Information icon
- The line of best fit itself
Storyboard designers can also make correlations, a line of best fit, or both the default for a chart, so that the chart automatically includes these Embedded Insights whenever the Storyboard is opened. This option is in the Tile Settings on the Discover tab, where the correlation method can also be selected.
Click the More Information icon to see supporting data and detailed explanations of the correlation coefficient and its significance.
Note that scatter plot charts currently display at most 1,000 data points, and both correlations and the line of best fit apply only to the data displayed in the chart. If there are more than 1,000 points, only the 1,000 displayed points are considered. These are the first 1,000 rows that would appear, with default sorting applied, if the scatter plot data were formatted as a list report in Explore.
Also note that the values on both axes must vary. If either the X or Y axis contains only one value, the message "Error generating correlation data" is displayed.
Interpreting the Results
A correlation measures how strongly two datasets are related to each other. Here, two metrics are compared and a correlation coefficient is calculated. The coefficient indicates how consistently the two variables move together: for example, a coefficient close to 1 means that as one variable increases, the other reliably increases as well.
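To make the calculation concrete, here is a from-scratch sketch of Pearson's correlation coefficient, one common way to compute it. The article does not say which method One Model uses by default, so treat this as an illustration of the idea rather than One Model's implementation; `pearson_r` and the sample data are hypothetical.

```python
import math


def pearson_r(xs, ys):
    """Pearson correlation coefficient for two equal-length lists of numbers."""
    n = len(xs)
    if n != len(ys) or n < 2:
        raise ValueError("need two lists of the same length, with at least 2 values")
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    # Covariance term: positive when x and y tend to sit on the same side of their means
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    # Spread of each metric around its own mean
    ss_x = sum((x - mean_x) ** 2 for x in xs)
    ss_y = sum((y - mean_y) ** 2 for y in ys)
    # Dividing by both spreads rescales the result into the range -1 to 1
    return cov / math.sqrt(ss_x * ss_y)


# Hypothetical values for two metrics measured across five teams
training_hours = [2, 4, 6, 8, 10]
engagement = [55, 60, 62, 70, 74]
r = pearson_r(training_hours, engagement)
print(round(r, 3))
```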
Correlation Coefficient Explanation
The correlation coefficient ranges from -1 to 1.
- -1 indicates a perfect negative correlation between the two datasets
- 0 indicates no correlation between the two datasets
- 1 indicates a perfect positive correlation between the two datasets
Interpreting the correlation coefficient depends on the context, but some generally accepted thresholds for its absolute value (so they apply equally to negative correlations) are:
- Greater than 0.8: very strong correlation
- Between 0.6 and 0.8: moderate correlation
- Less than 0.6: weak correlation
- Near 0: little or no correlation
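These thresholds can be expressed as a small lookup function. Two choices here are assumptions rather than anything stated above: the thresholds are applied to the absolute value so negative coefficients get the same labels, and "near zero" is taken to mean below 0.1 via a hypothetical `near_zero` parameter. `describe_strength` is a hypothetical helper.

```python
def describe_strength(r, near_zero=0.1):
    """Label a correlation coefficient using the thresholds listed above.

    `near_zero` is a hypothetical cutoff; the article does not define one.
    """
    if not -1.0 <= r <= 1.0:
        raise ValueError("correlation coefficient must be between -1 and 1")
    magnitude = abs(r)  # sign gives direction; magnitude gives strength
    if magnitude > 0.8:
        return "very strong"
    if magnitude >= 0.6:
        return "moderate"
    if magnitude >= near_zero:
        return "weak"
    return "little or none"


for r in (0.92, -0.7, 0.35, 0.03):
    print(r, describe_strength(r))
```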
P-value Explanation
The p-value is the probability of seeing a correlation coefficient at least as strong as the reported one if the two metrics were actually uncorrelated. In other words, the lower the p-value, the less likely it is that the observed correlation arose by chance alone. Interpreting the p-value also depends on the context, but the generally accepted threshold is:
- Greater than 0.05: not statistically significant
- Less than or equal to 0.05: statistically significant