Bike Prediction


Tags: Analysis, R
Published: September 10, 2021
For this project, I used R to analyze bike rental data over a two-year period. The goal of the analysis is to understand how bike rentals vary with time and weather conditions, and to predict the number of unregistered (casual) users on any given day.
One useful metric in model selection is the residual sum of squares (RSS). For this project, however, I'm using a slightly altered sum of squares. Some of the days we are trying to predict have casual users in the hundreds or fewer, and some have casual users in the thousands, so raw residuals are difficult to compare. A day with 3000 users and a predicted value of 3100 has a residual of 100, and a day with 50 users and a predicted value of 150 has the same residual. The percentage error is very different in these two cases, though: about 3% for the first day and 200% for the second. So for this project, I am using a ratio built from the predicted and target values. This shows how far off we were for each day while taking into account the different magnitudes that days will have.
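The formula I will use for this is as follows. Instead of using

$$\mathrm{RSS} = \sum_{i=1}^{n} \left( y_i - \hat{y}_i \right)^2$$

I'm using

$$\sum_{i=1}^{n} \left( \frac{\left| y_i - \hat{y}_i \right| + y_i}{y_i} \right)^2$$

Here $y_i$ is the target value for day $i$ and $\hat{y}_i$ is the predicted value. $\left| y_i - \hat{y}_i \right|$ is the magnitude of the error, which is added to $y_i$ in order to get a value of at least 1 when we divide by $y_i$.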
If we just divide predicted by target, any prediction below the target gives a ratio less than 1. That doesn't work well in a sum of squares, since predictions that fall below the target would have a smaller impact than predictions that exceed the target by the same amount.
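As a rough sketch of the metric described above (absolute error plus target, divided by target, then squared and summed), here is how it could look in R. The function name `ratio_ss` and its argument names are my own for illustration, and the sketch assumes every target value is greater than zero.

```r
# Hypothetical helper: ratio-based sum of squares.
# `target` and `predicted` are numeric vectors of equal length.
# Assumes all targets are positive, since we divide by them.
ratio_ss <- function(target, predicted) {
  stopifnot(length(target) == length(predicted), all(target > 0))
  ratio <- (abs(target - predicted) + target) / target
  sum(ratio^2)
}

# The two example days from the text: both have a residual of 100.
target    <- c(3000, 50)
predicted <- c(3100, 150)

# Per-day ratios: roughly 1.03 for the first day and exactly 3 for the second.
day_ratios <- (abs(target - predicted) + target) / target
print(day_ratios)

# The second day dominates the total, unlike with plain RSS.
print(ratio_ss(target, predicted))
```

With plain RSS both days would contribute the same amount (100 squared), whereas here the low-volume day contributes far more because the miss is large relative to its target.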

Part of the analysis I did in advance was to investigate how the different variables interacted with each other and with the target variable. The most useful tool for this was a color-coded pairs plot. The following plot is a condensed version of the original, which had around 30 variables, a little more information than necessary for this post. Color coding was done with the viridis color palette, which is designed to remain readable for people with common forms of color blindness/deficiency.
[Figure: condensed color-coded pairs plot of the bike rental variables]
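For readers who want to try something similar, here is a minimal sketch of a color-coded pairs plot in base R. The data frame is made-up stand-in data, and the column names (`temp`, `humidity`, `season`, `casual`) are assumptions for illustration, not the project's actual variables. It uses the viridis package if it is installed and otherwise falls back to the viridis palette built into base R's `hcl.colors()`.

```r
# Use viridis::viridis() when available, else base R's built-in viridis palette.
palette_fn <- if (requireNamespace("viridis", quietly = TRUE)) {
  viridis::viridis
} else {
  function(n) hcl.colors(n, palette = "viridis")
}

# Illustrative stand-in data (randomly generated, not the real rental data).
set.seed(1)
n <- 200
seasons <- c("winter", "spring", "summer", "fall")
bikes <- data.frame(
  temp     = runif(n, 0, 35),
  humidity = runif(n, 20, 100),
  season   = factor(sample(seasons, n, replace = TRUE), levels = seasons)
)
bikes$casual <- round(pmax(0, 20 * bikes$temp - 2 * bikes$humidity +
                             rnorm(n, mean = 200, sd = 100)))

# One color per level of the grouping variable, then one color per row.
season_palette <- palette_fn(nlevels(bikes$season))
point_colors   <- season_palette[as.integer(bikes$season)]

# Scatterplot matrix of the numeric columns, colored by season.
pairs(bikes[, c("temp", "humidity", "casual")],
      col = point_colors, pch = 19)
```

Coloring by a categorical variable such as season is only one option; a continuous variable could be binned with `cut()` first and mapped to the palette in the same way.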