Whenever it’s cold like it has been for the past week, the subject of my Local Fairbanks Temperatures page comes up in various places. Folks on Facebook were curious about how much traffic the page sees, so I thought I’d take a quick look. I don’t keep web server logs longer than two weeks and I don’t use Google Analytics or any of those other services because I don’t really care all that much how much “engagement” I’m getting or whether my “SEO” is good or not.
The first step is figuring out how to analyze the logs at all. I tried a couple of the standard packages that are available, but something about my configuration was different from what they were expecting.
So I did it myself. It starts with a regular expression that pulls apart the bits and delimits them with the pipe character (|), which is unlikely to be found normally in the log files. Then I use the tidyr::separate function to split the delimited string into columns, do a bit of fiddling to correct the timestamp, identify the operating system of the clients, and pull out the request URL.
Rows: 1
Columns: 11
$ host        <chr> "swingleydev.com"
$ ip          <chr> "66.223.139.211"
$ ts          <dttm> 2024-01-28 00:01:41
$ get_post    <chr> "GET"
$ request_url <chr> "/weather/local_weather.php"
$ protocol    <chr> "HTTP/1.1"
$ response    <chr> "200"
$ bytes       <chr> "2743"
$ referrer    <chr> "https://www.google.com/url?q=https://swingleydev.com/weat…
$ agent       <chr> "Mozilla/5.0 (iPhone; CPU iPhone OS 16_1 like Mac OS X) Ap…
$ os          <chr> "iOS"
Just after midnight, someone at IP address 66.223.139.211 (a GCI address) hit the swingleydev.com hostname (I’ve got several different ones that all point to the same site) after doing a Google search that yielded the Local Fairbanks Temperatures page. They did this on an iPhone running iOS 16.1.
These are the sorts of details one can gather from the web server logs.
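The parsing steps described above can be sketched roughly like this in base R. The log layout, the sample lines, and the `detect_os` helper are all assumptions for illustration, since the post shows neither the real log format nor the real code. The post names `tidyr::separate` for the splitting step; `strsplit` stands in for it here only to keep the sketch free of dependencies.

```r
# Hypothetical log lines in an assumed "vhost combined" layout.
log_lines <- c(
  paste0('example.com 192.0.2.10 - - [28/Jan/2024:00:01:41 -0900] ',
         '"GET /weather/local_weather.php HTTP/1.1" 200 2743 ',
         '"https://www.example.org/search" ',
         '"Mozilla/5.0 (iPhone; CPU iPhone OS 16_1 like Mac OS X)"'),
  paste0('example.com 192.0.2.77 - - [28/Jan/2024:00:02:05 -0900] ',
         '"GET /robots.txt HTTP/1.1" 200 68 "-" "ExampleBot/1.0"')
)

# One capture group per field: host, ip, timestamp, request, response,
# bytes, referrer, agent. Eight groups keeps us within \\1 to \\9.
log_pattern <- paste0(
  '^(\\S+) (\\S+) \\S+ \\S+ \\[([^]]+)\\] ',
  '"([^"]*)" (\\d{3}) (\\S+) "([^"]*)" "([^"]*)"$'
)

# Rewrite each line as a pipe-delimited string, then split on the pipes.
piped <- sub(log_pattern, "\\1|\\2|\\3|\\4|\\5|\\6|\\7|\\8",
             log_lines, perl = TRUE)
logs <- as.data.frame(do.call(rbind, strsplit(piped, "|", fixed = TRUE)),
                      stringsAsFactors = FALSE)
names(logs) <- c("host", "ip", "ts", "request",
                 "response", "bytes", "referrer", "agent")

# Parse the timestamp, honoring the UTC offset. %b assumes English
# month abbreviations in the current locale.
logs$ts <- as.POSIXct(logs$ts, format = "%d/%b/%Y:%H:%M:%S %z", tz = "UTC")

# Split the request into method, URL, and protocol. This assumes
# well-formed requests with exactly three space-separated parts.
request_parts <- do.call(rbind, strsplit(logs$request, " ", fixed = TRUE))
logs$get_post    <- request_parts[, 1]
logs$request_url <- request_parts[, 2]
logs$protocol    <- request_parts[, 3]

# Hypothetical OS classifier. Order matters: iPhone agents also mention
# "Mac OS X", and Android agents also mention "Linux".
detect_os <- function(agent) {
  ifelse(grepl("iPhone|iPad", agent), "iOS",
  ifelse(grepl("Android", agent), "Android",
  ifelse(grepl("Windows", agent), "Windows",
  ifelse(grepl("Mac OS X", agent), "macOS",
  ifelse(grepl("Linux", agent), "Linux", "unknown")))))
}
logs$os <- detect_os(logs$agent)

print(logs[, c("ip", "ts", "get_post", "request_url", "response", "os")])
```

Lines that fail to match the pattern pass through `sub` unchanged, so a real version would want to filter those out (for example with `grepl`) before splitting.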
Throwing out the data from today (which isn’t over yet), here are some of the things we can extract from this information.
Unique Visitors
The number of different IP addresses that have loaded the page is a way of estimating how many different people have visited a site. We’ll remove the visitors using an unknown operating system, because those are probably web crawlers and not real people. I’m also adding a column for minimum daily temperature, since it has been suggested that more people visit the site when the temperatures are more extreme.
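As a rough sketch of that counting step, the base R below drops the unknown-OS rows, counts distinct IP addresses per day, and attaches a minimum temperature for each day. Every row here is made up and has nothing to do with the real results reported in this post; the `logs` and `daily_temps` data frames are hypothetical stand-ins, with column names borrowed from the parsed log columns and the model formula.

```r
# Made-up parsed log rows (ip, ts, and os mirror the parsed log columns).
logs <- data.frame(
  ip = c("192.0.2.10", "192.0.2.10", "192.0.2.11", "192.0.2.99",
         "192.0.2.10", "192.0.2.12", "192.0.2.13"),
  ts = as.POSIXct(c("2024-01-26 08:00:00", "2024-01-26 17:30:00",
                    "2024-01-26 09:15:00", "2024-01-26 03:00:00",
                    "2024-01-27 07:45:00", "2024-01-27 12:00:00",
                    "2024-01-27 21:10:00"), tz = "UTC"),
  os = c("iOS", "iOS", "Windows", "unknown", "iOS", "Android", "macOS"),
  stringsAsFactors = FALSE
)

# Hypothetical daily minimum temperatures.
daily_temps <- data.frame(
  dte = as.Date(c("2024-01-26", "2024-01-27")),
  min_temp = c(-38, -45)
)

# Drop probable crawlers, then tag each request with its calendar day.
people <- logs[logs$os != "unknown", ]
people$dte <- as.Date(people$ts, tz = "UTC")

# Count distinct IP addresses per day.
hosts_by_day <- aggregate(ip ~ dte, data = people,
                          FUN = function(x) length(unique(x)))
names(hosts_by_day)[names(hosts_by_day) == "ip"] <- "unique_hosts"

# Attach the temperature column.
hosts_by_day <- merge(hosts_by_day, daily_temps, by = "dte")
print(hosts_by_day)
```

A repeat visitor counts once per day however many times they reload, and the toy timestamps are in UTC, so a real version would need to bin days in local time.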
Call:
lm(formula = unique_hosts ~ min_temp, data = hosts_by_day)

Residuals:
   Min     1Q Median     3Q    Max
-75.73 -58.94 -12.30  25.73 211.40

Coefficients:
            Estimate Std. Error t value  Pr(>|t|)
(Intercept)   65.423     41.143   1.590     0.138
min_temp      -7.441      1.219  -6.106 0.0000529 ***
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 83.54 on 12 degrees of freedom
Multiple R-squared: 0.7565, Adjusted R-squared: 0.7362
F-statistic: 37.28 on 1 and 12 DF, p-value: 0.00005289
Yes indeed, when it’s colder more folks are looking at the page. However, another explanation could be that the site has simply become more popular over time, since it has been mentioned more frequently on other sites as the temperatures started getting colder. There’s also the problem that over the period of interest the temperature has trended colder, so temperature and date are related to one another.
Call:
lm(formula = unique_hosts ~ dte + min_temp + dte * min_temp,
    data = hosts_by_day)

Residuals:
    Min      1Q  Median      3Q     Max
-97.769  -8.406   0.714  15.801  85.371

Coefficients:
                Estimate  Std. Error t value Pr(>|t|)
(Intercept)  387373.6410 351082.8998   1.103   0.2957
dte             -19.6205     17.7876  -1.103   0.2958
min_temp      20013.1697   5730.5409   3.492   0.0058 **
dte:min_temp     -1.0141      0.2904  -3.492   0.0058 **
---
Signif. codes: 0 '***' 0.001 '**' 0.01 '*' 0.05 '.' 0.1 ' ' 1

Residual standard error: 48.51 on 10 degrees of freedom
Multiple R-squared: 0.9316, Adjusted R-squared: 0.911
F-statistic: 45.38 on 3 and 10 DF, p-value: 0.000003943
When we include date, minimum temperature, and the interaction between the two, the date term is not significant (p = 0.2958), while minimum temperature and the interaction both are (p = 0.0058). So it appears that temperature is more likely the driving factor behind the popularity of the page.
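For anyone who wants to try the same comparison, here is a minimal sketch of fitting both models with `lm`. The fourteen rows of `hosts_by_day` below are invented purely so the code runs; they are not the data behind the results above. One detail worth knowing: in an R formula `dte * min_temp` already expands to both main effects plus the interaction, so the longer `dte + min_temp + dte * min_temp` spelling describes the same model.

```r
# Invented daily data with the same column names as the model formulas.
hosts_by_day <- data.frame(
  dte = as.Date("2024-01-14") + 0:13,
  min_temp = c(-5, -12, -8, -20, -15, -25, -18,
               -30, -28, -35, -40, -38, -45, -42),
  unique_hosts = c(60, 110, 95, 180, 150, 240, 170,
                   300, 260, 330, 420, 380, 460, 410)
)

# Temperature-only model.
temp_only <- lm(unique_hosts ~ min_temp, data = hosts_by_day)

# Date, temperature, and their interaction, written two equivalent ways.
with_date <- lm(unique_hosts ~ dte * min_temp, data = hosts_by_day)
long_form <- lm(unique_hosts ~ dte + min_temp + dte * min_temp,
                data = hosts_by_day)

# Both spellings produce the same set of model terms.
same_terms <- identical(attr(terms(with_date), "term.labels"),
                        attr(terms(long_form), "term.labels"))
print(same_terms)

# How strongly date and temperature move together in the toy data,
# which is the confounding concern raised earlier.
date_temp_cor <- cor(as.numeric(hosts_by_day$dte), hosts_by_day$min_temp)
print(date_temp_cor)

print(summary(temp_only)$coefficients)
print(summary(with_date)$coefficients)
```

With date and temperature this closely related, the individual coefficients in the interaction model are hard to interpret on their own, so the summary is best read alongside the plots.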
Here’s what the individual relationships look like graphically.
In the early days of the Internet, site counters were a popular addition to a site so you could see how many hits a particular page received. It was a sort of thumbs up for your page that proved your page was valuable.
Here’s the daily data for that statistic, again, removing bots and web crawlers: