4  Notes

source(here::here("scripts/setup.R"))

4.1 About the data

Each row of data (“record”) in the data set corresponds to one observation, typically:

  • Made at noon local time, recording location (lat/lon) and weather conditions
  • One observation each day of the voyage, except when a ship was in port

There are many errors and omissions in the data set. Some were likely made in the stress of the moment when writing in the ship’s log; others were probably introduced when transcribing the written logs into electronic records (for example, when reading unclear handwriting, inferring the meaning of inconsistently used abbreviations, or typing every character and line of every log entry). I therefore excluded voyages or observations in the following cases:

  • Voyages starting earlier than 1750 or later than 1815 (the start date is VoyageIni)
  • Voyages missing ship name, start date, or end date
  • Records missing latitude, longitude, or the date of observation
  • Voyages less than 2 days
  • Voyages in which there is a large gap between consecutive observations: more than 1000 km or more than 1000 days (2.7 years)
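
The exclusion rules listed above can be sketched in R with dplyr. This is a minimal illustration on toy data, not the code used for this analysis. ShipName and VoyageIni come from the data set; voyage_end, obs_date, lat, and lon are hypothetical column names, VoyageIni is assumed to be already parsed as a Date, and the helper haversine_km() is my own stand-in for whatever distance calculation was actually used. I also assume “later than 1815” means a start date after 31 December 1815.

```r
library(dplyr)

# Great-circle distance in km between two points (haversine formula).
# Hypothetical helper; assumes a spherical Earth of radius 6371 km.
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Toy data: Ship A is valid (one record lacks latitude), Ship B starts
# before 1750, Ship C has no end date, Ship D has a gap of over 1000 km.
logs <- data.frame(
  ShipName   = rep(c("Ship A", "Ship B", "Ship C", "Ship D"), times = c(4, 2, 2, 2)),
  VoyageIni  = as.Date(rep(c("1768-08-26", "1740-01-01", "1780-03-01", "1790-05-01"),
                           times = c(4, 2, 2, 2))),
  voyage_end = as.Date(rep(c("1768-09-10", "1740-02-01", NA, "1790-06-01"),
                           times = c(4, 2, 2, 2))),
  obs_date   = as.Date(c("1768-08-26", "1768-08-27", "1768-08-28", "1768-08-29",
                         "1740-01-01", "1740-01-02",
                         "1780-03-01", "1780-03-02",
                         "1790-05-01", "1790-05-02")),
  lat = c(50.4, 49.5, NA, 48.6, 51, 50, 40, 39, 10, 30),
  lon = c(-4.1, -5.5, -6.0, -7.0, 1, 0, -9, -10, -20, -20)
)

kept <- logs |>
  # Record level: drop rows missing position or observation date
  filter(!is.na(lat), !is.na(lon), !is.na(obs_date)) |>
  # Voyage level: required fields, start-date window, minimum duration
  filter(
    !is.na(ShipName), !is.na(VoyageIni), !is.na(voyage_end),
    VoyageIni >= as.Date("1750-01-01"),
    VoyageIni <= as.Date("1815-12-31"),
    as.numeric(difftime(voyage_end, VoyageIni, units = "days")) >= 2
  ) |>
  # Gaps between consecutive observations within each voyage
  group_by(ShipName, VoyageIni) |>
  arrange(obs_date, .by_group = TRUE) |>
  mutate(
    gap_km   = haversine_km(lag(lat), lag(lon), lat, lon),
    gap_days = as.numeric(obs_date - lag(obs_date))
  ) |>
  # Drop whole voyages that contain any oversized gap
  filter(
    all(gap_km <= 1000, na.rm = TRUE),
    all(gap_days <= 1000, na.rm = TRUE)
  ) |>
  ungroup()
```

In this toy example only the three complete Ship A records survive. A stricter rule for maps could reuse the same gap_km and gap_days columns with different thresholds.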

When making maps of voyages I excluded these in addition:

  • Voyages in which there is a gap of more than 2 days or 1000 km (621 miles)

I excluded repeated observations on the same day of the same voyage; in this case I kept only the first one.
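
As a rough sketch of this de-duplication step, dplyr’s distinct() with .keep_all = TRUE keeps the first row for each combination of the listed columns. The column names obs_date and lat are hypothetical, and “first” here simply means first in the current row order, so the data should be sorted as intended beforehand.

```r
library(dplyr)

# Toy data: Ship A has two observations on 1768-08-27
obs <- data.frame(
  ShipName  = c("Ship A", "Ship A", "Ship A", "Ship B"),
  VoyageIni = as.Date(c("1768-08-26", "1768-08-26", "1768-08-26", "1770-01-05")),
  obs_date  = as.Date(c("1768-08-27", "1768-08-27", "1768-08-28", "1770-01-06")),
  lat       = c(49.5, 49.7, 48.6, 36.1)
)

# Keep only the first observation per voyage per day
deduped <- obs |>
  distinct(ShipName, VoyageIni, obs_date, .keep_all = TRUE)
```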

It’s worth noting some other considerations:

  • Some logs include stops in multiple ports as part of a single voyage. Others break long journeys into multiple voyages. A voyage is defined as a unique combination of ShipName and VoyageIni (the voyage start date). Each voyage has a VoyageFrom and VoyageTo; for most voyages I manually normalized the port names in port_from and port_to and added country_from and country_to.

  • We don’t know how these particular logbooks were chosen for digitization, so it’s best to assume we have a convenience sample, which limits the kinds of inferences we can make.

  • It’s not possible to determine from the data set alone when multiple ships share the same name. Typically a name is not in use concurrently within one country or company; however, it’s not uncommon for a name to be revived some years after the demise of the previous ship of that name.

  • Typically I refer to colonies and countries by their current name in English. In cases of cities and geographic entities (capes, points, bays, rivers) being part of the origin or destination, I generally use current names, except where old names are very well known (for example, Bombay, Madras, Calcutta all have new names, which I didn’t use). In some cases I substitute a larger city very close to the small city or fort (for example, Middelburg instead of Vlissingen and Rammekens), and in some cases I have kept the French or Spanish name. I determined the “normalized” port names and added countries by inspecting the port names, plotting the routes, and triangulating via Wikipedia, Google Maps, and Google search. It’s possible a small minority of my inferences are incorrect.

  • While the motivation and funding for digitizing these logs was to make weather data available for analysis, I’m ignoring weather details and focusing on the voyage tracks.
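
The voyage definition noted in the list above (a unique combination of ShipName and VoyageIni) can be illustrated with a small grouping sketch. The data are made up, obs_date is a hypothetical column name, and the summary columns are only for illustration.

```r
library(dplyr)

# Toy data: Ship A makes two voyages; Ship B starts on the same day as
# Ship A's first voyage but counts as a separate voyage
logs <- data.frame(
  ShipName  = c("Ship A", "Ship A", "Ship A", "Ship B"),
  VoyageIni = as.Date(c("1768-08-26", "1768-08-26", "1771-02-01", "1768-08-26")),
  obs_date  = as.Date(c("1768-08-26", "1768-08-27", "1771-02-01", "1768-08-26"))
)

# One row per voyage, keyed by ship name and voyage start date
voyages <- logs |>
  group_by(ShipName, VoyageIni) |>
  summarize(
    n_obs     = n(),
    first_obs = min(obs_date),
    last_obs  = max(obs_date),
    .groups   = "drop"
  )
```

Because the key includes only the name and start date, two different ships with the same name are distinguished only when their voyages start on different dates.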

Even with the above limitations, there is a lot we can learn by plotting the locations recorded each day. The plots reveal the tracks of many voyages within the limits of the accuracy available at the time: navigators fixed their position using a sextant, marine chronometer, and almanac and, when the weather did not cooperate, by dead reckoning.


4.2 Acknowledgements

I was inspired by Simon Coulombe’s write-up, Animated map of ships, 1750-1799, which makes use of the Climatological Database for the World’s Oceans (CLIWOC).

The Climatological Database for the World’s Oceans (CLIWOC) represents the culmination of a major project funded by the European Union, and pursued by a large team of researchers in organizations and universities around the world. … You can read more about how the database was originally coded, and has now been reformatted, at Steven’s website. This is essential reading if you’d like to use the database for climate history research.

To interpret nautical terms in the database, you will likely need to use a dictionary created as part of the CLIWOC project.

I could have retrieved the data from the source at https://www.historicalclimatology.com/cliwoc.html, but Simon offers a download that is already cleaned up a bit:

I referred often to the codebook at https://stvno.github.io/page/cliwoc/, which explains the data columns.

Thanks to the EU for funding the digitization and to everyone who did the work to get the data in a state that made my task so easy. And thanks to Claus Wilke for providing the code pattern for plotting a globe with simple features (sf R package), which Simon and I both followed.

Adrien Charles (and others?) at Axxio made a nice visualization in Tableau here: https://www.axxio.io/the-age-of-discovery/ , and there are other examples a web search away. I did not consult them before doing my own work.

Thanks to everyone developing R, RStudio, and the many high-quality R packages I used:

  • tidyverse and tidyverse-adjacent: tidyr, readr, readxl, dplyr, purrr, ggplot2, stringr, lubridate, forcats, skimr, gt, scales, here, hrbrthemes
  • GIS: sf, lwgeom, rnaturalearth, cowplot, measurements, units


4.2.1 Source files

Source files for this analysis are available at https://github.com/dmoul/age-of-sailing