In the beginning
About 15 years ago, a group of astronomers had a problem. There were hundreds of thousands of digital images of the night sky and only a few grad students to search through them to find and categorize galaxies. The astronomers had an idea: put the images online and let thousands of volunteers, i.e. Citizen Scientists, look at the images and collect the data. This project was called The Galaxy Zoo, and it was a tremendous success.
Philip Brohan had a problem. He is a climate scientist in England, and he wanted historic weather data from the oceans to feed into the various climate models to improve their analysis of historic weather. He contacted the Galaxy Zoo people about having Citizen Scientists extract weather data from Royal Navy logbooks.
The two groups designed such a system, called Old Weather, and the two projects, Galaxy Zoo and Old Weather, formed the Zooniverse.
Naval historian Gordon Smith joined Old Weather in April 2010. His job was to broaden the scope of the project: to demonstrate the value of the ships' logbooks as historical records, not just sources of pressure and temperature observations.
The success of Old Weather as a history project (naval-history.net) has also helped the work in climate science. Volunteers produce edited ship histories, based on Old Weather transcriptions, and the publication of these histories is announced on this forum and on social media. Expanding the project in this way has been vital in sustaining the public interest that has kept Old Weather going so long.
Old Weather: a short history
Old Weather has had several phases:
- Phase I: Royal Navy logs ~1880 to ~1930;
- Phase II: Royal Navy gunboats operating in China in the 1930s;
- Phase III: U.S. Navy and Coast Guard 1844 to 1930s;
- Phase IV: U.S. whaling ships;
- Phase V: this was so badly set up that we never did much with it, so it doesn't matter here; and,
- OW Arctic, now called OW Federal Ships: U.S. Navy and Coast Guard, 1867 to 1955.
What we valued about the project:
- Collecting valuable data for climate research;
- Connecting with people around the world through the Forum; and,
- Having a sense of connection to our ships, the people on them and the history.
But there were problems:
- The original data entry system was slow, awkward and inflexible.
- The new data entry system gave the transcriber random pages from a ship, so the sense of continuity was lost.
- The old Forum was abandoned by the Zooniverse, which moved to the dreaded Talk program.
- The last straw was the discovery that the Zooniverse's new program had a bug that prevented anyone from doing more than five pages for one ship-year, and the news that the Zooniverse could not fix it.
A new way of doing Old Weather
About two-thirds of the way through Phase III, Stuart thought it would be better to enter the data straight into a spreadsheet, with the logbook image as a background. It was an intriguing idea, and there were many advantages. I set up a test using Excel, and it was immediately obvious that such a system was faster, easier and more efficient than using the old system.
Bob, a very active transcriber until his job started taking all his time, took on the project. We had to use a free system, so he looked at OpenOffice and LibreOffice, and he chose the latter. He designed and built the first spreadsheet, and a couple of us tested it. We made suggestions, and Bob implemented them and worked out all the kinks.
Randi looked around the internet and found a very good bulletin board for our new forum. Bob set up the hosting for the forum and even paid for a five-year subscription. Randi, Gordon and a couple of others set up the new forum, and we switched over.
From data collection to data processing
With the old system, we never saw the collected data. With Bob's new system, the data we collected were available on the internet in a tagged text format. For the first time, we could download these data and process them ourselves.
During the entire OW project, we collected place names and their locations, so we could see where the ships were. Matteo in Italy was, and still is, in charge of this project. He has been busy collating the data, and maintaining the lists in the forum. He also set up our online tools for finding places and maps, and for calculating positions and voyages.
To make the job of finding new places easier, I built a small program to calculate a ship's position given its starting position and a few hours of courses and bearings. Over the years, as the needs changed, it became more sophisticated, to the point where I could calculate the hourly positions for an entire year.
Now that we had our collected data online, I could decode the collected data, put them into a spreadsheet and run my voyage calculator. I then put the hourly positions back into the spreadsheet.
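The kind of calculation a voyage calculator performs can be sketched as follows. This is only an illustration of dead reckoning, not the program described above: it assumes each hourly log entry supplies a true course in degrees and a speed in knots, and it uses the standard great-circle destination formula on a spherical Earth. The names `advance_position` and `hourly_track` are hypothetical.

```python
import math

EARTH_RADIUS_NM = 3440.065  # mean Earth radius in nautical miles


def advance_position(lat_deg, lon_deg, course_deg, speed_knots, hours=1.0):
    """Return (lat, lon) in degrees after steaming on a fixed true course.

    Assumption: the course is already a true (not magnetic) heading and
    the speed is constant over the interval.
    """
    angular_distance = (speed_knots * hours) / EARTH_RADIUS_NM
    lat1 = math.radians(lat_deg)
    lon1 = math.radians(lon_deg)
    course = math.radians(course_deg)

    lat2 = math.asin(
        math.sin(lat1) * math.cos(angular_distance)
        + math.cos(lat1) * math.sin(angular_distance) * math.cos(course)
    )
    lon2 = lon1 + math.atan2(
        math.sin(course) * math.sin(angular_distance) * math.cos(lat1),
        math.cos(angular_distance) - math.sin(lat1) * math.sin(lat2),
    )
    # Wrap longitude into the range [-180, 180).
    lon2_deg = (math.degrees(lon2) + 180.0) % 360.0 - 180.0
    return math.degrees(lat2), lon2_deg


def hourly_track(start_lat, start_lon, legs):
    """Build a list of hourly positions from (course_deg, speed_knots) legs."""
    track = [(start_lat, start_lon)]
    lat, lon = start_lat, start_lon
    for course_deg, speed_knots in legs:
        lat, lon = advance_position(lat, lon, course_deg, speed_knots)
        track.append((lat, lon))
    return track
```

Errors accumulate from hour to hour in any dead-reckoning track, so a real calculator would also re-anchor the track whenever the log records an observed position.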
OW Arctic / OW Federal Ships
Kevin Wood, who was working on Arctic ice projects for the National Oceanic and Atmospheric Administration (NOAA), was very interested in the OW project. He proposed, and then took charge of, OW Whaling and OW Arctic. His interest was in climate change and Arctic sea ice: he wanted to know how good the climate models were at predicting where sea ice had been historically, and how the presence of sea ice had changed over the years.
As he was setting up OW Whaling and OW Arctic, and before Phase III was finished, he asked me to do two things:
- Calculate a few voyages using Phase III data; and,
- Collect some ice observations from a few U.S. Coast Guard ships and plot the position of the ship at those times.
Kevin's other question was: how useful are the ice data from the U.S. Coast Guard ships? I collected ice data from six or eight ships, and Kevin tested them against the models. He discovered that those data were extremely important. The models were already very good at predicting the presence of ice, but having the actual data helped improve them even more.
So, with these results in hand, Kevin proposed that the logbooks from the Coast Guard and other ships be scanned for OW Arctic, and that logs from whaling ships be scanned and used for OW Whaling. The project was approved and funded.
By now, Phase III was ending and our spreadsheet method for data collection was operational.
New questions, new projects
Now that we had our new system set up, we were able to ask and answer new questions. I was curious to see how the data from the different ships compared with each other. It was easy to do: get the data for all the ships for a given year, and compare the data when any two ships were within five miles of each other. I was doing this just out of curiosity, but Kevin became interested. He has given the data to an expert in statistics, who is using these data to better correlate the readings between the ships, and so making the data even better.
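A comparison of this kind might look like the sketch below. It is an assumption-laden illustration, not the actual procedure: it treats "five miles" as nautical miles, assumes each observation is a dictionary with hypothetical keys `ship`, `time`, `lat` and `lon`, and pairs only observations that share the same timestamp. Distances use the haversine formula.

```python
import math
from itertools import combinations

EARTH_RADIUS_NM = 3440.065  # mean Earth radius in nautical miles


def distance_nm(lat1, lon1, lat2, lon2):
    """Great-circle distance in nautical miles (haversine formula)."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlmb = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlmb / 2) ** 2)
    return 2 * EARTH_RADIUS_NM * math.asin(min(1.0, math.sqrt(a)))


def close_pairs(observations, max_nm=5.0):
    """Pair observations from different ships taken at the same time
    and no more than max_nm apart.

    The dictionary keys used here are hypothetical field names.
    """
    by_time = {}
    for obs in observations:
        by_time.setdefault(obs["time"], []).append(obs)

    pairs = []
    for same_time in by_time.values():
        for a, b in combinations(same_time, 2):
            if a["ship"] == b["ship"]:
                continue
            if distance_nm(a["lat"], a["lon"], b["lat"], b["lon"]) <= max_nm:
                pairs.append((a, b))
    return pairs
```

Each returned pair can then be compared reading by reading, for example the pressure on one ship against the pressure on the other.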
In order to make the comparisons, I had to "clean" the data. Pressures like 3030 had to be converted to 30.30. Entries that were just ditto marks had to be converted to their actual values. Values that were out of range, like a temperature of 667, had to be fixed by checking them against the values in the logs. Wind speeds were converted from Beaufort force to knots, and visibilities were converted from code values to nautical miles.
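Three of these cleaning steps can be sketched in a few lines. Everything specific here is an assumption made for illustration: pressures are taken to be in inches of mercury, the plausibility bounds and the set of ditto marks are hypothetical, and the Beaufort conversion uses the midpoints of the standard WMO knot ranges rather than whatever table was actually used.

```python
# Hypothetical set of strings that transcribers might use for a ditto mark.
DITTO_MARKS = {'"', "''", "do", "ditto"}

# Midpoints of the WMO Beaufort ranges in knots, forces 0 to 12.
# Force 12 has no upper bound, so its lower bound (64) is used.
BEAUFORT_KNOTS = [0.0, 2.0, 5.0, 8.5, 13.5, 19.0, 24.5,
                  30.5, 37.0, 44.0, 51.5, 59.5, 64.0]


def normalize_pressure(raw):
    """Turn an entry such as '3030' into 30.30 (assumed inches of mercury)."""
    value = float(raw)
    if value >= 1000:  # decimal point was omitted in the log
        value /= 100.0
    return round(value, 2)


def is_plausible_pressure(inches):
    """Flag values that need checking against the log (assumed bounds)."""
    return 26.0 <= inches <= 32.5


def fill_dittos(cells):
    """Replace ditto marks with the most recent actual value in the column."""
    filled = []
    previous = None
    for cell in cells:
        if cell.strip().lower() in DITTO_MARKS:
            filled.append(previous)
        else:
            filled.append(cell)
            previous = cell
    return filled


def beaufort_to_knots(force):
    """Convert a Beaufort force (0 to 12) to a representative speed in knots."""
    if not 0 <= force <= 12:
        raise ValueError(f"Beaufort force out of range: {force}")
    return BEAUFORT_KNOTS[force]
```

Values that fail the plausibility check are only flagged, not corrected, because the right value has to come from the logbook page itself.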
I went through all the data, cleaning and fixing what I could. I also added verification checks to the spreadsheet to catch any errors before they were saved. You would be interested to know that there are VERY FEW errors in our transcribed data. For Phases I, II and III, each logbook was transcribed by three separate people, to weed out transcription errors. When we demonstrated the spreadsheet method to Philip and he saw the first results, he was so impressed by how few errors there were that he relaxed the three-person rule to just one. With our system, only one person is needed per log!
Yangtze River Floods. Gil Compo and other researchers were interested in weather data from the Yangtze River area in China for 1930-31. This was the year of the most severe floods in history, which were probably the most destructive weather event in the world. We transcribed the data from three U.S. gunboats for this project.
Converting wind directions from magnetic to true. Two days ago, Gil wondered whether having wind directions recorded as magnetic rather than true would make a difference in the models. He asked me if I could do the conversion. I did it for three ships: an "old" Bear, a "new" Northwind and a voyage from Ashuelot, which sailed most of the way around the world. I chose the Ashuelot voyage because it would have the greatest range of magnetic declinations. Gil did the test and asked me to convert all the wind directions, which I have just done.
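The arithmetic of the conversion itself is simple, as the sketch below shows. It assumes the usual sign convention (declination positive when magnetic north lies east of true north) and a 16-point compass; the function names are hypothetical. Looking up the declination for a given date and position requires a geomagnetic model, which is not shown here.

```python
# 16-point compass, clockwise from north, 22.5 degrees per point.
COMPASS_16 = ["N", "NNE", "NE", "ENE", "E", "ESE", "SE", "SSE",
              "S", "SSW", "SW", "WSW", "W", "WNW", "NW", "NNW"]


def point_to_degrees(point):
    """Convert a compass point such as 'NNE' to degrees clockwise from north."""
    return COMPASS_16.index(point.strip().upper()) * 22.5


def magnetic_to_true(magnetic_deg, declination_deg):
    """Convert a magnetic direction to a true direction.

    declination_deg is positive for easterly declination and negative
    for westerly declination (assumed convention).
    """
    return (magnetic_deg + declination_deg) % 360.0
```

For example, a wind logged as NNW (337.5 degrees magnetic) at a place with 25 degrees of easterly declination becomes 2.5 degrees true.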
Conclusion
The OW volunteers have come up with a system that is:
- Extremely efficient;
- Flexible; and,
- Able to evolve to meet the needs of the transcribers and the science team.
Because Bob no longer has the time and Craig has passed, Gordon, Chris (aka Hanibal) and I have made any necessary or requested changes.
Randi, Joan, Gordon and Caro set up the new forum and keep it clean, organized and easy to use.
Matteo keeps the location data up to date, and he maintains his OW online tools.
When Kevin died, much, much too soon, Gil and Lawrence both reassured us that our work is very valuable and is being used in more ways than we imagined. I know that they are using each file as soon as it is uploaded, because I had a question about Kearsarge 1875 the day after I sent it up.
We should all be proud of what we have done. The science team certainly appreciates it.
PS
I wasn't too impressed with the OCR demonstration. I suggested that they wouldn't get many long-term volunteers to use it if it wasn't flexible, easy to use and able to be modified by the user. For example, some people like to use a mouse, while others prefer to use a keyboard. I sent them a spreadsheet to give them an idea of all the options we give our users: Enhance images as required? Yes. Magnify the image? Yes. Change font colours? Yes. And so on.