Monday, July 30, 2012

Indexing Update from FamilySearch

Here is an update from the FS blog:

FamilySearch indexers are nearing the end of their own “Olympic” marathon. Is this the end, or just the beginning of something even bigger? Let the games begin! After four long years, we finally get to enjoy another exciting version of that international celebration of sport known as the Olympic Games. Few events at the Olympics symbolize human achievement like the marathon. Approximately 26.2 miles in length, the marathon demands exceptional fitness, incredible determination and a willingness to sacrifice personal well-being to achieve glory for flag and country.
FamilySearch indexers are nearing the end of their own marathon called the 1940 US Census Community Project. It’s been a challenge, but incredibly, we’ve broken records with every step. Now we’re in the final stretch with the finish line rapidly approaching. Glory awaits, but as every athlete knows, you have to “push through the tape” and cross the finish line before the race is won. If our “race” continues to go as well as it has, indexers and arbitrators will reach the finish within days.
Olympic marathoners end their race on the track inside the main Olympic stadium. When the first runner clears the service tunnel leading from the streets outside the stadium and begins to “kick” toward the finish one-half lap away, thousands of fans erupt in a deafening cheer. It is a spine-tingling moment, charged with emotion. After more than two hours of intense individual effort, suddenly there are tens of thousands to help push weary legs the final 300 meters.
The adrenaline rush in those moments is exhilarating and the race-ending flood of emotions, ranging from relief to amazement to sheer ecstasy, can be overwhelmingly powerful. It’s that same well-deserved feeling we would wish for every 1940 US Census Community Project indexer and arbitrator who has tenaciously stuck with this marathon indexing effort from the starting gun to the finish line.
To you who have given your all to this project and tirelessly pushed through the indexing equivalent of heavy legs, shortness of breath, and doubts about your ability to endure, we can only hope in these final days of the project that you can somehow feel the silent but enthusiastic cheers of the literally tens of millions who are the recipients of your great gift.
The Victor’s Crown
The traditional symbol of Olympic victory is a gold medal, but anciently the symbol was the laurel wreath, woven from the supple branches and leaves of a wild olive tree. The 1940 US Census Community Project represents a major victory. Only the most wildly optimistic individuals would have suggested that the entire census could be indexed and arbitrated in less than 4½ months. But that’s precisely what we, the genealogical community, have done. It’s an achievement without parallel.
Among the project’s myriad astonishing statistics is the number of people who contributed to the creation of the US 1940 Census index. To date that number is hovering near 155,000—enough to make a decent-sized city—and it continues to climb even as we head into the home stretch. The enthusiasm for this project and for indexing in general is such an inspiration!
For all who have participated, from those who arbitrated thousands of names to those who indexed a single batch, we offer the victor’s laurel, a badge you can proudly display to show your part in making history. More celebrating lies ahead, but that can wait until we all cross the finish line together as the last of the full index is published to the world. Stay tuned for more about that in the near future.
The Start of a New Trend?
For now, let’s consider one of the “unintended consequences” of the 1940 US Census Community Project and another comparison to the Olympic marathon. The year was 1972. The setting was Munich, Germany. From the field of more than 70 competitors, a relatively unknown American named Frank Shorter emerged and surprised the world by beating the rest of the field by more than two minutes. His stunning victory was such an inspiration to Americans that it fueled a national running craze that continues to this day.
Great moments in history can inspire generations to take action and accomplish even greater feats. What greater achievements will the memory of the 1940 US Census Community Project inspire? Already it has swept up more participants than any other project of its type in history, but there are billions of additional records still waiting to be indexed. Could the 1940 US Census experience mark the beginning of a new culture of group giving in the genealogical community? We now know what we’re capable of accomplishing—is there any reason we shouldn’t just continue?
If your answer to that question is a resounding, “NO!,” then you’ll be pleased to learn about the US Immigration & Naturalization Community Project. It’s the sequel to the 1940 US Census project and records from this project are already available for indexing (just look for the “US (Community Project)” label).
If you need a rest from the “marathon,” everyone will understand. But if you’re thinking the 1940 US Census was just a good warm up and are wondering just how much more we can accomplish in the future, then get on board and full steam ahead! The race to remember our immigrant ancestors has just begun!

Saturday, July 14, 2012

Church of Jesus Christ of Latter-day Saints Finances

This is a report by the Church that should be of interest to those in the genealogy community:

Commentary —  12 July 2012

The Church and Its Financial Independence

Saturday, July 7, 2012

1940 Census report

Here is the latest report from FamilySearch:

No new states have been posted this week. States such as New York and a few others are so big that they take a lot of processing at the back end to make them searchable. Please be patient as we get them ready to post. I know that we are all excited to see new states posted, especially if it’s been a state you’ve been working on. We want to make sure they are as error-free as possible, and sometimes that takes a little extra time.

Below are the latest statistics for the project. They continue to be very encouraging.
  • 115,886,258 names have been indexed and arbitrated.
  • 29 states have searchable indexes. These states include Alabama, Alaska, Arizona, California, Colorado, Delaware, Florida, Hawaii, Idaho, Indiana, Iowa, Kansas, Louisiana, Maine, Mississippi, Missouri, Montana, Nevada, New Hampshire, New Mexico, North Dakota, Oklahoma, Oregon, South Dakota, Utah, Vermont, Virginia, Washington and Wyoming.
  • 4 additional states are 100% indexed and arbitrated and are in the final stage in preparation for posting.
  • 5 additional states are 90% or more indexed and arbitrated.
  • 14 states are 50% or more indexed.
  • 4 states are still less than 50% indexed. To see the status of each state visit the 1940 US Census state-by-state progress map on the FamilySearch website.
  • The 1940 US Census is currently 82% indexed and arbitrated.
  • 150,990 indexers have signed up to index the 1940 US Census.

Wednesday, July 4, 2012

Records preservation

This is a copy of a FamilySearch blog post on record preservation that I believe you will find interesting:

The LDS Church has been a pioneer for many decades in preserving important family history records, keeping them safe from the dangers of both man and nature. It took many years to build the Granite Mountain Records Vault, where microfilm records are safely kept today. But what about all the digital information that FamilySearch is generating to assist researchers on the Internet? How does that get preserved from generation to generation? As you might imagine, digital content is a bit more complex and fragile than microfilm to preserve long term. Digital preservation is a lot more than just tape backup. Let’s explore some of the nuances and complexities of long-term digital preservation.

Volume of data: the digital pipeline in FamilySearch is generating somewhere in the range of 15 terabytes of images, or one million to three million digitized pages, every business day of the year. The software to handle this volume did not exist when we started digital preservation. We push many of our vendors to come up with new technology to meet our needs as we stretch their capabilities and often break their products. We are also writing our own magnetic tape storage software because no products exist on the market that can handle preservation storage volume of this magnitude.

Data validation: on an annual basis, the preservation system has to be automated to check all the bits on every tape and make sure that there is no corruption. We store checksum values at multiple levels so the software can read the checksum, read the data, and compare the calculations to ensure integrity. It is very resource intensive to deploy tape drives for writing new data, while also using drives for the annual validation of every tape. It takes complex scheduling to balance the work between the two and assure that we don’t go too long without touching each tape for validation.
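As a rough illustration of the multi-level checksum idea described above, here is a minimal Python sketch. It is not FamilySearch's software: the choice of SHA-256, the `StoredFile` record, and the function names are all assumptions made for the example. Each file keeps its own checksum, and a second, tape-level checksum is computed over the list of file checksums.

```python
import hashlib
from dataclasses import dataclass


def sha256_hex(data: bytes) -> str:
    """Return the SHA-256 digest of data as a hex string."""
    return hashlib.sha256(data).hexdigest()


@dataclass
class StoredFile:
    """Hypothetical record for one file on a tape."""
    name: str
    data: bytes
    checksum: str  # recorded when the file was first written


def write_file(name: str, data: bytes) -> StoredFile:
    """Store the data together with the checksum computed at write time."""
    return StoredFile(name, data, sha256_hex(data))


def tape_checksum(files: list) -> str:
    """Second level: one digest over the ordered per-file checksums."""
    joined = "".join(f.checksum for f in files).encode("ascii")
    return sha256_hex(joined)


def validate_tape(files: list, recorded_tape_checksum: str):
    """Re-read every file and compare against the stored checksums.

    Returns (names of corrupted files, whether the tape-level checksum
    still matches the recorded one).
    """
    corrupted = [f.name for f in files if sha256_hex(f.data) != f.checksum]
    manifest_ok = tape_checksum(files) == recorded_tape_checksum
    return corrupted, manifest_ok
```

A validation pass would call `validate_tape` for each tape in turn. Any name in the first returned value points to a file whose bits have changed since it was written.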

Media refresh: as the tape media ages, the system needs the ability to make copies onto new media before unrecoverable errors begin to appear. There is no way to tell exactly when tapes will begin to fail, so the software has to keep a database of errors for every tape and every tape drive and look for trends that indicate coming problems before they actually occur. If we rotate media too often, however, the system becomes too costly to maintain.
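To make the idea of watching error trends concrete, here is a small Python sketch. It is only an illustration under stated assumptions: the `ErrorLog` class, the use of a least-squares slope, and the `max_slope` and `window` values are hypothetical and are not taken from the FamilySearch system. The sketch records how many corrected errors each tape showed on each validation pass and flags tapes whose recent counts are climbing quickly. The same approach could be applied per tape drive.

```python
from collections import defaultdict


class ErrorLog:
    """Hypothetical in-memory stand-in for the database of tape errors."""

    def __init__(self):
        self._counts = defaultdict(list)

    def record(self, tape_id: str, corrected_errors: int) -> None:
        """Append the error count seen on one validation pass."""
        self._counts[tape_id].append(corrected_errors)

    def history(self, tape_id: str) -> list:
        """Return a copy of the error counts for a tape, oldest first."""
        return list(self._counts[tape_id])


def slope(values: list) -> float:
    """Least-squares slope of values against their index (0, 1, 2, ...)."""
    n = len(values)
    if n < 2:
        return 0.0
    mean_x = (n - 1) / 2
    mean_y = sum(values) / n
    num = sum((i - mean_x) * (v - mean_y) for i, v in enumerate(values))
    den = sum((i - mean_x) ** 2 for i in range(n))
    return num / den


def tapes_to_refresh(log: ErrorLog, tape_ids: list,
                     max_slope: float = 2.0, window: int = 4) -> list:
    """Flag tapes whose error counts rise faster than max_slope per pass,
    looking only at the most recent `window` passes."""
    return [t for t in tape_ids
            if slope(log.history(t)[-window:]) > max_slope]
```

The threshold reflects the trade-off described above: set too low, it copies healthy tapes and drives up cost, while set too high, it risks waiting until errors are unrecoverable.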

File format migration: do you have any WordPerfect 4.2 files lying around? How about a Lotus 1-2-3 spreadsheet or even something more obscure, where the software vendor is long gone, along with your installation disks? As years pass, the risk of not being able to accurately read a data file increases. Our preservation system has to account for this and be able to convert files from one format or version to a newer format. If files are not migrated in a timely fashion, massive amounts of data can become inaccessible or difficult to render accurately. Some file formats may be viable for a decade or more, while others could become obsolete within just a few short years or less. Is a PDF a viable rendering of an Excel spreadsheet? What about the underlying formulas, fonts, supporting data, and links to data sources? There is a significant risk of losing content whenever a file format is converted to a new format.

Metadata and descriptive data: so you have a file from 5 years ago…who created it? What software version is required to read it? Where was the image originally digitized? Who is the owner of the original? Are there any restrictions on the use of the file in the future? Is this copy the highest resolution version we own, or is there a better image somewhere? What is the subject matter of the file? Are there people in the photograph? Is there important genealogical data contained in the image? The list of important questions goes on and on. Keeping track of the many types of metadata, indexes, and associated descriptive data is critical for our preservation system.
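One way to picture the questions above is as fields on a record kept alongside each file. The Python sketch below is purely illustrative. The `PreservationMetadata` class and its field names are hypothetical, chosen to mirror the questions in the paragraph, and do not describe the actual FamilySearch data model.

```python
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class PreservationMetadata:
    """Hypothetical record: one field per question; None means unanswered."""
    creator: Optional[str] = None
    required_software: Optional[str] = None
    digitized_at: Optional[str] = None
    original_owner: Optional[str] = None
    use_restrictions: Optional[str] = None
    is_highest_resolution: Optional[bool] = None
    subject: Optional[str] = None
    contains_people: Optional[bool] = None
    has_genealogical_data: Optional[bool] = None


def unanswered(record: PreservationMetadata) -> list:
    """List the questions that still have no answer for this file."""
    return [f.name for f in fields(record) if getattr(record, f.name) is None]
```

Running `unanswered` across an archive would show which files are missing the descriptive data that a future reader would need.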

Documentation: a preservation system serves both currently living persons and future generations. We often pull images from preservation to avoid having to rescan originals or microfilm in the digital pipeline. A professional genealogist may need to see our highest resolution copy of an image to get clarity around handwriting. A future generation may have to open up our protected vaults and try to recover as much information as they can from our tape libraries and try to rebuild the family history information we have attempted to preserve. Documentation is a critical component of digital preservation. It is imperative that we document our data models, file formats, technology standards, software code, hardware specifications, and many, many other aspects of the digital preservation system. A future archeologist will not be able to simply put a magnifying glass up to microfilm to view our digital artifacts.

There are many additional complexities associated with operating a trusted digital repository. Hopefully, this article gives you some insights into some of them and helps you appreciate the efforts FamilySearch is taking to ensure that future generations are handed a pristine copy of their family records. We have not yet solved all of the challenges associated with building our preservation system, a task that will take many more years and possibly decades to prove out. We take our work very seriously and have a dedicated team of professionals looking after the world’s records. With contributions from many, we hope to enable future generations to learn of their heritage and make the same precious bond with their ancestors as we have.