genealogy.module Scott Courtney (Drupal ID "syscrusher")
scott@4th.com http://4th.com/

Revision history:
2005-03-25 First public release to "sandbox"

License: GNU General Public License (http://www.gnu.org/)

RELEASE STATE: This is an ALPHA RELEASE, not for production use
NOTE: Tested with Drupal 4.5 only, may or may not work
with Drupal 4.6.

Introduction:

This module started out as a quick-and-dirty PHP hack to allow my wife (a
librarian) and one of her colleagues to publish an online obituary database
for the Louisville (Ohio) Public Library. They had accumulated obituary
records from over a century of local newspaper microfilms into a simple
comma-separated ASCII file, and they wanted a way to publish this info in
a searchable online database.

The database has grown to over 25000 records and is searched by hundreds
of genealogy enthusiasts every month, so the notion of this as a one-off
hack no longer really applies. Furthermore, I've had inquiries from other
library systems wanting to know if they could use the code. In the meantime,
the Louisville Library moved from a static HTML web site to Drupal, with
my wife as the webmaster. To make my originally-standalone PHP code work
smoothly with Drupal, I wrapped it inside a simple Drupal module. That is
the status of the code that you will find in this directory -- it is a
Drupal wrapper around some really crufty old code from the days of PHP
3.x. Yeah, it's *that* old.

The current version of the module is in production at the Louisville Public
Library web site, http://louisvillelibrary.org/ . Working with the staff
of the library, I have plans to enhance it substantially. The "Roadmap"
section below explains where this module is going.

Status:

If you've read the preceding text, it should come as no surprise that I
issue the following warning: THIS MODULE IS VERY MUCH IN A "SANDBOX"
STATE, AND IT *WILL* CHANGE SUBSTANTIALLY. Play with it if you want, but
don't use it in production right now. And please, don't judge my coding
quality by this stuff! I had to take really old code and glue it into
Drupal on a tight schedule, and it's not my best work, to put it mildly.

That being said, the code is hereby released under the GNU General Public
License, so legally you can do whatever that license allows you to do.

Roadmap:

In the near future, I plan to upgrade this module with substantial new
functionality. The library staff want to track not only obituaries, but
also marriages, births, divorces, and other major milestones. What this
means is that the database schema will be changing. I plan to normalize
the database, with a single table for "people" and another table that
establishes relationships between individuals, such as spousehood,
parentage, etc. There may be a third table that lists only the citations
of public records (newspapers, health department files, etc.), and that
table's citation IDs would link with a many-to-one relationship to the
marriage/ancestry table. None of this is cast in stone yet.

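Purely as an illustration, the three-table layout described above might
look something like the sketch below. It uses Python and SQLite only for
brevity and is not taken from the module, which is PHP running on Drupal's
database. Every table name, column name, and sample row is a made-up
assumption, since the real schema has not been designed yet.

```python
import sqlite3

# Hypothetical schema sketch; all table and column names are assumptions.
SCHEMA = """
CREATE TABLE people (
    person_id  INTEGER PRIMARY KEY,
    surname    TEXT NOT NULL,
    given_name TEXT
);
CREATE TABLE relationships (
    relationship_id INTEGER PRIMARY KEY,
    person_a INTEGER NOT NULL REFERENCES people(person_id),
    person_b INTEGER NOT NULL REFERENCES people(person_id),
    kind     TEXT NOT NULL           -- e.g. 'spouse', 'parent'
);
CREATE TABLE citations (
    citation_id     INTEGER PRIMARY KEY,
    relationship_id INTEGER NOT NULL
        REFERENCES relationships(relationship_id),
    source          TEXT NOT NULL    -- newspaper, health dept. file, etc.
);
"""


def build_demo_db():
    """Create an in-memory database holding a few invented sample rows."""
    db = sqlite3.connect(":memory:")
    db.executescript(SCHEMA)
    db.execute("INSERT INTO people VALUES (1, 'Smith', 'John Alan')")
    db.execute("INSERT INTO people VALUES (2, 'Smith', 'Mary')")
    db.execute("INSERT INTO relationships VALUES (1, 1, 2, 'spouse')")
    # Many citations may point at one relationship (many-to-one).
    db.executemany(
        "INSERT INTO citations (relationship_id, source) VALUES (?, ?)",
        [(1, "Newspaper marriage notice"), (1, "County record")],
    )
    return db


def citations_for(db, relationship_id):
    """Return the source descriptions that support one relationship."""
    rows = db.execute(
        "SELECT source FROM citations WHERE relationship_id = ? "
        "ORDER BY citation_id",
        (relationship_id,),
    )
    return [row[0] for row in rows]
```

Calling citations_for(build_demo_db(), 1) lists both invented sources for
the single spouse relationship, which is the many-to-one link mentioned
above.
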
The important thing to note is that I CANNOT GUARANTEE 100% MIGRATION OF
THE CURRENT TABLE to the new schema. With 25000+ records in the library
database, I'm certainly going to do my best to automate as much as
possible, but there is a tacit agreement with the library staff that they
may have to do some manual touchup of records when we move to the new
schema.

Also, note that a lot of the information is fuzzy in nature, and the
new database schema will have to handle that. Genealogy is often an
inexact science. For example, there may have been several people named
"Smith" who died in the same year in the same small town. If the obits
in the newspaper describe "J. Smith", "J. A. Smith", and "John Smith"
in three separate obituaries, and you know that your great-grandfather
was named "John Alan Smith", there is no absolute way to know which of
these three obituaries applies to him.

In the current version of the software, this issue is left up to the
human who is searching the database. Names are matched on a raw text
match, with optional SOUNDEX broadening of the search (so that "Jensen"
and "Jansen" are effectively treated as equal, for example). If there
are six different people who might be the person you are seeking, so
be it -- the current database shows you all six and allows you to decide
which one is the right one based on other information. Or you may just
have to live with ambiguity.

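To make the SOUNDEX idea concrete, here is a small standalone sketch of
the classic American Soundex coding together with a toy search function.
It is written in Python for illustration and is not the module's code.
The function names are hypothetical, and treating the "raw text match" as
a case-insensitive exact comparison is an assumption of this sketch.

```python
# Map each consonant to its Soundex digit; vowels, H, W and Y get no code.
CODES = {}
for letters, digit in (("BFPV", "1"), ("CGJKQSXZ", "2"), ("DT", "3"),
                       ("L", "4"), ("MN", "5"), ("R", "6")):
    for ch in letters:
        CODES[ch] = digit


def soundex(name):
    """Return the four-character American Soundex code for a name."""
    letters = [c for c in name.upper() if "A" <= c <= "Z"]
    if not letters:
        return ""
    result = letters[0]
    prev = CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = CODES.get(ch, "")
        if code and code != prev:
            result += code
        if ch not in "HW":  # H and W do not separate same-coded letters
            prev = code
    return (result + "000")[:4]


def search(names, query, use_soundex=False):
    """Return names matching query, optionally broadened by Soundex."""
    if use_soundex:
        key = soundex(query)
        return [n for n in names if soundex(n) == key]
    return [n for n in names if n.lower() == query.lower()]
```

With this sketch, search(["Jensen", "Jansen", "Smith"], "Jensen") finds
only "Jensen", while passing use_soundex=True also returns "Jansen",
because both names share the same Soundex code. Deciding which hit is the
right person is still left to the human.
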
A fully third-normal-form schema will make this a trickier problem,
because relationship tables typically point one record ID number to
another record ID number, with zero room for ambiguity. There are a
couple of different ways to resolve this in the automated schema
conversion. In one approach, the relationship tables might just keep
the existing convention of mapping by name string, so it would be a
many-to-many mapping rather than multiple one-to-one mappings. Or,
a "certainty flag" (boolean) could indicate whether a relationship
is definitely confirmed or just one of multiple possible alternatives.
There are also other approaches.

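As one possible reading of the "certainty flag" idea, the sketch below
stores several candidate relationships for the same person and marks at
most one of them as confirmed. It is Python with SQLite for illustration
only. The table layout, the column name "certain", the child record, and
the function names are all hypothetical, and nothing here has been
decided for the real schema.

```python
import sqlite3


def build_db():
    """Create an in-memory database with three unconfirmed candidates."""
    db = sqlite3.connect(":memory:")
    db.executescript("""
        CREATE TABLE people (
            person_id INTEGER PRIMARY KEY,
            name      TEXT NOT NULL
        );
        CREATE TABLE relationships (
            person_a INTEGER NOT NULL REFERENCES people(person_id),
            person_b INTEGER NOT NULL REFERENCES people(person_id),
            kind     TEXT NOT NULL,
            certain  INTEGER NOT NULL DEFAULT 0  -- 1 confirmed, 0 candidate
        );
    """)
    db.executemany(
        "INSERT INTO people VALUES (?, ?)",
        [(1, "J. Smith"), (2, "J. A. Smith"), (3, "John Smith"),
         (4, "Mary Smith")],  # invented child record for the example
    )
    # Any of the three obituary records might be person 4's parent.
    db.executemany(
        "INSERT INTO relationships VALUES (?, 4, 'parent', 0)",
        [(1,), (2,), (3,)],
    )
    return db


def confirm_parent(db, parent_id, child_id):
    """Mark one candidate parent relationship as definitely confirmed."""
    db.execute(
        "UPDATE relationships SET certain = 1 "
        "WHERE person_a = ? AND person_b = ? AND kind = 'parent'",
        (parent_id, child_id),
    )


def parents_of(db, child_id):
    """Return confirmed parents if any exist, otherwise every candidate."""
    rows = db.execute(
        "SELECT p.name, r.certain FROM relationships r "
        "JOIN people p ON p.person_id = r.person_a "
        "WHERE r.person_b = ? AND r.kind = 'parent' "
        "ORDER BY p.person_id",
        (child_id,),
    ).fetchall()
    confirmed = [name for name, certain in rows if certain]
    return confirmed or [name for name, _ in rows]
```

Before any confirmation, parents_of(db, 4) returns all three candidates,
much as the current search shows every possible match. After
confirm_parent(db, 2, 4) it returns only "J. A. Smith". The cost is that
someone has to make that decision record by record, which is the
conversion labor discussed next.
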
My point is that none of this has been decided yet. There is a tradeoff
in that the more the new schema tolerates ambiguity, the easier it will
be to automate the migration from the old schema. But the more precision
the new schema mandates -- at the expense of conversion labor -- the
more accurate the ultimate results will be. I need to spend more time
thinking about this problem.

Contributing:

I would welcome contact from anyone who is interested in partnering up on
this project. I will state up front, however, that I wrote this to scratch
my own itch, so to speak, so any architectural changes will need to align
with the goals I have for the module. The plan is to make it generic enough
that it will also be useful to others, but I can't compromise its original
customer in order to do that. Hopefully, there is enough generality in the
problem definition that whatever solution works for Louisville Public
Library will also work for others.