Revision 1.2 (Fri Mar 25 21:53:13 2005 UTC, by syscrusher; branch MAIN, CVS tag HEAD): Minor documentation update.

genealogy.module          Scott Courtney (Drupal ID "syscrusher")
                          scott@4th.com    http://4th.com/

Revision history:
  2005-03-25  First public release to "sandbox"

License: GNU General Public License (http://www.gnu.org/)

RELEASE STATE:  This is an ALPHA RELEASE, not for production use
NOTE:           Tested with Drupal 4.5 only, may or may not work
                with Drupal 4.6.

Introduction:

This module started out as a quick-and-dirty PHP hack to allow my wife (a
librarian) and one of her colleagues to publish an online obituary database
for the Louisville (Ohio) Public Library. They had accumulated obituary
records from over a century of local newspaper microfilms into a simple
comma-separated ASCII file, and they wanted a way to publish this info in
a searchable online database.

The database has grown to over 25,000 records and is searched by hundreds
of genealogy enthusiasts every month, so the notion of this as a one-off
hack no longer really applies. Furthermore, I've had inquiries from other
library systems wanting to know if they could use the code. In the meantime,
the Louisville Library moved from a static HTML web site to Drupal, with
my wife as the webmaster. To make my originally-standalone PHP code work
smoothly with Drupal, I wrapped it inside a simple Drupal module. That is
the status of the code that you will find in this directory -- it is a
Drupal wrapper around some really crufty old code from the days of PHP
3.x. Yeah, it's *that* old.

The current version of the module is in production at the Louisville Public
Library web site, http://louisvillelibrary.org/ . Working with the staff
of the library, I have plans to enhance it substantially. The "Roadmap"
section below explains where this module is going.

Status:

If you've read the preceding text, it should come as no surprise that I
issue the following warning: THIS MODULE IS VERY MUCH IN A "SANDBOX"
STATE, AND IT *WILL* CHANGE SUBSTANTIALLY. Play with it if you want, but
don't use it in production right now. And please, don't judge my coding
quality by this stuff! I had to take really old code and glue it into
Drupal on a tight schedule, and it's not my best work, to put it mildly.

That being said, the code is hereby released under the GNU General Public
License, so legally you can do whatever that license allows you to do.

Roadmap:

In the near future, I plan to upgrade this module with substantial new
functionality. The library staff want to track not only obituaries, but
also marriages, births, divorces, and other major milestones. What this
means is that the database schema will be changing. I plan to normalize
the database, with a single table for "people" and another table that
establishes relationships between individuals, such as spousehood,
parentage, etc. There may be a third table that lists only the citations
of public records (newspapers, health department files, etc.), and that
table's citation IDs would link with a many-to-one relationship to the
marriage/ancestry table. None of this is cast in stone yet.
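
To make the three-table idea concrete, here is a rough sketch only. It
is written in Python with SQLite purely for illustration (the module
itself is PHP), and every table name, column name, and sample row is a
made-up assumption, not a planned schema. It shows one "person" table,
one "relationship" table pointing at two people, and a "citation" table
whose rows each point at one relationship (many citations to one
relationship).

```python
import sqlite3

# Hypothetical names throughout; the text above describes the tables
# only in prose, and nothing here is a decided design.
SCHEMA = """
CREATE TABLE person (
    person_id  INTEGER PRIMARY KEY,
    surname    TEXT NOT NULL,
    given_name TEXT
);
CREATE TABLE relationship (
    relationship_id INTEGER PRIMARY KEY,
    person_a INTEGER NOT NULL REFERENCES person(person_id),
    person_b INTEGER NOT NULL REFERENCES person(person_id),
    kind     TEXT NOT NULL              -- e.g. 'spouse', 'parent'
);
CREATE TABLE citation (
    citation_id     INTEGER PRIMARY KEY,
    relationship_id INTEGER NOT NULL
        REFERENCES relationship(relationship_id),
    source          TEXT NOT NULL       -- newspaper, health dept. file...
);
"""


def build_demo():
    """Create an in-memory database holding a few invented sample rows."""
    conn = sqlite3.connect(":memory:")
    conn.executescript(SCHEMA)
    conn.execute("INSERT INTO person VALUES (1, 'Smith', 'John')")
    conn.execute("INSERT INTO person VALUES (2, 'Smith', 'Mary')")
    conn.execute("INSERT INTO relationship VALUES (1, 1, 2, 'spouse')")
    conn.executemany(
        "INSERT INTO citation (relationship_id, source) VALUES (?, ?)",
        [(1, "newspaper marriage notice"), (1, "county marriage record")],
    )
    return conn


def citations_for(conn, relationship_id):
    """Return the sources cited for one relationship, oldest row first."""
    rows = conn.execute(
        "SELECT source FROM citation WHERE relationship_id = ? "
        "ORDER BY citation_id",
        (relationship_id,),
    )
    return [row[0] for row in rows]
```

Calling citations_for(build_demo(), 1) lists both invented sources for
the single spouse relationship, which is the many-to-one link described
above.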

The important thing to note is that I CANNOT GUARANTEE 100% MIGRATION OF
THE CURRENT TABLE to the new schema. With 25,000+ records in the library
database, I'm certainly going to do my best to automate as much as
possible, but there is a tacit agreement with the library staff that they
may have to do some manual touchup of records when we move to the new
schema.

Also, note that a lot of the information is fuzzy in nature, and the
new database schema will have to handle that. Genealogy is often an
inexact science. For example, there may have been several people named
"Smith" who died in the same year in the same small town. If the obits
in the newspaper describe "J. Smith", "J. A. Smith", and "John Smith"
in three separate obituaries, and you know that your great-grandfather
was named "John Alan Smith", there is no absolute way to know which of
these three obituaries applies to him.
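
The "John Alan Smith" example can be sketched in a few lines. This is
an illustration in Python, not code from the module; the function name
could_match and its matching rule (same surname, and each printed given
name or initial agrees with the known name in order) are assumptions
chosen just to show why all three obituaries remain candidates.

```python
def _tokens(name):
    """Lowercase a name, drop periods, and split it into words."""
    return name.replace(".", " ").lower().split()


def could_match(full_name, printed_name):
    """Return True if printed_name is consistent with full_name.

    Hypothetical rule: surnames must be equal, and every given name
    printed in the obituary must equal the known given name in the same
    position, or be an initial of it.
    """
    full = _tokens(full_name)
    printed = _tokens(printed_name)
    if not full or not printed or full[-1] != printed[-1]:
        return False
    full_given, printed_given = full[:-1], printed[:-1]
    if len(printed_given) > len(full_given):
        return False
    for known, seen in zip(full_given, printed_given):
        if len(seen) == 1:
            # A bare initial only has to agree with the first letter.
            if not known.startswith(seen):
                return False
        elif known != seen:
            return False
    return True
```

With this rule, "J. Smith", "J. A. Smith", and "John Smith" are all
consistent with "John Alan Smith", while "James Smith" is not, so the
name alone cannot pick one obituary.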

In the current version of the software, this issue is left up to the
human who is searching the database. Names are matched on a raw text
match, with optional SOUNDEX broadening of the search (so that "Jensen"
and "Jansen" are effectively treated as equal, for example). If there
are six different people who might be the person you are seeking, so
be it -- the current database shows you all six and allows you to decide
which one is the right one based on other information. Or you may just
have to live with ambiguity.
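
For readers unfamiliar with SOUNDEX, here is a minimal sketch of the
classic American Soundex code and of a search that optionally broadens
on it. It is Python for illustration only and is not the module's code;
the search function, its broaden parameter, and the choice of an exact
case-insensitive comparison for the "raw text match" are assumptions.
Database and PHP SOUNDEX functions may differ from this in details.

```python
# Letter-to-digit table for classic American Soundex.
_CODES = {}
for _digit, _letters in {
    "1": "bfpv", "2": "cgjkqsxz", "3": "dt",
    "4": "l", "5": "mn", "6": "r",
}.items():
    for _ch in _letters:
        _CODES[_ch] = _digit


def soundex(name):
    """Return the four-character Soundex code of name ('' if no letters)."""
    letters = [c for c in name.lower() if c.isalpha()]
    if not letters:
        return ""
    result = letters[0].upper()
    prev = _CODES.get(letters[0], "")
    for ch in letters[1:]:
        code = _CODES.get(ch, "")
        if code and code != prev:
            result += code
        # Vowels reset the previous code; 'h' and 'w' do not.
        if ch not in "hw":
            prev = code
    return (result + "000")[:4]


def search(surnames, query, broaden=False):
    """Return surnames equal to query, or sounding alike if broaden is set."""
    wanted = query.strip().lower()
    target = soundex(query)
    return [
        s for s in surnames
        if s.lower() == wanted or (broaden and soundex(s) == target)
    ]
```

Both "Jensen" and "Jansen" encode to J525, so a broadened search for
one returns the other as well, and it is still up to the human to pick.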

A fully third-normal-form schema will make this a trickier problem,
because relationship tables typically point one record ID number to
another record ID number, with zero room for ambiguity. There are a
couple of different ways to resolve this in the automated schema
conversion. In one approach, the relationship tables might just keep
the existing convention of mapping by name string, so it would be a
many-to-many mapping rather than multiple one-to-one mappings. Or,
a "certainty flag" (boolean) could indicate whether a relationship
is definitely confirmed or just one of multiple possible alternatives.
There are also other approaches.
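
The "certainty flag" option might look something like the following
sketch. Again this is illustrative Python, not a decided design: the
Relationship record, its field names, and the related() helper are all
hypothetical. The point is only that several unconfirmed rows can
coexist for the same person, and a caller can ask for confirmed links
only.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Relationship:
    """One ID-to-ID link between two people (hypothetical field names)."""
    person_id: int
    related_id: int
    kind: str          # e.g. 'spouse', 'parent'
    confirmed: bool    # the "certainty flag"


def related(relationships, person_id, kind, confirmed_only=False):
    """Return IDs linked to person_id by kind.

    With confirmed_only=False, unconfirmed alternatives are included,
    so an ambiguous link shows up as several candidate IDs.
    """
    return [
        r.related_id
        for r in relationships
        if r.person_id == person_id
        and r.kind == kind
        and (r.confirmed or not confirmed_only)
    ]
```

A person with two unconfirmed 'parent' rows would get both candidates
back by default and an empty list when confirmed_only is set.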

My point is that none of this has been decided yet. There is a tradeoff
in that the more the new schema tolerates ambiguity, the easier it will
be to automate the migration from the old schema. But the more precision
the new schema mandates -- at the expense of conversion labor -- the
more accurate the ultimate results will be. I need to spend more time
thinking about this problem.

Contributing:

I would welcome contact from anyone who is interested in partnering up on
this project. I will state up front, however, that I wrote this to scratch
my own itch, so to speak, so any architectural changes will need to be
something that aligns with the goals I have for the module. The plan is to
make it generic enough that it will also be useful to others, but I can't
compromise its original customer in order to do that. Hopefully, there is
enough generality in the problem definition that whatever solution works
for Louisville Public Library will also work for others.
