Contents of /contributions/modules/datasync/README

Revision 1.1, Mon Jul 21 22:49:59 2008 UTC, by andrewlevine
; $Id$

description
-------------
The DataSync module was written to import data reliably on a large scale. It
allows you to schedule and run multiple types of import jobs on multiple
servers in a reliable and centralized way. It is NOT very scalable at the
moment: Drupal 5 does not work well with database transactions, so you should
run each consumer on only one machine at a time to prevent race conditions.
This should be fixed in the Drupal 6 version. The module is nevertheless very
functional and has already run thousands of jobs on our production servers.
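
To illustrate the race condition mentioned above, here is a minimal sketch. It
is not DataSync code: the job array, the status names and the function names
are all hypothetical, and an in-memory array stands in for the jobs table. Two
consumers that each check a job's status before either one writes will both
believe they own the job. A claim that checks and writes in a single step lets
only one of them win.

```php
<?php
// Hypothetical illustration only; none of these names come from DataSync.

// Unsafe: the status check and the status write are two separate steps.
function example_unsafe_check(array $jobs, $id) {
  return $jobs[$id]['status'] === 'waiting';
}

function example_unsafe_take(array &$jobs, $id, $consumer) {
  $jobs[$id]['status'] = 'running';
  $jobs[$id]['owners'][] = $consumer;
}

// Safer: check and write happen as one step. In a real database this is what
// a transaction or a conditional UPDATE would provide.
function example_atomic_claim(array &$jobs, $id, $consumer) {
  if ($jobs[$id]['status'] !== 'waiting') {
    return FALSE;
  }
  $jobs[$id]['status'] = 'running';
  $jobs[$id]['owners'][] = $consumer;
  return TRUE;
}

// Simulate two machines, A and B, interleaving their steps on the same job.
$jobs = array(1 => array('status' => 'waiting', 'owners' => array()));
$a_sees_waiting = example_unsafe_check($jobs, 1);
$b_sees_waiting = example_unsafe_check($jobs, 1); // B reads before A writes.
if ($a_sees_waiting) {
  example_unsafe_take($jobs, 1, 'A');
}
if ($b_sees_waiting) {
  example_unsafe_take($jobs, 1, 'B');
}
$unsafe_owners = $jobs[1]['owners']; // Both consumers took the job.

// The same two consumers using the single-step claim.
$jobs = array(1 => array('status' => 'waiting', 'owners' => array()));
$a_won = example_atomic_claim($jobs, 1, 'A');
$b_won = example_atomic_claim($jobs, 1, 'B');
$atomic_owners = $jobs[1]['owners']; // Only the first claimant owns the job.
```

Running each consumer on a single machine avoids the interleaving shown in the
first half of the sketch.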


overview of the modules
--------------------------
SUMMARY
- The datasync.module provides an API (both PHP functions and a web service)
  to schedule and keep track of data import jobs.
- The datasync_consumer.module and datasync_producer.module implement that API
  and will automatically start and run your jobs by calling PHP hooks that you
  define in a separate module (see datasync_api_example.module for an example).

DATASYNC.MODULE
The datasync.module file by itself provides an API for scheduling and
running data import jobs. It also provides a library of supporting functions
and database tables that may help with importing data. The API can be accessed
either by calling the PHP functions directly or through web service calls to
the paths defined in datasync_menu() (please note that as of July 21, 2008, the
web service API is fairly incomplete and untested). You should change the
tables created by datasync.module only through the API, unless you are certain
of what you are doing. It is important to realize that, by itself,
datasync.module will not initiate or run jobs, or do much of anything; it only
provides a way to schedule these jobs. To learn how to implement the API, read
the comments above each core API function in datasync.module under the heading
"DATASYNC CORE FUNCTIONS".

DATASYNC_CONSUMER.MODULE AND DATASYNC_PRODUCER.MODULE
Since it would be a pain to fully implement this API every time you wanted to
import a new type of data, the datasync_consumer and datasync_producer modules
implement it for you and provide their own API for defining your specific data
import jobs. These modules schedule and run jobs for you (according to how you
define them) and should take most of the drudgery out of getting a data
importer working. Please note that if you use the datasync_consumer and
datasync_producer modules, your jobs are run as PHP hooks. This means that if
you want a totally separate non-Drupal, non-PHP system to generate the data as
you import it, you probably should not use these modules and should instead
implement the DataSync API yourself. You should, however, be able to use the
datasync_consumer and datasync_producer modules in the majority of cases.
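
As a rough sketch of what "run as PHP hooks" means: in Drupal, a hook is a
plain function named <module>_<hook>, and the calling module looks it up by
name. The hook name "example_import", the module name "mymodule" and the
dispatcher below are hypothetical stand-ins so that the sketch runs without
Drupal. The real hook names are the ones documented in
datasync_api_example.module.

```php
<?php
// Hypothetical hook implementation in a module called "mymodule".
// The function name follows Drupal's <module>_<hook> naming convention.
function mymodule_example_import(array $job) {
  return 'imported ' . $job['type'] . ' job ' . $job['id'];
}

// Stand-in for Drupal's hook dispatch so the sketch runs on its own.
// It calls <module>_<hook>() for every listed module that defines it.
function example_invoke_all(array $modules, $hook, array $job) {
  $results = array();
  foreach ($modules as $module) {
    $function = $module . '_' . $hook;
    if (function_exists($function)) {
      $results[$module] = $function($job);
    }
  }
  return $results;
}

// "othermodule" does not implement the hook, so it is silently skipped.
$results = example_invoke_all(
  array('mymodule', 'othermodule'),
  'example_import',
  array('id' => 7, 'type' => 'albums')
);
```

Because the job logic is looked up as a PHP function inside the running Drupal
process, work done by a non-PHP system cannot be plugged in this way.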

The datasync_producer and datasync_consumer modules work together by updating
the datasync_jobs table. The datasync_producer.module creates new jobs at
specified intervals and advances each job to the next status when it has
finished its current task. The datasync_consumer.module takes jobs that are
waiting to be processed, calls the appropriate hooks to work on each job, and
sets the job status to completed when it is done. In other words,
datasync_producer.module continually creates jobs and makes sure they are ready
and waiting to be processed, while datasync_consumer.module actually processes
the jobs and marks them as finished. You define your specific jobs by
implementing a module that tells datasync_consumer.module and
datasync_producer.module what to do. The best way to do this is probably to
copy datasync_api_example.module (which is heavily commented) and adapt its
hooks and functions to your needs.
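
The hand-off between producer and consumer described above can be sketched as
follows. This is an illustration under assumptions, not the modules' actual
code: an array stands in for the datasync_jobs table, and the status values
('new', 'waiting', 'completed') and function names are hypothetical.

```php
<?php
// All names and status values in this sketch are hypothetical.

// Producer: create a job, then advance it once it is ready to be processed.
function example_produce(array &$jobs_table, $type) {
  $id = count($jobs_table) + 1;
  $jobs_table[$id] = array('type' => $type, 'status' => 'new');
  // Producer-side preparation for the job would happen here.
  $jobs_table[$id]['status'] = 'waiting';
  return $id;
}

// Consumer: take every waiting job, run the job callback on it, and mark it
// completed. Returns the ids of the jobs processed in this pass.
function example_consume(array &$jobs_table, $callback) {
  $processed = array();
  foreach ($jobs_table as $id => $job) {
    if ($job['status'] !== 'waiting') {
      continue;
    }
    call_user_func($callback, $id, $job);
    $jobs_table[$id]['status'] = 'completed';
    $processed[] = $id;
  }
  return $processed;
}

$jobs_table = array();
example_produce($jobs_table, 'albums');
example_produce($jobs_table, 'artists');

// The callback plays the role of the hooks defined in your own module.
$log = array();
$processed = example_consume($jobs_table, function ($id, $job) use (&$log) {
  $log[] = $job['type'];
});

// A second pass finds nothing to do: both jobs are already completed.
$second_pass = example_consume($jobs_table, function ($id, $job) {});
```

The point of routing everything through the status column is that the two
processes never call each other; each one only reads and updates job rows.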

The datasync_producer and datasync_consumer modules run persistently through
the PHP command line interface. They are spawned by hook_cron on the servers
that you specify, and they exit on their own every hour to keep memory leaks
from accumulating. This means you should run cron (and therefore the relevant
hook_cron implementations) at least hourly to make sure these processes keep
running.
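
A long-running worker that exits after a fixed lifetime might look roughly
like the sketch below. It is not the actual consume.php or produce.php; the
function name, its parameters and the injectable clock are assumptions made so
that the example can run instantly with a fake clock.

```php
<?php
// Hypothetical worker loop; the name and parameters are illustrative only.
// $max_lifetime: seconds the worker may live before it stops on its own.
// $do_work:      callback that performs one unit of work.
// $clock:        callback returning the current time in seconds.
// $idle_sleep:   seconds to pause between units of work (0 = no pause).
function example_run_worker($max_lifetime, $do_work, $clock = 'time', $idle_sleep = 0) {
  $started = call_user_func($clock);
  $iterations = 0;
  while (call_user_func($clock) - $started < $max_lifetime) {
    call_user_func($do_work);
    $iterations++;
    if ($idle_sleep > 0) {
      sleep($idle_sleep);
    }
  }
  // A real command line script would exit here and rely on the next cron run
  // to spawn a fresh process.
  return $iterations;
}

// Demonstration with a fake clock: each unit of work "takes" 20 minutes, so a
// one-hour lifetime allows exactly three units before the worker stops.
$now = 0;
$fake_clock = function () use (&$now) {
  return $now;
};
$fake_work = function () use (&$now) {
  $now += 1200;
};
$iterations = example_run_worker(3600, $fake_work, $fake_clock);
```

With this pattern the process is only gone until cron next runs, which is why
the cron interval should not be longer than the worker lifetime.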


install and configuration
--------------------------
See the INSTALL file.


todo
--------------
interface for ds_variable_get('datasync_fail_job_restart', 1);
reporting mechanism for datasync failures
statistics for job completion
interface for killswitch for consume.php and produce.php
interface to purge started jobs

Originally contributed by SonyBMG
