Processing batch data from another server

I have a new project coming up that involves something I have not done
before, so I thought I'd ask a rather broad question in order to begin
determining how to best go about it.

I apologize if my questions seem rather elementary (noobish?).

I have to process a batch of data (from enrollment/application forms)
that I get from another server once a day.  It will be a file containing
typical applicant data (name, address, phone) but also Social Security
numbers, so the transfer must be secure.  There could be anywhere from
1 to 100 applications per day in the file.

It seems rather straightforward, but I thought I'd ask as a sort of
sanity check.  I will be using PHP.

One method is to pull the data myself with a cron job (using fopen to
access a file on the other server over a secure link?).

The other method would be to have the data pushed from the other
server.  I am in the dark here and I'm not sure what methods are
available for doing this.  This is where I need the most advice.

The data can be formatted as XML or CSV (or even a design of my own
choosing).  I have not used XML before, but it looks rather easy, and
PHP has built-in XML parsing functions (besides, it would probably be a
good idea to get my feet wet).  Should I use that or just stick with the
familiar CSV format?

I would appreciate any thoughts on a simple, secure way to dependably
obtain this data once a day.

My scripts will enter this data into a local database, and then produce
a PDF for each application.  This part already works (I currently
process single forms filled out by applicants on my site).

Should I pull the data myself (cron job, once a day) or have the other
server push the data to me?  If using the "push" method, how do I do
that?  How does the other server transmit the file (data) to my script?
Again, I apologize if this seems rather elementary, but I am new to this.

Thanks in Advance.

 Chuck Anderson • Boulder, CO

Re: Processing batch data from another server

Chuck Anderson wrote:

o scp/SFTP
o SSH
o HTTPS

It doesn't matter, but if you have any control over the data format then
I'd insist on header and trailer records: the former containing a serial
number and the latter containing a record count (and optionally a checksum).
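
As a rough illustration of the header/trailer idea, here is a minimal sketch. The record layout is purely an assumption for the example (a first line `H,<serial>` and a last line `T,<record count>`); the thread does not define one. It is written in Python for brevity; the same steps could be done in PHP with `fgetcsv`.

```python
import csv
import io

def check_batch(text):
    """Return (serial, records), or raise ValueError if the batch looks incomplete.

    Assumed layout (hypothetical): first row "H,<serial>", last row "T,<count>",
    with the applicant records in between.
    """
    rows = [row for row in csv.reader(io.StringIO(text)) if row]
    if len(rows) < 2 or rows[0][0] != "H" or rows[-1][0] != "T":
        raise ValueError("missing header or trailer record")
    serial = int(rows[0][1])
    expected = int(rows[-1][1])
    records = rows[1:-1]
    # A truncated or padded transfer shows up as a count mismatch.
    if len(records) != expected:
        raise ValueError(f"trailer says {expected} records, found {len(records)}")
    return serial, records
```

A file that was cut off in transit loses its trailer (or some records), so the check fails before anything reaches the database.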

You don't mention the platforms of the servers involved - that would be  
useful information.  Also it may help to know how much control the data  
provider and data consumer have over their respective servers.

Do you need to know how the remote server is managed?  Surely you simply
care that a compatible protocol is chosen and used.

William Tasso

Re: Processing batch data from another server

William Tasso wrote:

HTTPS is the only one I have experience with (although I'm sure I could
figure out how to use secure FTP).

With HTTPS, would they simply POST the file contents when invoking my
script, or would they supply me with a filename that I would then
fopen?  This is where it gets a bit dodgy for me.  I've done a lot of
web design, and I've used PHP and basic MySQL for three years, but I'm
not clear on how the network interactions (passing a file) are handled
from server to server.

Interesting.  They told me I could define the format, so I'll ask them
if they can supply a header and trailer record.  I like the idea of
being concise and checking for errors.

I (the data consumer) am on a remote shared host (Linux/Apache) with
little to no control of the server.
I don't know what the other server (data provider) is, but I believe
that it is their own in-house server.

No.  I don't think so.

Yes.  And that part I think I can work out myself (thanks for the idea
of adding error checking).  I wasn't sure if XML would be preferable for
any reason (it seems like everyone talks about using it these days).  
That's why I asked about it.

What I don't understand is how they "push" the file (data) to me.  This
seems like the best method conceptually, as if I run a cron job to look
for the file, I have to make provisions in case their server was down or
something and it's not there.  I'd rather have them invoke my scripts
when the data is ready.

 Chuck Anderson • Boulder, CO

Re: Processing batch data from another server

Chuck Anderson wrote:

OK, stop trying to be too clever.  Data management is simple when you
break it down into simple processes.

Forget pull.  As the remote server is in-house they probably won't let you  
access it in any meaningful way.  Let them push over an agreed protocol -  
how and when they do that is no longer any concern of yours.

How you receive the file(s) needs careful attention.  Frankly, I wouldn't
let anyone simply load data files over ssh/scp/ftp: you need some control
and monitoring of what is happening and when.  A simple file upload page
running on an SSL-enabled site would do the trick.  I have also used email
to load data, but that method isn't secure without additional skulduggery,
and we really do want to keep this simple.  Another advantage of using
an HTTPS upload is that the remote end can use your page manually or
automate it, and you really don't care which.

Your upload script needs to log (and optionally report [via mail?]) all  
activity at the file level, allocating *your* serial number and safely  
storing the file in an area ready for processing  
(/data-in/FileName-LoadDateTime-SerialNumber.csv).  Be prepared to accept  
several files concurrently - allocate the serial number right at the start  
of the process.
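
To make the serial-number step concrete, here is a small sketch of allocating a serial before anything else happens and building the storage name from it. The counter file, the timestamp format, and the zero-padding are all assumptions for illustration; only the `FileName-LoadDateTime-SerialNumber.csv` pattern comes from the post. It uses a Unix file lock so that concurrent uploads cannot receive the same number (PHP offers `flock` for the same purpose).

```python
import fcntl
import os
from datetime import datetime

def allocate_serial(counter_path):
    """Increment and return the serial stored in counter_path (hypothetical file)."""
    fd = os.open(counter_path, os.O_RDWR | os.O_CREAT, 0o600)
    with os.fdopen(fd, "r+") as f:
        # Exclusive lock: a second upload waits here until the first has written.
        fcntl.flock(f, fcntl.LOCK_EX)
        text = f.read().strip()
        serial = int(text) + 1 if text else 1
        f.seek(0)
        f.truncate()
        f.write(str(serial))
        f.flush()
    # Closing the file releases the lock.
    return serial

def storage_name(original_name, serial, loaded_at):
    """Build FileName-LoadDateTime-SerialNumber.csv from the uploaded file's name."""
    base = os.path.splitext(os.path.basename(original_name))[0]
    stamp = loaded_at.strftime("%Y%m%dT%H%M%S")
    return f"{base}-{stamp}-{serial:06d}.csv"
```

For example, `storage_name("apps.csv", 7, datetime(2024, 1, 2, 3, 4, 5))` gives `apps-20240102T030405-000007.csv`. Taking only the base name of the uploaded file also keeps a sender-supplied path out of the storage location.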

The file processing script (fired by cron maybe) is now fairly trivial -  
it simply needs to process the files checking your allocated serial number  
with the number in the header record - any inconsistencies cause  
abort-run-notify-panic.  Assuming all is ok with the serial number check  
you can go ahead and process the data in the file.  Rinse and repeat till  
there are no more files to process.
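
A sketch of that processing loop follows. It assumes two things the thread does not spell out: that the allocated serial is the last part of the stored file name (as in `FileName-LoadDateTime-SerialNumber.csv`), and that the sender's header record is a first line of the form `H,<serial>` whose number is expected to equal yours. The `handle` callback stands in for whatever loads the database and produces the PDFs.

```python
import os
import re

NAME_RE = re.compile(r"-(\d+)\.csv$")  # trailing serial in the stored file name

def serial_from_name(filename):
    match = NAME_RE.search(filename)
    if not match:
        raise ValueError(f"no serial number in file name: {filename}")
    return int(match.group(1))

def header_serial(path):
    """Read the serial from the assumed header record 'H,<serial>'."""
    with open(path, newline="") as f:
        first = f.readline().strip().split(",")
    if len(first) < 2 or first[0] != "H":
        raise ValueError(f"no header record in {path}")
    return int(first[1])

def process_inbox(inbox, handle):
    """Process files in serial order; abort the run on the first mismatch."""
    done = []
    for name in sorted(os.listdir(inbox), key=serial_from_name):
        path = os.path.join(inbox, name)
        if serial_from_name(name) != header_serial(path):
            # Abort rather than skip: a gap or duplicate needs a human to look at it.
            raise RuntimeError(f"serial mismatch in {name}; aborting run")
        handle(path)
        done.append(name)
    return done
```

A real script would also move or mark each file once handled so that the next cron run does not process it again.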

As for CSV vs XML (or any other format; fixed-width text files are often
used), it really doesn't matter, but I will tell you this: plain text is
a whole heap easier to debug by hand/eye than XML.

As for the error checking - it comes naturally after you've been writing  
data management systems for several years :)

The checksum need not be a complicated algorithm.  If there is a numeric
field, simply add the values in that field.
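
A minimal sketch of that kind of checksum, assuming each record is a list of strings and that the column index of the numeric field is known (the "fee" column in the usage note is made up for the example):

```python
from decimal import Decimal

def field_sum(records, index):
    """Add up one numeric column across all records."""
    return sum((Decimal(record[index]) for record in records), Decimal(0))

def checksum_ok(records, index, trailer_value):
    """Compare the computed sum with the value carried in the trailer record."""
    return field_sum(records, index) == Decimal(trailer_value)
```

For instance, with records `[["Ann Lee", "25.00"], ["Bob Roy", "40.50"]]` and the hypothetical fee in column 1, `checksum_ok(records, 1, "65.50")` is true, while an altered or dropped record would change the sum. `Decimal` is used instead of floats so that values such as 0.10 add up exactly.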

William Tasso

Re: Processing batch data from another server

Chuck Anderson wrote:

Sounds like fun.

One of my current projects involves keeping data synchronised between
several different servers. Sure, many database packages include
replication, but the complication here is that the databases being synced
are different versions of the same database, running on different
platforms, and in the future may include a different RDBMS too.

When a change is triggered on one server, it encodes the data change as
XML, pushes it to the "master server" (each table in the database can have
a different master server!) which makes the change on its copy of the data
and then pushes the data out to the other servers.

In the above case, a push is required. A pull would be useless because
server A doesn't know when to request an update from server B, because it
doesn't know when data on server B changes.

Really you need to decide which method (push/pull) is most useful for your
situation. If you need to have the data at a particular time, pull it
down. If you don't know exactly when the data will be ready, but you'd
like to have it as soon as it is, then have them push it across.

Toby A Inkster BSc (Hons) ARCS