Flushing and multiple pipes

Dear group,

Happy to post my first message here! So, now to business.

I define two pipes which both print a parsed data stream to the _same_
PostScript file. The data values are selected through 'commands' by
certain criteria, in the following manner:

# shortened pseudocode

my $pipe1 = "commands >> $psfile";
my $pipe2 = "commands >> $psfile";

open(my $h1, "| $pipe1") or die "\n --- Error: Could not plot $pipe1: $!\n";
open(my $h2, "| $pipe2") or die "\n --- Error: Could not plot $pipe2: $!\n";

foreach my $ifile (@ifiles) {
    open(my $in, '<', $ifile) or die "Could not open $ifile: $!\n";
    while (defined(my $row = <$in>)) {
        # split returns a list, so take the wanted field from it
        my ($line) = split(/\s+/, $row);
        print $h1 "$line\n";
        print $h2 "$line\n";
    }
    close($in);
}

close($h1) or die "Pipe 1 failed: $!\n";
close($h2) or die "Pipe 2 failed: $!\n";
# end shortened pseudocode

I noticed that, depending on the type of data stream, sometimes the
PostScript file is written and closed correctly, and sometimes it is
not. I also noticed that the script behaves differently on Ubuntu than
on OS X Leopard. No fancy modules are loaded, just pure Perl. The
'commands' are a pipeline of awk and GMT routines.

To be on the safe side, I duplicated the foreach loop, and now I open
one pipe at a time, guessing that the problem is how the OS flushes
the pipes' buffers and how the PostScript file receives the values.

It ain't by any means the best solution, because I have to parse the
same files twice. Hence the question: is there a way to open
concurrent pipes in a robust way?


Re: Flushing and multiple pipes



I assume 'correctly' means you get two differently processed lines for
each input line in the output file, in the order they were written in
perl. That's never going to work reliably this way, because not only
does perl employ internal output buffering, but the commands running
as part of your pipeline do so as well: if they use stdio, their
output will be fully buffered whenever stdout is not connected to an
interactive device. Also, the processes in both of your pipelines
execute asynchronously with respect to the Perl control process and to
the processes in the other pipeline. You may get higher throughput
this way, but the downside is that output reordering may (and usually
will) occur.
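To make the reordering concrete, here is a minimal, self-contained sketch: `cat` stands in for the real awk/GMT pipelines and a temporary file stands in for the real PostScript file (both are assumptions for illustration). All lines arrive once the pipes are closed, but nothing constrains the relative order of the two pipelines' writes.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# temp file in place of the real PostScript file
my ($tmp, $psfile) = tempfile();
close $tmp;

# Two concurrent pipelines appending to the same file, as in the
# original script.  close() flushes and reaps each child, so every
# line does arrive -- but the *relative* order of pipe1 vs pipe2
# output is decided by the OS scheduler, not by the print order.
open(my $h1, '|-', "cat >> $psfile") or die "pipe1: $!";
open(my $h2, '|-', "cat >> $psfile") or die "pipe2: $!";
print $h1 "from-pipe-1\n";
print $h2 "from-pipe-2\n";
close($h1) or die "pipe1 failed: $!";
close($h2) or die "pipe2 failed: $!";

open(my $in, '<', $psfile) or die $!;
# sort, because the interleaving in the file is not deterministic
my @lines = sort <$in>;
close $in;
unlink $psfile;
```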

The simple but relatively inefficient solution is to create two new
pipelines for each input line and not start the second before the
first has terminated (or the third before the second has terminated,
and so on). Unless you're repeatedly dealing with large inputs, this
is probably good enough. If you want to process the input
asynchronously and concurrently, you need to employ a final 'put it
back together' filter which reads data from both pipelines as it
becomes available and puts the output back into the proper order.
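The 'one pipeline at a time' approach can be sketched like this (again with `cat` as a stand-in for the real commands, and made-up input lines): each close() waits for its child to exit, so the output order is fully determined.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

my ($tmp, $psfile) = tempfile();
close $tmp;

for my $line ('first', 'second') {
    # close() waits for the child to exit, so everything it wrote
    # is already in $psfile before the next pipeline starts
    open(my $h1, '|-', "cat >> $psfile") or die "pipe1: $!";
    print $h1 "$line processed by pipe1\n";
    close($h1) or die "pipe1 failed: $!";

    open(my $h2, '|-', "cat >> $psfile") or die "pipe2: $!";
    print $h2 "$line processed by pipe2\n";
    close($h2) or die "pipe2 failed: $!";
}

open(my $in, '<', $psfile) or die $!;
my @lines = <$in>;
close $in;
unlink $psfile;
```

The cost is one fork/exec of the whole pipeline per unit of work, which is why this only pays off when the inputs are small or the pipelines are cheap to start.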

Re: Flushing and multiple pipes

On 2012-07-20 15:08:14 +0200, Rainer Weikusat said:



Thanks for the explanation. So it seems I was on the right track somehow.


This is the way I have implemented it now. I could read the files all
at once, but that isn't an option either. And you are pointing out the
issue: the data records are large, 1-2 GB.


It sounds quite new to me. I found pack/unpack in perlfaq5. Is that right?
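For what it's worth, pack/unpack in perlfaq5 are about converting binary data, which is a different topic. The 'put it back together' filter could instead be sketched with the core IO::Select module; here `echo` commands stand in for the real pipelines (an assumption for illustration), and a robust version would use sysread rather than buffered readline, since mixing buffered reads with select can block.

```perl
use strict;
use warnings;
use IO::Select;

# two stand-in pipelines (the real ones would be the awk/GMT commands)
open(my $p1, '-|', 'echo one-1; echo one-2') or die "pipe1: $!";
open(my $p2, '-|', 'echo two-1; echo two-2') or die "pipe2: $!";

my $sel = IO::Select->new($p1, $p2);
my @merged;
while ($sel->count) {
    # block until at least one pipeline has data (or hits EOF)
    for my $fh ($sel->can_read) {
        if (defined(my $line = readline($fh))) {
            push @merged, $line;   # here one would tag/reorder the lines
        } else {
            $sel->remove($fh);     # EOF on this pipeline
            close($fh);
        }
    }
}
print @merged;
```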

