Renumbering Files - Page 2

Do you have a question? Post it now! No Registration Necessary.  Now with pictures!

Threaded View

Re: Renumbering Files

On Saturday, February 13, 2016 at 11:43:15 AM UTC-8, Robbie Hatley wrote:
Quoted text here. Click to load it

You must have missed my follow-up to correct the typo.  

Quoted text here. Click to load it

You're right... a cautionary note may be scary or even disallowed
in some situations. IMO it's also worth noting the "experimental"  
tag often outlives its reasonable shelf life. Autoderef has been  
around about 5 years; the embedded code construct in regular exp's
...far longer.    

Quoted text here. Click to load it

Er, such complaints have been made against the Perl language in  
its entirety. And they're routinely leveled at every new feature
in Perl.(Deservedly, maybe in the case of the smart match operator.)
However, a simple autoderef of a push() or pop() or the use of a
ternary doesn't warrant that imo.  

Re: Renumbering Files

Quoted text here. Click to load it

Quoted text here. Click to load it

I attempted a rewrite of your code with the following changes:
delegating the arguments checks to Getopts long module
and converting all our variables to my. And also trying to use
constants for scalability.


# This is an 78-character-wide ASCII-encoded Perl source-code text file.
# =======|=========|=========|=========|=========|=========|=========|========

# /rhe/scripts/util/merge-batches                                            #
# Merges two same-day batches of Olympus digital camera JPG photos,          #
# renumbering second batch as necessary to prevent number overlap.           #
# Edit history:                                                              #
#    Sun Feb 07, 2016 - Wrote first draft (stub).                            #
#    Tue Feb 09, 2016 - Updated (now fully functional).                      #
#    Thu Feb 11, 2016 - Cleaned up some implementation details.              #

# Usage: merge-bath --nodry_run -s dirA -d dirB
# Note: --dry_run is ON by default. So, to get the files renamed, we have to
# actively turn it OFF via the command line.

use 5.022;
use strict;
use warnings;

# ============================== USEFUL MODULES ==============================
use Cwd            qw( chdir getcwd );
use File::Basename qw( basename );
use Getopt::Long   qw(:config no_ignore_case no_auto_abbrev bundling);

# ============================= GLOBAL CONSTANTS =============================

use constant NL  => qq; # newline
use constant SP  =>  q{ };  # space
use constant TAB => qq; # tab
use constant BS  => qq; # tab

use constant EMPTY => q{};  # null or empty value

use constant MAX_LIMIT => 9999; # 4-digit in prefix can afford so much files

use constant PN    => basename $0; # program name
use constant PAGER => "more"; # or less

use constant AUTHOR => "Robbie Hatley";
use constant EMAIL  => '';

use constant USAGE => join(
   "Usage: " . PN,
      q{[-h|--help] |},
      qw{ [--dry_run] [--debug] },
      q{[-p|--prefix] ...string...},
      q{[-s|--src_dir] ...FILE...},
      q{[-d|--dest_dir] ...FILE...},

use constant WITH_CODE_OF => {
   nopix   => 888,
   help    => 777,
   usage   => 666,
   success => 0,

sub dup_stderr($$) {
   # stash stderr
   open($, '>&', *STDERR) or die "Cannot dup stderr: $!";

   # redirect stderr -> stdout
   open(STDERR, '>&', *STDOUT) or die "Cannot dup stderr: $!";

   open $, "|".PAGER or die "$!";

sub rst_stderr($$) {
   # restore stderr
   open(STDERR, '>&', $) or die "Cannot restore stderr: $!";

   close $ or die "$!";

use constant QUIT_ON => {
   nopix => sub {
      local $\;

      print STDERR <<"EMPTY_DIR";
Error: There were no pictures found in the source directory,
which match the prefix options that you specified.
Hence nothing much to do. Bailing out.

      exit WITH_CODE_OF->;

   usage => sub {
      local $\;

      print STDERR <<"ERR_MESG";
Error: This program takes exactly 3 arguments. The 1st argument
must be an Olympus digital camera image prefix such as "PB17",
and the 2nd and 3rd arguments must be directories containing
same-day Olympus digital camera jpg files.

Type "$ --help" for more info.

      exit WITH_CODE_OF->;

   help => sub {
      local $\;

      # stash stderr, redirect stderr, & open pipe
      dup_stderr( my $stderr_sav, my $pager_fh );

      print $pager_fh <<"HELP_MESG";
Welcome to "$". This program merges-together same-day
batches of Olympus digital camera JPG files, doing necessary
number corrections.

Command line:

Description of options:
Option:                      Meaning:
"-h" or "--help"             Print help and exit.
"--debug"                    Prints additional debug messages.
"--dry_run"                  Prints what files will be affected without
                             actually doing so.

If not requesting help, this program takes 3 mandatory arguments
and 2 optional arguments( --dry_run and --debug ):

Arg 1: Prefix. This must be a valid 4-character Olympus digital camera
prefix code, such as "P817" (Aug 17) or "PC25" (Dec 25).

Arg 2: First-batch directory.

Arg 3: Second-batch directory.
This program will then look for JPG picture files with the given prefix
in each of the 2 given directories. It will determine min and max file
numbers for both directories. It will rename the files in the second
directory so that their minimum number is 1 greater than the first
directory's maximum number. It will then merge the JPG files from
directory 2 into directory 1.


      # restore stderr
      rst_stderr $stderr_sav, $pager_fh;

      exit WITH_CODE_OF->;

use vars qw( @CWD );

# ================ SUBROUTINE PRE-DECLARATIONS ===============================

sub pix_inorder;
sub select_pix(@);
sub print_debug_block($@);
sub mk_ccomment_safe($);
sub print_stats(\%);
sub pushdir($);
sub popdir;

# =================== PUNCTUATION VARIABLES INITIALIZATION ===================

local $\ = NL; # auto-print newline
local $, = SP; # space out print arguments

# ============================= ARGUMENT CHECKS ==============================

local @CWD = ( getcwd() ); # stack of directories

my($debug, $dry_run) = (0, 1); # Defaults => debug mode:OFF, dry run mode: ON.

my @ARGV_sav = @ARGV; # GetOtions clobbers @ARGV, hence we save it

      'prefix|p=s'   => \my $prefix,
      'src_dir|s=s'  => \my $src_dir,
      'dest_dir|d=s' => \my $dest_dir,
      'dry_run!' => $dry_run,
      'debug'    => $debug,
      'help|h' => QUIT_ON->,

   if !defined $prefix   or !$prefix
   or !defined $src_dir  or !$src_dir
   or !defined $dest_dir or !$dest_dir

# ========================== MAIN ============================================
my $month_in_hex =  q{ [\dA-C]          };  # 12 months represented as hex
my $day_of_month = qr{ [0-2]\d | 3[0-1] }x; # day in a 2-digit form
my $photos_count = qr{ [0]* \d     }x; # number of photos in 4-digit
my $photos_type  = qr{ [.] (?:JPG|jpg)  }x; # file format for storing photos

# Validating the prefix input
$prefix =~ m{\A [P] (?ix: $month_in_hex ) $day_of_month \z}x
   or die "[E]Bad prefix '$prefix'";

# dynamically construct the regex for a valid picture name
my $picture_name = qr{
   (?ix: $prefix       )
   (?:   $photos_count )
   (?:   $photos_type  )

my %knt_stats;


print_stats %knt_stats;

# ========================== SUBROUTINE DEFINITIONS ==========================
sub merge_batches {
   print_debug_block("Src  dir = $src_dir", "Dest dir = $dest_dir");

   my $both_knt = my $dest_knt = my @dest_files = do{
      pushdir $dest_dir;
      opendir my $dest_dirh, "."
         or die(
               "[E]Cannot open directory handle",
               "on destination dir '$dest_dir': $!",
            [sort { pix_inorder } select_pix readdir $dest_dirh],

   &} unless $dest_knt;

   print_debug_block("List of dest files...", @dest_files);

   my ($max_dest_fname) = $dest_files[-1] =~ /^ $prefix [0]* (\d) /x;

   print_debug_block("Max index of dest files...", $max_dest_fname);

   $both_knt += my $src_knt = my @src_files = do{
      pushdir $src_dir;
      opendir my $src_dirh, "."
         or die(
               "[E]Cannot open directory handle",
               "on source dir '$src_dir': $!",
            [sort { pix_inorder } select_pix readdir $src_dirh],

   # populate the global variable holding count stats
   %knt_stats = (
       src_knt  => $src_knt,
       dest_knt => $dest_knt,
      both_knt => $both_knt,

   print_debug_block("List of source files...", @src_files);

   my ($min_src_fname) = $src_files[0] =~ /^ $prefix [0]* (\d) /x;

   my $filenames_overlap = 0;
   for my $src_file ( @src_files ) {
      $filenames_overlap++,last if -e "$dest_dir/$src_file";

   if ( $filenames_overlap ) {
      my $boost = $max_dest_fname - $min_src_fname + 1;
      my $avail_4digit_slots = ((MAX_LIMIT) - $boost);

         "[E]Not enough slots available in the 4-digit",
         "number to be able to rename source list",
         if $avail_4digit_slots < @src_files;

      for ( @src_files ) {
         my $src_fn_orig = $_;
         my $src_fn_conv;

            $src_fn_conv =
               join(EMPTY, (
                  sprintf("%04d", $max_dest_fname++),
         } until ! -e "$dest_dir/$src_fn_conv";

         if ( $dry_run ) {
            print "would rename $src_fn_orig ---> $src_fn_conv...";

         if ( ! $dry_run ) {
            rename "$src_dir/$src_fn_orig", "$dest_dir/$src_fn_conv"
               or die(
                     "[E]Could not mv the source file '$src_fn_orig'",
                     "into the dir '$dest_dir'",
   else {
      if ( $debug ) {
         print "Nothing to rename as the source & dest files do not overlap.";
      # do a simple mv to dest dir
      if ( ! $dry_run ) {
         rename "$src_dir/$_", "$dest_dir/$_"
            or die(
                  "[E]Could not mv the source file '$src_dir/$_'",
                  "into the destination dir '$dest_dir'",
               for @src_files;

sub print_stats(\%) {
   local $\;

   my($src_knt, $dest_knt, $both_knt) = @{
   qw/ src_knt  dest_knt    both_knt / };

   print <<"KNT_STATS";

 Batch 1 contained     $dest_knt files.
 Batch 2 contained     $src_knt  files.
 Merged batch contains $both_knt files.

sub select_pix(@) {
   grep { -f && /\A$picture_name\z/x }

sub pix_inorder {
   my($x, $y) =
      map { /\A[0]*(\d)\z/x }
      map { /($photos_count) $photos_type\z/x }
      $a, $b;

   $x <=> $y;

sub print_debug_block($@) {
   my @argv = map { mk_ccomment_safe $_ } @_;

   my ($maxl) = sort { $b <=> $a } map { length } @argv;

   my $stars = q x (++$maxl);

   print STDERR
         (map { "[I]$_" } @argv),
      ) if $debug;

sub mk_ccomment_safe($) {
   local $_ = $_[0];
   s{ (?<=/) [*] }egx; # change /* -> /[*]
   s{ [*]  (?=/) }egx; # change */ -> [*]/

sub pushdir($) {
   push @CWD, $_[0];
   chdir $CWD[-1] or die "[E]Cannot cd to '$CWD[-1]': $!";

sub popdir {
   die "[E]popdir: Directory stack empty." if @CWD < 2;
   pop @CWD;
   chdir $CWD[-1] or die "[E]Cannot cd to '$CWD[-1])': $!";


Re: Renumbering Files


Regarding your re-write of my script "merge-batches", on the plus side:
  - It's more scalable.
  - It has debugging provisions.
  - It tests for number over-run.
  - It has "dry run".

However, on the minus side:
  - It's about twice as long as the original.
  - To me, it's much less clear as to what its doing.
    For example, to me, the following looks like a submission to
    an "Obfuscated Perl" contest:

    my $both_knt = my $dest_knt = my @dest_files = do{
       pushdir $dest_dir;
       opendir my $dest_dirh, "."
          or die(
                "[E]Cannot open directory handle",
                "on destination dir '$dest_dir': $!",
             [sort { pix_inorder } select_pix readdir $dest_dirh],

  - Your re-write uses "rename" directly, rather than using my
    RH::Dir::rename_files. That's dangerous because it doesn't
    test for over-writing existing files as my rename_files does.
  - Number over-run just isn't going to occur. To have an over-run,
    the photographer / script user would have to shoot over 10,000
    photos in 1 day. Not going to happen.
  - Changes CWD. Not necessary. My version just leaves CWD where it
    is. I don't know or care what the CWD is. All file paths are
    used in "fully qualified" form. The two directories could be
    children of the CWD, or could be elsewhere, even on other
    volumes. Since every file is accessed only via its full path,
    doesn't matter.
  - Collects file names. Not necessary. I just collect the numbers,
    and for Directory 2 only, and for matching files only (because
    those are the only files which will be renamed & moved).
    For Dir 1, I just note the maximum serial number of the matching

I'm afraid that my original script is useful only to me and perhaps
a half-dozen other humans in this universe. The reason being,
I think very few people these days still use 10-year-old Olympus
digital cameras with tiny 40-photo memory cards and USB upload ports.

Now, if you or others want to "generalize" my script and make
something more widely-useful out of it, that's fine with me.

But your re-write, while full of interesting bits of Perl technique,
is much more convoluted than my style. I tend to write in a simpler,
less-nested style with clarity of "what is this script doing and
how is it doing it?" of foremost concern. I see Perl as being more of
a human language than a computer language.

I found it amusing, btw, that you retained (and expanded on) some of
my return codes:

use constant WITH_CODE_OF => {
    nopix   => 888,
    help    => 777,
    usage   => 666,
    success => 0,

I'm curious if you realize where some of those came from.

666 I've used as an "Error" return code for years, both
at home and at work. It's The Number Of The Beast and indicates
that something evil happened. :-)

777 means "user just wanted to see the help file, so this
instance just displayed help and exited; it didn't do anything."

42 I often use to mean "This [function | program] completed its
task successfully and completely and exited normally."
(BONUS QUESTION: Can you discern where that came from?)
(Hint: Deep Thought)

0 can mean either success or failure. In C & C++ it usually means
success, but in Perl usually means failure because it equates to
logical "false", thus facilitating idioms such as:

    ReadFile() and StoreData() or WarnUser();

Bonus Tip: When renaming files from Perl, don't use rename() directly.
Instead, wrap it in a sub that does a bit more error checking,
else you may end up over-writing things you didn't want to.
Something like this should do the trick:

# Rename a file, with more error-checking than rename() provides.
# Returns true on success, false on failure.
sub rename_file($$)
    my $OldFileName = shift;
    my $NewFileName = shift;

    # Disallow renaming to same name:
    if ($NewFileName eq $OldFileName)
          "Error in rename_file: new file name is same as old:\n".
          "old name: $OldFileName\n".
          "new name: $NewFileName\n"
       return 0; # returns 0, indicating FAILURE

    # Make sure old file name exists:
    if (not -e $OldFileName)
          "Error in rename_file: file \"$OldFileName\" does not exist.\n"
       return 0; # returns 0, indicating FAILURE

    # Disallow renaming to a name that already exists, even if we're in
    # Windows and user is just attempting a case change. User can use
    # Bash's mv or Dos's ren for case changes; it's too dangerous to
    # allow it to be done programmatically.
    if (-e $NewFileName)
          "Error: file \"$NewFileName\" already exists.\n"
       return 0; # returns 0, indicating FAILURE

    # Attempt to rename the file; return true on success, false on failure:
    return rename($OldFileName, $NewFileName);

Robbie Hatley
Midway City, CA, USA
perl -le 'print "4o6e7o4f0w5llc7m"'

Re: Renumbering Files

On Saturday, 13 February 2016 15:10:59 UTC+5:30, Robbie Hatley  wrote:

Quoted text here. Click to load it

You are right. What happened was that it started out with the <...> operator
which outputs file names with full paths, Hence, that's why the pushdir came  
in. While doing a rewrite to the readdir, I didn't realize that it outputs
just basenames. Thus there's no need for all the push/pop-s then.

Quoted text here. Click to load it
Hmm, when we do the boost operation by finding the min/max of the dir1
dir2 directories, that possibility is removed that they can be overwritten.
That's why the CORE::rename function was used in this case. Otherwise, ofcourse
we need proper checks for them as you point out.

Quoted text here. Click to load it
This is just-in-case...a safety feature.

Quoted text here. Click to load it
You are correct, the "readdir" operation will render it moot.

Quoted text here. Click to load it
Excellent point.

No idea. May be an interrupt was caught just before the relinquished
control to caller and soon as all it's work got over??

Quoted text here. Click to load it
The exit status is for the bash/shell running the Perl code, and there
0 is success and anything else a failure. So these exit codes are for
bash & not internal perl functions. That's they are invoked with Perl's
"exit" function.

Quoted text here. Click to load it

As stated above, the boost functionality renders this moot.

Thank you for your comments and observations.

Re: Renumbering Files

On Monday, February 8, 2016 at 3:44:32 PM UTC-8, Robbie Hatley wrote:
Quoted text here. Click to load it

I'd be tempted to just cook up something like this
and then copy them to the new directory.

perl -e 'for(glob("P207001*")) {($f=$_)=~s/(\d+)/$1+6/e; $rename $_,$f}'

Re: Renumbering Files

On 9/2/2016 01:44, Robbie Hatley wrote:
Quoted text here. Click to load it

my $dir_source = 'batch1';
my $dir_target = 'batch2';
opendir DIRSOURCE, $dir_source or die "oups $!\n";
while (readdir DIRSOURCE) {
next unless -f "$dir_source/$_";
print "source : $dir_source/$_ , target : ",Unique("$dir_target/$_"),"\n";
closedir DIRSOURCE;

sub Unique {
my ($dir,$file )= $_[0] =~/^(.*?)([^\/]*)$/;
if ( $dir=~/^\s*$/ ) { $dir = '.' } else { $dir    =~s/\/*$// }
$file = 'node' if $file=~/^\s*$/;
return "$dir/$file" if ! -e "$dir/$file";
my $i=1; while ( -e "$dir/$i.$file" )

Site Timeline