First position two strings differ

Do you have a question? Post it now! No Registration Necessary.  Now with pictures!

•  Subject
• Author
• Posted on
What's the most efficient way to compare two strings and return the first
position they differ? The corresponding C code would look something like
this:

int strcpos(const char *s, const char *t) {
int i;
for (i = 0; s[i] == t[i]; ++i)
if (s[i] == '')
return -1; /* s==t */
return i;
}

You could do it like this:

sub strcpos {
my @chars = split //, shift;
my @chart = split //, shift;
my \$i = 0;
while ((my \$c = shift @chars) == shift @chart) {
return -1 unless defined \$c;
++\$i;
}
return \$i;
}

But that just feels un-Perlish and inefficient. Is there some other clever
way to do it?

Brian

Re: First position two strings differ

Oops. In my Perl idea, I am comparing strings with ==, when clearly I
should be using eq instead. Please overlook that little glitch. I must
have been copying too much from my C. :)

Brian

Re: First position two strings differ

Also sprach Brian Kell:

> What's the most efficient way to compare two strings and return the first
> position they differ? The corresponding C code would look something like
> this:
>
> int strcpos(const char *s, const char *t) {
>      int i;
>      for (i = 0; s[i] == t[i]; ++i)
>          if (s[i] == '')
>              return -1; /* s==t */
>      return i;
> }
>
> You could do it like this:
>
> sub strcpos {
>      my @chars = split //, shift;
>      my @chart = split //, shift;
>      my \$i = 0;
>      while ((my \$c = shift @chars) == shift @chart) {
>          return -1 unless defined \$c;
>          ++\$i;
>      }
>      return \$i;
> }
>
> But that just feels un-Perlish and inefficient. Is there some other clever
> way to do it?

You can use the xor-trick: Xor the two strings and look for the first
character which is different from '':

sub strcpos {
my (\$s, \$t) = @_;
return (\$s ^ \$t) =~ /(0+)[^0]/ ? length(\$1) : -1;
}

print strcpos("foobar", fooBar");
__END__
3

Tassilo
--
\$_=q#",}])!JAPH!qq(tsuJ[3..0}_\$;//::niam/s~=)]3[))_\$-3(rellac(=_\$({
pam)(rekcah)(lreP)!JAPH!qq(rehtona{tsuJbus#;
\$_=reverse,s+(?<=sub).+q#q!'"qq.\t\$&."'!#+sexisexiixesixeseg;y~\n~~dddd;eval

Re: First position two strings differ

comp.lang.perl.misc:
> Also sprach Brian Kell:
>
> > What's the most efficient way to compare two strings and return the first
> > position they differ? The corresponding C code would look something like
> > this:
> >
> > int strcpos(const char *s, const char *t) {
> >      int i;
> >      for (i = 0; s[i] == t[i]; ++i)
> >          if (s[i] == '')
> >              return -1; /* s==t */
> >      return i;
> > }
> >
> > You could do it like this:
> >
> > sub strcpos {
> >      my @chars = split //, shift;
> >      my @chart = split //, shift;
> >      my \$i = 0;
> >      while ((my \$c = shift @chars) == shift @chart) {
> >          return -1 unless defined \$c;
> >          ++\$i;
> >      }
> >      return \$i;
> > }
> >
> > But that just feels un-Perlish and inefficient. Is there some other clever
> > way to do it?
>
> You can use the xor-trick: Xor the two strings and look for the first
> character which is different from '':
>
>     sub strcpos {
>     my (\$s, \$t) = @_;
>     return (\$s ^ \$t) =~ /(0+)[^0]/ ? length(\$1) : -1;

The regex must be anchored to the beginning of the string.  OTOH,
there's no need to check for a non-null character.  Also, the return
value should be 0 if the strings differ from the beginning.  I'd say

(\$s ^ \$t) =~ /^0*/;
\$+[ 0];

>     }
>
>     print strcpos("foobar", fooBar");
>     __END__
>     3

Anno

Re: First position two strings differ

Also sprach Anno Siegel:

comp.lang.perl.misc:
>> Also sprach Brian Kell:
>>
>> > What's the most efficient way to compare two strings and return the first
>> > position they differ? The corresponding C code would look something like
>> > this:
>> >
>> > int strcpos(const char *s, const char *t) {
>> >      int i;
>> >      for (i = 0; s[i] == t[i]; ++i)
>> >          if (s[i] == '')
>> >              return -1; /* s==t */
>> >      return i;
>> > }

[...]

>> You can use the xor-trick: Xor the two strings and look for the first
>> character which is different from '':
>>
>>     sub strcpos {
>>     my (\$s, \$t) = @_;
>>     return (\$s ^ \$t) =~ /(0+)[^0]/ ? length(\$1) : -1;
>
> The regex must be anchored to the beginning of the string.  OTOH,
> there's no need to check for a non-null character.  Also, the return
> value should be 0 if the strings differ from the beginning.  I'd say
>
>         (\$s ^ \$t) =~ /^0*/;
>         \$+[ 0];

It should be anchored, yes. I did the test for non-null in order to
be able to return -1 when the match fails (that is, the two strings are
equal).  So your @+ approach wont return -1 on equal strings.

To handle strings that differ at the first character, I'd now rewrite
the whole thing into

return (\$s ^ \$t) =~ /^(0*)[^0]/ ? length(\$1) : -1;

Tassilo
--
\$_=q#",}])!JAPH!qq(tsuJ[3..0}_\$;//::niam/s~=)]3[))_\$-3(rellac(=_\$({
pam)(rekcah)(lreP)!JAPH!qq(rehtona{tsuJbus#;
\$_=reverse,s+(?<=sub).+q#q!'"qq.\t\$&."'!#+sexisexiixesixeseg;y~\n~~dddd;eval

Re: First position two strings differ

comp.lang.perl.misc:
> Also sprach Anno Siegel:
>
> comp.lang.perl.misc:
> >> Also sprach Brian Kell:
> >>
> >> > What's the most efficient way to compare two strings and return the first

> >> > position they differ? The corresponding C code would look something like
> >> > this:
> >> >
> >> > int strcpos(const char *s, const char *t) {
> >> >      int i;
> >> >      for (i = 0; s[i] == t[i]; ++i)
> >> >          if (s[i] == '')
> >> >              return -1; /* s==t */
> >> >      return i;
> >> > }
>
> [...]
>
> >> You can use the xor-trick: Xor the two strings and look for the first
> >> character which is different from '':
> >>
> >>     sub strcpos {
> >>     my (\$s, \$t) = @_;
> >>     return (\$s ^ \$t) =~ /(0+)[^0]/ ? length(\$1) : -1;
> >
> > The regex must be anchored to the beginning of the string.  OTOH,
> > there's no need to check for a non-null character.  Also, the return
> > value should be 0 if the strings differ from the beginning.  I'd say
> >
> >         (\$s ^ \$t) =~ /^0*/;
> >         \$+[ 0];
>
> It should be anchored, yes. I did the test for non-null in order to
> be able to return -1 when the match fails (that is, the two strings are
> equal).  So your @+ approach wont return -1 on equal strings.

I was happy with it returning an index outside of the strings, but -1
may be a better indicator.

> To handle strings that differ at the first character, I'd now rewrite
> the whole thing into
>
>     return (\$s ^ \$t) =~ /^(0*)[^0]/ ? length(\$1) : -1;

Yup.

Anno

Re: First position two strings differ

Tassilo v. Parseval wrote:
> You can use the xor-trick: Xor the two strings and look for the first
> character which is different from '':

Aha! I knew there was some good reason for the string ^ operator.
Excellent. Thank you.

Brian