Posted by Peter J. Holzer on March 9, 2008, 9:28 am
> brian d foy wrote:
>>> PerlFAQ Server wrote:
>>>
>>>> 5.3: How do I count the number of lines in a file?
>>
>>> How does this code handle other Unicode newline encodings, such as 0x85
>>> or 0x0d 0x0a?
>>
>> If you have a different idea of the human concept of "line", you'll
>> have to adjust the code to have the right line ending (and probably not
>> use tr///).
For perl, a newline is "\n". Conversion to and from some file encoding
should be done with the appropriate IO layer.
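
As a minimal sketch of what I mean (the temporary file and its contents are made up for illustration), the :crlf layer hands the program plain "\n" line endings even though the file on disk uses CR LF, so the usual counting code needs no changes:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Write three CRLF-terminated lines as raw bytes (hypothetical sample data).
my ($out, $path) = tempfile(UNLINK => 1);
binmode $out;
print {$out} "one\r\ntwo\r\nthree\r\n";
close $out or die "Cannot close $path: $!";

# Read the file back through the :crlf layer, which turns each CR LF into "\n".
open my $in, '<:crlf', $path or die "Cannot open $path: $!";
my ($lines, $stray_cr) = (0, 0);
while (my $line = <$in>) {
    $lines++;
    $stray_cr++ if $line =~ /\r/;    # stays 0 because the layer removed the CRs
}
close $in;

print "lines=$lines stray_cr=$stray_cr\n";
```

For a file in some other encoding, an :encoding(...) layer would go into the same open mode string; which layers are right depends on the file, so treat the one above as an assumption.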
> I do not remember where I read it, but the peculiar thing about
> computer standards is that there are so many.
>
>
> I assume the following can be used:
>
> «A newline sequence is defined to be any of the following:
>
> \u000A | \u000B | \u000C | \u000D | \u0085 | \u2028 | \u2029 |
> \u000D\u000A »
If all of these were just "newline sequences", they should be turned
into "\n" by the crlf conversion of the encoding(utf-*) layers. But they
aren't. For example, \u000C signifies not just a new line, but a new page.
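
A small sketch to illustrate, assuming a made-up sample string and a Perl of 5.10 or later for \R: reading through an encoding layer still splits only on "\n", while a regex with \R counts every one of the sequences quoted above.

```perl
use strict;
use warnings;

# Hypothetical sample: NEL, LINE SEPARATOR and FORM FEED, then one real "\n".
my $text  = "a\x{85}b\x{2028}c\x{0C}d\n";
my $bytes = $text;
utf8::encode($bytes);    # turn the characters into UTF-8 bytes

# Read the bytes back from an in-memory file through a decoding layer.
open my $fh, '<:encoding(UTF-8)', \$bytes or die "Cannot open: $!";
my @lines = <$fh>;
close $fh;

# Count generic line breaks instead; \R matches the set quoted above.
my $generic = () = $text =~ /\R/g;

print "readline saw ", scalar(@lines), " line(s), \\R matched $generic time(s)\n";
```

Whether the second number is the "right" line count depends on what you want a line to be; a form feed counted as a line end is probably not what most people expect.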
> However, it is not always true, because with some encodings (cp850 for
> instance), 0x85 is a plain character.
cp850 doesn't have anything to do with Unicode, so that doesn't seem
relevant.
hp