Re: sexp xml syntax transformation




thanks Alex Shinn & Alexander Burger.

recently i got tired of the w3c's, and now the html5 wtf-group's,
meandering attitude about html/xhtml/html5 in the past decade.

(see: 〈(Google Earth) KML Validation Fuckup〉 at xahlee.org,
kml_validation.html )

Who Listens to Correctness When Authorities Meander?


When XML and XHTML came along in about 2000 with massive fanfare, we
were told that XHTML would change society, or, at least, make the web
correct and valid, and far easier and more flexible to develop for. Now
it's a decade later. Sure the web has improved, but as far as
html/xhtml and browser rendering goes, it's still a fuck soup with extreme
complexities. 99.99% of web pages are still not valid. Major browsers
still don't agree on their rendering behavior. Web dev is actually far
more complex than before, involving tens or hundreds of technologies
that hardly any one person knows in full. It's hard to say if it is
better at all than the HTML3 days with “font” and “table” tags and a
gazillion tricks. The best practical approach is still trial and error.

And now HTML5 comes along, from a newfangled hip group, with an
attitude that validation is overrated, a flying fuck to the face
of the XML mantra from the standards bodies, just when there start to
be more and more sites with correct XHTML.

XML is a break from SGML, with many justifications for why it needs to
be, and now HTML5 is a break from both SGML and XML. WTFML anyone?

so i've been thinking of starting a radical proposal.

• 〈HTML6, Your HTML/XML Simplified〉

here's the draft
HTML6, Your HTML/XML Simplified

Xah Lee, 2010-09-21

Tired of the standard bodies telling us what to do and change their
altitude? Tired of the SGML/HTML/XML/XHTML/HTML5 changes? Tire no
more, here's a new proposal that will make life easier.

Introducing HTML6

HTML6 is based on HTML5, XML, and a rectified LISP syntax. More
specifically, it is derived from existing work on this, SXML, except
that it has complete regularity at the syntax level, and is not meant
to be compatible with lisp readers. The syntax can be specified by 3
short lines of parsing expression grammar.

The aim is a far simpler, stricter, and leaner syntax, with 100%
regularity.

First of all, no error is accepted, ever. If the source code of a page
has incorrect syntax, that page is not displayed.


Here's a standard ATOM webfeed XML file.

<?xml version="1.0" encoding="utf-8"?>
<feed xmlns=" " xml:base=" /

 <title>Xah's Emacs Blog</title>
 <subtitle>Emacs, Emacs, Emacs</subtitle>
 <link rel="self" href=" "/>
 <link rel="alternate" href=" "/>
   <name>Xah Lee</name>
   <uri> /</uri>
 <id> </id>
 <rights>© 2009, 2010 Xah Lee</rights>

   <title>Using Emacs's Abbrev Mode for Abbreviation</title>
  <link rel="alternate" href=" /

Here's how it looks in html6:

〔?xml 「version “1.0” encoding “utf-
〔feed 「xmlns “
xml:base “ /

 〔title Xah's Emacs Blog〕
 〔subtitle Emacs, Emacs, Emacs〕
 〔link 「rel “self” href “http://xa
 〔link 「rel “alternate” href “http
 〔updated 2010-09-19T14:53:08-07:00〕
  〔name Xah Lee〕

 〔rights © 2009, 2010 Xah Lee〕

  〔title Using Emacs's Abbrev Mode for Abbreviation〕
  〔updated 2010-09-19T14:53:08-07:00〕
  〔summary tutorial〕
  〔link 「rel “alternate” href “htt

Simple Matching Pairs For Tag Delimiters

The standard xml markup bracket is simplified using simple lisp style
matching pairs. For example, this code:

<h1>HTML6</h1>

Is written as:

〔h1 HTML6〕

The delimiter used is:

Character    Unicode Code Point    Unicode Name
〔           U+3014                LEFT TORTOISE SHELL BRACKET
〕           U+3015                RIGHT TORTOISE SHELL BRACKET

XML Properties and Attributes Syntax

In xml:

<h1 id="xyz" class="abc">HTML6</h1>

In html6:

〔h1「id “xyz” class “abc”」 HTML6〕

The attributes are specified by matching corner brackets. Items inside
are a sequence of pairs. The value must be quoted by curly double
quotes.

Escape Mechanisms

To include the 〔tortoise shell〕 delimiters in data, use “&#x3014;” and
“&#x3015;”, similarly for the 「corner brackets」.
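
The escape rule above can be sketched in a few lines of Python. This is only an illustration, not part of the draft: the function names are made up, and escaping a literal “&” as “&#x26;” is my own assumption (the draft doesn't say how a bare ampersand is written), added so that the round trip is lossless.

```python
import re

# The four delimiters of the proposal, plus "&" (an assumption, see above).
_ESCAPED_CHARS = "\u3014\u3015\u300c\u300d&"  # 〔 〕 「 」 &


def escape_text(text: str) -> str:
    """Replace each delimiter with its hexadecimal character reference."""
    return "".join(
        f"&#x{ord(ch):X};" if ch in _ESCAPED_CHARS else ch
        for ch in text
    )


def unescape_text(text: str) -> str:
    """Turn every &#x...; reference back into the character it names."""
    return re.sub(
        r"&#x([0-9A-Fa-f]+);",
        lambda m: chr(int(m.group(1), 16)),
        text,
    )
```

Anything that writes html6 would run content text through escape_text before putting it between delimiters, and a reader would run unescape_text on each text chunk after parsing.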

Unicode; No More CDATA and Entities “&amp;”

There are no entities, except the unicode character reference in
hexadecimal format “&#x‹unicode code point in hexadecimal›”.

For example, “&amp;” is not allowed.

Treatment of Whitespace

Basically identical to XML.

Char Encoding; UTF8 and UTF16 Only

Source code must be UTF8 or UTF16, only. Nothing else.

File Name Extension

File name extension is “.xml6”.


got so tired of the w3c and wtf-group “standard bodies” and their
continuously changing attitude about what html/xhtml should be, so i
cooked up this.

it'd be nice if we just adopted sxml, but the various lisp ones i found
have problems in that they ignore html/xml as a syntax by itself;
instead, the lispers just wanted something compatible with lisp readers,
not really a standalone syntax.

the lisp's sexp syntax has a bunch of problems, foremost is that it's
not regular. (regularity is something the xml/xhtml movement brought to
html to some degree, but that is now thwarted by the new html5 wtf-group
with google and apple backing, nay, mostly just google + apple.)

• 〈Fundamental Problems of Lisp〉

also, xml as a textual representation of a tree has a quirk, in that
each node has this special thing called “properties” or “attributes”
that are not a node/branch, but rather, are info attached to a node.
The standard sexp representation for this is inconsistent, e.g.

(tagX :propA aValue :propB bValue ...)

without changing the syntax, the above is like this:

(a b c d e ...)

which means that b c d e are actually nodes.

another way to represent xml's attribute, i think is from sxml as
shown in Alex and Alexander's messages:

(a ((x . "1") (y . "2")) (b NIL (c ((y . "2")) "Mumble")) (d))

(@ (name "value") ...)

both have the same issue. That is, there's no syntax level distinction
of what's a node and what's a node's property.

e.g. in this

(a ((x . "1") (y . "2")) (b NIL (c ((y . "2")) "Mumble")) (d))

the ((x . "1") (y . "2")) can be interpreted as a node by itself,
where the first element is again a node. But also here, it uses lisp's
special cons syntax (x . "1") which is itself ambiguous at the syntax
level. e.g. it can be considered as a node named x with 2 branches “.”
and “"1"”, or it can be considered as a node named “cons” with 2
branches “x” and “"1"”.

in this:

(@ (name "value") ...)

again, this whole thing at the syntax level is simply a node named
“@”. Only at the semantic level is it taken as the properties of a
node, because of the special head “@”.

So, in conceiving html6, i thought a solution for getting rid of syntax
ambiguity for node vs attributes is to use a special bracket for
properties/attributes of a node. e.g.

In xml:

<h1 id="xyz" class="abc">HTML6</h1>

In html6:

〔h1「id “xyz” class “abc”」 HTML6〕

one thing about this html6 is that it is intentionally kept separate
from being a sexp in the lisp world. The key is that the syntax is
designed specifically as a 2d textual representation of a tree, with an
attribute bracket that attaches a limited form of info (a sequence of
pairs) to any node, to fit existing xml.

the advantage of this is that it should be extremely easy to parse, in
perhaps just 3 lines of parsing expression grammar. And can be easily
done in perl, python, ruby... without entailing lisp quirks.
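
To make the "easy to parse" claim concrete, here is a rough recursive-descent sketch in Python. It is only my reading of the draft, not a reference implementation: the grammar in the comment, the Node class, the Html6SyntaxError name, and the choice to trim whitespace around text chunks are all assumptions. It parses a single root node and, per the "no error is accepted" rule, raises on any malformed input instead of guessing.

```python
import re
from dataclasses import dataclass, field

# Assumed grammar (PEG style), one reading of the draft:
#   node  <- "〔" name attrs? (node / text)* "〕"
#   attrs <- "「" (name "“" value "”")* "」"
#   text  <- [^〔〕]+
OPEN, CLOSE = "\u3014", "\u3015"              # 〔 〕
ATTR_OPEN, ATTR_CLOSE = "\u300c", "\u300d"    # 「 」
QUOTE_OPEN, QUOTE_CLOSE = "\u201c", "\u201d"  # “ ”

_NAME = re.compile(r"[^\s\u3014\u3015\u300c\u300d\u201c\u201d]+")
_WS = re.compile(r"\s*")


class Html6SyntaxError(ValueError):
    """Raised on any malformed input; nothing is repaired silently."""


@dataclass
class Node:
    name: str
    attrs: dict = field(default_factory=dict)
    children: list = field(default_factory=list)  # Node or str items


def _skip_ws(src, pos):
    return _WS.match(src, pos).end()


def _attrs(src, pos, attrs):
    """Read name “value” pairs up to 」; return the offset after it."""
    while True:
        pos = _skip_ws(src, pos)
        if src.startswith(ATTR_CLOSE, pos):
            return pos + 1
        m = _NAME.match(src, pos)
        if m is None:
            raise Html6SyntaxError(f"expected attribute name at offset {pos}")
        pos = _skip_ws(src, m.end())
        if not src.startswith(QUOTE_OPEN, pos):
            raise Html6SyntaxError(f"expected {QUOTE_OPEN} at offset {pos}")
        end = src.find(QUOTE_CLOSE, pos + 1)
        if end == -1:
            raise Html6SyntaxError(f"unclosed {QUOTE_OPEN} at offset {pos}")
        attrs[m.group()] = src[pos + 1:end]
        pos = end + 1


def _node(src, pos):
    """Parse one 〔...〕 node starting at pos; return (Node, next offset)."""
    if not src.startswith(OPEN, pos):
        raise Html6SyntaxError(f"expected {OPEN} at offset {pos}")
    m = _NAME.match(src, pos + 1)
    if m is None:
        raise Html6SyntaxError(f"expected tag name at offset {pos + 1}")
    node = Node(m.group())
    pos = _skip_ws(src, m.end())
    if src.startswith(ATTR_OPEN, pos):
        pos = _attrs(src, pos + 1, node.attrs)
    while True:
        end = pos
        while end < len(src) and src[end] not in (OPEN, CLOSE):
            end += 1
        text = src[pos:end].strip()  # simplification: trim whitespace
        if text:
            node.children.append(text)
        if end == len(src):
            raise Html6SyntaxError(f"unclosed {OPEN}{node.name}")
        if src[end] == CLOSE:
            return node, end + 1
        child, pos = _node(src, end)
        node.children.append(child)


def parse(src: str) -> Node:
    """Parse a source string holding exactly one root node."""
    node, pos = _node(src, _skip_ws(src, 0))
    if _skip_ws(src, pos) != len(src):
        raise Html6SyntaxError(f"trailing input at offset {pos}")
    return node
```

For example, parse('〔h1「id “xyz” class “abc”」 HTML6〕') gives a Node named h1 whose attrs hold id and class and whose only child is the text HTML6. Note that the attribute bracket is read by its own function, so attributes never show up in the children list; that is the syntax-level node vs attribute split argued for above.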

any thoughts about flaws?

it's just a personal fantasy. ☺

 Xah ∑ xahlee.org ☄

Re: HTML6 proposal (Re: sexp xml syntax transformation)

On 2010-09-22, Xah Lee wrote:

   Are they too high or too low?

Chris F.A. Johnson

Re: HTML6 proposal (Re: sexp xml syntax transformation)

On Wed, 22 Sep 2010 14:48:18 -0400, Chris F.A. Johnson wrote:


As usual, it depends on the person's individual mindset, ingrained
behavior patterns and, of course, how much medication they're
on. If it's these particular bodies here..., well, I'm
ok with following their lead.

Double parked on the corner of Null and Void.

Re: HTML6 proposal (Re: sexp xml syntax transformation)


This is a fairly good summary of some of the things that are wrong.


But the difficulty of parsing HTML is the least of its problems. Maybe
it's not as easy to parse as your proposed syntax, but quite easy


It doesn't solve any of the (much harder) problems you described at the


Re: sexp xml syntax transformation


You may edit your ~/.xmodmap file to bind the keys you want to the
characters you want.

__Pascal Bourguignon__

Re: sexp xml syntax transformation


yeah, i thought that could be a problem.

my thoughts are these:

Q: Why use weird Unicode characters for matching pair?

A: Unicode has become widely adopted today. (See: Unicode Popularity
On Web.) Unicode also has a lot of proper matching pairs. (See: Matching
Brackets in Unicode.) It seems today is the right time to adopt the
wide range of proper characters instead of continuing to rely on the
very limited number of ASCII characters.

The straight quote character " is not a matching pair, and in code it
presents several problems. For example, it needs context to know which
quote chars are paired. Also, it is difficult to recover from a
missing quote. (this problem is especially pronounced in text editors
for syntax highlighting.) A proper matching pair allows programs and
editors to determine the quoted string correctly more easily, and to
navigate the tree easily.

The unicode characters 〔〕 and 「」 may be difficult to input. Possibly,
they can be replaced by () and {} for html6. Though, that also means a
lot of ugly escaping will need to happen in the content text. If not
escaped, that means incorrect syntax for the whole file.


also, i think with today's tech, it's trivial to overcome this, in so
many ways.

In emacs, you can easily make the Win or Menu key type these chars.

;;;; set Hyper and Super key
(cond
 ((string-equal system-type "windows-nt") ; Windows

  ;; setting the PC keyboard's various keys to
  ;; Super or Hyper, for emacs running on Windows.
  (setq w32-pass-lwindow-to-system nil
        w32-pass-rwindow-to-system nil
        w32-pass-apps-to-system nil
        w32-lwindow-modifier 'super ;; Left Windows key
        w32-rwindow-modifier 'hyper ;; Right Windows key
        w32-apps-modifier 'hyper)) ;; Menu key
 ((string-equal system-type "darwin") ; Mac
  (setq mac-option-modifier 'super) ) )

;; note: emacs ships its own insert-pair with a different argument
;; list; this definition replaces it.
(defun insert-pair (leftBracket rightBracket)
  "Insert a matching bracket and place the cursor between them."
  (insert leftBracket rightBracket)
  (backward-char 1) )

Here's my personal setup, for the Dvorak layout:

(defun insert-pair-paren () (interactive) (insert-pair "(" ")") )
(defun insert-pair-brace () (interactive) (insert-pair "{" "}") )
(defun insert-pair-bracket () (interactive) (insert-pair "[" "]") )
(defun insert-pair-single-angle-quote () (interactive) (insert-pair "‹" "›") )
(defun insert-pair-double-angle-quote () (interactive) (insert-pair "«" "»") )
(defun insert-pair-double-curly-quote () (interactive) (insert-pair "“" "”") )
(defun insert-pair-single-curly-quote () (interactive) (insert-pair "‘" "’") )
(defun insert-pair-double-straight-quote () (interactive) (insert-pair "\"" "\"") )
(defun insert-pair-single-straight-quote () (interactive) (insert-pair "'" "'") )

(defun insert-pair-corner-bracket () (interactive) (insert-pair "「" "」") )
(defun insert-pair-white-corner-bracket () (interactive) (insert-pair "『" "』") )
(defun insert-pair-angle-bracket () (interactive) (insert-pair "〈" "〉") )
(defun insert-pair-double-angle-bracket () (interactive) (insert-pair "《" "》") )
(defun insert-pair-white-lenticular-bracket () (interactive) (insert-pair "〖" "〗") )
(defun insert-pair-black-lenticular-bracket () (interactive) (insert-pair "【" "】") )
(defun insert-pair-tortoise-shell-bracket () (interactive) (insert-pair "〔" "〕") )

(defun insert-pair-fullwith-paren () (interactive) (insert-pair "（" "）") )
(defun insert-pair-fullwith-bracket () (interactive) (insert-pair "［" "］") )
(defun insert-pair-fullwith-brace () (interactive) (insert-pair "｛" "｝") )

(defun insert-pair-white-paren () (interactive) (insert-pair "⦅" "⦆") )
(defun insert-pair-white-bracket () (interactive) (insert-pair "〚" "〛") )
(defun insert-pair-white-brace () (interactive) (insert-pair "⦃" "⦄") )

; aoeui keys, for major matching pairs
(global-set-key (kbd "H-a") 'insert-pair-double-curly-quote) ;“”
(global-set-key (kbd "H-A") 'insert-pair-single-curly-quote) ;‘’
(global-set-key (kbd "H-o") 'insert-pair-bracket)            ;[]
(global-set-key (kbd "H-O") 'insert-pair-white-bracket)      ;〚〛
(global-set-key (kbd "H-e") 'insert-pair-paren)              ;()
(global-set-key (kbd "H-E") 'insert-pair-white-paren)        ;⦅⦆
(global-set-key (kbd "H-u") 'insert-pair-brace)              ;{}
(global-set-key (kbd "H-U") 'insert-pair-white-brace)        ;⦃⦄
(global-set-key (kbd "H-i") 'insert-pair-single-angle-quote) ;‹›
(global-set-key (kbd "H-I") 'insert-pair-double-angle-quote) ;«»

; q j k x keys for less used matching pairs.

(global-set-key (kbd "H-k") 'insert-pair-corner-bracket)           ;「」
(global-set-key (kbd "H-K") 'insert-pair-white-corner-bracket)     ;『』
(global-set-key (kbd "H-j") 'insert-pair-angle-bracket)            ;〈〉
(global-set-key (kbd "H-J") 'insert-pair-double-angle-bracket)     ;《》
(global-set-key (kbd "H-q") 'insert-pair-black-lenticular-bracket) ;【】
(global-set-key (kbd "H-Q") 'insert-pair-white-lenticular-bracket) ;〖〗
(global-set-key (kbd "H-;") 'insert-pair-tortoise-shell-bracket)   ;〔〕

; common keys
(global-set-key (kbd "H-p") 'insert-pair-double-straight-quote)
(global-set-key (kbd "H-P") 'insert-pair-single-straight-quote)
(global-set-key (kbd "H-.") (lambda () (interactive) (insert "="))) ;
(global-set-key (kbd "H-,") (lambda () (interactive) (insert "+"))) ;

i make extensive use of the matching pair characters provided in unicode.

For example, to type 〔tortoise shell bracket〕, it's 【Menu+;】.

• 〈How to Define Keyboard Shortcuts in Emacs〉

• 〈Emacs Hyper and Super Keys〉

• 〈The Dvorak Keyboard Layout〉

• 〈Matching Brackets in Unicode〉

For linux, Pascal mentioned X11's xmodmap.


Or, this can be done at the OS level.

For Windows and Mac, see:

• 〈Microsoft Windows Keyboard Shortcuts〉

• 〈How To Create Keybinding In Mac OS X〉


basically, i think the difficulty with input will be solved if there's
a need. Look at Chinese input systems. Typing Chinese with its 3
thousand common characters, or perhaps 1 thousand used daily in speech,
seems impossible at first, but through the decades people invented some
20 diverse input methods for Chinese, and today they are widely used,
daily, by the average joe, in IM chat, blogging, etc. (you can see
them in Chinese chat networks, online gaming, even tons on twitter.)
(chinese being the second most used lang on the net)


the tens of editors, free or commercial, can easily come up with methods
to input them. e.g. by button panel for html tag insertion (which is
typical in gui html editors such as Dreamweaver), by abbrev (builtin
in Microsoft Word, Open Office, emacs...), by OS wide hotkey or OS
wide unicode palette (Windows, Mac OS), or even auto-fix, like the
turning of straight quotes into curly ones that is popular in almost all
word processors. Or, doubled parens like ((this)) become 〔this〕.
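
The ((this)) auto-fix idea is simple enough to sketch. The Python below is just an illustration under my own assumptions (the function name is made up, and no editor is claimed to work this way): it rewrites the innermost ((...)) spans first and repeats until nothing changes, so nested doubled parens come out as nested tortoise shell brackets.

```python
import re

# Matches a doubled-paren span with no other parens inside it.
_DOUBLED = re.compile(r"\(\(([^()]*)\)\)")


def autofix_doubled_parens(text: str) -> str:
    """Rewrite ((...)) as 〔...〕, innermost spans first."""
    previous = None
    while previous != text:
        previous = text
        text = _DOUBLED.sub("\u3014\\1\u3015", text)
    return text
```

Single parens are left alone, but real code such as f((x)) would be rewritten too, so an editor would likely want to apply this only while typing, the way curly-quote auto-fix is done, and not to a whole file.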

am thinking that if the unicode brackets are really a problem, even if
only psychological, we can use () or {}, or set them as equivalent. The
problem is that it means a lot of escaping will have to happen in the
text content whenever the text contains a paren, which is very common
(e.g. any page about programming), and if one is left unescaped by
mistake, it results in badly malformed source code.

 Xah ∑ xahlee.org ☄
