Binary SignWriting
From PuddleNet
Binary SignWriting is a character encoding model for the SignWriting script.
Status of the work
Work is in progress on Binary SignWriting revision 3.
Changes
- BaseSymbols are now characters
- Symbols are BaseSymbol variants with 2 modifiers (fill and rotation)
- Structural markers are reordered
- Characters have a fixed width of 12 bits
Abstract Character Repertoire
Structural Markers
- left lane signbox marker
- middle lane signbox marker
- right lane signbox marker
- sequence marker
- layer start
- layer end
BaseSymbols
- 652 BaseSymbols from the ISWA 2010
Modifiers
- 6 fill modifiers
- 16 rotation modifiers
Number Characters
- Integers from -299 to 300
Coded Character Set
Characters have a fixed width of 12 bits, and each code is written as three hexadecimal digits. The coding is 7-bit ASCII compatible.
Structural Markers
- 0fa - left lane signbox marker
- 0fb - middle lane signbox marker
- 0fc - right lane signbox marker
- 0fd - sequence marker
- 0fe - layer start
- 0ff - layer end
BaseSymbols
- 100 to 38b for sequentially numbered BaseSymbols
Modifiers
- 38c to 391 for fill modifiers
- 392 to 3a1 for rotation modifiers
Number Characters
- 3a2 to 4cc for negative numbers -299 to -1
- 4cd for zero
- 4ce to 5f9 for positive numbers 1 to 300
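The code ranges above can be checked with a small lookup table. This is a minimal Python sketch, assuming codes are handled as plain integers; the function names are hypothetical, and numbering BaseSymbols from 1 is an assumption about what "sequentially numbered" means.

```python
# Code ranges of the revision 3 coded character set (hex, inclusive).
RANGES = [
    (0x0FA, 0x0FF, "structural marker"),
    (0x100, 0x38B, "BaseSymbol"),
    (0x38C, 0x391, "fill modifier"),
    (0x392, 0x3A1, "rotation modifier"),
    (0x3A2, 0x5F9, "number"),
]

# Names of the six structural markers.
MARKERS = {
    0x0FA: "left lane signbox marker",
    0x0FB: "middle lane signbox marker",
    0x0FC: "right lane signbox marker",
    0x0FD: "sequence marker",
    0x0FE: "layer start",
    0x0FF: "layer end",
}

ZERO = 0x4CD  # the number character for zero


def classify(code: int) -> str:
    """Return the character class of a 12-bit code."""
    for low, high, name in RANGES:
        if low <= code <= high:
            return name
    raise ValueError(f"{code:03x} is not in the revision 3 code set")


def decode_number(code: int) -> int:
    """Map a number character (3a2-5f9) to its value (-299 to 300)."""
    if classify(code) != "number":
        raise ValueError(f"{code:03x} is not a number character")
    return code - ZERO


def encode_number(value: int) -> int:
    """Map a value from -299 to 300 to its number character."""
    if not -299 <= value <= 300:
        raise ValueError("number out of range")
    return ZERO + value


def base_symbol_index(code: int) -> int:
    """1-based position of a BaseSymbol in the sequence (assumed numbering)."""
    if classify(code) != "BaseSymbol":
        raise ValueError(f"{code:03x} is not a BaseSymbol")
    return code - 0x100 + 1
```

For example, `classify(0x0fd)` gives `"structural marker"` and `decode_number(0x4ce)` gives `1`. The range sizes work out to 6, 652, 6, 16, and 600 codes, matching the abstract character repertoire.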
Character Encoding Form
The Unicode standard will be used as the character encoding form. The codes for the set are shifted twice: first to move them to a higher range, and second to move them to another plane.
Primary Shift
The primary shift moves the codes to a higher range. Each code is increased by 55,046 (D706 in hex), which shifts the 12-bit range 0fa - 5f9 to the 16-bit range d800 - dcff.
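The primary shift is a single addition with a range check. A minimal Python sketch; the function names are hypothetical.

```python
PRIMARY_OFFSET = 0xD706  # 55,046 in decimal


def primary_shift(code: int) -> int:
    """Move a 12-bit code (0fa-5f9) into the 16-bit range d800-dcff."""
    if not 0x0FA <= code <= 0x5F9:
        raise ValueError(f"{code:03x} is outside 0fa-5f9")
    return code + PRIMARY_OFFSET


def primary_unshift(value: int) -> int:
    """Reverse the primary shift (d800-dcff back to 0fa-5f9)."""
    if not 0xD800 <= value <= 0xDCFF:
        raise ValueError(f"{value:04x} is outside d800-dcff")
    return value - PRIMARY_OFFSET
```

For example, `primary_shift(0x0fa)` returns `0xd800`. On its own, d800 - dcff lies inside the Basic Multilingual Plane's surrogate block (D800 - DFFF), which UTF-16 reserves and which holds no characters, so the shifted values only become usable code points after the secondary shift moves them to another plane.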
Secondary Shift
There are 3 potential secondary shifts, which move the codes to plane 1, 15, or 16. Each shift simply adds the plane's offset: character d800 becomes 1d800 for plane 1, fd800 for plane 15, and 10d800 for plane 16.
If you look at the roadmap for the Supplementary Multilingual Plane, you'll see that rows 1D8 through 1DB have been set aside for SignWriting. The next four rows are blank but set aside for notation systems in general. The shifted range 1d800 - 1dcff spans rows 1D8 through 1DC, so if row 1DC can also be reserved for SignWriting, this new encoding will fit in the space allotted.
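Combining both shifts gives the final code point. The Python sketch below uses hypothetical function names and the plane offsets from the examples above. It also computes the roadmap row of each code point (the code point divided by 256) to show which rows the encoding occupies in plane 1.

```python
PRIMARY_OFFSET = 0xD706  # 55,046: moves 0fa-5f9 to d800-dcff
PLANE_OFFSETS = {1: 0x10000, 15: 0xF0000, 16: 0x100000}


def to_code_point(code: int, plane: int = 1) -> int:
    """Apply the primary and secondary shifts to a 12-bit code."""
    if not 0x0FA <= code <= 0x5F9:
        raise ValueError(f"{code:03x} is outside 0fa-5f9")
    return code + PRIMARY_OFFSET + PLANE_OFFSETS[plane]


def from_code_point(cp: int, plane: int = 1) -> int:
    """Undo both shifts, recovering the 12-bit code."""
    code = cp - PLANE_OFFSETS[plane] - PRIMARY_OFFSET
    if not 0x0FA <= code <= 0x5F9:
        raise ValueError(f"U+{cp:X} is not a shifted revision 3 code")
    return code


def roadmap_row(cp: int) -> int:
    """Roadmap rows are blocks of 256 code points, e.g. row 1D8 = 1D800-1D8FF."""
    return cp >> 8


rows = sorted({roadmap_row(to_code_point(c)) for c in range(0x0FA, 0x5FA)})
print([f"{r:X}" for r in rows])  # prints ['1D8', '1D9', '1DA', '1DB', '1DC']
```

The printed rows show why one extra row, 1DC, is needed beyond the four already set aside.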
Character Encoding Scheme
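UTF-8 is a great character encoding scheme, although converting back and forth by hand is a little tricky. Within a plane, the first byte and the top bits of the second byte are the same for every character, so if you know how to encode the first character of a plane, you can create a conversion for the entire plane.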
- For plane 1, the first character is 010000 or %f0%90%80%80 in UTF-8.
- For plane 15, the first character is 0F0000 or %f3%b0%80%80 in UTF-8.
- For plane 16, the first character is 100000 or %f4%80%80%80 in UTF-8.
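The per-plane shortcut can be written out explicitly. In a four-byte UTF-8 sequence, the high three bits of the plane number go into the first byte and its low two bits into the second byte, while the 16-bit position within the plane fills the remaining bits. A minimal Python sketch with hypothetical function names, checked against Python's built-in UTF-8 codec:

```python
def utf8_in_plane(cp: int) -> bytes:
    """Encode a code point from U+10000 to U+10FFFF as 4 UTF-8 bytes.

    The first byte and the top payload bits of the second byte depend
    only on the plane; the rest comes from the position in the plane.
    """
    if not 0x10000 <= cp <= 0x10FFFF:
        raise ValueError("not a supplementary-plane code point")
    plane = cp >> 16   # 1 to 16
    low = cp & 0xFFFF  # position within the plane
    return bytes([
        0xF0 | (plane >> 2),                          # high 3 bits of plane
        0x80 | ((plane & 0x3) << 4) | (low >> 12),    # low 2 bits of plane + 4 bits
        0x80 | ((low >> 6) & 0x3F),                   # next 6 bits
        0x80 | (low & 0x3F),                          # last 6 bits
    ])


def percent(data: bytes) -> str:
    """Write bytes in the %xx style used above."""
    return "".join(f"%{b:02x}" for b in data)


print(percent(utf8_in_plane(0x1D800)))  # prints %f0%9d%a0%80
```

In practice `chr(cp).encode("utf-8")` does the same job; the explicit version only shows why knowing the first character of each plane is enough.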
Conversion from previous BSW
A conversion from BSW 2010 to BSW revision 3 is available online. It is part of the ISWA conversion package.
Advanced Binary SignWriting
An extension of the standard model for Binary SignWriting could help standardize spelling and searching, but the increased complexity of the user interface is prohibitive at this time. The design of this option is detailed here to promote discussion.
Flat Layers
Each sign is written on its own canvas. The coordinates for each symbol are relative to other symbols. A layer introduces a break in the relation between symbol coordinates. Symbols in a sign can be directly on the canvas or placed on a layer. A layer can have multiple symbols. The layer itself has coordinates. Moving a layer moves all of the symbols on the layer.
One new token is needed:
- C - Canvas layer
Defining regular expression:
- Flat layer sign - ((Cnn([hmdftx]ionn)*C)|([hmdftx]ionn)*)*
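The defining regular expression can be tried out directly, assuming each token is written as a single letter: C for the canvas layer token, n for a number, and h, m, d, f, t, x, i, o as in the symbol pattern. Their meanings are not spelled out here, so this minimal Python sketch treats them as opaque letters; the function name is hypothetical.

```python
import re

# The flat layer sign expression, copied verbatim, applied to a string in
# which every token is abstracted to one letter (a hypothetical notation).
FLAT_LAYER_SIGN = re.compile(r"((Cnn([hmdftx]ionn)*C)|([hmdftx]ionn)*)*")


def is_flat_layer_sign(tokens: str) -> bool:
    """True if the whole token string matches the flat layer pattern."""
    return FLAT_LAYER_SIGN.fullmatch(tokens) is not None


print(is_flat_layer_sign("CnnhionnmionnC"))  # prints True
print(is_flat_layer_sign("Cnnhionn"))        # prints False
```

Here "CnnhionnmionnC" is a layer holding two symbols, while "Cnnhionn" is rejected because the pattern requires a closing C after the layer's symbols.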
Multiple Layers
Each sign is written on its own canvas. Both symbols and layers can be written on a canvas, and a layer can contain symbols or other layers. Nesting layers within layers is probably overkill for a writing system.
Two new tokens are needed:
- C - Canvas layer start
- Y - Canvas layer end
Defining a regular expression is beyond my experience. The basic idea is that Cnn starts a layer and Y ends a layer. Since a layer can be inside another layer, CnnCnn is a valid sequence: following the two layer starts, the first Y ends the sublayer, and a second Y ends the main layer.
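Because layers can nest to any depth, valid signs form a balanced-nesting language, which a true regular expression cannot describe in general; a depth counter captures the basic idea instead. This minimal Python sketch assumes each token is written as one letter, as in the flat layer expression (Cnn for a layer start, Y for a layer end, [hmdftx]ionn for a symbol); the function name is hypothetical.

```python
import re

# One symbol in the assumed one-letter-per-token notation.
SYMBOL = re.compile(r"[hmdftx]ionn")


def is_nested_layer_sign(tokens: str) -> bool:
    """Check that every Cnn layer start is closed by a matching Y."""
    depth = 0  # number of layers currently open
    pos = 0
    while pos < len(tokens):
        if tokens.startswith("Cnn", pos):
            depth += 1          # Cnn opens a layer, possibly inside another
            pos += 3
        elif tokens[pos] == "Y":
            if depth == 0:
                return False    # Y with no open layer
            depth -= 1          # Y closes the innermost open layer
            pos += 1
        elif SYMBOL.match(tokens, pos):
            pos += 5            # a symbol on the current layer or canvas
        else:
            return False        # unrecognized token sequence
    return depth == 0           # every layer must be closed


print(is_nested_layer_sign("CnnCnnhionnYY"))  # prints True
print(is_nested_layer_sign("CnnCnnhionnY"))   # prints False
```

In "CnnCnnhionnYY" the first Y ends the sublayer and the second ends the main layer; with only one Y, the main layer is left open and the sign is rejected.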
Language Requirements
- Two-way conversion between character code and symbol id
- Access to glyphs - symbol image
- Creation of glyphograms - visual unit of spatially written glyphs
- Center estimate for glyphs and glyphograms
- Horizontal stacking of glyphs and glyphograms based on center
- Proper spacing between writing and punctuation
- Horizontal off-center alignment for lanes
- Sorting based on sequence data, including special sorting symbols
- Searching for symbol, BaseSymbol, symbol combination, spatial arrangement, or exact sign match
- Drag and drop user interface
- Keyboarding user interface
- Special commands for text entry

