Project - Table File Standard (Read 45148 times)
Nightcrawler
YaBB Administrator
*****
Online


The Dark Angel of Romhacking

Posts: 3236
USA
Gender: male
Project - Table File Standard
May 12th, 2010 at 3:13pm
 
During development of some of my own tools, and examination of existing tools such as Cartographer, Atlas, and Hexposure, a silly thought occurred to me: why don't I start crusading for some sort of standardization of the table file format and see what comes of it? At worst, I'd have a written standard that I follow for all of my tools. With a little luck, a few people will jump on board and we can take a small evolutionary step forward in program compatibility and table feature support. As luck would have it, I did find some interested parties, the most notable being Klarth, the author of the insertion utility Atlas. It's been tossed around for quite some time now, but it will be worth the wait! Read on!

There's currently no real standard format. Utilities differ quite a bit in how they handle line breaks, end tokens, linked entries, hiragana/katakana, control codes, bookmarks, etc. in table files. I thought it would be a good idea to create a standard for table files going forward, so that they are interoperable amongst utilities without change. Obviously, as much backwards compatibility as possible would also be a goal. However, we also need to take an evolutionary step forward and enhance our feature set. It's always a tough balance, but I think the result is approaching something nice. :)

The document acts as a reference to the file format, an explanation for newcomers, and a set of tips for programmers. I don't think we've ever had anything like this on the subject before. :)

Any thoughts on this?

« Last Edit: Jul 31st, 2012 at 9:27am by Nightcrawler »  

ROMhacking.net - The central hub of the ROM hacking community.
 
KaioShin
Nobleman
***
Offline



Posts: 102
Germany
Gender: male
Re: 'Standard' Table File Format
Reply #1 - Jun 3rd, 2010 at 11:04am
 
It sounds like a great idea, though I think there is one big problem: if you really want to fix all the problems of the old format mayhem, it'll be impossible to maintain compatibility with the old format. At least I can't see how it would work. One of the biggest hurdles to begin with is SJIS vs. Unicode. A modern table format just shouldn't use SJIS, but that's what the old tables mostly used. Though it was never specified anywhere that the file needs to be encoded in SJIS (IIRC), almost all old tools expect it in that encoding and will crash on Unicode files.

Any thoughts on how to resolve that?
 
Nightcrawler
Re: 'Standard' Table File Format
Reply #2 - Jun 3rd, 2010 at 11:26am
 
Table File encoding never being specified is just half the cause of the problem. The other half is the complete disregard for encoding support in any of the utilities. Now I certainly understand why. Prior to recent times, supporting multiple encodings, especially Unicode in your utility was difficult. It's much simpler today with advancements such as .NET.

It makes sense to use UTF-8 for the table file encoding standard. With so many languages being used for translation these days, its backwards compatibility with ASCII, and its wide adoption, it seems like the obvious choice. I'm not sure it would make sense to use anything else.

However, I will say I plan to support the most common encodings in my dumper. Probably just UTF-8, S-JIS, EUC-JP, and ASCII. That should help the cause a little bit. Though there's the school of thought that the clean break is better and backwards compatibility just encourages people to hang on to the old. It's always a fine line between ushering a new standard and getting people to use your stuff so it takes off. It's probably best to start with support and take it out later or something.

Anyway, that's more for the dumper. UTF-8 makes sense to declare as the encoding of choice for the table file format. However, some might argue that the table file format doesn't need an encoding specified and it's up to the dumping/inserting utility to dictate. I think UTF-8 it is, though, since we need to start ensuring compatibility. You shouldn't have to alter your table every time you use a different utility to bend to its individual will.
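
To make the UTF-8 point concrete, here is a minimal Python sketch of a loader that treats the table file strictly as UTF-8. It is only an illustration, not code from any existing utility: the function name `parse_table` and the sample entries are made up, and it only understands plain `HEX=text` lines.

```python
def parse_table(raw):
    """Decode a table file as UTF-8 and return {hex bytes: text}."""
    # Strict decoding: non-UTF-8 input (e.g. an old SJIS table) raises
    # UnicodeDecodeError instead of silently producing garbage.
    text = raw.decode("utf-8")
    table = {}
    for line in text.splitlines():
        if "=" not in line:
            continue  # this sketch skips blank or non-entry lines
        hex_part, value = line.split("=", 1)
        table[bytes.fromhex(hex_part)] = value
    return table


# Hypothetical sample table, encoded the way the standard would require.
sample = "00=あ\n01=い\n8140=漢\n".encode("utf-8")
table = parse_table(sample)
```

A tool that also wants to accept legacy tables could catch the `UnicodeDecodeError` and retry with another codec, but that is a policy decision for the utility, not for the file format.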
 
KaioShin
Re: 'Standard' Table File Format
Reply #3 - Jun 3rd, 2010 at 2:01pm
 
Nightcrawler wrote on Jun 3rd, 2010 at 11:26am:
I think UTF-8 it is, though, since we need to start ensuring compatibility. You shouldn't have to alter your table every time you use a different utility to bend to its individual will.


Couldn't agree more.

If a new format were created from scratch, have you considered making it XML based? Mandatory fields for each entry would be the table number and the value; optional additional attributes could be used for things like bookmarks. The value field can be of any datatype, so it wouldn't be a problem to put pointers, letters or strings in it. The parsing would be done completely automatically in any language with even basic XML functions, so it would be very easy to implement too. If one has to parse a text file manually, it's always a hassle to work out where one entry begins and ends. Depending on how you program it, even an empty line can crash the parser, and there are a lot of pitfalls for newbie programmers. My programs usually detected the end of an entry by line break, but even that can cause problems, for example with Unix style line breaks being different, and it doesn't allow for table values that contain new lines themselves (not common, but who knows, might come in handy). XML files would basically parse themselves and allow for pretty much anything one might need.
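
A rough Python sketch of what that could look like, using the standard library's `xml.etree.ElementTree`. The schema is invented purely for illustration (the element name `entry` and the attributes `hex` and `bookmark` are assumptions, nothing here is an agreed format):

```python
import xml.etree.ElementTree as ET

# Hypothetical XML table; element and attribute names are made up.
XML_TABLE = """<table>
  <entry hex="00">A</entry>
  <entry hex="01">B</entry>
  <entry hex="FE" bookmark="intro">
</entry>
</table>"""


def load_xml_table(xml_text):
    """Return {hex bytes: text} from the hypothetical XML layout above."""
    root = ET.fromstring(xml_text)
    return {
        bytes.fromhex(entry.get("hex")): (entry.text or "")
        for entry in root.iter("entry")
    }


xml_table = load_xml_table(XML_TABLE)
```

Note that the `FE` entry's value is a literal newline, which a line-per-entry text format cannot express without an escape, and that the optional `bookmark` attribute is simply never looked at by this loader.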

Whatever way it'll go, to promote such a format it would be a good idea to have a reference implementation in the form of a DLL file with the most important functions. That way even people who don't even want to bother with the details of the format will be able to incorporate support into their tools easily.
 
Nightcrawler
Re: 'Standard' Table File Format
Reply #4 - Jun 3rd, 2010 at 3:36pm
 
I've updated the first post with some information on all of the available table file functions I've seen in the utilities I've recently looked at and what my thoughts on them are.

I didn't really think of XML. What would you be trying to achieve with it? See I'm approaching with the mentality that the only thing that should be in the table file are those things completely necessary for the translation from hex to text characters and vice versa. Some of the other stuff such as bookmarks don't belong in the table file in my opinion. What business does that stuff have in a hex to text translation file?

That stuff came from dumpers and inserters. And you know what? That's where it belongs, with dumpers and inserters, not in your table file. If you're going to go in that direction, you're treating the table file as a giant general purpose game task configuration file. You may as well write your notes in there too and add some assembly code. :P

I think you have to assign some boundaries and a purpose to what the table file is and what it should include.

Now back to XML, what ideas did you have for what you'd end up doing with it? In the event you had to make a manual change or scan the table for your own human informational or lookup purposes, using something like XML would be more cumbersome. I find myself browsing my table files often for various reasons.
 
KaioShin
Re: 'Standard' Table File Format
Reply #5 - Jun 3rd, 2010 at 5:04pm
 
I only mentioned bookmarks and stuff since you brought it up. I can see how it can be considered out of the scope.

After parsing the table file, the data in the table should be in some kind of data structure that's easily accessible, right? Instead of only standardizing the physical file, why not standardize the data structure representation too? That would make creating a reference implementation that's interchangeable with other tools that use the standard even easier. And an XML file is basically just that: a physical file that also contains the data structure information. Most languages have libraries that take an XML file as a parameter and instantly give you back a tree structure, for example.

For a not-so-pro programmer who is trying to create a custom dumper for his or her game, what do you think would be easier? Parsing through the text file or parsing through a well defined data structure? I think XML would actually be easier for the programmer, but I might be wrong. I personally hate dealing with text files; there are so many annoying small pitfalls. From what you wrote above I'm not 100% sure where you draw the line between two table entries. Just with a newline? What about the differences between Windows and Unix newline conventions? I just hate dealing with that kind of stuff. With an XML file I just search the table for the key "0x0A" and get back whatever data was in the value field of the document. No dealing with the underlying mechanics of the file at all.

One advantage of XML is optional attributes. You could have stuff like bookmarks if you want, and they'd be completely optional. They'd be parsed alongside the rest, and the XML libraries will report to you whether they are present or not so you can proceed in your program accordingly. Or you can just ignore them if you don't want to support them, and they won't harm you. In a raw text format you'd have to deal with them during parsing whether you want to support them or not, and they'd indeed quickly become unwanted baggage.
 
Nightcrawler
Re: 'Standard' Table File Format
Reply #6 - Jun 4th, 2010 at 10:42am
 
KaioShin wrote on Jun 3rd, 2010 at 5:04pm:
I only mentioned bookmarks and stuff since you brought it up. I can see how it can be considered out of the scope.

After parsing the table file, the data in the table should be in some kind of data structure that's easily accessible, right? Instead of only standardizing the physical file, why not standardize the data structure representation too? That would make creating a reference implementation that's interchangeable with other tools that use the standard even easier. And an XML file is basically just that: a physical file that also contains the data structure information. Most languages have libraries that take an XML file as a parameter and instantly give you back a tree structure, for example.


I can see value in a reference implementation such as Klarth's table library or something similar. But I'm not sure I'd try to dictate what the programmer should do after parsing, only suggest and provide an easy example library to use or something. It would be difficult to declare any type of programming standard. The data structure it goes to is up to the language you use and what you're trying to do with it.

Quote:
For a not-so-pro programmer who is trying to create a custom dumper for his or her game, what do you think would be easier? Parsing through the text file or parsing through a well defined data structure? I think XML would actually be easier for the programmer, but I might be wrong. I personally hate dealing with text files; there are so many annoying small pitfalls. From what you wrote above I'm not 100% sure where you draw the line between two table entries. Just with a newline? What about the differences between Windows and Unix newline conventions? I just hate dealing with that kind of stuff. With an XML file I just search the table for the key "0x0A" and get back whatever data was in the value field of the document. No dealing with the underlying mechanics of the file at all.


Depends on the language. Is there any XML support in C++ core library or STL? I'm not sure there is. You'd probably have to go to third party or do it yourself.

Yes, newline is currently the differentiation. In any .NET language, you can use TextReader.ReadLine(). It will work fine with newlines of Unix and Windows. I think iostream getline() works appropriately in C++ as well. You probably shouldn't be scanning for new line bytes yourself.
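
The same "let the library split the lines" idea, sketched in Python rather than .NET or C++ (the helper name `split_entries` and the sample text are made up for the example):

```python
def split_entries(text):
    """Split table text into entry lines regardless of newline convention."""
    # str.splitlines() treats \n, \r\n and \r (plus a few rarer separators)
    # as line ends, so Windows, Unix and old Mac files all split the same way.
    return [line for line in text.splitlines() if line.strip()]


# One string mixing Windows, Unix and CR-only line endings, plus a blank line.
mixed = "00=A\r\n01=B\n\n02=C\r03=D"
entries = split_entries(mixed)
```

One caveat worth knowing: `splitlines()` also breaks on characters such as form feed and U+2028, so a tool whose table values might legitimately contain those would need a stricter splitter.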

Regardless, how do you MAKE your table to begin with in XML? I think making an XML table would be much more difficult than it is with simple text. Making one manually would be many times more work. You could use a table maker, but none exist yet to do the job.

Quote:
One advantage of XML is optional attributes. You could have stuff like bookmarks if you want, and they'd be completely optional. They'd be parsed alongside the rest, and the XML libraries will report to you whether they are present or not so you can proceed in your program accordingly. Or you can just ignore them if you don't want to support them, and they won't harm you. In a raw text format you'd have to deal with them during parsing whether you want to support them or not, and they'd indeed quickly become unwanted baggage.


That certainly makes sense. Then you don't need much of a standard because it could be custom expanded by any utility (like it already is), but unsupported features don't get in the way and are just ignored.

I'm not sure I like that direction because in the end, don't we still end up with tables that aren't going to be very compatible between programs? Have we really done much then? I'm not sure. I guess we still have a base standard.

Next, does a table file really need that much of a data structure? Most entries are in dictionary form. Term A=Term B. Not much more to it. There's our control entries, but I'm approaching them in such a generic way, that we have very few and don't WANT to know much about them.
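
To illustrate the "Term A=Term B" point, a pair of plain dictionaries is about all the structure a basic table needs in most languages. A small Python sketch, with made-up entries and variable names:

```python
# Hypothetical table lines; the last one is a dictionary-style multi-byte entry.
entries = ["41=A", "42=B", "FF00=the "]

hex_to_text = {}  # used when dumping
text_to_hex = {}  # used when inserting
for entry in entries:
    hex_part, text = entry.split("=", 1)
    key = bytes.fromhex(hex_part)
    hex_to_text[key] = text
    text_to_hex[text] = key
```

Control entries would need a little extra handling on top of this, which is not shown here.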
 
KaioShin
Re: 'Standard' Table File Format
Reply #7 - Jun 4th, 2010 at 1:03pm
 
I see where you're coming from concerning creating the table file. If it's not tool assisted it would be quite a bother...

Alright, let's stick to plain text then.
 
KingMike
Circle of Royalty
*****
Offline


BRAAAIIINS!

Posts: 579
Gender: male
Re: 'Standard' Table File Format
Reply #8 - Jun 5th, 2010 at 1:11pm
 
Possibly an entry specifying a dictionary entry?
Seems kinda wasteful for the table maker to have to type out
FF00=Entry1
FF01=Entry2
FF02=Entry3
when the program could probably look up the entry.
I'm thinking a value to specify like
FF=Substring,Format,NumberOfBytesInIndexValue
where format can specify if the dictionary table is constant length or not, and also how many bytes to be read to find the index (in the above example, 1 byte).
If constant length, provide the length and the value of the padding byte.
If not constant length, provide the address of the start of the table, and the termination byte value (and then the dumper can look it up).
Maybe we could specify the hex value of the initial entry in the table, too.
Or Pascal format strings (I think that's what they're called, it's when the first byte is the string length)
(in my own program, I thought of being able to find values by pointer-table entry, but then realized it's most likely the pointers would be in sequential value anyway)

Yeah, dictionary MIGHT be something for a custom dumper, but I think it's a common enough practice that it might be worth including in a standard table format.
 
Nightcrawler
Re: 'Standard' Table File Format
Reply #9 - Jun 5th, 2010 at 5:57pm
 
I'm not sure if I'm following you. Can you provide an example?

In general, the table file certainly shouldn't contain any ROM addresses, or ROM information of any kind. I'd advise a table making utility to help make generating the dictionary table entries easier.

You have to be careful with trying to turn the table file into a configuration file for a dumper. That's really not what it should be in my opinion.
 
Nightcrawler
Re: 'Standard' Table File Format
Reply #10 - Jun 11th, 2010 at 2:39pm
 
See the first post. THE FIRST DRAFT IS UP!

The document is a) a reference to the file format, b) an explanation for newcomers, and c) a helpful reference for programmers. I don't think we've ever had anything like this on the subject before!

Good thing I thrive on pain! This didn't help my elbow issues any! :P



KingMike, I've been thinking about your dictionary idea. Since you can just make a table and dump the dictionary from a game, I thought the best idea for handling dictionary is a dumper with a special mode to dump a dictionary so it can be plugged directly into a table file!

I believe I will try to add this feature to my Generic Dumper. There should be no need to modify the file format for this. Instead, you'd just dump the dictionary and copy/paste it into your new table. What do you think about THAT? :)
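
Roughly what the output side of such a dictionary mode might do, as a Python sketch. This is not the Generic Dumper's actual code; the function name, the parameters, and the assumption of a one-byte prefix followed by a one-byte index are all made up to match KingMike's FF00/FF01/FF02 example:

```python
def dictionary_entries(words, prefix=0xFF, start=0x00):
    """Return 'PPII=word' table lines: 1-byte prefix PP, 1-byte index II."""
    lines = []
    for offset, word in enumerate(words):
        index = start + offset
        if index > 0xFF:
            raise ValueError("index no longer fits in one byte")
        lines.append(f"{prefix:02X}{index:02X}={word}")
    return lines


# Strings as they might come out of a dictionary dump.
dumped = ["Entry1", "Entry2", "Entry3"]
generated = dictionary_entries(dumped)
```

The generated lines can be pasted straight into a table, so the table file itself stays free of ROM addresses.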
 
Next_Gen_Cowboy
Nobleman
***
Offline


I am what I am; nothing
more and nothing less

Posts: 113
Inside my own head.
Gender: male
Re: 'Standard' Table File Format
Reply #11 - Jul 9th, 2010 at 3:18pm
 
Excellent! That's all I have to add, you must have been going full throttle for a while!

Sleep is like cocaine, for the brain.
 
Gil Galad
Peasant
*
Offline


I Love TransCorp!

Posts: 5
Ohio, USA
Gender: male
Re: 'Standard' Table File Format
Reply #12 - Jul 14th, 2010 at 6:29am
 
Actually, I liked the Thingy table format the best. However, QBasic just doesn't cut it for me anymore. I can't get the EUC or SJIS to display as Japanese characters. I could in Windows 98 by downloading a viewer at the NJStar site.

I talked to Bongo about supporting Thingy tables in Windhex32. I'm getting the impression that some of the features are difficult to code. However, I disagree that some of the features of that table file format are not needed.

For example, the table marks for dakuten and handakuten can reduce your table file size and the time it takes to make the table file. Thingy also has the ability to modify the byte after or before. These modifier tiles are commonly found in NES/FC/FDS games. Maybe other systems too.

I do agree that the bookmarks don't need to be in a table file specifically for dumping the text in a generic dumper. However, I still find it useful to use bookmarks in a hex editor so that I can easily view and jump to various sections of the ROM.
 
Nightcrawler
Re: 'Standard' Table File Format
Reply #13 - Jul 14th, 2010 at 9:44am
 
Gil Galad wrote on Jul 14th, 2010 at 6:29am:
Actually, I liked the Thingy table format the best. However, QBasic just doesn't cut it for me anymore. I can't get the EUC or SJIS to display as Japanese characters. I could in Windows 98 by downloading a viewer at the NJStar site.


If you want to talk about Thingy specifically, is there anything you didn't specifically mention below? Generally, everything applicable from Thingy made it in, and better implemented at that in most cases.

Quote:
I talked to Bongo about supporting Thingy tables in Windhex32. I'm getting the impression that some of the features are difficult to code. However, I disagree that some of the features of that table file format are not needed.


It's not really about if they're needed or useful, rather do they belong and can they fit within the goals we need to meet. I've gone into detail on the specific two issues in question below.


Quote:
For example, the table marks for dakuten and handakuten can reduce your table file size and the time it takes to make the table file. Thingy also has the ability to modify the byte after or before. These modifier tiles are commonly found in NES/FC/FDS games. Maybe other systems too.


This is bad on many levels. It ruins all abstraction. It ruins language independence. It requires any utility using a table file to be language and character aware. Difficulty of implementation increases greatly. This really goes against much of what we need to accomplish here. The table file's purpose is to map hex to text and vice versa. This dakuten/handakuten handling forces the actual conversion into the utility. Right now, we have complete isolation. All conversion is done from the table. The utility is abstracted and doesn't need to know any language or character information beyond the special characters used when initially parsing the table.

I have a very strong disagreement with doing anything of the sort. A way to handle this situation while maintaining this abstraction level and no character dependency would be welcome. If it requires a little extra table work, that's much better than the consequences of losing abstraction and utility character independence. I just threw out some possible alternatives. Really, this is a very specific case for a specific language and specific console. So, the fact that it could be done with what we have was enough for me.
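
As one illustration of "a little extra table work", the combined entries could be generated once by a table-making helper, so the dumper itself never needs to know what a dakuten is. The following Python sketch is only that, a sketch: the hex values, the function name `combined_entries`, and the assumption that the modifier byte follows the base kana are all hypothetical.

```python
import unicodedata

DAKUTEN, HANDAKUTEN = "\u3099", "\u309A"  # Unicode combining marks


def combined_entries(base, mark_byte, mark, mark_after=True):
    """Return table lines pairing each base kana with a modifier byte."""
    lines = []
    for hex_code, kana in base.items():
        # NFC composes kana + combining mark into one character if one exists.
        composed = unicodedata.normalize("NFC", kana + mark)
        if len(composed) != 1:
            continue  # no precomposed form for this kana, so no entry
        seq = hex_code + mark_byte if mark_after else mark_byte + hex_code
        lines.append(f"{seq}={composed}")
    return lines


# Hypothetical single-byte kana entries from a game's table.
base = {"10": "か", "20": "は", "01": "あ"}
voiced = combined_entries(base, "DE", DAKUTEN)
semi_voiced = combined_entries(base, "DF", HANDAKUTEN)
```

The resulting lines are ordinary multi-byte entries, so any utility that already handles those gets the voiced kana without any language awareness of its own.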


Quote:
I do agree that the bookmarks don't need to be in a table file specifically for dumping the text in a generic dumper. However, I still find it useful to use bookmarks in a hex editor so that I can easily view and jump to various sections of the ROM.


It's not a matter of whether it's useful, but whether it belongs. Bookmarks are specific program settings for specific hex editors. They have nothing to do with mapping hex to text or vice versa.

The table file isn't a dumping ground for any old thing you might want to throw in it. If you think it is, we may as well store pointer information in it, the ROM filename, checksums, assembly text hacks, etc. Bookmarks belong in a game specific configuration file for the utility rather than in a table file.

The table file, as its name implies, serves a single purpose: a table mapping hex to text and vice versa. Does that make sense?
 
Tauwasser
Peasant
*
Offline


Evil Impersonator

Posts: 14
Re: 'Standard' Table File Format
Reply #14 - Aug 13th, 2010 at 3:54pm
 
Nightcrawler wrote on Jun 3rd, 2010 at 11:26am:
However, I will say I plan to support the most common encodings in my dumper. Probably just UTF-8, S-JIS, EUC-JP, and ASCII.


You forgot Big5 and HKSCS, both of which certainly belong among the most common encodings. If you plan on doing that (which might be a waste, because there are very few characters from these sets not in Unicode), at least think about language-tagging of some sort.

UTF-8 files should be required to have a BOM - which you don't specify in your document anywhere. ASCII is basically indistinguishable from UTF-8 without BOM. So are some of the other encodings.
Personally, I wouldn't guess file encodings or let the user specify. Things get mixed up and you will have to have support for determining if all characters in the file were representable in your destination encoding/codepage.
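
For what a BOM requirement could look like on the tool side, here is a small Python sketch. The function name and the `require_bom` switch are made up for illustration; `codecs.BOM_UTF8` and the `utf-8-sig` codec are standard library features.

```python
import codecs


def decode_table_bytes(raw, require_bom=False):
    """Decode table bytes as UTF-8, optionally insisting on a BOM."""
    has_bom = raw.startswith(codecs.BOM_UTF8)
    if require_bom and not has_bom:
        raise ValueError("table file has no UTF-8 BOM")
    # "utf-8-sig" strips a leading BOM if present, otherwise it is plain UTF-8.
    return raw.decode("utf-8-sig")


with_bom = decode_table_bytes(codecs.BOM_UTF8 + b"00=A", require_bom=True)
without_bom = decode_table_bytes(b"00=A")

# Pure ASCII bytes decode identically under both codecs, which is why a
# BOM-less file cannot be told apart by content alone.
same = b"00=A".decode("ascii") == b"00=A".decode("utf-8")
```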

There are of course pros and cons to using XML over a plain text file. First and foremost, you won't have one-character control codes inside the table file.
Code:
@C0=3,C000

is just not as readable as
Code:
<array bytesFollowing="3" baseOffset="C000">C0</array>

no matter how you turn it.
You could of course compensate for this with longer control codes, like
Code:
array:C0=3,C000

for instance. This decision is obviously up to you, and good cases have been made both for and against using XML files. However, just think about the comfort of using an XML-based table editor in general. From a user perspective, it would only add comfort, and you yourself said we should go with the times. And the times favor taggable, extensible formats, not plain text files with hard-to-remember control sequences.

Next, I would quote your whole document in a spoiler, but this board doesn't seem to have spoilers. Generally, I would have preferred a TeX file for the presentation of this. Plain text documentation is somewhat behind the times, and your figures in there become pretty hard to understand.

  • Less screaming in 2.1.
  • 2.1 is missing a description of what happens when two entries collide, that is
    • 00=Five
    • 01=Six
    • 0001=Seven
    What happens in these instances? Which string will get dumped on the byte sequence 0x00 0x01? "FiveSix" or "Seven"? Historically it would be "Seven", but you would have to specify this.
  • 2.1 Define what happens on illegal sequences. Preferably, these should be ignored from a technical point of view.
  • Section 2 is the only section whose caption format does not use tabs throughout. Also, it would be preferable if 2.2 were "Regular entries" and 2.3 not restricted to being "formatting".
  • For escape sequences in 2.3. Don't use /r if it isn't what /r is in most programming languages. Historically, /n is newline while /r is carriage return. Defining your own escape sequences is good, but this labeling is misleading.
    Also, I do not understand the difference between /r and /n and your example is lacking.
  • Is there a reason /r and /n have spaces after them while /t has a tab after it in the table?
  • You lack documentation how to insert / as a literal. This would ideally be through double escaping // for literal /.
    All other sequences should be invalid and should be ignored for future compatibility. Example why: TBL v1.0 doesn't support /y so some people might decide to turn this into literal / literal y. TBL v1.1 supports control sequence /y for something specific. You just broke upwards compatibility. Better to reserve all control sequences and then have nobody accidentally use them in his dumps.
  • You mix single quotes '' and double quotes "" to mean the same and different things in different parts of the document. For instance, you talk about '=' (which would be traditional notation for character) Yet (bold mine): Quote:
    [One] can use script formatting values like '\n' to do something[...]
    This should be double quoted, because you're talking about the string "\n" and not the character '\n' as a newline character in the actual table file.
  • 2.3. Make a better figure, explain "//" in dumps (is this Atlas format, etc). Make examples for all control codes.
  • 2.5. Make explicit what values are allowed as "label" (commas obviously aren't; are "formatting control codes"?). Also, you didn't give a name to the "label" part; you should. If you want to be a libertarian about the hexadecimal format, at least talk about insertion problems (the string might be representable with table entries!) and give a properly escaped sequence. If you build a reference with a bad example, many other peeps will still follow it.
  • Array entries and table switching should be combined. First off, the two are abstractly speaking the same and secondly, this seems specifically tailored to Han characters.
    Let me elaborate: Arrays can be thought of as table switching for the next N bytes and be implemented much in the same manner as table switching itself. While table switching is usually not limited by character number, it can be thought of as just being so for array entries. I therefore propose:
    • Unify the two. Drop the base offset for both, make the number of bytes optional
    • If possible, include an option to switch back to the old table once an invalid entry is recognized (I have seen games do just this).
    • Don't limit yourself to two tables. You can have more tables than that if you implement arrays as table switching.
    • See the added benefit of not having assumed a format about the array: Your format was specifically tailored for 1 byte entries where it now can have variable size and multi-byte entries just like tables. Of course the old idea is still possible, too.
    • Base offsets can be accomplished using different tables and don't assume an entry size either.
  • 3.1. assumes a dictionary format. While two bytes are common, there can be all sorts of dictionaries. I would personally therefore drop the bit about how to dump them, since this seems to make normative claims when it really shouldn't.
  • If you are talking about language specific problems, explain a bit about them before. Handakuten and dakuten are not common lingo for everybody. Also, it seems fairly relevant to point to this example in 2.1 or at least 2.0
  • 4.1. add Macintosh and CR only usage and possibly check if libraries can handle them as to give good advice
  • 4.3.4 is missing a word
  • 2.3. Fix "the the"
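
To make the collision question from the 2.1 item concrete, here is a small Python sketch of the "historical" longest-match behaviour, using the Five/Six/Seven entries from above. The function name `dump` and the `<$XX>` placeholder for unmatched bytes are assumptions made for this example; the draft itself would still have to specify both.

```python
def dump(data, table):
    """Decode bytes using the longest matching table entry at each position."""
    max_len = max(len(key) for key in table)
    out, pos = [], 0
    while pos < len(data):
        # Try the longest possible key first, then shorter ones.
        for length in range(min(max_len, len(data) - pos), 0, -1):
            chunk = data[pos:pos + length]
            if chunk in table:
                out.append(table[chunk])
                pos += length
                break
        else:
            # No entry matches: emit a placeholder (one possible policy).
            out.append(f"<${data[pos]:02X}>")
            pos += 1
    return "".join(out)


table = {b"\x00": "Five", b"\x01": "Six", b"\x00\x01": "Seven"}
collided = dump(b"\x00\x01", table)
reversed_order = dump(b"\x01\x00", table)
unknown = dump(b"\x00\x02", table)
```

With longest match, the sequence 0x00 0x01 dumps as "Seven" and never as "FiveSix"; a shortest-match rule would give the opposite result, which is why the document needs to pick one.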


This seems like a really good draft so far. However, you talk a whole lot about globalization efforts, yet you neglect complex script shaping.

In that regard, you should also forbid certain sequences that UTF-8 may contain, such as LTR and RTL marks.

Consider Arabic languages. We have seen some share of these translations on RHDN some time ago. Please refer to Chapter 08 of the Unicode standard and Technical Report 09, section 3.5 Shaping to get a basic understanding of what is required.

Basically, Arabic will use at least (but I'm not sure it's limited to) four different forms per letter depending on their position inside of words. There will usually be a need to implement these forms as different characters in NES ROMs, for example (since nobody will probably be writing a shaping engine on NES anytime soon). There should be a way to select each of these forms in the table file somehow and give them distinct string representations.

Now, you may say that this situation could be handled by simply putting "alif4" into the script, for example. However, as alif and 4 don't combine, you will possibly lose ZWJ and ZWNJ (Zero Width (Non) Joiner) in this process if you write the string by hand.

The same thing would possibly hold true for some Indian scripts such as Devanagari, so it's not - generally speaking - a language specific problem. There should be some standard way to address different glyph shapes and assign them different hexadecimal values.

If not, at least a viable approach should be given how to circumvent this issue (like in the case with Japanese).
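
One possible workaround, sketched in Python and offered only as an idea: Unicode already encodes the positional shapes of Arabic letters as separate code points (the Arabic Presentation Forms-B block), so a table could give each glyph its own distinct string that way. The hex values below are hypothetical.

```python
import unicodedata

# Hypothetical hex assignments for the four shapes of the letter BEH.
beh_forms = {"A0": "\uFE8F", "A1": "\uFE90", "A2": "\uFE91", "A3": "\uFE92"}

lines = []
for hex_code, glyph in beh_forms.items():
    # NFKC folds a presentation form back to its base letter (U+0628 here).
    base = unicodedata.normalize("NFKC", glyph)
    lines.append(f"{hex_code}={glyph}  ({unicodedata.name(glyph)}, base U+{ord(base):04X})")
```

Whether translators could comfortably type such forms in a script is a separate question, and this does nothing for scripts like Devanagari that have no equivalent block.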

cYa,

Tauwasser


Nightcrawler: I pulled the old "click Modify instead of Reply" trick. I think I recovered your full post.
« Last Edit: Aug 17th, 2010 at 11:14am by Nightcrawler »  
 
(Moderator: Nightcrawler)