Project - Table File Standard
KingMike
Re: 'Standard' Table File Format
Reply #15 - Aug 14th, 2010 at 9:50pm
 
As for handakuten (and dakuten), games that treat them as separate bytes seem more commonly to place the dakuten AFTER the main character, not before.
So, it would be like:

60=カ
607F=ガ
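This trailing-dakuten ordering falls out of ordinary longest-match lookup. A minimal sketch (the table contents and function name are illustrative, not from any standard):

```python
# Hypothetical sketch: decode bytes where the dakuten byte 0x7F follows
# the base character, using longest-sequence-first matching.
TABLE = {
    b"\x60": "カ",
    b"\x60\x7f": "ガ",  # base character + trailing dakuten byte
}

def decode(data: bytes) -> str:
    out, i = [], 0
    while i < len(data):
        for length in (2, 1):  # try the longer sequence first
            chunk = data[i:i + length]
            if len(chunk) == length and chunk in TABLE:
                out.append(TABLE[chunk])
                i += length
                break
        else:
            i += 1  # unmapped byte; skipped in this sketch
    return "".join(out)

print(decode(b"\x60\x7f\x60"))  # ガカ
```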
Nightcrawler
Re: 'Standard' Table File Format
Reply #16 - Aug 17th, 2010 at 11:12am
 
Tauwasser wrote on Aug 13th, 2010 at 3:54pm:
You forgot Big5 and HKSCS, both of which certainly do belong to most common encodings. If you plan on doing that (which might be a waste, because there are very few characters from these sets not in Unicode), at least think about language-tagging of some sorts.


They're not that common in our community, as far as I'm aware. UTF-8 is the default and primary encoding. Anything I add beyond that will be an extra gift under the tree.

Quote:
UTF-8 files should be required to have a BOM


Because the BOM is optional in many applications that work with UTF-8 text files, and 95% of the userbase won't even know what a BOM is (many don't even know what UTF-8 is), I do not agree with requiring one. One aim of the whole thing is as much backward compatibility as possible while still moving forward. Compatibility with older ASCII tables, even if technically 'wrong', is desirable. Asking too much is a sure way to get our community to do nothing at all. Baby steps. Wink

Quote:
There are of course pros and cons to using XML over a plain text file.


There are certainly many benefits to XML. However, the aim of this was to standardize what we have and move us forward while still maintaining as much compatibility as we can with existing table files and utilities. And again, I don't think a clean break to something completely incompatible and new works in our community. If we were ever to move to XML, I'd suggest doing so with a converter built into applications that could convert from the old format to the new.

Quote:
Generally, I would have preferred a TeX file for presentation of this. Plain text documentation is somewhat out of step with this day and age, and your figures in there become pretty hard to understand.


There's something to be said for a UTF-8 plain text document defining a file format using UTF-8 plain text.  Can't say I know much about TeX. Seems unnecessary for this document. PDF in general is a future option for bookmarks and nicer presentation. Also, there's the time factor. I probably won't want to do the rework for another format.

Good list here. Many items were addressed. A few remain for discussion.

  • Less screaming in 2.1.
  • 2.1 is missing a description of what happens when two entries collide, that is
    • 00=Five
    • 01=Six
    • 0001=Seven
    What happens in these instances? Which string will get dumped on the byte sequence
    0x00 0x01
    ?
    "FiveSix" or "Seven"? Historically it would be "Seven", but you would have to specify this.
  • 2.1 Define what happens on illegal sequences. Preferably, these should be ignored from a technical point of view.
    Shouldn't the decision to generate error or ignore be that of the utility and not the file format?
  • Section 2 is the only section whose caption format does not use tabs throughout. Also, it would be preferable if 2.2 were "Regular entries" and 2.3 not restricted to being "formatting".
    1. not sure what you mean by the caption format. 2. Normal/Regular are synonyms. Either would be OK, but I think 'normal' is slightly more appropriate here as it applies to a standard. 3. Agreed.
  • For escape sequences in 2.3. Don't use /r if it isn't what /r is in most programming languages. Historically, /n is newline while /r is carriage return. Defining your own escape sequences is good, but this labeling is misleading.
    Agreed. It is like this for compatibility with Cartographer, ROMJuice, and Atlas. Will discuss with other utility authors.

    Also, I do not understand the difference between /r and /n and your example is lacking.
  • Is there a reason /r and /n have spaces after them while /t has a tab after it in the table?
  • You lack documentation on how to insert / as a literal. Ideally this would be through double escaping: // for a literal /.
    All other sequences should be invalid and should be ignored, for future compatibility. An example why: TBL v1.0 doesn't support /y, so some people might decide to turn it into a literal / and a literal y. TBL v1.1 then adds the control sequence /y for something specific. You just broke forward compatibility. Better to reserve all control sequences and have nobody accidentally use them in their dumps.
  • You mix single quotes '' and double quotes "" to mean the same and different things in different parts of the document. For instance, you talk about '=' (which would be traditional notation for character) Yet (bold mine): Quote:
    [One] can use script formatting values like '\n' to do something[...]
    This should be double quoted, because you're talking about the string "\n" and not the character '\n' as a newline character in the actual table file.
  • 2.3. Make a better figure, explain "//" in dumps (is this Atlas format, etc). Make examples for all control codes.
  • 2.5. Make explicit what values are allowed as the "label" [commas obviously aren't; are "formatting control codes"?]. Also, you didn't give a name to the "label" part; you should. If you want to be libertarian about the hexadecimal format, at least talk about insertion problems (the string might be representable with table entries!) and give a properly escaped sequence. If you build a reference with a bad example, many other peeps will still follow it.
    Good point on being representable by table entries. Will discuss.
  • Array entries and table switching should be combined. First off, the two are abstractly speaking the same and secondly, this seems specifically tailored to Han characters.
    Let me elaborate: Arrays can be thought of as table switching for the next N bytes and be implemented much in the same manner as table switching itself. While table switching is usually not limited by character number, it can be thought of as just being so for array entries. I therefore propose:
    • Unify the two: drop the base offset for both and make the number of bytes optional.
    • If possible, include an option to switch back to the old table once an invalid entry is recognized (I have seen games do just this).
    • Don't limit yourself to two tables. You can have more tables than that if you implement arrays as table switching.
    • See the added benefit of not having assumed a format about the array: Your format was specifically tailored for 1 byte entries where it now can have variable size and multi-byte entries just like tables. Of course the old idea is still possible, too.
    • Base offsets can be accomplished using different tables and don't assume an entry size either.
    I like it, however I think there is difficulty in implementation or at least more to define. What if you have two table switches in the same table? One for a kanji array, one for a hiragana/katakana switch. How do you define that? How do you see the syntax for those cases?
  • 3.1. assumes a dictionary format. While two bytes are common, there can be all sorts of dictionary formats. I would personally drop the bit about how to dump them, since it makes normative claims when it really shouldn't.
  • If you are talking about language specific problems, explain a bit about them before. Handakuten and dakuten are not common lingo for everybody. Also, it seems fairly relevant to point to this example in 2.1 or at least 2.0
  • 4.1. add Macintosh and CR only usage and possibly check if libraries can handle them as to give good advice
  • 4.3.4 is missing a word
  • 2.3. Fix "the the"


Quote:
This seems like a really good draft so far. However, you talk a whole lot about globalization efforts, yet you neglect complex script shaping.


I really don't know enough about Arabic languages, script shaping, or the UTF-8 intricacies associated with them. I also don't intend to take up that course of study. If you want to write up something appropriate that should be included in the document, I can look at it for potential inclusion.

I would probably say I wouldn't like to require any more work from utility creators for special support of Arabic languages. We're already asking a lot, and it's an open question whether anyone will ever adopt this. And I'd certainly like to keep the abstraction level we have of never having to look at individual text characters after the initial table parse (even then it's limited).
ROMhacking.net - The central hub of the ROM hacking community.
Nightcrawler
Re: 'Standard' Table File Format
Reply #17 - Aug 17th, 2010 at 11:13am
 
KingMike wrote on Aug 14th, 2010 at 9:50pm:
As for handakuten (and dakuten), games that treat them as separate bytes seem more commonly to place the dakuten AFTER the main character, not before.
So, it would be like:

60=カ
607F=ガ


Handled the same I think. Updated document to reflect.
DaMarsMan
Re: 'Standard' Table File Format
Reply #18 - Aug 17th, 2010 at 2:44pm
 
Hmmm I don't have many thoughts but here they are.

2.1 Encoding: UTF8 only. Moving away from the older utilities is a must. We should push the best utilities to update their source (WindHex and others). We can't let old standards hold back the community!!!! I think we both agree here.   Grin

2.2 Normal Entries: I would include something about priorities in here (I don't know if I missed it). I know it's up to the inserter, but I think Atlas handles it well, and it's probably worth mentioning. For an explanation, see this thread.
http://www.romhacking.net/forum/index.php/topic,8108.0.html

2.3 Control Codes: I don't get what is going on with "\r" here. Shouldn't someone who wanted comments after just use "\n//"?

FE=<linebreak>\n//
/FF=<end>\n\n\n//

Shouldn't this produce the same thing?

2.7      Dual Table Files: I agree with Tauwasser here. Let's not limit it to two. I've had cases where I needed multiple table files.

What if we had another table format, maybe "tbp", that was a pack of multiple tables in one file? You could use something like "TABLE=English" to divide them up. It could be a cool feature to load one table pack into a hex editor and flip between the tables, or to load one table file into an Atlas script and jump between different parts of it (useful for inserting the original Japanese in untranslated parts). It's just an idea to extend things a bit.

That's really all I've got. I don't think we really need documentation for other languages. The more complex this standard gets, the less likely people will want to implement it in programs. I like the part about Japanese, because the majority of games come from Japan, so it makes sense. Arabic games and games with strange encodings should probably have custom inserters.
Dragon Quest 5 for ps2 hacker/translator....&&
Nightcrawler
Re: 'Standard' Table File Format
Reply #19 - Aug 17th, 2010 at 5:38pm
 
DaMarsMan wrote on Aug 17th, 2010 at 2:44pm:
Hmmm I don't have many thoughts but here they are.

2.1 Encoding: UTF8 only. Moving away from the older utilities is a must. We should push the best utilities to update their source (WindHex and others). We can't let old standards hold back the community!!!! I think we both agree here.   Grin

2.2 Normal Entries: I would include something about priorities in here (I don't know if I missed it). I know it's up to the inserter, but I think Atlas handles it well, and it's probably worth mentioning. For an explanation, see this thread.
http://www.romhacking.net/forum/index.php/topic,8108.0.html


Agreed on both. I'm not sure I understand Klarth's comment in the topic you provided, but the longest entry will take precedence and handle that situation. It's the same for the hex side.

Table:
00=Five
01=Six
0001=Seven

If a byte sequence 0x00 0x01 is encountered, the string "Seven" should be mapped as the result and not any other combination regardless of table order.

Table:
12=Five
13=Six
0001=FiveSix

If the text 'FiveSix' is encountered, it will map as byte sequence $00 $01 regardless of the order it appears in the table.

That's the desired way functionally, my preference, and the way I understand Atlas is supposed to do it based on the code.
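A sketch of this longest-hex-key rule as a dumper might implement it (the table and helper names are illustrative, not from the draft):

```python
# Minimal longest-match dump: the longest hex key always wins, so
# 0x00 0x01 maps to "Seven" regardless of table order.
HEX_TO_TEXT = {b"\x00": "Five", b"\x01": "Six", b"\x00\x01": "Seven"}

def dump(data: bytes) -> str:
    out, i = [], 0
    max_len = max(len(k) for k in HEX_TO_TEXT)
    while i < len(data):
        for length in range(max_len, 0, -1):  # longest hex key first
            chunk = data[i:i + length]
            if len(chunk) == length and chunk in HEX_TO_TEXT:
                out.append(HEX_TO_TEXT[chunk])
                i += length
                break
        else:
            i += 1  # unmapped byte; a real tool would flag this
    return "".join(out)

print(dump(b"\x00\x01"))  # Seven, never FiveSix
```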

Quote:
2.3 Control Codes: I don't get what is going on with "\r" here. Shouldn't someone who wanted comments after just use "\n//"?

FE=<linebreak>\n//
/FF=<end>\n\n\n//

Shouldn't this produce the same thing?


I would think so, yes. Blame that on Cartographer. That's what it does. The redundancy didn't dawn on me. That should simplify escape codes to nothing but line breaks with the standard "\n" value.

Quote:
2.7      Dual Table Files: I agree with Tauwasser here. Let's not limit it to two. I've had cases where I needed multiple table files.

What if we had another table format, maybe "tbp", that was a pack of multiple tables in one file? You could use something like "TABLE=English" to divide them up. It could be a cool feature to load one table pack into a hex editor and flip between the tables, or to load one table file into an Atlas script and jump between different parts of it (useful for inserting the original Japanese in untranslated parts). It's just an idea to extend things a bit.

That's really all I've got. I don't think we really need documentation for other languages. The more complex this standard gets, the less likely people will want to implement it in programs. I like the part about Japanese, because the majority of games come from Japan, so it makes sense. Arabic games and games with strange encodings should probably have custom inserters.


I'd still like to hear of syntax on this table switch idea and details on how it would be implemented and operate based on my questions in green on the list. We don't want to get carried away and don't want to force a complicated implementation on the programmer.
DaMarsMan
Re: 'Standard' Table File Format
Reply #20 - Aug 17th, 2010 at 5:51pm
 
Nightcrawler wrote on Aug 17th, 2010 at 5:38pm:
Agreed on both. I'm not sure I understand Klarth's comment in the topic you provided, but the longest entry will take precedence and handle that situation. It's the same for the hex side.

Table:
00=Five
01=Six
0001=Seven

If a byte sequence 0x00 0x01 is encountered, the string "Seven" should be mapped as the result and not any other combination regardless of table order.

Table:
12=Five
13=Six
0001=FiveSix

If the text 'FiveSix' is encountered, it will map as byte sequence $00 $01 regardless of the order it appears in the table.

That's the desired way functionally, my preference, and the way I understand Atlas is supposed to do it based on the code


This isn't really what I was getting at... Take a look at this scenario.

12=Five
13=Six
00=FiveSix

Here we don't have a longest hex string... Should this produce 1213 or 00? I believe that, according to that thread, Atlas would output it as 1213 because of the order of the table.

 
Tauwasser
Re: 'Standard' Table File Format
Reply #21 - Aug 18th, 2010 at 7:45am
 
Nightcrawler wrote on Aug 17th, 2010 at 11:12am:
Because the BOM is optional in many applications that work with UTF-8 text files, and 95% of the userbase won't even know what a BOM is (many don't even know what UTF-8 is), I do not agree with requiring one.


While this does preserve ASCII compatibility (since plain ASCII files can be read as UTF-8), using a BOM shouldn't be forbidden, and it would also immensely help with identifying the encoding in case you want to support other encodings. At least the UTFs are uniquely identifiable by their BOMs.

Quote:
There are certainly many benefits to XML. However, the aim of this was to standardize what we have and move us forward while still maintaining as much compatibility as we can with existing table files and utilities. And again, I don't think a clean break to something completely incompatible and new works in our community. If we were ever to move to XML, I'd suggest doing so with a converter built into applications that could convert from the old format to the new.


Did you at least think about my proposal of allowing longer sequences separated by a colon before the hexadecimal? Just saying that I do prefer "linebreak:FF" instead of "\FF".

Quote:
There's something to be said for a UTF-8 plain text document defining a file format using UTF-8 plain text.


This isn't the issue. Of course a plain-text-like presentation is good and can be kept with LaTeX as well.

Quote:
Seems unnecessary for this document. PDF in general is a future option for bookmarks and nicer presentation. Also, there's the time factor. I probably won't want to do the rework for another format.


Your figures and explanations would benefit greatly from this. So it doesn't seem unnecessary to me.

Quote:
  • 2.1 Define what happens on illegal sequences. Preferably, these should be ignored from a technical point of view.
    Shouldn't the decision to generate error or ignore be that of the utility and not the file format?


You define what is supposed to happen for the sequences you do talk about, so why not make explicit what should happen if a control sequence such as "\<byte>" is encountered more than once?
You describe how tools should behave in there anyway, so why not make the standard tell tool makers how to handle incorrect data in a standard fashion, so that the reference behavior is what users get from every tool.
This does not mean that tool makers cannot give users dialogs to choose what they want or to alter behavior. It just means that when the user presses the "dump it anyway biatch" button, they get what you describe in the document.

Quote:
  • Section 2 is the only section whose caption format does not use tabs throughout.
    1. not sure what you mean by the caption format.

(Screenshot: http://img36.imageshack.us/i/inconsistentheaders.png)

(Done with Paint.NET for your Lulz)

Quote:
  • For escape sequences in 2.3. Don't use /r if it isn't what /r is in most programming languages. Historically, /n is newline while /r is carriage return. Defining your own escape sequences is good, but this labeling is misleading.
    Agreed. It is like this for compatibility with Cartographer, ROMJuice, and Atlas. Will discuss with other utility authors.


Quote:
Quote:
2.3 Control Codes: I don't get what is going on with "\r" here. Shouldn't someone who wanted comments after just use "\n//"?

FE=<linebreak>\n//
/FF=<end>\n\n\n//

Shouldn't this produce the same thing?


I would think so, yes. Blame that on Cartographer. That's what it does. The redundancy didn't dawn on me. That should simplify escape codes to nothing but line breaks with the standard "\n" value.


I believe this is dealt with.

Quote:
  • Array entries and table switching should be combined. First off, the two are abstractly speaking the same and secondly, this seems specifically tailored to Han characters.
    Let me elaborate: Arrays can be thought of as table switching for the next N bytes and be implemented much in the same manner as table switching itself. While table switching is usually not limited by character number, it can be thought of as just being so for array entries. I therefore propose:
    • Unify the two: drop the base offset for both and make the number of bytes optional.
    • If possible, include an option to switch back to the old table once an invalid entry is recognized (I have seen games do just this).
    • Don't limit yourself to two tables. You can have more tables than that if you implement arrays as table switching.
    • See the added benefit of not having assumed a format about the array: Your format was specifically tailored for 1 byte entries where it now can have variable size and multi-byte entries just like tables. Of course the old idea is still possible, too.
    • Base offsets can be accomplished using different tables and don't assume an entry size either.
    I like it, however I think there is difficulty in implementation or at least more to define. What if you have two table switches in the same table? One for a kanji array, one for a hiragana/katakana switch. How do you define that? How do you see the syntax for those cases?


You are right, of course. There is more to define. I like the following approach:

Have each table have a unique (within the set of used tables) id number.

Code:
id:TBL001 



You would then only need to associate each table with this ID and on switch commands tell which table to switch to.

Within TBL001:
Code:
switchTable:F8=TBL002 



Within TBL002:
Code:
switchTable:F8=TBL001 



(elaborating on your hiragana/katakana example). As you can see, this would be pretty redundant (albeit doable) for only two tables, so I suggest the table ID could be optional for the simplified two-table case (it can be disambiguated at runtime by the dumper quite easily).

If you load more tables without markings, the dumper can decide what to do, for instance by prompting the user for which table each code should switch to.

For the kanji arrays, this would simplify to the following (again, elaborating on your example):

In TBL001:
Code:
switchTable:C0=KANJI001
switchTable:C1=3, KANJI002 



In KANJI001:
Code:
switchTable:XX=TBL001 



In KANJI002 you don't need a code, since it changes back (to the last used table) after 3 table matches (not bytes) regardless. This also means you can set up circular table changing, TBL001 --> KANJI001 --> KANJI002 --> KANJI001 --> TBL001 --> KANJI002 --> KANJI001, etc., should you wish.

The kanji tables themselves can easily be computed from the current tables by adding your 0xC000 offset to their entries.
Notice also that there is no need for offsets of any kind.
To further simplify this process, one might designate a length of 0 matched table entries to mean "change back on the first table entry not found in the table".

In HIRA:

Code:
00=あ
01=い
02=う
03=HIRO
switchTable:F8=0,KATA
switchTable:F9=0,KANJI 



In KATA:
Code:
00=ア
01=イ
02=ウ
switchTable:F8=0,HIRA
switchTable:F9=0,KANJI 



In KANJI:

Code:
00=亜
01=意
 



You would start in table HIRA (by load order, for instance, or by user designation; this is the tool's choice).

0xF8 0x00 0x01 0x02 0xF9 0x01 0x03

HIRA --> KATA --> ア --> イ --> ウ --> KANJI --> 意 --> 0x03 fallback to KATA --> 0x03 fallback to HIRA --> HIRO.

This would make for a sound table switching routine that is a fairly easy-to-implement data table search in most programming languages (including .NET and C++ with the STL).
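The walkthrough above can be sketched in a few lines. This is only my reading of the proposal (table names from the example; a count of 0 is treated as "fall back to the previous table on the first byte not found", and the bounded-count kanji case is left out for brevity):

```python
# Rough sketch of the switchTable idea: a stack of active tables, where a
# byte not found in the current table falls back to the previous one.
TABLES = {
    "HIRA": {0x00: "あ", 0x01: "い", 0x02: "う", 0x03: "HIRO"},
    "KATA": {0x00: "ア", 0x01: "イ", 0x02: "ウ"},
    "KANJI": {0x00: "亜", 0x01: "意"},
}
SWITCHES = {  # per table: switch byte -> target table (all count 0 here)
    "HIRA": {0xF8: "KATA", 0xF9: "KANJI"},
    "KATA": {0xF8: "HIRA", 0xF9: "KANJI"},
    "KANJI": {},
}

def dump_with_switching(data, start="HIRA"):
    stack, out = [start], []
    for b in data:
        while True:
            table = stack[-1]
            if b in SWITCHES[table]:
                stack.append(SWITCHES[table][b])  # switch tables
                break
            if b in TABLES[table]:
                out.append(TABLES[table][b])
                break
            if len(stack) > 1:
                stack.pop()  # not found: fall back to the previous table
            else:
                break  # unmapped even in the start table; skip the byte
    return "".join(out)

# The walkthrough: HIRA -> KATA, three kana, -> KANJI, then fall back twice
print(dump_with_switching([0xF8, 0x00, 0x01, 0x02, 0xF9, 0x01, 0x03]))
```

Run against the example byte stream, this reproduces the HIRA --> KATA --> ア --> イ --> ウ --> KANJI --> 意 --> fallback --> fallback --> HIRO sequence.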

Quote:
I really don't know enough about Arabic languages, script shaping, or the UTF-8 intricacies associated with them. I also don't intend to take up that course of study. If you want to write up something appropriate that should be included in the document, I can look at it for potential inclusion.


Sadly, I'm not sure myself. I just know that traditional methods don't work quite well and it takes a lot of custom tools to produce the right output in the end, mostly replacing stuff with numbers etc. while propagating shaping differences along... It's really unsatisfactory.

Also, this could potentially be used for kerning pairs in VWFs. At least I have implemented a few myself, but usually they will be explicit kerning pairs, that is they use two different byte representations for the same gfx data. This could be made available in a table format, I think.

So "VA" might produce different output than "VX" based on kerning - just like in so many modern fonts Cheesy

cYa,

Tauwasser
Nightcrawler
Re: 'Standard' Table File Format
Reply #22 - Aug 18th, 2010 at 7:48am
 
What I said all still holds; that's not a different scenario. When mapping in the direction of text to hex, you use the longest TEXT string key. When mapping in the direction of hex to text, the longest HEX string key.

Table order is immaterial. Using your example, the string "FiveSix" would be output as $00.

Make sense?

Atlas is supposed to work the same way with 'longest' keys according to the source code. See Table.h/Table.cpp.
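The text-to-hex direction follows the same longest-key rule; a minimal illustrative sketch (names are mine, not Atlas's):

```python
# Longest-text-key insertion: "FiveSix" maps to the single byte 0x00,
# never to 0x12 0x13, regardless of table order.
TEXT_TO_HEX = {"Five": b"\x12", "Six": b"\x13", "FiveSix": b"\x00"}

def insert(text: str) -> bytes:
    out, i = bytearray(), 0
    max_len = max(len(k) for k in TEXT_TO_HEX)
    while i < len(text):
        for length in range(max_len, 0, -1):  # longest text key first
            key = text[i:i + length]
            if len(key) == length and key in TEXT_TO_HEX:
                out += TEXT_TO_HEX[key]
                i += length
                break
        else:
            i += 1  # unmapped character; a real tool would error here
    return bytes(out)

print(insert("FiveSix").hex())  # 00
```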
Back to top
 

ROMhacking.net - The central hub of the ROM hacking community.
WWW  
IP Logged
 
DaMarsMan
Re: 'Standard' Table File Format
Reply #23 - Aug 18th, 2010 at 4:26pm
 
Okay gotcha.  Cheesy
 
Gil Galad
Re: 'Standard' Table File Format
Reply #24 - Aug 21st, 2010 at 5:26am
 
I just read through all of this. This new format could get pretty complicated, and I'd advise against that. But I guess some of it is needed to handle many different platforms and languages.

I just don't really have much to say about it right now, though. Perhaps I'll know more once this theory is put into practice and I can see how it works.

All the encoding details are slightly confusing to me. Before, I would just use SJIS or EUC and not worry about anything else, because it's simple and practical (at least for me).

I think Cartographer had a good idea for table file switching: you have different sections for dumping various areas of the ROM, and, if I remember correctly, you can assign a different table file to each section you want to dump.

 
Nightcrawler
Re: 'Standard' Table File Format
Reply #25 - Aug 23rd, 2010 at 11:36am
 
Tauwasser wrote on Aug 18th, 2010 at 7:45am:
While this does preserve ASCII compatibility (since plain ASCII files can be read as UTF-8), using a BOM shouldn't be forbidden, and it would also immensely help with identifying the encoding in case you want to support other encodings. At least the UTFs are uniquely identifiable by their BOMs.


Absolutely. BOM should be allowed and encouraged. It's just not required. I will add a line about this.
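The "allowed but not required" policy is cheap for tools to support; a small sketch of BOM sniffing (function name is mine, the BOM constants are Python's standard ones):

```python
# Detect a Unicode BOM if present, otherwise assume UTF-8 (which also
# covers plain ASCII table files).
import codecs

BOMS = [
    (codecs.BOM_UTF8, "utf-8-sig"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),  # check UTF-32 before UTF-16:
    (codecs.BOM_UTF32_BE, "utf-32-be"),  # its LE BOM starts with UTF-16's
    (codecs.BOM_UTF16_LE, "utf-16-le"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
]

def sniff_encoding(raw: bytes) -> str:
    for bom, name in BOMS:
        if raw.startswith(bom):
            return name
    return "utf-8"  # no BOM: fall back to plain UTF-8

print(sniff_encoding(codecs.BOM_UTF8 + b"00=Five"))  # utf-8-sig
print(sniff_encoding(b"00=Five"))                    # utf-8
```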

Quote:
Did you at least think about my proposal of allowing longer sequences separated by a colon before the hexadecimal? Just saying that I do prefer "linebreak:FF" instead of "\FF".


Nothing wrong with the idea. I brought it up with Klarth, but I think we're going to end up keeping single special character parsing for ease of implementation and backward compatibility.

Quote:
Your figures and explanations would benefit greatly from this. So it doesn't seem unnecessary to me.


Klarth said he may pretty it up into a PDF. I just don't have the motivation now... Maybe it will come back after some time has passed, but document formatting isn't fun for me.

Quote:
You define what is supposed to happen for the sequences you do talk about, so why not make explicit what should happen if a control sequence such as "\<byte>" is encountered more than once?
You describe how tools should behave in there anyway, so why not make the standard tell tool makers how to handle incorrect data in a standard fashion, so that the reference behavior is what users get from every tool.
This does not mean that tool makers cannot give users dialogs to choose what they want or to alter behavior. It just means that when the user presses the "dump it anyway biatch" button, they get what you describe in the document.


I see your point. I've run it by Klarth. So far duplicate entries, empty entries, and invalid syntax should generate an error.

Quote:
(Done with Paint.NET for your Lulz)


Got it. Will fix in next revision. Grin

Quote:
I believe this is dealt with.

Yes, but because we only have one now ("\n"), we may 'cheap out' on this and not reserve all escape sequences, or allow a literal "\n" in the script, for ease of implementation. It's 'wrong' to do this, but requiring full escape code parsing and handling just for this is probably too much to put on the programmer. We're probably already pushing our luck with what we have, if anyone outside my small group is going to adopt this at all.

Quote:
You are right, of course. There is more to define. I like the following approach:


I like it. I think it's a pretty powerful feature that could cover different implementations of kanji/han, handakuten/dakuten, hiragana, katakana, and even dictionaries. The added programming complexity is relatively low, but it may still be a hard sell. I've pointed Klarth to your example to see if I can get him to agree. (He didn't like the Kanji Array entry to begin with.)

Quote:
Also, this could potentially be used for kerning pairs in VWFs. At least I have implemented a few myself, but usually they will be explicit kerning pairs, that is they use two different byte representations for the same gfx data. This could be made available in a table format, I think.

So "VA" might produce different output than "VX" based on kerning - just like in so many modern fonts Cheesy


You can already handle this with the table format using explicit pairs like 12="VA", right? If so, that's probably good enough. The idea can be put in a locker until we're ready for the next-generation table format, in which more complicated scenarios would be on the table.
 
Nightcrawler
Re: 'Standard' Table File Format
Reply #26 - Sep 16th, 2010 at 2:57pm
 
OK, a new draft is up! I've gone through everything here to date, including my conversations with Klarth, and updated accordingly. It probably needs a bit of editing, but content-wise it's all there. The only other things I may add are specifying what to do when an entry is not found and how hex output should look.

Some items addressed:
  • Byte Order Mark (BOM)
  • Hex/Text Collisions
  • Illegal Sequences/Syntax Error
  • Table Switching Section
  • Edited common situations to include table switching
  • Edited format control to "\n" only.


I admit it's starting to get a bit unruly in straight text format; it's hard to keep it consistent and readable. I will likely end up slapping it into Word and making a PDF, although it would be nice if someone else would do that part and pretty it up.

So, any other thoughts or suggestions?
 
DaMarsMan
Re: 'Standard' Table File Format
Reply #27 - Sep 20th, 2010 at 9:53am
 
Okay... I've thought about the \n, and here is my concern. Let's say you are dumping Japanese text and you need it commented:

Code:
//Japanese text here.<line>
//Japanese text here.<end>
 



You could do something like:
FE=<line>\n//

However, your text would output like this...


Code:
Japanese text here.<line>
//Japanese text here.<end>
 



Keep in mind that for the end tag you could do...

FF=<end>\n\n//

That would fix it for every single entry besides the first. It's not too much trouble to go in and add the first // after the dump. Maybe you want something like this though...

FE=<line>\n\c{//}
FF=<end>\n\n\c{//}

Here I propose a comment system: a directive that tells the dumper to add a comment at the beginning of that line when dumping, and lets the user specify which comment style to use.

I believe you have discussed leaving this up to the actual script dumper before. That is certainly an option, and I can see how this sort of thing could cause a problem. However, if you are leaving it up to the script dumper, maybe \n should be removed too. Where should the line be drawn?
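To make the mismatch concrete, here is a tiny sketch of a dumper applying those entries; the byte values, the <line>/<end> mappings, and the ASCII fall-through are hypothetical. Note how the very first line of each dump comes out uncommented:

```python
# Hypothetical mappings: 0xFE/0xFF and the "//" comment decoration are
# invented for illustration; plain ASCII bytes pass straight through.
TABLE = {
    0xFE: "<line>\n//",
    0xFF: "<end>\n\n//",
}

def dump(data, table):
    """Map each byte through the table, falling back to ASCII."""
    return "".join(table.get(b, chr(b)) for b in data)
```

Because the "//" only ever appears *after* a control byte, the first physical line of output is never prefixed, which is exactly the hole described above.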
Dragon Quest 5 for PS2 hacker/translator.
Nightcrawler
Re: 'Standard' Table File Format
Reply #28 - Sep 21st, 2010 at 10:37am
 
Yes, I don't believe the table file needs to have any concept of what a comment system is; keep the abstraction. It's also an extremely specific post-processing behavior that would occur after table mapping. With your idea, you'd dump until you hit an end token; then, depending on what it was, you'd have to go back to the beginning of the string and comment each line accordingly. I think everything we have now can be done in the mapping stage without interrupting forward flow.

I would agree with removing it entirely, as controls don't necessarily belong in the table file either, by pure definition. However, a line break is really just a character that can be used in any table line; it is arguably part of the mapping (for dumping, anyway). Also, we have ventured a bit into a gray area with our table switching, linked entries, and \n in order to provide a high-level, standard, simple solution to things that appear in every script. So we have encroached on how things should be dumped or inserted, but the benefits of standardizing these things and the flexibility of the solutions presented outweigh the cons.

With that said, what we have does keep a very high abstraction level regardless, and I would like to be absolute about keeping it.


Solution:

I ran into this issue while developing my utility that implements this standard. I believe the solution is extremely simple: any dumper worth anything has some sort of header or template ability. Mine is user-defined (allowing use of some variables) and is output at the top of all dumped files.

Example:
Quote:
//Game Name:    GameNameHere
//Source File: $file
//Block:     $block
//Block Range:   $start - $stop
//$text


So, as you can see, that takes care of the issue.

Even a 'cheap' dumper could just output a single comment character (or none, based on an option) to be compatible.

It seems like you'd either go in that direction or go in the direction of removing it entirely. However, I'd rather not remove it entirely, as line breaks can be used in any table entry, control code, end token, etc. It is part of the mapping if you look at it that way.
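A rough sketch of such a user-defined dump header, using Python's string.Template for the $-style variables; the exact variable set and the hex formatting are my assumptions, not what any particular dumper actually offers:

```python
from string import Template

# Hypothetical header template mirroring the example above; a real dumper
# would define its own variable names and substitution rules.
HEADER = Template(
    "//Game Name:   $game\n"
    "//Source File: $file\n"
    "//Block:       $block\n"
    "//Block Range: $start - $stop\n"
)

def render_header(game, file, block, start, stop):
    """Expand the template once at the top of each dumped file."""
    return HEADER.substitute(game=game, file=file, block=block,
                             start="0x%X" % start, stop="0x%X" % stop)
```

Since every template line begins with the comment marker, the first line of the dump is always commented, closing the gap that the \n// table entries alone leave open.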
DaMarsMan
Re: 'Standard' Table File Format
Reply #29 - Sep 22nd, 2010 at 10:52am
 
I can see what you mean about making an exception for something that is almost always needed.

I would say that the best approach would probably be to have dump configuration file standards. I can see how you would have a problem with something like this though...

FE=\n<line>\n//

In this case the dump controls have to be mixed in with the table to get the proper output. Maybe if there were a dump configuration file, you could have it override the table entries for the control characters. With our current method, viewing these controls in a hex editor can get kind of nasty when every instance of FE is shown as the string above. An external, separate configuration file could solve some of these issues.
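A minimal sketch of that split, with invented entries: the table keeps the bare control names, and a separate dump configuration layers the comment decoration on top only when dumping, so the hex-editor view stays clean:

```python
# Hypothetical division of labor between a clean table and a dump config.
TABLE = {0xFE: "<line>", 0xFF: "<end>"}          # what a hex editor shows
DUMP_CONFIG = {0xFE: "<line>\n//",               # applied only at dump time
               0xFF: "<end>\n\n//"}

def effective_table(table, dump_overrides=None):
    """Merge dump-time overrides on top of the base table."""
    merged = dict(table)
    if dump_overrides:
        merged.update(dump_overrides)
    return merged
```

The hex editor would load only TABLE, while the dumper would load effective_table(TABLE, DUMP_CONFIG); both agree on the hex-to-text mapping, and only the dump output carries the decoration.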