Parsing Delimiters (theory)

  • #76
    More RegEx

    I'm kinda stuck on a RegEx at the moment so, I think I will give you a tutorial on it. Generally when I ramble on in this thread it actually turns into something on my end. If you don't want to learn RegEx, don't read any further.

    I'd like to start this off with the fact that I technically don't know shit about RegEx. I approach RegEx the same way I approach .bat files. I learn as much as I need to know to get my current goal accomplished and no more.

    RegEx is short for Regular Expression. I don't know who came up with that name but, it is incredibly deceiving. It sounds easy, it's not. It's some confusing, annoying crap that drives me gabonkers.

    RegEx is a very symbolic language. The same symbol can mean a lot of things depending on how it's used. Here is a for instance.

    Now you may be looking at that thinking "How is that a versus? They are totally different." Well, that's because I haven't explained it yet. Hold your horses. In the first example the parentheses represent a group. The regex engine will look for the first instance of :{. In the second example the parentheses are more like an if statement. It's technically called a lookahead. What it does is match as many letter characters as are in a row until it reaches some other character, then it checks to see if that other character is a colon. If it is, it returns the word (without the colon). If not, it's null.
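
    A minimal sketch of the group-versus-lookahead difference, written in TypeScript because its regex literals read the same as AS3's. The sample string and the two patterns are assumptions picked for illustration, not the exact expressions from this post.

```typescript
// Hypothetical sample input; not taken from the parser.
const sample = "likes:{music}";

// Parentheses as a capturing group: the match consumes ":{" and
// the group remembers that piece separately.
const grouped = sample.match(/[a-zA-Z]+(:\{)/);
const groupedWhole = grouped ? grouped[0] : null;   // "likes:{"
const groupedPart = grouped ? grouped[1] : null;    // ":{"

// Parentheses with ?= as a lookahead: the colon has to be there,
// but it is not part of what comes back.
const peeked = sample.match(/[a-zA-Z]+(?=:)/);
const peekedWord = peeked ? peeked[0] : null;       // "likes"

// No run of letters is followed by a colon here, so there is no match at all.
const noColon = "likes music".match(/[a-zA-Z]+(?=:)/); // null
```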

    Another example of characters having multiple uses is the question mark. The question mark in the above example is checking if something exists on a true/false level. The question mark in the bottom example is signifying that everything in the parentheses before it might not exist... and that's ok, look for the next stuff.

    the above would find word: or word[0]:
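
    A small sketch of the optional group, assuming a pattern of letters, an optional [digits] part and a colon (the same shape that shows up later in this thread). The helper name `firstMember` is made up.

```typescript
// Assumed pattern: letters, an optional [digits] part, then a colon.
const memberPattern = /[a-zA-Z]+(\[\d+\])?:/;

// Hypothetical helper: first match as a string, or null.
function firstMember(text: string): string | null {
	const found = text.match(memberPattern);
	return found ? found[0] : null;
}

firstMember("word:1");     // "word:"
firstMember("word[0]:1");  // "word[0]:"
firstMember("word[x]:1");  // null, "[x]" is not [digits] and no colon follows the letters
```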

    That's a little bit of the complex stuff. Let's look at some simpler stuff. All along I have been throwing a chunk of the alphabet between some brackets and I never explained it.

    the above regEx will find one letter, case insensitive, between and including a to Z. By adding a plus sign after, it will go to the first letter and then get all the letters that come after it, until it hits something that isn't a letter.
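
    To make the bracket-plus-plus-sign idea concrete, here is a tiny sketch. The input strings and the helper `firstHit` are invented for the example.

```typescript
const oneLetter = /[a-zA-Z]/;   // exactly one letter, either case
const letterRun = /[a-zA-Z]+/;  // one or more letters in a row

// Hypothetical helper: first match as a string, or null.
function firstHit(pattern: RegExp, text: string): string | null {
	const found = text.match(pattern);
	return found ? found[0] : null;
}

firstHit(oneLetter, "42 apples!");  // "a"
firstHit(letterRun, "42 apples!");  // "apples"
firstHit(letterRun, "1234");        // null, no letters anywhere
```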

    There are more symbols we can add here but, I don't really understand them and every time I add them into my regEx it breaks everything. For shits and giggles here's something you can do, that I don't get.

    supposedly that is to get a whole word. When I add those symbols I get null and when I get rid of them I get a whole word...ooooooh wait, hmm I think I just realized something but the shit doesn't ever work so it doesn't really matter.

    lookahead not true - only get the word if it isn't followed by a colon
    lookbehind - can't get this to work for me. This was my AHA! earlier in this post. Maybe if I use the whole word symbols this will work because, it will look behind the whole word, as opposed to the last letter, which is what I fear is really happening.

    Only get words that do not have a comma before them
    only get words that do have a comma before them
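
    A sketch of why negative lookahead and lookbehind seem to "not work" without the whole-word symbols: the engine is allowed to give back or skip a letter to make the check pass. The input line and helper are made up, and the lookbehind patterns are built from strings only so that older TypeScript targets don't complain. As a literal the last-but-one would read /(?<!,)\b[a-zA-Z]+\b/g.

```typescript
// Hypothetical helper: every match of a global pattern, or an empty list.
function allHits(pattern: RegExp, text: string): string[] {
	const found = text.match(pattern); // pattern must carry the g flag
	return found ? found.slice() : [];
}

const line = "name:bob,age"; // invented input

// Lookahead "not followed by a colon", no boundaries:
// the engine drops the "e" of "name" so the check passes.
const loose = allHits(/[a-zA-Z]+(?!:)/g, line);              // ["nam", "bob", "age"]

// Same idea with word boundaries: only whole words qualify.
const notBeforeColon = allHits(/\b[a-zA-Z]+\b(?!:)/g, line); // ["bob", "age"]

// Lookbehind "not preceded by a comma", no boundaries:
// it just starts one letter later, inside "age".
const looseBehind = allHits(new RegExp("(?<!,)[a-zA-Z]+", "g"), line);         // ["name", "bob", "ge"]

// With boundaries the lookbehind applies to the whole word.
const notAfterComma = allHits(new RegExp("(?<!,)\\b[a-zA-Z]+\\b", "g"), line); // ["name", "bob"]
const afterComma = allHits(new RegExp("(?<=,)\\b[a-zA-Z]+\\b", "g"), line);    // ["age"]
```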
    There are shorthand expressions as well

    \d = 1 digit
    \d+ = as many digits as there are in a row
    \w = 1 word character (a letter, digit or underscore), not a whole word
    \w+ = as many word characters as there are in a row
    \s = 1 whitespace character (space, tab, return or new line)
    \r = return
    \t = tab
    \n = new line
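
    A quick sketch of the shorthands in action. The input string is invented and has a tab in it so that \s has more than plain spaces to chew on.

```typescript
const entry = "id = 1234\tok_2"; // made-up input

const firstDigit = (entry.match(/\d/) || [""])[0];   // "1"
const digitRun = (entry.match(/\d+/) || [""])[0];    // "1234"
const wordRuns = entry.match(/\w+/g) || [];          // ["id", "1234", "ok_2"]
const pieces = entry.split(/\s+/);                   // ["id", "=", "1234", "ok_2"]
```

    Note that \w+ happily runs through digits and underscores, so "ok_2" comes back as one piece.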

    \b = word boundary. used like this
    A word boundary sits between a word character and a non-word character (or at the start/end of the string). Let's take the next sentence and apply it to it.

    "This is an island" - word boundary will only match the word "is", because the "is" in "this" and "island" have word characters on both sides of one of the boundaries.

    [space]\bis\b[space] = yes cause space is not a word character
    Th\bis\b[space] = the last part is good but the first part is bad so this will not match
    [space]\bis\bland - opposite of above

    note: the above 3 lines are not a regEx. It is an example of how the regEx will fall in the string and why it will or will not work.
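
    The same sentence run through an actual expression, as a sketch. Variable names are made up.

```typescript
const sentence = "This is an island";

const bounded = sentence.match(/\bis\b/g) || [];  // ["is"], only the standalone word
const unbounded = sentence.match(/is/g) || [];    // three hits: "This", "is", "island"
const position = sentence.search(/\bis\b/);       // 5, the standalone "is"
```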

    Anyway, I told you I don't really know regEx and now you know as much as I do. Oh wait, no you don't.

    We can get every instance of a pattern with the global flag

    That pattern will return an array of every time :{ appears in the string. An array like this - array(":{",":{",":{") may seem completely useless, but that count is actually the backbone of my system for finding the proper close delimiter. Also, having a regex spit the count back to you (via array length) is way faster than counting them in a loop.
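
    A sketch of counting openers with the global flag. The function name and input are invented. In TypeScript `match` gives back `null` when nothing matches, so the sketch guards for that before reading the length.

```typescript
// Hypothetical helper: how many times ":{" appears in the text.
function countOpeners(text: string): number {
	const found = text.match(/:\{/g);
	return found ? found.length : 0;
}

countOpeners("a:{b:{c:1},d:{e:2}}"); // 3
countOpeners("a:1,b:2");             // 0
```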

    I hope you enjoyed my pinche' tutorial. Here is the site that I use to learn regEx. It is very thorough. [edit: I just bought his book. 4.99 donation for what is "arguably the most comprehensive regEx information on earth". I've been using his sites for little regEx's for years. He deserves $5 even if I never download his book. I also get to use his site without ads. I'll be honest, I never noticed his ads til he told me I could read the site without them. I think my brain works better than adblocker.]
    Last edited by MadGypsy; 11-07-2013, 10:32 PM.


    • #77
      I finally got back to the part that made me realize why I was sequestering arrays and objects that were a child of an array. I had forgotten why I did this. The reason turns out to be, in order to figure out what index the array/object child falls on in the parent array. Since objects and arrays use CSV's, counting preceding commas is "impossible" if I don't replace the child with a token and then count once I've turned the parent array into a list of tokens.

      I realise why I did it, but I don't like that method. My new parser focuses on being "instant", direct. Having this sequester, tokenize, process later and reassemble system is not gonna fly. So, what do I do? The answer is pretty simple. I change this regex


      to this


      Now, what is simply removing the comma from the regex going to do? Well, after a find tail and trim operation on the returned regex and piecing the entire capsule together, the comma won't be present in the "snapshot", which means it won't be removed. Basically instead of tokenizing I just leave a bunch of "empty" commas to count. Then I add an extra condition for wiping the string when it becomes all commas. This can be accomplished while simultaneously creating the child, and the comma count that precedes the currently processed string becomes the index for that object/array on its parent.
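
      A rough sketch of the comma-counting idea as described, not the parser's actual code. The helper `indexFromCommas` and the sample string are assumptions made up for the example.

```typescript
// Hypothetical helper: how many commas come before `position` in `content`?
function indexFromCommas(content: string, position: number): number {
	const before = content.slice(0, position);
	const commas = before.match(/,/g);
	return commas ? commas.length : 0;
}

// Two earlier slots were already consumed, leaving only their commas behind.
const remaining = ",,(1,2),5";
const childIndex = indexFromCommas(remaining, remaining.indexOf("(")); // 2

// Assumed "wipe" condition: nothing but commas left.
const allCommas = /^,*$/.test(",,,"); // true
```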

      This wipes like 100 lines of "prep" from my code and gets replaced with like 5. This also puts the assignment process where it belongs as opposed to a quick reconsideration before the very end. This doesn't mean that my code will be even shorter than what I previously said. This is just proving that I have some weird code ESP cause I said I would dump X lines on a "feeling" and now those lines are getting dumped.

      Anyway time to go erase ?100? lines and write 5.


      edit: I forgot to explain something that I wanted to explain. A coder reading this may realize that all this parsing is FIFO (first in first out) and that may lead them to wonder why I don't simply push the new data into the parent array. Push works to use the next available index in an array. The answer is simple. My system is not exactly FIFO. My system does FIFO for named objects/arrays, then it's FIFO on directly assigned array indexes and then it's FIFO for unnamed objects/arrays. Followed by a FIFO on all remaining data. This means I could have (ex) an Array that has guts something like (object, boolean, string, array, number). If I was just pushing the data, the array at index 3 would end up at index 1 (zero based) and then it would be overwritten by the boolean.
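
      A sketch of the push problem with the example array above, assuming a simplified priority where container types are handled before everything else. The `Item` type and all names are invented for illustration.

```typescript
type Item = { kind: string; slot: number };

// The "guts" from the example, each remembering the slot the user wrote it in.
const guts: Item[] = [
	{ kind: "object", slot: 0 },
	{ kind: "boolean", slot: 1 },
	{ kind: "string", slot: 2 },
	{ kind: "array", slot: 3 },
	{ kind: "number", slot: 4 },
];

function isContainer(item: Item): boolean {
	return item.kind === "object" || item.kind === "array";
}

// Simplified priority: containers first, then the remaining data, each group in order.
const processingOrder = guts.filter(isContainer).concat(guts.filter(i => !isContainer(i)));

const pushed: string[] = [];
const placed: string[] = [];
for (const item of processingOrder) {
	pushed.push(item.kind);         // lands on whatever index is free next
	placed[item.slot] = item.kind;  // lands on the index the user wrote
}
// pushed: object, array, boolean, string, number  (the array ends up at index 1)
// placed: object, boolean, string, array, number  (original order kept)
```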

      That's why I don't just push the data. It's not truly a FIFO system. It's a FIFO system according to the priority of a type. And not to toot my own horn too much, but that priority system is genius. It makes it to where the ultimate string gets smaller and smaller by working through it like a stack, without ever having to back up or do a reconsideration and it is emulating this stack before the real stack even exists. It's also a whopping 7 lines long.

      mark2 =	(xxon.match(/[,]?[a-zA-Z]+(\[\d+\])?:[\{\(]/g)[0])
      			? xxon.match(/[,]?[a-zA-Z]+(\[\d+\])?:[\{\(]/g)[0]
      		:(xxon.match(/[,]?[a-zA-Z]+(\[\d+\]):/g)[0])
      			? xxon.match(/[,]?[a-zA-Z]+(\[\d+\]):/g)[0]
      		:(xxon.match(/[,]?[\{\(]/g)[0])
      			? xxon.match(/[,]?[\{\(]/g)[0]
      			: null;
      @mark2 - my lil joke is I'm using mark2 cause mark1 wasn't working out for me. If you don't get it, don't worry about it, it's retarded.
      Last edited by MadGypsy; 11-11-2013, 08:42 AM.


      • #78
        Done. I was never able to class my parser out into a more design patterns way because I had a few operations that were relying on "self" in order to throw a complete/error event. When the event is cast the return data pulls from the class that cast it via an event dispatcher that is considered a direct part of the actual class. In other words the dispatcher is not considered to be a dot syntaxable member. It's more like a ghost helper. So, if I was to turn (ex) findTailAndTrim into its own class, when it fires an event the eventHandler would be looking on findTailAndTrim for the ultimate data that is returned and it wouldn't exist... cause that data is stored in the main parser class. I hated that coupling and always kept it in the back of my head that I would eventually need to uncouple some arbitrary processes so they can be moved to their own class, start implementing interfaces, etc.

        Actually I'm going to give you a very mini lesson on OOP with design patterns right now. You don't program to methods, you program to interfaces. This will always destroy coupling, which is good. The less your code relies on some other class to exist without errors, the better. That's good OOP. Also design patterns gives us a, well, pattern to follow and work from. This is where you really break through on being a programmer. When you understand things like factory patterns, state machines, etc it takes a lot of guesswork out of your structure because the pattern is already defined, you just need to contour it to your needs and variables.

        Anyway, back on topic. I just removed the coupling that was locking all my code parts into a static unit. It was really easy. I simply removed the dispatched events. In every case the event would be an error one, because these processes either work or they don't. If they work then we don't have to dispatch an event because it means the parser is still doing its job. If they don't work then we need to bubble up an error to the class that initiated the parse and then handle it. I simply removed the dispatch and returned null. Now the dispatching of the error event is handled in the class that called the process (which is what I need) and whether to throw an error is based simply on whether the return data was null.
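
        A sketch of that decoupling: the helper just returns null on failure and the caller decides whether to raise an error. Both functions are hypothetical stand-ins (the real findTailAndTrim is not shown in this thread), and a plain callback stands in for the event dispatch.

```typescript
// Hypothetical helper: text between an opener and the first closer after it,
// or null when either one is missing. It knows nothing about events.
function between(text: string, opener: string, closer: string): string | null {
	const start = text.indexOf(opener);
	if (start === -1) return null;
	const end = text.indexOf(closer, start + opener.length);
	if (end === -1) return null;
	return text.slice(start + opener.length, end);
}

// The calling code owns the error reporting.
function parseGroup(text: string, onError: (message: string) => void): string | null {
	const inner = between(text, "(", ")");
	if (inner === null) onError("Error: unbalanced delimiters");
	return inner;
}
```

        Because `between` no longer reaches back into its owner, it could be moved to its own class or file without dragging anything along.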

        This is step one in turning my parsing delimiters class into a parsing delimiters package. Unfortunately, design patterns forces you to bloat your code a bit (all those interfaces that you just rewrite more specifically). I'm not happy about that but, the bottom line is the end code will be pro. Everything will be programmed to an interface and possess no coupling. This will also make all of my code very reusable (maybe even portable would be a good adjective)

        I'm not sure if I intend to start classing this out now or move on and come back to it. This isn't something that is technically going to change my results at all. A user would never know if I used procedural or design pattern style programming and by at least making sure that I don't start coupling things again in the procedural way, I shouldn't have much problem converting it all to design patterns at a later time.

        Really, probably the only reason for a guy like me to even worry about design patterns is if I believed so hard in my project that I intended to sell it to a mega-company at some point. By already having pro code the value of my product is increased. Barring a buy-out, there really isn't much reason for me to use design patterns. I can easily achieve the exact same results on the user end with procedural programming and actually do it with less code.

        I have no idea what the monetary potential of my project is but, by going as pro as I can, I know that I have at least added as much value to it as possible. There is also the perk that everything will be better organized.


        gonna skip DP for now and start drawing a line from the front to the back of the parser. What I mean by this is: I want to lay the entire save/recall foundation. I want to go from text to an object to a saved byte object, back to an object which is reverse parsed back to text. This will handle the entire save/open features in my editor and I believe it is the absolute smartest thing to work on next. Everything can be built on top of that foundation.

        I'm jumping ahead but, from there I will iterate the final library object and start turning its members into the display/stream/event/tween objects that they describe.

        When I get to that point, this will all start making a lot more sense and I will finally be able to produce some visual results. That's not the end of the line though. From there I believe I would drop a "division" of sorts into the whole tree, that allows a user to drag and drop representational objects and have their descriptions automatically written to an object. Right before that would probably be an excellent time to convert everything to design patterns. That's the point where I am really going to need an organized and independent structure to work from cause, I'm basically going to be dumping an alternate method right into the middle of everything.
        Last edited by MadGypsy; 11-11-2013, 10:14 AM.


        • #79
          I thought I was done making the parser work but, I'm not. There are conditions where data is missing. I thought about it a bit more and I realized some things. Earlier I said the array cannot be pushed because data would end up out of order. And it would if the conditions I mentioned were met. Thing is, there is never going to be a time where those conditions need to be met. There's never going to be a time when there is a mixed bag of data spanning all of the array's indexes.

          Really this whole array mess is only necessary for one thing. Let's say you made a template object named "myObject", well, the array is necessary so you can do this:

          or this:

          in order to get multiple instances of one object that you can do a final customization to. In this example, myObject is apparently referring to some sort of text box. We could assume that myObject would already have properties like font, color, width, size, etc set, so when we call that instance all we have to "config" is the part(s) that we don't want to be identical. In this case it would be the text that is displayed. Technically you could reconfig the whole box but that would defeat the whole purpose of being able to template objects.

          Well, I'm tired but I also want to figure out what is happening in my final parse process to lose info. I haven't decided which I want more, sleep or solutions. I programmed my ass off today, summin like 10 hours practically straight, that's a good guess.


          • #80
            Last night I went over my code again. I decided that I will not be satisfied until I rewrite the entire thing from scratch (lol). I know that sounds crazy and like I'm stuck in an infinite loop. The bottom line is, when I pieced the code back together to make parser2 while making huge changes, some new conditions arose that were creating too many exceptions. I don't want a gazillion exceptions. I want clear and to the point code.

            Also, my regEx skills have improved substantially. That's what this post is really going to be about. My original sequesterStrings method:

            		private function sequesterStrings(xxon:String):String
            		{	//__> Strings are sequestered and replaced with a token as the first parse operation
            			//__> This stops delimiter characters within strings from affecting the parse
            			var value:String;
            			var start_cursor:int, end_cursor:int;
            			var opn_del:String, cls_del:String, esc_del:String;
            			//find every [: , (][space]["] possibility and remove [space]
            			xxon = xxon.replace(/:[\s\r\t\n]+"/g,":\"");
            			xxon = xxon.replace(/,[\s\r\t\n]+"/g,",\"");
            			xxon = xxon.replace(/\([\s\r\t\n]+"/g, "\(\"");
            			//create quote types
            			cls_del = "\"", esc_del = "\\\"";
            			for (start_cursor = 0; start_cursor < xxon.length; ++start_cursor)
            			{	//find every [: , (] " possibility, it's not possible for the first quote to be an escaped quote
            				opn_del = xxon.match(/[:,\(]"/g)[start_cursor];
            				if (opn_del) 						//a quote or prelimiter+quote was found so, find next quote
            				{	start_cursor = xxon.indexOf(opn_del,start_cursor);
            					end_cursor = xxon.indexOf(cls_del,start_cursor+opn_del.length);
            				} else start_cursor = -1; 			//there are no quotes
            				if(start_cursor !=-1)
            				{	//if there is an open delimiter
            					while(xxon.indexOf(esc_del,end_cursor-1) == end_cursor-1)
            					{	//while a "look-behind" end_cursor equals an escaped quote, move ahead
            						end_cursor = xxon.indexOf(cls_del,end_cursor+cls_del.length);
            					}
            					//end_cursor is a close quote
            					value = xxon.substring(start_cursor+opn_del.length,end_cursor);		//get string
            					stringArray.push(value);											//save string
            					var token:String = "\"TEXT["+String(stringArray.length-1)+"]\"";	//create token
            					xxon = xxon.replace("\"" + value + "\"", token);					//replace value with token in the main string
            					//prime start cursor position to land just after the token upon next loop iteration
            					start_cursor = xxon.indexOf(token, 0) + token.length - 1;
            				} else break;
            			}
            			return xxon;
            		}
            44 lines of going through the string a character at a time (sorta), finding opening and closing quotes, getting the guts between those quotes and storing it in an array and finally replacing those guts with a token on the main string.

            My new sequester function
            		private function sequester(xxon:String):String
            		{	var snapshot:String, token:String;
            			var open_del:String, close_del:String;
            			xxon = xxon.replace(/[\s\r\t\n]+"/g, "\"");
            			var open_delimiters:Array = xxon.match(/[,\(\:]"/g);
            			var close_delimiters:Array = xxon.match(/(?<![,\(\\\:])"[,\)\}]/g);
            			for (var index:int; index < open_delimiters.length; ++index)
            			{	open_del = open_delimiters[index];
            				close_del = close_delimiters[index];
            				snapshot = trim(xxon, open_del, close_del);
            				token 	 = "\/"+ "TEXT[" + index + "]" + "\/";
            				xxon 	 = xxon.replace("\"" + snapshot + "\"", token);
            				stringArray[index] = snapshot;
            			}
            			return xxon;
            		}
            16 lines that do the exact same thing and a few of these lines are simply to make the script more readable. So, this code could be even shorter. If I wanted it to be completely unreadable, I could make it 10.

            That's not the whammy though. My original stripComments function was 28 lines long. Now it's 1.

            xxon = xxon.replace(/(\/\*[\w\s\r\n\'\*]+\*\/)|(\/\/[^\r\n]+[\r\n])/g, ""); //remove all comments
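
            One thing worth knowing about the one-liner: its block-comment part only allows word characters, whitespace, apostrophes and asterisks inside a comment, so a comment containing something like a colon or a bracket would slip through. A looser variant is sketched below in TypeScript. It is an alternative under stated assumptions, not the expression used in the parser: it assumes strings have already been sequestered (so a // inside a string can't be mistaken for a comment), and unlike the one-liner it leaves the line break after a // comment in place.

```typescript
// Hypothetical alternative comment stripper.
function stripComments(source: string): string {
	// Block comments: /* ... */ with anything inside, as short as possible.
	// Line comments: // up to, but not including, the end of the line.
	return source.replace(/\/\*[\s\S]*?\*\/|\/\/[^\r\n]*/g, "");
}

stripComments("a:1, /* note: (x) */ b:2 // tail\nc:3");
// "a:1,  b:2 \nc:3"
```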


            Now I'd like to back up a hair to where I said I started completely over. My original parsers have 1 thing absolutely correct - the order of things that need to happen. So I started with that. First is to sequester strings, you have to do this because strings could have characters that will conflict with the parse. Also, part of the clean-up process is removing all whitespace, you don't want your strings hanging around while you do that. Next you remove all comments and whitespace. At this point you should have a nice clean string to start parsing. That's where I am, a nice clean string to start parsing, which came from more optimized processes (new code).

            If you actually read the new sequester function you may have noticed it's calling another function within (trim). That was my old findTailAndTrim function from forever ago. I have looked it over a million times, there is no changing it.

            This means that I already have clean up and my primary trim function as good as I'm gonna get it. The whole ballpark resides in interpret() now. Everything is truly ready for interpret() and this is where I need to make a final decision on how to go about it.

            @Spike - (unrelated from another thread) - number types cannot be null in AS3 either. I never knew that. See in my new sequester function above where I create index in the loop interface but I don't tell it to equal anything? Yeah, well, I've been doing that forever and never realized what that meant. Obviously it is 0 unless otherwise specified.

            However, I don't think I have ever tried to find a null value on a number in AS3. I only did it in QC to be thorough and cause I don't know what the fuck I'm doing
            Last edited by MadGypsy; 11-13-2013, 02:54 PM.


            • #81
              I have come to the conclusion that it is impossible to handle array/objects that are in an array without sequestering them. I "finished" the 3rd rewrite, I like my code, but objects/arrays in an array just disappear in the final step.

              I have written quite a large object that has basically every possibility in it. I have followed every step of the parse via trace statements (ECHO/print equivalent), from the beginning of everything clean out to interpret(). Everything works exactly as I expect. When it gets to the final process (compile()) everything still works great until I get to object/array in an array. Funny thing is, that script disappears but it doesn't stop compiling. Basically my compiler will compile everything that is not an object/array in an array, whether the script comes before or after the skipped data is irrelevant and not one error is thrown. No infinite loops occur either.

              So, I'm over the absolutely direct method. No matter how I write it, rewrite it, tweak it, whatever, I keep having the same problem. Luckily, that just means that I need to back up to my first script, isolate the sequester object code chunks and port them into my new script. This might even make my code shorter cause I have an awful lot of "IF" going on for the interpret process and much of it is array related. To parse an Object type is cake, arrays though have more parts and the order the user used needs to be maintained.

              I guess there is one up-side. By me essentially "giving up" on trying to troubleshoot why the new way isn't 100%, I know I'll be done with this tonight. I already wrote the code that will replace this. Overall, I'm not sad that I spent a week rewriting the parser and its system. I learned RegEx to a degree that I have confidence when I write an expression. I still shortened my code by hundreds of lines and I have a deeper understanding of the parse in general. I realized where I was trying to be too "human" about certain things and I think I grew a bit as an AS3 programmer as well.

              The real icing is I have the ENTIRE scope of the code. My other parser was like a Quake map, you can go all over the place but only one place at a time. My new code is "Eliminate Place", make "place" disappear now. This is so much easier to keep track of. Instead of the script going on a journey through the code. It is quickly "blinked" out of existence and with every blink container types are properly created. A few later blinks and all singular vars are born and it's done.

              I guess I can live with sequestering objects/arrays which are within an array. I also don't intend to turn this script into a package. There's no point. At no point do I intend to use any part of it in a modular fashion. Everything in the class is designed specifically to work with everything else in the class. Writing a design pattern for this class would be a code bloating waste of time.


              • #82
                The image below contains 2 types of data. The top is the actual object notation that needs to be parsed and the bottom is the info the parser spits out once it is complete. This means the data in the lower half, really exists, has already been parsed.

                Let's consider what is going on. First of all I threw a shit load of ugly comments in there simply to test my comment removal system, which is obviously working aces cause the lower data isn't all F'ed up. Secondly we see that my "push into index" feature also works. The likes array initially has only 3 values, but I push a specific index afterwards and you can tell from the trace data (the lower stuff) that the order is correct

                In the above image I show how multiple mixed data nesting is handled effortlessly. test[2] is the 2nd index of the test array. However, test[2] has been made an array itself and its first index is also an array and the 4th index of that is an object. You will never need to do such sloppy coding in my final application, but my parser could either really work or just work. I chose really work.

                You may also notice that my parser is aware of escaped quotes and they will not mess up the parse. In my trace window it shows the backslash, if I put that same data in a text field it would be displayed as - escaped". In other words if you need a quote inside of a string, simply escape it and my parser won't look at it like a delimiter any longer (which is how it is supposed to be)
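
                For the curious, here is one way escaped quotes can be respected while strings are swapped for tokens. This is a sketch in TypeScript under assumptions, not the parser's code: the function name, the return shape and the single regex are all made up, and only the /TEXT[n]/ token format is borrowed from earlier in the thread.

```typescript
// Hypothetical string sequester: swap every quoted string for a /TEXT[n]/ token.
function sequesterQuoted(source: string): { text: string; strings: string[] } {
	const strings: string[] = [];
	// A quote, then any mix of ordinary characters or backslash+character pairs,
	// then the closing quote. An escaped quote is eaten as a pair, so it never closes the string.
	const text = source.replace(/"((?:[^"\\]|\\.)*)"/g, (_whole: string, inner: string) => {
		strings.push(inner);
		return "/TEXT[" + (strings.length - 1) + "]/";
	});
	return { text, strings };
}

// The source text here is:  a:"x, \"escaped\" {y}",b:"z"
const sequestered = sequesterQuoted('a:"x, \\"escaped\\" {y}",b:"z"');
// sequestered.text is      a:/TEXT[0]/,b:/TEXT[1]/
// sequestered.strings[0] is  x, \"escaped\" {y}   (backslashes kept)
```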

                the next post will be my entire current and perfected code. Imma post it right now.


                • #83
                  267 lines of brutal delimiter parsing

                  package virtuoso.eng
                  		virtuoso.eng.Parser - parse a String to an Object
                  		@author		MadGypsy
                  		@version	3.0
                  		@build 		20131119-18:04cn
                  		@extends	EventDispatcher
                  	import flash.utils.getQualifiedClassName;
                  	import virtuoso.utils.SimpleEvent;			//custom event class
                  	public class Parser3 extends EventDispatcher
                  		private var init:Boolean = false;
                  		private var libObject:LibraryObject = new LibraryObject();
                  		private var stringArray:Array = new Array(), objectArray:Array = new Array();
                  		private var extra:String;
                  		public function Parser3():void {/*exist*/}
                  		public function parse(xxon:String):void
                  		{	xxon = sequester(xxon);															//sequester strings
                  			xxon = xxon.replace(/(\/\*[\w\s\r\n\'\*\/]+\*\/)|(\/\/[^\r\n]+[\r\n])/g, ""); 	//remove all comments
                  			xxon = xxon.replace(/[\t\r\n\s]/g,"");											//remove whitespace
                  			libObject.content = xxon;														//store entire current script on main object
                  			init = true;																	//set first run
                  			interpret(libObject, libObject.content);										//begin interpreting the script to an Object
                  		private function interpret(xxo:Object, xxon:String, fromArray:Boolean = false):void
                  			{	// if this is the first run
                  				xxo.baseType = String(xxon.split(":")[0]);	//get baseType
                  				xxon = trim(xxon, xxo.baseType + ":{");		//strip baseType wrapper
                  				init = false;								//unset first run
                  			{	var mark2:String, name:String, value:String, close_del:String, snapshot:String;
                  				var isArray:int, index:int;
                  				while (xxon != null)
                  				{	snapshot = null;												//prime for non-existance
                  					mark2 =	(xxon.match(/[,]?[a-zA-Z]+(\[\d+\])?:[\{\(]/g)[0])		//named Objects Arrays and Array Indexes that contain an Object/Array
                  								? xxon.match(/[,]?[a-zA-Z]+(\[\d+\])?:[\{\(]/g)[0]
                  							:(xxon.match(/[,]?[a-zA-Z]+(\[\d+\]):/g)[0])			//array indexes that have a non-container type value
                  								? xxon.match(/[,]?[a-zA-Z]+(\[\d+\]):/g)[0]
                  							:(xxon.match(/[,]?[\{\(]/g)[0])							//nameless objects and arrays
                  								? xxon.match(/[,]?[\{\(]/g)[0]
                  								: null;
                  					if (mark2 != null)	//if one of the 3 container types were found, pick apart all of it's data and set some switches
                  						name = (mark2.match(/[a-zA-Z]+/g).length)? mark2.match(/[a-zA-Z]+/g)[0] : null;
                  						index = (mark2.match(/(\[\d+\])/g).length > 0)? int(mark2.match(/\d+/g)[0]) : -1;
                  						value = trim(xxon, mark2);
                  						isArray = (mark2.match(/\(/g).length)? 1 :(mark2.match(/\{/g).length)? 0 : -1;
                  						close_del = (isArray == -1)? "" :(isArray)? ")" : "}";
                  						snapshot = mark2 + value + close_del;
                  						fromArray = ( index > -1 || (getQualifiedClassName(xxo) == "Array") )? true : false;
                  						if (getQualifiedClassName(xxo) != "Array") xxon = xxon.replace(snapshot, "");//Array guts get tokened, not wiped
                  					}
                  					if (snapshot != null) 															//if an Object, Array or Index was found
                  					{
                  						if (fromArray)																//if xxo is an Array or array[0] syntax
                  						{
                  							if(isArray > -1)														//if the current value is an Array/Object
                  							{	var len:int = objectArray.length;									//get the sequester array length
                  								objectArray[len] = (isArray)? new Array() : new LibraryObject();	//properly type the next position of the sequester array
                  								objectArray[len].content = value;									//dump its value on the index
                  								var token:String = "\/OBJ[" + len + "]\/";							//tokenize the sequester array index
                  								token = (snapshot.charAt(0) == ",")? "," + token : token;			//add a comma if necessary
                  								xxon = xxon.replace(snapshot, token);								//replace the value with the token on the main string
                  								value = token.replace(",", "");										//overwrite the value with the token but remove the comma
                  								interpret(objectArray[len], objectArray[len].content);				//process the sequester array's .content
                  							}
                  							if (name)													//only possible if array[0] syntax
                  							{
                  								if (!xxo[name]) xxo[name] = new Array();				//no array? create it
                  								if(!xxo[name].content)									//no .content?
                  									xxo[name].content = value;							//make it
                  								else
                  								{	var rewind:Array = xxo[name].content.split(",");	//split string
                  									if (index < 0) index = rewind.length;				//if index has no value, assign it the length
                  									rewind.splice(index,0,value);						//inject the value into the proper index
                  									xxo[name].content = rewind.join(",");				//dump it all back to a string on .content
                  								}
                  							}
                  						} else if (getQualifiedClassName(xxo) == "virtuoso.eng::LibraryObject") {	//if this is an object type
                  							if (name != null)														//it has to have a name
                  							{	xxo[name] = (isArray)? new Array() : new LibraryObject();			//assign the name a type
                  								xxo[name].content = value;											//dump the value on its .content
                  								interpret(xxo[name], xxo[name].content);							//send it for further processing
                  							} else {																//otherwise throw a name error
                  								dispatchEvent(new SimpleEvent(SimpleEvent.NOT_READY, "Error: You have a nameless member of an Object"));
                  								xxon = null;
                  							}
                  						} else xxon = null;
                  					} else { 					//if the snapshot is null, all Objects, Arrays have already been created
                  						xxon = (xxon.charAt(0) == ",")?xxon.replace(",", ""):xxon;		//remove preceding comma(if any)
                  						xxo.content = xxon;
                  						if(xxo == libObject)compile(libObject);
                  						xxon = null;
                  					}
                  				}
                  			}
                  		}
                  		private function type(value:String):*
                  		{	switch(value)
                  			{
                  				case value.match(/\/TEXT\[\d+\]\//g)[0]:
                  					return String(stringArray[int(trim(value, "TEXT[", "]"))]);
                  				case value.match(/\/OBJ\[\d+\]\//g)[0]:
                  					return objectArray[int(trim(value, "OBJ[", "]"))];
                  				case value.match(/true/g)[0]:
                  					return Boolean(1);
                  				case value.match(/false/g)[0]:
                  					return Boolean(0);
                  				case value.match(/[0-9a-fA-F]+/g)[0]:
                  					return Number(value);
                  				default:
                  					return null;
                  			}
                  		}
                  		private function compile(xxo:*):void
                  				var pairs:Array = new Array();
                  				var lump:Array = xxo.content.split(","); 	//make a name:value lump
                  				for (var p:int; p < lump.length; ++p)
                  				{	pairs[p] = new Array();					//prime pairs[num]
                  					switch(lump[p].split(":").length)		//handle the lump
                  					{	case 2:								//a name exists
                  							pairs[p] = lump[p].split(":");	//split lump into name:value pairs
                  						case 1:								//no name, we must be in an array
                  							pairs[p][0] = undefined;
                  							pairs[p][1] = lump[p];			//store value
                  							dispatchEvent(new SimpleEvent(SimpleEvent.NOT_READY, "Error: There's no data"));
                  					if (type(pairs[p][1]) !== null)			//if typing is successful
                  						pairs[p][1] = type(pairs[p][1]);	//officially type the value
                  					else									//or throw a type error												
                  					{	dispatchEvent(new SimpleEvent(SimpleEvent.NOT_READY, "Error: Unrecognized var type"));
                  					if(pairs[p][0] == undefined) xxo[xxo.length] = pairs[p][1];	//no name, assign the value to the next available array index
                  					else xxo[pairs[p][0]] = pairs[p][1];						//otherwise assign the value to the name
                  			if (xxo.content) delete xxo.content;
                  			for (var name:String in xxo)
                  			{	try
                  				{	if (xxo[name].content) compile(xxo[name]);	//compile child .content or
                  					else if (xxo[name].length > 0) 				//compile the .content for each array index
                  						for(var n:int = 0; n < xxo[name].length; ++n)
                  				}catch(e:Error){/*there is no content on this but, that's ok.*/}
                  			/*__________> OBJECT IS COMPLETE <__________*/	
                  			if (xxo == libObject) dispatchEvent( new SimpleEvent(SimpleEvent.READY, extra) ); //first in/last out
                  		private function sequester(xxon:String):String
                  		{	//__> sequester all strings and convert to tokens on the main string
                  			var snapshot:String, token:String;
                  			var open_del:String, close_del:String;
                  			xxon = xxon.replace(/[\s\r\t\n]+"/g, "\"");									//remove all whitespace before quotes
                  			var open_delimiters:Array = xxon.match(/[,\(\:]"/g);						//create an array of open delimiter combos
                  			var close_delimiters:Array = xxon.match(/(?<![,\(\\\:])"[,\)\}\r\n]/g);		//close delimiter combos
                  			if (open_delimiters.length != close_delimiters.length)						//lengths don't match, die
                  			{	dispatchEvent(new SimpleEvent(SimpleEvent.NOT_READY, "Error: You forgot to close a quote somewhere"));
                  				return null;
                  			} else {
                  				for (var index:int; index < open_delimiters.length; ++index)
                  				{	open_del  = open_delimiters[index];							//grab open characters
                  					close_del = close_delimiters[index];						//grab close characters
                  					snapshot = trim(xxon, open_del, close_del);					//get guts
                  					token 	 = "\/"+ "TEXT[" + index + "]" + "\/";				//create token
                  					xxon 	 = xxon.replace("\"" + snapshot + "\"", token);		//replace guts with token
                  					stringArray[index] = snapshot;								//store guts
                  				}
                  			}
                  			return xxon;	//return the new string
                  		}
                  		private function trim(xxon:String, open_del:String, close_del:String = null):String
                  		{	var snapshot:String;
                  			var open_count:int, close_count:int;
                  			var open:RegExp, close:RegExp;
                  			var alt_del:String = open_del.match(/[\{\(]/g)[0];					//strip open_del to just its Object/Array delimiter (if any)
                  			if (alt_del && !close_del)
                  			{	close_del = (alt_del == "(")? ")" : "}";						//determine close delimiter
                  				open  = new RegExp("\\" + alt_del, "g");
                  				close = new RegExp("\\" + close_del, "g");
                  			} else {
                  				close_del = (!close_del)? "," : close_del;						//_| for all of these conditions to be met and a comma assigned...
                  				open  = new RegExp("\\" + open_del,"g");						//_| this content must be from an array index with a non-Array/Object value
                  				close = new RegExp("\\" + close_del, "g");						//_| the comma is a possible close delimiter but so is null (ie - end of the string)
                  			}
                  			var start_cursor:int = xxon.indexOf(open_del,0)+open_del.length; 	//set position after start delimiter
                  			var end_cursor:int = start_cursor;									//prime end_cursor
                  			while (true)
                  			{
                  				end_cursor = xxon.indexOf(close_del, end_cursor);				//advance end_cursor to next close position
                  				if (end_cursor != -1)
                  				{
                  					snapshot = xxon.substring(start_cursor, end_cursor);		//grab snapshot between delimiters
                  					open_count = snapshot.match(open).length;					//count opening delimiters
                  					close_count = snapshot.match(close).length;					//count closing delimiters
                  					if(open_count != close_count)								//counts don't match, look ahead
                  					{	if ( xxon.indexOf( close_del, end_cursor + close_del.length ) != -1 )	//if more delimiters exist
                  							end_cursor += close_del.length;						//move past last found delimiter
                  						else break;												//or die
                  					} else return snapshot;										//counts match, we have what we want
                  				} else {
                  					if (close_del == "," && !snapshot)							//comma didn't work for the array[0] close
                  						return xxon.substring(start_cursor, xxon.length); 		//so, end of the string, get it all
                  					else break;													//or die
                  				}
                  			}
                  			//delimiters are missing - die
                  			dispatchEvent(new SimpleEvent(SimpleEvent.NOT_READY, "Error: You are missing a close delimiter somewhere after: "+snapshot));
                  			return null;
                  		}
                  		public function get libraryObject():LibraryObject { return libObject; }
                  		public function set switchName (str:String):void {	extra = str;  }
                  Last edited by MadGypsy; 11-20-2013, 03:19 PM.


                  • #84
                    From this point I am going to start a blog. This parser is done, it works and it's as slim as I can get it. I edited it one last time since my last post. I took out 21 lines of code and commented the shit out of it. Some of my comments are a bit "caveman" because my character count was almost 14000. I had to take out almost 2000 characters without removing a single comment. That was a nice little lesson in futility but, I managed to do it without a character to spare. The above post is exactly 12000 characters and there isn't even a tab I can get rid of. I also didn't want to sacrifice line breaks for readability. In other words, I really made it by the skin of my teeth.

                    The 21 lines I removed were just long ways of saying something that can be said shorter or moving comments from their own line to the end of a line. The functionality and expected results have not changed a bit. There is only one way to get this any shorter and it would destroy readability. I would have to remove some vars and replace the use of the var with the value I initially assigned it. It's not worth it. I didn't set out to get this script as short as possible. I set out to get it shorter than it was, get rid of "wordy" code. I did that.

                    Now I can draw my line through the engine with String>Object>Bytes>Object>String, followed by creating the display/stream/tween/event types that the object notation will represent. When I get to the latter, then will be the time to go completely backwards and make a blog that catches up to where I am. Really, that's only like 3 or so blog posts, which I feel is a good start. So, the existence of the blog will pretty much designate that my project is finally a something.

                    I hope to fly through many of the next parts. I've already done some of the work and much of it is very repetitive. For instance if I lay the system for reading an object and parsing it out to a (ex) textbox, 90% of that will be the foundation for parsing everything out to whatever. Also, however I handle the data within the actual display/stream/event/tween type will be THE way. So I basically make one system that can handle all of the different types. That's not as complicated as it sounds, I'm already probably half way there, due to much previous work.


                    • #85
                      LOL, oh man, this is rich. Get this - when I complete something, I like to go back and test it to an extreme degree. In the case of this parser, I wanted to cover every possibility even if that possibility will never actually occur. I wrote a psychotic object. It was a huge mess but completely valid.

                      I then started tracing all the data to see every little thing the parser is doing, each step of the way. I noticed something, every object/array was being sequestered, whether it was within an array or not - the funny thing is, it still worked perfect, even though it was essentially doing something that I told it not to do.

                      I ran some more tests. Sure enough there was an entire "else" that was never getting hit...but everything still worked. The end result was correct regardless. That's not really acceptable though. I figured out why it was happening. I had to move 1 line of code down about 20 lines. Not because of any procedural garbage but because where I was setting something was sticking around too long. It needed to be moved to a spot that is constantly being reconsidered. I did so and the results were instantly what I wanted them to be. The "else" was getting hit properly.
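The post doesn't show which line moved, so the following is only a guess at the general failure mode, not the actual fix. It contrasts a flag that is set once and never reconsidered with one that is recomputed for every value. The names `values`, `stale` and `fresh` are made up for the illustration.

```actionscript
var values:Array = ["(1,2)", "{a:1}", "7"];

var stale:Boolean = false;                          // set outside the loop
for each (var value:String in values)
{
	if (value.charAt(0) == "(") stale = true;       // never goes back to false
	var fresh:Boolean = (value.charAt(0) == "(");   // recomputed on every pass
	trace(value, stale, fresh);
}
// (1,2) true true
// {a:1} true false
// 7 true false
```

After the first array, `stale` keeps claiming "array" for everything that follows, which is the kind of thing that would push every value down the array branch and leave an `else` unreached.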

                      Now this raises another consideration. If my else was never getting hit and everything worked perfect, then I don't need the else...right? Well, not exactly. Whereas it was working, it was working due to slipping through cracks. The chunk of code that was processing the wrong data does not clearly express that it is supposed to do that. In other words, if someone was to read my code, when they get to the "cracks" part, they would be lost. All of a sudden it doesn't make immediate sense why it works at all. On top of that, a part I still don't understand, is the spot where it was slipping through cracks is specifically for arrays and it automatically assigns the parent as an array, but objects were slipping through, being typed as an array and then magically becoming an object before the end. That's a mess. So, I do need the else, if for nothing more than to make it clear what is happening in the script but, primarily because objects shouldn't be typed as an array and then mystically be transformed back into an object elsewhere.

                      I just think it's crazy that my parser still worked perfectly even though it was taking strange detours through the code. If I wasn't the most anal man that ever lived (not really) I may have never even realized this was happening. It's such a minute detail that hinges on one little line and to figure out this is happening you have to trace data in a very specific and isolated spot. In other words, I got lucky. Funny thing is, I got lucky by looking at the output at just the right time and subconsciously something clicked that "that aint right". I didn't even know what wasn't right, at first. My object was so big and crazy it was like needle in a haystack stuff. I went back and looked at my Object notation and realized what wasn't right, the fix from there was simple. It was simple because I don't have an assload of conditions. Actually, there was only one thing it could be and that's why I spent all of this time rewriting my parser. There is really only one little chunk of code that does the body of the work. If there is an error, it's probably in there, cause the rest is primarily little side functions that are called by the main chunk and I already know they work.

                      5 functions

                      parse - cleans the string and kicks it all off - nothing to ever do here
                      type - simply returns a typed object - nothing to change here either unless a new type is added
                      sequester - sequesters all the strings, only called one time in parse - nothing to ever do here
                      compile - splits name:value pairs and assigns them using type - nothing to change here

                      That just leaves interpret(). interpret() is where all of the actual parsing happens, and if there is a parsing error, it's gonna be in there. That's where the greatest concentration of co-ordination occurs. Everything else is pretty linear. This was not the case in my older parsers. My older parsers divided up the work much differently.
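To make the sequester step concrete, here is a minimal sketch of the idea, assuming a much simpler rule than the real sequester() above: every double-quoted string is swapped for a /TEXT[n]/ token so that commas and colons inside strings can't confuse the later split(",") and split(":") calls. `sequesterSketch` and `store` are hypothetical names, not part of the parser.

```actionscript
// Hypothetical, simplified stand-in for sequester().
// Replaces each "quoted string" with a /TEXT[n]/ token and keeps the text in store.
function sequesterSketch(source:String, store:Array):String
{
	var quoted:RegExp = /"([^"\\]*(?:\\.[^"\\]*)*)"/g;    // a quoted string, escaped quotes allowed
	return source.replace(quoted,
		function(match:String, inner:String, index:int, whole:String):String
		{
			store.push(inner);                            // keep the text between the quotes
			return "/TEXT[" + (store.length - 1) + "]/";  // token that points back at it
		});
}

var strings:Array = [];
var cleaned:String = sequesterSketch('title:"Hello, world",size:12', strings);
trace(cleaned);     // title:/TEXT[0]/,size:12
trace(strings[0]);  // Hello, world
```

With the string out of the way, splitting `cleaned` on "," gives exactly two name:value pairs, and type() can swap the token back for the stored text at the very end.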

                      Anyway, I'm rambling now, "overdriving my headlights" so to speak. I think I type based on what I'm pretty sure I am about to do. I'll adjust my code above. It's just one line. Also, I am confident that that is it. I have spent 6 hours hunting down problems that may not even exist and all I found was this. Honestly, there isn't really anywhere else that there could be a problem at this point. Everything is following the proper path and the return data is correct against the psycho object. I think I can really move on.


                      • #86
                        What I am about to say has nothing to actually do with the theory of parsing delimiters; this is just the most relevant thread I can remember making. The conclusion is actually about the results of parsing delimiters.


                        Wooah, I was just bringing my display engine to the next level and I had a big surprise on one of the tests. The idea is to make semi-generic objects in separate files and then have a main file that handles the display list of those objects. Ex: you design/customize a text field in every way but leave the text blank, so you can keep using instances of the text field and add the text at the last minute (outside of its actual config object).

                        What I didn't realize I had done was allow for the possibility of skipping the display list altogether. The display list should have simple information: positioning, layer and last minute adjustments (like the text to inject). For shits and giggles I put all of that info in the text object and just told the display list to add the object with no further info. It WORKED! This makes it even more versatile. Imagine a scenario where you will only ever put one text field on the screen and the only thing that will ever change about that field is the inserted text. Well, now you can tell the text object right in its descriptor the height, layer, width, x and y properties, meaning the display list just has its name. This is only ideal if you want only one version of said object, but if you want more than one, my originally intended method also works.

                        This is excellent. It gives the Objects an ability to have more meaning. Objects that contain their display properties are basically singletons and primarily static (and I can go further to make damn sure of that), whereas objects that do not contain their display properties are fully dynamic (which is already true).

                        I love it when I make good stuff that I never even considered, accidentally. It's not very often but, it does happen. Actually I seem to remember another "side effect" bonus with this same parser but, I can't remember the details. I know it's in this thread though.


                        Aside: Ya know I know a lot of people think I am a good programmer and I am sure I am but, I never really feel that way. I always look at my work, no matter how much I have accomplished, and feel like "If I only did it the better way that I don't even know about". Recently though I find myself actually being a bit proud of my work. There is no way for me to describe to you how much I have grown as a programmer, even in the last 6 months but, it is incredibly apparent to me, especially when I look at "old" code and solutions vs the new. It's almost good that I haven't finished any of these huge projects cause I would just be going back to rewrite them anyway. Actually isn't that the whole reality, I keep going back to rewrite this stuff until I feel like it's "right"?

                        I feel like I'm finally getting to a point where rewrites won't really be necessary. I'm not referring to my code but my skills as a programmer. I think I have finally begun doing it right the first time. It has only taken me my entire life to get this far.
                        Last edited by MadGypsy; 05-14-2014, 05:03 PM.


                        • #87
                          Another good surprise!!!

                          I was messing with the display list and I decided that I don't like telling things what layer to be on. The reason is simple: Objects have no sort order; they are meant to be real fast, so there is no extra fluff to them. If I want to directly assign the layers, I then have to go through the entire display list again and sort it by those numbers. I also have to have one little stupid line of code that basically says "BTW don't put a var named layer on any display objects". This complicates everything too much. Layer on the Object but not the actual display object and then a bunch of blahblahblah. NO GOOD.

                          So I decided that it was smarter to just make an array and place all the display Objects in that, but there was only one problem. Can I do this?

                          display(objectName:{/*stuff*/}, someOtherObject:{/*stuff*/})

                          In other words will my parser handle named objects in array indexes? Guess what? IT DOES! It's important to have this cause those object names are the base filename of the Object files it needs to implement. The other super cool thing about this is, the Object names can be used over and over without being considered a conflict with a pre created version of the same object. In other words


                          are the same exact thing and the last one will overwrite the one before it


                          are not the same thing even though they implement the same object, because now those values are perceived as display[0].someObject, etc. In essence I create unique namespaces for multiple instances of one object. This means: write the code once and reuse it indefinitely with no worry of conflict or overwrite.
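The difference can be shown with plain AS3 objects, leaving the parser out of it entirely. This is only a sketch of the two shapes being described; `flat`, `display` and the `text` property are made-up names.

```actionscript
// Same name at the same level: the later value replaces the earlier one.
var flat:Object = {};
flat["someObject"] = {text:"first"};
flat["someObject"] = {text:"second"};
trace(flat.someObject.text);        // second

// Same name wrapped in array indexes: each one lives in its own slot.
var display:Array = [];
display[0] = {someObject:{text:"first"}};
display[1] = {someObject:{text:"second"}};
trace(display[0].someObject.text);  // first
trace(display[1].someObject.text);  // second
```

The array index acts as the namespace, so the object name is free to repeat.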

                          This is crazy! I keep finding these things that I never even thought of and my parser handles it flawlessly. I'm under the assumption that I'm not even sure what I built. It's stuff like this that lets me know I built something of quality cause the bottom line is, everything I have discovered about my parser is how it should be. These are legitimate possibilities, I just didn't know that I actually made it that good.

                          When I made the named object inside of an array, I was totally expecting my results to be all fucked up. I'm not even sure where in my parser I made that possible. Maybe it's not an "in" thing. Maybe the parser as a whole is making it possible, IDK, IDC, I'm just glad it works. My learn function picks entire objects apart from one end to another and tells me everything about everything, types, values, nesting, everything. I am positive that this really works.

                          EDIT; Wait a minute, I do understand why this works. It's incredibly simple. Let's take an example from earlier

                          display(objectName:{/*stuff*/}, someOtherObject:{/*stuff*/})

                          my parser finds "(" then ")" then "display". It then decides to make a blank array named display, takes all the guts that were between ( and ) and sends them back through the interpreter. At this point it is locked on to display as the current parent while it parses the guts, so it is no different than if it was parsing these named objects with no wrapper at all. I'm not going to explain it cause it's too in depth, but if I went another way with my overall parser, this would not work at all. Basically, I got lucky and had the exact right idea for my parser to treat everything as a blank parent with future assignment. If the assignments happened right off, this would be fucked.

                          Some of you more computer savvy people may be wondering "How do you get the name back out? If you called display[0] its just gonna tell you an Object is there and it's not even going to give you the object."

                          Wow you're a smart cookie. I'm glad you asked cause that's gonna lead me to the next thing that I did and didn't realize the importance of. It ain't pretty though


                          Yup, my parser already stores the name of named objects in the same index. Go figure. It's like I was harnessing the future or some shit and I didn't even know it. Arrays are weird in AS3. You could dump a million gabillion named things in one array index and get them all back out with their name. I don't need a million gabillion things though, I just need .name and the object of that name.

                          Actually array indexes in AS3 are Objects but they aren't classed as Objects which I totally don't get but I know how to work with it

                          if (someArray[n] is Object) -> would equal true
                          if (getQualifiedClassName(someArray[n]) == "Object") -> would equal false
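A small sketch of why the two checks disagree: `is` asks whether a value fits anywhere under a class (and everything except null and undefined fits under Object), while getQualifiedClassName() reports the one exact runtime class. The contents of `someArray` here are arbitrary sample values.

```actionscript
import flash.utils.getQualifiedClassName;

var someArray:Array = [new Array(), 42, "text", {}];
for (var n:int = 0; n < someArray.length; ++n)
{
	// "is Object" is true for every entry; the class name differs per entry
	trace(n, someArray[n] is Object, getQualifiedClassName(someArray[n]));
}
// 0 true Array
// 1 true int
// 2 true String
// 3 true Object
```

So the second check only equals "Object" when the value was created as a plain Object, which is why the parser compares against "Array" and "virtuoso.eng::LibraryObject" by name.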
                          Last edited by MadGypsy; 05-14-2014, 09:46 PM.


                          • #88
                            It's been a couple/few years of me playing with this and I finally have some real world results. I don't intend to post images of my results just yet but, I do intend to describe them.

                            First of all, this entire thing was originally intended to basically act as a markup language for a client that can read it and display the described results. In almost no way was this project any different than html and a browser.

                            A lot has happened since I began this. One thing in particular is, I have studied the Away3D engine to death. In those studies I realized how I could take my Object system to give 3D models more meaning and properties. Instead of loading models and then attaching them to a hard coded type with no ambiguity (think of any entity in Quake), I can attach it to a more ambiguous class (think Player, Enemy, Projectile, etc) and use the accompanying Object to define the specifics. These specifics can even include things like attaching keyEvents with property manipulations to a model. As a matter of fact you can even manipulate sub meshes through Object properties.

                            The end result is a 3D game engine that invents a game on-the-fly based on well co-ordinated external object data. Instead of having game code (qc) to modify, you simply mod descriptions. The engine has (will have) a sick amount of possibilities and options hard coded in, the idea is to tap into that potential instead of inventing it.

                            Here is an example:

                            Quake: to modify an entity, let's assume monster_zombie, you would have to modify the zombie code and recompile the entire script. The changes you make are "permanent" and hard coded.

                            my system: You define an entity in its corresponding Object. You import this entity into the Map Object and, using a merge object, I can "upgrade" any entity without touching the original. This may seem like a bunch of objects to juggle, but all the imported entities, the map and the merge get compiled to one object long before the game is even released. In other words, I juggle a bunch of files as a developer but there is only one object per map for my engine to juggle. The merge Object is very handy. It allows me to change an entity's properties on a per map basis. Maybe the zombies from level 1 are a bit harder in level 2.

                            Quake: qc puts you in the driver's seat to destroy your code with shitty or wrong code

                            my system: with the exception of syntax, you can't break or slop up my objects. If you try to include properties that don't exist, they get ignored. If you give a property a wrong value type, it gets cast to the proper one. Of course that will still give you questionable results, but it won't break the client.

                            There are no functions, loops, classes, etc. Everything is based on the fact that the client already knows how to do what you want, you just need to include the proper flag to make it happen. I know there is no way for my client to really know everything but, as this project continues the amount of "everything" that it doesn't know will be very small.
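The merge and the "ignore or cast" behaviour described above can be sketched in a few lines. This is an assumption about the shape of the idea, not code from the engine: `known`, `mergeEntity`, `sanitize` and every property name are hypothetical, and the copy is shallow.

```actionscript
// Hypothetical table of properties the client understands and the type each is cast to.
var known:Object = {health:"Number", speed:"Number", name:"String", solid:"Boolean"};

// Copies the original entity, then lets the per-map merge values win.
function mergeEntity(base:Object, overrides:Object):Object
{
	var merged:Object = {};
	var key:String;
	for (key in base) merged[key] = base[key];
	for (key in overrides) merged[key] = overrides[key];
	return merged;
}

// Drops properties that are not in the table and casts the rest.
function sanitize(description:Object):Object
{
	var clean:Object = {};
	for (var key:String in description)
	{
		if (!known.hasOwnProperty(key)) continue;   // unknown property: ignored
		switch (known[key])
		{
			case "Number":  clean[key] = Number(description[key]);  break;
			case "String":  clean[key] = String(description[key]);  break;
			case "Boolean": clean[key] = Boolean(description[key]); break;
		}
	}
	return clean;
}

var zombie:Object = {name:"zombie", health:60, speed:1};
var level2Zombie:Object = sanitize(mergeEntity(zombie, {health:"90", wobble:true}));
trace(zombie.health);        // 60, the original is untouched
trace(level2Zombie.health);  // 90, cast from the string "90"
trace(level2Zombie.wobble);  // undefined, unknown property ignored
```

Casting never fails outright, which matches the "questionable results but it won't break the client" trade-off: Boolean("false"), for instance, is true in AS3 because the string is not empty.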

                            I still have a long way to go but my POC is solid.


                            • #89
                              This may amuse some...

                              I spent all this time in this thread perfecting a way to pass flash objects in string format and then have flash parse and recognize the string as a qualified object. Well, it turns out that a javascript eval that calls back to a flash ExternalInterface accomplishes the same thing.

                              so however many lines of code I wrote here and all the time developing it can be replaced with something like...


                              it takes a line or 2 in flash to define and allow a call to "callbackName()" but that is still nothing in comparison to the size and complexity of my parser class. Live and learn, right?


                              • #90
                                I see there has been a couple-few visitors to this page recently. My quake engine actually uses a subset of this technology. My BSPUtility converts BSPs into an AMF serialized object, which is basically what this was all about. THIS project was based on writing object notation and having it converted to AMF. My BSP utility spins the object without having to hand write it.

                                There is a lot of good data here though about how to create a stack and parse name/value pairs from it. In a lot of ways this was (sort of) a JSON parser for Flash.

                                @ "I have studied the Away3D engine to death" (2 posts down)

                                LOL, boy was that some bullshit. I studied Away3D 3.x to the point of understanding how to control and render a scene. It was ridiculous for me to say I studied it to death. It probably just felt like it. I'm using Away3D 4.1 now and things are different. However, a lot of the ideas expressed in that post are being included in some form or another.

                                For instance:

                                Enemy animation frames are hardcoded in the QC. I have always felt like the last place you want to define a model's animations is in the game code. When I get to models, they will very likely be accompanied with some JSON-esque file that describes their animations, much like an md5_anim.
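As a rough idea of what such a file could look like, here is a sketch that assumes plain JSON and the native JSON class (Flash Player 11 and later). The layout, the animation names and the frame numbers are all invented for the example; nothing here comes from the actual engine.

```actionscript
// Hypothetical layout: one entry per animation with its frame range and playback rate.
var description:String =
	'{"walk":{"start":0,"end":11,"fps":10},' +
	'"attack":{"start":12,"end":24,"fps":12}}';

var animations:Object = JSON.parse(description);
for (var animName:String in animations)
{
	var anim:Object = animations[animName];
	// frame count is inclusive of both ends of the range
	trace(animName, "frames:", anim.end - anim.start + 1, "fps:", anim.fps);
}
```

Changing an animation then means editing the description file, with no game code to recompile.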
                                Last edited by MadGypsy; 04-25-2016, 04:48 PM.