Parsing Delimiters (theory)

  • #61
    What language is this?
    www.quakeone.com/qrack | www.quakeone.com/cax| http://en.twitch.tv/sputnikutah


    • #62
      XXON - Generic Object Notation ala I made it up...Gypsy Script...lol. It's primarily JSON but I added a bunch of stuff (and will add more).
      http://www.nextgenquake.com


      • #63
        I have made some serious upgrades to this whole project.

        1) I have shortened load times and much more by concocting a different method. It used to be that my compiler would grab a text script, parse it into an object and then convert that object to visual/event data, basically using the object as a properties reference.

        Now that whole first step happens before release. I built the beginnings of a script editor for the language. When you hit [compile] it converts all the text to an object and then saves that object as bytes. This is superior in every way. Now when the app loads it does not have to parse the text into an object; it loads the bytes and voila, the object is ready.

        However, that object still needs to be converted to the final visual/event data.
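A minimal sketch of the compile-once idea above, written in Python rather than ActionScript so it can be run anywhere. Plain JSON stands in for XXON, `pickle` stands in for Flash's byte serialization, and the names `compile_script` and `load_compiled` are made up for illustration.

```python
import json
import pickle

# Hypothetical script text; plain JSON stands in for XXON here.
script_text = '{"play": {"baseType": "button", "x": 10, "y": 20, "label": "Play"}}'

def compile_script(text: str) -> bytes:
    """Editor side: parse the text once, then serialize the resulting object."""
    parsed = json.loads(text)
    return pickle.dumps(parsed)

def load_compiled(blob: bytes) -> dict:
    """Engine side: no text parsing, just rebuild the object from bytes."""
    return pickle.loads(blob)

compiled = compile_script(script_text)
library = load_compiled(compiled)
print(library["play"]["label"])  # Play
```

The object that comes back is still only data. Turning it into something visible is a separate step, which is the remaining work the post mentions.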

        2) The entire project can be contained in a zip. This is mega superior to my original way. The old way had me importing/uploading external assets as the engine encountered them. By having all local assets/data stored in a zip, the entire zip is automatically loaded (all images, xml, whatever). I then partition the files into an associative array, meaning they can be referred to in the xxon by their filename without the need for a full path. This also means that I can create a factory pattern that allows for one image to serve many destinations. Now, instead of each image object containing its actual resource, the object becomes a pointer to the single resource.

        Also there is the obvious compression perk. Having my engine expand the zip internally will save bandwidth and load times.
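The zip-to-associative-array idea can be sketched roughly like this. It is a Python stand-in, not the engine's ActionScript; the file names, `load_assets` and `ImageRef` are all invented for the example.

```python
import io
import zipfile

def build_demo_zip() -> bytes:
    """Create a small in-memory zip so the sketch is self-contained."""
    buffer = io.BytesIO()
    with zipfile.ZipFile(buffer, "w", zipfile.ZIP_DEFLATED) as archive:
        archive.writestr("images/logo.png", b"fake image bytes")
        archive.writestr("data/menu.xml", b"<menu/>")
    return buffer.getvalue()

def load_assets(zip_bytes: bytes) -> dict:
    """Map each file's bare name (no folder path) to its contents."""
    assets = {}
    with zipfile.ZipFile(io.BytesIO(zip_bytes)) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue
            name = info.filename.rsplit("/", 1)[-1]
            assets[name] = archive.read(info)
    return assets

class ImageRef:
    """A display object that points at a shared asset rather than owning a copy."""
    def __init__(self, assets: dict, name: str):
        self.resource = assets[name]

assets = load_assets(build_demo_zip())
header_logo = ImageRef(assets, "logo.png")
footer_logo = ImageRef(assets, "logo.png")
print(header_logo.resource is footer_logo.resource)  # True
```

Keying by bare filename is what removes the need for full paths, but it also means two files with the same name in different folders would collide. The sketch does not handle that case.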

        That's about all for right now, but just those 2 additions to my compiler/engine have added a big speed bonus.

        I have to get further before I can test it, but I would like to see if I can turn the entire completed app into bytes. I already know there are going to be some problems with this, but even if I was able to convert a percentage of the display objects to bytes it would be an optimization worth making.

        Buttons, for instance, would be a prime candidate for byte conversion. Buttons are not usually dynamic UIs (ex: play, stop, pause, next, prev), so if I could get a completed button to convert properly to bytes, that would be a major step in itself.

        There is a balance and a trick to all of this and I am learning it as I go. An example of a bad idea for byte conversion would be a slideshow which is fed data by php. I may get all of the buttons to convert to bytes, but I don't want all of the images to become static (unchangeable) members of the slideshow. So, I have a lot of tinkering, learning and fine tuning to do, to achieve the proper level of conversion for each element.
        Last edited by MadGypsy; 09-24-2013, 02:02 PM.

        • #64
          You may have noticed that I don't post as often. This is a good thing. It means I'm working on big things. I have taken this entire system up a notch from my last post.

          Let's put everything in very simple perspective:

          1) the original point was to be able to write out an object in string format and then have my PD work convert that string to a fully qualified object //CHECK, did that

          2) Then I was to take the Object and use it as a source of properties for display/stream/event/format objects. //CHECK done

          So far we have 2 conversion systems working to parse a string into a final display object. Along the way I added 2 more tools to my arsenal: the ability to write/read bytes and the ability to (de)compress data.

          These 2 tools fundamentally changed what my overall goal was. The original goal was to create an engine that will read the string files to display objects and to create a script editor in which to create the string based files with.

          By figuring out how to convert objects to bytes and back, I don't have to mess with strings at all. Essentially, my entire parsing delimiters script is unnecessary. I can approach this from another angle: you choose library elements that are already Objects, then modify the properties of that object, and upon save the object is converted to and saved as bytes. These bytes can then be loaded and directly converted back to an object without any delimiter parsing at all.

          So, the whole script editor idea is pointless. We can go straight to a drag/drop click/define style library of premade generic objects. This does not mean that you will not be able to write object code and compile it to objects. I didn't spend over a month parsing strings to objects for nothing. It is an alternative. But unlike before, if you write a string object script it will automatically parse itself and compile to bytes. That means you will only get away with running the program with string objects one time. By the second time it will all be converted and recompiled.
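The "string script only runs once" behavior could look something like the sketch below. This is an assumption about the flow, in Python: the `.bin` suffix, the `load_app` name and the JSON stand-in for XXON are all hypothetical.

```python
import json
import pickle
import tempfile
from pathlib import Path

def load_app(script_path: Path) -> tuple:
    """Return (object, source), where source is 'bytes' or 'script'."""
    compiled_path = script_path.with_suffix(".bin")
    if compiled_path.exists():
        # Later runs: skip text parsing entirely.
        return pickle.loads(compiled_path.read_bytes()), "bytes"
    # First run: parse the text, then save the compiled form for next time.
    parsed = json.loads(script_path.read_text())
    compiled_path.write_bytes(pickle.dumps(parsed))
    return parsed, "script"

with tempfile.TemporaryDirectory() as folder:
    script = Path(folder) / "app.xxon"
    script.write_text('{"title": {"baseType": "textbox", "text": "hello"}}')
    first_obj, first_source = load_app(script)
    second_obj, second_source = load_app(script)

print(first_source, second_source)  # script bytes
```

The sketch never checks whether the script changed after it was compiled, so a real version would need some way to decide when to recompile.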

          I still have a long way to go. So far, my program determines where the source data is, gathers it, converts it all to the proper Object (Object/XML/String) and sequesters it accordingly in a hash table. So I am at the point where all data has been retrieved and shoved in a library.

          Now I need to take that data, turn it into display/stream/event/format objects and add it to the stage. That will actually finalize the entire core of the engine. The next part comes in creating generic objects that can be modified and connected to other objects.
          like:

          textbox, button, scrollbar, videoDisplay, imageDisplay, slideshow, datagrid, etc, etc, etc

          That's the real meat and potatoes right there. How many "components" can I make and how should they tie into other components? When I get over that hump and establish a meaty library I can pretty much start exposing my work.

          There is another hump though. I'm not sure how big it will be. You may have noticed that XML was mentioned earlier. XML is already recognized by Flash as a "string object". Simply cast an XML string to XML [ex XML(some xml string)] and [talisa:Shazaam!] you have a real XML object to work with. It's [Tea Monster:a doddle]. I want to allow bytes and xxon to be completely skipped in favor of XML. In other words, there are 3 data types that will get you the exact same results (Bytes/XXON/XML). Bytes is definitely the preferred method but I refuse to make it the only one.

          Anyway, that's where I am and where I am going. Hopefully I will have some kind of examples in the near future. For now, I am refining all of my current work so I can have peace of mind that it is complete before I move on to the next steps.

          Oh, I stole my name for my Radiant package to name this engine. When it is complete this will be the Virtuoso engine, which is far more fitting for this engine than my radiant game pack. However, I do not intend to rename my Radiant game pack. Maybe if I get bored in the future I will sue myself.

          • #65
            More Theory

            I think a lot about my methods before I start committing code to them. Last week I basically summed it up that the final object would be converted to its appropriate display/stream/tween/event type. My technique was rudimentary: I was going to loop through all the properties of the object and apply them one by one to the destination type. That technique has really been bugging me. I vaguely remembered summin bout summin summin (lol) so I went digging in the docs. I actually found it pretty quick...

            registerClassAlias(aliasName:String, classObject:Class). That will allow me to simply say that the Object IS the class. It's tricky: functions are not saved in an Object, so I have to make the class constructor orchestrate the instance of the destination object.

            For those of you that have no idea what most of that said, I will make this simple.

            A class constructor is the function that automatically runs when the class is "called". A class is generally a "page" of code that performs "one complete operation". Let's say that you wanted to make a button on the screen, well, one example of a class would be all of the code that is necessary to create a working button. Actually, let's roll with that. I want to take an object (same code I have been posting here all along) and simply say [Talisa:Shazaam] You're a button! No bullshit, loops, steps, nothing...just - You're a button.

            That sums it all up right there.

            Edit: oh wait, no it doesn't sum it all up. I forgot to say that the way I will perfect this is by taking a button and saying - You're an Object. Then I'm going to see what it spits out so, I can accurately tell an Object - You're a button.

            That sums it up.
            Last edited by MadGypsy; 10-13-2013, 06:58 PM.

            • #66
              Well... the above theory proved to be more confusing in implementation than I expected. I did, however, perform a successful experiment of proportions I didn't expect. I was able to take the native Flash TextField type, assign it all of its format/style/display properties and save it in a byte array. Then I was able to read the byte array and retrieve the 100% completed TextField that I put in it.

              This is a hella powerful option. This means that my editor can simply save the completed object and when the application is opened there is no parsing/figuring/adjusting/etc to do. Each element will automatically be what it is supposed to be.

              So, there ya go. I started this by writing a parser for an Object language and brought it all the way to saving/loading the bytes for completed objects. This is actually a game changer. Now the engine is actually very simple code. It's the editor that is going to be incredibly complex.

              The editor will have to allow drag and dropping of elements (simple) with a form-like interface for their properties (time consuming), allowing the user to compile their app down to a library of completed Objects that can interact with one another. The thing is, I don't want to eliminate my Object notation, so I have to also make a script editor. I want to expand the entire thing to accept xml as well. The editor IS the project, for me, from this point on.

              The Engine is going to be microscopic.

              1) find and import source
              2) read bytes to objects
              3) display objects

              It will also have to be able to handle events/streams/etc but that stuff is like a class apiece. Judging by what I'm saying here, my engine looks like it's gonna be about 8 classes (lol). The editor, however... it's gonna be a whole lot of work.
              Last edited by MadGypsy; 10-14-2013, 06:45 PM.

              • #67
                Ok, so I spent the last few days turning destination objects into bytes and the bottom line is that it cannot be done consistently. To register a class alias, that class cannot accept arguments in the constructor. In layman's terms, the code page can't expect initial data.
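To see why required constructor arguments break this kind of deserialization, here is a rough Python analogy. It is not how Flash implements it; `register_alias`, `read_object`, `Label` and `Transformish` are hypothetical names used only to show the shape of the problem.

```python
# Hypothetical alias registry, loosely modeled on the idea of a class alias.
ALIASES = {}

def register_alias(name, cls):
    ALIASES[name] = cls

def read_object(record):
    """Rebuild an instance the way a generic deserializer must:
    call the constructor with no arguments, then copy properties on."""
    cls = ALIASES[record["alias"]]
    instance = cls()  # fails if the constructor requires arguments
    for key, value in record["props"].items():
        setattr(instance, key, value)
    return instance

class Label:
    def __init__(self):
        self.text = ""

class Transformish:
    """Stand-in for a class whose constructor needs an owner object."""
    def __init__(self, owner):
        self.owner = owner

register_alias("label", Label)
register_alias("transform", Transformish)

label = read_object({"alias": "label", "props": {"text": "hi"}})
print(label.text)  # hi

try:
    read_object({"alias": "transform", "props": {}})
    failed = False
except TypeError:
    failed = True
print(failed)  # True
```

The reader has no idea what to pass to the constructor, so any class that insists on an argument cannot be rebuilt this way.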

                Almost every display object in Flash is some kind of subclass of the DisplayObject class. DisplayObject utilizes the Transform class, and the Transform class has arguments in its constructor. This means that almost any display type that I attempt to convert to bytes will have all null transform properties. Here are my options:

                1) Go mod a HUGE chunk of the AS3 source to use setters instead of constructor arguments
                2) Just turn generic objects to bytes and use them as a form of properties list for the final display type.

                I'm thinking #2 cause #1 is an absolutely insane idea that would take forever to completely isolate.

                This doesn't mean that no final objects can be converted to bytes. I just can't convert any that are utilizing the Transform class, even if it's just through inheritance. There is actually a #3 but I'm not prepared to bring my experiment in this direction and I'm not even sure it would work.

                I could attempt to use the IDynamicPropertyWriter interface and write all but the transform properties. Serialize everything like it is the final display object but, never registering a class alias. This will give me a generic object but everything inside will be perfectly typed and serialized as if it was the real thing.

                To get the object back out I would have to find a way to say "now you're this" which will not be as simple as TextField(txt.readObject())... hmmm, I'm reading Objects, what if I readBytes() and considered the Object on my own? [[txt.readBytes() as TextField?]]

                Anyway, I have more stuff to think about.

                • #68
                  Even more theory

                  There are 2 possible errors to pick from when serializing a display object.

                  1) cannot convert object@726872 to Transform
                  2) Transform expects 1 parameter, 0 given

                  Number 1 is what happens when you just do nothing about Transform. Number 2 is what happens when you register a class alias on Transform. I find 2 interesting. It's interesting because it definitely exists as a Transform; it's just missing the constructor argument.

                  What if I isolated the Transform chunk among the bytes and supplied a constructor argument before reading it out to the destination object? I've searched and searched, everywhere on the net, about overcoming this Transform thing, and everything simply says "can't be done". Maybe I can't work around this. I haven't seen anything about writing arguments to a constructor though.

                  ---
                  New Plan

                  Every final object will have different rules and properties, so they will all need to be treated differently. This being the case, instead of just saving a generic object as bytes, I will save each Object as its compiler type. When I readObject it will instantiate the class and use the base object data to make all the decisions/assignments.

                  bytes (converted)-> instantiates as "self compiler" object (returns)-> an instance of destination object

                  Actually, that's the way to go. I know how to get that version completed. All this fucking around with Transform is a waste of time.
                  Last edited by MadGypsy; 10-17-2013, 06:36 PM.

                  • #69
                    Man, I have created like 20 different incomplete versions of my idea, just trying to find the one most balanced version. I have been to both extremes of the spectrum, from inventing everything based on a String() to compiling the completely finished Object to bytes, and everything in between, as well as open/save functions and various conversion possibilities based on what you are opening/saving.

                    In short, I have discovered my possibilities from A to Z (for the master direction that I am pointing). I believe I have finally isolated the compiled save format.

                    Before, I was using zip. No more zip, or at least it will not be used the same way I was using it. It will be more of an option as opposed to the final compiled format. I was doing this much like a .pak. It's not necessary. The only thing I can't save in a byte object is an image. Actually, I can't serialize it as an image in a ByteArray. I could still just save the bytes and attempt to tell it that it is an image when the Object is read.

                    Transform is tricky. In the case of an Image I need the Loader class to display it, which uses the Transform class, BUT an image by itself is still image data. So in this case I would write the bytes of the image (not the Loader) to the Object and then attempt to assign the bytes as someImg.bitmapData. However, I'm not gonna do any of that. Which brings us back to the zip. The option will be available to read external data from a zip, but it won't be mandatory. Images will be saved as paths. This is better anyway. A path could also be (ex) a PHP document that returns an image.

                    The final byte file for the application will store the entire completed Library Object.

                    In other words, there would be something like the below

                    Code:
                    library:
                    {
                    	doc:
                    	{
                    		baseType:"app",
                    		text1:
                    		{
                    			//final properties - x, y, text
                    			//inherits and can overwrite for this instance any property in text1 library element
                    		}
                    	},
                    	text1:
                    	{
                    		baseType:"textbox",
                    		//properties that configure this to be a customized reusable element - format, justification, multiline?, etc
                    	}
                    }
                    When this file is read all the bytes are immediately converted to an Object. That Object then self compiles into the final application (Library.doc) or a library element (Library.anythingElse). There is still much to consider, but this structure will allow me to have a very small file size and no need to break everything up like a pak, where multiple folders have various data... I can just put all of the data (except images) into the Library Object, compress it and save it all as one byte file.
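One way the "inherits and can overwrite" comment in the structure above might play out is sketched below in Python. The merge rule (library element first, instance values win) is an assumption drawn from that comment, and the property names `align` and `multiline` are invented.

```python
library = {
    "doc": {
        "baseType": "app",
        "text1": {"x": 40, "y": 12, "text": "Hello", "multiline": True},
    },
    "text1": {
        "baseType": "textbox",
        "align": "left",
        "multiline": False,
    },
}

def resolve_instances(lib: dict) -> dict:
    """Merge each instance in doc with the library element of the same name."""
    resolved = {}
    for name, overrides in lib["doc"].items():
        if not isinstance(overrides, dict):
            continue  # skip plain values such as baseType
        element = lib.get(name, {})
        # Library defaults first, instance values win on conflict.
        resolved[name] = {**element, **overrides}
    return resolved

instances = resolve_instances(library)
print(instances["text1"]["multiline"])  # True
print(instances["text1"]["baseType"])   # textbox
```

The instance only has to store what differs from the reusable element, which is part of why the single byte file can stay small.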

                    • #70
                      Extending Object Possibilities

                      Because my parser is primarily JSON and my engine will be built around reading/compiling/using these objects, there is no reason why Objects cannot be contrived in PHP against a database and sent back to the engine.

                      This is a very powerful feature. Imagine a static library Object that builds the entire application and configures/creates all the events, etc., but is powered by dynamic Objects that have been returned by PHP. Depending on how it is utilized it could be an entire security layer, a media server, an aggregate...

                      to dream too far ahead is probably bad...

                      • #71
                        This will likely be the last post about this for a good spell. I believe I have all of the pieces put together in my head. The only detail that has changed from the above post is: my string to Object parser now is a string to LibraryObject parser.

                        What is the difference? Well, really not that much. LibraryObject even extends Object. The difference is in:

                        Code:
                        public function assemble():void
                        {
                        	switch(baseType)
                        	{
                        		case "image":
                        			break;
                        		case "textbox":
                        			break;
                        		case "etc":
                        			break;
                        		default:
                        			break;
                        	}
                        }
                        The above represents a switch/case that determines what the data of this LibraryObject should be converted into. Normally there would be code between those case and break lines. I didn't wanna type all that. Use your imagination. So, just like I said, everything will be self compiling Objects.
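For anyone who wants the imagination part filled in a little, here is a hedged Python version of a self-assembling object. The builders return placeholder tuples instead of real display objects, and every method name other than `assemble` is made up.

```python
class LibraryObject(dict):
    """A plain data object that knows how to turn itself into its final type."""

    def assemble(self):
        builders = {
            "image": self._build_image,
            "textbox": self._build_textbox,
        }
        builder = builders.get(self.get("baseType"), self._build_default)
        return builder()

    def _build_image(self):
        return ("Image", self.get("path"))

    def _build_textbox(self):
        return ("TextBox", self.get("text", ""))

    def _build_default(self):
        return ("Unknown", None)

box = LibraryObject(baseType="textbox", text="hello")
print(box.assemble())  # ('TextBox', 'hello')
```

A lookup table does the same job as the switch/case. Adding a new base type means adding one entry and one builder.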

                        There is one other thing I did, but it amplifies what I already intend to do. I removed all of the comments from my parser. Almost 300 lines. The script is now about 430 lines. It's still doing too much. I already (way earlier this year) stripped out its image and overwrite abilities, but that wasn't enough.

                        You may be wondering... "why are you deleting capability?". Because the capability does not exist natively or it is ability that should happen somewhere else. What I mean by "natively" is (ex) import["path", "instance1", "instance2"]... is a nothing. It doesn't exist in any sensical Object way and there is no way to easily describe it. It is also unnecessary due to improvements that were made elsewhere.

                        What did that ever do? It allowed you to call 1 script but make multiple instances of it, and assigned those instances the target value(s) that are supplied after path. It was a handy line before I learned more stuff and started building the other end of all of this.

                        The "still too much" part comes in the form of unnecessary information being stored. I actually purposely programmed that "unnecessary" information in. When I started all of this I had a much different idea of how I was going to proceed. Now I want the extra data out of my script and that's not going to be super fun.

                        That's it. I know what to do. See ya when I'm done.
                        Last edited by MadGypsy; 10-21-2013, 05:33 PM.

                        • #72
                          Rewriting practically the entire parser. Don't get me wrong, my current parser works great. I just think I can write the exact same results in fewer lines and come up with an even more direct system. This project is turning into 100 steps forward and 100 steps back. I don't care. It will behoove me to make my foundation as clean and solid as I am capable of.

                          It's not a complete rewrite though. sequesterStrings(), stripComments() and findTailAndTrim() functions are ideal. I have no reason to change these. They get right to the point on accomplishing what their name implies.

                          The main function that needs to be rewritten is interpret(). That function finds all the container types Object/Array and dumps all of the contents for that container into an associative var (.content). Then that .content is run through interpret() to see if there are more container types. On and on til all that is left is String/Number/Boolean values on any given .content. While that system works awesome, I think I can make an even better one.

                          The next is compile(). compile() goes through all the .content vars and converts the string/number/boolean values to actual fully typed values. In other words it tells the computer "This value IS a (ex) boolean". It also assigns that value to a name if one is given. If one isn't given, it must be an array index or an error.

                          Maybe I don't have to change this much. The thing that I want to eliminate is how I handle Objects/Arrays that are inside of an array. The concept I used spans interpret() and compile() and that's really where the changes will be made.

                          If interpret() is processing the contents of an Array, meaning we are "in" an Array at the moment, and the .content has (ex) an Object in it, it trims out that entire object string and sequesters it in a holder array. Then it replaces the object string with a token that holds the array index the actual object string is stored in. Then the holder array .content for that index is run through interpret(). When we hit compile() it looks for the token and brings the interpreted .content back. I can't remember why I did it this way, but I have already thought of another way that I feel is superior. It eliminates the holder array altogether. There is some confusion though. I'm still not 100% clear on everything that will need to be modified to achieve the new way.
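The holder-array approach described above (the one being retired) might be sketched like this in Python. The `@0` token format and the `sequester_nested` name are invented, and the sketch assumes quoted strings were already pulled out beforehand, which is what the name sequesterStrings() suggests.

```python
def sequester_nested(content: str):
    """Replace each top-level {...} or (...) in array content with a token
    such as @0, and return the rewritten content plus the holder list."""
    closers = {"{": "}", "(": ")"}
    holder = []
    out = []
    depth = 0
    start = 0
    for i, ch in enumerate(content):
        if ch in closers:
            if depth == 0:
                start = i
            depth += 1
        elif ch in closers.values():
            depth -= 1
            if depth == 0:
                holder.append(content[start:i + 1])
                out.append("@" + str(len(holder) - 1))
        elif depth == 0:
            out.append(ch)
    return "".join(out), holder

flat, holder = sequester_nested("1,{a:2,b:(3,4)},5,(6,7)")
print(flat)    # 1,@0,5,@1
print(holder)  # ['{a:2,b:(3,4)}', '(6,7)']
```

Each held string would then go back through the same process until nothing nested is left. The sketch does not check that opening and closing delimiters actually pair up.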

                          I think I'm the only person here that works so far backwards to ever get forward. I'm not saying that's good. I'm saying this stuff is extremely complicated and there is no one way to achieve the desired results. There are a gabillion ways to achieve them. I'm just trying to use the best one I can make.

                          ---

                          Another thing I want to do is break interpret() and compile() down into smaller functions. Certain things happen within these functions that could be global possibilities, like parsing out a name. I only need it in the one spot I use it, so making it a function is not an optimization. What making it a function will do is make my code easier to read and manage. It may even spark understanding as to how other things can be condensed or removed. The script sits at 460 +/- lines. My goal is 400. That's just a made up number, but I know my code and I think it is probably something like 15% too "active". That's my guess. That's what it feels like when I look at it. 60 lines need to go. Which means 100 lines need to be replaced with 40, or something like that. Or maybe 460 lines need to be replaced with 400, and that would really suck.
                          Last edited by MadGypsy; 11-06-2013, 01:33 PM.

                          • #73
                            RegEx

                            I have boosted the power of chopping up the data by a LOT! I may end up replacing way more than 100 lines with way less than 40. I finally realized what was happening too much. Initially I would search for something, get its name, get its value, consider some comma possibilities and remove it from the main string. That all still needs to be done, but I can do it without all the looking ahead and behind that I was doing. Whereas before I was looking for a particular delimiter, now I'm grabbing a possible comma prelimiter, the name, possible index [0], pairs delimiter : and possible array or object delimiter ( or {. That's huge! That's all the data in one shot! Everything about that tells me what to expect next.

                            Code:
                            opening_mark = xxon.match(/[,]?[a-zA-Z]+(\[\d+\])?:[\{\(]/g)[0];
                            That returns stuff like this:
                            Code:
                            ,holder:{
                            holder[0]:(
                            ,holder[0]:{
                            holder:(
                            Actually, I think that's every possibility for this pattern, but that's not every syntax possibility. This chunk would be to find every named object and named array first. We just dismantle the above regExp a little to get the next possibility:

                            Code:
                            index_mark = xxon.match(/[,]?[a-zA-Z]+(\[\d+\]):/g)[0];
                            This will only spit out keys that are array index syntax. This will also catch everything[0] before the { or ( in the above examples, but we simply don't look at this var unless opening_mark is null. The idea is that this represents array[0] syntax that holds something other than an object or array. If this is null, there is only one more possibility:

                            Code:
                            nameless_mark = xxon.match(/[,]?[\{\(]/g)[0];
                            This is only present in inline array syntax [ex array((var,var,var),var,var)]. Notice the array syntax in array[0]. This expression is to catch that, or nameless object syntax.
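The check order for the three expressions can be tried out with a quick Python port. The patterns are the ones quoted above, moved over to Python's `re`. The `first_mark` helper and the sample strings are made up, and `search` stands in for taking element [0] of the match list.

```python
import re

# Python ports of the three expressions quoted in the post.
OPENING = re.compile(r",?[a-zA-Z]+(\[\d+\])?:[{(]")
INDEX = re.compile(r",?[a-zA-Z]+(\[\d+\]):")
NAMELESS = re.compile(r",?[{(]")

def first_mark(xxon: str):
    """Return (kind, text) for the first pattern that matches, in priority order."""
    for kind, pattern in (("opening", OPENING), ("index", INDEX), ("nameless", NAMELESS)):
        found = pattern.search(xxon)
        if found:
            return kind, found.group(0)
    return "pairs", None

print(first_mark("a:1,holder[0]:{x:2}"))  # ('opening', ',holder[0]:{')
print(first_mark("list[2]:5,b:true"))     # ('index', 'list[2]:')
print(first_mark("((1,2),3)"))            # ('nameless', '(')
print(first_mark("a:1,b:2"))              # ('pairs', None)
```

Only when all three come back empty is the remaining text treated as plain name:value pairs or bare array values.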

                            Once those 3 possibilities are eliminated, the only thing that could be left is name:value pairs or singular array index vars. In other words String, Number or Boolean. That string gets saved on the object .content. When it comes time to compile, .content is split by comma and then split by colon, and voila: name:value pairs all nice and tidy in an array that becomes the name:value on the Object or the val of array[num], properly typed.
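A rough Python sketch of that last split-and-type step follows. The typing rules in `to_value` are guesses at what "properly typed" means here, and both function names are hypothetical. It assumes any commas or colons inside quoted strings were dealt with earlier.

```python
def to_value(raw: str):
    """Give a raw string its real type: Boolean, Number, or String."""
    text = raw.strip()
    if text in ("true", "false"):
        return text == "true"
    try:
        return int(text)
    except ValueError:
        pass
    try:
        return float(text)
    except ValueError:
        return text.strip('"')

def compile_content(content: str):
    """Split on commas, then on the first colon. Named parts become
    object properties; if nothing is named, the result is an array."""
    named, positional = {}, []
    for part in content.split(","):
        if ":" in part:
            name, raw = part.split(":", 1)
            named[name.strip()] = to_value(raw)
        else:
            positional.append(to_value(part))
    return named if named else positional

print(compile_content('x:10,visible:true,label:"Play"'))
# {'x': 10, 'visible': True, 'label': 'Play'}
print(compile_content("1,2.5,false"))
# [1, 2.5, False]
```

Content that mixes named and unnamed parts is not treated as an error in this sketch: the unnamed parts are silently dropped, which a real compile step would probably want to report.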

                            The end.
                            Last edited by MadGypsy; 11-06-2013, 07:52 PM.

                            • #74
                              Some people, when confronted with a problem, think "I know, I'll use regular expressions."
                              Now they have two problems.

                              Some Game Thing


                              • #75


                                I was already using regEx. I just wasn't capturing more than a delimiter at a time. I was also waiting too long to type the return data. Instead of waiting til I have partitioned every little thing out, I'm just jumping right in now.

                                "Oh you're an Object? OK then be one, here's your content. Now let's zap that out of the main string and move on."

                                vs

                                "OK you are an object. Let's see what else is going on. We'll go find a little delimiter over here and sequester unnecessary things over there. Let's not even call you an Object yet, let's just set you to the side and determine if you are an object again later."

                                That's the difference between now and before. Very proactive and vicious code that starts typing right off the bat. I'm at 235 lines right now. I still need to add compile() back in though. But compile() wasn't very big and I can remove a chunk that is no longer necessary. So, I'm thinking my original 460 lines is going to be 300 and achieve the exact same final results. In compile I have almost the same run of processes happening 3 times in a row, depending on which regex is not null first. I could combine that into one function with some better conditions and maybe slim this down to 250 lines. With just a little more optimization on an anal level, I could maybe reduce my parser to barely 200+ lines. The absolute original was over 1000. It's not a fair comparison though cause the absolute original was doing a lot more. I had overwrite and implement functions as well as import. I also had hundreds of lines of comments. So, it only makes sense to compare my new version to my 460+ line version which I believe is going to end up being half or less, with identical results.

                                Once I'm done and happy with my final parser, I can finally move forward cause I won't have "you're building on top of a mess" nagging me and stifling my creativity. I'll share my final parser code here when it is complete. Maybe this time it won't exceed the allowed characters by like 20,000 characters. That's why I never posted it. It didn't fit in a post, by a lot.