diff options
Diffstat (limited to 'README.BTYACC')
-rw-r--r-- | README.BTYACC | 603 |
1 files changed, 603 insertions, 0 deletions
diff --git a/README.BTYACC b/README.BTYACC new file mode 100644 index 0000000..705481f --- /dev/null +++ b/README.BTYACC @@ -0,0 +1,603 @@ +-- $Id: README.BTYACC,v 1.1 2014/03/25 19:21:31 Tom.Shields Exp $ + +The original README from btyacc is below. + +The backtracking enhancements to byacc have been merged into Thomas Dickey's +byacc baseline. + +The %include and %define/%ifdef enhancements described below are not currently +incorporated. + +------------------------------------------------------------------------------- + BTYACC -- backtracking yacc + =========================== + +BTYACC was created by Chris Dodd using ideas from many +places and lots of code from the Berkeley Yacc +distribution, which is a public domain yacc clone put +together by the good folks at Berkeley. This code is +distributed with NO WARRANTY and is public domain. +It is certain to contain bugs, which you should +report to: chrisd@collins.com. + +Vadim Maslov of Siber Systems <vadik@siber.com> +considerably modified BTYACC to make it suitable +for production environment. + +Several people have suggested bug fixes that +were incorporated into BtYacc. + +See the README.BYACC files for more about +Berkeley Yacc and other sources of info. + +http://www.siber.com/btyacc/ is the current home of BtYacc. +It is provided courtesy of Siber Systems http://www.siber.com/. + + + Version 3.0 changes + ------------------- + by Vadim Maslov + +Changes mostly occurred in btyaccpa.ske file that +contains the parsing shift/reduce/backtrack algorithm. + +Version 3.0 innovations focus on: +- text position computation and propagation, +- industrial-strength error processing and recovery. + + +** Added mechanism for computing and propagating +text position of tokens and non-terminals. + +Compilers often need to build AST trees such that every node +in a tree can relate to the parsed program source it came from. +The following applications are very likely to need this: +- debuggers that show actual source of the debugged program, +- source-to-source translators that want + unchanged parts of the tree to generate the unchanged code. + +The new YYPOSN mechanism added in this version of BtYacc +helps you in automating the text position computation +and in assigning the computed text positions to the AST. +This mechanism is successfully used in commercial +parsers and source-to-source translators. + +In standard Yaccs every token and every non-terminal +has an YYSTYPE semantic value attached to it. +In this new version every token and every non-terminal +also has an YYPOSN text position attached to it. +YYPOSN is a user-defined type that can be anything and +that has a meaning of text position attached to +token or non-terminal. + +In addition to semantic value stack BtYacc now maintains +text position stack. Behavior of the text position stack +is similar to the behavior of the semantic value stack. + +If using text position mechanism, +you need to define the following: + +YYPOSN Preprocessor variable that contains C/C++ type of + the text position attached to + every token and non-terminal. + +yyposn Global variable of type YYPOSN. + The lexer must assign text position of + the returned token to yyposn, just like it assigns + semantic value of the returned token to yylval. + +YYREDUCEPOSNFUNC + Preprocessor variable that points to function that + is called after the grammar rule reduction + to reduce text positions located on the stack. + + This function is called by BtYacc to reduce text + positions. The function is called immediately after + the regular rule reduction occurs. + + The function has the following prototype: + void ReducePosn(YYPOSN &ret, + YYPOSN *terms, + YYSTYPE *term_vals, + int term_no, + int stk_pos, + int yychar, + YYPOSN &yyposn, + UserType extra); + + The function arguments are: + - ret + Reference to the text position returned by + the rule. The function must write the computed + text position returned by the rule to ret. + This is analogue of the $$ semantic value. + + - term_posns + Array of the right-hand side rule components + YYPOSN text positions. These are analogues of + $1, $2, ..., $N in the text position world. + + - term_vals + Array of the right-hand side (RHS) rule components + YYSTYPE values. These are the $1,...,$N themselves. + + - term_no + Number of the components in RHS of the reduced rule. + Equal to size of arrays term_posns and term_vals. + Also equal to N in $1,...,$N in the reduced rule. + + - stk_pos + YYSTYPE/YYPOSN stack position before the reduction. + + - yychar + Lookahead token that immediately follows + the reduced RHS components. + + - yyposn + YYPOSN of the token that immediately follows + the reduced RHS components. + + - extra + User-defined extra argument passed to ReducePosn. + + Typically this function extracts text positions from + the right-hand side rule components and either + assigns them to the returned $$ structure/tree or + if no $$ value is returned, puts them into + the ret text position from where + it will be picked up by the later reduced rules. + +YYREDUCEPOSNFUNCARG + Extra user-defined argument passed to + the ReducePosn function. This argument can use + any variables defined in btyaccpa.ske. + + +** Added code to btyaccpa.ske that automatically cleans up +semantic semantic values and text positions of tokens +and non-terminals that are discarded and deleted as +a result of error processing. + +In the previous versions the discarded token and non-terminal +semantic values were not cleaned that caused quite severe +leaks. The only way to fix it was to add garbage collection +to YYSTYPE class. + +Now BtYacc skeleton calls delete functions for semantic +values and positions of the discarded tokens and +non-terminals. + +You need to define the following functions that BtYacc +calls when it needs to delete semantic value or text position. + +YYDELETEVAL + User-defined function that is called by BtYacc + to delete semantic value of the token or non-terminal. + + The user-defined function must have the prototype: + void DeleteYYval(YYSTYPE v, int type); + v is semantic value to delete, + type is one of the following: + 0 discarding token + 1 discarding state + 2 cleaning up stack when aborting + +YYDELETEPOSN + User-defined function that is called by BtYacc + to delete text position of the token or non-terminal. + + The user-defined function must have the prototype: + void DeleteYYposn(YYPOSN p, int type); + v is semantic value to delete, + type is one of the following: + 0 discarding token + 1 discarding state + 2 cleaning up stack when aborting + + +** User can define "detailed" syntax error processing +function that reports an *exact* position of +the token that caused the error. + +If you define preprocessor variable YYERROR_DETAILED in +your grammar then you need define the following +error processing function: + +void yyerror_detailed(char *text, + int errt, + YYSTYPE &errt_value, + YYPOSN &errt_posn); + +It receives the following arguments: +text Error message. +errt Code of the token that caused the error. +errt_value Value of the token that caused the error. +errt_posn Text position of token that caused error. + + +** Dropped compatibility with C. + +Compatibility with C became increasingly difficult +to maintain as new features were added to btyaccpa.ske. +So we dropped it. If anybody wants to make the new version +compatible with C, we would gladly accept the changes. + +Meanwhile we expect that you use C++ to write grammar +actions and everything else in grammar files. +Since C is (in a sense) subset of C++, your C-based +grammar may work if you use C++ compiler to compile it. + + Version 3.0 bugs fixed + ---------------------- + +Matthias Meixner <meixner@mes.th-darmstadt.de> fixed a bug: +BtYacc does not correctly handle typenames, if one typename +is a prefix of another one and if this type is used after +the longer one. In this case BTYacc produces invalid code. + + + Version 2.1 changes + ------------------- + by Vadim Maslov + +** Added preprocessor statements to BtYacc that are similar +in function and behavior to C/C++ preprocessor statements. + +These statements are used to: + +- Introduce modularity into a grammar by breaking it + into several *.y files and assembling different + grammars from the *.y modules using %include and %ifdef. + +- Have several versions of the same grammar + by using %ifdef and $endif. + +- To include automatically generated grammar fragment. + For instance, we use %include to include + automatically generated list of tokens. + +Preprocessor statements are: + +%define <var-name> + Define preprocessor variable named <var-name>. + +%ifdef <var-name> + If preprocessor variable named <var-name> + is defined by %define, then process the text from + this %ifdef to the closing %endif. + +%endif + Closing bracket for %ifdef preprocessor statement. + Only one nesting level of %ifdef-%endif is allowed. + +%include <file-name> + Process contents of the file named <file-name>. + If <file-name> is a relative name, it is looked up + in a directory in which btyacc was started. + Only one nesting level of %include is allowed. + + + Version 2.0 changes + ------------------- + by Vadim Maslov + + +** Changed 16-bit short numbers to 32-bit int numbers in +grammar tables, so that huge grammar tables (tables that +are larger than 32768 elements) resulting from huge +grammars (Cobol grammar, for instance) can work correctly. +You need to have 32-bit integer to index table bigger than +32768 elements, 16-bit integer is not enough. + +The original BtYacc just generated non-working tables +larger than 32768 elements without even notifying about +the table overflow. + + +** Make error recovery work correctly when error happens +while processing nested conflicts. Original BtYacc could +infinitely cycle in certain situations that involved error +recovery while in nested conflict. + +More detailed explanation: when we have nested conflicts +(conflict that happens while trial-processing another +conflict), it leads btyacc into NP-complete searching of +conflict tree. The ultimate goal is YYVALID operator that +selects a particular branch of that tree as a valid one. + +If no YYVALID is found on the tree, then error recovery +takes over. The problem with this is that error recovery +is started in the same state context that exists on the +last surveyed branch of the conflict tree. Sometimes this +last branch may be of zero length and it results in +recovering to exactly the same state as existed before +entering the conflict. BtYacc cycles then. + +We solved this problem by memorizing the longest path in +the conflict tree while browsing it. If we ever get into +error recovery, we restore state that existed on the +longest path. Effectively we say: if we have an error, +let us move forward as far as we possibly could while we +were browsing the conflict tree. + + +** Introduce YYVALID_NESTED operation in addition to +simply YYVALID. When we have a nested conflict (conflict +while processing in trial mode for another conflict), we +want to relate YYVALID to a particular level of conflict +being in trial. + +Since we mostly anticipate only 2-level nested conflicts +YYVALID_NESTED tells the parser to satisfy only the +internal conflict. Therefore, in 1-level conflict +situation YYVALID_NESTED acts like a regular YYVALID, but +in 2-level conflict it is a no-op and the other YYVALID +for outer conflict will be searched for. + + +** Improved handling of situation where /tmp directory is +missing. Original btyacc just died quietly when /tmp +directory was missing. We added code that states the +problem explicitly. While on UNIX /tmp directory is always +present, it may be missing on WIN32 systems, therefore +diagnosing this situation is important. + + + Version 1.0 changes: BackTracking + ================================= + by Chris Dodd + +BTYACC is a modified version of yacc that supports +automatic backtracking and semantic disambiguation to +parse ambiguous grammars, as well as syntactic sugar for +inherited attributes (which tend to introduce conflicts). +Whenever a btyacc generated parser runs into a +shift-reduce or reduce-reduce error in the parse table, it +remembers the current parse point (yacc stack and input +stream state), and goes into trial parse mode. It then +continues parsing, ignoring most rule actions. If it runs +into an error (either through the parse table or through +an action calling YYERROR), it backtracks to the most +recent conflict point and tries a different alternative. +If it finds a successful parse (reaches the end of the +input or an action calls YYVALID), it backtracks to the +point where it first entered trial parse mode, and +continues with a full parse (executing all actions), +following the path of the successful trial. + +Actions in btyacc come in two flavors -- {}-actions, which +are only executed when not in trial mode, and []-actions +which are executed regardless of mode. There are also +inherited attributes, which look like arguments (they are +enclosed in "()") and act like []-actions. + +What this buys you: + +* No more lexer feedback hack. In yacc grammars for C, a +standard hack, know as the "lexer feedback hack" is used +to find typedef names. The lexer uses semantic +information to decide if any given identifier is a +typedef-name or not and returns a special token. With +btyacc, you no longer need to do this; the lexer should +just always return an identifier. The btyacc grammar then +needs a rule of the form: + +typename: ID [ if (!IsTypeName(LookupId($1))) YYERROR; ] + +While the hack works adequately well for parsing C, it +becomes a nightmare when you try to parse something like +C++, where treating an ID as a typedef becomes heavily +dependent on context. + +* Easy disambiguation via simple ordering. Btyacc runs +its trials via the rule "try shifting first, then try +reducing by the order that the conflicting rules appear in +the input file". This means you can deal with semantic a +disambiguation rule like: + [1] If it looks like a declaration it is, otherwise + [2] If it looks like an expression it is, otherwise + [3] it is a syntax error + [Ellis&Stroustrup, Annotated C++ Reference Manual, p93] + +To deal with this, you need only put all the rules for +declarations before the rules for expressions in the +grammar file. + +* No extra cost if you do not use it. Backtracking is +only triggered when the parse hits a shift/reduce or +reduce/reduce conflict in the table. If you have no +conflicts in your grammar, there is no extra cost, other +than some extra code which will never be invoked. + +* C++ and ANSI C compatible parsers. The parsers produced +by btyacc can be compiled with C++ correctly. If you +"#define" YYSTYPE to be some C++ type with constructor and +destructor, everything will work fine. My favorite is +"#define YYSTYPE SmartPointer", where SmartPointer is a +smart pointer type that does garbage collection on the +pointed to objects. + +BTYACC was originally written to make it easy to write a +C++ parser (my goal was to be able to use the grammar out +of the back of the ARM with as few modifications as +possible). Anyone who has ever looked at Jim Roskind +public domain C++ yacc grammar, or the yacc-based grammar +used in g++ knows how difficult this is. BTYACC is very +useful for parsing any ambiguous grammar, particularly +ones that come from trying to merge two (or more) complete +grammars. + +Limitations of the backtracking: Currently, the generated +parser does NO pruning of alternate parsing paths. To +avoid an exponential explosion of possible paths (and +parsing time), you need to manually tell the parser when +it can throw away saved paths using YYVALID. In practice, +this turns out to be fairly easy to do. A C++ parser (for +example) can just put a [YYVALID;] after every complete +declaration and statement rule, corresponding to pruning +the backtracking state after seeing a ';' or '}' -- there +will never be a situation in which it is useful to +backtrack past either of these. + +Inherited attributes in btyacc: + +Inherited attributes look a lot like function arguments to +non-terminals, which is what they end up being in a +recursive descent parser, but NOT how they are implemented +in btyacc. Basically they are just syntactic sugar for +embedded semantic actions and $0, $-1, ... in normal yacc. +btyacc gives you two big advantages besides just the +syntax: + 1. it does type checking on the inherited attributes, + so you do not have to specify $<type>0 and makes sure + you give the correct number of arguments (inherited + attributes) to every use of a non-terminal. + 2. It "collapses" identical actions from that are produced + from inherited attributes. This eliminates many + potential reduce-reduce conflicts arising from + the inherited attributes. + +You use inherited attributes by declaring the types of the +attributes in the preamble with a type declaration and +declaring names of the attributes on the lhs of the yacc +rule. You can of course have more than one rule with the +same lhs, and you can even give them different names in +each, but the type and number must be the same. + +Here is a small example: + /* lhs takes 2 inherited attributes */ +%type <t1> lhs(<t1>, <t2>) + stuff(<t1>, <t2>) +%% +lhs($i1, $i2) : { $$ = $i1 } + | lhs($i1, $i2) stuff($1,$i2) { $$ = $2; } + +This is roughly equivalent to the following yacc code: +lhs : + { $$ = $<t1>-1; } + | lhs [ $<t1>$ = $-1; ] [ $<t2>$ = $<t2>0; ] stuff + { $$ = $4; } + ; + +See the file "test/t2.y" for a longer and more complete +example. At the current time, the start symbol cannot +have any arguments. + +Variant parsers: + +Btyacc supports the -S flag to use a different parser +skeleton, changing the way that the parser is called and +used. The skeleton "push.skel" is included to produce a +"passive" parser that you feed tokens to (rather than +having the parser call a separate yylex routine). With +push.skel, yyparse is defined as follows: + +int yyparse(int token, YYSTYPE yylval) + +You should call yyparse repeatedly with successive tokens +of input. It returns 0 if more input is needed, 1 for a +successful parse, and -1 for an unrecoverable parse error. + + + Miscellaneous Features in ver. 1.0 + ---------------------------------- + by Chris Dodd + + The -r option has been implemented. The -r option tells +Yacc to put the read-only tables in y.tab.c and the code and +variables in y.code.c. Keith Bostic asked for this option so +that :yyfix could be eliminated. + + The -l and -t options have been implemented. The -l +option tells Yacc not to include #line directives in the code +it produces. The -t option causes debugging code to be +included in the compiled parser. + + The code for error recovery has been changed to +implement the same algorithm as AT&T Yacc. There will still +be differences in the way error recovery works because AT&T +Yacc uses more default reductions than Berkeley Yacc. + + The environment variable TMPDIR determines the directory +where temporary files will be created. If TMPDIR is defined, +temporary files will be created in the directory whose +pathname is the value of TMPDIR. By default, temporary files +are created in /tmp. + + The keywords are now case-insensitive. For example, +%nonassoc, %NONASSOC, %NonAssoc, and %nOnAsSoC are +all equivalent. + + Commas and semicolons that are not part of C code are +treated as commentary. + + Line-end comments, as in BCPL, are permitted. Line-end +comments begin with // and end at the next end-of-line. +Line-end comments are permitted in C code; they are converted +to C comments on output. + + The form of y.output files has been changed to look more +like those produced by AT&T Yacc. + + A new kind of declaration has been added. +The form of the declaration is + + %ident string + +where string is a sequence of characters beginning with a +double quote and ending with either a double quote or the +next end-of-line, whichever comes first. The declaration +will cause a #ident directive to be written near the start +of the output file. + + If a parser has been compiled with debugging code, that +code can be enabled by setting an environment variable. +If the environment variable YYDEBUG is set to 0, debugging +output is suppressed. If it is set to 1, debugging output +is written to standard output. + + + Building BtYacc + --------------- + by Chris Dodd and Vadim Maslov + +We used GCC and GNU make to compile BtYacc both on UNIX and +WIN32 paltforms. You are welcome to try different +combinations of makes and compilers. Most likely it will +work, but it may require Makefile changes. + +There is no config script. +Just type "make" and it should compile. + +AWK. If you want to change file btyaccpa.ske (backtracking +parser skeleton), you will need awk to compile it into +skeleton.c file. We used GNU AWK (gawk) version 3.0. + +It is known that using older versions of gawk +may create problems in compilation, because older awks +have problems with backslashes at the end of a line. + +For MSDOS, there a "makefile.dos" that should do the trick. +Note: makefile.dos was not tested for a long time. + +The result of compilation should be a single executable called +"btyacc" which you can install anywhere you like; +it does not require any other files in the distribution to run. + + + Legal Stuff + ----------- + by Chris Dodd and Vadim Maslov + +In English: BtYacc is freeware. BtYacc is distributed with +no warranty whatsoever. The author and any other contributors +take no responsibility for any and all consequences of its use. + +In Legalese: LIMITATION OF LIABILITY. NEITHER SIBER SYSTEMS +NOR ANY OF ITS LICENSORS NOR ANY BTYACC CONTRIBUTOR SHALL BE +LIABLE FOR ANY INDIRECT, INCIDENTAL, SPECIAL OR CONSEQUENTIAL +DAMAGES, OR DAMAGES FOR LOSS OF PROFITS, REVENUE, DATA OR +DATA USE, CAUSED BY BTYACC AND INCURRED BY CUSTOMER OR ANY +THIRD PARTY, WHETHER IN AN ACTION IN CONTRACT OR TORT, EVEN +IF SIBER SYSTEMS HAS BEEN ADVISED OF THE POSSIBILITY OF SUCH +DAMAGES. |