.\"
.\" $Id: re2c.1.in 862 2008-05-25 14:30:45Z helly $
.\"
.TH RE2C 1 "@PACKAGE_DATE@" "Version @PACKAGE_VERSION@"
.ds re \fBre2c\fP
.ds le \fBlex\fP
.ds rx regular-expression
.ds rxs regular-expressions
.ds lx \fIl\fP-expression
.SH NAME
\*(re \- convert \*(rxs to C/C++

.SH SYNOPSIS
\*(re [\fB-bdDefFghirsuvVw1\fP] [\fB-o output\fP] [\fB-c\fP [\fB-t header\fP]] \fBfile\fP

.SH DESCRIPTION
\*(re is a preprocessor that generates C-based recognizers from regular
expressions.
The input to \*(re consists of C/C++ source interleaved with
comments of the form \fC/*!re2c\fP ... \fC*/\fP which contain
scanner specifications.
In the output these comments are replaced with code that, when
executed, will find the next input token and then execute
some user-supplied token-specific code.

For example, given the following code

.in +3
.nf
char *scan(char *p)
{
/*!re2c
        re2c:define:YYCTYPE  = "unsigned char";
        re2c:define:YYCURSOR = p;
        re2c:yyfill:enable   = 0;
        re2c:yych:conversion = 1;
        re2c:indent:top      = 1;
        [0-9]+          {return p;}
        [^]             {return (char*)0;}
*/
}
.fi
.in -3

\*(re -is will generate

.in +3
.nf
/* Generated by re2c on Sat Apr 16 11:40:58 1994 */
char *scan(char *p)
{
    {
        unsigned char yych;

        yych = (unsigned char)*p;
        if(yych <= '/') goto yy4;
        if(yych >= ':') goto yy4;
        ++p;
        yych = (unsigned char)*p;
        goto yy7;
yy3:
        {return p;}
yy4:
        ++p;
        yych = (unsigned char)*p;
        {return (char*)0;}
yy6:
        ++p;
        yych = (unsigned char)*p;
yy7:
        if(yych <= '/') goto yy3;
        if(yych <= '9') goto yy6;
        goto yy3;
    }

}
.fi
.in -3
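.LP
A caller might use such a function as sketched below. This is an illustration
under assumptions, not \*(re output: the body of \fCscan\fP is a hand-written
stand-in that behaves like the generated code above (so the sketch compiles
without running \*(re), and \fCleading_digits\fP is a hypothetical helper.

.in +3
.nf
```c
#include <stddef.h>

/* Stand-in for the generated scan(): returns a pointer just past a leading
 * run of digits, or a null pointer if the input does not start with a digit. */
char *scan(char *p)
{
    unsigned char c = (unsigned char)*p;

    if (c < '0' || c > '9')
        return (char *)0;
    do {
        c = (unsigned char)*++p;
    } while (c >= '0' && c <= '9');
    return p;
}

/* Hypothetical helper: length of the leading digit run, or -1 if none. */
long leading_digits(char *s)
{
    char *end = scan(s);

    return end ? (long)(end - s) : -1L;
}
```
.fi
.in -3

The returned pointer doubles as the start of the next token, so a caller can
pass it straight back to \fCscan\fP to continue.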

You can place one \fC/*!max:re2c */\fP comment in the input; it is replaced by a
"#define \fCYYMAXFILL\fP <n>" line, where <n> is the maximum number of
characters required to parse the input, that is, the largest value that
\fCYYFILL\fP(n) will ever receive. If \fB-1\fP is in effect then \fCYYMAXFILL\fP
can only be triggered once, after the last \fC/*!re2c */\fP block.

You can also use \fC/*!ignore:re2c */\fP blocks, which let you document the
scanner code and are not part of the output.

.SH OPTIONS
\*(re provides the following options:
.TP
\fB-?\fP
\fB-h\fP
Invoke a short help.
.TP
\fB-b\fP
Implies \fB-s\fP.  Use bit vectors as well in the attempt to coax better
code out of the compiler.  Most useful for specifications with more than a
few keywords (e.g. for most programming languages).
.TP
\fB-c\fP
Enable (f)lex-like condition support.
.TP
\fB-d\fP
Creates a parser that dumps information about the current position and the
state the parser is in while parsing the input. This is useful for debugging
parser issues and states. If you use this switch you need to define a macro
\fIYYDEBUG\fP that is called like a function with two parameters:
\fIvoid YYDEBUG(int state, char current)\fP. The first parameter receives the 
state or -1 and the second parameter receives the input at the current cursor.
.TP
\fB-D\fP
Emit Graphviz dot data. It can then be processed with e.g.
"dot -Tpng input.dot > output.png". Please note that scanners with many states
may crash dot.
.TP
\fB-e\fP
Cross-compile from an ASCII platform to an EBCDIC one. 
.TP
\fB-f\fP
Generate a scanner with support for storable state.
For details see below at \fBSCANNER WITH STORABLE STATES\fP.
.TP
\fB-F\fP
Partial support for flex syntax. When this flag is active then named
definitions must be surrounded by curly braces and can be defined without an
equal sign and the terminating semi colon. Instead names are treated as direct
double quoted strings.
.TP
\fB-g\fP
Generate a scanner that utilizes GCC's computed goto feature. That is, \*(re
generates jump tables whenever a decision is of a certain complexity (e.g. when
a lot of if conditions would otherwise be necessary). This is only usable with
GCC and produces output that cannot be compiled with any other compiler. Note
that this implies \fB-b\fP and that the complexity threshold can be configured
using the inplace configuration "cgoto:threshold".
.TP
\fB-i\fP
Do not output #line information. This is useful when you want to use a CMS tool
with the \*(re output, which you might want if you do not require your users to
have \*(re themselves when building from your source.
.TP
\fB-o output\fP
Specify the output file.
.TP
\fB-r\fP
Allows reuse of scanner definitions with '\fB/*!use:re2c\fP' after
\&'\fB/*!rules:re2c\fP'. In this mode no '\fB/*!re2c\fP' block and exactly one
\&'\fB/*!rules:re2c\fP' must be present. The rules are saved and used by
every '\fB/*!use:re2c\fP' block that follows. These blocks can contain
inplace configurations, especially '\fBre2c:flags:w\fP' and '\fBre2c:flags:u\fP'.
That way it is possible to create the same scanner multiple times for different
character types, different input mechanisms or different output mechanisms.
The '\fB/*!use:re2c\fP' blocks can also contain additional rules that will be
appended to the set of rules in '\fB/*!rules:re2c\fP'.
.TP
\fB-s\fP
Generate nested \fCif\fPs for some \fCswitch\fPes.  Many compilers need this
assist to generate better code.
.TP
\fB-t\fP
Create a header file that contains types for the (f)lex-like condition support.
This can only be activated when \fB-c\fP is in use.
.TP
\fB-u\fP
Generate a parser that supports Unicode chars (UTF-32). This means the 
generated code can deal with any valid Unicode character up to 0x10FFFF. When
UTF-8 or UTF-16 needs to be supported you need to convert the incoming stream
to UTF-32 upon input yourself.
.TP
\fB-v\fP
Show version information.
.TP
\fB-V\fP
Show the version as a number XXYYZZ.
.TP
\fB-w\fP
Create a parser that supports wide chars (UCS-2). This implies \fB-s\fP and
cannot be used together with the \fB-e\fP switch.
.TP
\fB-1\fP
Force single pass generation. This cannot be combined with \fB-f\fP and disables
\fCYYMAXFILL\fP generation prior to the last \*(re block.
.TP
\fB--no-generation-date\fP
Suppress date output in the generated output so that it only shows the re2c
version.
.TP
\fB--case-insensitive\fP
All strings are case insensitive, so all "-expressions are treated
in the same way '-expressions are.
.TP
\fB--case-inverted\fP
Invert the meaning of single and double quoted strings.
With this switch single quotes are case sensitive and
double quotes are case insensitive.
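.LP
For the \fB-u\fP option, the conversion of UTF-8 input to UTF-32 is left to the
user. One way this might be done is sketched below. The functions
\fCutf8_decode\fP and \fCutf8_to_utf32\fP are hypothetical helpers, not part of
\*(re, and the sketch assumes the whole input is available in memory.

.in +3
.nf
```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: decode one UTF-8 sequence from s[0..len) into *out.
 * Returns the number of bytes consumed, or 0 if the input is truncated or
 * malformed (bad continuation byte, overlong form, surrogate, > 0x10FFFF). */
size_t utf8_decode(const unsigned char *s, size_t len, uint32_t *out)
{
    uint32_t cp, min;
    size_t n, i;

    if (len == 0)
        return 0;
    if (s[0] < 0x80)                { cp = s[0];        n = 1; min = 0; }
    else if ((s[0] & 0xE0) == 0xC0) { cp = s[0] & 0x1F; n = 2; min = 0x80; }
    else if ((s[0] & 0xF0) == 0xE0) { cp = s[0] & 0x0F; n = 3; min = 0x800; }
    else if ((s[0] & 0xF8) == 0xF0) { cp = s[0] & 0x07; n = 4; min = 0x10000; }
    else
        return 0;
    if (len < n)
        return 0;
    for (i = 1; i < n; i++) {
        if ((s[i] & 0xC0) != 0x80)
            return 0;
        cp = (cp << 6) | (s[i] & 0x3F);
    }
    if (cp < min || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
        return 0;
    *out = cp;
    return n;
}

/* Convert a UTF-8 buffer to UTF-32. Returns the number of code points
 * written, or (size_t)-1 if the input is malformed or dst is too small. */
size_t utf8_to_utf32(const unsigned char *src, size_t len,
                     uint32_t *dst, size_t cap)
{
    size_t count = 0;

    while (len > 0) {
        uint32_t cp;
        size_t n = utf8_decode(src, len, &cp);

        if (n == 0 || count == cap)
            return (size_t)-1;
        dst[count++] = cp;
        src += n;
        len -= n;
    }
    return count;
}
```
.fi
.in -3

The resulting array can then serve as the scanner input, with \fCYYCTYPE\fP
set to a 32-bit unsigned type.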

.SH "INTERFACE CODE"
Unlike other scanner generators, \*(re does not generate complete scanners:
the user must supply some interface code.
In particular, the user must define the following macros or use the 
corresponding inplace configurations:
.TP
\fCYYCONDTYPE\fP
In \fB-c\fP mode you can use \fB-t\fP to generate a file that contains the 
enumeration used as conditions. Each of the values refers to a condition of
a rule set.
.TP
\fCYYCTXMARKER\fP
\*(lx of type \fC*YYCTYPE\fP.
The generated code saves trailing context backtracking information in \fCYYCTXMARKER\fP.
The user only needs to define this macro if a scanner specification uses trailing
context in one or more of its \*(rxs.
.TP
\fCYYCTYPE\fP
Type used to hold an input symbol.
Usually \fCchar\fP or \fCunsigned char\fP.
.TP
\fCYYCURSOR\fP
\*(lx of type \fC*YYCTYPE\fP that points to the current input symbol.
The generated code advances \fCYYCURSOR\fP as symbols are matched.
On entry, \fCYYCURSOR\fP is assumed to point to the first character of the
current token.  On exit, \fCYYCURSOR\fP will point to the first character of
the following token.
.TP
\fCYYDEBUG(\fP\fIstate\fP,\fIcurrent\fC)\fP
This is only needed if the \fB-d\fP flag was specified. It makes it easy to debug
the generated parser by calling a user-defined function for every state. The function
should have the following signature: \fIvoid YYDEBUG(int state, char current)\fP. 
The first parameter receives the state or -1 and the second parameter receives the 
input at the current cursor.
.TP
\fCYYFILL(\fP\fIn\fP\fC)\fP
The generated code "calls" \fCYYFILL\fP(n) when the buffer needs
(re)filling:  at least \fIn\fP additional characters should
be provided. \fCYYFILL\fP(n) should adjust \fCYYCURSOR\fP, \fCYYLIMIT\fP,
\fCYYMARKER\fP and \fCYYCTXMARKER\fP as needed.  Note that for typical 
programming languages \fIn\fP will be the length of the longest keyword plus one.
The user can place a comment of the form \fC/*!max:re2c */\fP once to insert
a \fCYYMAXFILL\fP definition that is set to the maximum length value. If the
\fB-1\fP switch is used then \fCYYMAXFILL\fP can be triggered only once, after the
last \fC/*!re2c */\fP block.
.TP
\fCYYGETCONDITION\fP()
This define is used to get the condition prior to entering the scanner code
when using the \fB-c\fP switch. The value must be initialized with a value from
the \fCYYCONDTYPE\fP enumeration.
.TP
\fCYYGETSTATE\fP()
The user only needs to define this macro if the \fB-f\fP flag was specified.
In that case, the generated code "calls" \fCYYGETSTATE\fP() at the very beginning
of the scanner in order to obtain the saved state. \fCYYGETSTATE\fP() must return a signed
integer. The value must be either -1, indicating that the scanner is entered for the
first time, or a value previously saved by \fCYYSETSTATE\fP(s).  In the second case, the
scanner will resume operations right after where the last \fCYYFILL\fP(n) was called.
.TP
\fCYYLIMIT\fP
Expression of type \fC*YYCTYPE\fP that marks the end of the buffer
(\fCYYLIMIT[-1]\fP is the last character in the buffer).
The generated code repeatedly compares \fCYYCURSOR\fP to \fCYYLIMIT\fP
to determine when the buffer needs (re)filling.
.TP
\fCYYMARKER\fP
\*(lx of type \fC*YYCTYPE\fP.
The generated code saves backtracking information in \fCYYMARKER\fP. Some simple
scanners might not use this.
.TP
\fCYYMAXFILL\fP
This will be automatically defined by \fC/*!max:re2c */\fP blocks as explained above.
.TP
\fCYYSETCONDITION(\fP\fIc\fP\fC)\fP
This define is used to set the condition in transition rules.  This is only
being used when \fB-c\fP is active and transition rules are being used.
.TP
\fCYYSETSTATE(\fP\fIs\fP\fC)\fP
The user only needs to define this macro if the \fB-f\fP flag was specified.
In that case, the generated code "calls" \fCYYSETSTATE\fP just before calling
\fCYYFILL\fP(n).  The parameter to \fCYYSETSTATE\fP is a signed integer that uniquely
identifies the specific instance of \fCYYFILL\fP(n) that is about to be called.
Should the user wish to save the state of the scanner and have \fCYYFILL\fP(n) return
to the caller, all they have to do is store that unique identifier in a variable.
Later, when the scanner is called again, it will call \fCYYGETSTATE()\fP and
resume execution right where it left off. The generated code will contain
both \fCYYSETSTATE\fP(s) and \fCYYGETSTATE\fP even if \fCYYFILL\fP(n) is
disabled.
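.LP
The interface macros above might be supplied along the following lines. This is
only a sketch under assumptions: the \fCInput\fP structure, \fCinput_init\fP,
\fCinput_fill\fP and \fCnext_byte\fP are hypothetical names, the input comes
from memory rather than a file, and \fCnext_byte\fP is a hand-written stand-in
for generated scanner code that shows how the macros end up being used.

.in +3
.nf
```c
#include <stddef.h>
#include <string.h>

#define BUFSIZE 16              /* deliberately tiny; pick a real size in practice */

/* Hypothetical input state; none of these names come from re2c. */
typedef struct {
    const unsigned char *src;   /* remaining input, standing in for a file */
    size_t src_left;            /* bytes of src not yet copied into buf */
    unsigned char buf[BUFSIZE];
    unsigned char *tok;         /* start of the token being scanned */
    unsigned char *cur;         /* used as YYCURSOR */
    unsigned char *mar;         /* used as YYMARKER */
    unsigned char *lim;         /* used as YYLIMIT: one past the last valid byte */
} Input;

void input_init(Input *in, const unsigned char *src, size_t len)
{
    in->src = src;
    in->src_left = len;
    in->tok = in->cur = in->mar = in->lim = in->buf;
}

/* Discard everything before the current token, then copy in more input.
 * Returns 1 if at least `need` bytes are available at in->cur, else 0
 * (input exhausted, or the token does not fit into the buffer). */
int input_fill(Input *in, size_t need)
{
    size_t shift, kept, room, n;

    if (in->mar < in->tok)
        in->mar = in->tok;              /* stale marker: clamp before shifting */
    shift = (size_t)(in->tok - in->buf);
    kept = (size_t)(in->lim - in->tok);
    memmove(in->buf, in->tok, kept);
    in->tok -= shift;
    in->cur -= shift;
    in->mar -= shift;
    in->lim -= shift;

    room = BUFSIZE - kept;
    n = in->src_left < room ? in->src_left : room;
    memcpy(in->lim, in->src, n);
    in->src += n;
    in->src_left -= n;
    in->lim += n;
    return (size_t)(in->lim - in->cur) >= need;
}

/* The macros assume a variable `in` in scope and a scanner function that
 * returns 0 at the end of the input; both are choices of this sketch. */
#define YYCTYPE   unsigned char
#define YYCURSOR  in->cur
#define YYLIMIT   in->lim
#define YYMARKER  in->mar
#define YYFILL(n) do { if (!input_fill(in, (n))) return 0; } while (0)

/* Stand-in for generated code: returns the next byte, or 0 at the end of
 * the input (the input itself is assumed not to contain zero bytes). */
int next_byte(Input *in)
{
    in->tok = YYCURSOR;
    if ((YYLIMIT - YYCURSOR) < 1) YYFILL(1);
    return *YYCURSOR++;
}
```
.fi
.in -3

Because \fCinput_fill\fP moves the buffer contents, every pointer into the
buffer has to be adjusted by the same amount, which is why \fCYYFILL\fP(n) is
responsible for \fCYYCURSOR\fP, \fCYYLIMIT\fP and \fCYYMARKER\fP together.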

.SH "SCANNER WITH STORABLE STATES"
When the \fB-f\fP flag is specified, \*(re generates a scanner that
can store its current state, return to the caller, and later resume
operations exactly where it left off.

The default operation of \*(re is a "pull" model, where the scanner asks
for extra input whenever it needs it. However, this mode of operation
assumes that the scanner is the "owner" of the parsing loop, and that may
not always be convenient.

Typically, if there is a preprocessor ahead of the scanner in the stream,
or for that matter any other procedural source of data, the scanner cannot
"ask" for more data unless both scanner and source live in separate threads.

The \fB-f\fP flag is useful for just this situation: it lets users design
scanners that work in a "push" model, i.e. where data is fed to the scanner
chunk by chunk. When the scanner runs out of data to consume, it just stores
its state and returns to the caller. When more input data is fed to the scanner,
it resumes operations exactly where it left off.

When using the \fB-f\fP option \*(re does not accept stdin, because it has to run
the full generation process twice and therefore has to read the input twice.
\*(re would fail if it could not open the input a second time, or if the first
read influenced the second.

Changes needed compared to the "pull" model:

1. User has to supply macros YYSETSTATE(state) and YYGETSTATE().

2. The \fB-f\fP option inhibits declaration of \fIyych\fP and
\fIyyaccept\fP. So the user has to declare these. Also the user has
to save and restore these. In the example \fIexamples/push.re\fP these
are declared as fields of the (C++) class of which the scanner is a
method, so they do not need to be saved/restored explicitly. For C
they could e.g. be made macros that select fields from a structure
passed in as parameter. Alternatively, they could be declared as local
variables, saved with YYFILL(n) when it decides to return and restored
at entry to the function. Also, it could be more efficient to save the
state from YYFILL(n) because YYSETSTATE(state) is called
unconditionally. YYFILL(n) however does not get \fIstate\fP as
parameter, so we would have to store state in a local variable by
YYSETSTATE(state).

3. Modify YYFILL(n) to return (from the function calling it) if more
input is needed.

4. Modify caller to recognise "more input is needed" and respond
appropriately.

5. The generated code will contain a switch block that is used to restore
the last state by jumping to just after the corresponding YYFILL(n) call. This code is
automatically generated in the epilog of the first "\fC/*!re2c */\fP" block.
It is possible to trigger generation of the YYGETSTATE() block earlier by 
placing a "\fC/*!getstate:re2c */\fP" comment. This is especially useful when
the scanner code should be wrapped inside a loop.

Please see examples/push.re for a push-model scanner. The generated code can be
tweaked using inplace configurations "\fBstate:abort\fP" and "\fBstate:nextlabel\fP".
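.LP
The mechanism described in this section can be illustrated with a hand-written
stand-in for the code that \fB-f\fP generates. This is a sketch under
assumptions, not actual \*(re output: the \fCScanner\fP structure,
\fCscanner_init\fP, \fCfeed\fP and the return codes are hypothetical, the
"scanner" merely counts digits, and the end of the input is signalled by a
zero byte in the last chunk.

.in +3
.nf
```c
#include <stddef.h>

enum { NEED_MORE = 0, DONE = 1 };   /* hypothetical return codes */

typedef struct {
    const char *cur, *lim;          /* current chunk */
    int state;                      /* value stored by YYSETSTATE, -1 at first */
    char yych;                      /* -f does not declare yych, so it lives here */
    unsigned long digits;           /* result: digits seen so far */
} Scanner;

#define YYGETSTATE()   (s->state)
#define YYSETSTATE(n)  (s->state = (n))
#define YYFILL(n)      do { return NEED_MORE; } while (0)

void scanner_init(Scanner *s)
{
    s->cur = s->lim = (const char *)0;
    s->state = -1;
    s->yych = 0;
    s->digits = 0;
}

/* Hand the next chunk to the scanner. Empty chunks are rejected because the
 * scanner resumes right after YYFILL and expects data to be there. */
int feed(Scanner *s, const char *chunk, size_t len)
{
    if (len == 0)
        return 0;
    s->cur = chunk;
    s->lim = chunk + len;
    return 1;
}

int scan(Scanner *s)
{
    switch (YYGETSTATE()) {
    case 0:  goto yyFillLabel0;     /* resume right after the YYFILL call */
    default: break;                 /* -1: first entry, start at the top */
    }
    for (;;) {
        YYSETSTATE(0);              /* identifies the YYFILL call below */
        if (s->cur == s->lim) YYFILL(1);
yyFillLabel0:
        s->yych = *s->cur++;
        if (s->yych == 0)
            return DONE;            /* zero byte ends the input */
        if (s->yych >= '0' && s->yych <= '9')
            s->digits++;
    }
}
```
.fi
.in -3

The caller alternates between \fCfeed\fP and \fCscan\fP until \fCscan\fP
reports \fCDONE\fP; no thread or callback is needed to suspend the scanner.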

.SH "SCANNER WITH CONDITION SUPPORT"
You can precede \*(rxs with a list of condition names when using the \fB-c\fP
switch. In this case \*(re generates a scanner block for each condition, and each of the
generated blocks has its own precondition. The precondition is given by the
interface define \fBYYGETCONDITION\fP and must be of type \fBYYCONDTYPE\fP.
.LP
There are two special rule types. First, the rules of the condition '*' are
merged into all conditions. Second, the empty condition list lets you
provide a code block that does not have a scanner part, meaning it does not
allow any regular expression. The condition value referring to this special
block is always the one with the enumeration value 0. This way the code of this
special rule can be used to initialize a scanner. These rules are in no way
required, but sometimes it is helpful to have a dedicated uninitialized
condition state.
.LP
Non-empty rules can specify the new condition, which makes them
transition rules. Besides generating calls to the define \fBYYSETCONDITION\fP,
no other special code is generated.
.LP
There is another kind of special rule that lets you prepend code to the code
blocks of all rules of a certain set of conditions, or to the code blocks of all
rules. This can be helpful when some operation is common among rules. For
instance this can be used to store the length of the scanned string. These
special setup rules start with an exclamation mark followed by either a list
of conditions \fB<! condition, ... >\fP or a star \fB<!*>\fP.
When \*(re generates the code for a rule whose state does not have a
setup rule and a starred setup rule is present, then that code will be used
as setup code.
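.LP
The condition interface can be pictured with the following hand-written
stand-in for generated code. It is a sketch under assumptions: the condition
names \fCnormal\fP and \fCcomment\fP, the \fCLexer\fP structure and
\fCcount_code_bytes\fP are made up, and the enumeration only approximates what
\fB-t\fP would write to the header.

.in +3
.nf
```c
#include <stddef.h>

/* Roughly what the header generated with -c -t might contain for the two
 * hypothetical conditions "normal" and "comment". */
enum YYCONDTYPE { yycnormal, yyccomment };

typedef struct {
    enum YYCONDTYPE cond;       /* current condition */
    const char *cur;            /* zero-terminated input */
} Lexer;

#define YYGETCONDITION()   (lx->cond)
#define YYSETCONDITION(c)  (lx->cond = (c))

/* Counts the bytes outside of {...} comments. Each case mirrors the rules
 * of one condition; the transitions mirror "=> condition" rules. */
size_t count_code_bytes(Lexer *lx)
{
    size_t n = 0;

    for (;;) {
        char c = *lx->cur++;

        if (c == 0)
            return n;
        switch (YYGETCONDITION()) {
        case yycnormal:
            if (c == '{')
                YYSETCONDITION(yyccomment);   /* <normal> "{" => comment */
            else
                n++;                          /* <normal> [^] */
            break;
        case yyccomment:
            if (c == '}')
                YYSETCONDITION(yycnormal);    /* <comment> "}" => normal */
            break;
        }
    }
}
```
.fi
.in -3

Since the condition is stored outside the function, a scanner that is left in
the middle of a comment continues in the \fCcomment\fP condition on the next
call.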

.SH "SCANNER SPECIFICATIONS"
Each scanner specification consists of a set of \fIrules\fP, \fInamed
definitions\fP and \fIconfigurations\fP.
.LP
\fIRules\fP consist of a \*(rx along with a block of C/C++ code that
is to be executed when the associated \fI\*(rx\fP is matched. You can either
start the code with an opening curly brace or the sequence '\fB:=\fP'. When
the code starts with a curly brace, \*(re counts the brace depth and stops looking
for code automatically. Otherwise curly braces are not allowed and \*(re stops
looking for code at the first line that does not begin with whitespace.
.P
.RS
\fI\*(rx\fP \fC{\fP \fIC/C++ code\fP \fC}\fP
.P
\fI\*(rx\fP \fC:=\fP \fIC/C++ code\fP
.RE
.P
If \fB-c\fP is active then each \*(rx is preceded by a list of
comma-separated condition names. Besides normal naming rules there are two
special cases. A rule may contain the single condition name '*' or no condition
name at all. In the latter case the rule cannot have a \*(rx. Non-empty
rules may furthermore specify the new condition. In that case \*(re will
generate the necessary code to change the condition automatically. Just as above,
code can be started with a curly brace or the sequence '\fB:=\fP'. Furthermore,
rules can use ':=>' as a shortcut to automatically generate code that not only
sets the new condition state but also continues execution with the new state. A
shortcut rule should not be used in a loop where there is code between the start
of the loop and the \*(re block unless \fIre2c:cond:goto\fP is changed
to '\fIcontinue;\fP'. If code is necessary before all rules (though not simple
jumps) you can do so by using <! pseudo-rules.
.P
.RS
\fC<\fP\fIcondition-list\fP\fC>\fP \fI\*(rx\fP \fC{\fP \fIC/C++ code\fP \fC}\fP
.P
\fC<\fP\fIcondition-list\fP\fC>\fP \fI\*(rx\fP \fC:=\fP \fIC/C++ code\fP
.P
\fC<\fP\fIcondition-list\fP\fC>\fP \fI\*(rx\fP \fC=>\fP \fP\fIcondition\fP \fC{\fP \fIC/C++ code\fP \fC}\fP
.P
\fC<\fP\fIcondition-list\fP\fC>\fP \fI\*(rx\fP \fC=>\fP \fP\fIcondition\fP \fC:=\fP \fIC/C++ code\fP
.P
\fC<\fP\fIcondition-list\fP\fC>\fP \fI\*(rx\fP \fC:=>\fP \fP\fIcondition\fP
.P
\fC<\fP\fI*\fP\fC>\fP \fI\*(rx\fP \fC{\fP \fIC/C++ code\fP \fC}\fP
.P
\fC<\fP\fI*\fP\fC>\fP \fI\*(rx\fP \fC:=\fP \fIC/C++ code\fP
.P
\fC<\fP\fI*\fP\fC>\fP \fI\*(rx\fP \fC=>\fP \fP\fIcondition\fP \fC{\fP \fIC/C++ code\fP \fC}\fP
.P
\fC<\fP\fI*\fP\fC>\fP \fI\*(rx\fP \fC=>\fP \fP\fIcondition\fP \fC:=\fP \fIC/C++ code\fP
.P
\fC<\fP\fI*\fP\fC>\fP \fI\*(rx\fP \fC:=>\fP \fP\fIcondition\fP
.P
\fC<>\fP \fC{\fP \fIC/C++ code\fP \fC}\fP
.P
\fC<>\fP \fC:=\fP \fIC/C++ code\fP
.P
\fC<>\fP \fC=>\fP \fP\fIcondition\fP \fC{\fP \fIC/C++ code\fP \fC}\fP
.P
\fC<>\fP \fC=>\fP \fP\fIcondition\fP \fC:=\fP \fIC/C++ code\fP
.P
\fC<>\fP \fC:=>\fP \fP\fIcondition\fP
.P
\fC<!\fIcondition-list\fP\fC>\fP \fC{\fP \fIC/C++ code\fP \fC}\fP
.P
\fC<!\fIcondition-list\fP\fC>\fP \fC:=\fP \fIC/C++ code\fP
.P
\fC<!*>\fP \fC{\fP \fIC/C++ code\fP \fC}\fP
.P
\fC<!*>\fP \fC:=\fP \fIC/C++ code\fP
.RE
.LP
Named definitions are of the form:
.P
.RS
\fIname\fP \fC=\fP \fI\*(rx\fP\fC;\fP
.RE
.LP
If \fB-F\fP is active, then named definitions are also of the form:
.P
.RS
\fIname\fP \fI\*(rx\fP
.RE
.LP
Configurations look like named definitions whose names start 
with "\fBre2c:\fP":
.P
.RS
\fCre2c:\fP\fIname\fP \fC=\fP \fIvalue\fP\fC;\fP
.RE
.RS
\fCre2c:\fP\fIname\fP \fC=\fP \fB"\fP\fIvalue\fP\fB"\fP\fC;\fP
.RE

.SH "SUMMARY OF RE2C REGULAR-EXPRESSIONS"
.TP
\fC"foo"\fP
the literal string \fCfoo\fP.
ANSI-C escape sequences can be used.
.TP
\fC'foo'\fP
the literal string \fCfoo\fP (characters [a-zA-Z] are treated case-insensitively).
ANSI-C escape sequences can be used.
.TP
\fC[xyz]\fP
a "character class"; in this case,
the \*(rx matches either an '\fCx\fP', a '\fCy\fP', or a '\fCz\fP'.
.TP
\fC[abj-oZ]\fP
a "character class" with a range in it;
matches an '\fCa\fP', a '\fCb\fP', any letter from '\fCj\fP' through '\fCo\fP',
or a '\fCZ\fP'.
.TP
\fC[^\fIclass\fP\fC]\fP
an inverted "character class".
.TP
\fIr\fP\fC\e\fP\fIs\fP
match any \fIr\fP which isn't an \fIs\fP. \fIr\fP and \fIs\fP must be \*(rxs
which can be expressed as character classes.
.TP
\fIr\fP\fC*\fP
zero or more \fIr\fP's, where \fIr\fP is any \*(rx
.TP
\fC\fIr\fP\fC+\fP
one or more \fIr\fP's
.TP
\fC\fIr\fP\fC?\fP
zero or one \fIr\fP's (that is, "an optional \fIr\fP")
.TP
name
the expansion of the "named definition" (see above)
.TP
\fC(\fP\fIr\fP\fC)\fP
an \fIr\fP; parentheses are used to override precedence
(see below)
.TP
\fIrs\fP
an \fIr\fP followed by an \fIs\fP ("concatenation")
.TP
\fIr\fP\fC|\fP\fIs\fP
either an \fIr\fP or an \fIs\fP
.TP
\fIr\fP\fC/\fP\fIs\fP
an \fIr\fP but only if it is followed by an \fIs\fP. The \fIs\fP is not part of
the matched text. This type of \*(rx is called "trailing context". A trailing
context can only be the end of a rule and not part of a named definition.
.TP
\fIr\fP\fC{\fP\fIn\fP\fC}\fP
matches \fIr\fP exactly \fIn\fP times.
.TP
\fIr\fP\fC{\fP\fIn\fP\fC,}\fP
matches \fIr\fP at least \fIn\fP times.
.TP
\fIr\fP\fC{\fP\fIn\fP\fC,\fP\fIm\fP\fC}\fP
matches \fIr\fP at least \fIn\fP but not more than \fIm\fP times.
.TP
\fC.\fP
match any character except newline (\\n).
.TP
\fIdef\fP
matches named definition as specified by \fIdef\fP only if \fB-F\fP is
off. If the switch \fB-F\fP is active then this behaves like it was enclosed
in double quotes and matches the string \fIdef\fP.
.LP
Character classes and string literals may contain octal or hexadecimal
character definitions and the following set of escape sequences (\fB\\n\fP,
\fB\\t\fP, \fB\\v\fP, \fB\\b\fP, \fB\\r\fP, \fB\\f\fP, \fB\\a\fP, \fB\\\\\fP).
An octal character is defined by a backslash followed by its three octal digits,
and a hexadecimal character is defined by a backslash, a lowercase '\fBx\fP'
and its two hexadecimal digits, or a backslash, an uppercase \fBX\fP and its
four hexadecimal digits.
.LP
\*(re furthermore supports the C/C++ Unicode notation, that is, a backslash followed
by either a lowercase \fBu\fP and its four hexadecimal digits or an uppercase
\fBU\fP and its eight hexadecimal digits. However, only in \fB-u\fP mode can the
generated code deal with any valid Unicode character up to 0x10FFFF.
.LP
Since characters greater than \fB\\X00FF\fP are not allowed in non-Unicode mode, the
only portable "\fBany\fP" rules are \fB(.|"\\n")\fP and \fB[^]\fP.
.LP
The \*(rxs listed above are grouped according to
precedence, from highest precedence at the top to lowest at the bottom.
Those grouped together have equal precedence.

.SH "INPLACE CONFIGURATION"
.LP
It is possible to configure code generation inside \*(re blocks. The following
lists the available configurations:
.TP
\fIre2c:condprefix\fP \fB=\fP yyc_ \fB;\fP
Specifies the prefix used for condition labels. That is, this text is
prepended to any condition label in the generated output file.
.TP
\fIre2c:condenumprefix\fP \fB=\fP yyc \fB;\fP
Specifies the prefix used for condition values. That is, this text is
prepended to any condition enum value in the generated output file.
.TP
\fIre2c:cond:divider\fP \fB=\fP "/* *********************************** */" \fB;\fP
Customizes the divider for condition blocks. You can use '@@' to
put the name of the condition or customize the placeholder
using \fIre2c:cond:divider@cond\fP.
.TP
\fIre2c:cond:divider@cond\fP \fB=\fP @@ \fB;\fP
Specifies the placeholder that will be replaced with the condition name
in \fIre2c:cond:divider\fP.
.TP
\fIre2c:cond:goto\fP \fB=\fP "goto @@;" \fB;\fP
Customizes the condition goto statements used with ':=>' style rules.
You can use '@@' to put the name of the condition or customize the placeholder
using \fIre2c:cond:goto@cond\fP. You can also change this to 'continue;',
which would allow you to continue with the next loop cycle including any code
between loop start and re2c block.
.TP
\fIre2c:cond:goto@cond\fP \fB=\fP @@ \fB;\fP
Specifies the placeholder that will be replaced with the condition label
in \fIre2c:cond:goto\fP.
.TP
\fIre2c:indent:top\fP \fB=\fP 0 \fB;\fP
Specifies the minimum indentation level to use. Requires a numeric value
greater than or equal to zero.
.TP
\fIre2c:indent:string\fP \fB=\fP "\\t" \fB;\fP
Specifies the string to use for indentation. Requires a string that should
contain only whitespace unless you need this for external tools. The easiest
way to specify spaces is to enclose them in single or double quotes. If you do
not want any indentation at all you can simply set this to \fB""\fP.
.TP
\fIre2c:yych:conversion\fP \fB=\fP 0 \fB;\fP
When this setting is non zero, then \*(re automatically generates conversion 
code whenever yych gets read. In this case the type must be defined using
\fBre2c:define:YYCTYPE\fP.
.TP
\fIre2c:yych:emit\fP \fB=\fP 1 \fB;\fP
Generation of \fByych\fP can be suppressed by setting this to 0.
.TP
\fIre2c:yybm:hex\fP \fB=\fP 0 \fB;\fP
If set to zero then a decimal table is used; otherwise a hexadecimal table
is generated.
.TP
\fIre2c:yyfill:enable\fP \fB=\fP 1 \fB;\fP
Set this to zero to suppress generation of YYFILL(n). When using this be sure
to verify that the generated scanner does not read past the end of the input.
Allowing this behavior might introduce severe security issues into your programs.
.TP
\fIre2c:yyfill:check\fP \fB=\fP 1 \fB;\fP
This can be set to 0 to suppress output of the precondition using YYCURSOR and
YYLIMIT, which becomes useful when YYLIMIT + max(YYFILL) is always accessible.
.TP
\fIre2c:yyfill:parameter\fP \fB=\fP 1 \fB;\fP
Controls parameter passing to \fBYYFILL\fP calls. If set to zero
then no parameter is passed to \fBYYFILL\fP. However \fBre2c:define:YYFILL@len\fP
lets you specify a replacement string for the actual length value. If set to
a non zero value then \fBYYFILL\fP usage will be followed by the number of
requested characters in braces unless \fBre2c:define:YYFILL:naked\fP is set.
Also look at \fBre2c:define:YYFILL:naked\fP and \fBre2c:define:YYFILL@len\fP.
.TP
\fIre2c:startlabel\fP \fB=\fP 0 \fB;\fP
If set to a non zero integer then the start label of the next scanner block
will be generated even if it is not used by the scanner itself. Otherwise the normal
\fByy0\fP-like start label is only generated if needed. If set to a text
value then a label with that text will be generated regardless of whether the
normal start label is used or not. This setting is reset to \fB0\fP
after a start label has been generated.
.TP
\fIre2c:labelprefix\fP \fB=\fP yy \fB;\fP
Changes the prefix of numbered labels. The default is \fByy\fP and it
can be set to any string that is a valid label.
.TP
\fIre2c:state:abort\fP \fB=\fP 0 \fB;\fP
When not zero and switch -f is active then the \fCYYGETSTATE\fP block will 
contain a default case that aborts and a -1 case is used for initialization.
.TP
\fIre2c:state:nextlabel\fP \fB=\fP 0 \fB;\fP
Used when -f is active to control whether the \fCYYGETSTATE\fP block is 
followed by a \fCyyNext:\fP label line. Instead of using \fCyyNext\fP you can 
usually also use configuration \fIstartlabel\fP to force a specific start label
or default to \fCyy0\fP as start label. Instead of using a dedicated label it 
is often better to separate the YYGETSTATE code from the actual scanner code by
placing a "\fC/*!getstate:re2c */\fP" comment.
.TP
\fIre2c:cgoto:threshold\fP \fB=\fP 9 \fB;\fP
When -g is active this value specifies the complexity threshold that triggers
generation of jump tables rather than using nested if's and decision bitfields.
The threshold is compared against a calculated estimation of if-s needed where 
every used bitmap divides the threshold by 2.
.TP
\fIre2c:yych:conversion\fP \fB=\fP 0 \fB;\fP
When the input uses signed characters and the \fB-s\fP or \fB-b\fP switches are
in effect, \*(re can automatically convert to the unsigned character type
that is then necessary for its internal single character. When this setting
is zero or an empty string the conversion is disabled. With a non zero number
the conversion is taken from \fBYYCTYPE\fP. If that is given by an inplace
configuration, that value is used. Otherwise it will be \fB(YYCTYPE)\fP
and changes to that configuration are no longer possible. When this setting is
a string the braces must be specified. Now assuming your input is a \fBchar*\fP
buffer and you are using the above mentioned switches, you can set \fBYYCTYPE\fP to
\fBunsigned char\fP and this setting to either \fB1\fP or \fB"(unsigned char)"\fP.
.TP
\fIre2c:define:YYCONDTYPE\fP \fB=\fP YYCONDTYPE \fB;\fP
Enumeration used for condition support with \fB-c\fP mode.
.TP
\fIre2c:define:YYCTXMARKER\fP \fB=\fP YYCTXMARKER \fB;\fP
Overrides the define \fBYYCTXMARKER\fP: the configured value is emitted in its
place, so the macro itself need not be defined.
.TP
\fIre2c:define:YYCTYPE\fP \fB=\fP YYCTYPE \fB;\fP
Overrides the define \fBYYCTYPE\fP: the configured value is emitted in its
place, so the macro itself need not be defined.
.TP
\fIre2c:define:YYCURSOR\fP \fB=\fP YYCURSOR \fB;\fP
Overrides the define \fBYYCURSOR\fP: the configured value is emitted in its
place, so the macro itself need not be defined.
.TP
\fIre2c:define:YYDEBUG\fP \fB=\fP YYDEBUG \fB;\fP
Overrides the define \fBYYDEBUG\fP: the configured value is emitted in its
place, so the macro itself need not be defined.
.TP
\fIre2c:define:YYFILL\fP \fB=\fP YYFILL \fB;\fP
Overrides the define \fBYYFILL\fP: the configured value is emitted in its
place, so the macro itself need not be defined.
.TP
\fIre2c:define:YYFILL:naked\fP \fB=\fP 0 \fB;\fP
When set to 1, neither braces, parameter, nor semicolon is emitted.
.TP
\fIre2c:define:YYFILL@len\fP \fB=\fP @@ \fB;\fP
When using \fIre2c:define:YYFILL\fP with \fIre2c:yyfill:parameter\fP set to 0,
any occurrence of this text inside \fBYYFILL\fP is replaced with the actual
length value.
.TP
\fIre2c:define:YYGETCONDITION\fP \fB=\fP YYGETCONDITION \fB;\fP
Allows overriding the define \fBYYGETCONDITION\fP.
.TP
\fIre2c:define:YYGETCONDITION:naked\fP \fB=\fP 0 \fB;\fP
When set to 1, neither braces, parameter, nor semicolon is emitted.
.TP
\fIre2c:define:YYGETSTATE\fP \fB=\fP YYGETSTATE \fB;\fP
Allows overriding the define \fBYYGETSTATE\fP, so that the define can be
avoided by setting the value to the actual code needed.
.TP
\fIre2c:define:YYGETSTATE:naked\fP \fB=\fP 0 \fB;\fP
When set to 1, neither braces, parameter, nor semicolon is emitted.
.TP
\fIre2c:define:YYLIMIT\fP \fB=\fP YYLIMIT \fB;\fP
Allows overriding the define \fBYYLIMIT\fP, so that the define can be
avoided by setting the value to the actual code needed.
.TP
\fIre2c:define:YYMARKER\fP \fB=\fP YYMARKER \fB;\fP
Allows overriding the define \fBYYMARKER\fP, so that the define can be
avoided by setting the value to the actual code needed.
.TP
\fIre2c:define:YYSETCONDITION\fP \fB=\fP YYSETCONDITION \fB;\fP
Allows overriding the define \fBYYSETCONDITION\fP.
.TP
\fIre2c:define:YYSETCONDITION@cond\fP \fB=\fP @@ \fB;\fP
When using \fIre2c:define:YYSETCONDITION\fP, any occurrence of this text
inside \fBYYSETCONDITION\fP is replaced with the actual new condition value.
.TP
\fIre2c:define:YYSETSTATE\fP \fB=\fP YYSETSTATE \fB;\fP
Allows overriding the define \fBYYSETSTATE\fP, so that the define can be
avoided by setting the value to the actual code needed.
.TP
\fIre2c:define:YYSETSTATE:naked\fP \fB=\fP 0 \fB;\fP
When set to 1, neither braces, parameter, nor semicolon is emitted.
.TP
\fIre2c:define:YYSETSTATE@state\fP \fB=\fP @@ \fB;\fP
When using \fIre2c:define:YYSETSTATE\fP, any occurrence of this text
inside \fBYYSETSTATE\fP is replaced with the actual new state value.
.TP
\fIre2c:label:yyFillLabel\fP \fB=\fP yyFillLabel \fB;\fP
Allows overriding the name of the label \fByyFillLabel\fP.
.TP
\fIre2c:label:yyNext\fP \fB=\fP yyNext \fB;\fP
Allows overriding the name of the label \fByyNext\fP.
.TP
\fIre2c:variable:yyaccept\fP \fB=\fP yyaccept \fB;\fP
Allows overriding the name of the variable \fByyaccept\fP.
.TP
\fIre2c:variable:yybm\fP \fB=\fP yybm \fB;\fP
Allows overriding the name of the variable \fByybm\fP.
.TP
\fIre2c:variable:yych\fP \fB=\fP yych \fB;\fP
Allows overriding the name of the variable \fByych\fP.
.TP
\fIre2c:variable:yyctable\fP \fB=\fP yyctable \fB;\fP
When both \fB-c\fP and \fB-g\fP are active, \*(re uses this variable to
generate a static jump table for \fBYYGETCONDITION\fP.
.TP
\fIre2c:variable:yystable\fP \fB=\fP yystable \fB;\fP
When both \fB-f\fP and \fB-g\fP are active, \*(re uses this variable to
generate a static jump table for \fBYYGETSTATE\fP.
.TP
\fIre2c:variable:yytarget\fP \fB=\fP yytarget \fB;\fP
Allows overriding the name of the variable \fByytarget\fP.
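.LP
As a rough illustration of why the \fIre2c:yych:conversion\fP setting matters,
the following plain C sketch indexes a 256-entry table with bytes read from a
\fBchar*\fP buffer. It is not code generated by \*(re: the names \fBtable\fP,
\fBinit_table\fP and \fBcount_high_bytes\fP are assumptions made for this
example only, and the table merely stands in for the bitmaps used with
\fB-b\fP. The point it shows is that a signed character must be converted to
an unsigned type before it is used as an index.
.nf

```c
#include <stddef.h>

/* Hypothetical 256-entry class table standing in for a generated
 * bitmap; bit 0 marks bytes with a value of 0x80 or above. */
static unsigned char table[256];

/* Fills the table; must be called once before counting. */
static void init_table(void)
{
    int i;
    for (i = 0; i < 256; ++i)
        table[i] = (unsigned char)(i >= 0x80 ? 1 : 0);
}

/* Counts the bytes in buf whose value is 0x80 or above. */
static size_t count_high_bytes(const char *buf, size_t len)
{
    size_t i, n = 0;
    for (i = 0; i < len; ++i) {
        /* Where char is signed, buf[i] may be negative; the cast
         * keeps the index inside 0..255. */
        unsigned char yych = (unsigned char)buf[i];
        n += (size_t)(table[yych] & 1);
    }
    return n;
}
```
.fi
.LP
In this sketch the cast plays the role that \fB"(unsigned char)"\fP plays in
the configuration; without it, a negative character would index before the
start of the table on platforms where \fBchar\fP is signed.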

.SH "UNDERSTANDING RE2C"
.LP
The \fIlessons\fP subdirectory of the \*(re distribution contains a few
step-by-step lessons to get you started with \*(re. All examples in that
subdirectory can be compiled and actually work.

.SH FEATURES
.LP
\*(re does not provide a default action:
the generated code assumes that the input
will consist of a sequence of tokens.
Typically this can be dealt with by adding a rule such as the one for
unexpected characters in the example above.
.LP
The user must arrange for a sentinel token to appear at the end of input
(and provide a rule for matching it):
\*(re does not provide an \fC<<EOF>>\fP expression.
If the source is from a null-byte terminated string, a
rule matching a null character will suffice.  If the source is from a
file then you could pad the input with a newline (or some other character that
cannot appear within another token); upon recognizing such a character, check
to see if it is the sentinel and act accordingly. You can also use
\fBYYFILL(n)\fP to end the scanner when not enough characters are available,
which amounts to detecting the end of data or file.
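.LP
The null-byte sentinel approach for a terminated string can be sketched in
plain C. This is a hand-written loop, not \*(re output; \fBcount_words\fP and
its local variables are hypothetical names chosen for this example, with
\fBcursor\fP playing the part of \fBYYCURSOR\fP. The check for a zero byte
corresponds to the rule that matches the sentinel.
.nf

```c
#include <stddef.h>

/* Counts space-separated words in a null-terminated string.
 * The terminating null byte acts as the end-of-input sentinel. */
static size_t count_words(const char *input)
{
    const unsigned char *cursor = (const unsigned char *)input;
    size_t words = 0;

    for (;;) {
        unsigned char yych = *cursor;
        if (yych == 0)          /* sentinel rule: end of input */
            return words;
        if (yych == ' ') {      /* whitespace rule: skip it */
            ++cursor;
            continue;
        }
        ++words;                /* word rule: consume the token */
        while (*cursor != 0 && *cursor != ' ')
            ++cursor;
    }
}
```
.fi
.LP
Because the sentinel cannot occur inside a word, the loop never reads past the
end of the string and needs no separate length check.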

.SH BUGS
.LP
Difference only works for character sets.
.LP
The \*(re internal algorithms need documentation.

.SH "SEE ALSO"
.LP
flex(1), lex(1).
.P
More information on \*(re can be found here:
.PD 0
.P
.B http://re2c.org/
.PD 1

.SH AUTHORS
.PD 0
.P
Peter Bumbulis <peter@csg.uwaterloo.ca>
.P
Brian Young <bayoung@acm.org>
.P
Dan Nuffer <nuffer@users.sourceforge.net>
.P
Marcus Boerger <helly@users.sourceforge.net>
.P
Hartmut Kaiser <hkaiser@users.sourceforge.net>
.P
Emmanuel Mogenet <mgix@mgix.com> added storable state
.P
.PD 1

.SH "VERSION INFORMATION"
This manpage describes \*(re, version @PACKAGE_VERSION@.

.fi