summaryrefslogtreecommitdiff
path: root/doc/html/algorithm/Searching.html
blob: a9e743a1393a00ebae79057105790f7b4fbe2011 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN" "http://www.w3.org/TR/html4/loose.dtd">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=US-ASCII">
<title>Searching Algorithms</title>
<link rel="stylesheet" href="../../../doc/src/boostbook.css" type="text/css">
<meta name="generator" content="DocBook XSL Stylesheets V1.79.1">
<link rel="home" href="../index.html" title="The Boost C++ Libraries BoostBook Documentation Subset">
<link rel="up" href="../algorithm.html" title="Chapter&#160;3.&#160;The Boost Algorithm Library">
<link rel="prev" href="../algorithm.html" title="Chapter&#160;3.&#160;The Boost Algorithm Library">
<link rel="next" href="CXX11.html" title="C++11 Algorithms">
</head>
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
<table cellpadding="2" width="100%"><tr>
<td valign="top"><img alt="Boost C++ Libraries" width="277" height="86" src="../../../boost.png"></td>
<td align="center"><a href="../../../index.html">Home</a></td>
<td align="center"><a href="../../../libs/libraries.htm">Libraries</a></td>
<td align="center"><a href="http://www.boost.org/users/people.html">People</a></td>
<td align="center"><a href="http://www.boost.org/users/faq.html">FAQ</a></td>
<td align="center"><a href="../../../more/index.htm">More</a></td>
</tr></table>
<hr>
<div class="spirit-nav">
<a accesskey="p" href="../algorithm.html"><img src="../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../algorithm.html"><img src="../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../index.html"><img src="../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="CXX11.html"><img src="../../../doc/src/images/next.png" alt="Next"></a>
</div>
<div class="section">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="algorithm.Searching"></a><a class="link" href="Searching.html" title="Searching Algorithms">Searching Algorithms</a>
</h2></div></div></div>
<div class="toc"><dl class="toc">
<dt><span class="section"><a href="Searching.html#the_boost_algorithm_library.Searching.BoyerMoore">Boyer-Moore
      Search</a></span></dt>
<dt><span class="section"><a href="Searching.html#the_boost_algorithm_library.Searching.BoyerMooreHorspool">Boyer-Moore-Horspool
      Search</a></span></dt>
<dt><span class="section"><a href="Searching.html#the_boost_algorithm_library.Searching.KnuthMorrisPratt">Knuth-Morris-Pratt
      Search</a></span></dt>
</dl></div>
<div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="the_boost_algorithm_library.Searching.BoyerMoore"></a><a class="link" href="Searching.html#the_boost_algorithm_library.Searching.BoyerMoore" title="Boyer-Moore Search">Boyer-Moore
      Search</a>
</h3></div></div></div>
<h5>
<a name="the_boost_algorithm_library.Searching.BoyerMoore.h0"></a>
        <span class="phrase"><a name="the_boost_algorithm_library.Searching.BoyerMoore.overview"></a></span><a class="link" href="Searching.html#the_boost_algorithm_library.Searching.BoyerMoore.overview">Overview</a>
      </h5>
<p>
        The header file 'boyer_moore.hpp' contains an implementation of the Boyer-Moore
        algorithm for searching sequences of values.
      </p>
<p>
        The Boyer&#8211;Moore string search algorithm is a particularly efficient string
        searching algorithm, and it has been the standard benchmark for the practical
        string search literature. The Boyer-Moore algorithm was invented by Bob Boyer
        and J. Strother Moore, and published in the October 1977 issue of the Communications
        of the ACM , and a copy of that article is available at <a href="http://www.cs.utexas.edu/~moore/publications/fstrpos.pdf" target="_top">http://www.cs.utexas.edu/~moore/publications/fstrpos.pdf</a>.
      </p>
<p>
        The Boyer-Moore algorithm uses two precomputed tables to give better performance
        than a naive search. These tables depend on the pattern being searched for,
        and give the Boyer-Moore algorithm larger a memory footprint and startup
        costs than a simpler algorithm, but these costs are recovered quickly during
        the searching process, especially if the pattern is longer than a few elements.
      </p>
<p>
        However, the Boyer-Moore algorithm cannot be used with comparison predicates
        like <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">search</span></code>.
      </p>
<p>
        Nomenclature: I refer to the sequence being searched for as the "pattern",
        and the sequence being searched in as the "corpus".
      </p>
<h5>
<a name="the_boost_algorithm_library.Searching.BoyerMoore.h1"></a>
        <span class="phrase"><a name="the_boost_algorithm_library.Searching.BoyerMoore.interface"></a></span><a class="link" href="Searching.html#the_boost_algorithm_library.Searching.BoyerMoore.interface">Interface</a>
      </h5>
<p>
        For flexibility, the Boyer-Moore algorithm has two interfaces; an object-based
        interface and a procedural one. The object-based interface builds the tables
        in the constructor, and uses operator () to perform the search. The procedural
        interface builds the table and does the search all in one step. If you are
        going to be searching for the same pattern in multiple corpora, then you
        should use the object interface, and only build the tables once.
      </p>
<p>
        Here is the object interface:
</p>
<pre class="programlisting"><span class="keyword">template</span> <span class="special">&lt;</span><span class="keyword">typename</span> <span class="identifier">patIter</span><span class="special">&gt;</span>
<span class="keyword">class</span> <span class="identifier">boyer_moore</span> <span class="special">{</span>
<span class="keyword">public</span><span class="special">:</span>
    <span class="identifier">boyer_moore</span> <span class="special">(</span> <span class="identifier">patIter</span> <span class="identifier">first</span><span class="special">,</span> <span class="identifier">patIter</span> <span class="identifier">last</span> <span class="special">);</span>
    <span class="special">~</span><span class="identifier">boyer_moore</span> <span class="special">();</span>

    <span class="keyword">template</span> <span class="special">&lt;</span><span class="keyword">typename</span> <span class="identifier">corpusIter</span><span class="special">&gt;</span>
    <span class="identifier">corpusIter</span> <span class="keyword">operator</span> <span class="special">()</span> <span class="special">(</span> <span class="identifier">corpusIter</span> <span class="identifier">corpus_first</span><span class="special">,</span> <span class="identifier">corpusIter</span> <span class="identifier">corpus_last</span> <span class="special">);</span>
    <span class="special">};</span>
</pre>
<p>
      </p>
<p>
        and here is the corresponding procedural interface:
      </p>
<p>
</p>
<pre class="programlisting"><span class="keyword">template</span> <span class="special">&lt;</span><span class="keyword">typename</span> <span class="identifier">patIter</span><span class="special">,</span> <span class="keyword">typename</span> <span class="identifier">corpusIter</span><span class="special">&gt;</span>
<span class="identifier">corpusIter</span> <span class="identifier">boyer_moore_search</span> <span class="special">(</span>
        <span class="identifier">corpusIter</span> <span class="identifier">corpus_first</span><span class="special">,</span> <span class="identifier">corpusIter</span> <span class="identifier">corpus_last</span><span class="special">,</span>
        <span class="identifier">patIter</span> <span class="identifier">pat_first</span><span class="special">,</span> <span class="identifier">patIter</span> <span class="identifier">pat_last</span> <span class="special">);</span>
</pre>
<p>
      </p>
<p>
        Each of the functions is passed two pairs of iterators. The first two define
        the corpus and the second two define the pattern. Note that the two pairs
        need not be of the same type, but they do need to "point" at the
        same type. In other words, <code class="computeroutput"><span class="identifier">patIter</span><span class="special">::</span><span class="identifier">value_type</span></code>
        and <code class="computeroutput"><span class="identifier">curpusIter</span><span class="special">::</span><span class="identifier">value_type</span></code> need to be the same type.
      </p>
<p>
        The return value of the function is an iterator pointing to the start of
        the pattern in the corpus. If the pattern is not found, it returns the end
        of the corpus (<code class="computeroutput"><span class="identifier">corpus_last</span></code>).
      </p>
<h5>
<a name="the_boost_algorithm_library.Searching.BoyerMoore.h2"></a>
        <span class="phrase"><a name="the_boost_algorithm_library.Searching.BoyerMoore.performance"></a></span><a class="link" href="Searching.html#the_boost_algorithm_library.Searching.BoyerMoore.performance">Performance</a>
      </h5>
<p>
        The execution time of the Boyer-Moore algorithm, while still linear in the
        size of the string being searched, can have a significantly lower constant
        factor than many other search algorithms: it doesn't need to check every
        character of the string to be searched, but rather skips over some of them.
        Generally the algorithm gets faster as the pattern being searched for becomes
        longer. Its efficiency derives from the fact that with each unsuccessful
        attempt to find a match between the search string and the text it is searching,
        it uses the information gained from that attempt to rule out as many positions
        of the text as possible where the string cannot match.
      </p>
<h5>
<a name="the_boost_algorithm_library.Searching.BoyerMoore.h3"></a>
        <span class="phrase"><a name="the_boost_algorithm_library.Searching.BoyerMoore.memory_use"></a></span><a class="link" href="Searching.html#the_boost_algorithm_library.Searching.BoyerMoore.memory_use">Memory
        Use</a>
      </h5>
<p>
        The algorithm allocates two internal tables. The first one is proportional
        to the length of the pattern; the second one has one entry for each member
        of the "alphabet" in the pattern. For (8-bit) character types,
        this table contains 256 entries.
      </p>
<h5>
<a name="the_boost_algorithm_library.Searching.BoyerMoore.h4"></a>
        <span class="phrase"><a name="the_boost_algorithm_library.Searching.BoyerMoore.complexity"></a></span><a class="link" href="Searching.html#the_boost_algorithm_library.Searching.BoyerMoore.complexity">Complexity</a>
      </h5>
<p>
        The worst-case performance to find a pattern in the corpus is <span class="emphasis"><em>O(N)</em></span>
        (linear) time; that is, proportional to the length of the corpus being searched.
        In general, the search is sub-linear; not every entry in the corpus need
        be checked.
      </p>
<h5>
<a name="the_boost_algorithm_library.Searching.BoyerMoore.h5"></a>
        <span class="phrase"><a name="the_boost_algorithm_library.Searching.BoyerMoore.exception_safety"></a></span><a class="link" href="Searching.html#the_boost_algorithm_library.Searching.BoyerMoore.exception_safety">Exception
        Safety</a>
      </h5>
<p>
        Both the object-oriented and procedural versions of the Boyer-Moore algorithm
        take their parameters by value and do not use any information other than
        what is passed in. Therefore, both interfaces provide the strong exception
        guarantee.
      </p>
<h5>
<a name="the_boost_algorithm_library.Searching.BoyerMoore.h6"></a>
        <span class="phrase"><a name="the_boost_algorithm_library.Searching.BoyerMoore.notes"></a></span><a class="link" href="Searching.html#the_boost_algorithm_library.Searching.BoyerMoore.notes">Notes</a>
      </h5>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
<li class="listitem">
            When using the object-based interface, the pattern must remain unchanged
            for during the searches; i.e, from the time the object is constructed
            until the final call to operator () returns.
          </li>
<li class="listitem">
            The Boyer-Moore algorithm requires random-access iterators for both the
            pattern and the corpus.
          </li>
</ul></div>
<h5>
<a name="the_boost_algorithm_library.Searching.BoyerMoore.h7"></a>
        <span class="phrase"><a name="the_boost_algorithm_library.Searching.BoyerMoore.customization_points"></a></span><a class="link" href="Searching.html#the_boost_algorithm_library.Searching.BoyerMoore.customization_points">Customization
        points</a>
      </h5>
<p>
        The Boyer-Moore object takes a traits template parameter which enables the
        caller to customize how one of the precomputed tables is stored. This table,
        called the skip table, contains (logically) one entry for every possible
        value that the pattern can contain. When searching 8-bit character data,
        this table contains 256 elements. The traits class defines the table to be
        used.
      </p>
<p>
        The default traits class uses a <code class="computeroutput"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">array</span></code>
        for small 'alphabets' and a <code class="computeroutput"><span class="identifier">tr1</span><span class="special">::</span><span class="identifier">unordered_map</span></code>
        for larger ones. The array-based skip table gives excellent performance,
        but could be prohibitively large when the 'alphabet' of elements to be searched
        grows. The unordered_map based version only grows as the number of unique
        elements in the pattern, but makes many more heap allocations, and gives
        slower lookup performance.
      </p>
<p>
        To use a different skip table, you should define your own skip table object
        and your own traits class, and use them to instantiate the Boyer-Moore object.
        The interface to these objects is described TBD.
      </p>
</div>
<div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool"></a><a class="link" href="Searching.html#the_boost_algorithm_library.Searching.BoyerMooreHorspool" title="Boyer-Moore-Horspool Search">Boyer-Moore-Horspool
      Search</a>
</h3></div></div></div>
<h5>
<a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.h0"></a>
        <span class="phrase"><a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.overview"></a></span><a class="link" href="Searching.html#the_boost_algorithm_library.Searching.BoyerMooreHorspool.overview">Overview</a>
      </h5>
<p>
        The header file 'boyer_moore_horspool.hpp' contains an implementation of
        the Boyer-Moore-Horspool algorithm for searching sequences of values.
      </p>
<p>
        The Boyer-Moore-Horspool search algorithm was published by Nigel Horspool
        in 1980. It is a refinement of the Boyer-Moore algorithm that trades space
        for time. It uses less space for internal tables than Boyer-Moore, and has
        poorer worst-case performance.
      </p>
<p>
        The Boyer-Moore-Horspool algorithm cannot be used with comparison predicates
        like <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">search</span></code>.
      </p>
<h5>
<a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.h1"></a>
        <span class="phrase"><a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.interface"></a></span><a class="link" href="Searching.html#the_boost_algorithm_library.Searching.BoyerMooreHorspool.interface">Interface</a>
      </h5>
<p>
        Nomenclature: I refer to the sequence being searched for as the "pattern",
        and the sequence being searched in as the "corpus".
      </p>
<p>
        For flexibility, the Boyer-Moore-Horspool algorithm has two interfaces; an
        object-based interface and a procedural one. The object-based interface builds
        the tables in the constructor, and uses operator () to perform the search.
        The procedural interface builds the table and does the search all in one
        step. If you are going to be searching for the same pattern in multiple corpora,
        then you should use the object interface, and only build the tables once.
      </p>
<p>
        Here is the object interface:
</p>
<pre class="programlisting"><span class="keyword">template</span> <span class="special">&lt;</span><span class="keyword">typename</span> <span class="identifier">patIter</span><span class="special">&gt;</span>
<span class="keyword">class</span> <span class="identifier">boyer_moore_horspool</span> <span class="special">{</span>
<span class="keyword">public</span><span class="special">:</span>
    <span class="identifier">boyer_moore_horspool</span> <span class="special">(</span> <span class="identifier">patIter</span> <span class="identifier">first</span><span class="special">,</span> <span class="identifier">patIter</span> <span class="identifier">last</span> <span class="special">);</span>
    <span class="special">~</span><span class="identifier">boyer_moore_horspool</span> <span class="special">();</span>

    <span class="keyword">template</span> <span class="special">&lt;</span><span class="keyword">typename</span> <span class="identifier">corpusIter</span><span class="special">&gt;</span>
    <span class="identifier">corpusIter</span> <span class="keyword">operator</span> <span class="special">()</span> <span class="special">(</span> <span class="identifier">corpusIter</span> <span class="identifier">corpus_first</span><span class="special">,</span> <span class="identifier">corpusIter</span> <span class="identifier">corpus_last</span> <span class="special">);</span>
    <span class="special">};</span>
</pre>
<p>
      </p>
<p>
        and here is the corresponding procedural interface:
      </p>
<p>
</p>
<pre class="programlisting"><span class="keyword">template</span> <span class="special">&lt;</span><span class="keyword">typename</span> <span class="identifier">patIter</span><span class="special">,</span> <span class="keyword">typename</span> <span class="identifier">corpusIter</span><span class="special">&gt;</span>
<span class="identifier">corpusIter</span> <span class="identifier">boyer_moore_horspool_search</span> <span class="special">(</span>
        <span class="identifier">corpusIter</span> <span class="identifier">corpus_first</span><span class="special">,</span> <span class="identifier">corpusIter</span> <span class="identifier">corpus_last</span><span class="special">,</span>
        <span class="identifier">patIter</span> <span class="identifier">pat_first</span><span class="special">,</span> <span class="identifier">patIter</span> <span class="identifier">pat_last</span> <span class="special">);</span>
</pre>
<p>
      </p>
<p>
        Each of the functions is passed two pairs of iterators. The first two define
        the corpus and the second two define the pattern. Note that the two pairs
        need not be of the same type, but they do need to "point" at the
        same type. In other words, <code class="computeroutput"><span class="identifier">patIter</span><span class="special">::</span><span class="identifier">value_type</span></code>
        and <code class="computeroutput"><span class="identifier">curpusIter</span><span class="special">::</span><span class="identifier">value_type</span></code> need to be the same type.
      </p>
<p>
        The return value of the function is an iterator pointing to the start of
        the pattern in the corpus. If the pattern is not found, it returns the end
        of the corpus (<code class="computeroutput"><span class="identifier">corpus_last</span></code>).
      </p>
<h5>
<a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.h2"></a>
        <span class="phrase"><a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.performance"></a></span><a class="link" href="Searching.html#the_boost_algorithm_library.Searching.BoyerMooreHorspool.performance">Performance</a>
      </h5>
<p>
        The execution time of the Boyer-Moore-Horspool algorithm is linear in the
        size of the string being searched; it can have a significantly lower constant
        factor than many other search algorithms: it doesn't need to check every
        character of the string to be searched, but rather skips over some of them.
        Generally the algorithm gets faster as the pattern being searched for becomes
        longer. Its efficiency derives from the fact that with each unsuccessful
        attempt to find a match between the search string and the text it is searching,
        it uses the information gained from that attempt to rule out as many positions
        of the text as possible where the string cannot match.
      </p>
<h5>
<a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.h3"></a>
        <span class="phrase"><a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.memory_use"></a></span><a class="link" href="Searching.html#the_boost_algorithm_library.Searching.BoyerMooreHorspool.memory_use">Memory
        Use</a>
      </h5>
<p>
        The algorithm an internal table that has one entry for each member of the
        "alphabet" in the pattern. For (8-bit) character types, this table
        contains 256 entries.
      </p>
<h5>
<a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.h4"></a>
        <span class="phrase"><a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.complexity"></a></span><a class="link" href="Searching.html#the_boost_algorithm_library.Searching.BoyerMooreHorspool.complexity">Complexity</a>
      </h5>
<p>
        The worst-case performance is <span class="emphasis"><em>O(m x n)</em></span>, where <span class="emphasis"><em>m</em></span>
        is the length of the pattern and <span class="emphasis"><em>n</em></span> is the length of
        the corpus. The average time is <span class="emphasis"><em>O(n)</em></span>. The best case
        performance is sub-linear, and is, in fact, identical to Boyer-Moore, but
        the initialization is quicker and the internal loop is simpler than Boyer-Moore.
      </p>
<h5>
<a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.h5"></a>
        <span class="phrase"><a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.exception_safety"></a></span><a class="link" href="Searching.html#the_boost_algorithm_library.Searching.BoyerMooreHorspool.exception_safety">Exception
        Safety</a>
      </h5>
<p>
        Both the object-oriented and procedural versions of the Boyer-Moore-Horspool
        algorithm take their parameters by value and do not use any information other
        than what is passed in. Therefore, both interfaces provide the strong exception
        guarantee.
      </p>
<h5>
<a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.h6"></a>
        <span class="phrase"><a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.notes"></a></span><a class="link" href="Searching.html#the_boost_algorithm_library.Searching.BoyerMooreHorspool.notes">Notes</a>
      </h5>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
<li class="listitem">
            When using the object-based interface, the pattern must remain unchanged
            for during the searches; i.e, from the time the object is constructed
            until the final call to operator () returns.
          </li>
<li class="listitem">
            The Boyer-Moore-Horspool algorithm requires random-access iterators for
            both the pattern and the corpus.
          </li>
</ul></div>
<h5>
<a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.h7"></a>
        <span class="phrase"><a name="the_boost_algorithm_library.Searching.BoyerMooreHorspool.customization_points"></a></span><a class="link" href="Searching.html#the_boost_algorithm_library.Searching.BoyerMooreHorspool.customization_points">Customization
        points</a>
      </h5>
<p>
        The Boyer-Moore-Horspool object takes a traits template parameter which enables
        the caller to customize how the precomputed table is stored. This table,
        called the skip table, contains (logically) one entry for every possible
        value that the pattern can contain. When searching 8-bit character data,
        this table contains 256 elements. The traits class defines the table to be
        used.
      </p>
<p>
        The default traits class uses a <code class="computeroutput"><span class="identifier">boost</span><span class="special">::</span><span class="identifier">array</span></code>
        for small 'alphabets' and a <code class="computeroutput"><span class="identifier">tr1</span><span class="special">::</span><span class="identifier">unordered_map</span></code>
        for larger ones. The array-based skip table gives excellent performance,
        but could be prohibitively large when the 'alphabet' of elements to be searched
        grows. The unordered_map based version only grows as the number of unique
        elements in the pattern, but makes many more heap allocations, and gives
        slower lookup performance.
      </p>
<p>
        To use a different skip table, you should define your own skip table object
        and your own traits class, and use them to instantiate the Boyer-Moore-Horspool
        object. The interface to these objects is described TBD.
      </p>
</div>
<div class="section">
<div class="titlepage"><div><div><h3 class="title">
<a name="the_boost_algorithm_library.Searching.KnuthMorrisPratt"></a><a class="link" href="Searching.html#the_boost_algorithm_library.Searching.KnuthMorrisPratt" title="Knuth-Morris-Pratt Search">Knuth-Morris-Pratt
      Search</a>
</h3></div></div></div>
<h5>
<a name="the_boost_algorithm_library.Searching.KnuthMorrisPratt.h0"></a>
        <span class="phrase"><a name="the_boost_algorithm_library.Searching.KnuthMorrisPratt.overview"></a></span><a class="link" href="Searching.html#the_boost_algorithm_library.Searching.KnuthMorrisPratt.overview">Overview</a>
      </h5>
<p>
        The header file 'knuth_morris_pratt.hpp' contains an implementation of the
        Knuth-Morris-Pratt algorithm for searching sequences of values.
      </p>
<p>
        The basic premise of the Knuth-Morris-Pratt algorithm is that when a mismatch
        occurs, there is information in the pattern being searched for that can be
        used to determine where the next match could begin, enabling the skipping
        of some elements of the corpus that have already been examined.
      </p>
<p>
        It does this by building a table from the pattern being searched for, with
        one entry for each element in the pattern.
      </p>
<p>
        The algorithm was conceived in 1974 by Donald Knuth and Vaughan Pratt, and
        independently by James H. Morris. The three published it jointly in 1977
        in the SIAM Journal on Computing <a href="http://citeseer.ist.psu.edu/context/23820/0" target="_top">http://citeseer.ist.psu.edu/context/23820/0</a>
      </p>
<p>
        However, the Knuth-Morris-Pratt algorithm cannot be used with comparison
        predicates like <code class="computeroutput"><span class="identifier">std</span><span class="special">::</span><span class="identifier">search</span></code>.
      </p>
<h5>
<a name="the_boost_algorithm_library.Searching.KnuthMorrisPratt.h1"></a>
        <span class="phrase"><a name="the_boost_algorithm_library.Searching.KnuthMorrisPratt.interface"></a></span><a class="link" href="Searching.html#the_boost_algorithm_library.Searching.KnuthMorrisPratt.interface">Interface</a>
      </h5>
<p>
        Nomenclature: I refer to the sequence being searched for as the "pattern",
        and the sequence being searched in as the "corpus".
      </p>
<p>
        For flexibility, the Knuth-Morris-Pratt algorithm has two interfaces; an
        object-based interface and a procedural one. The object-based interface builds
        the table in the constructor, and uses operator () to perform the search.
        The procedural interface builds the table and does the search all in one
        step. If you are going to be searching for the same pattern in multiple corpora,
        then you should use the object interface, and only build the tables once.
      </p>
<p>
        Here is the object interface:
</p>
<pre class="programlisting"><span class="keyword">template</span> <span class="special">&lt;</span><span class="keyword">typename</span> <span class="identifier">patIter</span><span class="special">&gt;</span>
<span class="keyword">class</span> <span class="identifier">knuth_morris_pratt</span> <span class="special">{</span>
<span class="keyword">public</span><span class="special">:</span>
    <span class="identifier">knuth_morris_pratt</span> <span class="special">(</span> <span class="identifier">patIter</span> <span class="identifier">first</span><span class="special">,</span> <span class="identifier">patIter</span> <span class="identifier">last</span> <span class="special">);</span>
    <span class="special">~</span><span class="identifier">knuth_morris_pratt</span> <span class="special">();</span>

    <span class="keyword">template</span> <span class="special">&lt;</span><span class="keyword">typename</span> <span class="identifier">corpusIter</span><span class="special">&gt;</span>
    <span class="identifier">corpusIter</span> <span class="keyword">operator</span> <span class="special">()</span> <span class="special">(</span> <span class="identifier">corpusIter</span> <span class="identifier">corpus_first</span><span class="special">,</span> <span class="identifier">corpusIter</span> <span class="identifier">corpus_last</span> <span class="special">);</span>
    <span class="special">};</span>
</pre>
<p>
      </p>
<p>
        and here is the corresponding procedural interface:
      </p>
<p>
</p>
<pre class="programlisting"><span class="keyword">template</span> <span class="special">&lt;</span><span class="keyword">typename</span> <span class="identifier">patIter</span><span class="special">,</span> <span class="keyword">typename</span> <span class="identifier">corpusIter</span><span class="special">&gt;</span>
<span class="identifier">corpusIter</span> <span class="identifier">knuth_morris_pratt_search</span> <span class="special">(</span>
        <span class="identifier">corpusIter</span> <span class="identifier">corpus_first</span><span class="special">,</span> <span class="identifier">corpusIter</span> <span class="identifier">corpus_last</span><span class="special">,</span>
        <span class="identifier">patIter</span> <span class="identifier">pat_first</span><span class="special">,</span> <span class="identifier">patIter</span> <span class="identifier">pat_last</span> <span class="special">);</span>
</pre>
<p>
      </p>
<p>
        Each of the functions is passed two pairs of iterators. The first two define
        the corpus and the second two define the pattern. Note that the two pairs
        need not be of the same type, but they do need to "point" at the
        same type. In other words, <code class="computeroutput"><span class="identifier">patIter</span><span class="special">::</span><span class="identifier">value_type</span></code>
        and <code class="computeroutput"><span class="identifier">curpusIter</span><span class="special">::</span><span class="identifier">value_type</span></code> need to be the same type.
      </p>
<p>
        The return value of the function is an iterator pointing to the start of
        the pattern in the corpus. If the pattern is not found, it returns the end
        of the corpus (<code class="computeroutput"><span class="identifier">corpus_last</span></code>).
      </p>
<h5>
<a name="the_boost_algorithm_library.Searching.KnuthMorrisPratt.h2"></a>
        <span class="phrase"><a name="the_boost_algorithm_library.Searching.KnuthMorrisPratt.performance"></a></span><a class="link" href="Searching.html#the_boost_algorithm_library.Searching.KnuthMorrisPratt.performance">Performance</a>
      </h5>
<p>
        The execution time of the Knuth-Morris-Pratt algorithm is linear in the size
        of the string being searched. Generally the algorithm gets faster as the
        pattern being searched for becomes longer. Its efficiency derives from the
        fact that with each unsuccessful attempt to find a match between the search
        string and the text it is searching, it uses the information gained from
        that attempt to rule out as many positions of the text as possible where
        the string cannot match.
      </p>
<h5>
<a name="the_boost_algorithm_library.Searching.KnuthMorrisPratt.h3"></a>
        <span class="phrase"><a name="the_boost_algorithm_library.Searching.KnuthMorrisPratt.memory_use"></a></span><a class="link" href="Searching.html#the_boost_algorithm_library.Searching.KnuthMorrisPratt.memory_use">Memory
        Use</a>
      </h5>
<p>
        The algorithm an that contains one entry for each element the pattern, plus
        one extra. So, when searching for a 1026 byte string, the table will have
        1027 entries.
      </p>
<h5>
<a name="the_boost_algorithm_library.Searching.KnuthMorrisPratt.h4"></a>
        <span class="phrase"><a name="the_boost_algorithm_library.Searching.KnuthMorrisPratt.complexity"></a></span><a class="link" href="Searching.html#the_boost_algorithm_library.Searching.KnuthMorrisPratt.complexity">Complexity</a>
      </h5>
<p>
        The worst-case performance is <span class="emphasis"><em>O(2n)</em></span>, where <span class="emphasis"><em>n</em></span>
        is the length of the corpus. The average time is <span class="emphasis"><em>O(n)</em></span>.
        The best case performance is sub-linear.
      </p>
<h5>
<a name="the_boost_algorithm_library.Searching.KnuthMorrisPratt.h5"></a>
        <span class="phrase"><a name="the_boost_algorithm_library.Searching.KnuthMorrisPratt.exception_safety"></a></span><a class="link" href="Searching.html#the_boost_algorithm_library.Searching.KnuthMorrisPratt.exception_safety">Exception
        Safety</a>
      </h5>
<p>
        Both the object-oriented and procedural versions of the Knuth-Morris-Pratt
        algorithm take their parameters by value and do not use any information other
        than what is passed in. Therefore, both interfaces provide the strong exception
        guarantee.
      </p>
<h5>
<a name="the_boost_algorithm_library.Searching.KnuthMorrisPratt.h6"></a>
        <span class="phrase"><a name="the_boost_algorithm_library.Searching.KnuthMorrisPratt.notes"></a></span><a class="link" href="Searching.html#the_boost_algorithm_library.Searching.KnuthMorrisPratt.notes">Notes</a>
      </h5>
<div class="itemizedlist"><ul class="itemizedlist" style="list-style-type: disc; ">
<li class="listitem">
            When using the object-based interface, the pattern must remain unchanged
            for during the searches; i.e, from the time the object is constructed
            until the final call to operator () returns.
          </li>
<li class="listitem">
            The Knuth-Morris-Pratt algorithm requires random-access iterators for
            both the pattern and the corpus. It should be possible to write this
            to use bidirectional iterators (or possibly even forward ones), but this
            implementation does not do that.
          </li>
</ul></div>
</div>
</div>
<table xmlns:rev="http://www.cs.rpi.edu/~gregod/boost/tools/doc/revision" width="100%"><tr>
<td align="left"></td>
<td align="right"><div class="copyright-footer">Copyright &#169; 2010-2012 Marshall Clow<p>
        Distributed under the Boost Software License, Version 1.0. (See accompanying
        file LICENSE_1_0.txt or copy at <a href="http://www.boost.org/LICENSE_1_0.txt" target="_top">http://www.boost.org/LICENSE_1_0.txt</a>)
      </p>
</div></td>
</tr></table>
<hr>
<div class="spirit-nav">
<a accesskey="p" href="../algorithm.html"><img src="../../../doc/src/images/prev.png" alt="Prev"></a><a accesskey="u" href="../algorithm.html"><img src="../../../doc/src/images/up.png" alt="Up"></a><a accesskey="h" href="../index.html"><img src="../../../doc/src/images/home.png" alt="Home"></a><a accesskey="n" href="CXX11.html"><img src="../../../doc/src/images/next.png" alt="Next"></a>
</div>
</body>
</html>