1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
|
=head1 NAME
XML::LibXML::InputCallback - XML::LibXML Class for Input Callbacks
=head1 SYNOPSIS
use XML::LibXML;
=head1 DESCRIPTION
You may get unexpected results if you are trying to load external documents
during libxml2 parsing if the location of the resource is not a HTTP, FTP or
relative location but a absolute path for example. To get around this
limitation, you may add your own input handler to open, read and close
particular types of locations or URI classes. Using this input callback
handlers, you can handle your own custom URI schemes for example.
The input callbacks are used whenever XML::LibXML has to get something other
than externally parsed entities from somewhere. They are implemented using a
callback stack on the Perl layer in analogy to libxml2's native callback stack.
The XML::LibXML::InputCallback class transparently registers the input
callbacks for the libxml2's parser processes.
=head2 How does XML::LibXML::InputCallback work?
The libxml2 library offers a callback implementation as global functions only.
To work-around the troubles resulting in having only global callbacks - for
example, if the same global callback stack is manipulated by different
applications running together in a single Apache Web-server environment -,
XML::LibXML::InputCallback comes with a object-oriented and a function-oriented
part.
Using the function-oriented part the global callback stack of libxml2 can be
manipulated. Those functions can be used as interface to the callbacks on the
C- and XS Layer. At the object-oriented part, operations for working with the
"pseudo-localized" callback stack are implemented. Currently, you can register
and de-register callbacks on the Perl layer and initialize them on a per parser
basis.
=head3 Callback Groups
The libxml2 input callbacks come in groups. One group contains a URI matcher (I<<<<<< match >>>>>>), a data stream constructor (I<<<<<< open >>>>>>), a data stream reader (I<<<<<< read >>>>>>), and a data stream destructor (I<<<<<< close >>>>>>). The callbacks can be manipulated on a per group basis only.
=head3 The Parser Process
The parser process works on an XML data stream, along which, links to other
resources can be embedded. This can be links to external DTDs or XIncludes for
example. Those resources are identified by URIs. The callback implementation of
libxml2 assumes that one callback group can handle a certain amount of URIs and
a certain URI scheme. Per default, callback handlers for I<<<<<< file://* >>>>>>, I<<<<<< file:://*.gz >>>>>>, I<<<<<< http://* >>>>>> and I<<<<<< ftp://* >>>>>> are registered.
Callback groups in the callback stack are processed from top to bottom, meaning
that callback groups registered later will be processed before the earlier
registered ones.
While parsing the data stream, the libxml2 parser checks if a registered
callback group will handle a URI - if they will not, the URI will be
interpreted as I<<<<<< file://URI >>>>>>. To handle a URI, the I<<<<<< match >>>>>> callback will have to return '1'. If that happens, the handling of the URI will
be passed to that callback group. Next, the URI will be passed to the I<<<<<< open >>>>>> callback, which should return a I<<<<<< reference >>>>>> to the data stream if it successfully opened the file, '0' otherwise. If
opening the stream was successful, the I<<<<<< read >>>>>> callback will be called repeatedly until it returns an empty string. After the
read callback, the I<<<<<< close >>>>>> callback will be called to close the stream.
=head3 Organisation of callback groups in XML::LibXML::InputCallback
Callback groups are implemented as a stack (Array), each entry holds a
reference to an array of the callbacks. For the libxml2 library, the
XML::LibXML::InputCallback callback implementation appears as one single
callback group. The Perl implementation however allows one to manage different
callback stacks on a per libxml2-parser basis.
=head2 Using XML::LibXML::InputCallback
After object instantiation using the parameter-less constructor, you can
register callback groups.
my $input_callbacks = XML::LibXML::InputCallback->new();
$input_callbacks->register_callbacks([ $match_cb1, $open_cb1,
$read_cb1, $close_cb1 ] );
$input_callbacks->register_callbacks([ $match_cb2, $open_cb2,
$read_cb2, $close_cb2 ] );
$input_callbacks->register_callbacks( [ $match_cb3, $open_cb3,
$read_cb3, $close_cb3 ] );
$parser->input_callbacks( $input_callbacks );
$parser->parse_file( $some_xml_file );
=head2 What about the old callback system prior to XML::LibXML::InputCallback?
In XML::LibXML versions prior to 1.59 - i.e. without the
XML::LibXML::InputCallback module - you could define your callbacks either
using globally or locally. You still can do that using
XML::LibXML::InputCallback, and in addition to that you can define the
callbacks on a per parser basis!
If you use the old callback interface through global callbacks,
XML::LibXML::InputCallback will treat them with a lower priority as the ones
registered using the new interface. The global callbacks will not override the
callback groups registered using the new interface. Local callbacks are
attached to a specific parser instance, therefore they are treated with highest
priority. If the I<<<<<< match >>>>>> callback of the callback group registered as local variable is identical to one
of the callback groups registered using the new interface, that callback group
will be replaced.
Users of the old callback implementation whose I<<<<<< open >>>>>> callback returned a plain string, will have to adapt their code to return a
reference to that string after upgrading to version >= 1.59. The new callback
system can only deal with the I<<<<<< open >>>>>> callback returning a reference!
=head1 INTERFACE DESCRIPTION
=head2 Global Variables
=over 4
=item $_CUR_CB
Stores the current callback and can be used as shortcut to access the callback
stack.
=item @_GLOBAL_CALLBACKS
Stores all callback groups for the current parser process.
=item @_CB_STACK
Stores the currently used callback group. Used to prevent parser errors when
dealing with nested XML data.
=back
=head2 Global Callbacks
=over 4
=item _callback_match
Implements the interface for the I<<<<<< match >>>>>> callback at C-level and for the selection of the callback group from the
callbacks defined at the Perl-level.
=item _callback_open
Forwards the I<<<<<< open >>>>>> callback from libxml2 to the corresponding callback function at the Perl-level.
=item _callback_read
Forwards the read request to the corresponding callback function at the
Perl-level and returns the result to libxml2.
=item _callback_close
Forwards the I<<<<<< close >>>>>> callback from libxml2 to the corresponding callback function at the
Perl-level..
=back
=head2 Class methods
=over 4
=item new()
A simple constructor.
=item register_callbacks( [ $match_cb, $open_cb, $read_cb, $close_cb ])
The four callbacks I<<<<<< have >>>>>> to be given as array reference in the above order I<<<<<< match >>>>>>, I<<<<<< open >>>>>>, I<<<<<< read >>>>>>, I<<<<<< close >>>>>>!
=item unregister_callbacks( [ $match_cb, $open_cb, $read_cb, $close_cb ])
With no arguments given, C<<<<<< unregister_callbacks() >>>>>> will delete the last registered callback group from the stack. If four
callbacks are passed as array reference, the callback group to unregister will
be identified by the I<<<<<< match >>>>>> callback and deleted from the callback stack. Note that if several identical I<<<<<< match >>>>>> callbacks are defined in different callback groups, ALL of them will be deleted
from the stack.
=item init_callbacks( $parser )
Initializes the callback system for the provided parser before starting a
parsing process.
=item cleanup_callbacks()
Resets global variables and the libxml2 callback stack.
=item lib_init_callbacks()
Used internally for callback registration at C-level.
=item lib_cleanup_callbacks()
Used internally for callback resetting at the C-level.
=back
=head1 EXAMPLE CALLBACKS
The following example is a purely fictitious example that uses a
MyScheme::Handler object that responds to methods similar to an IO::Handle.
# Define the four callback functions
sub match_uri {
my $uri = shift;
return $uri =~ /^myscheme:/; # trigger our callback group at a 'myscheme' URIs
}
sub open_uri {
my $uri = shift;
my $handler = MyScheme::Handler->new($uri);
return $handler;
}
# The returned $buffer will be parsed by the libxml2 parser
sub read_uri {
my $handler = shift;
my $length = shift;
my $buffer;
read($handler, $buffer, $length);
return $buffer; # $buffer will be an empty string '' if read() is done
}
# Close the handle associated with the resource.
sub close_uri {
my $handler = shift;
close($handler);
}
# Register them with a instance of XML::LibXML::InputCallback
my $input_callbacks = XML::LibXML::InputCallback->new();
$input_callbacks->register_callbacks([ \&match_uri, \&open_uri,
\&read_uri, \&close_uri ] );
# Register the callback group at a parser instance
$parser->input_callbacks( $input_callbacks );
# $some_xml_file will be parsed using our callbacks
$parser->parse_file( $some_xml_file );
=head1 AUTHORS
Matt Sergeant,
Christian Glahn,
Petr Pajas
=head1 VERSION
2.0206
=head1 COPYRIGHT
2001-2007, AxKit.com Ltd.
2002-2006, Christian Glahn.
2006-2009, Petr Pajas.
=cut
=head1 LICENSE
This program is free software; you can redistribute it and/or modify it under
the same terms as Perl itself.
|