summaryrefslogtreecommitdiff
path: root/lib/XML/LibXML/Document.pod
blob: 4efa5d9ca431ecb87c4da05a35604466c93c8b61 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
104
105
106
107
108
109
110
111
112
113
114
115
116
117
118
119
120
121
122
123
124
125
126
127
128
129
130
131
132
133
134
135
136
137
138
139
140
141
142
143
144
145
146
147
148
149
150
151
152
153
154
155
156
157
158
159
160
161
162
163
164
165
166
167
168
169
170
171
172
173
174
175
176
177
178
179
180
181
182
183
184
185
186
187
188
189
190
191
192
193
194
195
196
197
198
199
200
201
202
203
204
205
206
207
208
209
210
211
212
213
214
215
216
217
218
219
220
221
222
223
224
225
226
227
228
229
230
231
232
233
234
235
236
237
238
239
240
241
242
243
244
245
246
247
248
249
250
251
252
253
254
255
256
257
258
259
260
261
262
263
264
265
266
267
268
269
270
271
272
273
274
275
276
277
278
279
280
281
282
283
284
285
286
287
288
289
290
291
292
293
294
295
296
297
298
299
300
301
302
303
304
305
306
307
308
309
310
311
312
313
314
315
316
317
318
319
320
321
322
323
324
325
326
327
328
329
330
331
332
333
334
335
336
337
338
339
340
341
342
343
344
345
346
347
348
349
350
351
352
353
354
355
356
357
358
359
360
361
362
363
364
365
366
367
368
369
370
371
372
373
374
375
376
377
378
379
380
381
382
383
384
385
386
387
388
389
390
391
392
393
394
395
396
397
398
399
400
401
402
403
404
405
406
407
408
409
410
411
412
413
414
415
416
417
418
419
420
421
422
423
424
425
426
427
428
429
430
431
432
433
434
435
436
437
438
439
440
441
442
443
444
445
446
447
448
449
450
451
452
453
454
455
456
457
458
459
460
461
462
463
464
465
466
467
468
469
470
471
472
473
474
475
476
477
478
479
480
481
482
483
484
485
486
487
488
489
490
491
492
493
494
495
496
497
498
499
500
501
502
503
504
505
506
507
508
509
510
511
512
513
514
515
516
517
518
519
520
521
522
523
524
525
526
527
528
529
530
531
532
533
534
535
536
537
538
539
540
541
542
543
544
545
546
547
548
549
550
551
552
553
554
555
556
557
558
559
560
561
562
563
564
565
566
567
568
569
570
571
572
573
574
575
576
577
578
579
580
581
582
583
584
585
586
587
588
589
590
591
592
593
594
595
596
597
598
599
600
601
602
603
604
605
606
607
608
609
610
611
612
613
614
615
616
617
618
619
620
621
622
623
624
625
626
627
628
629
630
631
632
633
634
635
636
637
638
639
640
641
642
643
644
645
646
647
648
649
650
651
652
653
654
655
656
657
658
659
660
661
662
663
664
665
666
667
668
669
670
671
672
673
674
675
676
677
678
679
680
681
682
683
684
685
686
687
688
689
690
691
692
693
694
695
696
697
698
699
700
701
702
703
=head1 NAME

XML::LibXML::Document - XML::LibXML DOM Document Class

=head1 SYNOPSIS



  use XML::LibXML;
  # Only methods specific to Document nodes are listed here,
  # see the XML::LibXML::Node manpage for other methods

  $dom = XML::LibXML::Document->new( $version, $encoding );
  $dom = XML::LibXML::Document->createDocument( $version, $encoding );
  $strURI = $doc->URI();
  $doc->setURI($strURI);
  $strEncoding = $doc->encoding();
  $strEncoding = $doc->actualEncoding();
  $doc->setEncoding($new_encoding);
  $strVersion = $doc->version();
  $doc->standalone
  $doc->setStandalone($numvalue);
  my $compression = $doc->compression;
  $doc->setCompression($ziplevel);
  $docstring = $dom->toString($format);
  $c14nstr = $doc->toStringC14N($comment_flag, $xpath [, $xpath_context ]);
  $ec14nstr = $doc->toStringEC14N($comment_flag, $xpath [, $xpath_context ], $inclusive_prefix_list);
  $str = $doc->serialize($format);
  $state = $doc->toFile($filename, $format);
  $state = $doc->toFH($fh, $format);
  $str = $document->toStringHTML();
  $str = $document->serialize_html();
  $bool = $dom->is_valid();
  $dom->validate();
  $root = $dom->documentElement();
  $dom->setDocumentElement( $root );
  $element = $dom->createElement( $nodename );
  $element = $dom->createElementNS( $namespaceURI, $nodename );
  $text = $dom->createTextNode( $content_text );
  $comment = $dom->createComment( $comment_text );
  $attrnode = $doc->createAttribute($name [,$value]);
  $attrnode = $doc->createAttributeNS( namespaceURI, $name [,$value] );
  $fragment = $doc->createDocumentFragment();
  $cdata = $dom->createCDATASection( $cdata_content );
  my $pi = $doc->createProcessingInstruction( $target, $data );
  my $entref = $doc->createEntityReference($refname);
  $dtd = $document->createInternalSubset( $rootnode, $public, $system);
  $dtd = $document->createExternalSubset( $rootnode_name, $publicId, $systemId);
  $document->importNode( $node );
  $document->adoptNode( $node );
  my $dtd = $doc->externalSubset;
  my $dtd = $doc->internalSubset;
  $doc->setExternalSubset($dtd);
  $doc->setInternalSubset($dtd);
  my $dtd = $doc->removeExternalSubset();
  my $dtd = $doc->removeInternalSubset();
  my @nodelist = $doc->getElementsByTagName($tagname);
  my @nodelist = $doc->getElementsByTagNameNS($nsURI,$tagname);
  my @nodelist = $doc->getElementsByLocalName($localname);
  my $node = $doc->getElementById($id);
  $dom->indexElements();

=head1 DESCRIPTION

The Document Class is in most cases the result of a parsing process. But
sometimes it is necessary to create a Document from scratch. The DOM Document
Class provides functions that conform to the DOM Core naming style.

It inherits all functions from L<<<<<< XML::LibXML::Node >>>>>> as specified in the DOM specification. This enables access to the nodes besides
the root element on document level - a C<<<<<< DTD >>>>>> for example. The support for these nodes is limited at the moment.

While generally nodes are bound to a document in the DOM concept it is
suggested that one should always create a node not bound to any document. There
is no need of really including the node to the document, but once the node is
bound to a document, it is quite safe that all strings have the correct
encoding. If an unbound text node with an ISO encoded string is created (e.g.
with $CLASS->new()), the C<<<<<< toString >>>>>> function may not return the expected result.

To prevent such problems, it is recommended to pass all data to XML::LibXML
methods as character strings (i.e. UTF-8 encoded, with the UTF8 flag on).


=head1 METHODS

Many functions listed here are extensively documented in the DOM Level 3 specification (L<<<<<< http://www.w3.org/TR/DOM-Level-3-Core/ >>>>>>). Please refer to the specification for extensive documentation.

=over 4

=item new

  $dom = XML::LibXML::Document->new( $version, $encoding );

alias for createDocument()


=item createDocument

  $dom = XML::LibXML::Document->createDocument( $version, $encoding );

The constructor for the document class. As Parameter it takes the version
string and (optionally) the encoding string. Simply calling I<<<<<< createDocument >>>>>>() will create the document:



  <?xml version="your version" encoding="your encoding"?>

Both parameter are optional. The default value for I<<<<<< $version >>>>>> is C<<<<<< 1.0 >>>>>>, of course. If the I<<<<<< $encoding >>>>>> parameter is not set, the encoding will be left unset, which means UTF-8 is
implied.

The call of I<<<<<< createDocument >>>>>>() without any parameter will result the following code:



  <?xml version="1.0"?>

Alternatively one can call this constructor directly from the XML::LibXML class
level, to avoid some typing. This will not have any effect on the class
instance, which is always XML::LibXML::Document.



  my $document = XML::LibXML->createDocument( "1.0", "UTF-8" );

is therefore a shortcut for



  my $document = XML::LibXML::Document->createDocument( "1.0", "UTF-8" );


=item URI

  $strURI = $doc->URI();

Returns the URI (or filename) of the original document. For documents obtained
by parsing a string of a FH without using the URI parsing argument of the
corresponding C<<<<<< parse_* >>>>>> function, the result is a generated string unknown-XYZ where XYZ is some
number; for documents created with the constructor C<<<<<< new >>>>>>, the URI is undefined.

The value can be modified by calling C<<<<<< setURI >>>>>> method on the document node.


=item setURI

  $doc->setURI($strURI);

Sets the URI of the document reported by the method URI (see also the URI
argument to the various C<<<<<< parse_* >>>>>> functions).


=item encoding

  $strEncoding = $doc->encoding();

returns the encoding string of the document.



  my $doc = XML::LibXML->createDocument( "1.0", "ISO-8859-15" );
  print $doc->encoding; # prints ISO-8859-15


=item actualEncoding

  $strEncoding = $doc->actualEncoding();

returns the encoding in which the XML will be returned by $doc->toString().
This is usually the original encoding of the document as declared in the XML
declaration and returned by $doc->encoding. If the original encoding is not
known (e.g. if created in memory or parsed from a XML without a declared
encoding), 'UTF-8' is returned.



  my $doc = XML::LibXML->createDocument( "1.0", "ISO-8859-15" );
  print $doc->encoding; # prints ISO-8859-15


=item setEncoding

  $doc->setEncoding($new_encoding);

This method allows one to change the declaration of encoding in the XML
declaration of the document. The value also affects the encoding in which the
document is serialized to XML by $doc->toString(). Use setEncoding() to remove
the encoding declaration.


=item version

  $strVersion = $doc->version();

returns the version string of the document

I<<<<<< getVersion() >>>>>> is an alternative form of this function.


=item standalone

  $doc->standalone

This function returns the Numerical value of a documents XML declarations
standalone attribute. It returns I<<<<<< 1 >>>>>> if standalone="yes" was found, I<<<<<< 0 >>>>>> if standalone="no" was found and I<<<<<< -1 >>>>>> if standalone was not specified (default on creation).


=item setStandalone

  $doc->setStandalone($numvalue);

Through this method it is possible to alter the value of a documents standalone
attribute. Set it to I<<<<<< 1 >>>>>> to set standalone="yes", to I<<<<<< 0 >>>>>> to set standalone="no" or set it to I<<<<<< -1 >>>>>> to remove the standalone attribute from the XML declaration.


=item compression

  my $compression = $doc->compression;

libxml2 allows reading of documents directly from gzipped files. In this case
the compression variable is set to the compression level of that file (0-8). If
XML::LibXML parsed a different source or the file wasn't compressed, the
returned value will be I<<<<<< -1 >>>>>>.


=item setCompression

  $doc->setCompression($ziplevel);

If one intends to write the document directly to a file, it is possible to set
the compression level for a given document. This level can be in the range from
0 to 8. If XML::LibXML should not try to compress use I<<<<<< -1 >>>>>> (default).

Note that this feature will I<<<<<< only >>>>>> work if libxml2 is compiled with zlib support and toFile() is used for output.


=item toString

  $docstring = $dom->toString($format);

I<<<<<< toString >>>>>> is a DOM serializing function, so the DOM Tree is serialized into an XML
string, ready for output.

IMPORTANT: unlike toString for other nodes, on document nodes this function
returns the XML as a byte string in the original encoding of the document (see
the actualEncoding() method)! This means you can simply do:



  open my $out_fh, '>', $file;
  print {$out_fh} $doc->toString;

regardless of the actual encoding of the document. See the section on encodings
in L<<<<<< XML::LibXML >>>>>> for more details.

The optional I<<<<<< $format >>>>>> parameter sets the indenting of the output. This parameter is expected to be an C<<<<<< integer >>>>>> value, that specifies that indentation should be used. The format parameter can
have three different values if it is used:

If $format is 0, than the document is dumped as it was originally parsed

If $format is 1, libxml2 will add ignorable white spaces, so the nodes content
is easier to read. Existing text nodes will not be altered

If $format is 2 (or higher), libxml2 will act as $format == 1 but it add a
leading and a trailing line break to each text node.

libxml2 uses a hard-coded indentation of 2 space characters per indentation
level. This value can not be altered on run-time.


=item toStringC14N

  $c14nstr = $doc->toStringC14N($comment_flag, $xpath [, $xpath_context ]);

See the documentation in L<<<<<< XML::LibXML::Node >>>>>>.


=item toStringEC14N

  $ec14nstr = $doc->toStringEC14N($comment_flag, $xpath [, $xpath_context ], $inclusive_prefix_list);

See the documentation in L<<<<<< XML::LibXML::Node >>>>>>.


=item serialize

  $str = $doc->serialize($format);

An alias for toString(). This function was name added to be more consistent
with libxml2.


=item serialize_c14n

An alias for toStringC14N().


=item serialize_exc_c14n

An alias for toStringEC14N().


=item toFile

  $state = $doc->toFile($filename, $format);

This function is similar to toString(), but it writes the document directly
into a filesystem. This function is very useful, if one needs to store large
documents.

The format parameter has the same behaviour as in toString().


=item toFH

  $state = $doc->toFH($fh, $format);

This function is similar to toString(), but it writes the document directly to
a filehandle or a stream. A byte stream in the document encoding is passed to
the file handle. Do NOT apply any C<<<<<< :encoding(...) >>>>>> or C<<<<<< :utf8 >>>>>> PerlIO layer to the filehandle! See the section on encodings in L<<<<<< XML::LibXML >>>>>> for more details.

The format parameter has the same behaviour as in toString().


=item toStringHTML

  $str = $document->toStringHTML();

I<<<<<< toStringHTML >>>>>> serialize the tree to a byte string in the document encoding as HTML. With this
method indenting is automatic and managed by libxml2 internally.


=item serialize_html

  $str = $document->serialize_html();

An alias for toStringHTML().


=item is_valid

  $bool = $dom->is_valid();

Returns either TRUE or FALSE depending on whether the DOM Tree is a valid
Document or not.

You may also pass in a L<<<<<< XML::LibXML::Dtd >>>>>> object, to validate against an external DTD:



  if (!$dom->is_valid($dtd)) {
       warn("document is not valid!");
   }


=item validate

  $dom->validate();

This is an exception throwing equivalent of is_valid. If the document is not
valid it will throw an exception containing the error. This allows you much
better error reporting than simply is_valid or not.

Again, you may pass in a DTD object


=item documentElement

  $root = $dom->documentElement();

Returns the root element of the Document. A document can have just one root
element to contain the documents data.

Optionally one can use I<<<<<< getDocumentElement >>>>>>.


=item setDocumentElement

  $dom->setDocumentElement( $root );

This function enables you to set the root element for a document. The function
supports the import of a node from a different document tree, but does not
support a document fragment as $root.


=item createElement

  $element = $dom->createElement( $nodename );

This function creates a new Element Node bound to the DOM with the name C<<<<<< $nodename >>>>>>.


=item createElementNS

  $element = $dom->createElementNS( $namespaceURI, $nodename );

This function creates a new Element Node bound to the DOM with the name C<<<<<< $nodename >>>>>> and placed in the given namespace.


=item createTextNode

  $text = $dom->createTextNode( $content_text );

As an equivalent of I<<<<<< createElement >>>>>>, but it creates a I<<<<<< Text Node >>>>>> bound to the DOM.


=item createComment

  $comment = $dom->createComment( $comment_text );

As an equivalent of I<<<<<< createElement >>>>>>, but it creates a I<<<<<< Comment Node >>>>>> bound to the DOM.


=item createAttribute

  $attrnode = $doc->createAttribute($name [,$value]);

Creates a new Attribute node.


=item createAttributeNS

  $attrnode = $doc->createAttributeNS( namespaceURI, $name [,$value] );

Creates an Attribute bound to a namespace.


=item createDocumentFragment

  $fragment = $doc->createDocumentFragment();

This function creates a DocumentFragment.


=item createCDATASection

  $cdata = $dom->createCDATASection( $cdata_content );

Similar to createTextNode and createComment, this function creates a
CDataSection bound to the current DOM.


=item createProcessingInstruction

  my $pi = $doc->createProcessingInstruction( $target, $data );

create a processing instruction node.

Since this method is quite long one may use its short form I<<<<<< createPI() >>>>>>.


=item createEntityReference

  my $entref = $doc->createEntityReference($refname);

If a document has a DTD specified, one can create entity references by using
this function. If one wants to add a entity reference to the document, this
reference has to be created by this function.

An entity reference is unique to a document and cannot be passed to other
documents as other nodes can be passed.

I<<<<<< NOTE: >>>>>> A text content containing something that looks like an entity reference, will
not be expanded to a real entity reference unless it is a predefined entity



  my $string = "&foo;";
   $some_element->appendText( $string );
   print $some_element->textContent; # prints "&amp;foo;"


=item createInternalSubset

  $dtd = $document->createInternalSubset( $rootnode, $public, $system);

This function creates and adds an internal subset to the given document.
Because the function automatically adds the DTD to the document there is no
need to add the created node explicitly to the document.



  my $document = XML::LibXML::Document->new();
   my $dtd      = $document->createInternalSubset( "foo", undef, "foo.dtd" );

will result in the following XML document:



  <?xml version="1.0"?>
   <!DOCTYPE foo SYSTEM "foo.dtd">

By setting the public parameter it is possible to set PUBLIC DTDs to a given
document. So



  my $document = XML::LibXML::Document->new();
  my $dtd      = $document->createInternalSubset( "foo", "-//FOO//DTD FOO 0.1//EN", undef );

will cause the following declaration to be created on the document:



  <?xml version="1.0"?>
  <!DOCTYPE foo PUBLIC "-//FOO//DTD FOO 0.1//EN">


=item createExternalSubset

  $dtd = $document->createExternalSubset( $rootnode_name, $publicId, $systemId);

This function is similar to C<<<<<< createInternalSubset() >>>>>> but this DTD is considered to be external and is therefore not added to the
document itself. Nevertheless it can be used for validation purposes.


=item importNode

  $document->importNode( $node );

If a node is not part of a document, it can be imported to another document. As
specified in DOM Level 2 Specification the Node will not be altered or removed
from its original document (C<<<<<< $node-E<gt>cloneNode(1) >>>>>> will get called implicitly).

I<<<<<< NOTE: >>>>>> Don't try to use importNode() to import sub-trees that contain an entity
reference - even if the entity reference is the root node of the sub-tree. This
will cause serious problems to your program. This is a limitation of libxml2
and not of XML::LibXML itself.


=item adoptNode

  $document->adoptNode( $node );

If a node is not part of a document, it can be imported to another document. As
specified in DOM Level 3 Specification the Node will not be altered but it will
removed from its original document.

After a document adopted a node, the node, its attributes and all its
descendants belong to the new document. Because the node does not belong to the
old document, it will be unlinked from its old location first.

I<<<<<< NOTE: >>>>>> Don't try to adoptNode() to import sub-trees that contain entity references -
even if the entity reference is the root node of the sub-tree. This will cause
serious problems to your program. This is a limitation of libxml2 and not of
XML::LibXML itself.


=item externalSubset

  my $dtd = $doc->externalSubset;

If a document has an external subset defined it will be returned by this
function.

I<<<<<< NOTE >>>>>> Dtd nodes are no ordinary nodes in libxml2. The support for these nodes in
XML::LibXML is still limited. In particular one may not want use common node
function on doctype declaration nodes!


=item internalSubset

  my $dtd = $doc->internalSubset;

If a document has an internal subset defined it will be returned by this
function.

I<<<<<< NOTE >>>>>> Dtd nodes are no ordinary nodes in libxml2. The support for these nodes in
XML::LibXML is still limited. In particular one may not want use common node
function on doctype declaration nodes!


=item setExternalSubset

  $doc->setExternalSubset($dtd);

I<<<<<< EXPERIMENTAL! >>>>>>

This method sets a DTD node as an external subset of the given document.


=item setInternalSubset

  $doc->setInternalSubset($dtd);

I<<<<<< EXPERIMENTAL! >>>>>>

This method sets a DTD node as an internal subset of the given document.


=item removeExternalSubset

  my $dtd = $doc->removeExternalSubset();

I<<<<<< EXPERIMENTAL! >>>>>>

If a document has an external subset defined it can be removed from the
document by using this function. The removed dtd node will be returned.


=item removeInternalSubset

  my $dtd = $doc->removeInternalSubset();

I<<<<<< EXPERIMENTAL! >>>>>>

If a document has an internal subset defined it can be removed from the
document by using this function. The removed dtd node will be returned.


=item getElementsByTagName

  my @nodelist = $doc->getElementsByTagName($tagname);

Implements the DOM Level 2 function

In SCALAR context this function returns an L<<<<<< XML::LibXML::NodeList >>>>>> object.


=item getElementsByTagNameNS

  my @nodelist = $doc->getElementsByTagNameNS($nsURI,$tagname);

Implements the DOM Level 2 function

In SCALAR context this function returns an L<<<<<< XML::LibXML::NodeList >>>>>> object.


=item getElementsByLocalName

  my @nodelist = $doc->getElementsByLocalName($localname);

This allows the fetching of all nodes from a given document with the given
Localname.

In SCALAR context this function returns an L<<<<<< XML::LibXML::NodeList >>>>>> object.


=item getElementById

  my $node = $doc->getElementById($id);

Returns the element that has an ID attribute with the given value. If no such
element exists, this returns undef.

Note: the ID of an element may change while manipulating the document. For
documents with a DTD, the information about ID attributes is only available if
DTD loading/validation has been requested. For HTML documents parsed with the
HTML parser ID detection is done automatically. In XML documents, all "xml:id"
attributes are considered to be of type ID. You can test ID-ness of an
attribute node with $attr->isId().

In versions 1.59 and earlier this method was called getElementsById() (plural)
by mistake. Starting from 1.60 this name is maintained as an alias only for
backward compatibility.


=item indexElements

  $dom->indexElements();

This function causes libxml2 to stamp all elements in a document with their
document position index which considerably speeds up XPath queries for large
documents. It should only be used with static documents that won't be further
changed by any DOM methods, because once a document is indexed, XPath will
always prefer the index to other methods of determining the document order of
nodes. XPath could therefore return improperly ordered node-lists when applied
on a document that has been changed after being indexed. It is of course
possible to use this method to re-index a modified document before using it
with XPath again. This function is not a part of the DOM specification.

This function returns number of elements indexed, -1 if error occurred, or -2
if this feature is not available in the running libxml2.



=back

=head1 AUTHORS

Matt Sergeant,
Christian Glahn,
Petr Pajas


=head1 VERSION

2.0206

=head1 COPYRIGHT

2001-2007, AxKit.com Ltd.

2002-2006, Christian Glahn.

2006-2009, Petr Pajas.

=cut


=head1 LICENSE

This program is free software; you can redistribute it and/or modify it under
the same terms as Perl itself.