summaryrefslogtreecommitdiff
path: root/docs/html/parser-grddl.html
blob: e530c209eb1357c4b1ddf7b22e3c1d95a0df27f5 (plain)
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
69
70
71
72
73
74
75
76
77
78
79
80
81
82
83
84
85
86
87
88
89
90
91
92
93
94
95
96
97
98
99
100
101
102
103
<!DOCTYPE html PUBLIC "-//W3C//DTD HTML 4.01 Transitional//EN">
<html>
<head>
<meta http-equiv="Content-Type" content="text/html; charset=UTF-8">
<title>GRDDL parser (name grddl)</title>
<meta name="generator" content="DocBook XSL Stylesheets V1.76.1">
<link rel="home" href="index.html" title="Raptor RDF Syntax Library Manual">
<link rel="up" href="raptor-parsers.html" title="Parsers in Raptor (syntax to triples)">
<link rel="prev" href="raptor-parsers.html" title="Parsers in Raptor (syntax to triples)">
<link rel="next" href="parser-guess.html" title="Guess parser (name guess)">
<meta name="generator" content="GTK-Doc V1.18 (XML mode)">
<link rel="stylesheet" href="style.css" type="text/css">
</head>
<body bgcolor="white" text="black" link="#0000FF" vlink="#840084" alink="#0000FF">
<table class="navigation" id="top" width="100%" summary="Navigation header" cellpadding="2" cellspacing="2"><tr valign="middle">
<td><a accesskey="p" href="raptor-parsers.html"><img src="left.png" width="24" height="24" border="0" alt="Prev"></a></td>
<td><a accesskey="u" href="raptor-parsers.html"><img src="up.png" width="24" height="24" border="0" alt="Up"></a></td>
<td><a accesskey="h" href="index.html"><img src="home.png" width="24" height="24" border="0" alt="Home"></a></td>
<th width="100%" align="center">Raptor RDF Syntax Library Manual</th>
<td><a accesskey="n" href="parser-guess.html"><img src="right.png" width="24" height="24" border="0" alt="Next"></a></td>
</tr></table>
<div class="section">
<div class="titlepage"><div><div><h2 class="title" style="clear: both">
<a name="parser-grddl"></a>GRDDL parser (name <code class="literal">grddl</code>)</h2></div></div></div>
<p>A parser for the
<a class="ulink" href="http://www.w3.org/TR/2007/PR-grddl-20070716/" target="_top">Gleaning Resource Descriptions from Dialects of Languages (GRDDL)</a>,
W3C Proposed Recommendation of 2007-07-16 which allows reading XHTML
and XML as RDF triples by using profiles in the document that declare
XSLT transforms from the XHTML or XML content into RDF/XML or other
RDF syntax which can then be parsed.</p>
<p>The GRDDL parser is rather complex and different from the other
parsers in that it retrieves URIs, reads HTML documents (possibly
with errors), transforms the documents with XSLT and turns the result
into a single graph.  The default configuration of the GRDDL parser
also reads microformats (hcard, hcalendar) and follows &lt;link&gt;
tags that point to RDF/XML.  Parts of the GRDDL process can be
altered by configuration, which are describe below.
</p>
<p>The GRDDL parser defines 'base', 'Base' and 'url' XSLT parameters
with the value of the base URI to allow some XSLT sheets to work. These
set of parameters cannot be disabled.
</p>
<p>If the XSLT transform returns an empty string, no further
processing of the result is done, and a warning is generated.  The
xsl:output method is mapped to result document mime types as follows:
'text' to text/plain; 'xml' to application/xml and 'html' to text/html.
Any result that is of type 'application/xml' or unknown mime type
is assumed to be RDF/XML.
</p>
<p>The URIs that are processed during GRDDL operations can be checked
and skipped if required using a handler set with the
<a class="link" href="raptor2-section-parser.html#raptor-parser-set-uri-filter" title="raptor_parser_set_uri_filter ()"><code class="function">raptor_parser_set_uri_filter()</code></a>
function.  If the handler returns non-0, the URI is rejected.
This uses
<a class="link" href="raptor2-section-www.html#raptor-www-set-uri-filter" title="raptor_www_set_uri_filter ()"><code class="function">raptor_www_set_uri_filter()</code></a>
internally.
</p>
<p>If the value of option
<a class="link" href="raptor2-section-option.html#RAPTOR-OPTION-WWW-TIMEOUT:CAPS"><code class="literal">RAPTOR_OPTION_WWW_TIMEOUT</code></a>
if set to a number &gt;0, it is used as the timeout in seconds
for retrieving of URIs during GRDDL processing.
This uses
<a class="link" href="raptor2-section-www.html#raptor-www-set-connection-timeout" title="raptor_www_set_connection_timeout ()"><code class="function">raptor_www_set_connection_timeout()</code></a>
internally.
</p>
<p>The hardcoded support for hcard and hcalendar
microformats can be disabled by setting parser option
<a class="link" href="raptor2-section-option.html#RAPTOR-OPTION-MICROFORMATS:CAPS"><code class="literal">RAPTOR_OPTION_MICROFORMATS</code></a>
to 0
or using
<a class="link" href="raptor2-section-parser.html#raptor-parser-set-option" title="raptor_parser_set_option ()"><code class="function">raptor_parser_set_option()</code></a>
with option
<a class="link" href="raptor2-section-option.html#RAPTOR-OPTION-STRICT:CAPS"><code class="literal">RAPTOR_OPTION_STRICT</code></a>
and a boolean value of 1.
</p>
<p>The GRDDL parser by default will try an XML parser on the
content followed by a lax HTML parser.  This can be disabled by
setting parser option
<a class="link" href="raptor2-section-option.html#RAPTOR-OPTION-HTML-TAG-SOUP:CAPS"><code class="literal">RAPTOR_OPTION_HTML_TAG_SOUP</code></a>
to 0
or using 
<a class="link" href="raptor2-section-parser.html#raptor-parser-set-option" title="raptor_parser_set_option ()"><code class="function">raptor_parser_set_option()</code></a>
with option
<a class="link" href="raptor2-section-option.html#RAPTOR-OPTION-STRICT:CAPS"><code class="literal">RAPTOR_OPTION_STRICT</code></a>
and a boolean value of 1.
</p>
<p>The GRDDL parser by default will try to look for an HTML
&lt;link&gt; tag that points to RDF/XML.  This can be disabled by
setting parser option
<a class="link" href="raptor2-section-option.html#RAPTOR-OPTION-HTML-LINK:CAPS"><code class="literal">RAPTOR_OPTION_HTML_LINK</code></a>
to 0
or using
<a class="link" href="raptor2-section-parser.html#raptor-parser-set-option" title="raptor_parser_set_option ()"><code class="function">raptor_parser_set_option()</code></a>
with option
<a class="link" href="raptor2-section-option.html#RAPTOR-OPTION-STRICT:CAPS"><code class="literal">RAPTOR_OPTION_STRICT</code></a>
and a boolean value of 1.
</p>
</div>
<div class="footer">
<hr>
          Generated by GTK-Doc V1.18</div>
</body>
</html>