Diffstat (limited to 'doc/libidn.texi')
-rw-r--r-- doc/libidn.texi 2176
1 file changed, 2176 insertions, 0 deletions
diff --git a/doc/libidn.texi b/doc/libidn.texi
new file mode 100644
index 0000000..c7f4698
--- /dev/null
+++ b/doc/libidn.texi
@@ -0,0 +1,2176 @@
+\input texinfo @c -*- mode: texinfo; coding: us-ascii; -*-
+@c This file is part of GNU Libidn.
+@c See below for copyright and license.
+
+@setfilename libidn.info
+@documentencoding UTF-8
+@include version.texi
+@settitle GNU Libidn
+@finalout
+
+@syncodeindex pg cp
+
+@copying
+This manual is last updated @value{UPDATED} for version
+@value{VERSION} of GNU Libidn.
+
+Copyright @copyright{} 2002, 2003, 2004, 2005, 2006, 2007, 2008, 2009 Simon Josefsson.
+
+@quotation
+Permission is granted to copy, distribute and/or modify this document
+under the terms of the GNU Free Documentation License, Version 1.3 or
+any later version published by the Free Software Foundation; with no
+Invariant Sections, no Front-Cover Texts, and no Back-Cover Texts. A
+copy of the license is included in the section entitled ``GNU Free
+Documentation License''.
+@end quotation
+@end copying
+
+@dircategory Software libraries
+@direntry
+* libidn: (libidn). Internationalized string processing library.
+@end direntry
+
+@dircategory Localization
+@direntry
+* idn: (libidn)Invoking idn. Internationalized Domain Name (IDN) string conversion.
+@end direntry
+
+@dircategory Emacs
+@direntry
+* IDN Library: (libidn)Emacs API. Emacs API for IDN functions.
+@end direntry
+
+@titlepage
+@title GNU Libidn
+@subtitle Internationalized string processing for the GNU system
+@subtitle for version @value{VERSION}, @value{UPDATED}
+@author Simon Josefsson
+@page
+@vskip 0pt plus 1filll
+@insertcopying
+@end titlepage
+
+@contents
+
+@ifnottex
+@node Top
+@top GNU Libidn
+
+@insertcopying
+@end ifnottex
+
+@menu
+* Introduction:: How to use this manual.
+* Preparation:: What you should do before using the library.
+* Utility Functions:: Unicode transformation utility functions.
+* Stringprep Functions:: Stringprep functions.
+* Punycode Functions:: Punycode functions.
+* IDNA Functions:: IDNA functions.
+* TLD Functions:: TLD functions.
+* PR29 Functions:: Detect strings non-idempotent under NFKC.
+* Examples:: Demonstrate how to use the library.
+* Invoking idn:: Command line interface to the library.
+* Emacs API:: Emacs Lisp API for Libidn.
+* Java API:: Notes on the Java port of Libidn.
+* C# API:: Notes on the C# port of Libidn.
+* Acknowledgements:: Whom to blame.
+* History:: Rough outline of development history.
+
+Appendices
+
+* PR29 discussion:: Implementation aspects of the PR29 flaw.
+* On Label Separators:: Discussion of a flaw in the IDNA spec.
+* Copying Information:: License text covering the Libidn library.
+
+Indices
+
+* Function and Variable Index::
+* Concept Index::
+
+@end menu
+
+
+@node Introduction
+@chapter Introduction
+
+GNU Libidn is a fully documented implementation of the Stringprep,
+Punycode and IDNA specifications. Libidn's purpose is to encode and
+decode internationalized domain names. The native C, C# and Java
+libraries are available under the GNU Lesser General Public License
+version 2.1 or later (@pxref{GNU LGPL}).
+
+The library contains a generic Stringprep implementation. Profiles
+for Nameprep, iSCSI, SASL, XMPP and Kerberos V5 are included.
+Punycode and ASCII Compatible Encoding (ACE) via IDNA are supported.
+A mechanism to define Top-Level Domain (TLD) specific validation
+tables, and to compare strings against those tables, is included.
+Default tables for some TLDs are also included.
+
+The Stringprep API consists of two main functions, one for converting
+data from the system's native representation into UTF-8, and one
+function to perform the Stringprep processing. Adding a new
+Stringprep profile for your application within the API is
+straightforward. The Punycode API consists of one encoding function
+and one decoding function. The IDNA API consists of the ToASCII and
+ToUnicode functions, as well as a high-level interface for converting
+entire domain names to and from the ACE encoded form. The TLD API
+consists of one set of functions to extract the TLD name from a domain
+string, one set of functions to locate the proper TLD table to use
+based on the TLD name, core functions to validate a string against a
+TLD table, and some utility wrappers that perform all the steps in one
+call.
+
+The library is used by, e.g., GNU SASL and Shishi to process user
+names and passwords. Libidn can be built into GNU Libc to enable a
+new system-wide getaddrinfo flag for IDN processing.
+
+Libidn is developed for the GNU/Linux system, but runs on over 20 Unix
+platforms (including Solaris, IRIX, AIX, and Tru64) and Windows. The
+library is written in C, and parts of the API are also accessible from
+C++, Emacs Lisp, Python and Java. Native Java and C# ports are
+included.
+
+Also included are a command line tool, several self tests, code
+examples, and more, all licensed under the GNU General Public License
+version 3 or later (@pxref{GNU GPL}).
+
+@menu
+* Getting Started::
+* Features::
+* Library Overview::
+* Supported Platforms::
+* Getting help::
+* Commercial Support::
+* Downloading and Installing::
+* Bug Reports::
+* Contributing::
+@end menu
+
+@node Getting Started
+@section Getting Started
+
+This manual documents the library programming interface. All
+functions and data types provided by the library are explained.
+Also included are examples, and documentation for the command line
+tool @file{idn}, which provides a quick interface to the library. The
+Emacs Lisp bindings for the library are also discussed.
+
+The reader is assumed to possess basic familiarity with
+internationalization concepts and network programming in C or C++.
+
+This manual can be used in several ways. If read from the beginning
+to the end, it gives a good introduction to the library and how it
+can be used in an application. Forward references are included where
+necessary. Later on, the manual can be used as a reference manual to
+get just the information needed about any particular interface of the
+library. Experienced programmers might want to start looking at the
+examples at the end of the manual (@pxref{Examples}), and then only
+read up on those parts of the interface which are unclear.
+
+@node Features
+@section Features
+
+This library might have a couple of advantages over other libraries
+doing a similar job.
+
+@table @asis
+@item It's Free Software
+Anybody can use, modify, and redistribute it under the terms of the
+GNU Lesser General Public License version 2.1 or later (@pxref{GNU
+LGPL}).
+
+@item It's thread-safe
+No global state is kept in the library. All functions are reentrant.
+
+@item It's portable
+The code is intended to be written in pure ANSI C89. It has been
+tested on many Unix-like operating systems, and Windows.
+
+@item It's modularized
+The library is composed of several modules, and the only interaction
+between modules is through each module's public API. If you only need
+one piece of functionality, it is possible to take the files you need
+and incorporate them into your own project.
+
+@item It's not bloated
+The design of the library is based on the smallest API necessary to
+implement the basic functionality. It has been carefully extended
+with a small number of high-level wrappers to make it comfortable to
+use the library. However, it does not implement additional
+functionality just for the sake of completeness.
+
+@item It's documented
+Sadly, not all software comes with documentation these days. This one
+does.
+
+@end table
+
+@node Library Overview
+@section Library Overview
+
+The following illustration shows the components that make up Libidn,
+and how your application relates to the library. In the illustration,
+various components are shown as boxes. You see the generic StringPrep
+component, the various StringPrep profiles including Nameprep, the
+Punycode component, the IDNA component, and the TLD component. The
+arrows indicate aggregation, e.g., IDNA uses Punycode and Nameprep,
+and in turn Nameprep uses the generic StringPrep interface. The
+interfaces to all components are available to applications; no
+component within the library is hidden from the application.
+
+@image{libidn-components}
+
+@node Supported Platforms
+@section Supported Platforms
+
+Libidn has at some point in time been tested on the following
+platforms. Online build reports for each platform and Libidn version
+are available at @url{http://autobuild.josefsson.org/libidn/}.
+
+@enumerate
+
+@item Debian GNU/Linux 3.0 (Woody)
+@cindex Debian
+
+GCC 2.95.4 and GNU Make. This is the main development platform.
+@code{alphaev67-unknown-linux-gnu}, @code{alphaev6-unknown-linux-gnu},
+@code{arm-unknown-linux-gnu}, @code{armv4l-unknown-linux-gnu},
+@code{hppa-unknown-linux-gnu}, @code{hppa64-unknown-linux-gnu},
+@code{i686-pc-linux-gnu}, @code{ia64-unknown-linux-gnu},
+@code{m68k-unknown-linux-gnu}, @code{mips-unknown-linux-gnu},
+@code{mipsel-unknown-linux-gnu}, @code{powerpc-unknown-linux-gnu},
+@code{s390-ibm-linux-gnu}, @code{sparc-unknown-linux-gnu},
+@code{sparc64-unknown-linux-gnu}.
+
+@item Debian GNU/Linux 2.1
+@cindex Debian
+
+GCC 2.95.1 and GNU Make. @code{armv4l-unknown-linux-gnu}.
+
+@item Tru64 UNIX
+@cindex Tru64
+
+Tru64 UNIX C compiler and Tru64 Make. @code{alphaev67-dec-osf5.1},
+@code{alphaev68-dec-osf5.1}.
+
+@item SuSE Linux 7.1
+@cindex SuSE
+
+GCC 2.96 and GNU Make. @code{alphaev6-unknown-linux-gnu},
+@code{alphaev67-unknown-linux-gnu}.
+
+@item SuSE Linux 7.2a
+@cindex SuSE Linux
+
+GCC 3.0 and GNU Make. @code{ia64-unknown-linux-gnu}.
+
+@item SuSE Linux
+@cindex SuSE Linux
+
+GCC 3.2.2 and GNU Make. @code{x86_64-unknown-linux-gnu} (AMD64
+Opteron ``Melody'').
+
+@item SuSE Enterprise Server 9 on IBM OpenPower 720
+@cindex SuSE Linux
+@cindex OpenPower 720
+
+GCC 3.3.3 and GNU Make. @code{powerpc64-unknown-linux-gnu}.
+
+@item RedHat Linux 7.2
+@cindex RedHat
+
+GCC 2.96 and GNU Make. @code{alphaev6-unknown-linux-gnu},
+@code{alphaev67-unknown-linux-gnu}, @code{ia64-unknown-linux-gnu}.
+
+@item RedHat Linux 8.0
+@cindex RedHat
+
+GCC 3.2 and GNU Make. @code{i686-pc-linux-gnu}.
+
+@item RedHat Advanced Server 2.1
+@cindex RedHat Advanced Server
+
+GCC 2.96 and GNU Make. @code{i686-pc-linux-gnu}.
+
+@item Slackware Linux 8.0.01
+@cindex Slackware
+
+GCC 2.95.3 and GNU Make. @code{i686-pc-linux-gnu}.
+
+@item Mandrake Linux 9.0
+@cindex Mandrake
+
+GCC 3.2 and GNU Make. @code{i686-pc-linux-gnu}.
+
+@item IRIX 6.5
+@cindex IRIX
+
+MIPS C compiler, IRIX Make. @code{mips-sgi-irix6.5}.
+
+@item AIX 4.3.2
+@cindex AIX
+
+IBM C for AIX compiler, AIX Make. @code{rs6000-ibm-aix4.3.2.0}.
+
+@item Microsoft Windows 2000 (Cygwin)
+@cindex Windows
+
+GCC 3.2, GNU Make. @code{i686-pc-cygwin}.
+
+@item HP-UX 11
+@cindex HP-UX
+
+HP-UX C compiler and HP Make. @code{ia64-hp-hpux11.22},
+@code{hppa2.0w-hp-hpux11.11}.
+
+@item SUN Solaris 2.7
+@cindex Solaris
+
+GCC 3.0.4 and GNU Make. @code{sparc-sun-solaris2.7}.
+
+@item SUN Solaris 2.8
+@cindex Solaris
+
+Sun WorkShop Compiler C 6.0 and SUN Make. @code{sparc-sun-solaris2.8}.
+
+@item SUN Solaris 2.9
+@cindex Solaris
+
+Sun Forte Developer 7 C compiler and GNU
+Make. @code{sparc-sun-solaris2.9}.
+
+@item NetBSD 1.6
+@cindex NetBSD
+
+GCC 2.95.3 and GNU Make. @code{alpha-unknown-netbsd1.6},
+@code{i386-unknown-netbsdelf1.6}.
+
+@item OpenBSD 3.1 and 3.2
+@cindex OpenBSD
+
+GCC 2.95.3 and GNU Make. @code{alpha-unknown-openbsd3.1},
+@code{i386-unknown-openbsd3.1}.
+
+@item FreeBSD 4.7 and 4.8
+@cindex FreeBSD
+
+GCC 2.95.4 and GNU Make. @code{alpha-unknown-freebsd4.7},
+@code{alpha-unknown-freebsd4.8}, @code{i386-unknown-freebsd4.7},
+@code{i386-unknown-freebsd4.8}.
+
+@item MacOS X 10.2 Server Edition
+@cindex MacOS X
+
+GCC 3.1 and GNU Make. @code{powerpc-apple-darwin6.5}.
+
+@item MacOS X 10.4 ``Tiger'' with Xcode 2.0
+@cindex MacOS X
+
+GCC 4.0 and GNU Make. @code{powerpc-apple-darwin8.0}.
+
+@item Cross compiled to uClinux/uClibc on Motorola Coldfire
+@cindex Motorola Coldfire
+@cindex uClinux
+@cindex uClibc
+
+GCC 3.4 and GNU Make. @code{m68k-uclinux-elf}.
+
+@item Cross compiled to ARM using Glibc
+@cindex ARM
+
+GCC 2.95 and GNU Make. @code{arm-linux}.
+
+@item Cross compiled to Mingw32
+@cindex Windows
+@cindex Microsoft
+@cindex mingw32
+
+GCC 3.4.4 and GNU Make. @code{i586-mingw32msvc}.
+
+@end enumerate
+
+If you use Libidn on, or port Libidn to, a new platform, please report
+it to the author.
+
+@node Getting help
+@section Getting help
+
+A mailing list where users of Libidn may help each other exists, and
+you can reach it by sending e-mail to @email{help-libidn@@gnu.org}.
+Archives of the mailing list discussions, and an interface to manage
+subscriptions, are available through the World Wide Web at
+@url{http://lists.gnu.org/mailman/listinfo/help-libidn}.
+
+@node Commercial Support
+@section Commercial Support
+
+Commercial support is available for users of GNU Libidn. The kind of
+support that can be purchased may include:
+
+@itemize
+
+@item Implement new features.
+Such as country code specific profiling to support a restricted subset
+of Unicode.
+
+@item Port Libidn to new platforms.
+This could include porting Libidn to an embedded platform that may
+need memory or size optimization.
+
+@item Integrating IDN support in your existing project.
+
+@item System design of components related to IDN.
+
+@end itemize
+
+If you are interested, please write to:
+
+@verbatim
+Simon Josefsson Datakonsult
+Hagagatan 24
+113 47 Stockholm
+Sweden
+
+E-mail: simon@josefsson.org
+@end verbatim
+
+If your company provides support related to GNU Libidn and would like
+to be mentioned here, contact the author (@pxref{Bug Reports}).
+
+@node Downloading and Installing
+@section Downloading and Installing
+@cindex Installation
+@cindex Download
+
+The package can be downloaded from several places, including:
+
+@url{ftp://alpha.gnu.org/pub/gnu/libidn/}
+
+The latest version is stored in a file, e.g.,
+@samp{libidn-@value{VERSION}.tar.gz} where the @samp{@value{VERSION}}
+value is the highest version number in the directory.
+
+The package is then extracted, configured and built like many other
+packages that use Autoconf. For detailed information on configuring
+and building it, refer to the @file{INSTALL} file that is part of the
+distribution archive.
+
+Here is an example terminal session that downloads, configures, builds
+and installs the package. You will need a few basic tools, such as
+@samp{sh}, @samp{make} and @samp{cc}.
+
+@example
+$ wget -q ftp://alpha.gnu.org/pub/gnu/libidn/libidn-@value{VERSION}.tar.gz
+$ tar xfz libidn-@value{VERSION}.tar.gz
+$ cd libidn-@value{VERSION}/
+$ ./configure
+...
+$ make
+...
+$ make install
+...
+@end example
+
+After that Libidn should be properly installed and ready for use.
+
+A few @code{configure} options may be relevant; they are summarized in
+the table below.
+
+@table @code
+
+@item --enable-java
+Build the Java port into a @code{*.jar} file. @xref{Java API}, for more
+information.
+
+@item --disable-tld
+Disable the TLD module. This would typically only be useful if you
+are building on a memory-restricted platform. @xref{TLD Functions},
+for more information.
+
+@item --enable-csharp[=IMPL]
+Build the @code{C#} port into a @code{*.DLL} file. @xref{C# API}, for
+more information. Here, @code{IMPL} is @code{pnet} or @code{mono},
+indicating whether the PNET @command{cscc} compiler or the Mono
+@command{mcs} compiler should be used, respectively.
+
+@end table
+
+For the complete list, refer to the output from @code{configure
+--help}.
+
+@menu
+* Installing under Windows:: Windows specific build instructions.
+@end menu
+
+@node Installing under Windows
+@subsection Installing under Windows
+
+There are two ways to build Libidn on Windows: via MinGW or via Visual
+Studio.
+
+With MinGW, you can build a Libidn DLL and use it from other
+applications. After installing MinGW (@url{http://mingw.org/}), follow
+the generic installation instructions (@pxref{Downloading and
+Installing}). The DLL is installed by default.
+
+For information on how to use the DLL in other applications, see:
+@url{http://www.mingw.org/mingwfaq.shtml#faq-msvcdll}.
+
+You can build Libidn as a native Visual Studio C++ project. This
+allows you to build the code for other platforms that VS supports,
+such as Windows Mobile. You need Visual Studio 2005 or later.
+
+First download and unpack the archive as described in the generic
+installation instructions (@pxref{Downloading and Installing}). Don't
+run @code{./configure}. Instead, start Visual Studio and open the
+project file @file{win32/libidn.sln} inside the Libidn directory. You
+should be able to build the project using Build Project.
+
+Output libraries will be written into the @code{win32/lib} (or
+@code{win32/lib/debug} for Debug versions) folder.
+
+When working with Windows you may want to look into the special memory
+handling functions that may be needed (@pxref{Memory handling under
+Windows}).
+
+@node Bug Reports
+@section Bug Reports
+@cindex Reporting Bugs
+
+If you think you have found a bug in Libidn, please investigate it and
+report it.
+
+@itemize @bullet
+
+@item Please make sure that the bug is really in Libidn, and
+preferably also check that it hasn't already been fixed in the latest
+version.
+
+@item You have to send us a test case that makes it possible for us to
+reproduce the bug.
+
+@item You also have to explain what is wrong: whether you get a crash,
+or whether the results printed are not good and, in that case, in what
+way. Make sure that the bug report includes all information you would
+need to fix this kind of bug for someone else.
+
+@end itemize
+
+Please make an effort to produce a self-contained report, with
+something definite that can be tested or debugged. Vague queries or
+piecemeal messages are difficult to act on and don't help the
+development effort.
+
+If your bug report is good, we will do our best to help you to get a
+corrected version of the software; if the bug report is poor, we won't
+do anything about it (apart from asking you to send better bug
+reports).
+
+If you think something in this manual is unclear, or downright
+incorrect, or if the language needs to be improved, please also send a
+note.
+
+Send your bug report to:
+
+@center @samp{bug-libidn@@gnu.org}
+
+
+@node Contributing
+@section Contributing
+@cindex Contributing
+@cindex Hacking
+
+If you want to submit a patch for inclusion, whether it fixes a typo
+you discovered or adds support for a new feature, you should submit it
+as a bug report (@pxref{Bug Reports}). There are some things that you
+can do to increase the chances for it to be included in the official
+package.
+
+Unless your patch is very small (say, under 10 lines) we require that
+you assign the copyright of your work to the Free Software Foundation.
+This is to protect the freedom of the project. If you have not
+already signed papers, we will send you the necessary information when
+you submit your contribution.
+
+For contributions that don't consist of actual programming code, the
+only guidelines are common sense. Use it.
+
+For code contributions, a number of style guides will help you:
+
+@itemize @bullet
+
+@item Coding Style.
+Follow the GNU Standards document (@pxref{top, GNU Coding Standards,,
+standards}).
+
+If you normally code using another coding standard, there is no
+problem, but you should use @samp{indent} to reformat the code
+(@pxref{top, GNU Indent,, indent}) before submitting your work.
+
+@item Use the unified diff format @samp{diff -u}.
+
+@item Return errors.
+No reason whatsoever should abort the execution of the library. Even
+memory allocation errors, e.g., when @code{malloc} returns NULL, should
+be handled gracefully and result in an error code.
+
+@item Design with thread safety in mind.
+Don't use global variables and the like.
+
+@item Avoid using the C math library.
+It causes problems for embedded implementations, and in most
+situations it is very easy to avoid using it.
+
+@item Document your functions.
+Put a comment before each function header; if properly formatted,
+these comments are extracted into GTK-DOC web pages. Don't forget to
+update the Texinfo manual as well.
+
+@item Supply ChangeLog and NEWS entries, where appropriate.
+
+@end itemize
+
+@c **********************************************************
+@c ******************* Preparation ************************
+@c **********************************************************
+@node Preparation
+@chapter Preparation
+
+To use `Libidn', you have to make some changes to your sources and
+the build system. The necessary changes are small and explained in
+the following sections. This chapter also describes how the library
+is initialized, and how to verify that the installed library meets
+the requirements of your application.
+
+A faster way to find out how to adapt your application for use with
+`Libidn' may be to look at the examples at the end of this manual
+(@pxref{Examples}).
+
+@menu
+* Header::
+* Initialization::
+* Version Check::
+* Building the source::
+* Autoconf tests::
+* Memory handling under Windows::
+@end menu
+
+@node Header
+@section Header
+
+The library contains a few independent parts, and each part exports
+its interfaces (data types and functions) in a header file. You must
+include the appropriate header files in all programs using the
+library, either directly or through some other header file, like this:
+
+@example
+#include <stringprep.h>
+@end example
+
+The header files and the functions they define are categorized as
+follows:
+
+@table @asis
+@item stringprep.h
+
+The low-level stringprep API entry point. For IDN applications, this
+is usually invoked via IDNA. Some applications, specifically non-IDN
+ones, may want to prepare strings directly though, and should include
+this header file.
+
+The name space of the stringprep part of Libidn is @code{stringprep*}
+for function names, @code{Stringprep*} for data types and
+@code{STRINGPREP_*} for other symbols. In addition,
+@code{_stringprep*} is reserved for internal use and should never be
+used by applications.
+
+@item punycode.h
+
+The entry point to Punycode encoding and decoding functions. Normally
+Punycode is used via the @file{idna.h} interface, but some applications
+may want to perform raw Punycode operations.
+
+The name space of the punycode part of Libidn is @code{punycode_*} for
+function names, @code{Punycode*} for data types and @code{PUNYCODE_*}
+for other symbols. In addition, @code{_punycode*} is reserved for
+internal use and should never be used by applications.
+
+@item idna.h
+
+The entry point to the IDNA functions. This is the normal entry point
+for applications that need IDN functionality.
+
+The name space of the IDNA part of Libidn is @code{idna_*} for
+function names, @code{Idna*} for data types and @code{IDNA_*} for
+other symbols. In addition, @code{_idna*} is reserved for internal
+use and should never be used by applications.
+
+@item tld.h
+
+The entry point to the TLD functions. Normal applications are not
+expected to need this functionality, but it is present for
+applications that are used by TLDs to validate customer input.
+
+The name space of the TLD part of Libidn is @code{tld_*} for function
+names, @code{Tld_*} for data types and @code{TLD_*} for other symbols.
+In addition, @code{_tld*} is reserved for internal use and should
+never be used by applications.
+
+@item pr29.h
+
+The entry point to the PR29 functions. These functions are used to
+detect ``problem sequences'' (@pxref{PR29 Functions}), mostly for use
+in security critical applications.
+
+The name space of the PR29 part of Libidn is @code{pr29_*} for
+function names, @code{Pr29_*} for data types and @code{PR29_*} for
+other symbols. In addition, @code{_pr29*} is reserved for internal
+use and should never be used by applications.
+
+@item idn-free.h
+
+The entry point to the Windows memory de-allocation function
+(@pxref{Memory handling under Windows}). It contains only one
+function @code{idn_free}.
+
+@end table
+
+All header files define and use the symbol @code{IDNAPI} to decorate
+the API functions.
+
+@node Initialization
+@section Initialization
+
+Libidn is stateless and does not need any initialization.
+
+@node Version Check
+@section Version Check
+
+It is often desirable to check that the version of `Libidn' used is
+indeed one which fits all requirements. Even with binary
+compatibility, new features may have been introduced, but due to
+problems with the dynamic linker an old version may actually be used.
+So you may want to check that the version is okay right after program
+startup.
+
+@include texi/stringprep_check_version.texi
+
+The normal way to use the function is to put something similar to the
+following first in your @code{main}:
+
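+@example
+#include <stdio.h>       /* printf */
+#include <stdlib.h>      /* exit, EXIT_FAILURE */
+#include <stringprep.h>  /* stringprep_check_version, STRINGPREP_VERSION */
+
+  /* First thing in main (), before any other Libidn call: */
+  if (!stringprep_check_version (STRINGPREP_VERSION))
+    @{
+      printf ("stringprep_check_version() failed:\n"
+              "Header file incompatible with shared library.\n");
+      exit (EXIT_FAILURE);
+    @}
+@end example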
+
+@node Building the source
+@section Building the source
+@cindex Compiling your application
+
+If you want to compile a source file including, e.g., the `idna.h'
+header file, you must make sure that the compiler can find it in the
+directory hierarchy. This is accomplished by adding the path to the
+directory in which the header file is located to the compiler's
+include file search path (via the @option{-I} option).
+
+However, the path to the include file is determined at the time the
+source is configured. To solve this problem, `Libidn' uses the
+external package @command{pkg-config} that knows the path to the
+include file and other configuration options. The options that need
+to be added to the compiler invocation at compile time are output by
+the @option{--cflags} option to @command{pkg-config libidn}. The
+following example shows how it can be used at the command line:
+
+@example
+gcc -c foo.c `pkg-config libidn --cflags`
+@end example
+
+Adding the output of @samp{pkg-config libidn --cflags} to the
+compiler's command line will ensure that the compiler can find, e.g.,
+the @file{idna.h} header file.
+
+A similar problem occurs when linking the program with the library.
+Again, the compiler has to find the library files. For this to work,
+the path to the library files has to be added to the library search
+path (via the @option{-L} option). For this, the option
+@option{--libs} to @command{pkg-config libidn} can be used. For
+convenience, this option also outputs all other options that are
+required to link the program with the `libidn' library. The example
+shows how to link @file{foo.o} with the `libidn' library to a program
+@command{foo}.
+
+@example
+gcc -o foo foo.o `pkg-config libidn --libs`
+@end example
+
+Of course you can also combine both examples into a single command by
+specifying both options to @command{pkg-config}:
+
+@example
+gcc -o foo foo.c `pkg-config libidn --cflags --libs`
+@end example
+
+@node Autoconf tests
+@section Autoconf tests
+@cindex Autoconf tests
+@cindex Configure tests
+
+If your project uses Autoconf (@pxref{top, GNU Autoconf,, autoconf})
+to check for installed libraries, you might find the following snippet
+illustrative. It adds a new @file{configure} parameter
+@code{--with-libidn}, checks for @file{idna.h} and @samp{-lidn}
+(possibly below the directory specified as the optional argument to
+@code{--with-libidn}), and defines the @acronym{CPP} symbol
+@code{LIBIDN} if the library is found. The default behaviour is to
+search for the library and enable the functionality (that is, define
+the symbol) when the library is found, but if you wish to make the
+default behaviour of your package be that Libidn is not used (even if
+it is installed on the system), change @samp{libidn=yes} to
+@samp{libidn=no} on the third line.
+
+@example
+AC_ARG_WITH(libidn, AC_HELP_STRING([--with-libidn=[DIR]],
+ [Support IDN (needs GNU Libidn)]),
+ libidn=$withval, libidn=yes)
+if test "$libidn" != "no"; then
+ if test "$libidn" != "yes"; then
+ LDFLAGS="$@{LDFLAGS@} -L$libidn/lib"
+ CPPFLAGS="$@{CPPFLAGS@} -I$libidn/include"
+ fi
+ AC_CHECK_HEADER(idna.h,
+ AC_CHECK_LIB(idn, stringprep_check_version,
+ [libidn=yes LIBS="$@{LIBS@} -lidn"], libidn=no),
+ libidn=no)
+fi
+if test "$libidn" != "no" ; then
+ AC_DEFINE(LIBIDN, 1, [Define to 1 if you want IDN support.])
+else
+ AC_MSG_WARN([Libidn not found])
+fi
+AC_MSG_CHECKING([if Libidn should be used])
+AC_MSG_RESULT($libidn)
+@end example
+
+If you require that your users have installed @code{pkg-config} (which
+I cannot generally recommend), the above can be done more easily as
+follows.
+
+@example
+AC_ARG_WITH(libidn, AC_HELP_STRING([--with-libidn=[DIR]],
+ [Support IDN (needs GNU Libidn)]),
+ libidn=$withval, libidn=yes)
+if test "$libidn" != "no" ; then
+ PKG_CHECK_MODULES(LIBIDN, libidn >= 0.0.0, [libidn=yes], [libidn=no])
+ if test "$libidn" != "yes" ; then
+ libidn=no
+ AC_MSG_WARN([Libidn not found])
+ else
+ libidn=yes
+ AC_DEFINE(LIBIDN, 1, [Define to 1 if you want Libidn.])
+ fi
+fi
+AC_MSG_CHECKING([if Libidn should be used])
+AC_MSG_RESULT($libidn)
+@end example
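+
+Once one of the snippets above has defined @code{LIBIDN}, C code can
+guard its use of Libidn with the preprocessor. The sketch below shows
+one possible pattern; it assumes the package uses a @file{config.h}
+header (signalled by @code{HAVE_CONFIG_H}), and the helper name
+@code{to_ace} is hypothetical. Falling back to an unchanged copy of
+the name when Libidn is absent is only one possible policy.
+
+@example
+#ifdef HAVE_CONFIG_H
+# include <config.h>
+#endif
+#include <stdlib.h>
+#include <string.h>
+#ifdef LIBIDN
+# include <idna.h>
+#endif
+
+/* Hypothetical helper: return a newly allocated ACE form of NAME
+   (release with free), or NULL on error.  Without Libidn support,
+   return a plain copy of NAME instead.  */
+char *
+to_ace (const char *name)
+@{
+#ifdef LIBIDN
+  char *out = NULL;
+
+  if (idna_to_ascii_8z (name, &out, 0) != IDNA_SUCCESS)
+    return NULL;
+  return out;
+#else
+  char *copy = malloc (strlen (name) + 1);
+
+  if (copy != NULL)
+    strcpy (copy, name);
+  return copy;
+#endif
+@}
+@end example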
+
+@node Memory handling under Windows
+@section Memory handling under Windows
+@cindex free
+@cindex Memory handling
+@cindex de-allocation
+@cindex heap memory
+
+Several functions in the library allocate memory. The memory is
+expected to be de-allocated using the @code{free} function. Under
+Windows, it is sometimes necessary to de-allocate memory in the same
+module that allocated a memory region. The reason is that different
+modules use separate heap memory regions. To solve this problem, we
+provide a function to de-allocate memory inside the library.
+
+Note that we do not recommend using this interface generally if you do
+not care about Windows portability.
+
+@subsection Header file @code{idn-free.h}
+
+To use the function explained in this section, you need to include the
+file @file{idn-free.h} using:
+
+@example
+#include <idn-free.h>
+@end example
+
+@subsection Memory de-allocation function
+
+@include texi/idn_free.texi
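+
+As an illustration, the sketch below converts a domain name with
+@code{idna_to_ascii_8z} (@pxref{IDNA Functions}) and releases the
+result with @code{idn_free} rather than @code{free}, so that the
+memory is returned to the heap of the module that allocated it. The
+helper name @code{print_ace} is hypothetical; any Libidn function
+that returns newly allocated memory would be handled the same way.
+
+@example
+#include <stdio.h>
+#include <idna.h>
+#include <idn-free.h>
+
+/* Hypothetical helper: print the ACE form of NAME.
+   Returns 0 on success and -1 if the conversion failed.  */
+int
+print_ace (const char *name)
+@{
+  char *ace = NULL;
+
+  if (idna_to_ascii_8z (name, &ace, 0) != IDNA_SUCCESS)
+    return -1;
+  printf ("%s\n", ace);
+  /* De-allocate inside the library's module instead of calling free ().  */
+  idn_free (ace);
+  return 0;
+@}
+@end example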
+
+@c **********************************************************
+@c ******************** Utility Functions ******************
+@c **********************************************************
+@node Utility Functions
+@chapter Utility Functions
+@cindex Utility Functions
+
+The rest of this library makes extensive use of Unicode characters.
+In order to interface this library with the outside world, your
+application may need to make various Unicode transformations.
+
+@section Header file @code{stringprep.h}
+
+To use the functions explained in this chapter, you need to include
+the file @file{stringprep.h} using:
+
+@example
+#include <stringprep.h>
+@end example
+
+@section Unicode Encoding Transformation
+
+@include texi/stringprep_unichar_to_utf8.texi
+@include texi/stringprep_utf8_to_unichar.texi
+@include texi/stringprep_ucs4_to_utf8.texi
+@include texi/stringprep_utf8_to_ucs4.texi
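+
+The following sketch shows how two of these functions fit together.
+The helper @code{utf8_roundtrip} is hypothetical, not part of Libidn:
+it decodes a NUL-terminated UTF-8 string into UCS-4 with
+@code{stringprep_utf8_to_ucs4}, reports the number of code points, and
+encodes the result back with @code{stringprep_ucs4_to_utf8}. A length
+of -1 tells both functions that the input is zero-terminated, and
+every returned buffer is released with @code{free}.
+
+@example
+#include <stdint.h>
+#include <stdlib.h>
+#include <stringprep.h>
+
+/* Hypothetical helper: decode UTF8 to UCS-4, store the number of
+   code points in *COUNT, then re-encode to UTF-8.  Returns a newly
+   allocated string (release with free) or NULL on error.  */
+char *
+utf8_roundtrip (const char *utf8, size_t *count)
+@{
+  uint32_t *ucs4 = stringprep_utf8_to_ucs4 (utf8, -1, count);
+  char *back;
+
+  if (ucs4 == NULL)
+    return NULL;
+  back = stringprep_ucs4_to_utf8 (ucs4, -1, NULL, NULL);
+  free (ucs4);
+  return back;
+@}
+@end example
+
+For example, the UTF-8 string @code{"r\xc3\xa4ksm\xc3\xb6rg\xc3\xa5s"}
+yields a count of 10 code points and an identical string back.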
+
+@section Unicode Normalization
+
+@include texi/stringprep_ucs4_nfkc_normalize.texi
+@include texi/stringprep_utf8_nfkc_normalize.texi
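+
+Normalization is useful when two strings should compare equal even if
+they use different but compatible code points. A minimal sketch,
+assuming NUL-terminated UTF-8 input and using a hypothetical helper
+name:
+
+@example
+#include <stdlib.h>
+#include <string.h>
+#include <stringprep.h>
+
+/* Hypothetical helper: return 1 if the UTF-8 strings A and B are
+   identical after NFKC normalization, 0 if they differ, and -1 if
+   either string could not be normalized.  */
+int
+nfkc_equal (const char *a, const char *b)
+@{
+  char *na = stringprep_utf8_nfkc_normalize (a, -1);
+  char *nb = stringprep_utf8_nfkc_normalize (b, -1);
+  int result = -1;
+
+  if (na != NULL && nb != NULL)
+    result = strcmp (na, nb) == 0;
+  free (na);
+  free (nb);
+  return result;
+@}
+@end example
+
+For instance, @code{nfkc_equal ("\xef\xac\x81le", "file")} returns 1,
+because NFKC maps the ligature U+FB01 (encoded as the three bytes
+shown) to the two letters @samp{fi}.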
+
+@section Character Set Conversion
+
+@include texi/stringprep_locale_charset.texi
+@include texi/stringprep_convert.texi
+@include texi/stringprep_locale_to_utf8.texi
+@include texi/stringprep_utf8_to_locale.texi
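+
+A typical use is converting text that arrived in the user's locale,
+such as a command line argument, into UTF-8 before further processing.
+The sketch below assumes the program has already called
+@code{setlocale (LC_ALL, "")}; otherwise the charset of the default
+@samp{C} locale is typically what gets reported. The helper name is
+hypothetical.
+
+@example
+#include <stdio.h>
+#include <stdlib.h>
+#include <stringprep.h>
+
+/* Hypothetical helper: convert STR from the locale's character set
+   to UTF-8 and print it.  Returns 0 on success, -1 on failure.  */
+int
+print_as_utf8 (const char *str)
+@{
+  char *utf8 = stringprep_locale_to_utf8 (str);
+
+  if (utf8 == NULL)
+    @{
+      fprintf (stderr, "cannot convert from %s to UTF-8\n",
+               stringprep_locale_charset ());
+      return -1;
+    @}
+  printf ("%s\n", utf8);
+  free (utf8);
+  return 0;
+@}
+@end example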
+
+
+@c **********************************************************
+@c ****************** Stringprep Functions *****************
+@c **********************************************************
+@node Stringprep Functions
+@chapter Stringprep Functions
+@cindex Stringprep Functions
+
+Stringprep describes a framework for preparing Unicode text strings in
+order to increase the likelihood that string input and string
+comparison work in ways that make sense for typical users throughout
+the world. The stringprep protocol is useful for protocol identifier
+values, company and personal names, internationalized domain names,
+and other text strings.
+
+@section Header file @code{stringprep.h}
+
+To use the functions explained in this chapter, you need to include
+the file @file{stringprep.h} using:
+
+@example
+#include <stringprep.h>
+@end example
+
+@section Defining A Stringprep Profile
+
+Further types and structures are defined for applications that want to
+specify their own stringprep profile. As these are fairly obscure,
+and by necessity tied to the implementation, we do not document them
+here. Look into the @file{stringprep.h} header file, and the
+@file{profiles.c} source code for the details.
+
+@section Control Flags
+
+@deftypevr {Stringprep flags} {Stringprep_profile_flags} {STRINGPREP_NO_NFKC}
Disable NFKC normalization, and select the non-NFKC case folding
tables. The profile usually specifies the BIDI and NFKC settings, and
applications should not override them except in special situations.
+@end deftypevr
+
+@deftypevr {Stringprep flags} {Stringprep_profile_flags} {STRINGPREP_NO_BIDI}
Disable the BIDI step. The profile usually specifies the BIDI and NFKC
settings, and applications should not override them except in special
situations.
+@end deftypevr
+
+@deftypevr {Stringprep flags} {Stringprep_profile_flags} {STRINGPREP_NO_UNASSIGNED}
Make the library return an error if the string contains characters
that are unassigned according to the profile.
+@end deftypevr
+
+@section Core Functions
+
+@include texi/stringprep_4i.texi
+@include texi/stringprep_4zi.texi
+@include texi/stringprep.texi
+@include texi/stringprep_profile.texi
+
+@section Error Handling
+
+@include texi/stringprep_strerror.texi
+
+@section Stringprep Profile Macros
+
+@deftypefun {int} stringprep_nameprep_no_unassigned (char * @var{in}, int @var{maxlen})
+
@var{in}: input/output array with string to prepare.
+
+@var{maxlen}: maximum length of input/output array.
+
Prepare the input UTF-8 string according to the nameprep profile. The
AllowUnassigned flag is false; use @code{stringprep_nameprep} if
AllowUnassigned should be true. Returns 0 iff successful, or an error code.
+@end deftypefun
+
+@deftypefun {int} stringprep_iscsi (char * @var{in}, int @var{maxlen})
+
@var{in}: input/output array with string to prepare.
+
+@var{maxlen}: maximum length of input/output array.
+
+Prepare the input UTF-8 string according to the draft iSCSI stringprep
+profile. Returns 0 iff successful, or an error code.
+@end deftypefun
+
+@deftypefun {int} stringprep_plain (char * @var{in}, int @var{maxlen})
+
@var{in}: input/output array with string to prepare.
+
+@var{maxlen}: maximum length of input/output array.
+
+Prepare the input UTF-8 string according to the draft SASL ANONYMOUS
+profile. Returns 0 iff successful, or an error code.
+@end deftypefun
+
+@deftypefun {int} stringprep_xmpp_nodeprep (char * @var{in}, int @var{maxlen})
+
@var{in}: input/output array with string to prepare.
+
+@var{maxlen}: maximum length of input/output array.
+
+Prepare the input UTF-8 string according to the draft XMPP node
+identifier profile. Returns 0 iff successful, or an error code.
+@end deftypefun
+
+@deftypefun {int} stringprep_xmpp_resourceprep (char * @var{in}, int @var{maxlen})
+
@var{in}: input/output array with string to prepare.
+
+@var{maxlen}: maximum length of input/output array.
+
+Prepare the input UTF-8 string according to the draft XMPP resource
+identifier profile. Returns 0 iff successful, or an error code.
+@end deftypefun
+
+@c **********************************************************
+@c ******************* Punycode Functions ******************
+@c **********************************************************
+@node Punycode Functions
+@chapter Punycode Functions
+@cindex Punycode Functions
+
+Punycode is a simple and efficient transfer encoding syntax designed
+for use with Internationalized Domain Names in Applications. It
+uniquely and reversibly transforms a Unicode string into an ASCII
+string. ASCII characters in the Unicode string are represented
+literally, and non-ASCII characters are represented by ASCII
+characters that are allowed in host name labels (letters, digits, and
+hyphens). A general algorithm called Bootstring allows a string of
+basic code points to uniquely represent any string of code points
+drawn from a larger set. Punycode is an instance of Bootstring that
+uses particular parameter values, appropriate for IDNA.
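
To make the Bootstring idea concrete, the following minimal stand-alone encoder is written from the pseudocode in RFC 3492. It uses that specification's Punycode parameter values (base 36, tmin 1, tmax 26, skew 38, damp 700, initial bias 72, initial n 128). It is an illustration, not Libidn's @code{punycode_encode}. The name @code{sketch_punycode_encode} is hypothetical, case flags are not supported, and the overflow checks of the specification are omitted, so the sketch should only be given short inputs.

@verbatim
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Punycode parameter values from RFC 3492, section 5. */
enum { BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38, DAMP = 700,
       INITIAL_BIAS = 72, INITIAL_N = 0x80 };

/* Map a digit value 0-25 to 'a'-'z' and 26-35 to '0'-'9'. */
static char encode_digit (uint32_t d)
{
  return (char) (d < 26 ? 'a' + d : '0' + (d - 26));
}

/* Bias adaptation function, RFC 3492 section 6.1. */
static uint32_t adapt (uint32_t delta, uint32_t numpoints, int firsttime)
{
  uint32_t k = 0;

  delta = firsttime ? delta / DAMP : delta / 2;
  delta += delta / numpoints;
  while (delta > ((BASE - TMIN) * TMAX) / 2)
    {
      delta /= BASE - TMIN;
      k += BASE;
    }
  return k + (BASE - TMIN + 1) * delta / (delta + SKEW);
}

/* Hypothetical helper, not Libidn's punycode_encode: encode LEN code
   points from IN as a NUL-terminated string in OUT (OUTSIZE bytes).
   Returns 0 on success or -1 if OUT is too small.  Case flags and the
   RFC 3492 overflow checks are omitted for brevity. */
int sketch_punycode_encode (const uint32_t *in, size_t len,
                            char *out, size_t outsize)
{
  uint32_t n = INITIAL_N, delta = 0, bias = INITIAL_BIAS;
  size_t h, b, j, o = 0;

  if (outsize == 0)
    return -1;
  /* Copy the basic (ASCII) code points literally, in order. */
  for (j = 0; j < len; j++)
    if (in[j] < 0x80)
      {
        if (o + 1 >= outsize)
          return -1;
        out[o++] = (char) in[j];
      }
  h = b = o;
  if (b > 0)
    {
      if (o + 1 >= outsize)
        return -1;
      out[o++] = '-';                  /* delimiter after basic part */
    }

  while (h < len)
    {
      /* Next code point to insert: the smallest one that is >= n. */
      uint32_t m = UINT32_MAX;
      for (j = 0; j < len; j++)
        if (in[j] >= n && in[j] < m)
          m = in[j];
      delta += (m - n) * (uint32_t) (h + 1);
      n = m;

      for (j = 0; j < len; j++)
        {
          if (in[j] < n)
            delta++;
          else if (in[j] == n)
            {
              /* Emit delta as a generalized variable-length integer. */
              uint32_t q = delta, k;
              for (k = BASE;; k += BASE)
                {
                  uint32_t t = k <= bias ? TMIN
                               : k >= bias + TMAX ? TMAX : k - bias;
                  if (q < t)
                    break;
                  if (o + 1 >= outsize)
                    return -1;
                  out[o++] = encode_digit (t + (q - t) % (BASE - t));
                  q = (q - t) / (BASE - t);
                }
              if (o + 1 >= outsize)
                return -1;
              out[o++] = encode_digit (q);
              bias = adapt (delta, (uint32_t) (h + 1), h == b);
              delta = 0;
              h++;
            }
        }
      delta++;
      n++;
    }
  out[o] = '\0';
  return 0;
}
@end verbatim

Encoding the code points of @samp{b@"ucher} with this sketch gives @samp{bcher-kva}. Encoding @samp{r@"aksm@"org@aa{}s} gives @samp{rksmrgs-5wao1o}, the label that follows @samp{xn--} in the @command{idn} examples in this manual.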
+
+@section Header file @code{punycode.h}
+
+To use the functions explained in this chapter, you need to include
+the file @file{punycode.h} using:
+
+@example
+#include <punycode.h>
+@end example
+
+@section Unicode Code Point Data Type
+
The punycode functions use a special type to denote Unicode code
points. It is guaranteed to always be a 32-bit unsigned integer.
+
+@deftypevr {Punycode Unicode code point} uint32_t punycode_uint
An unsigned integer that holds Unicode code points.
+@end deftypevr
+
+@section Core Functions
+
Note that the current implementation will fail if
@code{input_length} exceeds 4294967295 (the largest value that fits in
@code{punycode_uint}). This restriction may be removed in the future.
Meanwhile, applications are encouraged not to depend on this
limitation, and to use @code{sizeof} to initialize @code{input_length}
and @code{output_length}.
+
+The functions provided are the following two entry points:
+
+@include texi/punycode_encode.texi
+@include texi/punycode_decode.texi
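
Decoding reverses the process. The following hypothetical @code{sketch_punycode_decode} is written from the RFC 3492 decoding pseudocode for illustration. It is not Libidn's @code{punycode_decode}, it does not report case flags, and it omits the specification's overflow checks, so it is only suitable for short inputs.

@verbatim
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Punycode parameter values from RFC 3492, section 5. */
enum { BASE = 36, TMIN = 1, TMAX = 26, SKEW = 38, DAMP = 700,
       INITIAL_BIAS = 72, INITIAL_N = 0x80 };

/* Digit value of C: 'a'-'z' and 'A'-'Z' give 0-25, '0'-'9' give 26-35.
   Returns BASE for characters that are not Punycode digits. */
static uint32_t decode_digit (char c)
{
  if (c >= '0' && c <= '9')
    return (uint32_t) (c - '0' + 26);
  if (c >= 'A' && c <= 'Z')
    return (uint32_t) (c - 'A');
  if (c >= 'a' && c <= 'z')
    return (uint32_t) (c - 'a');
  return BASE;
}

/* Bias adaptation function, RFC 3492 section 6.1. */
static uint32_t adapt (uint32_t delta, uint32_t numpoints, int firsttime)
{
  uint32_t k = 0;

  delta = firsttime ? delta / DAMP : delta / 2;
  delta += delta / numpoints;
  while (delta > ((BASE - TMIN) * TMAX) / 2)
    {
      delta /= BASE - TMIN;
      k += BASE;
    }
  return k + (BASE - TMIN + 1) * delta / (delta + SKEW);
}

/* Hypothetical helper, not Libidn's punycode_decode: decode the
   NUL-terminated Punycode string IN into at most OUTMAX code points.
   Returns the number of code points, or -1 for malformed input or a
   too small OUT.  Overflow checks from RFC 3492 are omitted. */
long sketch_punycode_decode (const char *in, uint32_t *out, size_t outmax)
{
  uint32_t n = INITIAL_N, i = 0, bias = INITIAL_BIAS;
  size_t len = strlen (in), b = 0, j, pos, outlen = 0;

  /* Code points before the last '-' are basic and copied literally. */
  for (j = 0; j < len; j++)
    if (in[j] == '-')
      b = j;
  for (j = 0; j < b; j++)
    {
      if (outlen >= outmax || (unsigned char) in[j] >= 0x80)
        return -1;
      out[outlen++] = (unsigned char) in[j];
    }

  for (pos = b > 0 ? b + 1 : 0; pos < len;)
    {
      /* Read one generalized variable-length integer into i. */
      uint32_t oldi = i, w = 1, k;
      for (k = BASE;; k += BASE)
        {
          uint32_t digit, t;
          if (pos >= len)
            return -1;
          digit = decode_digit (in[pos++]);
          if (digit >= BASE)
            return -1;
          i += digit * w;
          t = k <= bias ? TMIN : k >= bias + TMAX ? TMAX : k - bias;
          if (digit < t)
            break;
          w *= BASE - t;
        }
      bias = adapt (i - oldi, (uint32_t) (outlen + 1), oldi == 0);
      n += i / (uint32_t) (outlen + 1);
      i %= (uint32_t) (outlen + 1);
      if (outlen >= outmax)
        return -1;
      /* Insert code point n at position i, shifting the rest up. */
      memmove (out + i + 1, out + i, (outlen - i) * sizeof *out);
      out[i++] = n;
      outlen++;
    }
  return (long) outlen;
}
@end verbatim

Decoding @samp{bcher-kva} with this sketch yields the six code points of @samp{b@"ucher}: the basic part @samp{bcher} is copied first, and the single delta encoded by @samp{kva} inserts U+00FC at position 1.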
+
+@section Error Handling
+
+@include texi/punycode_strerror.texi
+
+@c **********************************************************
+@c ********************* IDNA Functions *********************
+@c **********************************************************
+@node IDNA Functions
+@chapter IDNA Functions
+@cindex IDNA Functions
+
+Until now, there has been no standard method for domain names to use
+characters outside the ASCII repertoire. The IDNA document defines
+internationalized domain names (IDNs) and a mechanism called IDNA for
+handling them in a standard fashion. IDNs use characters drawn from a
+large repertoire (Unicode), but IDNA allows the non-ASCII characters
+to be represented using only the ASCII characters already allowed in
+so-called host names today. This backward-compatible representation is
+required in existing protocols like DNS, so that IDNs can be
+introduced with no changes to the existing infrastructure. IDNA is
+only meant for processing domain names, not free text.
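
The IDNA operations ToASCII and ToUnicode work on one label at a time, so a domain name must first be split into labels. RFC 3490 requires U+002E (full stop), U+3002 (ideographic full stop), U+FF0E (fullwidth full stop) and U+FF61 (halfwidth ideographic full stop) to be recognized as label separators. The following splitting helper is a hypothetical illustration and is not part of the Libidn API.

@verbatim
#include <stddef.h>
#include <stdint.h>

/* RFC 3490, section 3.1: these four code points act as label dots. */
static int sketch_is_dot (uint32_t c)
{
  return c == 0x002E || c == 0x3002 || c == 0xFF0E || c == 0xFF61;
}

/* Hypothetical helper: split the LEN code points in NAME into labels.
   For each label, store its start offset and length in STARTS and
   LENS (at most MAX entries).  Returns the number of labels, or -1 if
   MAX is too small.  A trailing dot (the root) adds no empty label. */
int sketch_split_labels (const uint32_t *name, size_t len,
                         size_t *starts, size_t *lens, size_t max)
{
  size_t count = 0, start = 0, i;

  for (i = 0; i <= len; i++)
    if (i == len || sketch_is_dot (name[i]))
      {
        if (i == len && start == len && count > 0)
          break;                        /* name ended with a dot */
        if (count == max)
          return -1;
        starts[count] = start;
        lens[count] = i - start;
        count++;
        start = i + 1;
      }
  return (int) count;
}
@end verbatim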
+
+@section Header file @code{idna.h}
+
+To use the functions explained in this chapter, you need to include
+the file @file{idna.h} using:
+
+@example
+#include <idna.h>
+@end example
+
+@section Control Flags
+
The IDNA @code{flags} parameter can take on the following values, or a
bit-wise inclusive or of any subset of these values:
+
@deftypevr {IDNA flags} {Idna_flags} IDNA_ALLOW_UNASSIGNED
+Allow unassigned Unicode code points.
+@end deftypevr
+
@deftypevr {IDNA flags} {Idna_flags} IDNA_USE_STD3_ASCII_RULES
+Check output to make sure it is a STD3 conforming host name.
+@end deftypevr
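
RFC 3490 defines what the UseSTD3ASCIIRules checks require in the ToASCII operation. A label must contain no ASCII code points other than letters, digits and hyphen-minus, and it must not begin or end with a hyphen-minus. Non-ASCII code points are not restricted by this step. The following hypothetical sketch applies the two checks to one label. It is not Libidn's code.

@verbatim
#include <stddef.h>
#include <stdint.h>

/* Is C an ASCII letter, digit or hyphen-minus (LDH)? */
static int sketch_is_ldh (uint32_t c)
{
  return (c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
    || (c >= '0' && c <= '9') || c == '-';
}

/* Hypothetical helper: apply the two UseSTD3ASCIIRules checks of
   RFC 3490 to one label of LEN code points.  Returns 1 if the label
   passes, 0 otherwise. */
int sketch_std3_ok (const uint32_t *label, size_t len)
{
  size_t i;

  for (i = 0; i < len; i++)
    if (label[i] < 0x80 && !sketch_is_ldh (label[i]))
      return 0;                         /* non-LDH ASCII code point */
  if (len > 0 && (label[0] == '-' || label[len - 1] == '-'))
    return 0;                           /* leading or trailing hyphen */
  return 1;
}
@end verbatim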
+
+@section Prefix String
+
+@deftypevr {Macro} {#define} IDNA_ACE_PREFIX
+String with the official IDNA prefix, @code{xn--}.
+@end deftypevr
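
RFC 3490 ToUnicode only decodes labels that begin with this prefix, and ToASCII refuses to encode a non-ASCII label that already begins with it. Implementations therefore need a prefix test. The following sketch is hypothetical and is not a Libidn function. It spells out @samp{xn--} instead of using @code{IDNA_ACE_PREFIX} so that it compiles without the Libidn headers. It also ignores the case of the two letters, an assumption based on ASCII letters in DNS names being case-insensitive.

@verbatim
#include <stddef.h>

/* Hypothetical helper: return 1 if the NUL-terminated LABEL starts
   with "xn--", ignoring the case of the two letters, else 0. */
int sketch_has_ace_prefix (const char *label)
{
  static const char prefix[] = "xn--";
  size_t i;

  for (i = 0; prefix[i] != '\0'; i++)
    {
      char c = label[i];
      if (c >= 'A' && c <= 'Z')
        c = (char) (c - 'A' + 'a');     /* ASCII-only lowercasing */
      if (c != prefix[i])
        return 0;                       /* also stops at LABEL's NUL */
    }
  return 1;
}
@end verbatim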
+
+@section Core Functions
+
The idea behind the IDNA function names is as follows: the
@code{idna_to_ascii_4i} and @code{idna_to_unicode_44i} functions are
the core IDNA primitives. The @code{4} indicates that the function
takes UCS-4 strings (i.e., Unicode code points encoded in a 32-bit
unsigned integer type) of the specified length. The @code{i} indicates
that the data is written ``inline'' into the buffer. This means the
caller is responsible for allocating (and deallocating) the string,
and for providing the library with the allocated length of the string.
+The output length is written in the output length variable. The
+remaining functions all contain the @code{z} indicator, which means
+the strings are zero terminated. All output strings are allocated by
+the library, and must be deallocated by the caller. The @code{4}
+indicator again means that the string is UCS-4, the @code{8} means the
+strings are UTF-8 and the @code{l} indicator means the strings are
+encoded in the encoding used by the current locale.
+
+The functions provided are the following entry points:
+
+@include texi/idna_to_ascii_4i.texi
+@include texi/idna_to_unicode_44i.texi
+
+@section Simplified ToASCII Interface
+
+@include texi/idna_to_ascii_4z.texi
+@include texi/idna_to_ascii_8z.texi
+@include texi/idna_to_ascii_lz.texi
+
+@section Simplified ToUnicode Interface
+
+@include texi/idna_to_unicode_4z4z.texi
+@include texi/idna_to_unicode_8z4z.texi
+@include texi/idna_to_unicode_8z8z.texi
+@include texi/idna_to_unicode_8zlz.texi
+@include texi/idna_to_unicode_lzlz.texi
+
+@section Error Handling
+
+@include texi/idna_strerror.texi
+
+@c **********************************************************
+@c ********************** TLD Functions *********************
+@c **********************************************************
+@node TLD Functions
+@chapter TLD Functions
+@cindex TLD Functions
+
Organizations that manage some Top Level Domains (@acronym{TLD}s) have
published tables with characters they accept within the domain. The
reason may be to reduce the complexity that comes from using the full
Unicode range, and to protect themselves from future (backwards
incompatible) changes in the IDN or Unicode specifications. Libidn
implements an infrastructure for defining and checking strings against
such tables. Libidn also ships some tables from @acronym{TLD}s whose
maintainers have given us permission to use them. Because these tables
are even less static than Unicode or StringPrep tables, it is likely
that they will be updated from time to time (even in backwards
incompatible ways). The Libidn interface provides a ``version'' field
for each @acronym{TLD} table, which can be compared for equality to
guarantee the same operation over time.
+
From a design point of view, you can regard the @acronym{TLD} tables
for IDN as the ``localization'' step that comes after the
``internationalization'' step provided by the IETF standards.

The TLD functionality relies on up-to-date tables. The latest version
of Libidn aims to provide these, but tables with unclear copying
conditions, or generally experimental tables, are not included. Some
such tables can be found at @url{http://tldchk.berlios.de}.
+
+@section Header file @code{tld.h}
+
+To use the functions explained in this chapter, you need to include
+the file @file{tld.h} using:
+
+@example
+#include <tld.h>
+@end example
+
+@c @section Data Types
+@c
+@c @deftp {Data type} {Tld_table_element} @var{start} @var{end}
+@c @example
+@c /* Interval of valid code points in the TLD. */
+@c struct Tld_table_element
+@c @{
+@c uint32_t start; /* Start of range. */
+@c uint32_t end; /* End of range, end == start if single. */
+@c @};
+@c typedef struct Tld_table_element Tld_table_element;
+@c @end example
+@c This @code{struct} contain the @var{start} and @var{end} positions
+@c (inclusive) of a range. If the range is a single (i.e., starts and
+@c ends in the same character), then set @var{end} to the same as
+@c @var{start}. This structure is normally used as an array.
+@c @end deftp
+@c
+@c @deftp {Data type} {Tld_table} @var{name} @var{version} @var{nvalid} @var{valid}
+@c @example
+@c /* List valid code points in a TLD. */
+@c struct Tld_table
+@c @{
+@c char *name; /* TLD name, e.g., "no". */
+@c char *version; /* Version string from TLD file. */
+@c size_t nvalid; /* Number of entries in data. */
+@c Tld_table_element *valid[]; /* Sorted array of valid code points. */
+@c @};
+@c typedef struct Tld_table Tld_table;
+@c @end example
+@c In this @code{struct}, the @var{name} field is a string (@samp{char*})
+@c indicating the TLD name (e.g., ``no''). The @var{version} field is a
+@c string (@samp{char*}) containing a free form humanly readable string
+@c that can be used for equality comparison to compare different versions
+@c of the table. The @var{nvalid} field indicate how many entries there
+@c are in @var{valid}, which brings us finally to @var{valid} that
+@c contain the actual code points that are valid for this TLD (see
+@c @code{Tld_table_element} above).
+@c @end deftp
+
+@section Core Functions
+
+@include texi/tld_check_4t.texi
+@include texi/tld_check_4tz.texi
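
Conceptually, a @acronym{TLD} table lists ranges of permitted code points, and checking a string means finding a range that contains each of its code points. The following sketch shows that idea with a binary search over sorted, non-overlapping ranges. The structure, function names and table contents are hypothetical illustrations. They are not Libidn's table format, and the sample table below does not reproduce any registry's published table.

@verbatim
#include <stddef.h>
#include <stdint.h>

/* One inclusive range of permitted code points (start == end for a
   single code point).  Hypothetical type, not Libidn's. */
struct sketch_range { uint32_t start, end; };

/* Binary search in TAB (N ranges, sorted and non-overlapping). */
static int sketch_in_table (uint32_t c, const struct sketch_range *tab,
                            size_t n)
{
  size_t lo = 0, hi = n;

  while (lo < hi)
    {
      size_t mid = lo + (hi - lo) / 2;
      if (c < tab[mid].start)
        hi = mid;
      else if (c > tab[mid].end)
        lo = mid + 1;
      else
        return 1;
    }
  return 0;
}

/* Hypothetical helper: return the index of the first code point in
   IN (LEN entries) that TAB does not permit, or LEN if all are OK. */
size_t sketch_tld_check (const uint32_t *in, size_t len,
                         const struct sketch_range *tab, size_t n)
{
  size_t i;

  for (i = 0; i < len; i++)
    if (!sketch_in_table (in[i], tab, n))
      return i;
  return len;
}
@end verbatim

With a table that permits only @samp{-}, @samp{0}-@samp{9}, @samp{a}-@samp{z}, U+00E5, U+00E6 and U+00F8, the string @samp{bl@aa{}b@ae{}rgr@o{}d} passes, while @samp{r@"aksm@"org@aa{}s} is rejected at index 1 (U+00E4).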
+
+@section Utility Functions
+
+@include texi/tld_get_4.texi
+@include texi/tld_get_4z.texi
+@include texi/tld_get_z.texi
+@include texi/tld_get_table.texi
+@include texi/tld_default_table.texi
+
+@section High-Level Wrapper Functions
+
+@include texi/tld_check_4.texi
+@include texi/tld_check_4z.texi
+@include texi/tld_check_8z.texi
+@include texi/tld_check_lz.texi
+
+@section Error Handling
+
+@include texi/tld_strerror.texi
+
+@c **********************************************************
+@c ********************** PR29 Functions ********************
+@c **********************************************************
+@node PR29 Functions
+@chapter PR29 Functions
+@cindex PR29 Functions
+
+A deficiency in the specification of Unicode Normalization Forms has
+been found. The consequence is that some strings can be normalized
+into different strings by different implementations. In other words,
+two different implementations may return different output for the same
+input (because the interpretation of the specification is
ambiguous). Further, an implementation invoked again on one of the
output strings may return a different string (because one
interpretation of the ambiguous specification makes normalization
non-idempotent). Fortunately, only a select few character sequences
+exhibit this problem, and none of them are expected to occur in
+natural languages (due to different linguistic uses of the involved
+characters).
+
+A full discussion of the problem may be found at:
+
+@url{http://www.unicode.org/review/pr-29.html}
+
+The PR29 functions below allow you to detect the problem sequence. So
+when would you want to use these functions? For most applications,
+such as those using Nameprep for IDN, this is likely only to be an
+interoperability problem. Thus, you may not want to care about it, as
+the character sequences will rarely occur naturally. However, if you
+are using a profile, such as SASLPrep, to process authentication
+tokens; authorization tokens; or passwords, there is a real danger
+that attackers may try to use the peculiarities in these strings to
+attack parts of your system. As only a small number of strings, and
+no naturally occurring strings, exhibit this problem, the conservative
+approach of rejecting the strings is recommended. If this approach is
not used, you should instead verify that all parts of your system
that process the tokens and passwords use an NFKC implementation that
produces the same output for the same input.
+
+Technically inclined readers may be interested in knowing more about
+the implementation aspects of the PR29 flaw. @xref{PR29 discussion}.
+
+@section Header file @code{pr29.h}
+
+To use the functions explained in this chapter, you need to include
+the file @file{pr29.h} using:
+
+@example
+#include <pr29.h>
+@end example
+
+@section Core Functions
+
+@include texi/pr29_4.texi
+
+@section Utility Functions
+
+@include texi/pr29_4z.texi
+@include texi/pr29_8z.texi
+
+@section Error Handling
+
+@include texi/pr29_strerror.texi
+
+@c **********************************************************
+@c *********************** Examples ***********************
+@c **********************************************************
+@node Examples
+@chapter Examples
+@cindex Examples
+
This chapter contains example code that illustrates how Libidn can
be used when writing your own application.
+
+@menu
+* Example 1:: Example using stringprep.
+* Example 2:: Example using punycode.
+* Example 3:: Example using IDNA ToASCII.
+* Example 4:: Example using IDNA ToUnicode.
+* Example 5:: Example using TLD checking.
+@end menu
+
+@node Example 1
+@section Example 1
+
+This example demonstrates how the stringprep functions are used.
+
+@verbatiminclude example.c
+
+@node Example 2
+@section Example 2
+
+This example demonstrates how the punycode functions are used.
+
+@verbatiminclude example2.c
+
+@node Example 3
+@section Example 3
+
+This example demonstrates how the library is used to convert
+internationalized domain names into ASCII compatible names.
+
+@verbatiminclude example3.c
+
+@node Example 4
+@section Example 4
+
+This example demonstrates how the library is used to convert ASCII
+compatible names to internationalized domain names.
+
+@verbatiminclude example4.c
+
+@node Example 5
+@section Example 5
+
+This example demonstrates how the library is used to check a string
+for invalid characters within a specific TLD.
+
+@verbatiminclude example5.c
+
+@c **********************************************************
+@c ********************* Invoking idn *********************
+@c **********************************************************
+@node Invoking idn
+@chapter Invoking idn
+
+@pindex idn
+@cindex invoking @command{idn}
+@cindex command line
+
+@section Name
+
+GNU Libidn (idn) -- Internationalized Domain Names command line tool
+
+@section Description
+@code{idn} allows internationalized string preparation
+(@samp{stringprep}), encoding and decoding of punycode data, and IDNA
+ToASCII/ToUnicode operations to be performed on the command line.
+
+If strings are specified on the command line, they are used as input
+and the computed output is printed to standard output @code{stdout}.
If no strings are specified on the command line, the program reads
data, line by line, from the standard input @code{stdin}, and prints
the computed output to standard output. What processing is performed
(e.g., ToASCII, or Punycode encode) is indicated by options. If any
errors are encountered, the execution of the application is aborted.
+
+All strings are expected to be encoded in the preferred charset used
+by your locale. Use @code{--debug} to find out what this charset is.
+You can override the charset used by setting environment variable
+@code{CHARSET}.
+
+To process a string that starts with @code{-}, for example
+@code{-foo}, use @code{--} to signal the end of parameters, as in
+@code{idn --quiet -a -- -foo}.
+
+@section Options
@code{idn} recognizes these options:
+
+@verbatim
+ -h, --help Print help and exit
+
+ -V, --version Print version and exit
+
+ -s, --stringprep Prepare string according to nameprep profile
+
+ -d, --punycode-decode Decode Punycode
+
+ -e, --punycode-encode Encode Punycode
+
+ -a, --idna-to-ascii Convert to ACE according to IDNA (default mode)
+
+ -u, --idna-to-unicode Convert from ACE according to IDNA
+
+ --allow-unassigned Toggle IDNA AllowUnassigned flag (default off)
+
+ --usestd3asciirules Toggle IDNA UseSTD3ASCIIRules flag (default off)
+
+ --no-tld Don't check string for TLD specific rules
+ Only for --idna-to-ascii and --idna-to-unicode
+
+ -n, --nfkc Normalize string according to Unicode v3.2 NFKC
+
+ -p, --profile=STRING Use specified stringprep profile instead
+ Valid stringprep profiles: `Nameprep',
+ `iSCSI', `Nodeprep', `Resourceprep',
+ `trace', `SASLprep'
+
+ --debug Print debugging information
+
+ --quiet Silent operation
+@end verbatim
+
+@section Environment Variables
+
The @var{CHARSET} environment variable can be used to override which
character set is used for decoding incoming data (i.e., on the
command line or on the standard input stream), and for encoding data
to the standard output. If your system is set up correctly, however, the
+application will guess which character set is used automatically.
+Example usage:
+
+@example
+$ CHARSET=ISO-8859-1 idn --punycode-encode
+...
+@end example
+
+@section Examples
+
+Standard usage, reading input from standard input:
+
+@example
+jas@@latte:~$ idn
+libidn 0.3.5
+Copyright 2002, 2003 Simon Josefsson.
+GNU Libidn comes with NO WARRANTY, to the extent permitted by law.
+You may redistribute copies of GNU Libidn under the terms of
+the GNU Lesser General Public License. For more information
+about these matters, see the file named COPYING.LIB.
+Type each input string on a line by itself, terminated by a newline character.
+r@"aksm@"org@aa{}s.se
+xn--rksmrgs-5wao1o.se
+jas@@latte:~$
+@end example
+
+Reading input from command line, and disabling copyright and license
+information:
+
+@example
+jas@@latte:~$ idn --quiet r@"aksm@"org@aa{}s.se bl@aa{}b@ae{}rgr@o{}d.no
+xn--rksmrgs-5wao1o.se
+xn--blbrgrd-fxak7p.no
+jas@@latte:~$
+@end example
+
+Accessing a specific StringPrep profile directly:
+
+@example
+jas@@latte:~$ idn --quiet --profile=SASLprep --stringprep te@ss{}t@ordf{}
+te@ss{}ta
+jas@@latte:~$
+@end example
+
+@section Troubleshooting
+
Getting character data encoded right, and making sure Libidn uses the
same encoding, can be difficult. The reason for this is that most
systems encode character data in more than one character encoding,
e.g., using @code{UTF-8} together with @code{ISO-8859-1} or
@code{ISO-2022-JP}. This problem is likely to continue to exist until
only one character encoding comes out as the evolutionary winner, or
(more likely, at least to some extent) forever.
+
+The first step to troubleshooting character encoding problems with
+Libidn is to use the @samp{--debug} parameter to find out which
character set encoding @samp{idn} believes your locale uses.
+
+@example
+jas@@latte:~$ idn --debug --quiet ""
+system locale uses charset `UTF-8'.
+
+jas@@latte:~$
+@end example
+
+If it prints @code{ANSI_X3.4-1968} (i.e., @code{US-ASCII}), this
indicates that you have not configured your locale properly. To configure
+the locale, you can, for example, use @samp{LANG=sv_SE.UTF-8; export
+LANG} at a @code{/bin/sh} prompt, to set up your locale for a Swedish
+environment using @code{UTF-8} as the encoding.
+
Sometimes @samp{idn} appears to be unable to translate from your system
+locale into @code{UTF-8} (which is used internally), and you get an
+error like the following:
+
+@example
+jas@@latte:~$ idn --quiet foo
+idn: could not convert from ISO-8859-1 to UTF-8.
+jas@@latte:~$
+@end example
+
The simplest explanation is that you haven't installed the
@samp{iconv} conversion tools. You can find @samp{iconv} as a
standalone library in @acronym{GNU} Libiconv
+(@uref{http://www.gnu.org/software/libiconv/}). On many
+@acronym{GNU}/Linux systems, this library is part of the system, but
+you may have to install additional packages (e.g., @samp{glibc-locale}
+for Debian) to be able to use it.
+
+Another explanation is that the error is correct and you are feeding
+@samp{idn} invalid data. This can happen inadvertently if you are not
+careful with the character set encodings you use. For example, if
your shell runs in an @code{ISO-8859-1} environment, and you invoke
+@samp{idn} with the @samp{CHARSET} environment variable as follows,
+you will feed it @code{ISO-8859-1} characters but force it to believe
+they are @code{UTF-8}. Naturally this will lead to an error, unless
+the byte sequences happen to be parsable as @code{UTF-8}. Note that
+even if you don't get an error, the output may be incorrect in this
situation, because @code{ISO-8859-1} and @code{UTF-8} do not in
+general encode the same characters as the same byte sequences.
+
+@example
+jas@@latte:~$ idn --quiet --debug ""
+system locale uses charset `ISO-8859-1'.
+
+jas@@latte:~$ CHARSET=UTF-8 idn --quiet --debug r@"aksm@"org@aa{}s
+system locale uses charset `UTF-8'.
+input[0] = U+0072
+input[1] = U+4af3
+input[2] = U+006d
+input[3] = U+1b29e5
+input[4] = U+0073
+output[0] = U+0078
+output[1] = U+006e
+output[2] = U+002d
+output[3] = U+002d
+output[4] = U+0072
+output[5] = U+006d
+output[6] = U+0073
+output[7] = U+002d
+output[8] = U+0068
+output[9] = U+0069
+output[10] = U+0036
+output[11] = U+0064
+output[12] = U+0035
+output[13] = U+0039
+output[14] = U+0037
+output[15] = U+0035
+output[16] = U+0035
+output[17] = U+0032
+output[18] = U+0061
+xn--rms-hi6d597552a
+jas@@latte:~$
+@end example
+
The moral here is to forget about @samp{CHARSET} (configure your
+locales properly instead) unless you know what you are doing, and if
+you want to use it, do it carefully, after verifying with
+@samp{--debug} that you get the desired results.
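
In the session above, the ISO-8859-1 bytes were not rejected. They were decoded into unrelated code points such as U+4AF3 and U+1B29E5, and the result looked like a valid ACE label. A strict UTF-8 check, as defined in RFC 3629, refuses such input. The following helper is a hypothetical sketch of that check and is not part of Libidn or @command{idn}. It rejects truncated sequences, overlong forms, surrogates and values above U+10FFFF.

@verbatim
#include <stddef.h>
#include <stdint.h>

/* Length of the well-formed UTF-8 sequence at S (AVAIL bytes left),
   or 0 if the bytes there are not well-formed. */
static size_t sketch_seq_len (const unsigned char *s, size_t avail)
{
  size_t n, k;
  uint32_t c, min;

  if (s[0] < 0x80)
    return 1;
  if ((s[0] & 0xE0) == 0xC0)      { n = 2; c = s[0] & 0x1F; min = 0x80; }
  else if ((s[0] & 0xF0) == 0xE0) { n = 3; c = s[0] & 0x0F; min = 0x800; }
  else if ((s[0] & 0xF8) == 0xF0) { n = 4; c = s[0] & 0x07; min = 0x10000; }
  else
    return 0;                     /* continuation byte or 0xF8..0xFF */
  if (avail < n)
    return 0;                     /* truncated sequence */
  for (k = 1; k < n; k++)
    {
      if ((s[k] & 0xC0) != 0x80)
        return 0;                 /* expected a continuation byte */
      c = (c << 6) | (s[k] & 0x3F);
    }
  if (c < min || c > 0x10FFFF || (c >= 0xD800 && c <= 0xDFFF))
    return 0;                     /* overlong, out of range, surrogate */
  return n;
}

/* Hypothetical helper: return 1 if the LEN bytes at S are well-formed
   UTF-8; otherwise return 0 and, if ERRPOS is non-NULL, store the
   offset of the first bad sequence there. */
int sketch_utf8_valid (const unsigned char *s, size_t len, size_t *errpos)
{
  size_t i = 0;

  while (i < len)
    {
      size_t n = sketch_seq_len (s + i, len - i);
      if (n == 0)
        {
          if (errpos != NULL)
            *errpos = i;
          return 0;
        }
      i += n;
    }
  return 1;
}
@end verbatim

Applied to the ISO-8859-1 bytes of @samp{r@"aksm@"org@aa{}s}, the sketch reports an error at offset 1, because the byte @code{0xE4} announces a three-byte sequence that is not followed by continuation bytes.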
+
+@node Emacs API
+@chapter Emacs API
+
Included in Libidn are @file{punycode.el} and @file{idna.el}, which
provide an Emacs Lisp API to (a limited set of) the Libidn API. This
section describes the API. Currently the IDNA API always sets the
@code{UseSTD3ASCIIRules} flag and clears the @code{AllowUnassigned}
flag; in the future there may be functionality to specify these flags
via the API.
+
+@section Punycode Emacs API
+
+@defvar punycode-program
+Name of the GNU Libidn @file{idn} application. The default is
+@samp{idn}. This variable can be customized.
+@end defvar
+
+@defvar punycode-environment
+List of environment variable definitions prepended to
+@samp{process-environment}. The default is @samp{("CHARSET=UTF-8")}.
+This variable can be customized.
+@end defvar
+
+@defvar punycode-encode-parameters
+List of parameters passed to @var{punycode-program} to invoke punycode
+encoding mode. The default is @samp{("--quiet" "--punycode-encode")}.
+This variable can be customized.
+@end defvar
+
+@defvar punycode-decode-parameters
List of parameters passed to @var{punycode-program} to invoke punycode
+decoding mode. The default is @samp{("--quiet" "--punycode-decode")}.
+This variable can be customized.
+@end defvar
+
+@defun punycode-encode string
+Returns a Punycode encoding of the @var{string}, after converting the
+input into UTF-8.
+@end defun
+
+@defun punycode-decode string
+Returns a possibly multibyte string which is the decoding of the
+@var{string} which is a punycode encoded string.
+@end defun
+
+@section IDNA Emacs API
+
+@defvar idna-program
+Name of the GNU Libidn @file{idn} application. The default is
+@samp{idn}. This variable can be customized.
+@end defvar
+
+@defvar idna-environment
+List of environment variable definitions prepended to
+@samp{process-environment}. The default is @samp{("CHARSET=UTF-8")}.
+This variable can be customized.
+@end defvar
+
+@defvar idna-to-ascii-parameters
+List of parameters passed to @var{idna-program} to invoke IDNA ToASCII
+mode. The default is @samp{("--quiet" "--idna-to-ascii"
+"--usestd3asciirules")}. This variable can be customized.
+@end defvar
+
+@defvar idna-to-unicode-parameters
List of parameters passed to @var{idna-program} to invoke IDNA ToUnicode mode.
+The default is @samp{("--quiet" "--idna-to-unicode"
+"--usestd3asciirules")}. This variable can be customized.
+@end defvar
+
+@defun idna-to-ascii string
+Returns an ASCII Compatible Encoding (ACE) of the string computed by
+the IDNA ToASCII operation on the input @var{string}, after converting
+the input to UTF-8.
+@end defun
+
+@defun idna-to-unicode string
+Returns a possibly multibyte string which is the output of the IDNA
+ToUnicode operation computed on the input @var{string}.
+@end defun
+
+@node Java API
+@chapter Java API
+
+Libidn has been ported to the Java programming language, and as a
+consequence most of the API is available to native Java applications.
This chapter contains notes on this support; complete documentation is
pending.
+
+The Java library, if Libidn has been built with Java support
+(@pxref{Downloading and Installing}), will be placed in
+@file{java/libidn-@value{VERSION}.jar}. The source code is located in
+@file{java/gnu/inet/encoding/}.
+
+@section Overview
+
+This package provides a Java implementation of the Internationalized
+Domain Names in Applications (IDNA) standard. It is written entirely
+in Java and does not require any additional libraries to be set up.
+
The @code{gnu.inet.encoding.IDNA} class offers two public functions,
@code{toASCII} and @code{toUnicode}, which can be used as follows:
+
+@example
+gnu.inet.encoding.IDNA.toASCII("bl@"ods.z@"ug");
+gnu.inet.encoding.IDNA.toUnicode("xn--blds-6qa.xn--zg-xka");
+@end example
+
+@section Miscellaneous Programs
+
+The @file{misc/} directory contains several programs that are related
+to the Java part of GNU Libidn, but that don't need to be included in
+the main source tree.
+
+@subsection GenerateRFC3454
+
+This program parses RFC3454 and creates the RFC3454.java program that
+is required during the StringPrep phase.
+
+The RFC can be found at various locations, for example at
+@url{http://www.ietf.org/rfc/rfc3454.txt}.
+
+Invoke the program as follows:
+
+@example
+$ java GenerateRFC3454
+Creating RFC3454.java... Ok.
+@end example
+
+@subsection GenerateNFKC
+
+The GenerateNFKC program parses the Unicode character database file
+and generates all the tables required for NFKC. This program requires
+the two files UnicodeData.txt and CompositionExclusions.txt of version
+3.2 of the Unicode files. Note that RFC3454 (Stringprep) defines that
+Unicode version 3.2 is to be used, not the latest version.
+
+The Unicode data files can be found at
+@url{http://www.unicode.org/Public/}.
+
+Invoke the program as follows:
+
+@example
+$ java GenerateNFKC
+Creating CombiningClass.java... Ok.
+Creating DecompositionKeys.java... Ok.
+Creating DecompositionMappings.java... Ok.
+Creating Composition.java... Ok.
+@end example
+
+@subsection TestIDNA
+
The TestIDNA program lets you test the IDNA implementation manually
+or against Simon Josefsson's test vectors.
+
+The test vectors can be found at the Libidn homepage,
+@url{http://www.gnu.org/software/libidn/}.
+
To test the transformation manually, use:
+
+@example
+$ java -cp .:../libidn.jar TestIDNA -a <string to test>
+Input: <string to test>
+Output: <toASCII(string to test)>
+$ java -cp .:../libidn.jar TestIDNA -u <string to test>
+Input: <string to test>
+Output: <toUnicode(string to test)>
+@end example
+
+To test against draft-josefsson-idn-test-vectors.html, use:
+
+@example
+$ java -cp .:../libidn.jar TestIDNA -t
+No errors detected!
+@end example
+
+@subsection TestNFKC
+
The TestNFKC program lets you test the NFKC implementation manually
+or against the NormalizationTest.txt file from the Unicode data files.
+
+To test the normalization manually, use:
+
+@example
+$ java -cp .:../libidn.jar TestNFKC <string to test>
+Input: <string to test>
+Output: <nfkc version of the string to test>
+@end example
+
+To test against NormalizationTest.txt:
+
+@example
+$ java -cp .:../libidn.jar TestNFKC
+No errors detected!
+@end example
+
+@section Possible Problems
+
Beware of bugs: This Java API needs a lot more testing, especially
with ``exotic'' character sets. While it works for me, it may not work
for you.
+
+Encoding of your Java sources: If you are using non-ASCII characters
+in your Java source code, make sure javac compiles your programs with
the correct encoding. If necessary, specify the encoding using the
@option{-encoding} parameter.
+
Java Unicode handling: Java 1.4 only handles 16-bit Unicode code
points (i.e., characters in the Basic Multilingual Plane), so this
implementation ignores all references to so-called Supplementary
Characters (U+10000 to U+10FFFF). Starting with Java 1.5, these
characters will also be supported by Java, but this will require
changes to this library. See also the next section.
+
+@section A Note on Java and Unicode
+
This library uses Java's built-in @code{char} data type. Up to Java
1.4, this data type only supports 16-bit Unicode code points, that is,
the Basic Multilingual Plane. For this reason, this library doesn't
work for Supplementary Characters (i.e., characters from U+10000 to
U+10FFFF). All references to such characters are silently ignored.
+
Starting with Java 1.5, Supplementary Characters will also be
supported. However, this will require changes in the present version
+of the library. Java 1.5 is currently in beta status.
+
+For more information refer to the documentation of java.lang.Character
+in the JDK API.
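+
+As background: in Java, a Supplementary Character is represented in a
+@code{String} as a UTF-16 surrogate pair, i.e.@: as two @code{char}
+values, and Java 1.5 added methods such as @code{Character.toCodePoint}
+and @code{Character.toChars} to convert between such pairs and code
+points. Code that processes one @code{char} at a time sees the two
+halves separately instead of one character. The following C sketch
+shows the surrogate-pair arithmetic; the function names are
+hypothetical and not part of Libidn.
+
```c
#include <assert.h>
#include <stdint.h>

/* A high surrogate is a 16-bit unit in 0xD800..0xDBFF. */
static int
is_high_surrogate (uint16_t u)
{
  return u >= 0xD800 && u <= 0xDBFF;
}

/* A low surrogate is a 16-bit unit in 0xDC00..0xDFFF. */
static int
is_low_surrogate (uint16_t u)
{
  return u >= 0xDC00 && u <= 0xDFFF;
}

/* Combine a surrogate pair into one code point in U+10000..U+10FFFF:
   the high surrogate carries the upper 10 bits and the low surrogate
   the lower 10 bits of (code point - 0x10000). */
static uint32_t
surrogates_to_code_point (uint16_t hi, uint16_t lo)
{
  assert (is_high_surrogate (hi) && is_low_surrogate (lo));
  return 0x10000 + (((uint32_t) (hi - 0xD800) << 10)
                    | (uint32_t) (lo - 0xDC00));
}

/* Split a supplementary code point into its surrogate pair. */
static void
code_point_to_surrogates (uint32_t cp, uint16_t *hi, uint16_t *lo)
{
  assert (cp >= 0x10000 && cp <= 0x10FFFF);
  cp -= 0x10000;
  *hi = (uint16_t) (0xD800 + (cp >> 10));
  *lo = (uint16_t) (0xDC00 + (cp & 0x3FF));
}
```
+
+For example, U+1D11E (MUSICAL SYMBOL G CLEF) is stored as the two
+16-bit units 0xD834 0xDD1E, neither of which is a character on its own.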
+
+@node C# API
+@chapter C# API
+
+The Libidn library has been ported to the C# language. The port
+resides in the top-level @file{csharp/} directory. Currently, no
+further documentation about the implementation or the API is
+available. However, the C# port was based on the Java port, and the
+API is exactly the same as in the Java version. The help files for
+the Java API may thus be useful.
+
+@c **********************************************************
+@c ******************* Acknowledgements *******************
+@c **********************************************************
+@node Acknowledgements
+@chapter Acknowledgements
+
+The Punycode implementation was taken from the IETF IDN Punycode
+specification, by Adam M. Costello. The TLD code was contributed by
+Thomas Jacob. The Java implementation was contributed by Oliver Hitz.
+The C# implementation was contributed by Alexander Gnauck. The
+Unicode tables were provided by Unicode, Inc. Some functions for
+dealing with Unicode (see @file{nfkc.c} and @file{toutf8.c}) were
+borrowed from GLib, downloaded from @url{http://www.gtk.org/}. The
+manual borrowed text from Libgcrypt by Werner Koch.
+
+Inspiration for many things that, consciously or not, have gone into
+this package is due to a number of free software packages that the
+author has been exposed to. The author wishes to acknowledge the free
+software community in general, for setting an example of what role
+software development can play in modern society.
+
+Several people reported bugs, sent patches or suggested improvements;
+see the file @file{THANKS} in the top-level directory of the source code.
+
+@c **********************************************************
+@c ************************ History ***********************
+@c **********************************************************
+@node History
+@chapter History
+
+The complete history of user visible changes is stored in the file
+@file{NEWS} in the top-level directory of the source code tree. The
+complete history of modifications to each file is stored in the file
+@file{ChangeLog} in the same directory. This section contains a
+condensed version of that information, in the form of ``milestones''
+for the project.
+
+@table @asis
+@item Stringprep implementation.
+Version 0.0.0 released on 2002-11-05.
+
+@item IDNA and Punycode implementations, part of the GNU project.
+Version 0.1.0 released on 2003-01-05.
+
+@item Uses official IDNA ACE prefix @code{xn--}.
+Version 0.1.7 released on 2003-02-12.
+
+@item Command line interface.
+Version 0.1.11 released on 2003-02-26.
+
+@item GNU Libc add-on proposed.
+Version 0.1.12 released on 2003-03-06.
+
+@item Interoperability testing during IDNConnect.
+Version 0.3.1 released on 2003-10-02.
+
+@item TLD restriction testing.
+Version 0.4.0 released on 2004-02-28.
+
+@item GNU Libc add-on integrated.
+Version 0.4.1 released on 2004-03-08.
+
+@item Native Java implementation.
+Versions 0.4.2 through 0.4.9 released between 2004-03-20 and 2004-06-11.
+
+@item PR-29 functions for ``problem sequences''.
+Version 0.5.0 released on 2004-06-26.
+
+@item Many small portability fixes and wider use.
+Versions 0.5.1 through 0.5.20 released between 2004-07-09 and
+2005-10-23.
+
+@item Native C# implementation.
+Version 0.6.0 released on 2005-12-03.
+
+@item Windows support through cross-compilation.
+Version 0.6.1 released on 2006-01-20.
+
+@item Library declared stable by releasing v1.0.
+Version 1.0 released on 2007-07-31.
+
+@end table
+
+@node PR29 discussion
+@appendix PR29 discussion
+
+If you wish to experiment with a modified Unicode NFKC implementation
+according to the PR29 proposal, you may find the following bug report
+useful. However, I have not verified that the suggested modifications
+are correct. For reference, I'm including my response to the report
+as well.
+
+@verbatim
+From: Rick McGowan <rick@unicode.org>
+Subject: Possible bug and status of PR 29 change(s)
+To: bug-libidn@gnu.org
+Date: Wed, 27 Oct 2004 14:49:17 -0700
+
+Hello. On behalf of the Unicode Consortium editorial committee, I would
+like to find out more information about the PR 29 fixes, if any, and
+functions in Libidn. Your implementation was listed in the text of PR29 as
+needing investigation, so I am following up on several implementations.
+
+The UTC has accepted the proposed fix to D2 as outlined in PR29, and a new
+draft of UAX #15 has been issued.
+
+I have looked at Libidn 0.5.8 (today), and there may still be a possible
+bug in NFKC.java and nfkc.c.
+
+------------------------------------------------------
+
+1. In NFKC.java, this line in canonicalOrdering():
+
+ if (i > 0 && (last_cc == 0 || last_cc != cc)) {
+
+should perhaps be changed to:
+
+ if (i > 0 && (last_cc == 0 || last_cc < cc)) {
+
+but I'm not sure of the sense of this comparison.
+
+------------------------------------------------------
+
+2. In nfkc.c, function _g_utf8_normalize_wc() has this code:
+
+ if (i > 0 &&
+ (last_cc == 0 || last_cc != cc) &&
+ combine (wc_buffer[last_start], wc_buffer[i],
+ &wc_buffer[last_start]))
+ {
+
+This appears to have the same bug as the current Python implementation (in
+Python 2.3.4). The code should be checking, as per new rule D2 UAX #15
+update, that the next combining character is the same or HIGHER than the
+current one. It now checks to see if it's non-zero and not equal.
+
+The above line(s) should perhaps be changed to:
+
+ if (i > 0 &&
+ (last_cc == 0 || last_cc < cc) &&
+ combine (wc_buffer[last_start], wc_buffer[i],
+ &wc_buffer[last_start]))
+ {
+
+but I'm not sure of the sense of the comparison (< or > or <=?) here.
+
+In the text of PR29, I will be marking Libidn as "needs change" and adding
+the version number that I checked. If any further change is made, please
+let me know the release version, and I'll update again.
+
+Regards,
+ Rick McGowan
+@end verbatim
+
+@verbatim
+From: Simon Josefsson <jas@extundo.com>
+Subject: Re: Possible bug and status of PR 29 change(s)
+To: Rick McGowan <rick@unicode.org>
+Cc: bug-libidn@gnu.org
+Date: Thu, 28 Oct 2004 09:47:47 +0200
+
+Rick McGowan <rick@unicode.org> writes:
+
+> Hello. On behalf of the Unicode Consortium editorial committee, I would
+> like to find out more information about the PR 29 fixes, if any, and
+> functions in Libidn. Your implementation was listed in the text of PR29 as
+> needing investigation, so I am following up on several implementations.
+>
+> The UTC has accepted the proposed fix to D2 as outlined in PR29, and a new
+> draft of UAX #15 has been issued.
+>
+> I have looked at Libidn 0.5.8 (today), and there may still be a possible
+> bug in NFKC.java and nfkc.c.
+
+Hello Rick.
+
+I believe the current behavior is intentional. Libidn do not aim to
+implement latest-and-greatest NFKC, it aim to implement the NFKC
+functionality required for StringPrep and IDN. As you may know,
+StringPrep/IDN reference Unicode 3.2.0, and explicitly says any later
+changes (which I consider PR29 as) do not apply.
+
+In fact, I believe that would I incorporate the changes suggested in
+PR29, I would in fact be violating the IDN specifications.
+
+Thanks for looking into the code and finding the place where the
+change could be made. I'll see if I can mention this in the manual
+somewhere, for technically interested readers.
+
+Regards,
+Simon
+@end verbatim
+
+@node On Label Separators
+@appendix On Label Separators
+
+Some strings contain characters whose NFKC-normalized form contains
+the ASCII dot (0x2E, ``.''). Examples of these characters are U+2024
+(ONE DOT LEADER) and U+248C (DIGIT FIVE FULL STOP). Such strings have
+the interesting property that their IDNA ToASCII output will contain
+embedded dots. For example:
+
+@example
+ToASCII (hi U+248C com) = hi5.com
+ToASCII (r@"aksm@"org@aa{}s U+2024 com) = xn--rksmrgs.com-l8as9u
+@end example
+
+These examples demonstrate the two general cases. In the first, the
+ASCII dot is part of an output that does not begin with the IDN prefix
+@code{xn--}. In the second, the dot is part of an IDN prefixed with
+@code{xn--}.
+
+The input strings are, from the DNS point of view, a single label.
+The IDNA algorithm translates one label at a time. Thus, the output is
+expected to be only one label. What is important here is to make sure
+the DNS resolver receives the correct query. The DNS protocol does
+not use the dot to delimit labels on the wire, rather it uses
+length-value pairs. Thus the correct query would be for
+@code{@{7@}hi5.com} and @code{@{22@}xn--rksmrgs.com-l8as9u}
+respectively.
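+
+To make the length-value encoding concrete, the following C sketch
+writes a list of labels in DNS wire format: each label is preceded by
+a one-byte length, and the name ends with the zero-length root label
+(which the @code{@{n@}} notation above leaves out). The function
+@code{dns_encode_labels} is hypothetical and not part of Libidn; it
+only illustrates that a dot inside a label is an ordinary octet, while
+the length bytes alone delimit labels.
+
```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Encode NLABELS labels as a DNS name on the wire: each label is written
   as a one-byte length followed by its octets, and the name is terminated
   by the zero-length root label.  A '.' inside a label is copied like any
   other octet.  Returns the number of bytes written, or 0 if a label is
   empty, longer than 63 octets, or OUT is too small.  The 255-octet limit
   on the whole name is not checked in this sketch. */
static size_t
dns_encode_labels (const char *const *labels, size_t nlabels,
                   unsigned char *out, size_t outsize)
{
  size_t pos = 0;
  size_t i;

  assert (labels != NULL && out != NULL);

  for (i = 0; i < nlabels; i++)
    {
      size_t len = strlen (labels[i]);

      /* Keep room for the length byte, the label and the final 0. */
      if (len == 0 || len > 63 || pos + 1 + len >= outsize)
        return 0;
      out[pos++] = (unsigned char) len;
      memcpy (out + pos, labels[i], len);
      pos += len;
    }

  if (pos >= outsize)
    return 0;
  out[pos++] = 0;               /* root label */
  return pos;
}
```
+
+Encoding the single label @code{hi5.com} yields the bytes 7,
+@samp{hi5.com}, 0, whereas encoding the two labels @code{hi5} and
+@code{com} yields 3, @samp{hi5}, 3, @samp{com}, 0. Both encodings are
+nine bytes long, yet a resolver treats them as different names.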
+
+Some implementations @footnote{Notably Microsoft's Internet Explorer
+and Mozilla's Firefox, but not Apple's Safari.} have decided that
+these input strings are potentially confusing for the user. The
+string @code{hi U+248C com} looks like @code{hi5.com} on systems that
+support Unicode properly. These implementations do not follow RFC
+3490. They yield:
+
+@example
+ToASCII (hi U+248C com) = hi5.com
+ToASCII (r@"aksm@"org@aa{}s U+2024 com) = xn--rksmrgs-5wao1o.com
+@end example
+
+The DNS queries they perform are @code{@{3@}hi5@{3@}com} and
+@code{@{18@}xn--rksmrgs-5wao1o@{3@}com} respectively. Arguably, this
+leads to a better user experience, and suggests that the IDNA
+specification is sub-optimal in this area.
+
+@section Recommended Workaround
+
+One suggestion is to normalize the entire input string using NFKC
+before passing it to IDNA ToASCII. You may use
+@code{stringprep_utf8_nfkc_normalize} or
+@code{stringprep_ucs4_nfkc_normalize}. This appears to lead to
+behaviour similar to that of IE/Firefox, which would avoid the
+problem, but this needs to be confirmed. Feel free to discuss the
+issue with us.
+
+Alternative workarounds are being considered. Eventually Libidn may
+add a new flag to the @code{idna_*} functions that implements a
+recommended way to work around this problem.
+
+@node Copying Information
+@appendix Copying Information
+
+@menu
+* GNU Free Documentation License:: License for copying this manual.
+* GNU LGPL:: License for copying the library.
+* GNU GPL:: License for copying the programs.
+@end menu
+
+@node GNU Free Documentation License
+@appendixsec GNU Free Documentation License
+
+@cindex FDL, GNU Free Documentation License
+
+@include fdl-1.3.texi
+
+@node GNU LGPL
+@appendixsec GNU Lesser General Public License
+@cindex LGPL, GNU Lesser General Public License
+@cindex License, GNU LGPL
+
+@include lgpl-2.1.texi
+
+@node GNU GPL
+@appendixsec GNU General Public License
+@cindex GPL, GNU General Public License
+@cindex License, GNU GPL
+
+@include gpl-3.0.texi
+
+@node Function and Variable Index
+@unnumbered Function and Variable Index
+
+@printindex fn
+
+@node Concept Index
+@unnumbered Concept Index
+
+@printindex cp
+
+@bye