Diffstat (limited to 'man/man1/dos2unix.htm')
-rw-r--r-- | man/man1/dos2unix.htm | 67 |
1 files changed, 61 insertions, 6 deletions
diff --git a/man/man1/dos2unix.htm b/man/man1/dos2unix.htm
index 8fbc87c..d2013b0 100644
--- a/man/man1/dos2unix.htm
+++ b/man/man1/dos2unix.htm
@@ -2,12 +2,12 @@
 <!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN" "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
 <html xmlns="http://www.w3.org/1999/xhtml">
 <head>
-<title>dos2unix 7.2 - DOS/MAC to UNIX and vice versa text file format converter</title>
+<title>dos2unix 7.3 - DOS/MAC to UNIX and vice versa text file format converter</title>
 <meta http-equiv="content-type" content="text/html; charset=utf-8" />
-<link rev="made" href="mailto:root@localhost" />
+<link rev="made" href="mailto:ASSI@cygwin.nonet" />
 </head>
-<body style="background-color: white">
+<body>
@@ -23,6 +23,7 @@
  <li><a href="#Encodings">Encodings</a></li>
  <li><a href="#Conversion">Conversion</a></li>
  <li><a href="#Byte-Order-Mark">Byte Order Mark</a></li>
+ <li><a href="#Unicode-file-names-on-Windows">Unicode file names on Windows</a></li>
  <li><a href="#Unicode-examples">Unicode examples</a></li>
  </ul>
  </li>
@@ -143,6 +144,50 @@
 <p>Set conversion mode. Where CONVMODE is one of: <i>ascii</i>, <i>7bit</i>, <i>iso</i>, <i>mac</i> with ascii being the default.</p>
 </dd>
+<dt id="D---display-enc-ENCODING"><b>-D, --display-enc ENCODING</b></dt>
+<dd>
+
+<p>Set encoding of displayed text. Where ENCODING is one of: <i>ansi</i>, <i>unicode</i>, <i>utf8</i> with ansi being the default.</p>
+
+<p>This option is only available in dos2unix for Windows with Unicode file name support. This option has no effect on the actual file names read and written, only on how they are displayed.</p>
+
+<p>There are several methods for displaying text in a Windows console based on the encoding of the text. They all have their own advantages and disadvantages.</p>
+
+<dl>
+
+<dt id="ansi"><b>ansi</b></dt>
+<dd>
+
+<p>Dos2unix's default method is to use ANSI encoded text. The advantage is that it is backwards compatible. It works with raster and TrueType fonts. In some regions you may need to change the active DOS OEM code page to the Windows system ANSI code page using the <code>chcp</code> command, because dos2unix uses the Windows system code page.</p>
+
+<p>The disadvantage of ansi is that international file names with characters not inside the system default code page are not displayed properly. You will see a question mark, or a wrong symbol instead. When you don't work with foreign file names this method is OK.</p>
+
+</dd>
+<dt id="unicode"><b>unicode</b></dt>
+<dd>
+
+<p>The advantage of unicode (the Windows name for UTF-16) encoding is that text is usually properly displayed. There is no need to change the active code page. You may need to set the console's font to a TrueType font to have international characters displayed properly. When a character is not included in the TrueType font you usually see a small square, sometimes with a question mark in it.</p>
+
+<p>When you use the ConEmu console all text is displayed properly, because ConEmu automatically selects a good font.</p>
+
+<p>The disadvantage of unicode is that it is not compatible with ASCII. The output is not easy to handle when you redirect it to another program or a file. Redirection to a file does not give a correct UTF-16 file.</p>
+
+</dd>
+<dt id="utf8"><b>utf8</b></dt>
+<dd>
+
+<p>The advantage of utf8 is that it is compatible with ASCII and when you redirect it to a file you get a proper UTF-8 file. You need to set the console's font to a TrueType font. With a TrueType font the text is displayed similar as with the <code>unicode</code> encoding.</p>
+
+<p>The disadvantage is that when you use the default raster font all non-ASCII characters are displayed wrong. Not only unicode file names, but also translated messages become unreadable. On Windows configured for an East-Asian region you may see a lot of flickering of the console when the messages are displayed.</p>
+
+<p>In a ConEmu console the utf8 encoding method works well.</p>
+
+</dd>
+</dl>
+
+<p>The default encoding can be changed with environment variable DOS2UNIX_DISPLAY_ENC by setting it to <code>unicode</code> or <code>utf8</code>.</p>
+
+</dd>
 <dt id="f---force"><b>-f, --force</b></dt>
 <dd>
@@ -275,7 +320,7 @@
 <p>When the input file is UTF-16, and the option <code>-u</code> is used, an UTF-16 BOM will be written.</p>
-<p>Never use this option when the output encoding is other than UTF-8 or UTF-16. See also section UNICODE.</p>
+<p>Never use this option when the output encoding is other than UTF-8, UTF-16, or GB18030. See also section UNICODE.</p>
 </dd>
 <dt id="n---newfile-INFILE-OUTFILE"><b>-n, --newfile INFILE OUTFILE ...</b></dt>
@@ -486,6 +531,12 @@
 <p>Dos2unix and unix2dos write always a BOM when option <code>-m</code> is used.</p>
+<h2 id="Unicode-file-names-on-Windows">Unicode file names on Windows</h2>
+
+<p>Dos2unix has optional support for reading and writing Unicode file names in the Windows Command Prompt. That means that dos2unix can open files that have characters in the name that are not part of the default system ANSI code page. To see if dos2unix for Windows was built with Unicode file name support type <code>dos2unix -V</code>.</p>
+
+<p>There are some issues with displaying Unicode file names in a Windows console. See option <code>-D</code>, <code>--display-enc</code>. The file names may be displayed wrongly in the console, but the files will be written with the correct name.</p>
+
 <h2 id="Unicode-examples">Unicode examples</h2>
 <p>Convert from Windows UTF-16 (with BOM) to Unix UTF-8:</p>
@@ -510,7 +561,7 @@
 <p>GB18030 is fully compatible with Unicode, and can be considered an unicode transformation format. Like UTF-8, GB18030 is compatible with ASCII. GB18030 is also compatible with Windows code page 936, also known as GBK.</p>
-<p>On Unix/Linux UTF-16 files are converted to GB18030 when the locale encoding is set to GB18030. Note that this will only work if the location is set to China. E.g. in an English British locale setting <code>en_GB.GB18030</code> conversion of UTF-16 to GB18030 will not work, but in a Chinese <code>zh_CN.GB18030</code> locale setting it will work.</p>
+<p>On Unix/Linux UTF-16 files are converted to GB18030 when the locale encoding is set to GB18030. Note that this will only work if the locale is supported by the system. Use command <code>locale -a</code> to get the list of supported locales.</p>
 <p>On Windows you need to use option <code>-gb</code> to convert UTF-16 files to GB18030.</p>
@@ -574,7 +625,11 @@
 <p>Use dos2unix in combination with the find(1) and xargs(1) commands to recursively convert text files in a directory tree structure. For instance to convert all .txt files in the directory tree under the current directory type:</p>
-<pre><code> find . -name *.txt |xargs dos2unix</code></pre>
+<pre><code> find . -name '*.txt' |xargs dos2unix</code></pre>
+
+<p>In a Windows Command Prompt the following command can be used:</p>
+
+<pre><code> for /R %G in (*.txt) do dos2unix "%G"</code></pre>
 <h1 id="LOCALIZATION">LOCALIZATION</h1>
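
Note on the last hunk: quoting '*.txt' stops the shell from expanding the pattern before find sees it, but the find | xargs pipeline still splits its input on whitespace, so a file name containing a space would reach dos2unix in pieces. Below is a minimal sketch of a null-delimited variant, assuming a find and xargs that support -print0 and -0 (GNU and BSD versions do). The helper name convert_tree and its optional second argument are hypothetical and are not part of dos2unix or of this commit.

```shell
# convert_tree DIR [CONVERTER]
# Hypothetical helper: run CONVERTER (default: dos2unix) on every .txt
# file below DIR. File names are passed NUL-separated, so names that
# contain spaces or newlines arrive at the converter intact.
convert_tree() {
    dir=${1:-.}
    converter=${2:-dos2unix}
    find "$dir" -type f -name '*.txt' -print0 | xargs -0 "$converter"
}
```

Calling `convert_tree .` converts the tree under the current directory. Passing `echo` as the second argument gives a dry run that only lists the files that would be converted.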