mirror of
https://github.com/tcltk/tcl.git
synced 2026-05-29 00:27:49 +08:00
179 lines
6.5 KiB
Plaintext
179 lines
6.5 KiB
Plaintext
'\"
|
|
'\" Copyright (c) 1998 Scriptics Corporation.
|
|
'\"
|
|
'\" See the file "license.terms" for information on usage and redistribution
|
|
'\" of this file, and for a DISCLAIMER OF ALL WARRANTIES.
|
|
'\"
|
|
.TH encoding n "8.1" Tcl "Tcl Built-In Commands"
|
|
.so man.macros
|
|
.BS
|
|
.SH NAME
|
|
encoding \- Manipulate encodings
|
|
.SH SYNOPSIS
|
|
\fBencoding \fIoption\fR ?\fIarg arg ...\fR?
|
|
.BE
|
|
.SH INTRODUCTION
|
|
.PP
|
|
Strings in Tcl are logically a sequence of Unicode characters.
|
|
These strings are represented in memory as a sequence of bytes that
|
|
may be in one of several encodings: modified UTF\-8 (which uses 1 to 4
|
|
bytes per character), or a custom encoding start as 8 bit binary data.
|
|
.PP
|
|
Different operating system interfaces or applications may generate
|
|
strings in other encodings such as Shift\-JIS. The \fBencoding\fR
|
|
command helps to bridge the gap between Unicode and these other
|
|
formats.
|
|
.SH DESCRIPTION
|
|
.PP
|
|
Performs one of several encoding related operations, depending on
|
|
\fIoption\fR. The legal \fIoption\fRs are:
|
|
.TP
|
|
\fBencoding convertfrom\fR ?\fB-nocomplain\fR|\fB-strict\fR|\fB-failindex var\fR? ?\fIencoding\fR? \fIdata\fR
|
|
.
|
|
Convert \fIdata\fR to a Unicode string from the specified \fIencoding\fR. The
|
|
characters in \fIdata\fR are 8 bit binary data. The resulting
|
|
sequence of bytes is a string created by applying the given \fIencoding\fR
|
|
to the data. If \fIencoding\fR is not specified, the current
|
|
system encoding is used.
|
|
.VS "TCL8.7 TIP346, TIP607, TIP601"
|
|
.PP
|
|
.RS
|
|
The command does not fail on encoding errors. Instead, any not convertable bytes
|
|
(like incomplete UTF-8 sequences, see example below) are put as byte values into
|
|
the output stream.
|
|
.PP
|
|
If the option \fB-failindex\fR with a variable name is given, the error reporting
|
|
is changed in the following manner:
|
|
in case of a conversion error, the position of the input byte causing the error
|
|
is returned in the given variable. The return value of the command are the
|
|
converted characters until the first error position.
|
|
In case of no error, the value \fI-1\fR is written to the variable. This option
|
|
may not be used together with \fB-nocomplain\fR.
|
|
.PP
|
|
The option \fB-nocomplain\fR has no effect and is available for compatibility with
|
|
TCL 9. In TCL 9, the encoding command fails with an error on any encoding issue.
|
|
This switch restores the TCL8.7 behaviour.
|
|
.PP
|
|
The \fB-strict\fR option follows more strict rules in conversion. For the \fButf-8\fR
|
|
encoder, it disallows invalid byte sequences and surrogates (which -
|
|
otherwise - are just passed through).
|
|
.VE "TCL8.7 TIP346, TIP607, TIP601"
|
|
.RE
|
|
.TP
|
|
\fBencoding convertto\fR ?\fB-nocomplain\fR|\fB-strict\fR|\fB-failindex var\fR? ?\fIencoding\fR? \fIstring\fR
|
|
.
|
|
Convert \fIstring\fR from Unicode to the specified \fIencoding\fR.
|
|
The result is a sequence of bytes that represents the converted
|
|
string. Each byte is stored in the lower 8-bits of a Unicode
|
|
character (indeed, the resulting string is a binary string as far as
|
|
Tcl is concerned, at least initially). If \fIencoding\fR is not
|
|
specified, the current system encoding is used.
|
|
.VS "TCL8.7 TIP346, TIP607, TIP601"
|
|
.PP
|
|
.RS
|
|
The command does not fail on encoding errors. Instead, the replacement character
|
|
\fB?\fR is output for any not representable character (like the dot \fB\\U2022\fR
|
|
in \fBiso-8859-1\fR encoding, see example below).
|
|
.PP
|
|
If the option \fB-failindex\fR with a variable name is given, the error reporting
|
|
is changed in the following manner:
|
|
in case of a conversion error, the position of the input character causing the error
|
|
is returned in the given variable. The return value of the command are the
|
|
converted bytes until the first error position. No error condition is raised.
|
|
In case of no error, the value \fI-1\fR is written to the variable. This option
|
|
may not be used together with \fB-nocomplain\fR.
|
|
.PP
|
|
The option \fB-nocomplain\fR has no effect and is available for compatibility with
|
|
TCL 9. In TCL 9, the encoding command fails with an error on any encoding issue.
|
|
This switch restores the TCL8.7 behaviour.
|
|
.PP
|
|
The \fB-strict\fR option follows more strict rules in conversion. For the \fButf-8\fR
|
|
encoder, it disallows surrogates (which - otherwise - are just passed through).
|
|
.VE "TCL8.7 TIP346, TIP607, TIP601"
|
|
.RE
|
|
.TP
|
|
\fBencoding dirs\fR ?\fIdirectoryList\fR?
|
|
.
|
|
Tcl can load encoding data files from the file system that describe
|
|
additional encodings for it to work with. This command sets the search
|
|
path for \fB*.enc\fR encoding data files to the list of directories
|
|
\fIdirectoryList\fR. If \fIdirectoryList\fR is omitted then the
|
|
command returns the current list of directories that make up the
|
|
search path. It is an error for \fIdirectoryList\fR to not be a valid
|
|
list. If, when a search for an encoding data file is happening, an
|
|
element in \fIdirectoryList\fR does not refer to a readable,
|
|
searchable directory, that element is ignored.
|
|
.TP
|
|
\fBencoding names\fR
|
|
.
|
|
Returns a list containing the names of all of the encodings that are
|
|
currently available.
|
|
The encodings
|
|
.QW utf-8
|
|
and
|
|
.QW iso8859-1
|
|
are guaranteed to be present in the list.
|
|
.TP
|
|
\fBencoding system\fR ?\fIencoding\fR?
|
|
.
|
|
Set the system encoding to \fIencoding\fR. If \fIencoding\fR is
|
|
omitted then the command returns the current system encoding. The
|
|
system encoding is used whenever Tcl passes strings to system calls.
|
|
.SH EXAMPLE
|
|
.PP
|
|
Example 1: convert a byte sequence in Japanese euc-jp encoding to a TCL string:
|
|
.PP
|
|
.CS
|
|
set s [\fBencoding convertfrom\fR euc-jp "\exA4\exCF"]
|
|
.CE
|
|
.PP
|
|
The result is the unicode codepoint:
|
|
.QW "\eu306F" ,
|
|
which is the Hiragana letter HA.
|
|
.VS "TCL8.7 TIP346, TIP607, TIP601"
|
|
.PP
|
|
Example 2: detect the error location in an incomplete UTF-8 sequence:
|
|
.PP
|
|
.CS
|
|
% set s [\fBencoding convertfrom\fR -failindex i utf-8 "A\exC3"]
|
|
A
|
|
% set i
|
|
1
|
|
.CE
|
|
.PP
|
|
Example 3: return the incomplete UTF-8 sequence by raw bytes:
|
|
.PP
|
|
.CS
|
|
% set s [\fBencoding convertfrom\fR -nocomplain utf-8 "A\exC3"]
|
|
.CE
|
|
The result is "A" followed by the byte \exC3. The option \fB-nocomplain\fR
|
|
has no effect, but assures to get the same result with TCL9.
|
|
.PP
|
|
Example 4: detect the error location while transforming to ISO8859-1
|
|
(ISO-Latin 1):
|
|
.PP
|
|
.CS
|
|
% set s [\fBencoding convertto\fR -failindex i utf-8 "A\eu0141"]
|
|
A
|
|
% set i
|
|
1
|
|
.CE
|
|
.PP
|
|
Example 5: replace a not representable character by the replacement character:
|
|
.PP
|
|
.CS
|
|
% set s [\fBencoding convertto\fR -nocomplain utf-8 "A\eu0141"]
|
|
A?
|
|
.CE
|
|
The option \fB-nocomplain\fR has no effect, but assures to get the same result
|
|
with TCL9.
|
|
.VE "TCL8.7 TIP346, TIP607, TIP601"
|
|
.PP
|
|
.SH "SEE ALSO"
|
|
Tcl_GetEncoding(3), fconfigure(n)
|
|
.SH KEYWORDS
|
|
encoding, unicode
|
|
.\" Local Variables:
|
|
.\" mode: nroff
|
|
.\" End:
|