charset-name; alias, ...


Command-line Option



CHARSETALIASES defines aliases for character set names. For example, the charset iso-8859-1 is also known as latin1. Hence, latin1 is an alias for iso-8859-1 and can be defined as follows:

iso-8859-1; latin1

Each line of the CHARSETALIASES element is an alias definition. The syntax of an alias definition is as follows:

charset-name; alias, ...

That is, the character set name, followed by a semicolon, followed by a comma-separated list of aliases.

Specifying a character set multiple times is allowed. For example, the following are equivalent:

iso-8859-1; latin1, l1, iso_8859_1

iso-8859-1; latin1
iso-8859-1; l1
iso-8859-1; iso_8859_1

If the same alias is specified for two different charsets, the last one defined is used. For example, if the following is defined,

iso-8859-1; x-foo
koi8-u; x-foo

then x-foo will be an alias for koi8-u.
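The syntax and precedence rules above can be sketched in a few lines of Python. This is only an illustration of the documented behavior, not MHonArc's own (Perl) implementation; the function name parse_alias_definitions and the alias-to-charset dictionary are assumptions made for the example.

```python
def parse_alias_definitions(text):
    """Return a dict mapping each alias to its character set name."""
    alias_to_charset = {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue  # skip blank lines
        charset, sep, aliases = line.partition(";")
        if not sep:
            continue  # no semicolon: not an alias definition (ignored in this sketch)
        for alias in aliases.split(","):
            alias = alias.strip()
            if alias:
                # A later definition of the same alias replaces an earlier one.
                alias_to_charset[alias] = charset.strip()
    return alias_to_charset


# One line with several aliases, and the same aliases on separate lines.
combined = parse_alias_definitions("iso-8859-1; latin1, l1, iso_8859_1")
separate = parse_alias_definitions(
    "iso-8859-1; latin1\niso-8859-1; l1\niso-8859-1; iso_8859_1"
)

# The same alias given for two charsets: the last definition is kept.
conflict = parse_alias_definitions("iso-8859-1; x-foo\nkoi8-u; x-foo")
```

In this sketch, combined and separate hold the same mapping, and conflict maps x-foo to koi8-u.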

When MHonArc invokes CHARSETCONVERTERS filters, MHonArc maps aliases to real names before invoking the filters. Therefore, it is not necessary for a filter to know all possible names for a given character set.
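The following Python sketch illustrates why a filter only needs to be registered under the real charset name. The ALIASES and CONVERTERS tables and the functions resolve_charset and convert are hypothetical names used for illustration; they do not correspond to MHonArc internals.

```python
# Hypothetical alias table: alias -> real charset name.
ALIASES = {"latin1": "iso-8859-1", "l1": "iso-8859-1"}

# Hypothetical converter registry, keyed by real charset names only.
CONVERTERS = {"iso-8859-1": lambda data: data.decode("iso-8859-1")}


def resolve_charset(name):
    """Map an alias to its real charset name; other names pass through."""
    return ALIASES.get(name, name)


def convert(data, charset):
    """Resolve the charset name first, then call the matching converter."""
    real_name = resolve_charset(charset)
    return CONVERTERS[real_name](data)


# The converter is found even though the data is labeled with an alias.
text = convert(b"caf\xe9", "latin1")
```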

If the override attribute is specified for CHARSETALIASES, then any previous settings will be cleared. Otherwise, each occurrence of CHARSETALIASES will augment existing settings.
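The difference between augmenting and overriding can be modeled as follows. This is a minimal Python sketch under the assumption that the settings behave like an alias-to-charset table; apply_element and its parameters are hypothetical names, not part of MHonArc.

```python
def apply_element(current, definitions, override=False):
    """Return the alias table after one CHARSETALIASES element is processed.

    current and definitions both map alias -> charset name.
    """
    # With override, previous settings are cleared; otherwise they are kept.
    result = {} if override else dict(current)
    result.update(definitions)
    return result


existing = {"latin1": "iso-8859-1", "utf8": "utf-8"}

# Without override: the new alias is added to the existing settings.
augmented = apply_element(existing, {"x-foo": "koi8-u"})

# With override: only the aliases from this element remain.
replaced = apply_element(existing, {"x-foo": "koi8-u"}, override=True)
```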

Default Setting

us-ascii;	    ascii
us-ascii;	    ansi_x3.4-1968
us-ascii;	    iso646
us-ascii;	    iso646-us
us-ascii;	    iso646.irv:1991
us-ascii;	    cp367
us-ascii;	    ibm367
us-ascii;	    csascii
us-ascii;	    iso-ir-6
us-ascii;	    us
iso-8859-1;	    latin1
iso-8859-1;	    l1
iso-8859-1;	    iso_8859_1
iso-8859-1;	    iso_8859-1:1987
iso-8859-1;	    iso8859-1
iso-8859-1;	    iso8859_1
iso-8859-1;	    8859-1
iso-8859-1;	    8859_1
iso-8859-1;	    cp819
iso-8859-1;	    ibm819
iso-8859-1;	    x-mac-latin1
iso-8859-1;	    iso-ir-100
iso-8859-2;	    latin2
iso-8859-2;	    l2
iso-8859-2;	    iso_8859_2
iso-8859-2;	    iso_8859-2:1987
iso-8859-2;	    iso8859-2
iso-8859-2;	    iso8859_2
iso-8859-2;	    8859-2
iso-8859-2;	    8859_2
iso-8859-2;	    iso-ir-101
iso-8859-3;	    latin3
iso-8859-3;	    l3
iso-8859-3;	    iso_8859_3
iso-8859-3;	    iso_8859-3:1988
iso-8859-3;	    iso8859-3
iso-8859-3;	    iso8859_3
iso-8859-3;	    8859-3
iso-8859-3;	    8859_3
iso-8859-3;	    iso-ir-109
iso-8859-4;	    latin4
iso-8859-4;	    l4
iso-8859-4;	    iso_8859_4
iso-8859-4;	    iso_8859-4:1988
iso-8859-4;	    iso8859-4
iso-8859-4;	    iso8859_4
iso-8859-4;	    8859-4
iso-8859-4;	    8859_4
iso-8859-4;	    iso-ir-110
iso-8859-5;	    iso_8859-5:1988
iso-8859-5;	    cyrillic
iso-8859-5;	    iso-ir-144
iso-8859-6;	    iso_8859-6:1987
iso-8859-6;	    arabic
iso-8859-6;	    asmo-708
iso-8859-6;	    ecma-114
iso-8859-6;	    iso-ir-127
iso-8859-7;	    iso_8859-7:1987
iso-8859-7;	    greek
iso-8859-7;	    greek8
iso-8859-7;	    ecma-118
iso-8859-7;	    elot_928
iso-8859-7;	    iso-ir-126
iso-8859-8;	    iso-8859-8-i
iso-8859-8;	    iso_8859-8:1988
iso-8859-8;	    hebrew
iso-8859-8;	    iso-ir-138
iso-8859-9;	    latin5
iso-8859-9;	    l5
iso-8859-9;	    iso_8859_9
iso-8859-9;	    iso-8859_9:1989
iso-8859-9;	    iso8859-9
iso-8859-9;	    iso8859_9
iso-8859-9;	    8859-9
iso-8859-9;	    8859_9
iso-8859-9;	    iso-ir-148
iso-8859-10;	    latin6
iso-8859-10;	    l6
iso-8859-10;	    iso_8859_10
iso-8859-10;	    iso_8859-10:1993
iso-8859-10;	    iso8859-10
iso-8859-10;	    iso8859_10
iso-8859-10;	    8859-10
iso-8859-10;	    8859_10
iso-8859-10;	    iso-ir-157
iso-8859-13;	    latin7, l7
iso-8859-14;	    latin8, l8
iso-8859-15;	    latin9
iso-8859-15;	    latin0
iso-8859-15;	    l9
iso-8859-15;	    l0
iso-8859-15;	    iso_8859_15
iso-8859-15;	    iso8859-15
iso-8859-15;	    iso8859_15
iso-8859-15;	    8859-15
iso-8859-15;	    8859_15
iso-2022-jp;	    iso-2022-jp-1
utf-8;		    utf8
cp932;		    shiftjis
cp932;		    shift_jis
cp932;		    shift-jis
cp932;		    x-sjis
cp932;		    ms_kanji
cp932;		    csshiftjis
cp936;		    gbk
cp936;		    ms936
cp936;		    windows-936
cp949;		    euc-kr
cp949;		    ks_c_5601-1987
cp949;		    ks_c_5601-1989
cp949;		    ksc_5601
cp949;		    iso-ir-149
cp949;		    windows-949
cp949;		    ms949
cp949;		    korean
cp950;		    windows-950
cp1250;		    windows-1250
cp1251;		    windows-1251
cp1252;		    windows-1252
cp1253;		    windows-1253
cp1254;		    windows-1254
cp1255;		    windows-1255
cp1256;		    windows-1256
cp1257;		    windows-1257
cp1258;		    windows-1258
koi-0;		    gost-13052
koi8-e;		    iso-ir-111
koi8-e;		    ecma-113:1986
koi8-r;		    cp878
gost-19768-87;	    ecma-cyrillic
gost-19768-87;	    ecma-113
gost-19768-87;	    ecma-113:1988
big5-eten;	    big5
big5-eten;	    csbig5
big5-eten;	    tcs-big5
big5-eten;	    tcsbig5
big5-hkscs;	    big5hk
big5-hkscs;	    big5hkscs
big5-hkscs;	    hkscs-big5
big5-hkscs;	    hk-big5
gb2312;		    gb_2312-80
gb2312;		    csgb2312
gb2312;		    hz-gb-2312
gb2312;		    iso-ir-58
gb2312;		    euc-cn
gb2312;		    chinese
gb2312;		    csiso58gb231280
macarabic;          apple-arabic
maccentraleurroman; apple-centeuro
maccroatian;        apple-croatian
maccyrillic;        apple-cyrillic
macgreek;           apple-greek
machebrew;          apple-hebrew
macicelandic;       apple-iceland
macromanian;        apple-romanian
macroman;           apple-roman
macthai;            apple-thai
macturkish;         apple-turkish
macarabic;          x-mac-arabic
maccentraleurroman; x-mac-centraleurroman
maccroatian;        x-mac-croatian
maccyrillic;        x-mac-cyrillic
macgreek;           x-mac-greek
machebrew;          x-mac-hebrew
macicelandic;       x-mac-icelandic
macromanian;        x-mac-romanian
macroman;           x-mac-roman
macthai;            x-mac-thai
macturkish;         x-mac-turkish

Resource Variables



CHARSETALIASES is generally useful for resolving "unknown charset" warnings, which MHonArc generates when an MUA specifies a non-standard name for a charset.

Another use is to fool MHonArc into thinking that data labeled with one charset is actually data in another charset. For example, in some locales, MUAs improperly set the charset="..." parameter in text messages. CHARSETALIASES can be used to tell MHonArc to treat the improperly labeled data as a different charset during conversion. For example,

iso-8859-8; us-ascii

tells MHonArc to treat US-ASCII data as Hebrew.
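The effect of such an alias can be illustrated with a short Python sketch. Python's built-in codecs stand in for MHonArc's converters here, and the ALIASES table and decode_labeled function are hypothetical names for this example only.

```python
# Mirrors the definition "iso-8859-8; us-ascii".
ALIASES = {"us-ascii": "iso-8859-8"}

hebrew = "\u05e9\u05dc\u05d5\u05dd"   # sample Hebrew text
raw = hebrew.encode("iso-8859-8")     # 8-bit bytes as sent by the MUA
label = "us-ascii"                    # incorrect charset label on the message


def decode_labeled(data, charset):
    """Decode data after mapping the labeled charset through the alias table."""
    return data.decode(ALIASES.get(charset, charset))


# Taking the label at face value fails, since the bytes are not 7-bit ASCII.
try:
    raw.decode(label)
    label_is_accurate = True
except UnicodeDecodeError:
    label_is_accurate = False

# With the alias in place, the data is decoded as Hebrew.
decoded = decode_labeled(raw, label)
```

Note that an alias like this applies to every message labeled us-ascii, so it is best suited to archives where the mislabeling is consistent.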



See Also



$Date: 2003/10/06 22:04:16 $
Copyright © 2002, Earl Hood, mhonarc@mhonarc.org