File name: skfe.txt
Last updated: 2002-12-17 23:58
Type: Plain Text
Editor: Seiji Kaneko
Description: skf 1.91 English formatted man page
Language: English

SKF(1)                                                     SKF(1)



NAME
       skf - simple Kanji Filter

SYNOPSIS
       skf [-AEIJKNQRSXZabdehjknqrsuvxz] [-i multi_byte_charset ]
       [-o  single_byte_charset   ]   [   long_format_options   ]
       [infiles..]

DESCRIPTION
       skf is yet another i18n-capable kanji filter, which lets
       users read Japanese files written in various kanji codes
       on the Net.  It converts input kanji text or streams into
       a character stream in the designated kanji code and writes
       the result to standard output. Specifically, skf is
       intended to be a versatile filter for reading documents in
       various code sets, and does not have fancy features that
       are not directly related to code conversion (such as
       folding or MIME encoding support).

       Like nkf, skf automatically recognizes the input file code
       when it is some kind of ISO-2022 code, and also recognizes
       Microsoft JIS (SJIS) code and EUC if the input file does
       not include X0201 kana.  skf 1.9x can read various
       ISO-2022 compliant codesets, including JIS Kanji code
       (X0208, X0212 and X0213), EUC encoding, ISO European Latin
       sets (ISO-8859-1/2/3/4/6/7/10/11/14/15/16), BS 4730, NF Z
       62-010 and X0201 kana with ESC-(-I, SS0, Locking shift.
       skf also supports some non-ISO-2022 compliant sets,
       including Microsoft Shifted-JIS code, KOI-8-R/U, Unicode
       standard (UCS2/UTF-16, UTF7 and UTF8), X0221 JIS (2 octet
       only) and some vendor specific codes (KEIS83 and JEF).
       Supported output codesets are X-0208/X-0212 JIS, X-0201
       JIS, ASCII, EUC, Microsoft JIS and Unicode.
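
       As an illustration of why ISO-2022 input is easy to
       recognize, the following sketch encodes the same text in
       three of the encodings named above. It uses Python's
       standard codecs as a stand-in; it is not skf code, and
       the helper name has_iso2022_escape is hypothetical.

```python
# Illustration only: Python's standard codecs stand in for skf here.
text = "\u6f22\u5b57"  # the two kanji of the word "kanji"

jis = text.encode("iso2022_jp")   # 7-bit JIS with escape sequences
euc = text.encode("euc_jp")       # EUC: the same JIS codes with bit 8 set
sjis = text.encode("shift_jis")   # Microsoft JIS (Shift JIS)


def has_iso2022_escape(data):
    """Hypothetical helper: True if data carries an ISO-2022 designation.

    Only the two escape prefixes used in this example are checked; a real
    detector has to look at far more than this.
    """
    return b"\x1b$" in data or b"\x1b(" in data


print(jis)         # ESC $ B ... ESC ( B around the two-byte codes
print(euc.hex())
print(sjis.hex())
print(has_iso2022_escape(jis), has_iso2022_escape(euc))
```

       The escape sequences make the 7-bit JIS form
       self-describing, while EUC and SJIS have to be told apart
       by byte ranges alone.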

       Unlike nkf, skf is designed to  convert  input  code  into
       some kind of human-readable form under a local environment
       (i.e. codeset), and has several extra conversion features.
       Such  conversions  include Windows/Macintosh specific code
       swap and old-new jis glyph change, html-format/TeX  format
       conversion and variant unifications.

       If file names are specified, skf reads the files and
       writes the converted stream to stdout. If no file names
       are given, input is taken from stdin and output goes to
       stdout.  Options are taken from the environment variable
       SKFENV, then the environment variable skfenv, then the
       command line, in that order. Environment variables are
       not used when skf is executed as root.
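
       The option lookup order described above can be sketched
       as follows. This is a minimal model, not skf's parser:
       the function name collect_options is hypothetical, and
       splitting the variables with shlex is an assumption,
       since this page does not say how skf tokenizes them.

```python
import shlex


def collect_options(environ, argv, is_root=False):
    """Return option words in the order the text describes.

    SKFENV first, then skfenv, then the command line. Both environment
    variables are skipped when running as root. Tokenizing the variables
    with shlex is an assumption made for this sketch.
    """
    words = []
    if not is_root:
        for name in ("SKFENV", "skfenv"):
            words.extend(shlex.split(environ.get(name, "")))
    words.extend(argv)
    return words


env = {"SKFENV": "-e", "skfenv": "--lineend-lf"}
print(collect_options(env, ["-z", "input.txt"]))
print(collect_options(env, ["-z", "input.txt"], is_root=True))
```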

       skf does not use locale-related environment variables for
       conversion, but the output of error messages is controlled
       by the current locale settings.

OPTIONS
       skf is internally a different program from nkf. However,
       skf is intended to be a drop-in replacement for nkf (v1.4)
       and has a subset of the nkf options.
       skf 1.9x recognizes the following options.

       -u     use unbuffered output.

       -b     use buffered output. This is default.

Input/Output codeset settings
       -n -j  output  encoding  is  7-bit  JIS  code  using   JIS
              X0208(1983/1990) character set.

       -s -x  output   encoding   is   Microsoft  JIS  using  JIS
              X0208(1983/1990) character set.

       -a -e  output encoding is EUC using  JIS  X0208(1983/1990)
              character set.

       -q     output encoding is Unicode UTF-16 (v3.2). Output is
              little endian byte ordered by default, and includes
              an endian mark unless --suppress-endian-mark is
              specified. Code points beyond UCS2 are output as
              surrogate pairs unless --limit-to-ucs2 is specified.

       -z     output encoding is UTF-8 encoded Unicode (v3.2)

       -y     output encoding is UTF-7 encoded Unicode (v3.2)

       -k (experimental)
              output encoding and character set is KEIS83.


       -i_    use ESC-$-_ as the designate sequence for JIS Kanji
              (default is B). This setting is independent of the
              output codeset setting.

       -o_    use ESC-(-_ as the designate sequence for
              single-byte roman characters (default is J). This
              setting is independent of the output codeset
              setting and does not select the output codeset.

       -A, -E, --input-euc
              Assume input code set is EUC.

       --input-euc-x0213 (experimental)
              Assume  input code set is EUC with X-0213 1st plane
              extension.

       -N, --input-jis
              Assume input code set is JIS X-0208.

       -S, -X, --input-sjis
              Assume input code set is Microsoft JIS.

       --input-sjis-x0213 (experimental)
              Assume input code set is Microsoft JIS with  X-0213
              extension (i.e.  JIS X-0213 Shift encoding).

       -Q, --input-ucs2 --input-utf16
              Assume  input  code  set is Unicode UTF-16. Default
              endian is BIG, and byte  order  marking  is  recog-
              nized.

       -Y, --input-utf7
              Assume input code set is UTF-7 encoded Unicode.

       --no-utf7
              Assume  input  code set is *NOT* UTF-7 encoded Uni-
              code. This option disables input utf7 testing.

       -Z, --input-utf8
              Assume input code set is UTF-8 encoded Unicode.

       -K, --input-keis (experimental)
              Assume input code set is KEIS83 code.

       --input-jef (experimental)
              Assume input code set is JEF-ebcdik kana code.

       --input-jef-small (experimental)
              Assume input code set is  JEF  with  ebcdik  latin-
              small code.
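
       To make the -q description above concrete, the sketch
       below builds little endian UTF-16 with a leading endian
       mark and shows a surrogate pair. It relies on Python's
       codecs rather than skf, and the function name
       utf16_le_with_bom is hypothetical.

```python
import codecs
import struct


def utf16_le_with_bom(text):
    """Encode text as little endian UTF-16 preceded by a byte order mark."""
    return codecs.BOM_UTF16_LE + text.encode("utf-16-le")


# "A" followed by U+20000, a code point outside the UCS2 range.
data = utf16_le_with_bom("A\U00020000")

# View the bytes as 16-bit little endian code units.
units = struct.unpack("<%dH" % (len(data) // 2), data)
print([hex(u) for u in units])
```

       The first unit is the endian mark (0xFEFF), and the code
       point beyond UCS2 occupies two units, a high and a low
       surrogate.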


EXTENDED OPTIONS
       skf has various features to fit the output file to the
       local environment, and many of these are controlled by the
       extended control switches described in this section.


   X-0201 Kana handling
       skf by default converts X-0201 kana to X-0208 kana. To
       output X-0201 kana as is, use one of the following options.
       When the output is EUC or SJIS, these options enable
       X-0201 kana output in the way each code set provides. When
       Unicode output is specified, output of the equivalent kana
       part is controlled by --use-compat, not by the following
       switches.

       --kana-jis7
              use  SI/SO  locking  shift  sequence  to  designate
              X-0201 kana.

       --kana-jis8
              output X-0201 kana using 8-bit code right plane.

       --kana-esci --kana-call
              use ESC-(-I to designate X-0201 kana.

       --kana-enable
              use X-0201 kana when EUC (with G2) or  SJIS  output
              code  is  used.  When  JIS  output,  it  is same as
              --kana-call.
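
       The three JIS-side representations named above differ
       only in how the kana set is switched in. The sketch below
       shows the byte layout for each; it is an illustration
       under stated assumptions, not skf's output routine. The
       function name x0201_kana is hypothetical, and returning
       to ASCII with ESC-(-B after the ESC-(-I form is an
       assumption.

```python
import unicodedata

SO, SI = b"\x0e", b"\x0f"  # locking shift out / shift in


def x0201_kana(text, mode):
    """Return X-0201 kana bytes for half-width kana text.

    mode is "jis8" (8-bit right plane), "jis7" (SO/SI locking shift)
    or "esci" (ESC ( I designation). Hypothetical helper.
    """
    codes = text.encode("shift_jis")  # half-width kana map to 0xA1..0xDF
    if not all(0xA1 <= b <= 0xDF for b in codes):
        raise ValueError("expected half-width kana only")
    seven = bytes(b & 0x7F for b in codes)  # the same codes in 7 bits
    if mode == "jis8":
        return codes
    if mode == "jis7":
        return SO + seven + SI
    if mode == "esci":
        # Assumption: switch back to ASCII with ESC ( B afterwards.
        return b"\x1b(I" + seven + b"\x1b(B"
    raise ValueError("unknown mode: %r" % mode)


kana = "\uff71\uff72"  # half-width A and I
for mode in ("jis8", "jis7", "esci"):
    print(mode, x0201_kana(kana, mode))

# Unicode analogue of the default X-0201 to X-0208 kana conversion.
print(unicodedata.normalize("NFKC", kana))
```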


   JIS X-0212(Supplement Kanji code) Support
       --x0212-enable
              skf by default does not output JIS X-0212 code.
              This option enables use of the JIS X-0212 part. The
              output code set must be neither Microsoft code nor
              KEIS.  For Unicode variant encodings, this option
              is on by default.


   Latin code handling
       With Unicode(tm) family output codings, skf outputs the
       non-ASCII Latin character part as is, but with other output
       codings, skf converts these characters using the following
       rules:

       (1) If the code is defined in iso-8859-1 and
       --use-iso8859-1 is specified, it is output as is using
       iso-8859-1 as GR.
       (2) If html convert mode is enabled and the code is
       defined in the html/sgml codeset, it is converted to an
       html escape sequence.
       (3) If tex convert mode is enabled and the code is defined
       in the tex codeset, it is converted to tex format.
       (4) If the code is defined in X-0208/X0212, it is
       converted to X-0208/X0212 respectively.

       --use-iso8859-1
              Enable  iso-8859-1 output. Iso-8859-1 is invoked to
              G1 and set to GR plane. This  mode  is  cleared  by
              --reset.

       --convert-html --convert-sgml
              Enable html convert mode. This mode is disabled by
              --reset. These two options are aliases and are
              treated as the same option.

       --convert-html-decimal
              Enable  html  code-point decimal convert mode. This
              mode is cleared by --reset.

       --convert-html-hexadecimal
              Enable html code-point  hexadecimal  convert  mode.
              This mode is cleared by --reset.

       --convert-tex
              Enable  tex  convert  mode. This mode is cleared by
              --reset.
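
       The three html convert modes above correspond to the
       three ways HTML can name a character. The sketch below
       shows them side by side using Python's entity table. It
       does not reproduce skf's exact output: the function name
       to_html is hypothetical, and details such as the case of
       hexadecimal digits are assumptions.

```python
from html.entities import codepoint2name


def to_html(ch, mode="named"):
    """Render one character as ASCII-only HTML.

    mode is "named", "decimal" or "hex". Named mode falls back to a
    decimal reference when no entity name exists. Hypothetical helper.
    """
    cp = ord(ch)
    if cp < 0x80:
        return ch  # plain ASCII passes through
    if mode == "named" and cp in codepoint2name:
        return "&" + codepoint2name[cp] + ";"
    if mode == "hex":
        return "&#x%X;" % cp
    return "&#%d;" % cp


word = "caf\u00e9"
for mode in ("named", "decimal", "hex"):
    print(mode, "".join(to_html(c, mode) for c in word))
```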


   Codeset/Vendor Specific codeset handling flags
       skf by default assumes machine  specific  parts  of  kanji
       code  are  Microsoft  Windows  compatible.  Here  are some
       options that control this behavior.

       --disable-gaiji-support
              Assume machine specific part is undefined.

       --use-apple-gaiji
              Assume machine specific part in input file is  Mac-
              intosh(Kanjitalk7) compatible.

       --dsbl-ibm-gaiji
              Disable machine specific part in input file.

       --disable-chart
              Do  not use Moji-keisen characters. This is for old
              Macintosh system compatibility.

       --disable-jis90
              Disable 2 added characters of JIS X-0208(1990).  If
              this  option is specified, these two characters are
              replaced by Kanji variants.  This option is off  by
              default.

       --input-detect-jis78
              Distinguish   JIS   X-0208(1978)  codeset  and  JIS
              X-0208(1983/90) codeset. This option is valid  only
              when input encoding is JIS (ISO-2022).  This option
              needs -DDYNAMIC_LOADING at compile time.

       --output-jis78
              When  output,  codeset  for  JIS   table   is   JIS
              X-0208(1978).  This  option  is  valid  when output
              encoding is JIS, EUC or Microsoft code(cp932).

       --convert-jis78-jis83
              In the JIS X-0208 1983 revision, some characters in
              JIS X-0208(1978) were moved to JIS X-0212(1990).
              This switch tells skf to output these characters
              using their variants in X-0208(1983).


   ISO-2022 Specific controls
       --set-g0=`char_set'
              Set the code set predefined to plane 0 (G0). The
              supported `char_set' values are `ascii' (default)
              and `x0201'. It is automatically invoked to GL
              (iso-2022-jp-1/2/3 assumption).  This option works
              only with JIS input.

       --set-g1=`char_set'
              Set the code set predefined to the right plane (G1).
              The supported `char_set' values are `x0201'
              (default), `iso8859-1', `iso8859-2', `iso8859-3',
              `iso8859-7', `iso8859-14', `iso8859-15', `koi8-r'
              and `x0212'.  This option works only with JIS input.

       --set-g2=`char_set'
              Set the code set predefined to the G2 plane. The
              supported `char_set' values are `x0201' (default),
              `iso8859-1', `iso8859-2', `iso8859-3', `iso8859-7',
              `iso8859-14', `iso8859-15', `koi8-r' and `x0212'.
              This option works with EUC and JIS input.

       --set-g3=`char_set'
              Set the code set predefined to the G3 plane. The
              supported `char_set' values are `x0201' (default),
              `iso8859-1', `iso8859-2', `iso8859-3', `iso8859-7',
              `iso8859-14', `iso8859-15', `koi8-r' and `x0212'.
              This option works with EUC and JIS input.

       --euc-protect-g1
              In  EUC  input  mode,  suppress  sequences to set a
              charset to G1. Such sequences are discarded.

       --old-nec-compat
              Enable old NEC kanji sequence (ESC-K,H). Needs com-
              pile option -DOLD_NEC_COMPAT.

       --add-annon
              Add announcer for JIS X-0208(1990) to X-0208 desig-
              nate sequence. This option works only with JIS out-
              put.
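
       For readers unfamiliar with the G0 to G3 terminology, the
       sketch below labels each character of an EUC-JP byte
       string with the element it comes from in standard EUC-JP:
       ASCII in G0, JIS X 0208 in G1, and single-shift prefixes
       (0x8E, 0x8F) selecting G2 and G3. It describes the usual
       EUC-JP layout, not skf's internal tables or defaults, and
       the function name classify_euc_jp is hypothetical.

```python
def classify_euc_jp(data):
    """Label each character in EUC-JP bytes as G0, G1, G2 or G3.

    Assumes well-formed input; no error handling is attempted.
    """
    labels = []
    i = 0
    while i < len(data):
        b = data[i]
        if b < 0x80:          # ASCII, invoked in GL
            labels.append("G0")
            i += 1
        elif b == 0x8E:       # SS2 + one byte: X 0201 kana
            labels.append("G2")
            i += 2
        elif b == 0x8F:       # SS3 + two bytes: X 0212
            labels.append("G3")
            i += 3
        else:                 # two bytes in GR: X 0208
            labels.append("G1")
            i += 2
    return labels


# ASCII "A", an X 0208 kanji, a half-width kana, an X 0212 kanji.
sample = "A\u6f22\uff71\u4e02".encode("euc_jp")
print(sample.hex())
print(classify_euc_jp(sample))
```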


   Unicode coding specific control
       --use-compat
              When output is one of the transformation formats of
              the Unicode standard, enable characters in the
              compatibility plane (0xfxxx).  skf by default does
              not use these characters.

       --use-ms-compat
              When output is Unicode, make the conversion
              Microsoft Windows compatible. This only affects
              some symbols in JIS-Kanji, and adding the
              --use-compat option is recommended.

       --little-endian
              When  output  is  Unicode,  use little endian byte-
              order. This is default.

       --big-endian
              When output is Unicode, use big endian  byte-order.

       --suppress-endian-mark
              When  output  is  UTF-16,  do  not  use  byte order
              marking. To  make  UTF-8N,  use  this  option  with
              --little-endian. This is off by default.

       --enable-endian-mark
              When  output  is  UTF-8, output byte order marking.
              This is off by default.

       --input-little-endian
              When input  is  Unicode,  assume  input  is  little
              endian  byte-ordered.   This  is  default,  but skf
              respects byte-order mark.

       --input-big-endian
              When input is Unicode, assume input is  big  endian
              byte-ordered.   Note  that  skf respects byte-order
              mark.

       --endian-protect
              Do not act on an endian mark in the input stream.
              The endian mark is simply discarded.

       --use-replace-char
              skf  by  default  converts undefined (except 0x2xxx
              part) characters into "geta (U+3013)"  code.   This
              option   specifies  skf  to  use  replacement  char
              (0xfffc in UCS2) instead.

       --limit-to-ucs2
              Do not use > 0x10000 area  code  in  Unicode  (i.e.
              limit code to ucs2 area).

       --suppress-cjk-extension
              Treat CJK extension A/B area as undefined.

       --old-hangle-location
              Treat the U+3400 area as Hangul (Unicode 1.0
              compatibility).
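
       The input byte-order options above interact with the
       byte-order mark: an explicit mark wins, and the assumed
       order only applies when no mark is present. A minimal
       sketch of that rule follows. The function name
       decode_utf16 is hypothetical, and its default argument is
       a choice made for the example, not a statement about
       skf's default.

```python
import codecs


def decode_utf16(data, assume="little"):
    """Decode UTF-16 bytes, letting a byte order mark override `assume`.

    `assume` is "little" or "big" and is used only when the data does
    not start with a byte order mark. Hypothetical helper.
    """
    if data.startswith(codecs.BOM_UTF16_LE):
        return data[2:].decode("utf-16-le")
    if data.startswith(codecs.BOM_UTF16_BE):
        return data[2:].decode("utf-16-be")
    codec = "utf-16-le" if assume == "little" else "utf-16-be"
    return data.decode(codec)


# U+3042 (hiragana A) with and without a mark.
print(decode_utf16(b"\xfe\xff\x30\x42"))             # mark says big endian
print(decode_utf16(b"\x42\x30"))                     # no mark, assume little
print(decode_utf16(b"\x30\x42", assume="big"))       # no mark, assume big
```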


   Encoding controls
       --decode=`encoding scheme'
              Specify the encoding scheme of the input stream.
              The supported encoding schemes are `hex', `mime',
              `mime_q', `mime_b' and `rot47', which mean CAP hex
              code, MIME, MIME Q-encoding, MIME B-encoding and
              rot13/47 respectively. When MIME decoding is
              specified, the base text is assumed to be EUC
              encoded unless specified otherwise.
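
       Two of the schemes above are easy to show in a few lines.
       The sketch below implements rot47 over printable ASCII
       and decodes a MIME B-encoded word with Python's email
       package. It illustrates the schemes themselves, not skf's
       decoder; the function names rot47 and decode_mime_word
       are hypothetical, and the example takes the charset from
       the encoded word rather than assuming EUC.

```python
import base64
import codecs
from email.header import decode_header


def rot47(text):
    """Rotate printable ASCII (0x21..0x7E) by 47 positions."""
    out = []
    for ch in text:
        c = ord(ch)
        if 33 <= c <= 126:
            out.append(chr(33 + (c - 33 + 47) % 94))
        else:
            out.append(ch)  # spaces and non-ASCII are left alone
    return "".join(out)


def decode_mime_word(word):
    """Decode a MIME encoded word using the charset it declares."""
    parts = decode_header(word)
    return "".join(
        p.decode(cs or "ascii") if isinstance(p, bytes) else p
        for p, cs in parts
    )


print(rot47("Hello"))
print(codecs.encode("skf", "rot_13"))  # rot13 covers letters only

# Build a B-encoded word from 7-bit JIS bytes, then decode it again.
payload = base64.b64encode("\u6f22\u5b57".encode("iso2022_jp")).decode("ascii")
word = "=?ISO-2022-JP?B?" + payload + "?="
print(word)
print(decode_mime_word(word))
```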


   End of line controls
       --lineend-thru
              Output end of line codes as they are.  ^Z codes are
              also output as they are.  This is the default.

       --lineend-cr --lineend-mac
              Use  CR  as  end  of line code. Also delete ^Z code
              from input stream.

       --lineend-lf --lineend-unix
              Use LF as end of line code.  Also  delete  ^Z  code
              from input stream.

       --lineend-crlf --lineend-windows
              Use  CRLF  as end of line code. Also delete ^Z code
              from input stream.
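
       The four behaviours above can be modelled in a few lines.
       This is a sketch of the described behaviour on a byte
       string, not skf's stream implementation; the function
       name convert_lineend and its mode names are hypothetical.

```python
import re

EOL = {"cr": b"\r", "lf": b"\n", "crlf": b"\r\n"}


def convert_lineend(data, mode="thru"):
    """Rewrite line ends in data according to mode.

    "thru" returns the data unchanged, ^Z included. The other modes
    delete ^Z (0x1A) and replace every CRLF, CR or LF with one style.
    """
    if mode == "thru":
        return data
    data = data.replace(b"\x1a", b"")
    # CRLF is listed first so it is treated as a single line end.
    return re.sub(rb"\r\n|\r|\n", lambda m: EOL[mode], data)


mixed = b"a\r\nb\rc\n\x1a"
for mode in ("thru", "cr", "lf", "crlf"):
    print(mode, convert_lineend(mixed, mode))
```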


   File controls
       --filewise-detect --force-reset
              Reset and re-detect input code set at the start  of
              each file.

       --linewise-detect
              Reset  and re-detect input code set at the start of
              each line. This option needs -DKUNIMOTO at  compile
              time.


   Misc. Controls
       --suppress-space-convert
              skf by default converts an ideographic space into
              two ASCII spaces.  This option suppresses this
              behavior.

       --reset
              Reset  all flags specified by extended controls and
              given input code.

       --inquiry
              skf detects the input code and writes the detection
              result to stdout. No filtering output is performed.

       --show-filename
              When  inquiry(--inquiry)  is  on,  this option adds
              each file name to output. Enabled by  default  when
              multiple input files are specified.

       --invis-strip
              Delete   all  escape  sequences  not  belonging  to
              ISO-2022  code  extension.  This  is  intended   to
              replace  invisstrip  command bundled in inews pack-
              age.

       --html-sanitize
              Convert several characters in an HTML document to
              entity reference expressions.  Specifically, the
              characters "!#$&%()/<>:;?' are escaped as entity
              expressions.

       -I     Warn if input has unassigned code points.

       -v     print version and exit.

       -h     print brief help.
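
       The --html-sanitize entry above can be pictured as a
       single pass that replaces each listed character with a
       character reference. The sketch below is a guess at the
       shape of that conversion, not skf's actual output: the
       function name html_sanitize is hypothetical, it assumes
       both quote characters belong to the listed set, and it
       uses decimal references because this page does not say
       which form skf emits.

```python
# Assumption: the set includes both quote characters from the man page text.
SANITIZE_CHARS = "\"!#$&%()/<>:;?'"


def html_sanitize(text):
    """Replace each listed character with a decimal character reference.

    The text is processed in one pass, so the "&", "#" and ";" that the
    references themselves contain are never escaped a second time.
    """
    return "".join(
        "&#%d;" % ord(ch) if ch in SANITIZE_CHARS else ch
        for ch in text
    )


print(html_sanitize("<b>"))
print(html_sanitize("a&b"))
```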


FILES
       /usr/(local/)share/skf/lib/
              where external codeset conversion tables go. The
              location that the current skf assumes is shown by
              the -h option.


AUTHOR
       skf is written by Seiji Kaneko (skaneko@a2.mbn.or.jp),
       based on an idea from nkf written by Itaru Ichikawa
       (ichikawa@flab.fujitsu.co.jp).  The X-0213 code table is
       derived from the work of earthian@tama.or.jp.


ACKNOWLEDGEMENT
       skf is inspired by works or requests by
       shinoda@cs.titech,    kato@cs.titech,   uematsu@cs.titech,
       void@global ohta@ricoh,  Hinata(HKE)  Ashizawa(CRL)  Kuni-
       moto(SDL)


BUGS AND LIMITATIONS
       1. skf can handle mixed coding with some limitations.
       However, code detection easily fails for mixed code, and
       giving an explicit input code set is strongly encouraged.
       In case of emergency, the --linewise-detect option may
       help.

       2. When using UCS2, UTF-16, UTF-8 and UTF-7, skf tries to
       detect the input code, but giving an explicit code set is
       encouraged.  skf doesn't support UCS4, but does support
       UTF-16/UTF-32 (i.e. surrogate pairs).  skf just passes
       composite characters to output; no further processing is
       performed.

       3. skf implements ISO-2022 with the following exceptions:

        (1) GL 0x20 is always space.

        (2) If an unknown sequence is given to G[0-3], G[0-3] is
       set to ascii, and locking/single shift is cleared.

        (3) The standard return sequence is ignored.

        (4) Sequences related to C1 and C2 are ignored.

        (5) Sequences for 96 character multibyte coding are
       ignored.

       4. Since skf by default tests input to detect utf7 coding,
       skf sometimes misdetects pure ascii text as utf7.  If this
       occurs, use the --no-utf7 option.

       5. The coding of error output is controlled by the LOCALE
       environment variables on UN*X systems. Since skf can't
       tell whether stdout and stderr are redirected into the
       same stream, the user must take care of this case.

       6. IBM CCSID 1394 is not supported.

       7. skf-1.91 converts KEIS/JIS X-0213 code using CJK-exten-
       sion B and CJK compatibility area. For this reason, X-0213
       and  KEIS  convert result varies depending on --use-compat
       and --limit-to-ucs2 switches.

       8. Current external table format supports only UCS2  char-
       acters.

       9. JIS X-0207(1979) is not supported. JIS X-0211(1987) is
       designed to be supported (i.e. common terminal control
       sequences are transparently passed to output).
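
       The utf7 misdetection mentioned in this list is easy to
       reproduce in principle, because UTF-7 is built entirely
       from ASCII characters. The sketch below, which uses
       Python's codecs and not skf, shows one byte string that
       is valid both as plain ASCII and as UTF-7 with a
       different meaning.

```python
# The same seven bytes, read two ways.
data = b"A+MEI-B"

as_ascii = data.decode("ascii")   # plain text containing "+" and "-"
as_utf7 = data.decode("utf-7")    # "+MEI-" is a UTF-7 shifted sequence

print(repr(as_ascii))
print(repr(as_utf7))
```

       Nothing in the bytes themselves says which reading was
       intended, so a detector that tests for UTF-7 has to
       guess.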


Note
       1.  Extended options have changed extensively since
       skf-1.3.  Some archaic options (e.g. -B, -@ and -r) have
       been deleted from this version.

       2.  From  version 1.9, default code set assumed by skf has
       changed to JIS X-0208(1990) with Microsoft  Japanese  Win-
       dows gaiji (i.e. CP932).

       3.  From version 1.9, skf supports iso8859 and other
       charsets by using Unicode as the internal code set. For
       this reason, skf-1.9 behaves differently from earlier
       versions.

       4. Code autodetection is not perfect by design. If it
       fails to detect the input code properly, please specify
       the input code explicitly.

       5. Some ligatures in Unicode, cp932 gaiji and KEIS83 are
       converted using JIS X-0124 and other conventions. During
       this conversion, byte length is not preserved.

       6. skf is intended to pass ANSI compatible  terminal  con-
       trol code transparently, but this is not guaranteed.

       7.  There  are  some  undocumented  options. These options
       should be considered as highly experimental.


Notice
       Unicode(TM) is a trademark of Unicode, Inc. Microsoft and
       Windows are registered trademarks of Microsoft
       Corporation.  Macintosh is a registered trademark of Apple
       Computer Inc. Other names and terms may be trademarks or
       registered trademarks of their respective owners.  The
       trademark symbol (TM) is omitted in this manual page.



                           09/MAY/2002                     SKF(1)