LaTeX Encoding

LaTeX fundamentally works with three text encodings:

  1. The input encoding, used to interpret the files that are supplied to LaTeX and TeX
  2. The internal character representation, used within the programs
  3. The font encoding, used to translate bytes into glyphs

Most reasonable users of LaTeX are only interested in the third item.
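
As a minimal sketch of where the first and third encodings are chosen in a pdfLaTeX document, the preamble below selects each explicitly. The choice of T1 as the font encoding is an assumption made for illustration. On LaTeX releases from 2018 onwards, UTF-8 is already the default input encoding, so the inputenc line is redundant there but harmless.

```latex
\documentclass{article}
\usepackage[utf8]{inputenc} % input encoding: how the bytes of this file are read
\usepackage[T1]{fontenc}    % font encoding: how characters map to glyph slots

\begin{document}
% Both spellings should typeset the same accented letter.
na\"ive and naïve
\end{document}
```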


A Great Big Caveat

LuaTeX and XeTeX are fantastic projects that are taking the world of TeX and LaTeX into the future. Both run natively in Unicode, and both can use modern libraries like HarfBuzz for text shaping and can load fonts installed on the system. If you have the flexibility, consider using LuaLaTeX or XeLaTeX rather than pdfLaTeX.
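
A rough sketch of what this looks like in practice, assuming the fontspec package is installed. The font name in the commented line is a hypothetical example and must match a font that is actually available on the system.

```latex
% Compile with lualatex or xelatex, not pdflatex.
\documentclass{article}
\usepackage{fontspec} % Unicode-aware font selection for LuaTeX and XeTeX

% Hypothetical system font; uncomment only if it is installed.
% \setmainfont{Noto Serif}

\begin{document}
% UTF-8 input is read directly; no inputenc or fontenc is needed.
naïve, café, Dvořák
\end{document}
```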


CJK

Chinese, Japanese, and Korean occupy a closely related segment of the Unicode universe, and so are commonly bunched together in distributed software. Because CJK typesetting is used by only a niche community of LaTeX users, there's often only one good way of handling CJK text in LaTeX.

Quick Start

\documentclass{article}
\usepackage{CJKutf8}

% ...

\begin{document}
This is some `normal' text, while \begin{CJK}{UTF8}{min}これは悪魔のもの\end{CJK}.
\end{document}

CJK Fonts

The latex-cjk-japanese-wadalab package distributes three fonts for use as above: min (Mincho, roughly equivalent to serif, lit. "Ming dynasty"), goth (Gothic, roughly equivalent to sans-serif), and maru (rounded, lit. "circle").
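
To illustrate, the three families can be selected by changing the last argument of the CJK environment. This is a sketch that assumes the Wadalab fonts are installed. The sample strings are made up for the example.

```latex
\documentclass{article}
\usepackage{CJKutf8}

\begin{document}
% The third argument names the font family.
Mincho: \begin{CJK}{UTF8}{min}明朝体\end{CJK}

Gothic: \begin{CJK}{UTF8}{goth}ゴシック体\end{CJK}

Rounded: \begin{CJK}{UTF8}{maru}丸ゴシック体\end{CJK}
\end{document}
```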

The most widely available Chinese fonts are gbsn or gkai for Simplified Chinese, and bsmi or bkai for Traditional Chinese.
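
The Chinese fonts are selected the same way. The sketch below assumes the corresponding font packages are installed. The sample strings are illustrative only.

```latex
\documentclass{article}
\usepackage{CJKutf8}

\begin{document}
% Simplified Chinese
\begin{CJK}{UTF8}{gbsn}简体中文\end{CJK} and
\begin{CJK}{UTF8}{gkai}简体中文\end{CJK}

% Traditional Chinese
\begin{CJK}{UTF8}{bsmi}繁體中文\end{CJK} and
\begin{CJK}{UTF8}{bkai}繁體中文\end{CJK}
\end{document}
```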



LaTeX/Encoding (last edited 2022-05-11 17:42:24 by DominicRicottone)