American Standard Code for Information Interchange
The basis of character sets used in almost all present-day computers. US-ASCII
uses only the lower seven bits (code points 0 to 127) to convey some
control codes, space, digits, most basic punctuation, and unaccented letters
a-z and A-Z. More modern coded character sets (e.g., Latin-1, Unicode) define
extensions to ASCII for values above 127 for conveying special Latin characters
(like accented characters, or German ess-tsett), characters from non-Latin
writing systems (e.g., Cyrillic, or Han characters), and such desirable glyphs
as distinct open- and close-quotation marks. ASCII replaced earlier systems such
as EBCDIC and Baudot (a five-bit code), each of which was broken in its own
way.
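As a rough illustration of the seven-bit range and its extensions, the following Python sketch checks that plain ASCII text encodes to the same bytes under ASCII, Latin-1, and UTF-8, and that the German ess-tsett falls outside the ASCII range. The helper name `fits_in_ascii` is made up for this example.

```python
# Sketch: ASCII occupies code points 0-127; Latin-1 and Unicode extend it.
ascii_text = "Hello, world"

# Text made only of code points below 128 encodes to identical bytes
# under ASCII, Latin-1, and UTF-8.
ascii_bytes = ascii_text.encode("ascii")
same_in_all = (
    ascii_bytes == ascii_text.encode("latin-1") == ascii_text.encode("utf-8")
)

# German ess-tsett (U+00DF) lies above 127, so it needs an extension.
eszett = "\u00df"
eszett_code_point = ord(eszett)            # a value above 127
eszett_latin1 = eszett.encode("latin-1")   # one byte in Latin-1
eszett_utf8 = eszett.encode("utf-8")       # two bytes in UTF-8


def fits_in_ascii(text):
    """Hypothetical helper: True if every character is in 0-127."""
    try:
        text.encode("ascii")
        return True
    except UnicodeEncodeError:
        return False
```

Here `fits_in_ascii("abc")` is true, while `fits_in_ascii(eszett)` is false because the character cannot be expressed in seven bits.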
Computers are much pickier about spelling than humans; thus, hackers need to be
very precise when talking about characters, and have developed a considerable
amount of verbal shorthand for them. Every character has one or more names:
some formal, some concise, some silly.
Individual characters are listed in this dictionary with alternative names from
revision 2.3 of the Usenet ASCII pronunciation guide in rough order of
popularity, including their official ITU-T names and the particularly silly
names introduced by INTERCAL.
See ampersand, asterisk, back quote, backslash, caret, colon, comma,
commercial at, control-C, dollar, dot, double quote, equals, exclamation mark,
greater than, hash, left bracket, left parenthesis, less than, minus,
parentheses, oblique stroke, percent, plus, question mark, right brace,
right bracket, right parenthesis, semicolon, single quote, space, tilde,
underscore, vertical bar, zero.
Some other common usages cause odd overlaps. The "#", "$", ">", and "&"
characters, for example, are all pronounced "hex" in different communities
because various assemblers use them as a prefix tag for hexadecimal constants
(in particular, "#" in many assembler-programming cultures, "$" in the 6502
world, ">" at Texas Instruments, and "&" on the BBC Micro, Acorn Archimedes,
Sinclair, and some Zilog Z80 machines). See also splat.
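To make the overlap concrete, here is a minimal Python sketch that accepts any of the four prefix characters named above as a marker for a hexadecimal constant. The function name `parse_hex_constant` and the choice to treat all four prefixes alike are assumptions for illustration; real assemblers each accept only their own prefix and differ in other syntax details.

```python
# Prefix characters that various assemblers use to tag hexadecimal
# constants, as listed in the text above.
HEX_PREFIXES = ("#", "$", ">", "&")


def parse_hex_constant(token):
    """Hypothetical helper: return the value of a prefixed hex constant."""
    if len(token) < 2 or token[0] not in HEX_PREFIXES:
        raise ValueError(f"not a prefixed hex constant: {token!r}")
    # Everything after the one-character prefix is read as base 16.
    return int(token[1:], 16)
```

With this sketch, `parse_hex_constant("$FF")` and `parse_hex_constant("&FF")` yield the same integer, which is why each community can call its own prefix "hex".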
The inability of US-ASCII to correctly represent nearly any language other than
English became an obvious and intolerable misfeature as computer use outside the
US and UK became the rule rather than the exception (see software rot). And so
national extensions to US-ASCII were developed, such as Latin-1.
Hardware and software from the US still tend to embody the assumption that
US-ASCII is the universal character set and that words of text consist entirely
of byte values 65-90 and 97-122 (A-Z and a-z); this is a major irritant to
people who want to use a character set suited to their own languages.
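The assumption described above can be sketched in Python as a word test that accepts only code points 65-90 and 97-122, compared with a Unicode-aware test. Both function names are invented for this example, and `str.isalpha` is only one possible definition of what counts as a letter.

```python
def is_word_ascii_only(word):
    """Naive check: every character is in 65-90 or 97-122 (A-Z, a-z)."""
    return bool(word) and all(
        65 <= ord(ch) <= 90 or 97 <= ord(ch) <= 122 for ch in word
    )


def is_word_unicode(word):
    """Unicode-aware check using Python's own notion of a letter."""
    return word.isalpha()
```

A word such as "Stra\u00dfe" (German for "street") passes `is_word_unicode` but fails `is_word_ascii_only`, which is the kind of rejection that irritates users of other languages.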
Perversely, though, efforts to solve this problem by proliferating sets of
national characters produced an evolutionary pressure (especially in protocol
design, e.g., the URL standard) to stick to US-ASCII as a subset common to all
those in use, and therefore to stick to English as the language encodable with
the common subset of all the ASCII dialects. This basic problem with having a
multiplicity of national character sets ended up being a prime justification for
Unicode, which was designed, ostensibly, to be the *one* ASCII extension anyone
will need.
A system is described as "eight-bit clean" if it doesn't mangle text with byte
values above 127, as some older systems did.
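One way such mangling can happen is a channel that clears the top bit of every byte. The Python sketch below simulates that and tests whether a byte transformation preserves all 256 byte values; `strip_high_bit` and `is_eight_bit_clean` are hypothetical names, and real systems mangle high bytes in other ways too.

```python
def strip_high_bit(data):
    """Simulate a seven-bit channel by clearing bit 7 of every byte."""
    return bytes(b & 0x7F for b in data)


def is_eight_bit_clean(transform, sample=bytes(range(256))):
    """Hypothetical test: does the transform pass every byte value intact?"""
    return transform(sample) == sample


# Latin-1 text containing a byte above 127 (e-acute is 0xE9).
original = "caf\u00e9".encode("latin-1")
mangled = strip_high_bit(original)
```

In this sketch the accented letter comes out as a different, unaccented character, while the pure ASCII bytes pass through unchanged.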
See also ASCII character table, Yu-Shiang Whole Fish.
(1995-03-06)