ASCII codes / extended ASCII / UTF-8

1980s: plain ASCII (English only; the standard itself dates from 1963)
1990s: extended ASCII (each region has its own; in France, Latin-1)
2000s: UTF-8 (everyone, finally compatible)

# manual page for 7-bit ASCII
man ascii
# manual page for 8-bit extended ASCII (Latin-1)
man iso-8859-1
# manual page for UTF-8 (the Unicode encoding, which covers every character)
man utf8

ASCII codes

ASCII = American Standard Code for Information Interchange

📌 ASCII table (0-127)

Control characters (0-31)

Decimal	Hexadecimal	Symbol	Description
0	00	NUL	Null (null character)
1	01	SOH	Start of Heading
2	02	STX	Start of Text
3	03	ETX	End of Text
4	04	EOT	End of Transmission
5	05	ENQ	Enquiry
6	06	ACK	Acknowledgment
7	07	BEL	Bell (rings the terminal bell)
8	08	BS	Backspace
9	09	HT	Horizontal Tab
10	0A	LF	Line Feed (new line)
11	0B	VT	Vertical Tab
12	0C	FF	Form Feed (new page)
13	0D	CR	Carriage Return
14	0E	SO	Shift Out
15	0F	SI	Shift In
16	10	DLE	Data Link Escape
17	11	DC1	Device Control 1
18	12	DC2	Device Control 2
19	13	DC3	Device Control 3
20	14	DC4	Device Control 4
21	15	NAK	Negative Acknowledgment
22	16	SYN	Synchronous Idle
23	17	ETB	End of Transmission Block
24	18	CAN	Cancel
25	19	EM	End of Medium
26	1A	SUB	Substitute
27	1B	ESC	Escape
28	1C	FS	File Separator
29	1D	GS	Group Separator
30	1E	RS	Record Separator
31	1F	US	Unit Separator
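Control characters are ordinary byte values. As a quick check, assuming GNU coreutils `od` and `tr` (the same `od` used later in these notes), a tab, a carriage return and a line feed show up as 09, 0d and 0a; CR LF is the Windows line ending, while Unix uses LF alone.

```shell
# Print A, a tab, B, then a Windows-style line ending (CR LF), byte by byte in hex
printf 'A\tB\r\n' | od -An -tx1
# Output: 41 09 42 0d 0a

# Same bytes shown as characters/escapes: \t, \r and \n are control codes 9, 13 and 10
printf 'A\tB\r\n' | od -An -c

# Converting a Windows line ending to a Unix one: delete the CR (0x0D)
printf 'A\r\n' | tr -d '\r' | od -An -tx1
# Output: 41 0a
```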

Printable characters (32-126), plus DEL (127)

Decimal	Hex	Character	Description
32	20	␣	Space
33	21	!	Exclamation mark
34	22	"	Double quote
35	23	#	Hash (number sign)
36	24	$	Dollar sign
37	25	%	Percent sign
38	26	&	Ampersand
39	27	'	Apostrophe (single quote)
40	28	(	Left parenthesis
41	29	)	Right parenthesis
42	2A	*	Asterisk
43	2B	+	Plus sign
44	2C	,	Comma
45	2D	-	Hyphen-minus
46	2E	.	Period (full stop)
47	2F	/	Slash
48–57	30–39	0–9	Digits
58	3A	:	Colon
59	3B	;	Semicolon
60	3C	<	Less-than sign
61	3D	=	Equals sign
62	3E	>	Greater-than sign
63	3F	?	Question mark
64	40	@	At sign
65–90	41–5A	A–Z	Uppercase letters
91	5B	[	Left square bracket
92	5C	\	Backslash
93	5D	]	Right square bracket
94	5E	^	Caret (circumflex accent)
95	5F	_	Underscore
96	60	`	Grave accent (backtick)
97–122	61–7A	a–z	Lowercase letters
123	7B	{	Left curly brace
124	7C	|	Vertical bar
125	7D	}	Right curly brace
126	7E	~	Tilde
127	7F	DEL	Delete (control character, not printable)
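To go between a character and its code in the table above, the shell's `printf` is enough. A minimal sketch, assuming a POSIX shell such as bash or dash; it is meant for ASCII characters, since for other characters the result depends on the locale:

```shell
# Character -> decimal code: a leading quote makes printf print the character's numeric value
printf '%d\n' "'A"     # 65
printf '%d\n' "'~"     # 126

# Character -> hexadecimal code
printf '%x\n' "'z"     # 7a

# Decimal code -> character: format the code in octal, then let printf expand the \NNN escape
code=97
printf "\\$(printf '%03o' "$code")\n"   # a
```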

Extended ASCII codes

💡 Good to know

Extended ASCII (128-255) varies from one code page to another (ISO-8859, Windows-1252, etc.).
Today, Unicode, usually encoded as UTF-8, has largely replaced these code pages so that every language can be written; UTF-8 keeps the ASCII codes 0-127 unchanged.
Each ASCII character needs only 7 bits (but is usually stored in one 8-bit byte, with the high bit set to 0).
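Because ASCII fits in 7 bits, a pure ASCII text never contains a byte of 0x80 or more. A possible check, sketched with a hypothetical helper `non_ascii_bytes` (not a standard command): delete every byte from 0 to 127 (octal 000-177) with `tr` and count what is left. This assumes the script is saved as UTF-8.

```shell
# Hypothetical helper: count the bytes whose high bit is set (values 128-255).
# LC_ALL=C makes tr work on raw bytes rather than multibyte characters.
non_ascii_bytes() {
  LC_ALL=C tr -d '\000-\177' | wc -c
}

printf 'hello' | non_ascii_bytes   # 0: pure ASCII
printf 'héllo' | non_ascii_bytes   # 2: é is two UTF-8 bytes (c3 a9), both >= 0x80
```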

Basic UTF-8 ranges

Range (hex)	Bytes	Description	Examples
U+0000 to U+007F	1	Identical to ASCII	`A` = 41h, `0` = 30h
U+0080 to U+07FF	2	Extended Latin, Greek, Arabic, etc.	`é` = C3 A9, `π` = CF 80
U+0800 to U+FFFF	3	Chinese, Japanese, Korean, etc.	`文` = E6 96 87
U+10000 to U+10FFFF	4	Emoji, rare symbols	`😀` = F0 9F 98 80, `🎨` = F0 9F 8E A8
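The byte count per range can be checked with `wc -c`, which counts bytes rather than characters. This sketch assumes the script file and the terminal use UTF-8:

```shell
# Byte length of one character from each UTF-8 range
for c in 'A' 'é' '文' '😀'; do
  # $(( ... )) strips the padding some wc implementations add
  printf '%s: %s byte(s)\n' "$c" "$(( $(printf '%s' "$c" | wc -c) ))"
done
# A: 1 byte(s)
# é: 2 byte(s)
# 文: 3 byte(s)
# 😀: 4 byte(s)
```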

Character	Code point	UTF-8 sequence (hex)	Note
é	U+00E9	`C3 A9`	e with acute accent (2 bytes)
€	U+20AC	`E2 82 AC`	Euro sign (3 bytes)
文	U+6587	`E6 96 87`	Chinese character (3 bytes)
🎉	U+1F389	`F0 9F 8E 89`	Emoji (4 bytes)
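The sequences in this table follow from splitting the code point's bits into 6-bit groups. A sketch of the encoder in shell arithmetic, with a hypothetical helper `utf8_encode`; it does not reject invalid code points (surrogates, values above U+10FFFF):

```shell
# Hypothetical helper: encode one Unicode code point into its UTF-8 bytes, printed in hex.
# Continuation bytes are 10xxxxxx; the first byte's prefix (0, 110, 1110 or 11110)
# tells how many bytes the sequence has.
utf8_encode() {
  cp=$(( $1 ))                            # accepts decimal or 0x-prefixed hex
  if [ "$cp" -lt 128 ]; then              # U+0000..U+007F: 0xxxxxxx
    printf '%02X\n' "$cp"
  elif [ "$cp" -lt 2048 ]; then           # U+0080..U+07FF: 110xxxxx 10xxxxxx
    printf '%02X %02X\n' $(( 0xC0 | (cp >> 6) )) $(( 0x80 | (cp & 0x3F) ))
  elif [ "$cp" -lt 65536 ]; then          # U+0800..U+FFFF: 1110xxxx 10xxxxxx 10xxxxxx
    printf '%02X %02X %02X\n' $(( 0xE0 | (cp >> 12) )) \
      $(( 0x80 | ((cp >> 6) & 0x3F) )) $(( 0x80 | (cp & 0x3F) ))
  else                                    # U+10000..U+10FFFF: 11110xxx + 3 continuation bytes
    printf '%02X %02X %02X %02X\n' $(( 0xF0 | (cp >> 18) )) \
      $(( 0x80 | ((cp >> 12) & 0x3F) )) $(( 0x80 | ((cp >> 6) & 0x3F) )) \
      $(( 0x80 | (cp & 0x3F) ))
  fi
}

utf8_encode 0xE9      # C3 A9
utf8_encode 0x20AC    # E2 82 AC
utf8_encode 0x6587    # E6 96 87
utf8_encode 0x1F389   # F0 9F 8E 89
```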

Full Unicode table: https://unicode-table.com/fr/
Conversion tools: use hexdump -C (or od) on Linux, or sites such as https://www.branah.com/unicode-converter

bruno@elliott:~$ echo -n "é" | od -An -tx1
c3 a9
bruno@elliott:~$ echo -n "ê" | od -An -tx1
c3 aa
bruno@elliott:~$ echo -n "è" | od -An -tx1
c3 a8
bruno@elliott:~$ echo -n "@" | od -An -tx1
40
bruno@elliott:~$ echo -n "A" | od -An -tx1
41
bruno@elliott:~$ echo -n "a" | od -An -tx1
61
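The session above shows `A` = 41 and `a` = 61: an ASCII uppercase letter and its lowercase form differ only by 0x20 (bit 5). Illustrated with shell arithmetic:

```shell
upper=$(printf '%d' "'A")   # 65 (0x41)
lower=$(printf '%d' "'a")   # 97 (0x61)
echo $(( lower - upper ))   # 32, i.e. 0x20

# Setting bit 5 turns 'G' (0x47) into 'g' (0x67); clearing it goes back
printf '%x\n' $(( 0x47 | 0x20 ))    # 67
printf '%x\n' $(( 0x67 & ~0x20 ))   # 47
```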

UTF-8 and Latin-1

# In UTF-8 (the default encoding on modern Linux systems)
echo -n "é" | od -An -tx1
# Output: c3 a9

# In ISO-8859-1 (Latin-1)
echo -n "é" | iconv -f UTF-8 -t ISO-8859-1 | od -An -tx1
# Output: e9
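The reverse mistake is frequent: UTF-8 bytes decoded as if they were Latin-1. The two bytes of `é` (c3 a9) are then read as two Latin-1 characters, Ã (0xC3) and © (0xA9), which gives the familiar garbled "Ã©". A sketch with `iconv`, assuming a UTF-8 terminal:

```shell
# Wrong: treat the UTF-8 bytes c3 a9 as Latin-1 and re-encode them in UTF-8
printf 'é' | iconv -f ISO-8859-1 -t UTF-8
# Displays: Ã©  (bytes c3 83 c2 a9)

# Repair: encode the garbled text back to Latin-1, which restores the original bytes
printf 'Ã©' | iconv -f UTF-8 -t ISO-8859-1 | od -An -tx1
# Output: c3 a9
```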

EXTENDED ASCII

Latin-1 belongs to a family of standards (ISO 8859):

ISO-8859-1: Western European (Latin-1)
ISO-8859-2: Central European
ISO-8859-5: Cyrillic
ISO-8859-6: Arabic
ISO-8859-7: Greek
ISO-8859-15: Latin-9 (successor to Latin-1: adds the € sign at 0xA4 in place of ¤, plus Š š Ž ž Œ œ Ÿ)
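In this family, the same byte can stand for a different character. A sketch decoding one byte with several members, assuming glibc's `iconv` (which accepts these encoding names); `printf` is given octal escapes because POSIX `printf` does not guarantee `\x`:

```shell
# Byte 0xE9 (octal 351) decoded with three members of the ISO-8859 family
for cs in ISO-8859-1 ISO-8859-5 ISO-8859-7; do
  printf '%s: ' "$cs"
  printf '\351' | iconv -f "$cs" -t UTF-8
  echo
done
# ISO-8859-1: é
# ISO-8859-5: щ
# ISO-8859-7: ι

# Byte 0xA4 (octal 244): currency sign in Latin-1, euro sign in Latin-9
printf '\244' | iconv -f ISO-8859-1 -t UTF-8; echo    # ¤
printf '\244' | iconv -f ISO-8859-15 -t UTF-8; echo   # €
```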

EXTENDED ASCII: LATIN-1 (ISO-8859-1)

Hex | 0   1   2   3   4   5   6   7   8   9   A   B   C   D   E   F
----|----------------------------------------------------------------
0x0 | NUL SOH STX ETX EOT ENQ ACK BEL BS  HT  LF  VT  FF  CR  SO  SI 
0x1 | DLE DC1 DC2 DC3 DC4 NAK SYN ETB CAN EM  SUB ESC FS  GS  RS  US 
0x2 | SP  !   "   #   $   %   &   '   (   )   *   +   ,   -   .   / 
0x3 | 0   1   2   3   4   5   6   7   8   9   :   ;   <   =   >   ? 
0x4 | @   A   B   C   D   E   F   G   H   I   J   K   L   M   N   O 
0x5 | P   Q   R   S   T   U   V   W   X   Y   Z   [   \   ]   ^   _ 
0x6 | `   a   b   c   d   e   f   g   h   i   j   k   l   m   n   o 
0x7 | p   q   r   s   t   u   v   w   x   y   z   {   |   }   ~   DEL
0x8 | PAD HOP BPH NBH IND NEL SSA ESA HTS HTJ VTS PLD PLU RI  SS2 SS3
0x9 | DCS PU1 PU2 STS CCH MW  SPA EPA SOS SGC SCI CSI ST  OSC PM  APC
0xA | NBSP ¡   ¢   £   ¤   ¥   ¦   §   ¨   ©   ª   «   ¬   SHY ®   ¯
0xB | °   ±   ²   ³   ´   µ   ¶   ·   ¸   ¹   º   »   ¼   ½   ¾   ¿
0xC | À   Á   Â   Ã   Ä   Å   Æ   Ç   È   É   Ê   Ë   Ì   Í   Î   Ï
0xD | Ð   Ñ   Ò   Ó   Ô   Õ   Ö   ×   Ø   Ù   Ú   Û   Ü   Ý   Þ   ß
0xE | à   á   â   ã   ä   å   æ   ç   è   é   ê   ë   ì   í   î   ï
0xF | ð   ñ   ò   ó   ô   õ   ö   ÷   ø   ù   ú   û   ü   ý   þ   ÿ
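To read the table, take the row for the high hex digit and the column for the low one: é is 0xE9. Latin-1 byte values are also identical to the first 256 Unicode code points (é is 0xE9 in Latin-1 and U+00E9 in Unicode). A sketch that rebuilds one row by decoding each byte with `iconv` (row 0xE here), assuming a UTF-8 terminal:

```shell
# Print row 0xE of the Latin-1 table: bytes 0xE0..0xEF decoded as ISO-8859-1
for low in 0 1 2 3 4 5 6 7 8 9 a b c d e f; do
  byte=$(( 0xe$low ))                    # e.g. 0xe9 = 233
  # printf only understands octal escapes portably, so write the byte as \NNN
  printf "\\$(printf '%03o' "$byte")" | iconv -f ISO-8859-1 -t UTF-8
  printf ' '
done
echo
# à á â ã ä å æ ç è é ê ë ì í î ï
```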

Conversion

On Linux, the iconv tool converts a file from one encoding to another:

iconv -f ISO-8859-1 -t UTF-8 source.txt -o utf8.txt
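A complete round trip on a small test file, as a sketch: the file names match the command above, `-o` is a GNU iconv option, and the validity check relies on `iconv` exiting with an error on an invalid sequence:

```shell
# Create a small Latin-1 file: "café" where é is the single byte 0xE9 (octal 351)
printf 'caf\351\n' > source.txt

# Convert it to UTF-8
iconv -f ISO-8859-1 -t UTF-8 source.txt -o utf8.txt

od -An -tx1 utf8.txt
# Output: 63 61 66 c3 a9 0a   (é is now two bytes)

# The result is valid UTF-8: decoding it as UTF-8 succeeds
iconv -f UTF-8 -t UTF-8 utf8.txt > /dev/null && echo "utf8.txt is valid UTF-8"

# The Latin-1 original is not: 0xE9 followed by a newline is not a valid UTF-8 sequence
iconv -f UTF-8 -t UTF-8 source.txt > /dev/null 2>&1 || echo "source.txt is not valid UTF-8"
```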