We describe how a sequence of several characters (often called a string) can be converted into a single number (encoding) and how we can recover the characters from that number (decoding). This is often the first step in many cryptographic protocols, after which the number thus obtained is then encrypted with an encryption function.
In Section 8.1 we had seen how to encode characters as numbers using the encoding function \(C:\A\to\Z_{27}\) from Figure 8.1. Now we combine this with the methods from Section 11.4 and Section 11.5 to encode several characters into one number.
In our first example we encode a short word as a number.
Example12.26.Encode \(\mathtt{do}\) as a number.
Consider the word \(\mathtt{do}\text{.}\) We want to represent \(\mathtt{do}\) by a number in decimal representation in such a way that we can still recover the word.
The word \(\mathtt{do}\) consists of the two characters \(\mathtt{d}\) and \(\mathtt{o}\text{.}\) Using the encoding function \(C:\A\to\Z_{27}\) from Figure 8.1 we convert the two characters into numbers:
Now we have represented \(\mathtt{do}\) by two numbers, but our goal is to represent it by one numbers. For each character the encoding function \(C\) yields a number in \(\Z_{27}\text{,}\) that is a number from \(0\) to \(26\text{.}\) This is also the range of the digits of the base \(27\) representation of a number. Considering \(4\) and \(15\) as base \(27\) digits as a number we get
Because we know how to determine the base \(27\) digits of a number from we can recover \(4\) and \(15\) and thus the word \(\mathtt{do}\) from the number \(123\text{,}\) see Example 12.27.
We now decode the number obtained in the example above.
Example12.27.Decode \(123\).
We find the word encoded in the number \(123\)
First we find the base \(27\) expansion of the decimal number \(123\text{.}\) Following Algorithm 11.27 We get
Thus the word encoded as \(123\) is \(\mathtt{do}\text{.}\)
In general we proceed as follows.
Strategy12.3.Representation of text by a decimal number.
To compute the decimal representation of a word proceed as follows.
(a)
Encode the characters in the given text into numbers in \(\Z_{27}\) using the function \(C:\A\to\Z_{27}\) from Figure 8.1.
(b)
Consider these numbers as coefficient of a base \(27\) expansion. Evaluate the base \(27\) expansion to obtain the decimal representation of the text.
We demonstrate how to compute a representation of the word \(\mathtt{wombat}\) as a number in decimal representation by following the steps given in the strategy above.
Example12.28.Represent \(\mathtt{wombat}\) by a number.
We find a representation of the word \(\mathtt{wombat}\) as an integer.
(a)
The function \(C\) defined in Figure 8.1 encodes the letters in \(\mathtt{wombat}\) as the numbers \(23\text{,}\)\(15\text{,}\)\(13\text{,}\)\(2\text{,}\)\(1\text{,}\)\(20\text{.}\)
(b)
We now consider these as the values of the digits of a base \(27\) number. We obtain
We can also work backwards to find the text that is encoded in a given decimal number.
Strategy12.4.Conversion of decimal numbers to text.
To convert a decimal number to text proceed as follows.
(a)
Find the base 27 expansion of the number.
(b)
Decode each digit of the base 27 expansion using the decoding function \(C^{-1}:\Z_{27}\to\A\) given by from Figure 8.1 to obtain the characters of the text.
Problem12.31.Find the word encoded as \(2234\).
Find the text encoded as the number \(2234\text{.}\)
Answer.
The text represented by the decimal number \(2234\) is \(\mathtt{cat}\)
Solution.
We start by converting the decimal number \(2234\) to a base \(27\) number with the method used by the base conversion algorithm ( Algorithm 11.27 ):
\(2234 \fdiv 27=82\)
\(2234 \fmod 27=20\)
\(82\fdiv27=3\)
\(82 \fmod 27 = 1\)
\(3\fdiv 27=0\)
\(3 \fmod 27 = 3\)
Thus the base \(27\) expansion of \(2234\) is \(2234=3\cdot 27^2+1\cdot 27+20 \cdot 1\text{.}\) We decode the digits of the base \(27\) expansion number into letters using the function \(C^{-1}\) from Figure 8.1 :
We have \(C^{-1}(12) = \mathtt{l}\text{,}\)\(C^{-1}(1) = \mathtt{a}\text{,}\)\(C^{-1}(13) = \mathtt{m}\text{,}\)\(C^{-1}(2) = b\text{,}\) and \(C^{-1}(19) = \mathtt{s}.
\text{.}\) So the word encoded as \(6406525\) is \(\mathtt{lambs}\text{.}\)
In Checkpoint 12.33 apply these methods to recover a word that is encoded in a number.
\(C^{-1}:
\lbrace 0,1,2,3,\dots,26\rbrace
\to
\lbrace \mathtt{-},\mathtt{a},\mathtt{b},\mathtt{c},\dots,\mathtt{z}\rbrace\) with \(C^{-1}(0)=\mathtt{-}\text{,}\)\(C^{-1}(1)=\mathtt{a}\text{,}\)\(C^{-1}(2)=\mathtt{b}\text{,}\)\(\dots\text{,}\)\(C^{-1}(26)=\mathtt{z}\text{,}\)
of the encoding function \(C\) to these integers we obtain the word:
We end this section with the remark that we made several (somewhat arbitrary) choices in our encoding and decoding scheme.
We used our simple character encoding function \(C\text{.}\) In the real world one would use an encoding standard such as ASCII or UTF-8. In both cases one would use \(256\) instead of \(27\text{.}\)
The order we chose for encoding and decoding the characters in a word or more general string was chosen to match the order in which we write base \(b\) expansions. The reverse order would also be a good choice.