Section 12.3 Text

We describe how a sequence of several characters (often called a string) can be converted into a single number (encoding) and how we can recover the characters from that number (decoding). This is often the first step of a cryptographic protocol; the number obtained is then encrypted with an encryption function.
In Section 8.1 we saw how to encode characters as numbers using the encoding function \(C:\A\to\Z_{27}\) from Figure 8.1. Now we combine this with the methods from Section 11.4 and Section 11.5 to encode several characters into one number.
In our first example we encode a short word as a number.

Example 12.26. Encode \(\mathtt{do}\) as a number.

Consider the word \(\mathtt{do}\text{.}\) We want to represent \(\mathtt{do}\) by a number in decimal representation in such a way that we can still recover the word.
The word \(\mathtt{do}\) consists of the two characters \(\mathtt{d}\) and \(\mathtt{o}\text{.}\) Using the encoding function \(C:\A\to\Z_{27}\) from Figure 8.1 we convert the two characters into numbers:
\begin{align*} C(\mathtt{d})\amp=4\\ C(\mathtt{o})\amp=15 \end{align*}
Now we have represented \(\mathtt{do}\) by two numbers, but our goal is to represent it by one number. For each character the encoding function \(C\) yields a number in \(\Z_{27}\text{,}\) that is, a number from \(0\) to \(26\text{.}\) This is also the range of the digits of the base \(27\) representation of a number. Considering \(4\) and \(15\) as the digits of a base \(27\) number, we get
\begin{equation*} 4\cdot 27^1+15\cdot 27^0=4\cdot 27+15\cdot 1=108+15=123. \end{equation*}
Because we know how to determine the base \(27\) digits of a number, we can recover \(4\) and \(15\) and thus the word \(\mathtt{do}\) from the number \(123\text{,}\) see Example 12.27.
We now decode the number obtained in the example above.

Example 12.27. Decode \(123\).

We find the word encoded in the number \(123\text{.}\)
First we find the base \(27\) expansion of the decimal number \(123\text{.}\) Following Algorithm 11.27 we get
\(123\fmod 27=15\) and \(123\fdiv 27=4\)
\(4\fmod 27=4\) and \(4\fdiv 27=0\)
Thus the base \(27\) expansion of \(123\) is
\begin{equation*} 123=4\cdot 27^1+15\cdot 27^0. \end{equation*}
Decoding the \(4\) and \(15\) with \(C^{-1}:\Z_{27}\to\A\) from Figure 8.1 we get:
\begin{gather*} C^{-1}(4)=\mathtt{d}\\ C^{-1}(15)=\mathtt{o} \end{gather*}
Thus the word encoded as \(123\) is \(\mathtt{do}\text{.}\)
In general we proceed as follows.

Strategy 12.3. Representation of text by a decimal number.

To compute the decimal representation of a word proceed as follows.

(a)

Encode the characters in the given text into numbers in \(\Z_{27}\) using the function \(C:\A\to\Z_{27}\) from Figure 8.1.

(b)

Consider these numbers as the coefficients of a base \(27\) expansion. Evaluate the base \(27\) expansion to obtain the decimal representation of the text.
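The two steps of the strategy can be sketched in Python. The names `ALPHABET` and `encode_text` are ours, chosen for illustration, and we assume (as in the checkpoints of this section) that \(C\) maps \(\mathtt{-}\) to \(0\) and \(\mathtt{a},\dots,\mathtt{z}\) to \(1,\dots,26\text{.}\)

```python
# Assumed alphabet: the position of a character in this string is its value under C.
ALPHABET = "-abcdefghijklmnopqrstuvwxyz"


def encode_text(text):
    """Return the integer whose base 27 digits are the C-values of the characters."""
    number = 0
    for character in text:
        digit = ALPHABET.index(character)  # step (a): C(character), a number in Z_27
        number = number * 27 + digit       # step (b): append one more base 27 digit
    return number


print(encode_text("do"))   # 123
print(encode_text("dog"))  # 3328
```

The loop evaluates the base \(27\) expansion from the most significant digit on, so no powers of \(27\) have to be computed separately. A character outside the assumed alphabet makes `ALPHABET.index` raise a `ValueError`.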
We demonstrate how to compute a representation of the word \(\mathtt{wombat}\) as a number in decimal representation by following the steps given in the strategy above.

Example 12.28. Represent \(\mathtt{wombat}\) by a number.

We find a representation of the word \(\mathtt{wombat}\) as an integer.

(a)

The function \(C\) defined in Figure 8.1 encodes the letters in \(\mathtt{wombat}\) as the numbers \(23\text{,}\) \(15\text{,}\) \(13\text{,}\) \(2\text{,}\) \(1\text{,}\) \(20\text{.}\)

(b)

We now consider these as the values of the digits of a base \(27\) number. We obtain
\begin{equation*} 23\cdot 27^5+15\cdot27^4+13\cdot27^3+2\cdot27^2+1\cdot27+20 \cdot 1=338253860 \end{equation*}
So, the word \(\mathtt{wombat}\) is represented by the decimal number \(338253860\text{.}\)
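As a check that is not part of the strategy itself, the same value can be obtained without computing the powers of \(27\text{,}\) by starting with the first digit and repeatedly multiplying by \(27\) and adding the next digit:
\begin{align*} 23\cdot 27+15\amp=636\\ 636\cdot 27+13\amp=17185\\ 17185\cdot 27+2\amp=463997\\ 463997\cdot 27+1\amp=12527920\\ 12527920\cdot 27+20\amp=338253860 \end{align*}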
We demonstrate the strategy in the solution of Problem 12.29.

Problem 12.29. Represent \(\mathtt{dog}\) by a number.

Compute a decimal representation of the word \(\mathtt{dog}\text{.}\)
Answer.
The representation of \(\mathtt{dog}\) as a decimal number is \(3328\text{.}\)
Solution.
Encoding the letters in \(\mathtt{dog}\) by the function \(C\) from Figure 8.1 we obtain
\begin{equation*} C(\mathtt{d})=4, C(\mathtt{o})=15, C(\mathtt{g})=7\text{.} \end{equation*}
Considering these as the values of the digits of a base \(27\) representation, we write out the base \(27\) expansion and evaluate it:
\begin{equation*} 4\cdot27^2+15\cdot27+7 \cdot 1=3328 \end{equation*}
So, the decimal representation of the word \(\mathtt{dog}\) is the decimal number \(3328\text{.}\)
Now use the methods described above to compute a representation of a word as an integer.

Checkpoint 12.30. Convert word to number.

We want to compute a representation of the word
\(\mathtt{wide}\)
by one integer in decimal representation.
First represent the characters in the word by integers using the encoding function
\(C:\lbrace \mathtt{-},\mathtt{a},\mathtt{b},\dots,\mathtt{z}\rbrace \to \lbrace 0,1,2,3,\dots,26\rbrace\) with \(C(\mathtt{-})=0\text{,}\) \(C(\mathtt{a})=1\text{,}\) \(\dots\text{,}\) \(C(\mathtt{z})=26\text{.}\)
We obtain
\(C(\mathtt{w}) =\) , \(C(\mathtt{i}) =\) , \(C(\mathtt{d}) =\) , \(C(\mathtt{e}) =\) .
Then we compute the representation as one integer:
\(C(\mathtt{w}) \cdot 27^3 + C(\mathtt{i}) \cdot 27^2 + C(\mathtt{d}) \cdot 27 + C(\mathtt{e}) =\)
We can also work backwards to find the text that is encoded in a given decimal number.

Strategy 12.4. Conversion of decimal numbers to text.

To convert a decimal number to text proceed as follows.

(a)

Find the base 27 expansion of the number.

(b)

Decode each digit of the base \(27\) expansion using the decoding function \(C^{-1}:\Z_{27}\to\A\) from Figure 8.1 to obtain the characters of the text.
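A minimal Python sketch of this strategy follows. The names `ALPHABET` and `decode_number` are ours, and we again assume that \(C^{-1}\) maps \(0\) to \(\mathtt{-}\) and \(1,\dots,26\) to \(\mathtt{a},\dots,\mathtt{z}\text{.}\)

```python
# Assumed alphabet: the character at position d is C^{-1}(d).
ALPHABET = "-abcdefghijklmnopqrstuvwxyz"


def decode_number(number):
    """Return the text whose C-values are the base 27 digits of number."""
    characters = []
    while number > 0:
        number, digit = divmod(number, 27)  # quotient (div) and remainder (mod)
        characters.append(ALPHABET[digit])  # step (b): C^{-1}(digit)
    # Step (a) yields the digits from least to most significant, so reverse them.
    return "".join(reversed(characters))


print(decode_number(123))   # do
print(decode_number(3328))  # dog
```

Because leading zero digits do not change a number, a text that begins with \(\mathtt{-}\) loses those leading characters when it is encoded and decoded in this way.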

Problem 12.31. Find the word encoded as \(2234\).

Find the text encoded as the number \(2234\text{.}\)
Answer.
The text represented by the decimal number \(2234\) is \(\mathtt{cat}\text{.}\)
Solution.
We start by converting the decimal number \(2234\) to a base \(27\) number with the method used by the base conversion algorithm (Algorithm 11.27):
\(2234 \fdiv 27=82\) \(2234 \fmod 27=20\)
\(82\fdiv27=3\) \(82 \fmod 27 = 1\)
\(3\fdiv 27=0\) \(3 \fmod 27 = 3\)
Thus the base \(27\) expansion of \(2234\) is \(2234=3\cdot 27^2+1\cdot 27+20 \cdot 1\text{.}\) We decode the digits of the base \(27\) expansion into letters using the function \(C^{-1}\) from Figure 8.1:
\begin{equation*} C^{-1}(3) = \mathtt{c}, C^{-1}(1) = \mathtt{a}, C^{-1}(20) = \mathtt{t} \end{equation*}
So, the word encoded as the number \(2234\) is \(\mathtt{cat}\text{.}\)

Problem 12.32. Find the word encoded as \(6406525\).

A word is encoded as the integer \(6406525\text{.}\) Find the word.
Answer.
The word encoded as \(6406525\) is \(\mathtt{lambs}\text{.}\)
Solution.
We first find the digits of the base \(27\) representation of \(6406525\text{.}\)
\begin{align*} 6406525 \fmod 27 = 19 \quad\amp 6406525 \fdiv 27 = 237278\\ 237278 \fmod 27 = 2 \quad\amp 237278 \fdiv 27 = 8788\\ 8788 \fmod 27 = 13 \quad\amp 8788 \fdiv 27 = 325\\ 325 \fmod 27 = 1 \quad\amp 325 \fdiv 27 = 12\\ 12 \fmod 27 = 12 \quad\amp 12 \fdiv 27 = 0 \end{align*}
Thus the base \(27\) representation of \(6406525\) is
\begin{equation*} 6406525 = 12\cdot 27^4+1\cdot 27^3+13 \cdot 27^2+2\cdot 27+ 19 \end{equation*}
We have \(C^{-1}(12) = \mathtt{l}\text{,}\) \(C^{-1}(1) = \mathtt{a}\text{,}\) \(C^{-1}(13) = \mathtt{m}\text{,}\) \(C^{-1}(2) = \mathtt{b}\text{,}\) and \(C^{-1}(19) = \mathtt{s}\text{.}\) So the word encoded as \(6406525\) is \(\mathtt{lambs}\text{.}\)
In Checkpoint 12.33 apply these methods to recover a word that is encoded in a number.

Checkpoint 12.33. Convert number to word.

A word is encoded in the integer:
\(6565771\)
We find the digits of the base 27 representation
\(6565771 =\) \(\cdot 27^4+\) \(\cdot 27^3+\) \(\cdot 27^2+\) \(\cdot 27+\)
Applying the inverse
\(C^{-1}: \lbrace 0,1,2,3,\dots,26\rbrace \to \lbrace \mathtt{-},\mathtt{a},\mathtt{b},\mathtt{c},\dots,\mathtt{z}\rbrace\) with \(C^{-1}(0)=\mathtt{-}\text{,}\) \(C^{-1}(1)=\mathtt{a}\text{,}\) \(C^{-1}(2)=\mathtt{b}\text{,}\) \(\dots\text{,}\) \(C^{-1}(26)=\mathtt{z}\text{,}\)
of the encoding function \(C\) to these integers we obtain the word:
A man is looking at a computer monitor and speaking into a microphone.
Man 1: A’la’ih, do’neh’lini, do’neh’lini, a’la’ih, do’neh’lini, a’la’ih, do’neh’lini, do’neh’lini, a’la’ih, a’la’ih, do’neh’lini, a’la’ih, do’neh’lini, do’neh’lini, do’neh’lini...
Two men are talking nearby.
Man 2: For added security, after we encrypt the data stream, we send it through our Navajo code talker.
Man 3: ...Is he just using Navajo words for "Zero" and "One"?
Man 2: Woah, hey, keep your voice down!
Title text: As far as I can tell, Navajo doesn’t have a common word for ’zero’. do-neh-lini means ’neutral’.
Figure 12.34. Code Talkers by Randall Munroe (https://xkcd.com/257).
We end this section with the remark that we made several (somewhat arbitrary) choices in our encoding and decoding scheme.
  • We used our simple character encoding function \(C\text{.}\) In the real world one would use an encoding standard such as ASCII or UTF-8. In both cases each character is stored as one or more bytes, so one would use base \(256\) instead of base \(27\text{.}\)
  • The order in which we encode and decode the characters of a word, or more generally of a string, was chosen to match the order in which we write base \(b\) expansions. The reverse order would also be a good choice.
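Both remarks can be illustrated with Python's built-in integer and byte conversions. This is only a sketch of the idea: the sample text \(\mathtt{do}\) and the variable names are ours, and the byte order arguments `"big"` and `"little"` select the order used in this section and the reverse order, respectively.

```python
text = "do"
data = text.encode("ascii")           # two bytes with the values 100 and 111
number = int.from_bytes(data, "big")  # base 256 expansion: 100 * 256 + 111
print(number)                         # 25711

# Decoding: recover the bytes (the base 256 digits), then the characters.
length = (number.bit_length() + 7) // 8  # number of base 256 digits
print(number.to_bytes(length, "big").decode("ascii"))  # do

# The reverse digit order: the first character becomes the least significant digit.
print(int.from_bytes(data, "little"))  # 111 * 256 + 100 = 28516
```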