The Golden Middle Path - a blog by Amit K Mathur

Characters: From the eyes of computers

You type something on your computer keyboard and send it off to a friend in an email, or to the world in a Facebook post. All without thinking much about it. But let’s pause for a second. When you press the key ‘a’ and see it appear on your screen, have you ever wondered how your computer knows that you typed an ‘a’? Or, when you send that message over, how does your friend’s computer know that it contains an ‘a’?

“Meh? Simple, isn’t it?”, you think.

Let me remind you that computers can only store numbers. That too only in binary format – as 0s and 1s. No ‘a’ allowed there.

So, how do computers work with alphabets (or characters, or chars, as they are called in the lingo of computer engineers)? How does a computer that knows only 0s and 1s recognize an ‘a’?

Understanding that is actually not very difficult. Let’s see.

Everything is a number to a computer. It can store only numbers on its hard disk and transfer only numbers from one computer to another. Those numbers are stored in binary format, but they are still numbers. For example, 206 is stored as 11001110.
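If you are curious, Python’s built-in `bin` and `int` functions will show you this binary form (a quick illustration, nothing your computer actually needs you to do):

```python
# 206 written out in binary, and converted back again
print(bin(206))            # -> 0b11001110 (the 0b prefix just marks binary)
print(int("11001110", 2))  # -> 206
```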

Computer programs assign each character like ‘a’, ‘b’, ‘c’ etc. a number. For example, in the widely used ASCII encoding, ‘a’ is given the number 97 and ‘b’ is given 98. It’s like a serial number for that letter. And the choice is completely arbitrary – it just has to be agreed upon.

Now, whenever you type ‘a’, it is that code number that gets stored in the computer’s memory.
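You can check these codes yourself at any Python prompt: the built-in `ord` gives the number assigned to a character, and `chr` goes the other way:

```python
print(ord('a'))  # -> 97, the code stored when you type 'a'
print(ord('b'))  # -> 98
print(chr(97))   # -> a, turning a stored code back into a letter
```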

Well, that creates a problem then, doesn’t it? How will the computer know whether a stored number is just a number or the letter ‘a’?

All data in a computer is read by some program, also called an application. The program knows what kind of data to expect in memory. For example, when you open a file in a word processor, the word processor knows that it is supposed to be a text file with letters and words in it.

So, a word processor reads whatever is in the computer’s memory, i.e. the numbers, as if they were characters. When your word processor sees the code for ‘a’, it shows you an ‘a’.
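As a small sketch of that idea in Python, the list below stands in for a few bytes of memory, and the program chooses to interpret them as characters:

```python
memory = [104, 105, 33]                 # raw numbers sitting in memory
text = "".join(chr(n) for n in memory)  # read them as if they were characters
print(text)                             # -> hi!
```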

So, the answer to the original question is: a computer does not know anything about characters. It’s the applications that write the codes for the characters into memory and then read those numbers back, interpreting them as characters.

A string or a word in the computer’s memory is just a sequence of such numbers, one for each character, followed by a string-ending marker, usually a zero (that is how the C language does it; some other languages store the string’s length instead).
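We can mimic that layout in Python; the trailing zero below is the C-style end-of-string marker just described:

```python
word = "cab"
codes = [ord(ch) for ch in word] + [0]  # one code per letter, then the marker
print(codes)                            # -> [99, 97, 98, 0]
```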

In summary, letters or characters are assigned a serial number or code, and that is what ultimately gets stored in a computer’s memory – RAM or hard disk.

Now, what happens if two programmers, working on two different applications, each pick their own encoding and end up in conflict? Say, one assigns 97 to ‘a’ but another assigns 97 to the Devanagari letter A. Then a file written using one encoding cannot be read correctly by the other application. That actually used to happen, and it still happens sometimes.

So, a group of wise folks got together, made a big list of all the letters from all the alphabets in the world, assigned them numbers or codes, and urged everyone to use this common list. It is called Unicode. So, if your program uses Unicode, chances are your data are compatible with everyone else’s.
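Python strings use Unicode, so `ord` works for any script; the Devanagari letter ‘अ’ gets its own code point, distinct from the Latin ‘a’:

```python
print(ord('a'))             # -> 97, same as in ASCII
print(ord('अ'))             # -> 2309, Unicode code point U+0905
print('अ'.encode('utf-8'))  # -> b'\xe0\xa4\x85', the bytes actually stored or sent
```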


