What is the encoding of information? Encoding of text information. Information coding in the PC. The matrix method.

A code is a set of conventional signs (or signals) used to record (or transmit) certain predefined concepts.

Information coding is the process of forming a specific representation of information. In a narrower sense, the term "coding" often means a transition from one form of presenting information to another that is more convenient for storage, transmission or processing.

Usually, when encoded (one sometimes says encrypted), each image is represented by a separate sign.

A sign is an element of a finite set of distinct elements.

The computer can only process information presented in numerical form. All other information (for example, sounds, images, instrument readings, etc.) must be converted into numerical form to be processed on a computer. For example, to digitize a musical sound, one can measure the intensity of the sound at specific frequencies at short intervals and present the result of each measurement in numerical form. Computer programs can then transform the received information, for example, "superimpose" sounds from different sources on one another.

Similarly, text information can be processed on a computer. When it is entered into a computer, each letter is encoded with a certain number, and when it is output to external devices (screen or printer), images of the letters are built from these numbers for human perception. The correspondence between a set of letters and numbers is called a character encoding.

As a rule, all numbers in a computer are represented using zeros and ones (rather than the ten digits people are accustomed to). In other words, computers usually work in the binary number system, since the devices for processing it are much simpler. Numbers can be entered into the computer and displayed for reading by a person in the usual decimal form, and all the necessary conversions are performed by the programs running on the computer.
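
As an illustration, here is a minimal sketch in plain Python (not tied to any particular program mentioned in the text) of the decimal-to-binary and binary-to-decimal conversions that such programs perform; the number 77 is an arbitrary example.

    n = 77                    # a number entered by the user in decimal form
    binary = format(n, 'b')   # internal binary representation: '1001101'
    back = int(binary, 2)     # conversion back to decimal for display: 77
    print(n, binary, back)    # 77 1001101 77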

Methods of encoding information.

The same information can be presented (encoded) in several forms. With the advent of computers, it became necessary to encode all types of information with which both an individual person and humanity as a whole deal. But humanity began to solve the problem of coding information long before the advent of computers. The tremendous achievements of mankind - writing and arithmetic - are nothing more than a system for coding speech and numerical information. Information never appears in its pure form, it is always somehow presented, somehow encoded.

Binary coding is one of the most common ways to represent information. In computers, robots and numerically controlled (CNC) machine tools, as a rule, all the information the device deals with is encoded as words over the binary alphabet.

Encoding character (text) information.

The main operation performed on individual text characters is character comparison.

When comparing symbols, the most important aspects are the uniqueness of the code for each symbol and the length of this code, and the choice of the coding principle itself is practically irrelevant.

Various lookup tables are used to encode texts. It is important that the same table is used when encoding and decoding the same text.

A conversion (code) table is a table containing a list of encoded characters, ordered in some way, according to which a character is converted into its binary code and back.

Most popular lookup tables: DKOI-8, ASCII, CP1251, Unicode.

Historically, 8 bits or 1 byte have been chosen as the length of the code for encoding characters. Therefore, most often one character of text stored in the computer corresponds to one byte of memory.

There can be 2^8 = 256 different combinations of 0 and 1 with a code length of 8 bits, so no more than 256 characters can be encoded using one lookup table. With a code length of 2 bytes (16 bits), 65 536 characters can be encoded.
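
A short Python sketch of this arithmetic (the alphabet sizes are illustrative; 155 anticipates the character count given further below): a code of n bits distinguishes 2^n characters, so an alphabet of N characters needs ceil(log2(N)) bits.

    import math

    print(2 ** 8, 2 ** 16)                      # 256 65536
    for alphabet_size in (128, 155, 256, 65536):
        bits = math.ceil(math.log2(alphabet_size))
        print(alphabet_size, 'characters ->', bits, 'bits per character')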

Numeric information encoding.

The similarity between encoding numeric and text information is as follows: in order for data of these types to be comparable, different numbers (like different symbols) must have different codes. The main difference between numeric data and character data is that, in addition to comparison, various mathematical operations are performed on numbers: addition, multiplication, root extraction, taking logarithms, etc. The rules for performing these operations have been worked out in detail in mathematics for numbers written in a positional number system.

The basic number system for representing numbers in a computer is the binary positional number system.

Encoding text information

Currently, most computer users work with textual information, which consists of symbols: letters, digits, punctuation marks, etc. Let us estimate how many characters are needed and how many bits that requires.

10 digits, 12 punctuation marks, 15 arithmetic signs, and the letters of the Russian and Latin alphabets: a total of about 155 characters, which requires 8 bits per character, since 2^7 = 128 < 155 ≤ 256 = 2^8.

Units of information measurement.

1 byte = 8 bits

1 KB = 1024 bytes

1 MB = 1024 KB

1 GB = 1024 MB

1 TB = 1024 GB
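
These units differ by a factor of 1024 = 2^10, which is easy to verify with a plain Python sketch (not a library API):

    KB = 1024          # bytes
    MB = 1024 * KB
    GB = 1024 * MB
    TB = 1024 * GB
    print(KB, MB, GB, TB)   # 1024 1048576 1073741824 1099511627776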

The essence of coding is that each character is assigned a binary code from 00000000 to 11111111 or the corresponding decimal code from 0 to 255.

It must be remembered that at present five different code tables are used to encode Russian letters (KOI-8, CP1251, CP866, Mac, ISO), and texts encoded with one table will not be displayed correctly under another.

The basic character encoding table is ASCII (American Standard Code for Information Interchange), a 16 by 16 table in which character codes are written in the hexadecimal number system.
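
The mapping from characters to codes can be inspected directly; a small Python sketch (the characters chosen are arbitrary) prints each character's decimal code and its two hexadecimal digits, which correspond to the row and column of the 16 by 16 table.

    for ch in ('A', 'a', '0', '*'):
        code = ord(ch)
        print(ch, code, format(code, '02X'))   # e.g. A 65 41 -> row 4, column 1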

Coding graphic information.

An important stage in the coding of a graphic image is its division into discrete elements (discretization).

The main ways of presenting graphics for storage and processing using a computer are raster and vector images.

A vector image is a graphic object consisting of elementary geometric shapes (most often segments and arcs). The position of these elementary shapes is determined by the coordinates of points and by a radius value. For each line, binary codes for the line type (solid, dotted, dash-dotted), thickness and color are specified.

A bitmap (raster image) is a collection of points (pixels) obtained by sampling the image according to the matrix principle.

The matrix principle of coding graphic images is that the image is divided into a given number of rows and columns. Then each element of the resulting grid is encoded according to the selected rule.

Pixel (picture element) is the minimum unit of an image, the color and brightness of which can be set independently of the rest of the image.

Images output to a printer, displayed on the screen, or obtained with a scanner are built according to the matrix principle.

The image quality is higher the more densely the pixels are placed (that is, the higher the resolution of the device) and the more accurately the color of each pixel is encoded.

For a black-and-white image, the color code of each pixel is specified by one bit.

If the picture is colored, then a binary code of its color is set for each point.

Since colors are also encoded in binary code, if, for example, you want to use a 16-color picture, then 4 bits are needed to encode each pixel (16 = 2^4), and if 16 bits (2 bytes) can be used to encode the color of one pixel, then 2^16 = 65 536 different colors can be conveyed. Using three bytes (24 bits) to encode the color of a single point allows 16 777 216 (about 17 million) different shades of color to be represented, the so-called True Color mode. Note that these are the capabilities currently in use, but far from the limit of what modern computers can do.
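
A small Python sketch of this arithmetic: for several color depths it prints the number of representable colors and the memory a raster image would occupy. The 800 by 600 resolution is an illustrative assumption, not taken from the text.

    width, height = 800, 600                      # resolution in pixels (assumed)
    for bits_per_pixel in (1, 4, 16, 24):
        colours = 2 ** bits_per_pixel             # number of distinct colors
        size_kb = width * height * bits_per_pixel / 8 / 1024
        print(bits_per_pixel, 'bpp:', colours, 'colors,', round(size_kb, 1), 'KB')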

Audio coding.

You know from the physics course that sound is a vibration of the air. By its nature, sound is a continuous signal. If we convert sound into an electrical signal (for example, using a microphone), we see a voltage that changes smoothly over time.

For computer processing, an analog signal must be somehow converted into a sequence of binary numbers, and for this it must be sampled and digitized.

You can proceed as follows: measure the signal amplitude at regular intervals and record the received numerical values into the computer memory.
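
A toy Python sketch of this procedure: it samples a 440 Hz tone at regular intervals and stores each reading as an 8-bit number. The sampling rate, duration and quantization here are illustrative assumptions only.

    import math

    sample_rate = 8000                                    # measurements per second (assumed)
    samples = []
    for i in range(8):                                    # the first 8 measurements
        t = i / sample_rate                               # moment of measurement, in seconds
        amplitude = math.sin(2 * math.pi * 440 * t)       # continuous signal, range -1..1
        samples.append(int((amplitude + 1) / 2 * 255))    # quantized to an 8-bit value 0..255

    print(samples)   # e.g. [127, 170, 208, ...] - the digitized signal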

A modern computer can process numerical, textual, graphic, audio and video information. All these types of information are represented in the computer in binary code, that is, an alphabet of two characters (0 and 1) is used. This is because it is convenient to represent information as a sequence of electrical impulses: no impulse (0), impulse present (1). Such coding is usually called binary, and the logical sequences of zeros and ones themselves are called machine language.

Each digit of the machine binary code carries the amount of information equal to one bit.

This conclusion can be drawn by regarding the digits of the machine alphabet as equiprobable events. When writing one binary digit, a choice is made between only two possible states, which means that it carries an amount of information equal to 1 bit. Consequently, two digits carry 2 bits of information, four digits carry 4 bits, and so on. To determine the amount of information in bits, it is enough to count the number of digits in the binary machine code.

Encoding text information

Currently, most computer users work with text information, which consists of symbols: letters, digits, punctuation marks, etc.

Based on one cell with an information capacity of 1 bit, only 2 different states can be encoded. For each character that can be entered from the keyboard in the Latin layout to receive its own unique binary code, 7 bits are required. From a sequence of 7 bits, in accordance with the Hartley formula, N = 2^7 = 128 different combinations of zeros and ones, i.e. binary codes, can be obtained. By associating each character with its binary code, we get a coding table. A person operates with symbols, a computer with their binary codes.

For the Latin keyboard layout, there is one such coding table for the whole world, so text typed in the Latin layout will be displayed adequately on any computer. This table is called ASCII (American Standard Code for Information Interchange), pronounced [aski]. Below is the entire ASCII table, with codes given in decimal form. From it one can determine that when, say, the character "*" is entered from the keyboard, the computer perceives it as the code 42; in turn, 42 in decimal is 101010 in binary, which is the binary code of the character "*". Codes 0 through 31 are not shown in this table.
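
The example with the asterisk is easy to verify in Python using standard built-ins only:

    code = ord('*')
    print(code)               # 42
    print(format(code, 'b'))  # 101010
    print(chr(42))            # *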

ASCII character table

In order to encode one character, an amount of information equal to 1 byte is used, that is, I = 1 byte = 8 bits. Using a formula that connects the number of possible events K and the amount of information I, you can calculate how many different symbols can be encoded (assuming that the symbols are possible events):

K = 2^I = 2^8 = 256,

that is, an alphabet with a capacity of 256 characters can be used to represent textual information.

The essence of coding is that each character is assigned a binary code from 00000000 to 11111111 or the corresponding decimal code from 0 to 255.

It must be remembered that at present five different code tables are used to encode Russian letters (KOI-8, CP1251, CP866, Mac, ISO), and texts encoded with one table will not be displayed correctly under another encoding. This can be seen clearly in a fragment of a combined character encoding table.

In such a fragment, each row gives a binary code, the corresponding decimal code, and the symbol this code denotes in each of the encodings; different symbols turn out to be assigned to the same binary code.

However, in most cases it is not the user who takes care of transcoding text documents, but special converter programs built into applications.

Since 1997, the latest versions of Microsoft Office have supported a new encoding called Unicode. Unicode is a code table that uses 2 bytes, i.e. 16 bits, to encode each character. Based on such a table, N = 2^16 = 65 536 characters can be encoded.

Unicode includes almost all modern scripts, including: Arabic, Armenian, Bengali, Burmese, Greek, Georgian, Devanagari, Hebrew, Cyrillic, Coptic, Khmer, Latin, Tamil, Hangul, Han (China, Japan, Korea), Cherokee, Ethiopian, Japanese (Katakana, Hiragana, Kanji) and others.

For academic purposes, many historical scripts have been added, including: Ancient Greek, Egyptian hieroglyphs, cuneiform, Mayan writing, Etruscan alphabet.

Unicode provides a wide range of mathematical and musical symbols and pictograms.

For Cyrillic characters in Unicode, two ranges of codes are allocated:

Cyrillic (U+0400 to U+04FF);

Cyrillic Supplement (U+0500 to U+052F).

But the adoption of the Unicode table in its pure form is held back by the fact that if the code of one character occupies not one byte but two, then storing a text takes twice as much disk space, and transmitting it over communication channels takes twice as long.

Therefore, in practice the Unicode representation UTF-8 (Unicode Transformation Format) is now more common. UTF-8 provides the best compatibility with systems that use 8-bit characters. Text containing only characters with numbers below 128 turns into plain ASCII text when written in UTF-8. The remaining Unicode characters are represented by sequences from 2 to 4 bytes long. On the whole, since the most common characters in the world, the characters of the Latin alphabet, still occupy 1 byte in UTF-8, this encoding is more economical than pure two-byte Unicode.
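
This can be observed directly with Python's built-in codecs; the characters below (a Latin letter, a Cyrillic letter, the euro sign and an emoji) are arbitrary examples.

    for ch in ('A', 'я', '€', '😀'):
        data = ch.encode('utf-8')
        print(ch, len(data), 'byte(s)', list(data))
    # A  1 byte(s)  [65]
    # я  2 byte(s)  [209, 143]
    # €  3 byte(s)  [226, 130, 172]
    # 😀 4 byte(s)  [240, 159, 152, 128]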

To determine the numeric code of a character, you can use such a code table in a text editor: select "Insert" - "Symbol" in the menu, and the Symbol dialog box appears on the screen, showing the symbol table for the selected font. The characters in this table are arranged row by row, from left to right, starting with the space character.

One of the main advantages of the computer is that it is an amazingly versatile machine. Anyone who has ever dealt with one knows that arithmetic calculation is by no means its main use. Computers reproduce music and video, make it possible to hold voice and video conferences on the Internet, and create and process graphic images, and the use of the computer for games looks, at first glance, completely incompatible with the image of a super calculating machine grinding through hundreds of millions of digits per second.

When composing an information model of an object or phenomenon, we must agree on how to interpret particular designations, that is, agree on the form in which the information is presented.

A person expresses his thoughts in the form of sentences made up of words. They are an alphabetical representation of information. The basis of any language is an alphabet - a finite set of various signs (symbols) of any nature, from which a message is composed.

One and the same record can carry different meanings. For example, the digit string 251299 can mean the mass of an object, the length of an object, the distance between objects, a telephone number, or the date December 25, 1999.

Different codes can be used to represent information, and accordingly you need to know certain rules, the laws of writing these codes, i.e. be able to encode.

Code - a set of symbols for the presentation of information.

Coding - the process of presenting information in the form of a code.

To communicate with each other we use a code: the Russian language. When speaking, this code is transmitted as sounds; when writing, as letters. A driver transmits a signal with a horn or by flashing the headlights. You encounter information coding when crossing the road, in the form of traffic-light signals. Thus, coding comes down to using a set of signs according to strictly defined rules.

Information can be encoded in various ways: orally, in writing, or by gestures or signals of any other nature.

Binary coding of data.

As technology developed, different ways of encoding information appeared. In the first half of the 19th century, the American inventor Samuel Morse invented an amazing code that still serves mankind today. Information is encoded with three characters: a long signal (dash), a short signal (dot), and no signal (pause) to separate letters.

Computing has its own system: it is called binary coding and is based on representing data as a sequence of only two characters, 0 and 1. These characters are called binary digits, in English binary digit, abbreviated bit.

One bit can express two concepts: 0 or 1 (yes or no, black or white, true or false, etc.). If the number of bits is increased to two, then four different concepts can already be expressed:

00 01 10 11

Three bits can encode eight different values:

000 001 010 011 100 101 110 111

By increasing the number of bits in the binary coding system by one, we double the number of values that can be expressed in this system; that is, the general formula is:

N = 2^m,

where N is the number of independent encoded values, and m is the bit width of the binary coding adopted in this system.

Department of Education of the city of Moscow


State educational institution

of secondary vocational education

College of Architecture and Construction No. 7 TSP-2

Report

On the subject: "Informatics and ICT"

on the topic: "Number systems".

Completed by: student of group 11EVM

Full name: Vus Ivan Valerievich

Checked by:

Teacher Ovsyannikova A.S.

Moscow - 2011

Data representation in the memory of a personal computer (numbers, symbols, graphics, sound).

Form and language of information presentation

Perceiving information with the help of the senses, a person seeks to fix it so that it becomes understandable to others, presenting it in one form or another.

The composer can play a musical theme on the piano, and then write it down using notes. Images inspired by the same melody can be embodied by the poet in the form of a poem, the choreographer can express it with a dance, and the artist can express it in a painting.

A person expresses his thoughts in the form of sentences made up of words. Words, in turn, are composed of letters. This is an alphabetical representation of information.

The form of presentation of the same information may be different. It depends on the goal you have set for yourself. You come across similar operations in mathematics and physics lessons when you present a solution in different forms. For example, the solution to the problem: "Find the value of a mathematical expression ..." can be presented in tabular or graphical form by using visual means of presenting information: numbers, a table, a picture.

Thus, information can be presented in various forms:

  • sign form, using various signs, among which it is customary to distinguish:
    • symbolic, in the form of text, numbers, special characters (for example, the text of a textbook);
    • graphic (for example, a geographic map);
    • tabular (for example, a table recording the course of a physical experiment);
  • in the form of gestures or signals (for example, the signals of a traffic controller);
  • oral, verbal (for example, a conversation).

The form of presenting information is very important when transmitting it: if a person has poor hearing, then it is impossible to transmit information to him in sound form; if the dog has a poorly developed sense of smell, then it cannot work in the search service. At different times, people transmitted information in various forms using: speech, smoke, drumming, ringing bells, writing, telegraph, radio, telephone, fax.

Regardless of the form of presentation and method of transmission of information, it is always transmitted using a language.

In mathematics lessons, you use a special language based on numbers, signs of arithmetic operations and relations. They make up the alphabet of the language of mathematics.

In physics lessons, when considering a physical phenomenon, you use special symbols characteristic of this language, from which you compose formulas. A formula is a "word" in the language of physics.

In chemistry lessons, you also use certain symbols, signs, combining them into the "words" of a given language.

There is also the sign language of the deaf, in which the symbols of the language are certain signs expressed by facial expressions and hand movements.

The basis of any language is the alphabet - a set of uniquely defined characters (symbols) from which a message is formed.

Languages are divided into natural (spoken) and formal. The alphabet of a natural language depends on national traditions. Formal languages are found in special areas of human activity (mathematics, physics, chemistry, etc.). There are about 10,000 different languages and dialects in the world. Many spoken languages descend from the same language; for example, French, Spanish, Italian and other languages were formed from Latin.

Information coding

With the advent of language, and then of sign systems, the possibilities of communication between people expanded. This made it possible to store ideas, acquired knowledge and any data, and to transmit them in various ways over distance and across time, not only to contemporaries but also to future generations. The creations of our ancestors, who with the help of various symbols immortalized themselves and their deeds in monuments and inscriptions, have survived to this day. Rock paintings (petroglyphs) are still a mystery to scientists. Perhaps in this way ancient people wanted to make contact with us, the future inhabitants of the planet, and report the events of their lives.

Each nation has its own language, consisting of a set of characters (letters): Russian, English, Japanese and many others. You have already become familiar with the language of mathematics, physics, chemistry.

The representation of information using a language is often referred to as coding.

Code - a set of conventional signs (symbols) for representing information. Coding - the process of presenting information in the form of a code.

A driver transmits a signal with a horn or by flashing the headlights. The code here is the presence or absence of the sound, and in the case of light signalling, the flashing of the headlights or its absence.

You encounter information coding when crossing the road at traffic lights. Here the code is defined by the traffic-light colors: red, yellow, green.

The natural language in which people communicate is also based on a code; only in this case it is called an alphabet. When speaking, this code is transmitted as sounds, when writing, as letters. The same information can be represented using different codes. For example, a conversation can be written down using Russian letters or special shorthand symbols.

As technology developed, different ways of encoding information appeared. In the first half of the 19th century, the American inventor Samuel Morse invented an amazing code that still serves mankind today. Information is coded with three "letters": a long signal (dash), a short signal (dot) and no signal (pause) to separate letters. Thus, encoding comes down to using a set of characters arranged in a strictly defined order.

People have always looked for ways to exchange messages quickly. For this, messengers were sent and carrier pigeons were used. Different peoples had different ways of giving notice of impending danger: drums, smoke from campfires, flags, etc. However, using such a presentation of information requires prior agreement on how the received message is to be understood.

In the 17th century, the famous German scientist Gottfried Wilhelm Leibniz proposed a uniquely simple system for representing numbers. "Calculation with the help of twos ... is fundamental for science and generates new discoveries ... when numbers are reduced to the simplest principles, which are 0 and 1, a wonderful order appears everywhere."

Today this way of representing information, using a language whose alphabet contains only two characters, 0 and 1, is widely used in technical devices, including the computer. These two characters are usually called binary digits, or bits (from the English binary digit).

Engineers were attracted to this coding method by the simplicity of its technical implementation: either there is a signal or there is not. Any message can be encoded with these two digits.

The larger unit for measuring the amount of information is considered to be 1 byte, which consists of 8 bits.

Larger units are also used for measuring the amount of information. The number 1024 (2^10) is the multiplier when moving to the next higher unit of measurement.

Encoding information in a computer

All information that a computer processes must be represented in binary code using two digits, 0 and 1. These two characters are usually called binary digits, or bits. Any message can be encoded with the two digits 1 and 0. For this reason, two important processes must be organized in the computer:

  • coding, which is provided by input devices when converting input information into a form perceivable by a computer, that is, into a binary code;
  • decoding, which is provided by output devices when converting data from binary code into a form that a person can understand (a small sketch of both processes follows this list).
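
A minimal Python sketch of these two processes using the built-in text codecs (the message "Hi" is an arbitrary example):

    message = 'Hi'
    encoded = message.encode('ascii')            # coding on input
    print([format(b, '08b') for b in encoded])   # ['01001000', '01101001']

    decoded = encoded.decode('ascii')            # decoding on output
    print(decoded)                               # Hi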

From the point of view of technical implementation, the use of the binary number system for encoding information turned out to be much simpler than using other methods. Indeed, it is convenient to encode information as a sequence of zeros and ones if these values are represented as two possible stable states of an electronic element:

  • 0 - no electrical signal or the signal is low;
  • 1 - the presence of a signal or the signal is high.

These states are easy to distinguish. The disadvantage of binary encoding is long codes. But in technology it is easier to deal with a large number of simple elements than with a small number of complex ones.

Every day you deal with a device that can be in only two stable states: on or off. This is, of course, the familiar switch. But it turned out to be impossible to invent a switch that could stably and quickly switch between any of 10 states. As a result, after a number of unsuccessful attempts, developers concluded that a computer could not be built on the decimal number system, and the binary number system became the basis for representing numbers in a computer.

Currently, there are different ways of binary encoding and decoding of information in a computer. First of all, it depends on the type of information, namely what is to be encoded: text, numbers, graphics or sound. In addition, when encoding numbers, an important role is played by how they will be used: in text, in calculations, or in the process of input-output. The peculiarities of the technical implementation also play a role.

Number encoding

Number system - a set of techniques and rules for writing numbers using a specific set of characters.

To write numbers, not only digits but also letters can be used (for example, the Roman numeral XXI). The same number can be represented differently in different number systems.

Depending on the method of displaying numbers, the number systems are divided into positional and non-positional.

In a positional number system, the quantitative value of each digit depends on the place (position) in which it is written. For example, by changing the position of the digit 2 in the decimal system, you can write decimal numbers of different magnitude: 2; 20; 2000; 0.02, etc.

In a non-positional number system, digits do not change their quantitative value when their location (position) in the number changes. An example of a non-positional system is the Roman system, in which, regardless of position, the same symbol has the same meaning (for example, the symbol X in XXV).

The number of different symbols used to represent a number in a positional number system is called the base of the number system.

The binary number system, in which numbers are represented by sequences of the digits 0 and 1, proved to be the most suitable and reliable for the computer.

In addition, to work with computer memory, it turned out to be convenient to use the representation of information using two more number systems:

  • octal (any number is represented using the eight digits 0, 1, 2, ..., 7);
  • hexadecimal (uses the digits 0, 1, 2, ..., 9 and the letters A, B, C, D, E, F, which stand for the numbers 10, 11, 12, 13, 14, 15 respectively); a conversion sketch follows this list.
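
A small Python sketch of conversions between these number systems using standard built-ins (the value 202 is an arbitrary example):

    n = 202
    print(bin(n))         # 0b11001010 - binary
    print(oct(n))         # 0o312      - octal
    print(hex(n))         # 0xca       - hexadecimal
    print(int('CA', 16))  # 202        - back from hexadecimal to decimal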

Encoding character information

Pressing an alphanumeric key on the keyboard sends a signal to the computer as a binary number representing one of the values in the code table. The code table is the internal representation of characters in the computer. All over the world, the ASCII table (American Standard Code for Information Interchange) is accepted as the standard.

To store the binary code of one character, 1 byte = 8 bits is allocated. Considering that each bit takes the value 1 or 0, the number of possible combinations of ones and zeros is 2^8 = 256.

This means that with the help of 1 byte, you can get 256 different binary code combinations and display 256 different symbols with their help. These codes make up the ASCII table.

For example, when you press the key with the letter S, the code 01010011 is written into the computer's memory. When the letter S is displayed on the screen, the computer performs decoding - based on this binary code, an image of the symbol is built.

SUN - 01010011 01010101 01001110
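
This can be checked with a short Python sketch that prints the ASCII code and the 8-bit binary code of each letter:

    for ch in 'SUN':
        print(ch, ord(ch), format(ord(ch), '08b'))
    # S 83 01010011
    # U 85 01010101
    # N 78 01001110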

The ASCII standard encodes the first 128 characters, from 0 to 127: digits, letters of the Latin alphabet, and control characters. The first 32 characters are control characters intended mainly for transmitting control commands; their purpose may vary depending on the software and hardware. The second half of the code table (from 128 to 255) is not defined by the American standard and is intended for the symbols of national alphabets, pseudographics and some mathematical symbols. In different countries, different versions of the second half of the code table may be used.

Note! Numbers are encoded according to the ASCII standard in two cases: during input-output and when they occur in text. If numbers take part in calculations, they are converted into a different binary code.

For comparison, consider the number 45 for two coding options.

When used in text, this number requires 2 bytes for its representation, since each digit is represented by its own code in accordance with the ASCII table. In binary: 00110100 00110101.

When used in calculations, the code of this number will be obtained according to special translation rules and represented as an 8-bit binary number 00101101, which will require 1 byte.
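
Both representations are easy to reproduce in Python (a sketch with standard built-ins only):

    as_text = '45'.encode('ascii')                # two bytes, one per digit character
    print([format(b, '08b') for b in as_text])    # ['00110100', '00110101']

    as_number = format(45, '08b')                 # a single byte used in calculations
    print(as_number)                              # 00101101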