Basic synopsis on the "Hartley-Shannon formula". Information, data, signals. Sources of information and its carriers. The amount of information and entropy. Hartley's and Shannon's formulas. How to evaluate information using the Hartley formula.

Approaches to determining the amount of information.

In 1928, the American engineer R. Hartley considered the process of obtaining information as the choice of one message from a finite, predetermined set of N equiprobable messages, and defined the amount of information I contained in the selected message as the binary logarithm of N.

Hartley's formula: I = log2 N

Let's say you need to guess one number from a set of numbers from one to one hundred. Using Hartley's formula, you can calculate how much information is required for this: I = log2 100 ≈ 6.644. Thus, the message about the correctly guessed number contains an amount of information approximately equal to 6.644 units of information.
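A quick sketch of this calculation in Python (the function name is just illustrative):

```python
import math

# Hartley's formula: I = log2(N) for a choice among N equiprobable messages
def hartley_information(n_messages: int) -> float:
    return math.log2(n_messages)

print(hartley_information(100))   # ~6.644 bits: guessing a number from 1 to 100
print(hartley_information(2))     # 1.0 bit: a coin toss
```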

Here are other examples of equiprobable messages:

1. when a coin is tossed: "tails came up", "heads came up";

2. on a book page: "the number of letters is even", "the number of letters is odd".

Let us now determine whether the messages "a woman will be the first to leave the door of the building" and "a man will be the first to leave the door of the building" are equiprobable. It is impossible to answer this question unambiguously. It all depends on what kind of building we are talking about. If it is, for example, a cinema, then the probability of leaving the door first is the same for a man and a woman, but if it is a military barracks, then for a man this probability is much higher than for a woman.

For problems of this kind, the American scientist Claude Shannon proposed in 1948 another formula for determining the amount of information, one that takes into account the possibly unequal probabilities of the messages in the set.

Shannon's formula: I = -(p1·log2 p1 + p2·log2 p2 + ... + pN·log2 pN), where pi is the probability that the i-th message is selected from the set of N messages.

It is easy to see that if the probabilities p1, ..., pN are all equal, then each of them equals 1/N, and Shannon's formula turns into Hartley's formula.
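A small numerical check of this statement, sketched in Python:

```python
import math

def shannon_information(probabilities):
    # I = -(p1*log2(p1) + ... + pN*log2(pN))
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

N = 8
print(shannon_information([1 / N] * N))   # 3.0 bits for 8 equiprobable messages
print(math.log2(N))                       # 3.0 bits -- Hartley's formula agrees
```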

In addition to the two considered approaches to determining the amount of information, there are others. It is important to remember that any theoretical results apply only to a certain range of cases outlined by the initial assumptions.

As a unit of information, Claude Shannon proposed to take one bit (from the English binary digit).

A bit in information theory is the amount of information required to distinguish between two equally probable messages (such as "heads" - "tails", "even" - "odd", etc.). In computing, a bit is the smallest "portion" of computer memory required to store one of the two characters "0" and "1" used for the internal machine representation of data and instructions.

A bit is too small a unit of measure. In practice, a larger unit is often used: the byte, equal to eight bits. Exactly eight bits are required to encode any of the 256 characters of the computer keyboard alphabet (256 = 2⁸).

Even larger derived units of information are also widely used:

  • 1 kilobyte (KB) = 1024 bytes = 2¹⁰ bytes,
  • 1 Megabyte (MB) = 1024 KB = 2²⁰ bytes,
  • 1 Gigabyte (GB) = 1024 MB = 2³⁰ bytes.

Recently, in connection with the increase in the amount of processed information, derived units such as the following have come into use:

  • 1 Terabyte (TB) = 1024 GB = 2⁴⁰ bytes,
  • 1 Petabyte (PB) = 1024 TB = 2⁵⁰ bytes.
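These power-of-two unit sizes can be checked with a few lines of Python (an illustrative sketch):

```python
units = {"KB": 10, "MB": 20, "GB": 30, "TB": 40, "PB": 50}
for name, power in units.items():
    print(f"1 {name} = 2**{power} = {2 ** power} bytes")
# 1 KB = 1024 bytes, 1 MB = 1048576 bytes, ..., 1 PB = 1125899906842624 bytes
```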

As a unit of information, one could instead choose the amount of information required to distinguish, for example, ten equally probable messages. It would then be not a binary unit (bit), but a decimal unit (dit) of information.

1.6. What can you do with the information?

Information can be:

All these processes associated with certain operations with information are called information processes.

1.7. What properties does information have?

Information properties: reliability, completeness, accuracy, value, timeliness, comprehensibility, availability, brevity.

Information is reliable if it reflects the true state of affairs. Unreliable information can lead to misunderstanding or to wrong decisions.

Over time, reliable information can become unreliable, since it has the property of becoming outdated, that is, it ceases to reflect the true state of affairs.

Information is complete if it is sufficient for understanding and decision-making. Both incomplete and redundant information hinder decision-making or can lead to errors.

The accuracy of information is determined by the degree of its proximity to the real state of the object, process, phenomenon, etc.

The value of information depends on how important it is for solving the problem, as well as on how much it will later find application in various types of human activity.

Only information received in a timely manner can bring the expected benefit. Both premature presentation of information (when it cannot yet be assimilated) and its delay are equally undesirable.

If valuable and timely information is expressed in an incomprehensible way, it can become useless.

Information becomes understandable if it is expressed in the language spoken by those to whom the information is intended.

Information should be presented in an accessible (by the level of perception) form. For this reason, the same questions are presented in different ways in school textbooks and scientific publications.

Information on the same issue can be presented briefly (concisely, without insignificant details) or at length (in detail, verbose). Brevity of information is necessary in reference books, encyclopedias, textbooks, all kinds of instructions.

Control questions:

1. What does the term "informatics" mean and what is its origin?

2. What areas of knowledge have been officially assigned to the concept of "informatics" since 1978?

3. What spheres of human activity and to what extent does informatics affect?

4. What are the main components of informatics and the main directions of its application?

5. What is meant by the concept of "information" in the everyday, scientific and technical sense?

6. From whom (or what) does a person receive information? To whom does a person transfer information?

7. What can you do with the information?

8. Give examples of human information processing. What are the results of this processing?

9. Give examples of technical devices and systems for collecting and processing information.

10. What determines the informativeness of a message received by a person?

11. Why is it more convenient to evaluate the amount of information in a message not by the degree of increase in knowledge about the object, but by the degree of reduction of the uncertainty of our knowledge about it?

12. How is the unit of measurement of the amount of information determined?

13. In what cases and by what formula can you calculate the amount of information contained in a message?

14. Why is the number 2 taken as the base of the logarithm in Hartley's formula?

15. Under what condition does Shannon's formula transform into Hartley's formula?

16. What defines the term "bit" in information theory and computing?

17. Give examples of messages, the information content of which can be determined unambiguously.


The amount of information contained in a message is determined by the amount of knowledge that this message carries to the person receiving it. A message contains information for a person if the information in it is new and understandable to that person and, consequently, replenishes his knowledge.

The information that a person receives can be considered a measure of the reduction of the uncertainty of knowledge. If some message leads to a decrease in the uncertainty of our knowledge, then we can say that such a message contains information.

As a unit of information, we take the amount of information that we get when the uncertainty is reduced by a factor of two. Such a unit is called a bit.

In a computer, information is represented in binary code, or machine language, whose alphabet consists of two digits (0 and 1). These digits can be considered as two equally probable states. When one binary digit is written, a choice is made of one of two possible states (one of the two digits), and therefore one binary digit carries 1 bit of information. Two binary digits carry 2 bits of information, three digits 3 bits, and so on.



Let us now pose the inverse problem and determine: "How many different binary numbers N can be written using I binary digits?" With one binary digit you can write 2 different numbers (N = 2 = 2¹), with two binary digits four numbers (N = 4 = 2²), with three binary digits eight numbers (N = 8 = 2³), and so on.

In general, the number of different binary numbers can be determined by the formula N = 2^I, where N is the number of possible (equally probable) events and I is the number of binary digits.

In mathematics, there is a function that solves this exponential equation; it is called the logarithm. The solution of the equation has the form: I = log2 N. (1)

If the events are equiprobable, then the amount of information is determined by this formula.
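A short Python sketch of the relation N = 2^I and its inverse I = log2 N:

```python
import math

for i in range(1, 6):
    n = 2 ** i                    # I binary digits can encode N = 2**I different numbers
    print(i, n, math.log2(n))     # and back: I = log2(N) bits
```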

The amount of information for events with different probabilities is determined by Shannon's formula:

I = -(p1·log2 p1 + p2·log2 p2 + ... + pN·log2 pN), (2)

where I is the amount of information;

N is the number of possible events;

pi is the probability of the i-th event.

Example 3.4

There are 32 balls in a lottery drum. How much information does the message about the first drawn number contain (for example, number 15 was drawn)?

Solution:

Since drawing any of the 32 balls is equally probable, the amount of information about one drawn number is found from the equation 2^I = 32.

But 32 = 2⁵. Therefore, I = 5 bits. Obviously, the answer does not depend on which number was drawn.

Example 3.5

How many questions is it enough to ask your interlocutor to determine for sure the month in which he was born?

Solution:

Let's consider the 12 months as 12 possible events. If you ask about a specific month of birth, you may have to ask 11 questions (if the first 11 questions are answered negatively, the 12th question is unnecessary, since the answer is then known).

It is more correct to ask "binary" questions, that is, questions that can only be answered "yes" or "no." For example, "Were you born in the second half of the year?" Each such question breaks the set of options into two subsets, one for the answer "yes" and the other for the answer "no."

The correct strategy is to ask questions so that the number of possible options is halved each time. Then the number of possible events in each of the obtained subsets will be the same and guessing them is equally probable. In this case, at each step, the answer ("yes" or "no") will carry the maximum amount of information (1 bit).

Using formula (2) and a calculator, we get: I = log2 12 ≈ 3.6 bits.

The number of bits of information received corresponds to the number of questions asked, but the number of questions cannot be fractional. Rounding up to the nearest whole number, we get the answer: with the right strategy, you need to ask no more than 4 questions.
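The halving strategy can be sketched as a binary search over the list of months (a Python sketch; the helper name and the wording of the question are illustrative):

```python
import math

months = list(range(1, 13))                 # 12 possible months
print(math.ceil(math.log2(len(months))))    # 4 -> at most 4 yes/no questions are enough

def guess(secret, options):
    """Count how many halving ("binary") questions locate the secret value."""
    questions = 0
    lo, hi = 0, len(options)                # the secret is somewhere in options[lo:hi]
    while hi - lo > 1:
        mid = (lo + hi) // 2
        questions += 1                      # "Is it options[mid] or later?"
        if secret >= options[mid]:
            lo = mid
        else:
            hi = mid
    return options[lo], questions

print(guess(3, months))                     # (3, 4): found with 4 binary questions
```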

Example 3.6

After the computer science exam, which your friends took, the grades ("2", "3", "4" or "5") are announced. How much information will the grade message carry for student A, who learned only half of the tickets, and for student B, who learned all the tickets?

Solution:

Experience shows that for student A all four grades (events) are equally probable, and then the amount of information carried by the grade message can be calculated using formula (1): I = log2 4 = 2 bits.

Based on experience, we can also assume that for student B the most probable grade is "5" (p1 = 1/2), the probability of grade "4" is half of that (p2 = 1/4), and the probabilities of grades "3" and "2" are half again (p3 = p4 = 1/8). Since the events are not equally probable, we will use formula (2) to calculate the amount of information in the message: I = -(1/2·log2(1/2) + 1/4·log2(1/4) + 1/8·log2(1/8) + 1/8·log2(1/8)) = 1/2 + 1/2 + 3/8 + 3/8 = 1.75 bits.

The calculations show that with equiprobable events we get more information than with non-equiprobable events.
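Both calculations can be sketched in Python, assuming the probabilities given above:

```python
import math

def information(probabilities):
    # Shannon's formula: I = -(p1*log2(p1) + ... + pN*log2(pN))
    return -sum(p * math.log2(p) for p in probabilities if p > 0)

student_a = [1/4, 1/4, 1/4, 1/4]     # all four grades equally probable
student_b = [1/2, 1/4, 1/8, 1/8]     # "5" most probable, then "4", then "3" and "2"
print(information(student_a))         # 2.0 bits
print(information(student_b))         # 1.75 bits
```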

Example 3.7

An opaque bag contains 10 white, 20 red, 30 blue and 40 green balls. How much information will a visual message about the color of a drawn ball contain?

Solution:

Since the numbers of balls of different colors are not equal, the probabilities of visual messages about the color of a ball drawn from the bag also differ and are equal to the number of balls of the given color divided by the total number of balls:

p(white) = 0.1; p(red) = 0.2; p(blue) = 0.3; p(green) = 0.4.

The events are not equally probable, so to determine the amount of information contained in the message about the color of the ball, we use formula (2):

I = -(0.1·log2 0.1 + 0.2·log2 0.2 + 0.3·log2 0.3 + 0.4·log2 0.4) ≈ 1.85 bits.

A calculator can be used to evaluate this expression containing logarithms.

Example 3.8

Using Shannon's formula, it is easy to determine how many bits of information or binary digits are needed to encode 256 different characters. 256 different characters can be considered as 256 different equiprobable states (events). In accordance with the probabilistic approach to measuring the amount of information, the amount of information required for binary encoding of 256 characters is:

I = log2 256 = 8 bits = 1 byte.

Therefore, for binary encoding of 1 character, 1 byte of information or 8 binary bits is required.

How much information is contained, for example, in the text of the novel "War and Peace", in Raphael's frescoes, or in the human genetic code? Science does not give answers to these questions and, in all likelihood, will not give them soon. Is it possible to measure the amount of information objectively? The most important result of information theory is the following conclusion: "Under certain, very broad conditions, one can neglect the qualitative features of information, express its amount by a number, and also compare the amount of information contained in different groups of data."

Current approaches to defining the concept of "amount of information" are based on the fact that the information contained in a message can be loosely interpreted in terms of its novelty or, in other words, in terms of the reduction of the uncertainty of our knowledge about an object. These approaches use the mathematical concepts of probability and logarithm.



Data processing.

Data processing is the derivation of some information objects from other information objects by executing certain algorithms.

Processing is one of the main operations performed on information and the main means of increasing its volume and variety.

The means of information processing are all kinds of devices and systems created by mankind, first of all the computer, a universal machine for processing information.

We have already mentioned that Hartley's formula is a special case of Shannon's formula for equiprobable alternatives.

Substituting into formula (1) the value pi = 1/N (which in the equiprobable case does not depend on i), we get:

H = -N · (1/N) · log2(1/N) = log2 N.

Thus, Hartley's formula looks very simple:

H = log2 N. (2)

It clearly follows from it that the greater the number of alternatives (N), the greater the uncertainty (H). These quantities are related in formula (2) not linearly but through the binary logarithm. Taking the logarithm to base 2 brings the number of options to the units of information measurement, bits.

Note that the entropy will be an integer only if N is a power of 2, i.e. if N belongs to the series: {1, 2, 4, 8, 16, 32, 64, 128, 256, 512, 1024, 2048, …}

Fig. 10. Dependence of entropy on the number of equally probable choices (equivalent alternatives).

Let us recall what a logarithm is.

Fig. 11. The logarithm of b to base a is the power to which a must be raised in order to obtain b.

The logarithm to base 2 is called binary:

log2(8) = 3 => 2³ = 8

log2(10) ≈ 3.32 => 2^3.32 ≈ 10

The logarithm to base 10 is called decimal:

log10(100) = 2 => 10² = 100

Basic properties of the logarithm:

    log(1) = 0, because any number raised to the power zero equals 1;

    log(a^b) = b·log(a);

    log(a·b) = log(a) + log(b);

    log(a/b) = log(a) - log(b);

    log(1/b) = log(1) - log(b) = -log(b).
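These properties are easy to verify numerically (a small Python sketch using the binary logarithm):

```python
import math

a, b = 8.0, 2.0
log = math.log2                       # any base works for these identities
print(log(1))                         # 0.0
print(log(a ** b), b * log(a))        # log(a^b) = b*log(a)
print(log(a * b), log(a) + log(b))    # log(a*b) = log(a) + log(b)
print(log(a / b), log(a) - log(b))    # log(a/b) = log(a) - log(b)
print(log(1 / b), -log(b))            # log(1/b) = -log(b)
```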

To solve inverse problems, when the uncertainty (H) or the amount of information obtained as a result of removing it (I) is known and it is necessary to determine how many equally probable alternatives correspond to this uncertainty, one uses the inverse Hartley formula, which looks even simpler:

N = 2^I. (3)

For example, if it is known that 3 bits of information were received as a result of determining that Kolya Ivanov, who interests us, lives on the second floor, then the number of floors in the house can be determined by formula (3) as N = 2³ = 8 floors.

If the question is posed the other way round - "In a house with 8 floors, how much information did we receive when we learned that Kolya Ivanov, who interests us, lives on the second floor?" - then formula (2) must be used: I = log2(8) = 3 bits.

    1. The amount of information received from a message

So far we have given formulas for calculating the entropy (uncertainty) H, noting that H in them can be replaced by I, because the amount of information received when the uncertainty of some situation is completely removed is quantitatively equal to the initial entropy of that situation.

But uncertainty can be removed only partially, so the amount of information I received from a message is calculated as the decrease in entropy resulting from receiving that message:

I = H1 - H2, (4)

where H1 is the entropy before the message is received and H2 is the entropy after it.

For the equiprobable case, using Hartley's formula to calculate the entropy, we get:

I = log2 N1 - log2 N2 = log2(N1/N2). (5)

The second equality is derived from the properties of the logarithm. Thus, in the equiprobable case I depends on how many times the number of options under consideration (considered variety) has changed.

Based on (5), we can deduce the following:

If N2 = 1, then I = log2 N1 = H - complete removal of uncertainty; the amount of information received in the message is equal to the uncertainty that existed before the message was received.

If N2 = N1, then I = log2 1 = 0 - the uncertainty has not changed, and therefore no information has been received.

If N2 < N1, then N1/N2 > 1 and I > 0; if N2 > N1, then N1/N2 < 1 and I < 0. That is, the amount of information received is positive if the number of alternatives under consideration has decreased as a result of receiving the message, and negative if it has increased.

If the number of alternatives under consideration has halved as a result of receiving the message, i.e. N1/N2 = 2, then I = log2 2 = 1 bit. In other words, receiving 1 bit of information excludes half of the equiprobable options from consideration.

Consider an experiment with a 36-card deck as an example.

Fig. 12. Illustration for the experiment with a deck of 36 cards.

Have someone draw one card from the deck. We are interested in which of the 36 cards he drew. The initial uncertainty, calculated by formula (2), is H = log2 36 ≈ 5.17 bits. The person who draws the card tells us part of the information. Using formula (5), let us determine how much information we receive from each of the following messages:

Option A. "This is a card of a red suit."

I = log2(36/18) = log2 2 = 1 bit (red cards make up half the deck; the uncertainty has decreased by a factor of 2).

Option B. "This is a card of the spade suit."

I = log2(36/9) = log2 4 = 2 bits (spades make up a quarter of the deck; the uncertainty has decreased by a factor of 4).

Option C. "This is one of the highest cards: jack, queen, king or ace."

I = log2 36 - log2 16 = 5.17 - 4 = 1.17 bits (the uncertainty has decreased by more than a factor of 2, so the amount of information received is more than one bit).

Option D. "This is one card from the deck."

I = log2(36/36) = log2 1 = 0 bits (the uncertainty has not decreased - the message is not informative).

Option E. "This is the queen of spades."

I = log2(36/1) = log2 36 ≈ 5.17 bits (the uncertainty is completely removed).
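A Python sketch computing each of these values with formula (5), I = log2(N1/N2):

```python
import math

def info(n_before: int, n_after: int) -> float:
    # I = log2(N1 / N2): how many times the number of options has decreased
    return math.log2(n_before / n_after)

deck = 36
print(info(deck, 18))   # A: a red card              -> 1.0 bit
print(info(deck, 9))    # B: a spade                 -> 2.0 bits
print(info(deck, 16))   # C: jack, queen, king, ace  -> ~1.17 bits
print(info(deck, 36))   # D: one card from the deck  -> 0.0 bits
print(info(deck, 1))    # E: the queen of spades     -> ~5.17 bits
```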



Claude Shannon defined information as removed uncertainty. More precisely, obtaining information is a necessary condition for removing uncertainty. Uncertainty arises in a situation of choice. The task solved in the course of removing uncertainty is to reduce the number of options under consideration (to reduce the variety) and, as a result, to choose from among the possible options the one that corresponds to the situation. Removing uncertainty makes it possible to make informed decisions and to act. This is the controlling role of information.

Imagine walking into a store and asking to be sold some chewing gum. A saleswoman who has, say, 16 brands of chewing gum is in a state of uncertainty. She cannot fulfill your request without additional information. If you specify, say, "Orbit", and the saleswoman now considers only 8 of the 16 initial options, you have halved her uncertainty (looking ahead, let us say that halving the uncertainty corresponds to receiving 1 bit of information). If you simply point your finger at the shop window - "this one!" - then the uncertainty is completely removed. Again looking ahead, let us say that with this gesture in this example you have communicated 4 bits of information to the saleswoman.

A situation of maximum uncertainty presupposes several equiprobable alternatives (options), i.e. none of the options is preferred. The more equiprobable options there are, the greater the uncertainty, the more difficult it is to make an unambiguous choice, and the more information is required to do so. For N options this situation is described by the following probability distribution: {1/N, 1/N, …, 1/N}.

The minimum uncertainty is 0; this is a situation of complete certainty, meaning that the choice has been made and all the necessary information has been received. The probability distribution for a situation of complete certainty looks like this: {1, 0, …, 0}.

The quantity characterizing the amount of uncertainty in information theory is denoted by the symbol H and is called entropy, or more precisely information entropy.

Entropy (H) is a measure of uncertainty expressed in bits. Entropy can also be viewed as a measure of the uniformity of the distribution of a random variable.

Fig. 3.4. Behavior of entropy for the case of two alternatives.

Fig. 3.4 shows the behavior of the entropy for the case of two alternatives as the ratio of their probabilities (P, 1-P) changes.

The entropy reaches its maximum value in this case when both probabilities are equal to each other and equal to 1/2; the zero value of entropy corresponds to the cases (P0 = 0, P1 = 1) and (P0 = 1, P1 = 0).
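A Python sketch of this dependence, i.e. the entropy of the distribution {P, 1-P}:

```python
import math

def binary_entropy(p: float) -> float:
    if p in (0.0, 1.0):
        return 0.0                     # complete certainty
    q = 1.0 - p
    return -(p * math.log2(p) + q * math.log2(q))

for p in (0.0, 0.1, 0.25, 0.5, 0.75, 1.0):
    print(p, round(binary_entropy(p), 3))   # maximum of 1.0 bit at p = 0.5
```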

The amount of information I and the entropy H characterize the same situation, but from qualitatively opposite sides: I is the amount of information required to remove the uncertainty H. In Leon Brillouin's definition, information is negative entropy (negentropy).

When the uncertainty is completely removed, the amount of information received I is equal to the initially existing uncertainty H.

When the uncertainty is removed only partially, the amount of information received and the remaining unremoved uncertainty add up to the original uncertainty: Ht + It = H (Fig. 3.5).

Fig. 3.5. The relationship between entropy and the amount of information.

For this reason, the formulas that will be presented below for calculating the entropy H are also formulas for calculating the amount of information I, i.e. when it comes to complete removal of uncertainty, H in them can be replaced by I.

In the general case, the entropy H and the amount of information I obtained as a result of removing the uncertainty depend on the initial number of options under consideration N and on the a priori probabilities of realization of each of them, P = {p0, p1, …, p(N-1)}, i.e. H = F(N, P). The entropy in this case is calculated by Shannon's formula, proposed by him in 1948 in the article "A Mathematical Theory of Communication".

In the particular case when all the options are equiprobable, the dependence remains only on the number of options considered, i.e. H = F(N). In this case Shannon's formula simplifies considerably and coincides with Hartley's formula, which was first proposed by the American engineer Ralph Hartley in 1928, i.e. 20 years earlier.

Shannon's formula is as follows:

H = -(p0·log2 p0 + p1·log2 p1 + … + p(N-1)·log2 p(N-1)). (2.1)

The minus sign in formula (2.1) does not mean that the entropy is negative. It is explained by the fact that pi ≤ 1 by definition, and the logarithm of a number less than one is negative. By the property of the logarithm, the formula can therefore also be written in a second form, without the minus in front of the sum sign: H = p0·log2(1/p0) + p1·log2(1/p1) + … + p(N-1)·log2(1/p(N-1)).

The expression Ii = log2(1/pi) is interpreted as the individual amount of information obtained when the i-th option is realized. The entropy in Shannon's formula is an averaged characteristic: the mathematical expectation of the distribution of the random variable {I0, I1, …, I(N-1)}.

Here is an example of calculating the entropy using Shannon's formula. Suppose that in some institution the staff is distributed as follows: 3/4 are women and 1/4 are men. Then the uncertainty about, for example, whom you will meet first when entering the institution is calculated by the sequence of steps shown in Table 3.1.

Table 3.1

           pi     1/pi    Ii = log2(1/pi), bit     pi·log2(1/pi), bit
Women      3/4    4/3     log2(4/3) = 0.42         3/4 · 0.42 = 0.31
Men        1/4    4/1     log2(4) = 2              1/4 · 2 = 0.5
Sum (Σ)                                            H = 0.81 bit
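The same calculation as in Table 3.1, sketched in Python:

```python
import math

staff = {"women": 3 / 4, "men": 1 / 4}
H = sum(p * math.log2(1 / p) for p in staff.values())
print(round(H, 2))   # 0.81 bit
```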


Task 1. How much information will a visual message about the color of a drawn ball contain if an opaque bag holds 50 white, 25 red and 25 blue balls?

Solution:

1) The total number of balls is 50 + 25 + 25 = 100.

2) The probabilities of the balls are 50/100 = 1/2, 25/100 = 1/4, 25/100 = 1/4.

3) I = -(1/2·log2(1/2) + 1/4·log2(1/4) + 1/4·log2(1/4)) = -(1/2·(-1) + 1/4·(-2) + 1/4·(-2)) = 1.5 bits.

Task 2. A basket contains 16 balls of different colors. How much information does the message that the white ball was drawn carry?

Solution. Since N = 16 balls, I = log2 N = log2 16 = 4 bits.

Task 3. A basket contains black and white balls, among them 18 black balls. The message that a white ball has been drawn carries 2 bits of information. How many balls are there in the basket?

1) 18  2) 24  3) 36  4) 48

Solution. The message about drawing a white ball carries 2 bits, so from I = log2(1/p) we get log2(1/p) = 2 and 1/p = 4; therefore the probability of drawing a white ball is 1/4 (25%), and accordingly the probability of drawing a black ball is 3/4 (75%). If 75% of all balls are black and their number is 18, then 25% of all balls are white and their number is (18 · 25)/75 = 6.

It remains to find the total number of balls in the basket: 18 + 6 = 24.

Answer: 24 balls.

Task 4. In some country, a 5-character license plate is composed of capital letters (30 letters in total) and decimal digits in any order. Each character is encoded with the same, minimum possible number of bits, and each plate number with the same, minimum possible whole number of bytes. Determine the amount of memory required to store 50 license plates.

1) 100 bytes 2) 150 bytes 3) 200 bytes 4) 250 bytes

Solution. The number of characters used to encode a plate is 30 letters + 10 digits = 40 characters. The amount of information carried by one character is 6 bits (2^I ≥ 40; the number of bits cannot be fractional, so we take the nearest power of two that is not less than the number of characters: 2⁶ = 64).

We have found the amount of information contained in each character; the number of characters in a plate is 5, so one plate takes 5 · 6 = 30 bits. By the condition of the problem, each plate number is encoded with the same, minimum possible whole number of bytes, so we need to determine how many bytes hold 30 bits. Dividing 30 by 8 gives a fractional number, and we need a whole number of bytes for each plate, so we take the nearest multiple of 8 that is not less than the number of bits: 32 (8 · 4 = 32). Each plate number is therefore encoded with 4 bytes.

To store 50 license plates, you will need 4 · 50 = 200 bytes.
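A Python sketch of this calculation (the alphabet size and plate length are taken from the statement of the task):

```python
import math

alphabet = 30 + 10                               # letters + decimal digits
bits_per_char = math.ceil(math.log2(alphabet))   # 6 bits per character
bits_per_plate = 5 * bits_per_char               # 30 bits per plate
bytes_per_plate = math.ceil(bits_per_plate / 8)  # 4 whole bytes per plate
print(bytes_per_plate * 50)                      # 200 bytes for 50 plates
```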

The choice of the optimal strategy in the game "Guess the Number". The choice of the optimal strategy is based on obtaining the maximum amount of information. The first participant thinks of an integer (for example, 3) from a given interval (for example, from 1 to 16), and the second must "guess" the number. If we look at this game from the information point of view, then the initial uncertainty of knowledge for the second participant is 16 possible events (the variants of the number thought of).

With the optimal strategy, the interval of numbers should be halved each time; then the number of possible events (numbers) in each of the resulting intervals will be the same, and guessing each interval is equally probable. In this case, at each step the answer of the first player ("yes" or "no") will carry the maximum amount of information (1 bit).

As can be seen from Table 1.1, guessing the number 3 took four steps, at each of which the uncertainty of the second participant's knowledge was halved by receiving a message from the first participant containing 1 bit of information. Thus, the amount of information required to guess one of 16 numbers was 4 bits.
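A Python sketch of the halving strategy for guessing a number from 1 to 16 (the function name is illustrative):

```python
def guess_number(secret: int, low: int = 1, high: int = 16) -> int:
    """Return how many yes/no questions the halving strategy needs."""
    questions = 0
    while low < high:
        mid = (low + high) // 2
        questions += 1                  # "Is the number greater than mid?"
        if secret > mid:
            low = mid + 1
        else:
            high = mid
    return questions

print(guess_number(3))                  # 4 questions for any number from 1 to 16
```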

Control questions and tasks

1. It is known a priori that the ball is in one of three urns: A, B or C. Determine how many bits of information the message that it is in urn B contains.

The options are: 1 bit, 1.58 bits, 2 bits, 2.25 bits.

2. The probability of the first event is 0.5, and the probabilities of the second and third are 0.25 each. What is the information entropy for such a distribution? The options are: 0.5 bit, 1 bit, 1.5 bits, 2 bits, 2.5 bits, 3 bits.

3. Here is a list of the employees of a certain organization:

Determine the amount of information missing in order to fulfill the following requests:

Please call Ivanov to the phone.

I am interested in one of your employees, she was born in 1970.

4. Which of the messages carries more information:

· As a result of a coin toss (heads, tails), tails fell.

· The traffic light (red, yellow, green) is now green.

· The toss of a die (1, 2, 3, 4, 5, 6) results in 3 points.