An introduction to hashes and hash algorithms

Whenever people talk about blockchain, you hear "hash", "hash function", and "hash algorithm". Confused? Don't worry: this article explains what a hash algorithm is.

1

Hash is a cryptographic algorithm

A hash function (also called a hashing function or digest function) is a public function that maps a message M of any length to a short, fixed-length value H(M). H(M) is called the hash value, hash code, hash, or message digest. Hashing is a one-way cryptographic mapping: the input is mapped irreversibly to its digest, so there is only a forward computation and no decryption.

Written as a function: h = H(M)

Whatever the format of the input and however large the file, the output is a bit string of fixed length. Take SHA-256, the algorithm used by Bitcoin: its output is always 256 bits, regardless of the input data.

Each bit is a 0 or a 1, so 256 bits is a string of 256 binary digits. How many digits does that take in hexadecimal?

Since 16 = 2^4, each hexadecimal digit represents 4 bits. A 256-bit value therefore takes 256 / 4 = 64 hexadecimal digits.

So the hashes you usually see look like this:

00740f40257a13bf03b40f54a9fe398c79a664bb21cfa2870ab07888b21eeba8.

This hash value was copied at random from btc.com. If you don't believe it, count the characters: there are 64 hexadecimal digits.
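Python's standard `hashlib` module makes the fixed-length claim easy to check. In this small sketch the inputs are arbitrary examples of very different sizes; for each one it prints the input length, the digest length in bits, and the number of hex digits, and then it counts the digits of the hash quoted above.

```python
import hashlib

# Arbitrary example inputs of very different sizes
inputs = [b"", b"hello", b"x" * 1_000_000]

for data in inputs:
    h = hashlib.sha256(data)
    bits = h.digest_size * 8         # digest_size is measured in bytes
    hex_digits = len(h.hexdigest())  # each hex digit encodes 4 bits
    print(len(data), bits, hex_digits)  # every line ends with: 256 64

# The hash value quoted in the text
quoted = "00740f40257a13bf03b40f54a9fe398c79a664bb21cfa2870ab07888b21eeba8"
print(len(quoted), len(quoted) * 4)  # 64 256
```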

2

Hash function features

A hash function has the following properties.

Compression: for an input x of any size, the hash value is short. In practice, the hash values produced by a function H all have the same fixed length.

Easy computation: for any given message, its hash value is easy to compute.

One-wayness: given a hash value, it is computationally infeasible to find an input that produces it; that is, inverting the hash is hard. Given a hash function H and a hash value H(M), it is computationally infeasible to recover M. In other words, the original input cannot be worked out backward from the hash output. This is the basis of hash function security.

Collision resistance: an ideal hash function would have no collisions at all, but no practical algorithm can achieve this, so the realistic goal is to make collisions computationally infeasible to find.

There are two kinds of collision resistance. Weak collision resistance means that, for a given message, it is computationally infeasible to find a different message with the same hash value. Strong collision resistance means that it is computationally infeasible to find any pair of different messages with the same hash value.

High sensitivity (the avalanche effect): measured bit by bit, changing 1 input bit changes about half of the output bits. Any change to the message M changes the hash value H(M); even inputs that differ only slightly produce completely different outputs.
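The avalanche effect can be observed directly. This minimal sketch (the message is an arbitrary example) flips a single input bit, XORs the two SHA-256 digests, and counts how many of the 256 output bits differ; the count should land somewhere near 128.

```python
import hashlib

def sha256_int(data: bytes) -> int:
    """SHA-256 digest of data, read as a 256-bit integer."""
    return int.from_bytes(hashlib.sha256(data).digest(), "big")

msg = bytearray(b"hash function")  # arbitrary example message
flipped = bytearray(msg)
flipped[0] ^= 0x01                 # flip the lowest bit of the first byte

# XOR the digests: every 1 bit in the result is an output bit that changed
diff = sha256_int(bytes(msg)) ^ sha256_int(bytes(flipped))
changed = bin(diff).count("1")
print(f"{changed} of 256 output bits changed")
```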

3

Hash algorithm

For example, convert URL A into the number 1 and URL B into the number 2.

More generally, a URL X is converted into a number N. Using N as an index (subscript), the information for URL X can be found quickly. This conversion process is a hash algorithm.
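The URL-to-index idea is how a hash table works. The sketch below is a minimal illustration, assuming a hypothetical 16-bucket table and made-up URLs: it hashes a URL with SHA-256, reduces the result modulo the table size to get the index N, and looks only in that bucket. Python's built-in `dict` works on the same principle, though with its own non-cryptographic hash function.

```python
import hashlib
from typing import List, Optional, Tuple

NUM_BUCKETS = 16  # hypothetical table size

def bucket_index(url: str) -> int:
    """Map URL X to an index N: hash it, then reduce modulo the table size."""
    digest = hashlib.sha256(url.encode("utf-8")).digest()
    return int.from_bytes(digest[:8], "big") % NUM_BUCKETS

# Each bucket holds (url, info) pairs, since different URLs can share an index
table: List[List[Tuple[str, str]]] = [[] for _ in range(NUM_BUCKETS)]

def put(url: str, info: str) -> None:
    table[bucket_index(url)].append((url, info))

def get(url: str) -> Optional[str]:
    # Only the bucket at index N is searched, not the whole table
    for stored_url, info in table[bucket_index(url)]:
        if stored_url == url:
            return info
    return None

put("https://example.com/a", "info about site A")  # hypothetical URLs and data
put("https://example.com/b", "info about site B")
print(get("https://example.com/a"))  # info about site A
print(get("https://example.com/c"))  # None
```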

For example, suppose there are 10,000 songs. You are given a new song X and asked to determine whether it is one of those 10,000 songs.

Comparing X against all 10,000 songs one by one would clearly be slow. But suppose each of the 10,000 songs can be condensed into a single number (called a hash code), giving 10,000 numbers. Then you compute the code of song X with the same algorithm and check whether it appears among those 10,000 numbers; that tells you whether X is among the 10,000 songs.
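Here is a minimal sketch of that lookup, assuming the songs are available as raw bytes (the "songs" below are hypothetical placeholder byte strings, not real audio). Each song is hashed once, the codes go into a set, and checking a new song costs one hash plus one set lookup.

```python
import hashlib

def hash_code(song: bytes) -> str:
    """Condense a song's raw bytes into a fixed-size hash code."""
    return hashlib.sha256(song).hexdigest()

# Hypothetical stand-ins for 10,000 song files
library = [f"song-{i}".encode() for i in range(10_000)]

# Hash every song once and keep the codes in a set for fast membership tests
known_codes = {hash_code(s) for s in library}

def in_library(new_song: bytes) -> bool:
    # One hash computation and one set lookup instead of 10,000 comparisons
    return hash_code(new_song) in known_codes

print(in_library(b"song-42"), in_library(b"song-10000"))  # True False
```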

As an example, a very simple hashing scheme for organizing the 10,000 songs is to use the number of bytes each song occupies on disk as its hash code. You can then sort the 10,000 songs by size, and when a new song arrives, just check whether its byte count equals that of any of the 10,000 songs. If no song has the same byte count, the new song is definitely not among them; if one does, the new song may be a match, but two different songs can have the same size, so a closer comparison is still needed.
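The size-as-hash-code scheme amounts to a binary search over sorted sizes. In this minimal sketch the byte counts are hypothetical, and two entries deliberately share a size to show that an equal size is only a possible match, not a proof.

```python
import bisect

# Hypothetical song sizes in bytes; two different songs share 5,000,000 bytes
sizes = sorted([3_145_728, 4_200_111, 5_000_000, 5_000_000, 7_340_032])

def maybe_in_library(new_size: int) -> bool:
    """Binary-search the sorted sizes for an equal byte count.

    False: the song is definitely not in the library.
    True: some song has the same size, which may still be a different song.
    """
    i = bisect.bisect_left(sizes, new_size)
    return i < len(sizes) and sizes[i] == new_size

print(maybe_in_library(4_200_111), maybe_in_library(4_200_112))  # True False
```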

A reliable hash algorithm should satisfy:

For a given data M, it is easy to calculate the hash value X = F(M);

Given X, it is difficult to work backward to M;

It is difficult to find two different messages M ≠ N such that F(N) = F(M).

As mentioned earlier, hash functions are collision resistant. A collision means someone finds a pair of different inputs that produce the same hash result; for a good hash function, doing so is computationally infeasible.

First of all, compressing messages from a large space into a small space means collisions must exist. Suppose the hash length is fixed at 256 bits. If you hash the inputs 1, 2, ..., 2^256+1 one by one, you are certain to find two inputs with the same hash value, because there are more inputs than possible outputs. But don't celebrate too early: the collision is only yours if you have the time to finish the computation.

According to the birthday paradox, if about 2^130+1 inputs are chosen at random, there is a probability of more than 99.8% that at least one pair of them collides. So for a hash function with a 256-bit output, finding a collision this way takes on the order of 2^128 hash computations. If a computer performs 10,000 hash computations per second, completing 2^128 of them would take about 10^27 years. In blockchains, the collision resistance of the hash function is used to verify the integrity of blocks and transactions, so any tampering can be detected.
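The birthday effect is easy to see on a deliberately weakened hash. This sketch keeps only the first 24 bits of SHA-256 (the truncation is an assumption made purely so the search finishes quickly) and hashes distinct inputs until two share a value. The number of tries is typically on the order of 2^12, the square root of the 2^24 possible outputs, which is the same square-root relationship behind the 2^128 figure for a 256-bit hash. The last lines redo the text's back-of-envelope time estimate.

```python
import hashlib
from itertools import count

def truncated_hash(data: bytes, bits: int) -> int:
    """First `bits` bits of SHA-256(data): a toy, deliberately weak hash."""
    digest = int.from_bytes(hashlib.sha256(data).digest(), "big")
    return digest >> (256 - bits)

def find_collision(bits: int) -> tuple[bytes, bytes, int]:
    """Hash distinct inputs until two of them share a truncated hash value."""
    seen: dict[int, bytes] = {}
    for i in count():
        msg = str(i).encode()
        h = truncated_hash(msg, bits)
        if h in seen:
            return seen[h], msg, i + 1  # two inputs and the number of tries
        seen[h] = msg
    raise AssertionError("unreachable")

m1, m2, tries = find_collision(24)
print(m1, m2, tries)

# 2**128 hashes at 10,000 hashes per second, converted to years
years = 2**128 / 10_000 / (365.25 * 24 * 3600)
print(f"{years:.1e} years")
```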

As mentioned earlier, mining requires miners to keep trying random numbers until they compute a hash below the target set by the difficulty. Difficulty is an important reference metric for mining: it determines how many hash computations, on average, a miner must perform to produce a valid block. Bitcoin blocks are produced roughly every 10 minutes, and to keep new blocks coming at this rate, the difficulty must be adjusted as the total hash power of the network changes.

Adjusting the difficulty in this way keeps the average time to mine each block at about 10 minutes, and the difficulty target that hash outputs are checked against is of great significance to the security of the blockchain system. As several US computer scientists wrote in a book they co-authored: "Hash functions are the Swiss Army knife of cryptography. They find a place in many unique applications. To ensure security, different applications may require different properties of hash functions. It has proven extremely difficult to pin down a set of hash function properties that fully achieves provable security."

Proof of work requires a target value. Bitcoin's proof-of-work target is calculated as follows:

Target = maximum target / difficulty

Here, the maximum target is a constant:

0x00000000FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

The target value is inversely proportional to the difficulty. A Bitcoin proof of work is valid when the block hash computed by the miner is smaller than the target value.
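The target formula and the validity check can be sketched in a few lines. This is a simplified illustration: it uses the maximum target given above with plain integer division, whereas real Bitcoin nodes store the target in a compact "bits" encoding, and the two block hashes below are hypothetical.

```python
# Maximum target constant as given in the text
MAX_TARGET = 0x00000000FFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFFF

def target_for(difficulty: int) -> int:
    """Target = maximum target / difficulty (integer division in this sketch)."""
    return MAX_TARGET // difficulty

def meets_target(block_hash_hex: str, difficulty: int) -> bool:
    """Valid if the hash, read as a 256-bit number, is below the target."""
    return int(block_hash_hex, 16) < target_for(difficulty)

# Doubling the difficulty halves the target
print(target_for(2) == target_for(1) // 2)  # True

# Hypothetical hashes: one with many leading zeros, one with only two
print(meets_target("0000000000000000000a" + "f" * 44, 1000))  # True
print(meets_target("00740f40" + "0" * 56, 1))                 # False
```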

Put simply, Bitcoin's proof of work means repeatedly changing the block header (that is, trying different random values) and running a SHA-256 hash on it, looking for a hash value of a particular form: one with at least a certain number of leading zeros. The more leading zeros required, the higher the difficulty.
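A toy version of this search helps make it concrete. The sketch assumes a hypothetical header string with a 4-byte nonce appended, whereas a real Bitcoin header has a fixed 80-byte layout. It applies SHA-256 twice, as Bitcoin does for block headers, and counts leading zeros in the hex string. Each additional required hex zero multiplies the expected number of tries by 16.

```python
import hashlib
from itertools import count

def double_sha256(data: bytes) -> bytes:
    """SHA-256 applied twice."""
    return hashlib.sha256(hashlib.sha256(data).digest()).digest()

def mine(header: bytes, leading_zeros: int) -> tuple[int, str]:
    """Try nonces 0, 1, 2, ... until the hex hash starts with enough zeros.

    Toy sketch: the nonce is appended to a hypothetical header as
    4 little-endian bytes.
    """
    prefix = "0" * leading_zeros
    for nonce in count():
        h = double_sha256(header + nonce.to_bytes(4, "little")).hex()
        if h.startswith(prefix):
            return nonce, h
    raise AssertionError("unreachable")

# 4 hex zeros = 16 zero bits, so about 65,536 tries are expected on average
nonce, block_hash = mine(b"hypothetical block header", 4)
print(nonce, block_hash)
```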

4

An example to help you understand

▌ Scene one: Xiaoxing and Dumb on the basketball court

Xiaoxing: Dumb, are you thirsty? Want to go buy something to drink? While you're at it, get me a bottle too.

Dumb: Ha, I know exactly what you're up to. You're the thirsty one, so you go. I'm not going.

Xiaoxing: Come on, let's not waste time arguing. Let's toss a coin: heads, you go; tails, I go. That's fair, right?

Dumb: OK.

.........

▌ Scene two: Xiaoxing and Dumb chatting online

Dumb: Xiaoxing, come over and hang out at my place today. There's a pizzeria on the way that's really good; pick up a pizza, would you?

Xiaoxing: Hmph, why don't you come to my place instead, and bring a pizza on the way?

Dumb: Xiaoxing, if you're going to put it like that, it looks like only a coin toss can settle this.

Xiaoxing: But how do we toss a coin over chat? How would I know you're not cheating?

Dumb: Well then, let's try it another way.

1. Consider encrypting the result

Dumb: I think of a number A and multiply it by another number B to get a result C. A is my key, and I tell you only the result C. You guess whether A is odd or even; if you guess right, you win.

Xiaoxing: That's not good enough. If you tell me C is 12 and I guess A is odd, you can say A is 4 and B is 3; if I guess even, you can say A is 3 and B is 4. You'd have to tell me B as well as C.

Dumb: That won't work either. Telling you both C and B is the same as telling you A, so there would be nothing left to guess. We need a different approach.

2. Irreversible encryption

Dumb: Xiaoxing, how about this? I pick a number A and run it through the following steps:

1. A + 123 = B
2. B^2 = C
3. Take the 2nd to 4th digits of C to form a 3-digit number D
4. Divide D by 12 and take the remainder, E

Dumb: I tell you E and the calculation above. You guess whether A is odd or even. Then I tell you what A is, and you can redo the calculation to check whether I lied.

Xiaoxing: OK, let me think. Suppose A is 5; then:

B = 5 + 123 = 128; C = 128^2 = 16384; D = 638; E = 638 mod 12 = 2

(mod denotes the remainder after division)
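Dumb's four steps translate directly into code. This sketch is a toy, not a secure hash; it reproduces the worked example for A = 5 and also shows that, because E can only take 12 values, different A values share the same E. For instance, the odd A = 5 and the even A = 42 both give E = 2, which is one possible weakness of the scheme.

```python
def toy_hash(a: int) -> int:
    """Dumb's four-step scheme from the dialogue (a toy, NOT a secure hash)."""
    b = a + 123             # step 1: B = A + 123
    c = b ** 2              # step 2: C = B^2
    d = int(str(c)[1:4])    # step 3: 2nd to 4th digits of C form D
    return d % 12           # step 4: E = D mod 12

print(toy_hash(5))                 # 2  (5 -> 128 -> 16384 -> 638 -> 2)
print(toy_hash(5), toy_hash(42))   # 2 2: an odd and an even A with the same E

# Only 12 possible outputs exist, so many inputs must share each value of E
same_e = [a for a in range(100) if toy_hash(a) == toy_hash(5)]
print(same_e)
```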

Xiaoxing: Nice, that's clever. Each value of A gives exactly one value of E, yet A can't be worked out from E. You're a sly one! OK, this is fair, and whoever lies can be caught.

Xiaoxing: Hold on, Dumb, there's a problem here...

This type of encryption that loses part of the information is called "one-way encryption," also called hashing.

5

Common hashing algorithms

1, SHA-1 algorithm

The input to SHA-1 is a message shorter than 2^64 bits. The message is processed in 512-bit blocks, and the output is a 160-bit message digest. SHA-1 is fast, easy to implement, and widely applicable. The algorithm is described below.

Pad the input message: after padding, the message length modulo 512 must equal 448. The padding is a single 1 bit followed by 0 bits. The length of the original (pre-padding) message is then appended as a 64-bit big-endian integer in the remaining 64 bits. Padding is always applied, even if the message length is already congruent to 448 modulo 512, so the padding is between 1 and 512 bits long.
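The padding rule can be sketched as a short function. This minimal version assumes the message is a whole number of bytes (so the single 1 bit becomes the byte 0x80); the same rule is used by SHA-256.

```python
def sha_pad(message: bytes) -> bytes:
    """Pad a byte message SHA-1 style: 512-bit blocks, 64-bit length field."""
    bit_len = len(message) * 8
    padded = message + b"\x80"  # a single 1 bit followed by seven 0 bits
    # Zero-fill until the length is 448 bits (56 bytes) modulo 512 bits (64 bytes)
    padded += b"\x00" * ((56 - len(padded) % 64) % 64)
    padded += bit_len.to_bytes(8, "big")  # original bit length, 64-bit big-endian
    return padded

for n in (0, 55, 56, 64):
    print(n, len(sha_pad(b"a" * n)))  # 0 64 / 55 64 / 56 128 / 64 128
```

A 55-byte message needs only 8 bits of padding, while a 56-byte message needs a full extra 512 bits, which is where the 1 to 512 range comes from.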

Initialize the buffer: a 160-bit buffer holds the initial variables, the intermediate digests, and the final digest of the hash function. It consists of five 32-bit registers A, B, C, D, E, which must first be initialized as follows:

A = 0x67452301, B = 0xEFCDAB89, C = 0x98BADCFE, D = 0x10325476, E = 0xC3D2E1F0

Main loop: process the message one 512-bit block at a time. Each block goes through 4 rounds of 20 steps each, 80 steps in total, as shown in the figure. The four rounds have a similar structure, but each round uses a different logic function and constant. The input to each round is the current message block and the current values of buffer A, B, C, D, E; the output is written back to the buffer, replacing the old A, B, C, D, E. The output of the fourth round is then added to the input CV_q of the first round to produce CV_{q+1}: each 32-bit word of CV_q is added, modulo 2^32, to the corresponding word in the buffer.

Figure 512-bit message block processing flow

Output: after all message blocks have been processed, the output from the last block is the message digest.

The SHA-1 step function, shown in the figure, is the most important and most critical part of SHA-1.

Figure SHA-1 step function

Each time SHA-1 runs the step function, the old values of A, B, C, D are moved into registers B, C, D, E, with B rotated left by 30 bits on its way into C. At the same time, A receives the result of the step function, computed from A, B, C, D, E, a constant, and a message word:

$$A \leftarrow \mathrm{ROTL}^{5}(A) + f_t(B, C, D) + E + W_t + K_t \pmod{2^{32}}$$

Here t is the step number (0 ≤ t ≤ 79), W_t is a 32-bit word derived from the current 512-bit block, and K_t is an additive constant: 0x5A827999, 0x6ED9EBA1, 0x8F1BBCDC, and 0xCA62C1D6 for the four rounds, respectively.

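The basic logic function f_t takes three 32-bit words as input and outputs one 32-bit word. It differs from round to round:

$$f_t(B, C, D) = \begin{cases} (B \wedge C) \vee (\neg B \wedge D) & 0 \le t \le 19 \\ B \oplus C \oplus D & 20 \le t \le 39 \\ (B \wedge C) \vee (B \wedge D) \vee (C \wedge D) & 40 \le t \le 59 \\ B \oplus C \oplus D & 60 \le t \le 79 \end{cases}$$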

For each input block, the first 16 message words W_t (0 ≤ t ≤ 15) are the 16 32-bit words of the block itself; the remaining W_t (16 ≤ t ≤ 79) are computed as:

$$W_t = \mathrm{ROTL}^{1}(W_{t-3} \oplus W_{t-8} \oplus W_{t-14} \oplus W_{t-16})$$

Here ROTL^s denotes a circular left shift by s bits; the process is shown in the figure.

Figure: generation of SHA-1's 80 message words
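Putting the SHA-1 pieces together (padding, the 80-word expansion, the four round functions and constants, the step function, and the final addition) gives a compact implementation. This is an educational sketch assuming byte-aligned input; it is slow and not hardened, so real code should call `hashlib.sha1`. Its output can be checked against `hashlib`.

```python
import hashlib
import struct

MASK = 0xFFFFFFFF

def rotl(x: int, s: int) -> int:
    """ROTL^s: rotate a 32-bit word left by s bits."""
    return ((x << s) | (x >> (32 - s))) & MASK

def sha1(message: bytes) -> str:
    # Initial buffer values for A, B, C, D, E
    h = [0x67452301, 0xEFCDAB89, 0x98BADCFE, 0x10325476, 0xC3D2E1F0]
    # Padding: a 1 bit, zeros up to 448 mod 512, then the 64-bit big-endian length
    bit_len = len(message) * 8
    message += b"\x80" + b"\x00" * ((55 - len(message)) % 64) + struct.pack(">Q", bit_len)
    for off in range(0, len(message), 64):
        # W_0..W_15 come straight from the block; W_16..W_79 from the recurrence
        w = list(struct.unpack(">16I", message[off:off + 64]))
        for t in range(16, 80):
            w.append(rotl(w[t - 3] ^ w[t - 8] ^ w[t - 14] ^ w[t - 16], 1))
        a, b, c, d, e = h
        for t in range(80):
            if t < 20:
                f, k = (b & c) | (~b & d), 0x5A827999
            elif t < 40:
                f, k = b ^ c ^ d, 0x6ED9EBA1
            elif t < 60:
                f, k = (b & c) | (b & d) | (c & d), 0x8F1BBCDC
            else:
                f, k = b ^ c ^ d, 0xCA62C1D6
            temp = (rotl(a, 5) + f + e + k + w[t]) & MASK
            # B gets A, C gets ROTL^30(B), D gets C, E gets D, A gets the new value
            a, b, c, d, e = temp, a, rotl(b, 30), c, d
        # CV_{q+1}: add the block result to CV_q word by word, modulo 2^32
        h = [(x + y) & MASK for x, y in zip(h, (a, b, c, d, e))]
    return "".join(f"{x:08x}" for x in h)

print(sha1(b"abc"))  # a9993e364706816aba3e25717850c26c9cd0d89d
print(sha1(b"abc") == hashlib.sha1(b"abc").hexdigest())  # True
```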

2, SHA-2 algorithm

The output length of the SHA-2 family of hash algorithms can be 224, 256, 384, or 512 bits, corresponding to SHA-224, SHA-256, SHA-384, and SHA-512. The family also includes two other algorithms, SHA-512/224 and SHA-512/256. Compared with earlier hash algorithms, SHA-2 offers stronger security and more flexible output lengths; SHA-256 is the most commonly used. The four main algorithms are briefly described below.

SHA-256 algorithm

The input to SHA-256 is a message shorter than 2^64 bits, the output is a 256-bit message digest, and the input is processed in 512-bit blocks. The algorithm is described below.

(1) Message padding

Append a "1" bit followed by enough "0" bits that the length is congruent to 448 modulo 512. Then append a 64-bit block whose value is the length of the original (pre-padding) message. The padded message is therefore an integer multiple of 512 bits long; because the length field is 64 bits, the original message must be shorter than 2^64 bits.

(2) Initialize the link (chaining) variables

The intermediate and final values of the link variables are stored in a 256-bit buffer made up of eight 32-bit registers A, B, C, D, E, F, G, H; each step's output replaces the old values of A through H in the buffer. The link variables are initialized first, with the following values in registers A through H:

A = 0x6A09E667, B = 0xBB67AE85, C = 0x3C6EF372, D = 0xA54FF53A,
E = 0x510E527F, F = 0x9B05688C, G = 0x1F83D9AB, H = 0x5BE0CD19

These initial values are the first 32 bits of the binary representation of the fractional parts of the square roots of the first 8 primes (2, 3, 5, 7, 11, 13, 17, 19).

(3) Processing the main loop module

The message is processed in 512-bit blocks, with a 64-step loop for each block (as shown in the figure). The input to each block's processing is the current message block and the values of the 256-bit buffer A, B, C, D, E, F, G, H produced by the previous block. Each step uses a different message word and constant; how they are obtained is described below.

Figure SHA-256 compression function

(4) Get the final Hash value

After all 512-bit message blocks have been processed, the output from the last block is the final 256-bit message digest.

The step function is the most important and most critical part of SHA-256. Its computation is shown in the figure.

Figure SHA-256 step function

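Registers A and E are updated from two temporary values T_1 and T_2, and the old values of A, B, C, E, F, G are assigned to B, C, D, F, G, H, respectively. All additions are modulo 2^32:

$$T_1 = H + \Sigma_1(E) + \mathrm{Ch}(E, F, G) + K_t + W_t, \qquad T_2 = \Sigma_0(A) + \mathrm{Maj}(A, B, C)$$
$$A \leftarrow T_1 + T_2, \qquad E \leftarrow D + T_1$$
$$\mathrm{Ch}(E, F, G) = (E \wedge F) \oplus (\neg E \wedge G), \qquad \mathrm{Maj}(A, B, C) = (A \wedge B) \oplus (A \wedge C) \oplus (B \wedge C)$$
$$\Sigma_0(A) = \mathrm{ROTR}^{2}(A) \oplus \mathrm{ROTR}^{13}(A) \oplus \mathrm{ROTR}^{22}(A), \qquad \Sigma_1(E) = \mathrm{ROTR}^{6}(E) \oplus \mathrm{ROTR}^{11}(E) \oplus \mathrm{ROTR}^{25}(E)$$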

K_t is obtained by taking the fractional parts of the cube roots of the first 64 primes (2, 3, 5, 7, ...), converting them to binary, and keeping the first 32 bits of each, which gives the 64 constants K_0 through K_63. They provide a set of 32-bit pseudo-random words that remove any regularity in the input data.

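For each input block, the first 16 message words W_t (0 ≤ t ≤ 15) are taken directly from the 16 32-bit words of the block; the others are computed as follows (additions modulo 2^32):

$$W_t = \sigma_1(W_{t-2}) + W_{t-7} + \sigma_0(W_{t-15}) + W_{t-16}, \quad 16 \le t \le 63$$
$$\sigma_0(x) = \mathrm{ROTR}^{7}(x) \oplus \mathrm{ROTR}^{18}(x) \oplus \mathrm{SHR}^{3}(x), \qquad \sigma_1(x) = \mathrm{ROTR}^{17}(x) \oplus \mathrm{ROTR}^{19}(x) \oplus \mathrm{SHR}^{10}(x)$$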

Figure: generation of SHA-256's 64 message words
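The whole of SHA-256 fits in a short function. This educational sketch assumes byte-aligned input and derives the initial values and K_t exactly from the square and cube roots of primes, as described above, using integer arithmetic instead of floating point so no precision is lost. It is not optimized; real code should use `hashlib.sha256`, which it can be checked against.

```python
import hashlib
import struct
from math import isqrt

MASK = 0xFFFFFFFF

def first_primes(n):
    """The first n primes, by trial division."""
    primes, c = [], 2
    while len(primes) < n:
        if all(c % p for p in primes):
            primes.append(c)
        c += 1
    return primes

def icbrt(n):
    """Largest integer r with r**3 <= n (binary search)."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

# Initial values: first 32 bits of frac(sqrt(p)) for the first 8 primes
H0 = [isqrt(p << 64) & MASK for p in first_primes(8)]
# K_t: first 32 bits of frac(cbrt(p)) for the first 64 primes
K = [icbrt(p << 96) & MASK for p in first_primes(64)]

def rotr(x, n):
    return ((x >> n) | (x << (32 - n))) & MASK

def sha256(message: bytes) -> str:
    bit_len = len(message) * 8
    message += b"\x80" + b"\x00" * ((55 - len(message)) % 64) + struct.pack(">Q", bit_len)
    h = list(H0)
    for off in range(0, len(message), 64):
        # Message schedule: 16 words from the block, 48 from the recurrence
        w = list(struct.unpack(">16I", message[off:off + 64]))
        for t in range(16, 64):
            s0 = rotr(w[t - 15], 7) ^ rotr(w[t - 15], 18) ^ (w[t - 15] >> 3)
            s1 = rotr(w[t - 2], 17) ^ rotr(w[t - 2], 19) ^ (w[t - 2] >> 10)
            w.append((s1 + w[t - 7] + s0 + w[t - 16]) & MASK)
        a, b, c, d, e, f, g, hh = h
        for t in range(64):
            t1 = (hh + (rotr(e, 6) ^ rotr(e, 11) ^ rotr(e, 25))
                  + ((e & f) ^ (~e & g)) + K[t] + w[t]) & MASK
            t2 = ((rotr(a, 2) ^ rotr(a, 13) ^ rotr(a, 22))
                  + ((a & b) ^ (a & c) ^ (b & c))) & MASK
            # A and E get new values; the other registers shift down by one
            a, b, c, d, e, f, g, hh = (t1 + t2) & MASK, a, b, c, (d + t1) & MASK, e, f, g
        h = [(x + y) & MASK for x, y in zip(h, (a, b, c, d, e, f, g, hh))]
    return "".join(f"{x:08x}" for x in h)

print(sha256(b"abc") == hashlib.sha256(b"abc").hexdigest())  # True
```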

SHA-512 algorithm

SHA-512 is the member of SHA-2 with the higher security level. It consists mainly of message padding, message expansion, and the round (compression) transformation. The initial value and intermediate results are held in eight 64-bit registers. The maximum input length is less than 2^128 bits, the output is a 512-bit message digest, and the input is processed in 1024-bit blocks. The specific parameters are: message digest length 512 bits; message length less than 2^128 bits; block size 1024 bits; word size 64 bits; 80 steps. The following figure shows the entire process from message input to digest output; the specific steps are as follows.

Figure: overall structure of SHA-512

Message padding: append a single "1" bit followed by 0 to 1023 "0" bits so that the length is congruent to 896 modulo 1024. Then append a 128-bit field whose value is the length of the original (pre-padding) message.

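Initialization of link variables: the intermediate and final values of the link variables are stored in a 512-bit buffer made up of eight 64-bit registers A, B, C, D, E, F, G, H. The initial link variables are also stored in these eight registers. Their values are:

A = 0x6A09E667F3BCC908, B = 0xBB67AE8584CAA73B, C = 0x3C6EF372FE94F82B, D = 0xA54FF53A5F1D36F1,
E = 0x510E527FADE682D1, F = 0x9B05688C2B3E6C1F, G = 0x1F83D9ABFB41BD6B, H = 0x5BE0CD19137E2179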

The initial link variables are stored in big-endian order, that is, the most significant byte of each word is stored at the lowest address. They are the first 64 bits of the binary representation of the fractional parts of the square roots of the first 8 primes.

Main loop: the message is processed in 1024-bit blocks, with an 80-step loop for each block. Each iteration takes the values of the 512-bit buffer A, B, C, D, E, F, G, H as input; these values come from the previous compression. Each step uses a different message word and constant.

Calculate the final Hash value: After all N 1024-bit packets of the message are processed, the 512-bit link variable that is compressed and output in the Nth iteration is the final Hash value.

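The step function is the most critical part of SHA-512, and it works like SHA-256's. In each step, the new values of B, C, D, F, G, H are the input values of A, B, C, E, F, G, respectively, and two temporary values are computed to update registers A and E. All additions are modulo 2^64:

$$T_1 = H + \Sigma_1(E) + \mathrm{Ch}(E, F, G) + K_t + W_t, \qquad T_2 = \Sigma_0(A) + \mathrm{Maj}(A, B, C)$$
$$A \leftarrow T_1 + T_2, \qquad E \leftarrow D + T_1$$
$$\mathrm{Ch}(E, F, G) = (E \wedge F) \oplus (\neg E \wedge G), \qquad \mathrm{Maj}(A, B, C) = (A \wedge B) \oplus (A \wedge C) \oplus (B \wedge C)$$
$$\Sigma_0(A) = \mathrm{ROTR}^{28}(A) \oplus \mathrm{ROTR}^{34}(A) \oplus \mathrm{ROTR}^{39}(A), \qquad \Sigma_1(E) = \mathrm{ROTR}^{14}(E) \oplus \mathrm{ROTR}^{18}(E) \oplus \mathrm{ROTR}^{41}(E)$$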

Each step t of the 80 steps uses a 64-bit message word W_t derived from the 1024-bit message block M_i currently being processed, as shown in the figure. The first 16 message words W_t (0 ≤ t ≤ 15) are the 16 64-bit words of the block itself; the others are computed as follows (additions modulo 2^64):

$$W_t = \sigma_1(W_{t-2}) + W_{t-7} + \sigma_0(W_{t-15}) + W_{t-16}, \quad 16 \le t \le 79$$

Figure: generation of SHA-512's 80 message words

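The functions σ_0 and σ_1 used in the message expansion are:

$$\sigma_0(x) = \mathrm{ROTR}^{1}(x) \oplus \mathrm{ROTR}^{8}(x) \oplus \mathrm{SHR}^{7}(x), \qquad \sigma_1(x) = \mathrm{ROTR}^{19}(x) \oplus \mathrm{ROTR}^{61}(x) \oplus \mathrm{SHR}^{6}(x)$$

Here ROTR^n(x) rotates the 64-bit word x right by n bits, and SHR^n(x) shifts x right by n bits.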

As the figure shows, in the first 16 steps W_t equals the corresponding 64-bit word of the message block. In the remaining 64 steps, each value is computed from four earlier values, two of which are first passed through the shift-and-rotate functions σ_0 and σ_1.

K_t is obtained by taking the fractional parts of the cube roots of the first 80 primes (2, 3, 5, 7, ...), converting them to binary, and keeping the first 64 bits of each, which gives the 80 constants K_t. They provide a set of 64-bit pseudo-random words that remove any regularity in the input data.
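The SHA-512 constants can be derived the same way as SHA-256's, just keeping 64 fractional bits instead of 32. This minimal sketch computes the roots exactly with integers: floor(sqrt(p) · 2^64) is `isqrt(p << 128)`, floor(cbrt(p) · 2^64) is an integer cube root of `p << 192`, and masking to 64 bits drops the integer part.

```python
from math import isqrt

MASK64 = (1 << 64) - 1

def first_primes(n: int) -> list[int]:
    """The first n primes, by trial division."""
    primes: list[int] = []
    c = 2
    while len(primes) < n:
        if all(c % p for p in primes):
            primes.append(c)
        c += 1
    return primes

def icbrt(n: int) -> int:
    """Largest integer r with r**3 <= n (binary search)."""
    lo, hi = 0, 1 << ((n.bit_length() + 2) // 3)
    while lo < hi:
        mid = (lo + hi + 1) // 2
        if mid ** 3 <= n:
            lo = mid
        else:
            hi = mid - 1
    return lo

# Initial values: first 64 bits of frac(sqrt(p)) for the first 8 primes
iv512 = [isqrt(p << 128) & MASK64 for p in first_primes(8)]
# K_t: first 64 bits of frac(cbrt(p)) for the first 80 primes
k512 = [icbrt(p << 192) & MASK64 for p in first_primes(80)]

print(f"{iv512[0]:016x} {k512[0]:016x} {len(k512)}")
# 6a09e667f3bcc908 428a2f98d728ae22 80
```

The top 32 bits of each of the first 64 values of K_t are exactly SHA-256's 32-bit constants, since both are prefixes of the same fractional parts.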

SHA-224 and SHA-384

SHA-256 and SHA-512 are relatively recent hash functions. The former uses 32-bit words and the latter 64-bit words. Their structures are essentially the same; they differ in the number of steps and in the constants used. SHA-224 and SHA-384 are truncated versions of these two hash functions, computed with different initial values.

SHA-224 accepts the same input message length as SHA-256, less than 2^64 bits, and its block size is also 512 bits. Its processing is essentially the same as SHA-256's, with the following two differences.

SHA-224's message digest is taken from the 32-bit words in the seven registers A, B, C, D, E, F, G, while SHA-256's digest uses all eight 32-bit registers A through H.

SHA-224's initial link variables differ from SHA-256's. They are also stored in big-endian format, but they are the second 32 bits (bits 33 to 64) of the binary representation of the fractional parts of the square roots of the 9th through 16th primes (23, 29, 31, 37, 41, 43, 47, 53). SHA-224's initial link variables are:

A = 0xC1059ED8, B = 0x367CD507, C = 0x3070DD17, D = 0xF70E5939,
E = 0xFFC00B31, F = 0x68581511, G = 0x64F98FA7, H = 0xBEFA4FA4

The detailed calculation procedure for SHA-224 is the same as SHA-256.

SHA-384 accepts the same input message length as SHA-512, less than 2^128 bits, and its block size is also 1024 bits. Its processing is essentially the same as SHA-512's, again with the following two differences.

SHA-384's 384-bit message digest is taken from the six 64-bit words A, B, C, D, E, F, while SHA-512's digest uses all eight 64-bit words A through H.

SHA-384's initial link variables differ from SHA-512's. They are also stored in big-endian format, and they are the first 64 bits of the binary representation of the fractional parts of the square roots of the 9th through 16th primes (23, 29, 31, 37, 41, 43, 47, 53). SHA-384's initial link variables are:

A = 0xCBBB9D5DC1059ED8, B = 0x629A292A367CD507, C = 0x9159015A3070DD17, D = 0x152FECD8F70E5939,
E = 0x67332667FFC00B31, F = 0x8EB44A8768581511, G = 0xDB0C2E0D64F98FA7, H = 0x47B5481DBEFA4FA4

The detailed calculation procedure for SHA-384 is the same as for SHA-512.
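The relationship between the SHA-224 and SHA-384 initial values can be checked with a short sketch. It takes the first 64 fractional bits of the square roots of the 9th through 16th primes for SHA-384, and keeps the low 32 of those 64 bits (the "second 32 bits") for SHA-224. The truncated digest sizes are confirmed with `hashlib`.

```python
import hashlib
from math import isqrt

def first_primes(n: int) -> list[int]:
    """The first n primes, by trial division."""
    primes: list[int] = []
    c = 2
    while len(primes) < n:
        if all(c % p for p in primes):
            primes.append(c)
        c += 1
    return primes

primes_9_to_16 = first_primes(16)[8:]
print(primes_9_to_16)  # [23, 29, 31, 37, 41, 43, 47, 53]

# SHA-384: first 64 bits of frac(sqrt(p)); isqrt(p << 128) = floor(sqrt(p) * 2**64)
sha384_iv = [isqrt(p << 128) & ((1 << 64) - 1) for p in primes_9_to_16]
# SHA-224: bits 33 to 64 of the same fractional parts, i.e. the low half of each word
sha224_iv = [x & 0xFFFFFFFF for x in sha384_iv]

print(" ".join(f"{x:016x}" for x in sha384_iv))
print(" ".join(f"{x:08x}" for x in sha224_iv))

# Truncated outputs: 224 bits = 28 bytes, 384 bits = 48 bytes
print(hashlib.sha224().digest_size, hashlib.sha384().digest_size)  # 28 48
```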

3, SHA-3 algorithm

SHA-3 uses a sponge construction overall, divided into two phases: absorbing and squeezing. The core permutation f of SHA-3 acts on a 5 × 5 × 64 three-dimensional state. The permutation f has 24 rounds, and each round consists of five steps θ, ρ, π, χ, ι, which act on different dimensions of the three-dimensional state. θ is a linear operation that acts on columns; ρ is a linear operation on each lane that cyclically rotates the lane's 64 bits; π is a linear operation that moves each lane, as a whole, to another lane position; χ is a nonlinear operation on each row, equivalent to replacing the 5 bits of each row with another 5 bits; and ι adds a round constant.

Published security analyses of SHA-3 have so far focused mainly on the following areas.

Collision attacks, preimage attacks, and second-preimage attacks on SHA-3.

Analysis of SHA-3's core permutation, mainly aimed at distinguishing it from a random permutation.

Analysis of SHA-3's differential properties, focusing on high-probability differential trails of SHA-3 and on building differential distinguishers from them.

In the author's view, the three-dimensional state design and sponge construction of the Keccak algorithm make SHA-3 superior to SHA-2 and even AES. The sponge function maps an input of arbitrary length to an output of arbitrary length.
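Python's `hashlib` exposes both fixed-length SHA-3 and the SHAKE extendable-output functions, which are built on the same Keccak sponge. This small sketch (the input is an arbitrary example) shows that SHA3-256 gives a fixed 256-bit output, while SHAKE128 squeezes out as many bytes as the caller asks for, with shorter outputs being prefixes of longer ones.

```python
import hashlib

data = b"sponge"  # arbitrary example input

# SHA3-256: fixed 256-bit (32-byte) output
print(hashlib.sha3_256(data).hexdigest())

# SHAKE128: the caller chooses the output length in bytes
out16 = hashlib.shake_128(data).digest(16)
out64 = hashlib.shake_128(data).digest(64)
print(len(out16), len(out64), out64[:16] == out16)  # 16 64 True
```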

4, RIPEMD-160 algorithm

RIPEMD stands for RACE Integrity Primitives Evaluation Message Digest. RIPEMD follows the design principles of MD4 and improves on MD4's weaknesses. RIPEMD-128 was released in 1996, and its performance is similar to SHA-1's.

RIPEMD-160 is an improvement on RIPEMD-128 and is the most widely used version of RIPEMD. It outputs a 160-bit hash value. A brute-force collision search against a 160-bit hash function takes about 2^80 computations, so the work required is greatly increased. The design of RIPEMD-160 draws on MD4, MD5, and RIPEMD-128, making it more resistant to strong collisions. It was intended to replace the 128-bit hash functions MD4, MD5, and RIPEMD.

RIPEMD-160 uses a 160-bit buffer to store the algorithm's intermediate results and the final hash value. The buffer consists of five 32-bit registers A, B, C, D, E, initialized as follows:

A = 0x67452301, B = 0xEFCDAB89, C = 0x98BADCFE, D = 0x10325476, E = 0xC3D2E1F0

Data is stored little-endian, with the least significant byte at the lowest address.

The core of the algorithm is a compression function with 10 rounds of 16 steps each, arranged as two parallel lines of 5 rounds. Each round uses a different primitive logic function, and the two lines apply the five logic functions in opposite orders. Each round takes the current block's message words and the 160-bit buffer values A, B, C, D, E as input and produces new values, and each round uses its own additive constant. After the last round, the results of the two lines, A, B, C, D, E and A', B', C', D', E', are combined with the initial link variables by addition to produce the output. After all 512-bit blocks have been processed, the final 160-bit output is the message digest.

Besides the 128-bit and 160-bit versions, RIPEMD also exists in 256-bit and 320-bit versions, which together make up the four members of the RIPEMD family: RIPEMD-128, RIPEMD-160, RIPEMD-256, and RIPEMD-320. The security of the 128-bit version has been questioned. The 256-bit and 320-bit versions reduce the chance of accidental collisions, but they offer no higher security level than RIPEMD-128 and RIPEMD-160, because they only modify the initial parameters and s-box of the 128-bit and 160-bit versions to produce 256-bit and 320-bit outputs.
