Author: Gimer Cervera, Ethereum smart contract developer; Translation: Shan Ouba, Golden Finance
Introduction
This article introduces a series of articles that provide an in-depth look at the Ethereum Virtual Machine (EVM) and Solidity Assembly for smart contract optimization and security.
The Ethereum Virtual Machine (EVM) is the core component of the Ethereum network. The EVM is software that allows the deployment and execution of smart contracts written in high-level languages such as Solidity. After a contract is written, it is compiled into bytecode and deployed to the EVM. The EVM runs on every node of the Ethereum network.
Solidity Assembly is a low-level programming language that allows developers to write code at a level closer to the EVM itself. It provides more granular control over smart contract execution, allowing for optimizations and customizations not possible through higher-level Solidity code alone.
The language used for inline assembly in Solidity is called Yul. It acts as an intermediate language that compiles to EVM bytecode, and it can be used standalone or as inline assembly within Solidity. Yul is designed as a low-level language for the stack-based EVM that gives developers more fine-grained control over smart contract execution and lets them write more optimized and efficient code. Before explaining Solidity assembly, we need to understand how the components of the EVM work.
The EVM is a quasi-Turing-complete state machine. Here, "quasi" means that every execution is bounded to a finite number of computational steps, determined by the amount of gas available to that smart contract execution. This is how Ethereum deals with the halting problem: an execution that would otherwise run forever, whether maliciously or by accident, simply runs out of gas, so it cannot paralyze the Ethereum platform.
Gas is the unit that measures the amount of computation required to complete a transaction in Ethereum. Transaction fees are paid in Ether and depend on both the gas consumed and the gas price. Our goal in this series is to learn how to minimize the total amount of gas consumed without compromising security.
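As a rough illustration of how gas and gas price combine into a fee, the sketch below multiplies the gas used by a price per unit of gas. The function name and the 20 gwei price are assumptions made for this example, and the model deliberately ignores the base fee and priority fee split used on mainnet today.

```python
from decimal import Decimal

WEI_PER_GWEI = 10**9
WEI_PER_ETHER = 10**18

def transaction_fee_wei(gas_used: int, gas_price_gwei: int) -> int:
    """Fee in wei = gas used * price per unit of gas (simplified model)."""
    return gas_used * gas_price_gwei * WEI_PER_GWEI

# Hypothetical example: 21000 gas (a plain Ether transfer) at an assumed 20 gwei.
fee_wei = transaction_fee_wei(21_000, 20)
fee_ether = Decimal(fee_wei) / Decimal(WEI_PER_ETHER)
```

Under these assumed numbers the fee comes to 0.00042 Ether, which shows why reducing gas use translates directly into lower cost.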
Code optimization issues
Inline assembly is a way to access the EVM at a lower level. It bypasses several important safety features and checks of Solidity. Used properly, inline assembly can significantly reduce execution costs. However, you should only use it for tasks that require it, and only if you know what you are doing, because optimizing with inline assembly can introduce new security issues into your code. To master inline assembly, we need to understand how the EVM and its components work.
In the EVM, accessing a storage variable for the first time within a transaction is called a "cold" access and costs 2100 gas. Any subsequent access to the same variable is called a "hot" (or "warm") access and costs 100 gas.
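The cold/hot pricing rule can be sketched as a small model that remembers which slots have already been touched. This is only an illustration of the rule using the two costs quoted above; the class and method names are made up for the example and are not part of any Ethereum client.

```python
COLD_ACCESS_GAS = 2100  # first access to a slot within a transaction
HOT_ACCESS_GAS = 100    # any later access to the same slot

class AccessTracker:
    """Toy model of per-transaction cold/hot storage access pricing."""

    def __init__(self) -> None:
        self.touched = set()  # slots already accessed in this transaction

    def access_cost(self, slot: int) -> int:
        if slot in self.touched:
            return HOT_ACCESS_GAS
        self.touched.add(slot)
        return COLD_ACCESS_GAS

tracker = AccessTracker()
costs = [tracker.access_cost(0), tracker.access_cost(0), tracker.access_cost(1)]
# costs == [2100, 100, 2100]
```

Reading the same slot twice is cheap the second time, while touching a different slot pays the cold price again.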
The following code is an example of how we can use Yul to optimize our code. The function SetData1 sets a new value for a global variable value in the traditional way, using Solidity. The first time we assign a new value, it costs 22514 gas. The second assignment costs much less: 5414 gas.
The function setData2 uses inline assembly. Inline assembly blocks are marked by assembly { … }, where the code within the curly braces is Yul. There is no need to understand the source code at this point; just remember that it accesses storage at a lower level. Therefore, the execution cost is lower.
In our example, modifying the value for the first time costs 22484 gas, and each subsequent modification costs 5384 gas. Compared with the 22514 and 5414 gas of the pure Solidity version, that is a saving of 30 gas per call. The difference may not seem significant, but consider that this code may be executed thousands of times.
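To get a feel for how the small per-call difference adds up, the arithmetic can be sketched as follows. The gas figures are the ones reported in the article; the number of calls and the helper function are assumptions chosen for illustration.

```python
# Gas figures reported in the article for the two setters.
SOLIDITY_FIRST, SOLIDITY_REPEAT = 22_514, 5_414
ASSEMBLY_FIRST, ASSEMBLY_REPEAT = 22_484, 5_384

def total_gas(first: int, repeat: int, calls: int) -> int:
    """Gas for `calls` separate calls: one first write, then repeats."""
    if calls <= 0:
        return 0
    return first + repeat * (calls - 1)

calls = 10_000  # hypothetical number of executions
saved = (total_gas(SOLIDITY_FIRST, SOLIDITY_REPEAT, calls)
         - total_gas(ASSEMBLY_FIRST, ASSEMBLY_REPEAT, calls))
# saved == 300_000, i.e. 30 gas per call
```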
Why is storage so expensive? Remember that we are in a decentralized world where data is not stored in just one place but on tens of thousands of nodes. It must also be easily available to every node in the network if future transactions need to access or change it. The overall cost of this data is equal to the sum of the storage space it consumes and the amount of computation required to generate it across the network.
EVM stack, storage and memory
The EVM is a stack-based machine: it operates on a data structure called a stack, which holds the values that operations consume and produce. The EVM has its own set of instructions (called opcodes) that are used to perform tasks such as reading and writing storage, calling other contracts, and performing mathematical operations. The stack operates in a last-in-first-out (LIFO) fashion (see Figure 1), which means that the most recently inserted item sits at the top of the stack and is the first item to be removed.
When executing a smart contract, the EVM creates an execution context that contains various data structures and state variables. After execution is complete, the execution context is discarded in preparation for the next contract. During execution, the EVM maintains a temporary memory that does not persist between transactions. The EVM stack has a maximum depth of 1024 items. Each item is a 256-bit word; this size was chosen to facilitate 256-bit hashing and elliptic-curve cryptography.
The EVM has the following components (see Figure 2):
Stack: The EVM's stack is a data structure that operates in a last-in-first-out (LIFO) manner and is used to store temporary values during smart contract execution.
Storage: persistent storage that is part of the Ethereum state. Every slot is initially zero and keeps whatever value is written to it.
Memory: a volatile, dynamically sized byte array used to store intermediate data during contract execution. Each time a new execution context is created, memory is initialized to zero.
Calldata: a volatile data area similar to memory, except that it is read-only. It holds the data sent as part of a smart contract transaction.
Program Counter: The program counter (PC) points to the next instruction to be executed by the EVM. The PC usually increments by one byte after an instruction is executed.
Virtual ROM: Smart contracts are stored as bytecode in this area. Virtual ROM is read-only.
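The components listed above can be pictured as the fields of one execution context. The sketch below is a toy model, not a real EVM: the class and field names are chosen for illustration, and the only point it makes is that storage outlives a call while the stack, memory, and program counter start fresh each time.

```python
from dataclasses import dataclass, field

@dataclass
class ExecutionContext:
    """Toy model of the data areas listed above (not a real EVM)."""
    code: bytes                                   # virtual ROM: read-only bytecode
    calldata: bytes = b""                         # read-only input sent with the call
    storage: dict = field(default_factory=dict)   # persistent key-value store
    stack: list = field(default_factory=list)     # temporary LIFO values
    memory: bytearray = field(default_factory=bytearray)  # temporary byte array
    pc: int = 0                                   # index of the next instruction

# The contract's storage lives outside any single execution context.
contract_storage = {}

first_call = ExecutionContext(code=b"\x00", storage=contract_storage)
first_call.storage[0] = 42          # written to persistent storage
first_call.memory.extend(b"\x01")   # written to temporary memory

# A later call gets a fresh stack, memory and PC, but sees the same storage.
second_call = ExecutionContext(code=b"\x00", storage=contract_storage)
```

Using immutable bytes for code and calldata mirrors their read-only nature in this model.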
EVM stack
In this architecture, the program's instructions and data are held in memory, and execution is driven by the stack pointer, which points to the top of the stack. The stack pointer keeps track of where on the stack the next value will be saved or retrieved. When the program runs, it adds values to the stack and performs operations on values that are already there. When the code wants to add two numbers, it pushes the numbers onto the stack and then performs the addition operation on the top two values. The result is then pushed back onto the stack.
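The push, push, add sequence described above can be sketched with a minimal stack class. It is a simplified model that applies the 1024-item depth limit and the 256-bit word size mentioned earlier; the class and method names are illustrative only.

```python
WORD_MASK = 2**256 - 1   # each stack item is a 256-bit word
MAX_DEPTH = 1024         # maximum number of items on the EVM stack

class Stack:
    """Toy LIFO stack with EVM-like limits."""

    def __init__(self) -> None:
        self.items = []

    def push(self, value: int) -> None:
        if len(self.items) >= MAX_DEPTH:
            raise OverflowError("stack overflow")
        self.items.append(value & WORD_MASK)  # keep only the low 256 bits

    def pop(self) -> int:
        if not self.items:
            raise IndexError("stack underflow")
        return self.items.pop()

    def add(self) -> None:
        """Pop the top two items and push their sum modulo 2**256."""
        a, b = self.pop(), self.pop()
        self.push(a + b)

stack = Stack()
stack.push(2)
stack.push(3)
stack.add()   # the stack now holds a single item: 5
```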
One of the most important features of a stack-based architecture is that it allows operations to be executed simply and efficiently. Since the stack is a LIFO data structure, operands can be pushed and popped easily and quickly.
The EVM instruction set provides most of the operations you might expect, including:
Stack operations: POP, PUSH, DUP, SWAP
Arithmetic/comparison/bitwise: ADD, SUB, GT, LT, AND, OR
Environment: CALLER, CALLVALUE, NUMBER
Memory operations: MLOAD, MSTORE, MSTORE8, MSIZE
Storage operations: SLOAD, SSTORE
Program counter related opcodes: JUMP, JUMPI, PC, JUMPDEST
Stop opcodes: STOP, RETURN, REVERT, INVALID, SELFDESTRUCT
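To show how a few of these opcodes cooperate with the program counter, here is a minimal interpreter sketch for a tiny subset of the instruction set. It uses the standard byte values for STOP, ADD, POP, PUSH1, DUP1, and SWAP1, but the function itself is a teaching model written for this example: it charges no gas and supports nothing else.

```python
# Opcode byte values as defined by the EVM instruction set.
STOP, ADD, POP, PUSH1, DUP1, SWAP1 = 0x00, 0x01, 0x50, 0x60, 0x80, 0x90

def run(code: bytes) -> list:
    """Execute a tiny subset of EVM opcodes and return the final stack."""
    stack, pc = [], 0
    while pc < len(code):
        op = code[pc]
        pc += 1                      # move past the opcode byte
        if op == STOP:
            break
        elif op == PUSH1:
            # The byte after PUSH1 is its operand; missing bytes read as zero.
            stack.append(code[pc] if pc < len(code) else 0)
            pc += 1                  # skip the operand as well
        elif op == ADD:
            a, b = stack.pop(), stack.pop()
            stack.append((a + b) % 2**256)
        elif op == POP:
            stack.pop()
        elif op == DUP1:
            stack.append(stack[-1])
        elif op == SWAP1:
            stack[-1], stack[-2] = stack[-2], stack[-1]
        else:
            raise ValueError(f"unsupported opcode 0x{op:02x}")
    return stack

# PUSH1 2, PUSH1 3, ADD, DUP1, STOP
program = bytes([PUSH1, 2, PUSH1, 3, ADD, DUP1, STOP])
result = run(program)   # result == [5, 5]
```

Note that the program counter advances by two bytes for PUSH1, because the pushed value is stored inline in the bytecode.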
EVM storage
EVM storage is a non-volatile key-value store that maps 256-bit keys to 256-bit values. A contract has 2²⁵⁶ storage slots in total, which is a very large number. Each smart contract on the blockchain has its own storage space.
Storage holds data that needs to be remembered between function calls. It is used to store variables and data structures that must remain available even after the smart contract execution ends.
The opcodes for accessing storage are SLOAD and SSTORE.
An account's storage is permanent data storage that is used only by smart contracts. Externally owned accounts (EOAs) always have no code and empty storage.
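The behavior of storage described in this section can be sketched as a dictionary per contract, where unset slots read as zero. The class is a toy model; its method names echo SLOAD and SSTORE, and the two contract addresses are made-up placeholders.

```python
WORD_MASK = 2**256 - 1   # keys and values are 256-bit words

class ContractStorage:
    """Toy model of one contract's storage: 256-bit keys to 256-bit values."""

    def __init__(self) -> None:
        self.slots = {}   # only non-zero slots are kept

    def sstore(self, key: int, value: int) -> None:
        key, value = key & WORD_MASK, value & WORD_MASK
        if value == 0:
            self.slots.pop(key, None)   # writing zero clears the slot
        else:
            self.slots[key] = value

    def sload(self, key: int) -> int:
        return self.slots.get(key & WORD_MASK, 0)   # unset slots read as zero

# Each contract has its own, independent storage (addresses are made up).
state = {"0xAAA": ContractStorage(), "0xBBB": ContractStorage()}
state["0xAAA"].sstore(0, 123)
value_a = state["0xAAA"].sload(0)   # 123
value_b = state["0xBBB"].sload(0)   # 0: the other contract is unaffected
```

Keeping only non-zero slots is what makes a 2²⁵⁶-slot address space practical to represent.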
EVM memory
Memory is the volatile part of the architecture: its data is not persisted on the blockchain. It is a random-access data structure that stores temporary data during smart contract execution.
Solidity divides memory into four parts, using 32-byte slots: 2 slots of scratch space, 1 slot for the free memory pointer, 1 zero slot, and then the available free memory that the pointer initially points to. The 64 bytes of scratch space are used by hashing methods, which require temporary space to store intermediate output before eventually returning the final output.
The free memory pointer is just a pointer to the beginning of free memory. It ensures that the smart contract keeps track of which memory locations have been written to and which ones are still available. This prevents the contract from overwriting some memory that has been allocated to another variable. Figure 6 shows how memory is divided:
Memory is used to store variables and data structures that do not need to be kept in storage. Memory can grow during smart contract execution, but accessing it is slower and more expensive than using the stack.
Memory is zero-initialized, and the opcodes used to access it are MLOAD, MSTORE, and MSTORE8.
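The memory layout and the role of the free memory pointer can be sketched with a small byte-array model. It assumes the layout Solidity uses (free memory pointer stored at offset 0x40 and initially pointing to 0x80); the class and the allocate helper are hypothetical names introduced for this example, and gas costs for memory expansion are left out.

```python
WORD_MASK = 2**256 - 1
FREE_MEM_PTR = 0x40      # slot holding the free memory pointer
FIRST_FREE = 0x80        # first byte after the reserved slots

class Memory:
    """Toy model of EVM memory: a zero-filled byte array that grows on demand."""

    def __init__(self) -> None:
        self.data = bytearray()
        # Assumption: mimic what Solidity-generated code does on entry.
        self.mstore(FREE_MEM_PTR, FIRST_FREE)

    def _expand(self, end: int) -> None:
        # Memory grows in 32-byte words; new bytes are zero.
        size = (end + 31) // 32 * 32
        if size > len(self.data):
            self.data.extend(bytes(size - len(self.data)))

    def mstore(self, offset: int, value: int) -> None:
        self._expand(offset + 32)
        self.data[offset:offset + 32] = (value & WORD_MASK).to_bytes(32, "big")

    def mstore8(self, offset: int, value: int) -> None:
        self._expand(offset + 1)
        self.data[offset] = value & 0xFF   # store a single byte

    def mload(self, offset: int) -> int:
        self._expand(offset + 32)
        return int.from_bytes(self.data[offset:offset + 32], "big")

    def allocate(self, length: int) -> int:
        """Reserve `length` bytes and advance the free memory pointer."""
        start = self.mload(FREE_MEM_PTR)
        self.mstore(FREE_MEM_PTR, start + length)
        return start

mem = Memory()
ptr = mem.allocate(32)      # ptr == 0x80
mem.mstore(ptr, 0xCAFE)
```

Because every allocation moves the pointer forward, a later allocation cannot overlap the 32 bytes reserved here.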
Summary
In this article, we reviewed some basic concepts related to the Ethereum Virtual Machine (EVM). Writing inline assembly requires a deep understanding of the EVM because the code interacts directly with some of its components. In future articles, we will analyze other EVM elements in more detail, such as storage, memory, and calldata. We will also review important concepts such as bytecode, gas, and the Application Binary Interface (ABI). Finally, we'll discuss how opcodes work and present more inline assembly examples for safely optimizing smart contract execution.