TechAlpine – The Technology world

Exploring PHP and Zend Engine Internals

Overview: Version 1.0 of the Zend Engine is the heart and brain of PHP 4.0. It provides the infrastructure and services used by the function modules, and it implements the language syntax. Zend Engine 1.0 is actually the second revision of the PHP scripting engine. It is still based on the same rules as the PHP 3.0 engine, which was essentially Zend Engine 0.5. This made it possible to migrate from PHP 3.0 to 4.0, but it also means that development has kept the same 'state of mind' as PHP 3.0. We feel it is the right time to start working towards a revision of the Zend Engine, one that would incorporate new structures and solutions to some of the most difficult problems faced by PHP designers and developers.

In this article I will discuss the internals of the Zend Engine as used by PHP.

Introduction: The Zend Engine is an open source scripting engine that acts as the interpreter for the PHP programming language. It was initially developed by two students at the Technion – Israel Institute of Technology. The Zend Engine is a virtual machine (VM), that is, software which simulates a physical computer. It consists of multiple components, e.g. a compiler, the ZFMI (Zend Function Module Interface) and a virtual CPU, or executor.

How Zend Engine Works:

The Zend Engine consists of three major components –

  • Lexical analyzer or lexer
  • Parser
  • Executor

The Zend Engine is a scripting engine that works as an interpreter, so let us look at the phases a script goes through when it is handed to the engine. The script passes through the following steps before it is finally executed –

  • Step 1: Lexical Analysis – The script is passed through a lexical analyzer, also known as a lexer. Here the human-readable script is converted into tokens that the rest of the engine can work with. Once the entire script is tokenized, the tokens are passed to the parser.
  • Step 2: Parsing – The parser takes the tokens it receives from the lexer and builds an abstract syntax tree, which can be optimized before it is passed to the code generator. The code generator then produces the instructions that run on the Zend Engine, a virtual machine (VM) whose instruction set resembles assembly language. This whole mechanism is jointly called compilation. The output of compilation is intermediate code, which is machine independent and targets the Zend virtual machine. The intermediate code is an array of instructions known as operation codes, or opcodes for short. Opcodes are three-address codes: two operands for the input and one for the output. Each opcode also carries a handler, which processes the operands. Opcodes cover all sorts of operations, ranging from a basic operation on two inputs that stores its result in the third operand to complex cases that implement flow control.
  • Step 3: Execution – Once the intermediate code is generated, it is passed to the executor, which reads the instructions from the array one by one and executes them.
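
The lexical analysis step can be observed from PHP itself, because the tokenizer extension exposes the engine's lexer to scripts. The following is a minimal sketch, assuming the tokenizer extension is available (it is enabled in most builds); describeTokens is a hypothetical helper name chosen for this illustration, not part of PHP.

```php
<?php
// Hypothetical helper: lists the tokens the lexer produces for a source string.
function describeTokens(string $source): array
{
    $described = [];
    foreach (token_get_all($source) as $token) {
        if (is_array($token)) {
            // Named tokens arrive as [token id, text, line number].
            [$id, $text] = $token;
            if ($id === T_WHITESPACE) {
                continue; // whitespace carries no meaning for this listing
            }
            $described[] = token_name($id) . ' ' . json_encode($text);
        } else {
            // Single-character tokens such as "=" or ";" arrive as plain strings.
            $described[] = 'CHAR ' . json_encode($token);
        }
    }
    return $described;
}

$source = "<?php \$name = 'Ricardo'; echo \$name;";
foreach (describeTokens($source) as $line) {
    echo $line, PHP_EOL;
}
```

For this input the listing starts with T_OPEN_TAG and T_VARIABLE, and the statement keyword shows up as T_ECHO. This is the token stream that the parser receives in Step 2.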

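The compilation and execution phases are carried out by two separate functions within the Zend Engine: zend_compile and zend_execute. In the engine's C source they are reached through the function pointers zend_compile_file and zend_execute.

Web server Interaction involving Zend engine: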
The internal architecture of the Zend engine is shown below in the diagram –

Image 1: Zend engine Architecture

Internal components of Zend Engine: Now let us look at the internal components of the Zend Engine one by one.

ZFMI or Zend Function Module Interface:
This interface acts as a communication channel between the function modules. Function modules are PHP extensions, each of which packages a set of functions that scripts can call.

Opcode Cache: The opcode cache is a generic cache which resides within the Zend Engine and stores the opcodes generated for a file. If the file is requested again and has not changed, it is executed straight from the cache without being compiled again.
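
The caching rule described above (reuse the opcodes unless the file has changed) can be sketched in a few lines. This is a toy illustration under stated assumptions, not the engine's actual cache: ToyOpcodeCache and its compiler callback are hypothetical names, and the file's modification time is assumed to be a good enough signal of change.

```php
<?php
// Toy opcode cache: a hypothetical illustration, not the engine's real cache.
final class ToyOpcodeCache
{
    /** Cached entries, keyed by file path: ['mtime' => int, 'ops' => array]. */
    private array $entries = [];

    /** Callable that stands in for the compile step (source string => opcodes). */
    private $compiler;

    public function __construct(callable $compiler)
    {
        $this->compiler = $compiler;
    }

    public function opcodesFor(string $path): array
    {
        clearstatcache(true, $path); // make sure filemtime() is not stale
        $mtime = filemtime($path);

        $entry = $this->entries[$path] ?? null;
        if ($entry !== null && $entry['mtime'] === $mtime) {
            return $entry['ops']; // cache hit: the file has not changed
        }

        // Cache miss or stale entry: compile again and remember the result.
        $ops = ($this->compiler)(file_get_contents($path));
        $this->entries[$path] = ['mtime' => $mtime, 'ops' => $ops];
        return $ops;
    }
}

// Usage: the fake compiler only counts how often it is called.
$compileCount = 0;
$cache = new ToyOpcodeCache(function (string $source) use (&$compileCount): array {
    $compileCount++;
    return ['PLACEHOLDER_OP ' . strlen($source)];
});

$script = tempnam(sys_get_temp_dir(), 'toy');
file_put_contents($script, "<?php echo 'hi';");
$cache->opcodesFor($script);
$cache->opcodesFor($script); // served from the cache, no second compile
echo "compiled {$compileCount} time(s)", PHP_EOL;
unlink($script);
```

The design choice worth noting is that the cache never inspects the opcodes themselves. It only needs a cheap way to decide whether the source file is still the one it compiled.
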

Some Examples: Let us take an example and follow the phases a piece of PHP code goes through in the Zend Engine.
First, we will discuss a simple example, shown below.

Listing 1: Sample PHP file

[Code]

<?php
$name = 'Ricardo';
echo $name;
?>

[/Code]

When the above PHP code is handed to the Zend Engine, it is converted into the following opcodes –

Image 2: Showing generated opcode

The executor of the Zend Engine reads these opcodes one at a time and executes each according to its instruction. The above code is executed in the following manner –

  • Opnum 0 or Opcode 0 – The ZEND_FETCH_W instruction (W stands for write) looks up the variable 'name' for writing and stores a pointer to it in Register 0.
  • Opnum 1 or Opcode 1 – The ZEND_ASSIGN handler assigns the value 'Ricardo' to the variable that Register 0 points to. Register 1 is also assigned the result of the assignment, but it is never used here. It would be used if the assignment were part of a larger expression, such as –

[Code]

if ($name = 'Ricardo') { }

[/Code]

  • Opnum 2 or Opcode 2 – The value of $name is fetched again, this time into Register 2. The opcode is ZEND_FETCH_R because the variable is used in a read-only context.
  • Opnum 3 or Opcode 3 – The ZEND_ECHO instruction prints the value of Register 2 by sending it to the output buffering system.
  • Opnum 4 or Opcode 4 – The ZEND_RETURN instruction sets the return value of the script to 1. Even when a script has no explicit return statement, as is the case here, it ends with an implicit return 1.
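
The five steps above can be replayed with a toy executor written in PHP. This is only a sketch of the idea of opcodes with handlers and registers: runToyOps, the instruction layout and the shortened handler names are assumptions made for this illustration, and a register here holds a variable name where the real engine would hold a pointer.

```php
<?php
// Toy executor that mirrors the walkthrough; it is not the real Zend Engine.
// Each instruction is [handler name, operand 1, operand 2, result register].
function runToyOps(array $ops): array
{
    $variables = [];     // symbol table: variable name => value
    $registers = [];     // temporary slots shared between instructions
    $output = '';        // stands in for the output buffering system
    $returnValue = null;

    // One handler per opcode; a handler returns false to stop execution.
    $handlers = [
        'FETCH_W' => function (array $op) use (&$registers): bool {
            $registers[$op[3]] = $op[1]; // the name acts as our "pointer"
            return true;
        },
        'ASSIGN' => function (array $op) use (&$variables, &$registers): bool {
            $variables[$registers[$op[1]]] = $op[2];
            $registers[$op[3]] = $op[2]; // result of the assignment expression
            return true;
        },
        'FETCH_R' => function (array $op) use (&$variables, &$registers): bool {
            $registers[$op[3]] = $variables[$op[1]];
            return true;
        },
        'ECHO' => function (array $op) use (&$registers, &$output): bool {
            $output .= $registers[$op[1]];
            return true;
        },
        'RETURN' => function (array $op) use (&$returnValue): bool {
            $returnValue = $op[1];
            return false;
        },
    ];

    foreach ($ops as $op) {
        if (!$handlers[$op[0]]($op)) {
            break;
        }
    }

    return ['output' => $output, 'return' => $returnValue, 'variables' => $variables];
}

// The opcode array for Listing 1, written out by hand.
$result = runToyOps([
    ['FETCH_W', 'name', null, 0],
    ['ASSIGN', 0, 'Ricardo', 1],
    ['FETCH_R', 'name', null, 2],
    ['ECHO', 2, null, null],
    ['RETURN', 1, null, null],
]);
echo $result['output'], PHP_EOL; // prints "Ricardo"
```

Register 1 is written by the ASSIGN handler and never read again, which matches the observation made for Opnum 1.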

Now let us look at a slightly more complicated example, shown below –

Listing 2: Sample PHP file with conversion to upper case

[Code]

<?php
$name = 'Ricardo';
echo strtoupper($name);
?>

[/Code]

This script initializes a variable and then prints it after converting the text to upper case. The intermediate code dump for this script is quite similar to the earlier one –

Image 3: Showing generated opcode

The opcodes in the two examples are nearly identical, except for the following –

  • Opnum 3 or Opcode 3 – The ZEND_SEND_VAR instruction pushes a pointer to Register 2, which holds the variable $name, onto the argument stack. The called function later reads its arguments from this stack, in the order in which they were pushed.
  • Opnum 4 or Opcode 4 – The ZEND_DO_FCALL instruction calls the 'strtoupper' function and specifies that its output should be sent to Register 3.

The following diagram shows the workflow as a PHP script passes through the Zend Engine.

Image 4: Showing work flow in Zend engine
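
The two call-related opcodes from the second example, ZEND_SEND_VAR and ZEND_DO_FCALL, come down to pushing arguments onto a stack and letting the call consume them. A minimal sketch of that idea follows; toySendVar and toyDoFcall are hypothetical names for this illustration, and the argument count is passed explicitly here as a simplifying assumption.

```php
<?php
// Toy model of the argument stack; these names are not engine APIs.
function toySendVar(array &$argumentStack, $value): void
{
    $argumentStack[] = $value; // push one argument for the upcoming call
}

function toyDoFcall(array &$argumentStack, string $function, int $argumentCount)
{
    // Remove this call's arguments from the top of the stack, keeping push order.
    $offset = count($argumentStack) - $argumentCount;
    $arguments = array_splice($argumentStack, $offset);

    return call_user_func_array($function, $arguments);
}

$registers = [2 => 'Ricardo'];   // Register 2 already holds the value of $name
$argumentStack = [];

toySendVar($argumentStack, $registers[2]);                    // Opnum 3
$registers[3] = toyDoFcall($argumentStack, 'strtoupper', 1);  // Opnum 4

echo $registers[3], PHP_EOL; // prints "RICARDO"
```

Because the arguments are removed from the stack when the call is made, nested calls can share one stack without seeing each other's arguments.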

Summary:

Let us summarize our discussion in the following points –

  • Zend engine acts as the heart and brain of PHP 4.0
  • Zend engine is a virtual machine which was developed by two students at Technion – Israel Institute of Technology.
  • Zend engine consists of three major components –
    • Lexical analyzer or lexer – Tokenizes the script
    • Parser – Parses the tokens and generates the opcodes
    • Executor – Executes the Opcodes.
  • The Zend Engine contains the following internal components –
    • ZFMI or Zend Function Module Interface – The communication channel between different modules
    • Opcode Cache – Caches the opcodes so that they can be reused if required.

 
