
Laravel and Swoole

Updated: Apr 2, 2021 — 6 min Read

In a typical LEMP stack setup, Nginx is going to proxy the request to the PHP-FPM process through a UNIX socket. The FPM master process will assign a PHP worker to handle the request and then send the response back.

PHP-FPM starts and manages a pool of PHP workers whose job is to handle incoming requests. A worker starts by initializing PHP and all its installed extensions and then waits for requests. Once a request is received, the worker is going to execute the script (/public/index.php in the case of Laravel) and return the response. Each worker can execute a single request at any given time.

Now, a server doesn't understand instructions written in PHP. So PHP compiles the code into opcodes and passes them to a scripting engine (the Zend Engine) that executes them. Doing this for every request costs time and machine resources. For that reason, production servers use OPcache to compile the PHP code only once, on the first request, and store the result in the server memory to be reused by the Zend Engine later.
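For example, typical production php.ini settings that enable this behavior might look like the following (the values are illustrative, not recommendations):

```ini
; Illustrative production OPcache settings in php.ini
opcache.enable=1
; Memory (in MB) reserved for storing compiled opcodes
opcache.memory_consumption=128
; Skip re-checking file timestamps on every request;
; requires resetting the cache on each deploy
opcache.validate_timestamps=0
```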

This makes request handling much faster, because the PHP worker won't have to interpret the PHP script each time. However, the worker still has to execute the script (/public/index.php) for every request. That includes bootstrapping Laravel by registering container bindings and booting Service Providers. That's all work that must be done for every request before your actual application code is executed.

For the vast majority of web applications, this is not a problem. In fact, it's considered to be one of the perks of using PHP. The script terminates after every request so developers won't have to worry about memory leaks. However, for applications with very high traffic, the framework initialization process consumes a fair amount of time and server resources. To allow the server to handle more requests, developers need to do one of these things:

  1. Add more PHP workers.
  2. Handle requests faster.

Adding more workers allows the server to handle more requests concurrently. On the other hand, handling requests faster allows the server to switch to new requests at a faster rate and thus handle more requests during the same time interval.

Adding more workers is the easiest solution: just update the value of pm.max_children inside the /etc/php/{version}/fpm/pool.d/www.conf file. However, more workers means more memory consumption, as each process requires its own private memory space. It also means more CPU consumption if your application code is CPU-bound (does a lot of calculations) and more context switching if your code is I/O-bound (waits for DB queries, HTTP requests, etc.).
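For reference, the relevant part of that pool file looks like this (the numbers are illustrative; tune them to your server's resources):

```ini
; /etc/php/{version}/fpm/pool.d/www.conf (illustrative values)
pm = dynamic
; Hard cap on the number of worker processes in the pool
pm.max_children = 20
pm.start_servers = 8
pm.min_spare_servers = 4
pm.max_spare_servers = 12
```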

There's always a sweet spot when it comes to the number of processes to run, based on the server resources and the application code. You usually find this sweet spot via trial and error.

So if we assume you already reached the maximum number of PHP workers you can run on your server, your options now are:

  1. Scale the server by adding more resources. This will allow you to start more PHP workers to process more requests in parallel.
  2. Add more servers and serve the application behind a load balancer. This will allow you to have several PHP Worker pools across multiple servers and thus process more requests.
  3. Optimize your application to run faster.

Scaling is the solution if you have money to throw at the problem. If you don't, then consider optimizing your application code and DB queries. Those little optimizations do wonders. Sending heavy tasks to the queue also ensures requests are handled much faster.

But let's assume you've already done that and you still want to serve more requests per second. Now your only option is eliminating the part in which the framework is bootstrapped with every request. By doing so, a worker will handle the request in a shorter time allowing it to quickly switch to handling the next request.

We were able to eliminate converting PHP code to OPcode on every request by caching it. How about we do the same for framework initialization? Do the work once when the PHP worker starts, keep the bootstrapped application in memory, and use this same application instance to handle every request.

PHP-FPM cannot do this, but other PHP process managers can. Swoole is one of them.

Swoole Server

The idea is that Swoole gives you control over how the workers operate. Since you are in control, you can create an instance of the application once when a worker starts and keep using that same instance for every request handled by that worker:

$app = null;

// Bootstrap Laravel once per worker. Note the by-reference capture (&$app)
// so the instance assigned here survives outside this closure.
$server->on('workerstart', function () use (&$app) {
    $app = new Illuminate\Foundation\Application(...);

    $app->bootstrap();
});

// Reuse the same application instance for every request this worker handles.
$server->on('request', function ($request, $response) use (&$app) {
    $laravelResponse = $app->make(Kernel::class)->handle($request);

    $response->end(
        $laravelResponse->getContent()
    );
});

With this in place, we remove these tasks from the list of what PHP has to do on every request:

  1. Registering the Composer autoloader.
  2. Creating the application/container instance.
  3. Loading the application configuration.
  4. Loading the service providers.
  5. Registering the service providers.
  6. Booting the service providers.
  7. Configuring the Kernel.
  8. Creating a database connection.

Bootstrapping the application once during a worker's lifecycle saves resources that can then go toward handling more requests in high-traffic environments. But at the same time, it makes it easier for data to leak between requests. Imagine handling a request that sets auth()->user() to user #10, and then user #30 makes the next request but the application still thinks it's serving user #10. A fatal disaster!

This approach adds complexity that PHP is known for lacking. It has its pros but it also has its cons.

Laravel Octane was built to take care of cleaning between requests. It does its best to ensure the framework bindings and application state don't leak between requests.

Swoole Coroutines

If you're not familiar with coroutines, go read this introduction before continuing with this section.

Another benefit that Swoole offers is allowing you to handle each request inside a coroutine. That way a worker will be able to process multiple requests concurrently, switching between them whenever one is waiting on an I/O operation. This is a major benefit, as it'll significantly increase your server's throughput.

In other words, in a traditional setup, 10 workers can handle only 10 requests at the same time. With Swoole coroutines enabled, the same 10 workers can handle more than 10 requests at the same time. Since most web application code is I/O-bound, context switching by the Swoole scheduler will happen a lot, giving more requests the chance to be handled.
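To illustrate, here's a minimal sketch of coroutine concurrency using Swoole's Co\run() and go() APIs (it requires the Swoole extension to be installed; Co::sleep() stands in for a real I/O wait such as a DB query):

```php
<?php
// Minimal sketch of coroutine-based concurrency (requires ext-swoole).
// Co\run() starts a coroutine scheduler; go() launches a coroutine inside it.
Co\run(function () {
    go(function () {
        // Co::sleep() yields to the scheduler instead of blocking the
        // whole process, simulating a 1-second I/O wait.
        Co::sleep(1);
        echo "query A done\n";
    });

    go(function () {
        Co::sleep(1);
        echo "query B done\n";
    });
});
// Both waits overlap, so total wall time is roughly 1s, not 2s.
```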

However, doing so requires a lot of changes to how the Laravel framework runs. It'll also require, in some cases, that package maintainers adapt their packages to work with context switching, something the PHP world is not familiar with yet.

Octane takes care of cleaning up between requests. However, with coroutines enabled, Swoole will switch from one request context to another concurrently. This requires that the framework store an isolated state for every request being handled at any given time, and then clean up that state after the request is served. This is unlike any PHP environment that has ever existed, and none of the PHP frameworks are built to deal with it.

There's also the problem of using database connections while coroutines are enabled. Each coroutine context needs a dedicated database connection, since two coroutines cannot use the same connection at the same time. And since databases only accept a limited number of concurrent connections, we'll need connection pooling to manage connections and reuse them once a coroutine context is done and we move on to another.
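A coroutine-safe pool is often built on Swoole's coroutine Channel, whose pop() suspends the calling coroutine until a connection is free. The class below is a simplified sketch of the idea, not what Octane or Swoole actually ship:

```php
<?php
// Simplified connection pool sketch built on Swoole's coroutine Channel
// (requires ext-swoole). pop() suspends the calling coroutine until a
// connection is available, so two coroutines never share one connection.
class ConnectionPool
{
    private Swoole\Coroutine\Channel $channel;

    public function __construct(callable $factory, int $size)
    {
        $this->channel = new Swoole\Coroutine\Channel($size);

        // Fill the pool up-front with $size connections.
        for ($i = 0; $i < $size; $i++) {
            $this->channel->push($factory());
        }
    }

    public function get(): mixed
    {
        return $this->channel->pop(); // waits if the pool is empty
    }

    public function put(mixed $connection): void
    {
        $this->channel->push($connection); // return it for reuse
    }
}
```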

We at Laravel want to bring this very useful feature to the table and offer it as an option for framework users. But it will take some time to do it right.

Swoole Task Workers

Swoole keeps a number of workers reserved for handling tasks. They don't handle HTTP requests at all; they just sit there and wait for incoming tasks. Octane uses this feature to allow you to send some tasks to be handled by those workers, wait for them to finish, and then use the results in handling the HTTP request.

You can use this feature to run a few slow DB queries in-parallel by sending them to the task workers. You can also configure Octane to run the tasks in the background and not wait for the results.
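Octane exposes this through its concurrently() helper, which sends each closure to a Swoole task worker and waits for all of them to finish. In this sketch, User and Order are hypothetical Eloquent models standing in for your own slow queries:

```php
use Laravel\Octane\Facades\Octane;

// Run both counts in parallel on Swoole task workers,
// then destructure the results in the same order they were given.
[$userCount, $orderCount] = Octane::concurrently([
    fn () => User::count(),
    fn () => Order::count(),
]);
```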

Are you familiar with Laravel's dispatchAfterResponse() helper? It allows you to send the response and then keep the FPM worker running some tasks before it can handle the next request. This causes a delay, of course, as the worker stays occupied. Swoole configures separate workers to handle these tasks, so your request-handling workers can immediately pick up the next incoming request.

Conclusion

Swoole is great. The benefits it brings to the PHP world are faster request handling and higher request concurrency. It does that by allowing you to use your machine resources in the most efficient way: no idle CPU cores and no wasted memory space.

Unfortunately, it brings complexity that's not known to the PHP world. However, Laravel Octane is being developed to eliminate much of this complexity, so developers can build their Laravel applications the same way they are used to while still running their applications faster than ever via Swoole.


By Mohamed Said

Hello! I'm a former Laravel core team member & VP of Engineering at Foodics. In this publication, I share everything I know about Laravel's core, packages, and tools.

You can find me on Twitter and Github.
