This is a high-level guide in my road-to-release category describing how I created a faster router library for Tonics CMS in PHP.
Please, note that I am posting this in the hope that someone somewhere would find it useful, and most importantly as a high-level guide, I can come back to in the future.
- Rationale Behind The Router
- High-Level Overview of How Most Router Works
- High-Level Overview of My Implementation
- Benchmark
- Named Alias
- How Fast Can You Match Deep Dynamic Levels
Rationale Behind The Router
I was rewriting the 4th and final version of the Tonics CMS (circa 2021), and I needed to replace the terrible router I had in the 2nd and 3rd rewrites.
By the way, the first version was built with Laravel, and the 2nd and 3rd were built from scratch, however, I love the Laravel router (the syntax), and I want to try my best to retain it in the final rewrite.
I tried using the standalone router from Laravael, aside from the fact that it was a bit difficult to wire up for a non-laravel project, the dependencies were just too much, I might be weird, but anything above 5 dependencies is a deal breaker.
After researching more, I came across this one: https://github.com/skipperbent/simple-php-router, almost perfect, with little to no dependencies, and a laravel-like syntax.
I could have just gone with the library and moved on, however, after taking a sneak peek at how the library did the routing among others I have seen, I don't need most of the features.
Here is the library if you wanna take a quick look: https://github.com/tonics-apps/tonics-router-system
The rest of the guide gives a super simple overview of how most router works, then we delve into what I came up with, let's go...
High-Level Overview of How Most Router Works
Here is a table illustration:
GET |
admin/login |
admin/register |
password/reset |
... |
POST |
post/store |
user/store |
customer/store |
... |
If a user sends a GET request to admin/login
, this is easy, check the table, and if it is a GET request, do a hashmap lookup for the route which would be instant "0(1)", call the dispatcher, etc, pretty, cool huh?
The same applies to any POST request, we do a fast lookup.
By definition, we can say the routes are static, what if the routes are dynamic, here is another table:
GET |
admin/categories/:id |
admin/categories/:id/posts |
admin/categories/:id/posts/:id |
admin/categories/:id/posts/:id/edit |
... |
This is where most router library deploys regex, they define a set of regex that represent the desired URL patterns, and yh, it can include the above placeholders and a wild variety of constraints.
When an HTTP request comes in, the router makes an attempt to match the requested URL against the defined regexes.
If a match is found, the associated handler or function for that pattern is dispatched, cool right?
This is fine for most use cases.
I love the fact that insertion is faster, though, the expression can get complex over time.
One potential issue you might have is degradation over time, the more the routes, the higher the time it would take to match, although, it doesn't matter since the difference would be minuscule except you have 1 million routes (I barely have 500 routes).
Nevertheless, it is still a good idea to learn about other solutions, it might be useful somewhere.
I won't waste much time here, let's dive into the implementation I came up with...
High-Level Overview of My Implementation
Brainstorming
Before I started brainstorming, I knew I wanted something hierarchical, I just do not know how I should I go about it, so, I took my pen and paper, and I came up with the following idea:
The following day I came up with a similar design, but still trying to piece everything together.
In a little corner of that paper, you should notice the following:
[/]
[/] [home]
[/] [home] [:in]
...
Are you following the pattern?
Finding Route (Searching)
Suppose a user requests a dynamic route: /posts/:id
Where :in
can be dynamic, and we have the following table:
/ | ||
posts | ||
:id (dynamic) |
Note: The table represents a generated tree of all the routes we have in the route system.
How do we match this?
- The very first step is to split the URL (route that the user is requesting) into individual segments based on the delimiter ("/"). For the route "
/posts/:id
", the segments would be [/]
(root), [posts
], and [:id
]. I am oversimplifying here, you'll still need to do a bit of cleanup. - Then we begin at the root of the tree, which represents the solidus (
/
) in the URL path. It serves as the starting point for matching the segments We iterate through each segment of the URL path one by one, matching them against the corresponding nodes in the tree.
- Segment 1 (
/
): As we start at the root, we match the first segment, which is/
. Since the root represents the solidus (/
), the first segment matches successfully. - Segment 2 (
posts
): We move from the root to the next node in the tree that represents the "posts
" segment. If the node exists, we proceed to the next step. Otherwise, the route does not match, and we can stop and return a 404 error. - Hint: How do we move from the root to the next node, we can leverage the hierarchical feature. Check if there are no sibling nodes, in our case, there is none, so we move to the childNodes (
posts
), which is how we got to Segment 2 from 1. - Segment 3 (:
id
): In the tree, dynamic segments like ":id
" are typically represented by a special property which signifies we can have a potential dynamic segment. We check if such a node exists for the dynamic segment. If it does, we move to that node in the tree and proceed. If there is no special node, it suggests that the route does not support dynamic segments or does not have a match for the given dynamic value. In this case, the route does not match, and we can stop.
If we have successfully matched all the segments of the URL path, including the dynamic segment (if applicable), it means that the route matches the URL path.
Each node has its own HTTP methods, before we dispatch any handler, we can check if the requesting HTTP methods match, if it does, we can dispatch, checking the HTTP method is only a single lookup.
There are cases where you might have both a dynamic and a static route on the same path, here is an example:
/posts/23
and /posts/:id
Depending on how you want to implement it, you can give higher priority to the static route, this way, we would match the first one first, if we can't match the first, then we can match the second one.
Between, since this is a static route, we can just do a quick lookup, no need to traverse the tree.
Insertion
The above uses an already-generated route tree, how would we generate the tree?
Say we have the following:
$route->get("/post/:id", function (){});
Fortunately, the idea is similar to the search method, here are the steps:
- Just like before, we split the URL into individual segments based on the delimiter ("/"). For the route "
/posts/:id
", the segments would be [/]
(root), [posts
], and [:id
]. - Iterate through the segments of the route path and insert nodes into the tree by following the below steps:
- Segment 1 (/): Check if a node representing the
[/]
segment already exists, it doesn't, we create a root node for the tree - Segment 2 (
posts
): Check if a child node represents the[posts]
segment already exists in the root node's children. If it does, set the current node to that child node and proceed to the next step. If it doesn't exist, create a new node for the[posts]
segment, add it as a child to the root node, and update the current node to the newly created node - Segment 3 (
:id
): Similar to the previous explanation, as this segment starts with a colon (":"), it indicates a dynamic parameter. Check if there is a child node with a required parameter in the current node's children. If it exists, set the current node to that child node and update its route name to[:id]
. If there is no child node with a required parameter, create a new node for the[:id]
segment, add it as a child to the current node, and update the current node to the newly created node.
Tree construction is complete!
The tree would represent the route structure, with nodes for each segment and dynamic parameters appropriately handled.
Advantages
- Fast and Efficient Matching by organizing the routes into a tree-like structure, where each segment of the URL corresponds to a node in the tree.
- Reduced Search Space by only considering relevant paths.
- It can be memory-efficient, it only stores the necessary nodes and segments of the routes, minimizing unnecessary overhead.
Disadvantages
- Can use more memory as the route grows, this is a self-imposed disadvantage, you can mitigate this by not using objects to represent the node e.g. by using an array.
Constructing the tree initially can be time-consuming for large routes. Splitting the routes into segments and creating the appropriate nodes in the tree requires additional processing steps.
When dealing with long segments or deeply nested routes, lookup times can increase. Traversing through multiple levels of the tree to match long segments may introduce additional overhead, resulting in slightly slower lookup times. However, the impact of this drawback might be negligible unless dealing with extremely long segments or deeply nested route structures.
Even if you have: /path/:id/...
to 1k path, you should still be fine with this method which is an extreme route between.
Potential Improvement
The main bottleneck with this method is the insertion phase, for every route, we start from the root, then walk our way down the tree leveraging the path until our nodes have been inserted.
Avoid Unnecessary Node Insertions
Suppose we have the following routes:
/posts/:id/edit
/posts/:id/cat/edit
Instead of finding where to insert a node for each node in the routes, we can optimize it by combining it into something like so:
/posts/:id/edit
/cat/edit
The reason I place cat
below the edit
node is because they are siblings.
But how would we combine it?
The first step is to ensure your route is properly ordered, instead of the following:
$route->get("/post/:id/edit", function (){});
$route->get("/post/:id/cat/edit", function (){});
You should arrange it this way:
$route->group('/post/:id', function (Route $route){
$route->get("/edit", function (){});
$route->get("/cat/edit", function (){});
});
You let each method get a potential hint of the node path that is likely to already have been inserted, so, by the time, you get to the second route, we can simply jump to the appropriate position.
If you can't order the route, you can still jump to the appropriate position by tracking the path in your path generation stage.
You Do Not Need a Nested Node
Instead of storing nested nodes to represent the child nodes, you can simply store them as an array of nodes O(n), you store the child reference by leveraging the array index of the child.
This way, you can do away with the objects and still get less memory usage, this is just an idea, I did not do this myself.
It is still nested somewhat if you think about it.
Instead of having them in a nested-like format, you only have them in an array of nodes with each one of them pointing to either the next child index. The root node in this case would be the first one.
Benchmark
There is a quick demo on how to use the TonicsRouter, so, you can create your own benchmark, perhaps, I might be doing it the wrong way.
I am going to do a very simple benchmark, I would generate 100 routes with a bit of dynamic level, then we can benchmark that, something like so:
for ($i = 1; $i <= 100; $i++) {
$route->get("/post$i/:tell/:me/:your/:name", function (){}, alias: "posts.show.$i");
}
In Laravel, like so:
for ($i = 1; $i <= 100; $i++) {
Route::get("/post$i/{tell}/{me}/{your}/{name}", function (){})->name("posts.show.$i");
}
Direct Matching
This is what we do when a request comes, we find the route using the request URL.
Here is a benchmark that finds the URL 10k times (I mean, the last route, I am not interested in the first route, that is a cheat):
TonicsRouter
$treeGenerator = $route->getRouteTreeGenerator();
$start = microtime(true);
for ($i = 0; $i <= 10000; ++$i) {
// $treeGenerator->findURL("/post100/my/name/is/tonics");
}
$time = microtime(true) - $start;
echo $time; // 0.09s
It took 0.09s or 90ms to find the route when looped 10k times, not bad, let's do the same in laravel:
Laravel Router
$request = app('request')->create('/post100/my/name/is/tonics');
$route = app('router')->getRoutes();
$start = microtime(true);
$iterations = 10000;
for ($i = 0; $i <= $iterations; ++$i) {
$route->match($request);
}
$time = microtime(true) - $start;
echo $time; // 1.2s
Took 1.2 seconds on my machine.
Conclusion
The Tonics Router implementation is around ~13.3 times faster in this benchmark
Named Alias
Named alias is also a common feature across routers, you can use this to quickly generate a URL, useful in forms or when you want to redirect, let's do a benchmark of how fast that is in Tonics:
TonicsRouter
$treeGenerator = $route->getRouteTreeGenerator();
$start = microtime(true);
for ($i = 0; $i <= 10000; ++$i) {
$treeGenerator->namedURL("posts.show.100", ['my', 'name', 'is', 'tonics']);
}
$time = microtime(true) - $start;
echo $time; // 0.02s
Took 0.02s or 20ms on my machine, not bad, let's do the same in Laravel
Laravel Router
$route = app('router')->getRoutes();
$start = microtime(true);
$iterations = 10000;
for ($i = 0; $i <= $iterations; ++$i) {
route('posts.show.100', ['tell' => 'My', 'me' => 'Name', 'your' => 'Is', 'name' => 'Tonics']);
}
$time = microtime(true) - $start;
echo $time; // 0.2s
That took 0.2s on my machine.
Conclusion
The Tonics Router implementation is around 10 times faster in this benchmark.
How Fast Can You Match Deep Dynamic Levels
Here is where you appreciate the tree method, let's create something like so:
$dynamic = str_repeat("/:id", 1000);
$route->get("/post$dynamic", function (){});
Here I repeated the required param 1k times, here is the same approach in Laravel:
$dynamic = '';
for ($i = 1; $i <= 1000; $i++) {
$dynamic .= "/{id$i}";
}
Route::get("/post$dynamic", [ControllerTest::class, 'test']);
Now, let's do some searching, starting with the TonicsRouter:
TonicsRouter
$treeGenerator = $route->getRouteTreeGenerator();
$repeat = str_repeat('/hi', 1000);
$start = microtime(true);
$find = $treeGenerator->findURL("/post$repeat");
$time = microtime(true) - $start;
echo $time; // 0.0004
That took 0.0004s, around 0.4ms, let's do the same in Laravel
Laravel Router
$repeat = str_repeat('/hi', 1000);
$request = app('request')->create("/post$repeat");
$start = microtime(true);
$route = app('router')->getRoutes();
$route->match($request);
$time = microtime(true) - $start;
echo $time; // 0.03s
Took 0.03s in Laravel, which is around 30ms.
Conclusion
The more complex the dynamic levels, the slower it takes to match, again, this benchmark doesn't make much sense since no one would create a route as such, however, it is only for illustration.
The Tonics Router implementation is around 75 times faster here.
Hire Me
If you like what I do, I am available for hire, my email address is on my Hire Me Page: https://tonics.app/posts/f7b4f201bc4a91bb/hire-me, or just simply mail me at olayemi@tonics.app.
Thank you.