facebook youtube pinterest twitter reddit whatsapp instagram

Tonics Toc - Table of Content Algorithm

This is another guide in Tonics Road To Release category, in case you don't know what it is, Tonics is a General Multi-Purpose Modular CMS, you can read more about its Architecture & Features (Quick Overview).

In this guide, I would take you on a tour of the algorithm behind the Table of Content extension, in short, it is what I am using on this page (or on the above Architecture link).

Demo

Here is a quick demo of the Table of Content:

Background

I was perplexed as to how to generate a Table of Content, so I researched online and saw some solutions but the problem I had was most did not take into account the proper arrangement of nested headings like it is done on Wiki and books, so, I decided to figure it out on my own.

This isn't a problem of nested headings per se but they mostly fumble when the current heading level is lesser than the previous one or when the level goes too deep and you are trying to back off into the appropriate position.

Note: I do not want to sound like my implementation is the better way to do it or something, but it's cool when you figure out things on your own, so, this is more like documentation for me, than anything else, good if you find it useful.


Between I already have experience working with stack and tree like data structure so, this is just a matter of understanding the structure of a TOC and brainstorming how it can be algorithmically done.

Implementation

The implementation can be language agnostic but I'll go with vanilla JS since we have some cool helpers that could speed things up.

Step 1: Get The Headers Element

The first step is to select the headers element using the querySelectorAll method, I'll only be selecting h1-h4 headings but you can as well do h1-h6, it's your choice:

const selector = 'h1, h2, h3, h4';
const tocHeader = document.querySelector('.containerElement').querySelectorAll(selector);

The querySelectorAll() method is called on the selected container element ('.containerElement') which returns a NodeList of all the header elements within the container element that match the selector strings specified in the selector.

Step 2: The Helpers and Initial Variables

To make it easier, I'll dump the code and explain.

let currentLevel = 0, tocTree = [], lastAddedElementToTree = 0, result = `<ul>`,
item = {}, childStack = [],
breakLoopBackward = false;

function pushToTree(item) {
item.index = tocTree.length;
tocTree.push(item);
}

function* loopTreeBackward(treeToLoop = null) {
if (treeToLoop === null) {
treeToLoop = tocTree;
}
for (let i = treeToLoop.length - 1; i >= 0; i--) {
if (breakLoopBackward) {
break;
}
yield treeToLoop[i];
}
breakLoopBackward = false;
}

function str_replace(search, replace, subject) {
const map = new Map(search.map((item, index) => [item, replace[index]]));
const regex = new RegExp(search.map(item => item.replace(/[-[\]{}()*+?.\\^$|#,]/g, '\\$&')).join('|'), 'g');
return subject.replace(regex, matched => map.get(matched));
}

function slug(string, separator = '-') {
string = string.trim();
string = str_replace(
['&', '@', '%', '$', '*', '<', '>', '+', '!'],
[' and', ' at', ' percentage', ' dollar',
' asterisk ', ' less than', ' greater than', ' plus', 'exclamation'], string);
return string.toLocaleString()
.toLowerCase()
.normalize('NFD').replace( /[^\S]+/g, separator);
}

The above code contains the initial variable and some helper functions, here is a breakdown:

Initial Variables

Here is an explanation of the initial variables:

  • currentLevel: This variable is used to keep track of the current level of the table of contents tree. It starts at 0, which represents the top level of the tree.
  • tocTree: This is an array that represents the table of contents tree. Each item in the array is an object with properties such as title, slug, level, and index. The pushToTree(item) function adds items to this array.
  • lastAddedElementToTree: This variable is used to keep track of the index of the last added item in the tocTree array. It starts at 0.
  • result: This variable is used to store the HTML markup for the table of contents. It starts with the opening <ul> tag.
  • item: This is an object that represents a single item in the table of contents. It has properties such as title, slug, level, and index. This object is used to create new items that are added to the tocTree array.
  • childStack: This is an array that is used to keep track of the hierarchy of the headers encountered so far. It is an array that acts like a stack data structure. Each time we encounter a new header, we check its level and compare it with the previous header's level.
    Depending on the result, we either push the current header to the childStack or pop the stack until we find a header with the same level as the current header. 

    The childStack helps us keep track of the hierarchy of the headers, so we can determine the correct nesting of the table of contents.

  • breakLoopBackward: This variable is used in the loopTreeBackward() function to break out of the loop if needed. It starts as false.

Helper Functions

Here is an explanation of the helper functions:

  • The pushToTree(item) function adds an item to the tocTree array and assigns the current length of the array as the index property of the item.
  • The loopTreeBackward(treeToLoop) function is a generator function that loops through the tocTree array backward and yields each item. It can optionally take an argument treeToLoop which is the array to loop through.
  • The str_replace(search, replace, subject) function replaces all occurrences of each item in the search array with its corresponding item in the replace array in the subject string. It does this by creating a Map object from the search and replace arrays and using a regular expression to replace all occurrences of each search term with its corresponding replace term. This would be used in the slug function.
  • The slug(string, separator = '-') function takes a string and converts it to a URL-friendly slug. It first removes leading and trailing white space from the string. It then replaces certain special characters in the string using the str_replace() function, and finally, it converts the string to lowercase, normalizes it, and replaces all non-whitespace characters with the separator character. The default separator is -.

The str_replace function is inspired and a modified version of this: https://stackoverflow.com/questions/5069464/replace-multiple-strings-at-once/34560648#34560648

Step 3: Understanding The Algorithm

Let's try to understand the algorithm before implementing it, consider the following HTML code (I international indented the headings for clarity):

<body>
<h1>Heading 1</h1>
<h2>Subheading 1.1</h2>
<h3>Sub-subheading 1.1.1</h3>
<h2>Subheading 1.2</h2>
<h3>Sub-subheading 1.2.1</h3>
<h3>Sub-subheading 1.2.2</h3>
<h1>Heading 2</h1>
<h2>Subheading 2.1</h2>
<h2>Subheading 2.2</h2>
</body>

1) Init

Initially, tocTree is empty (as you can see in the initial variables section), so when the first header, <h1>Heading 1</h1>, is encountered, the tocTree is still empty, and we, therefore, call the pushToTree() function to push the first item to the tocTree.

2) The Next Header

Ready a new item object that contains the following properties value: currentLevel, headerID, headerText, and an empty data property value (not necessary as you can add that dynamically when the needs arise)

  • If the current level is the same as the last added element's level,  switch to step 5
  • If the current level is greater than the last added element's level, switch to step 3
  • If the current level is lesser than the last added element's level, switch to step 4
  • If the next header is the last header element, switch to 6

Finalize, by closing off in Step 7

3) The Current Level is Greater Than The Last Added Element's Level

If the current level is greater than the last added element's level, create a new item and push it to the tocTree, update the lastAddedElementToTree number, and finally push it to the childStack array

An example is the second header in the HTML code above, the second header, <h2>Subheading 1.1</h2> is greater than the first <h1>Heading 1</h1>

4) The Current Level is Lesser Than The Last Added Element's Level

If the current level is lesser than the last added element's level, loop the childStack backward and close the opened <li> tags until it finds a header with the same level as the current header.

While closing the li tags, pop it off the childStack and increment a variable timesToClose (should be initially set to 0).

If the loop has been broken,  the string </ul></li> is repeated timesToClose number of times, where timesToClose is the number of list items that we need to close to get to the same level as the current header.

This is because the HTML code that we generate for the table of contents is a nested list, and we need to close all the open list items to get to the same level as the current header. Then, we append the current item data to this string.

After that, we call pushToTree(item) to push the new item to the tocTree, and we update lastAddedElementToTree to the index of the last added element in the tocTree.

This is done so that we can use this index in the next iteration of the loop to check the level of the last added element to the tocTree.

An example of this step is the fourth header: <h2>Subheading 1.2</h2> which is lesser than <h3>Sub-subheading 1.1.1</h3>

5) The Current Level is The Same as The Last Added Element's Level

If the current level is the same as the last added element's level, add the current header's data to the last added element's data. They are on the same level, no point pushing anything.

An example of this is the sixth header, <h3>Sub-subheading 1.2.2</h3>, the heading level is the same as the fifth: <h3>Sub-subheading 1.2.1</h3>

6) Special Case: Last Item

If the header element is the last item to iterate, append the closing </li> tag to the last added element's data to complete the HTML structure of the table of contents. We do that for any last item regardless of whether it is on the same level as the previous item or not.

This completes the tocTree, and it can be used to generate the final HTML table of contents.

7) Closing Off

Finally, the algorithm iterates through  tocTree and concatenates all the data values into the result string variable. Then, it appends the closing </ul> tag to the result variable.

Having understand the algorithm, let's implement it...

Step 4: Implementing The Algorithm

There are several ways to implement the algorithm, we can either use a state machine or just by using forEach function and a couple of if statement which is what I'll be doing to make it easier on the eye.

I'll dump the code and explain:

if (tocHeader.length > 0) {
// The Algorithm Here //
tocHeader.forEach((header, index) => {

if (header.textContent.length > 0) {
let headerIDSlug = slug(header.textContent);
let headerText = header.textContent;
header.id = headerIDSlug;

currentLevel = parseInt(header.tagName[1]);
item = {
'level': currentLevel,
'headerID': headerIDSlug,
'headerText': headerText,
};
item.data = `<li data-heading_level="${currentLevel}"><a href="#${headerIDSlug}"><span class="toctext">${headerText}</span></a>`;
if (tocTree.length === 0) {
pushToTree(item)
} else {
// if currentLevel is same as previous, close previous and add unto it (for efficiency)
if (tocTree[lastAddedElementToTree].level === currentLevel) {
tocTree[lastAddedElementToTree].data += '</li>' + item.data;
}

if (currentLevel > tocTree[lastAddedElementToTree].level) {
tocTree[lastAddedElementToTree].data += '<ul>'
pushToTree(item);
lastAddedElementToTree = tocTree.length - 1;
childStack.push(item)
}

if (currentLevel < tocTree[lastAddedElementToTree].level) {
// loop backward..
let timesToClose = 0;
for (const treeData of loopTreeBackward(childStack)) {
if (treeData.level > currentLevel) {
treeData.data += '</li>'
childStack.pop();
timesToClose++;
}
if (treeData.level === currentLevel) {
breakLoopBackward = true;
}
}
item.data = '</ul></li>'.repeat(timesToClose) + item.data;
pushToTree(item);
lastAddedElementToTree = tocTree.length - 1;
}

// last element
if (index === tocHeader.length - 1) {
tocTree[lastAddedElementToTree].data += '</li>'
}
}
}
});

tocTree.forEach((item) => {
result += item.data;
});

result += '</ul>'; // End
}

Here is the breakdown:

The first step is checking if there are any headers (tocHeader) on the page, if so, it loops through each header using the forEach() method.

For each header, it checks if the header has a textContent, if so, it creates a slug from the text content of the header using the slug helper function that was explained earlier, saves the original text content of the header in the headerText variable and finally sets the id attribute of the header to the slug we just created, here is the code:

if (tocHeader.length > 0) {
// The Algorithm Here //
tocHeader.forEach((header, index) => {

if (header.textContent.length > 0) {
let headerIDSlug = slug(header.textContent);
let headerText = header.textContent;
header.id = headerIDSlug;

The next block of code gets the current level of the header by parsing the number from the header tag name, e.g, h1, h2, etc.

Having done that, it creates an object item to store the header information including the level, header ID slug, and header text, finally, It also sets the data property of the item object with an HTML string that represents a list item and an anchor tag containing the header text and ID slug:

currentLevel = parseInt(header.tagName[1]);
item = {
'level': currentLevel,
'headerID': headerIDSlug,
'headerText': headerText,
};
item.data = `<li data-heading_level="${currentLevel}"><a href="#${headerIDSlug}"><span class="toctext">${headerText}</span></a>`;

It would look like so in a preview:

<li data-heading_level="3">
<a href="#step-3:-implementing-the-algorithm">
<span class="toctext">Step 3: Implementing The Algorithm</span>
</a>
</li>

Let's move on...

This next block checks if the tocTree array, which stores the table of content items, is empty, If it's empty, it pushes the current item object onto the tocTree.

if (tocTree.length === 0) {
pushToTree(item)
}

If it's not empty, it checks the current level against the level of the last added element to the tree (tocTree[lastAddedElementToTree]).

If the current level is the same as the last added element, it appends the current item to the existing data property of the last added element.

 else {
// if currentLevel is same as previous, close previous and add unto it (for efficiency)
if (tocTree[lastAddedElementToTree].level === currentLevel) {
tocTree[lastAddedElementToTree].data += '</li>' + item.data;
}

If the current level is greater than the last added element, we need to add a new nested level in the TOC tree. To do this, we first add an opening <ul> tag to the last added element's data, then add the new item to the TOC tree using the pushToTree() function, update the lastAddedElementToTree variable to the new item's index in the TOC tree, and push the new item onto the childStack array.

if (currentLevel > tocTree[lastAddedElementToTree].level) {
tocTree[lastAddedElementToTree].data += '<ul>'
pushToTree(item);
lastAddedElementToTree = tocTree.length - 1;
childStack.push(item)
}

Next, we need to handle the case where the current level is lower than the last added element.

If the current level is less than the level of the last added element, the algorithm needs to traverse the childStack backward until it finds the last item with a level that is equal to or less than the current level.

This is achieved using the helper function loopTreeBackward() which iterates over the tree backward and stops when it finds an item with a level that is equal to or less than the current level.

When the desired item is found, the algorithm breaks the loop and creates the HTML closing tags for all the intermediate nested ul and li elements that need to be closed before the current item can be added to the tree.

The number of times to repeat the </ul></li> string is determined by counting the number of items popped off the childStack during the loop.

Finally, the current item is added to the tree and  lastAddedElementToTree and childStack variables are updated accordingly.

if (currentLevel < tocTree[lastAddedElementToTree].level) {
// loop backward..
let timesToClose = 0;
for (const treeData of loopTreeBackward(childStack)) {
if (treeData.level > currentLevel) {
treeData.data += '</li>'
childStack.pop();
timesToClose++;
}
if (treeData.level === currentLevel) {
breakLoopBackward = true;
}
}
item.data = '</ul></li>'.repeat(timesToClose) + item.data;
pushToTree(item);
lastAddedElementToTree = tocTree.length - 1;
}

If the current header is the last one, the algorithm adds the closing </li> tag to the last item in the tree:

// last element
if (index === tocHeader.length - 1) {
tocTree[lastAddedElementToTree].data += '</li>'
}

That is the end of the algorithm, outside the forEach iteration, it iterates the collected tree item to concatenate all the HTML strings generated for each item in the tree, and finally adds the closing </ul> tag to the result string:

tocTree.forEach((item) => {
result += item.data;
});

result += '</ul>'; // End

Conclusion

There are several ways to implement the algorithm but this is one way. 


Related Post(s)

  • Tonics Architecture & Features (Quick Overview)

    In this post, I would share a quick overview of the tonics architecture, note that there are so many things to share about this project, I am only doing a quick overview as a way to warm-up, let's get