A Linear Algebra Method Application — Google PageRank Algorithm

Kofi Osafo
4 min readApr 18, 2022

The linear algebra method used by Google’s PageRank algorithm evaluates the importance of web pages without the need for human intervention. The idea is that the more inbound links a web page has from other websites, the more valid the content on the page. Any web page’s importance score will always be a non-negative real number (stochastic matrix).

PageRank employs a mathematical formula (Fig.1) to determine which link node is critical, and thus ranks the importance of the score in the order of search retrieval inputted by the user.

Fig.1

The Pi and lj links, thus if one of those links is linked to the page Pi, then Pj will pass on 1/ lj of its importance to Pi. The importance ranking of Pi is then the sum of all the contributions made by pages linking to it, that is the representation of a set of pages linking to Pi by Bi (represented in the above formula, Fig.2).

Fig.2

The corresponding matrix on the node’s network is then denoted by A the transition matrix of the graph, A=

In the diagram above (Fig.2), each node distributes its importance evenly to the pages to which it links. Node 1 has three outgoing edges, so it will pass 0.33 (1/3) of its importance to node 1, and node 2 will pass all of its importance to node 1. Nodes 2 and 4 transfer two outgoing edges with 0.5 (1/2) importance, while node 3 transfers one outgoing edge with a score of 1 importance.

Linear Algebra Method

In representing the above diagram and corresponding matrix in a linear method, let’s assume the website node contains n pages indexed by an integer 1 ≤ k ≤ n, xk will be used to denote the importance score of page k on the web. Which the xk is non-negative xj > xk indicates that page j is more important than page k. Hence, a node has k outgoing edges as it will pass on 1/k of its importance to each of the nodes that it links. Therefore x1, x2, x3,
and x4 denotes the importance of the four nodes (website pages):

The eigenvector corresponding to the eigenvalue 1, where c represents eigenvalue 1, C= 0.032

With the stationary vector (PageRank vector) below as:

Thus, with the results of the PageRank vector above, all the nodes round up to 1, denoting its popularity, because it has the most links to website pages (node), thus a search on Google engine made by a user will retrieve website page 1 to be rank at the top of the search because it appears to show with a higher score than the rest of the networked website pages (nodes).

However, while node 1 has two backlinks and node 3 has three backlinks, on the network graph, node 3 has only one outgoing edge link to node 1, transferring all of its importance to node 1. Similarly, if a web surfer only follows hyperlinks and arrives at page 3 (node 3), the user can only return to page 1. (node 1). As a result, it appears that the rank of each node, as the weighted sum of the edges that enter the node, is significant.

Conclusion

The PageRank algorithm quantifies the network relevance of nodes (website pages) using a simple linear algebra method. Whereas hyperlinks contained in popular websites and pointing to a webpage will bring in more internet traffic to the website, thereby significantly contributing to the popularity of the webpage.

References

Austin, D. (2017). Feature Column from the AMS. [online] Feature Column from the AMS. Available at: http://www.ams.org/samplings/feature-column/fcarc-pagerank

Rev., S. (2017). The $25,000,000,000 Eigenvector: The Linear Algebra behind Google | SIAM Review | Vol. 48, №3 | Society for Industrial and Applied Mathematics. [online] Doi.org. Available at: https://doi.org/10.1137/050623280

--

--

Kofi Osafo

Big data and Machine learning fanatic | MSc Big Data | Data Analyst | Finance | Banking | Compliance