The PageRank algorithm in Python

PageRank is an algorithm used by Google Search to rank web pages in its search results. It measures the importance of a page based on the links that point to it.

It is not the only algorithm Google uses to order its search results, but it was the first one the company used, and it remains the best known.

The PageRank of a page is calculated from the pages that link to it. Each of those pages contributes its own PageRank divided by its number of outgoing links, and these contributions are summed. A damping factor d is then applied to model a "random surfer": with probability d the user follows one of the links on the current page, and with probability 1 - d they jump to some other page at random.
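To make this calculation concrete, here is a minimal from-scratch sketch of PageRank by power iteration in plain Python, without networkx. It assumes every page has at least one outgoing link (no dangling pages) and uses the same three links as the example later in this article. The function name pagerank and its parameters links, d, tol and max_iter are my own choices for illustration, not a library API.

```python
def pagerank(links, d=0.85, tol=1e-10, max_iter=1000):
    """Power-iteration sketch of PageRank.

    links maps each page to the list of pages it links to.
    Assumption: every page has at least one outgoing link (no dangling pages).
    """
    pages = list(links)
    n = len(pages)
    # Start from a uniform distribution over all pages.
    pr = {p: 1.0 / n for p in pages}
    for _ in range(max_iter):
        # Every page receives the random-jump share (1 - d) / n.
        new_pr = {p: (1 - d) / n for p in pages}
        # Each page passes d * its PageRank, split evenly over its outgoing links.
        for page, targets in links.items():
            share = d * pr[page] / len(targets)
            for target in targets:
                new_pr[target] += share
        # Stop once the total change between two iterations is tiny.
        if sum(abs(new_pr[p] - pr[p]) for p in pages) < tol:
            return new_pr
        pr = new_pr
    return pr


# Three-page example: A->B, B->C, C->B, C->A.
ranks = pagerank({"A": ["B"], "B": ["C"], "C": ["B", "A"]})
for page, score in sorted(ranks.items()):
    print(page, round(score, 4))
# A 0.2148
# B 0.3974
# C 0.3878
```

Starting from equal scores, each iteration pushes PageRank along the links until the values stop changing. Because there are no dangling pages, the scores keep summing to 1 at every step.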

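I install networkx (pip install networkx scipy; recent versions of networkx need SciPy to run nx.pagerank). NetworkX is a Python package for the creation, manipulation, and study of the structure, dynamics, and functions of complex networks.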

NetworkX provides data structures and methods for storing graphs, which I use here to run the PageRank algorithm.
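As a small illustration of these data structures, the sketch below stores the three-page example as an nx.DiGraph and reads off the two quantities the PageRank formula needs for each page: which pages link to it (predecessors) and how many links it sends out (out_degree). The summary dictionary is just a name chosen for this example.

```python
import networkx as nx

# Directed graph: an edge (u, v) means "page u links to page v".
graph = nx.DiGraph()
graph.add_edges_from([("A", "B"), ("B", "C"), ("C", "B"), ("C", "A")])

summary = {}
for page in sorted(graph.nodes):
    # Pages with a link *to* this page, and the number of links *from* it.
    incoming = sorted(graph.predecessors(page))
    summary[page] = (incoming, graph.out_degree(page))
    print(page, "linked from:", incoming, "outgoing links:", graph.out_degree(page))
```

Here B is linked from both A and C, while C is the only page with two outgoing links, so C passes on half of its PageRank through each link.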

import networkx as nx
import numpy as np

graph = nx.DiGraph()  # directed graph: an edge (u, v) means page u links to page v

tablePages = ["A", "B", "C"]  # example with 3 pages
graph.add_nodes_from(tablePages)  # add the vertices (pages) of the graph

# Add the arcs (links). We have:
# page A has a link to B
# page B has a link to C
# page C has a link to B
# page C has a link to A
# So page B has 2 incoming links (from A and C) and 1 outgoing link,
# page C has 1 incoming link (from B) and 2 outgoing links,
# page A has 1 incoming link (from C) and 1 outgoing link.
graph.add_edges_from([('A','B'), ('C','A'),('B','C'), ('C','B')])
print("Graph nodes:")
print(graph.nodes)
print("Graph edges:")
print(graph.edges)
# With a damping factor d = 0.85, the PageRank formula is:
# PR(p) = (1 - d)/N + d * sum over pages i linking to p of PR(i)/L(i)
# where N is the number of pages and L(i) the number of outgoing links of page i.
# PR(A) = (1 - 0.85)/3 + 0.85 * (PR(C)/2)
# PR(B) = (1 - 0.85)/3 + 0.85 * (PR(A)/1 + PR(C)/2)
# PR(C) = (1 - 0.85)/3 + 0.85 * (PR(B)/1)

pagerank = nx.pagerank(graph, alpha=0.85)  # alpha is the damping factor d
print(pagerank)
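Because the three equations for PR(A), PR(B) and PR(C) are linear, they can also be solved exactly instead of iteratively. The sketch below does this with NumPy only; the matrix M (column j spreads page j's PageRank evenly over its outgoing links) and the other variable names are my own for this illustration. Its results should agree closely with the values printed by nx.pagerank, which iterates until a tolerance is reached.

```python
import numpy as np

d = 0.85  # damping factor
pages = ["A", "B", "C"]
links = [("A", "B"), ("C", "A"), ("B", "C"), ("C", "B")]  # (from, to)

n = len(pages)
index = {p: i for i, p in enumerate(pages)}
out_degree = {p: sum(1 for src, _ in links if src == p) for p in pages}

# M[i, j] = 1 / out_degree(j) if page j links to page i, else 0.
M = np.zeros((n, n))
for src, dst in links:
    M[index[dst], index[src]] = 1.0 / out_degree[src]

# PR = (1 - d)/n + d * M @ PR  is equivalent to  (I - d*M) @ PR = (1 - d)/n.
pr = np.linalg.solve(np.eye(n) - d * M, np.full(n, (1 - d) / n))

for p in pages:
    print(p, round(float(pr[index[p]]), 4))
```

For this graph it gives PR(A) ≈ 0.2148, PR(B) ≈ 0.3974 and PR(C) ≈ 0.3878 (exactly 380/1769, 703/1769 and 686/1769), so B ranks first: it is the only page with two incoming links.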