UL
IST
FMUL
INSA
iMM

Community Finding with Applications on Phylogenetic Networks

x

Guide

  1. Pick Network
  2. Choose View
  3. Select Algorithm
  4. Run
Phyl
Mixing
Average Degree
Network Size
Max Node Degree
Min Community Size
Max Community Size

Objectives

The aim of the thesis (extended abstract) was to implement three community finding algorithms – Louvain, Infomap and Layered Label Propagation; to benchmark them using two synthetic networks – Girvan-Newman and Lancichinetti-Fortunato-Radicchi; to test them in real networks, particularly, in one derived from a Staphylococcus aureus MLST dataset; to compare visualization frameworks – Cytoscape.js and D3.js (using SVG and Canvas elements), and, finally, to make it all available online.

Algorithms

Louvain

Louvain Algorithm NPM

Louvain is divided in 2 phases:
Modularity Optimization - At the beginning, the algorithm will randomly order all nodes in the network such that, one by one, it will remove and insert them in a different community. This will continue until no significant modularity's variation is verified (using equation below);
Community Aggregation - After finishing previous step, every node belonging to the same community is merged into a single giant one and the links of the new network are the sum of the links connecting nodes from the same pair of communities.
Both steps are executed until modularity's variation is lower than an input value.
Louvain Equation

Infomap

Infomap Algorithm NPM

Infomap considers 2 steps: Minimizing Description Length (using map equation) and Community Aggregation. Although the implementation is similar to the Louvain algorithm, the optimizing function is different:
Map Equation

Layered Label Propagation

Layered Label Propagation Algorithm NPM

LLP is a semi-supervised algorithm that is able to detect communities based on a set of nodes that is partially labeled or unlabelled.
Considering an unlabelled network, it starts by assigning a random community to each node. Then, at the beginning of each cycle, all nodes are considered in a random order. Based on the equation below, the community that maximizes its value for a given node, it is the one that is assigned. Being ki the number of nodes with label λi in the neighborhood of a given node and νi the total number in the whole graph with the same label. γ is the resolution parameter that buffers the number of nodes belonging to a given community in the neighborhood, by its usual presence across the whole network.
This algorithm is equivalent to Label Propagation whenever γ is 0.
Layered Label Propagation Equation

Results

Web Tools

PHYLOViZ

PHYLOViZ Online

This is an online version of the software PHYLOViZ, a software that allows the analysis of sequence-based typing methods that generate allelic profiles and their associated epidemiological data. The main purpose was to give an user-friendly solution for data analysis and sharing without installing any specific software.

INSaFLU

INSaFLU

It is a web-based application that deals with primary data towards the automatic generation of the output data that are the core first-line “genetic requests” for effective and timely influenza laboratory surveillance. Data integration is continuously scalable, fitting the need for a real-time epidemiological surveillance during the flu epidemics.

INSaFLU

Events

MTiH

1st Workshop MTiH (Madrid, Spain)

As an MTiH student, I presented a poster of the thesis and talked about past academic achievements. I was still able to suggest new ways of improving this European program, powered by EIT Health. Bursary awarded.

AMR Hackathon 2018 (Heidelberg, Germany)

Selected to participate in a 3-day AMR Hackathon, organized by EIT Health Germany. Participants were divided in teams of 6/7 elements that would analyze AMR situation in a different EU country. Bursary awarded.

AMR Hackathon
website counter

Luís

ResearchGate NPM Docker Hub GitHub