The aim of the thesis (extended abstract) was
to implement three community finding algorithms – Louvain, Infomap and Layered Label
Propagation; to benchmark them using two synthetic networks – Girvan-Newman and Lancichinetti-Fortunato-Radicchi;
to test them in real networks, particularly, in one derived from a Staphylococcus aureus MLST dataset; to compare
visualization frameworks – Cytoscape.js and D3.js (using SVG and Canvas elements), and, finally, to make it all available online.
Louvain is divided in 2 phases:
Modularity Optimization - At the beginning, the algorithm will randomly order all
nodes in the network such that, one by one, it will remove and insert them in a different community. This will
continue until no significant modularity's variation is verified (using equation below);
Community Aggregation - After finishing previous step, every node belonging to the same community is merged into a single
giant one and the links of the new network are the sum of the links connecting nodes from the same pair of communities.
Both steps are executed until modularity's variation is lower than an input value.
Infomap considers 2 steps: Minimizing Description Length (using map equation) and
Community Aggregation. Although the implementation is similar to the Louvain algorithm, the optimizing function
Layered Label Propagation
LLP is a semi-supervised algorithm that is able to detect communities based on a set of nodes that is partially labeled or unlabelled.
Considering an unlabelled network, it starts by assigning a random community to each node. Then, at the beginning of each cycle, all nodes are considered
in a random order. Based on the equation below, the community that maximizes its value for a given node, it is the one that is assigned.
Being ki the number of nodes with label λi in the neighborhood of a given node and νi the total number in the whole graph with the same label. γ is the
resolution parameter that buffers the number of nodes belonging to a given community in the neighborhood, by its usual presence across the whole network.
This algorithm is equivalent to Label Propagation whenever γ is 0.
This is an online version of the software PHYLOViZ, a software that allows
the analysis of sequence-based typing methods that generate allelic profiles and their associated
epidemiological data. The main purpose was to give an user-friendly solution for data analysis and sharing
without installing any specific software.
It is a bioinformatics free web-based suite that deals with primary data
(reads) towards the automatic generation of the output data that are actually the core first-line
“genetic requests” for effective and timely influenza laboratory surveillance. Data
integration is continuously scalable, fitting the need for a real-time epidemiological surveillance
during the flu epidemics.
As an MTiH student, I presented a poster
of the thesis and
talked about past academic achievements. I was still able to suggest new ways of
improving this European program, powered by EIT Health. Bursary awarded.
Selected to participate in a 3-day AMR Hackathon, organized by EIT Health Germany. Participants were divided in teams of 6/7 elements that would analyze AMR
situation in a different EU country. Bursary awarded.