Campus Event Calendar

Event Entry

What and Who

What PCA and Consensus Ensemble Clustering reveal about Human Migrations and Breast Cancer.

Prof. Gyan Bhanot
BioMaPS Institute and Biomedical Engineering, Rutgers University & Cancer Institute of New Jersey
AG3 Talk
AG 1, AG 2, AG 3, AG 4, AG 5, SWS, RG1, RG2  
Public Audience
English

Date, Time and Location

Wednesday, 24 October 2007
17:00
60 Minutes
E1 4
024
Saarbrücken

Abstract

Human mtDNA phylogeny is derived from 1737 complete sequences using a
new, direct method based on principal component analysis (PCA) and
unsupervised consensus ensemble clustering. Our method makes no a priori
assumptions, is fast, is stable under sample perturbation, uses all
significant polymorphisms in the data, works for arbitrary sample sizes,
avoids sample-choice and haplogroup-size bias, and gives a tree with 90%
consensus accuracy on internal branches. It recreates the known
phylogeny of the N, M, L0/L1, L2, L3 clades, confirming the African
origin of modern humans and showing that the M and N clades arose in
almost coincident migrations. However, the N clade haplogroups are split
along an East-West geographic divide, with a “European R-clade”
containing the haplogroups H, V, H/V, J, T, U and a “Eurasian N
sub-clade” including haplogroups B, R5, F, A, N9, I, W and X. The
haplogroup pairs (N9a, N9b) and (M7a, M7b) within N and M, respectively,
are placed in non-nearest locations, in agreement with the large time to
most recent common ancestor (TMRCA) expected from studies of their
migrations into Japan. For comparison, we also construct consensus
maximum-likelihood, parsimony, neighbor-joining and UPGMA-based trees
using the same polymorphisms. We find that these methods give consistent
results only for the clade tree. For many recent branches, the consensus
accuracy of these methods is only in the range of 1% to 20%. From a
comparison of robust polymorphisms between chimp/bonobo and all
haplogroups, and assuming a human-chimp coalescence time of 5 million
years, we infer the time to human mtDNA coalescence to be 206 ± 14
thousand years before present (KYBP).
Similar techniques applied to human breast cancer microarray data
identified eight distinct subtypes of the disease. In HER2+ breast
cancers, we find two subtypes, one of which shows improved survival
correlated with a strong upregulation of immunoglobulins, suggesting a
lymphocytic infiltrate, which was verified by histopathology. Potential
consequences of this discovery for chemotherapy will also be discussed.
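
As a rough illustration of the general idea named in the abstract (PCA
followed by unsupervised consensus ensemble clustering), the sketch below
projects samples onto their leading principal components, clusters many
random subsamples, and records how often each pair of samples lands in
the same cluster. It is not the speaker's implementation: the use of
k-means as the base clusterer, the 80% subsampling, the 0.5 agreement
threshold, all function names, and the synthetic two-group data are
assumptions made purely for illustration.

```python
import numpy as np


def pca_project(X, n_components):
    """Project the rows of X onto the top principal components."""
    Xc = X - X.mean(axis=0)
    # Rows of Vt are the principal axes of the centered data.
    _, _, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:n_components].T


def kmeans(X, k, rng, n_iter=50):
    """Plain Lloyd's k-means (assumed base clusterer); returns labels."""
    centers = X[rng.choice(len(X), size=k, replace=False)]
    for _ in range(n_iter):
        dist = ((X[:, None, :] - centers[None, :, :]) ** 2).sum(axis=2)
        labels = dist.argmin(axis=1)
        for j in range(k):
            if np.any(labels == j):
                centers[j] = X[labels == j].mean(axis=0)
    return labels


def consensus_matrix(X, k, n_runs=50, frac=0.8, seed=0):
    """Fraction of co-sampled runs in which each pair shares a cluster."""
    rng = np.random.default_rng(seed)
    n = len(X)
    together = np.zeros((n, n))
    sampled = np.zeros((n, n))
    for _ in range(n_runs):
        idx = rng.choice(n, size=int(frac * n), replace=False)
        labels = kmeans(X[idx], k, rng)
        same = (labels[:, None] == labels[None, :]).astype(float)
        sampled[np.ix_(idx, idx)] += 1.0
        together[np.ix_(idx, idx)] += same
    return np.divide(together, sampled,
                     out=np.zeros_like(together), where=sampled > 0)


def consensus_clusters(C, threshold=0.5):
    """Connected components of the graph linking pairs with C > threshold."""
    n = len(C)
    labels = -np.ones(n, dtype=int)
    current = 0
    for start in range(n):
        if labels[start] >= 0:
            continue
        labels[start] = current
        stack = [start]
        while stack:
            i = stack.pop()
            for j in np.flatnonzero((C[i] > threshold) & (labels < 0)):
                labels[j] = current
                stack.append(j)
        current += 1
    return labels


# Synthetic stand-in data: two well-separated groups of 20 samples each.
rng = np.random.default_rng(42)
group_a = rng.normal(0.0, 0.3, size=(20, 10))
group_b = rng.normal(0.0, 0.3, size=(20, 10)) + 5.0
X = np.vstack([group_a, group_b])

Z = pca_project(X, n_components=2)
C = consensus_matrix(Z, k=2)
labels = consensus_clusters(C)
print(labels)
```

Entries of the consensus matrix close to 1 mark pairs that are grouped
together under almost every resampling, which is one simple way to
quantify the stability to sample perturbation mentioned above.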

Ruth Schneppen-Christmann, 10/22/2007 09:25 -- Created document.