Considerable effort has been devoted over the past five years to identifying gene expression signatures that predict the aggressiveness and outcome of cancer at the time of its discovery. In breast cancer, different groups used different patient cohorts and different DNA microarray platforms to produce short lists of predictive genes, and reported high success rates. Unfortunately, the predictive lists found by different groups had very few genes in common.
I will review some of this work, point out its problematic aspects, and present PAC-ranking, a method designed to estimate the number of training samples needed to produce a robust predictive gene list. If time permits, I will briefly describe an ongoing study of colon cancer, in which the machine-learning approach taken in the breast cancer studies was replaced by one that focuses on the underlying biology.