Ananda Mondal
Knight Foundation School of Computing at Florida International University (FIU), USA
Title: Discovery of long non-coding RNA biomarkers for breast cancer subtypes using machine learning
Abstract
Statement of the Problem: The subtype of breast cancer dictates the choice of treatment. Researchers have reported that the genes responsible for breast cancer initiation and development are regulated by cis-regulatory elements such as long non-coding RNAs (lncRNAs). However, no study has yet sought to identify lncRNA biomarkers specific to each of the five breast cancer subtypes: Basal, HER2, Luminal A, Luminal B, and Normal-like. This study aims to identify subtype-specific lncRNA biomarkers associated with clinical outcomes that might help in developing appropriate cancer therapies.
Data and Methodology: The lncRNA expression profiles of breast cancer patients from The Cancer Genome Atlas (TCGA) were analyzed to discover the biomarkers. We proposed a simultaneous feature selection and classification approach for the multiclass problem that combines recursive feature elimination (RFE) with the L1-norm multiclass support vector machine (L1MSVM); we call it RL1MSVM. The proposed model outperforms two state-of-the-art models, L1MSVM and Random Forest (RF), in selecting subtype-specific lncRNA biomarkers.
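The abstract does not say exactly how RFE and L1MSVM are combined, so the following is only a minimal sketch of the general idea under stated assumptions. It assumes an SVM-RFE-style loop: fit a sparse linear multiclass model, rank features by their summed squared weights across classes, drop the weakest, and refit. It also swaps the joint multiclass formulation of L1MSVM for simpler one-vs-rest linear SVMs with an L1 penalty, trained by stochastic proximal subgradient descent. The function names `fit_l1_ovr_svm` and `rfe_l1_svm`, all hyperparameters, and the toy data are hypothetical and are not taken from the study.

```python
import math
import random


def fit_l1_ovr_svm(X, y, classes, lam=0.05, lr=0.05, epochs=100, seed=0):
    """Fit one linear SVM per class (one-vs-rest) with an L1 penalty.

    Hypothetical stand-in for L1MSVM: a hinge-loss subgradient step is
    followed by soft-thresholding, which sets weak weights to exactly zero.
    Returns {class_label: (weights, bias)}.
    """
    rng = random.Random(seed)
    n, d = len(X), len(X[0])
    models = {}
    for c in classes:
        w, b = [0.0] * d, 0.0
        order = list(range(n))
        for _ in range(epochs):
            rng.shuffle(order)
            for i in order:
                t = 1.0 if y[i] == c else -1.0
                score = sum(wj * xj for wj, xj in zip(w, X[i])) + b
                if t * score < 1.0:  # hinge loss is active for this sample
                    w = [wj + lr * t * xj for wj, xj in zip(w, X[i])]
                    b += lr * t
                # Proximal step for lam * ||w||_1 (soft-thresholding); bias is not penalized.
                w = [math.copysign(max(abs(wj) - lr * lam, 0.0), wj) for wj in w]
        models[c] = (w, b)
    return models


def rfe_l1_svm(X, y, n_keep, step=1, **svm_kwargs):
    """Recursive feature elimination wrapped around the L1-penalized SVMs.

    Each round refits on the surviving features and drops the `step`
    features with the smallest summed squared weight over all classes.
    Returns the original indices of the surviving features.
    """
    classes = sorted(set(y))
    active = list(range(len(X[0])))
    while len(active) > n_keep:
        X_active = [[row[j] for j in active] for row in X]
        models = fit_l1_ovr_svm(X_active, y, classes, **svm_kwargs)
        importance = [sum(models[c][0][k] ** 2 for c in classes)
                      for k in range(len(active))]
        n_drop = min(step, len(active) - n_keep)
        drop = set(sorted(range(len(active)), key=importance.__getitem__)[:n_drop])
        active = [f for k, f in enumerate(active) if k not in drop]
    return active


# Hypothetical toy data: 3 classes, 6 features; only features 0 and 1 carry class signal.
rng = random.Random(1)
centers = {0: (2.0, -2.0), 1: (-2.0, 2.0), 2: (-2.0, -2.0)}
X, y = [], []
for label, (a, b) in centers.items():
    for _ in range(20):
        informative = [a + rng.uniform(-0.3, 0.3), b + rng.uniform(-0.3, 0.3)]
        noise = [rng.uniform(-0.5, 0.5) for _ in range(4)]
        X.append(informative + noise)
        y.append(label)

selected = rfe_l1_svm(X, y, n_keep=2)
print(sorted(selected))  # expected: [0, 1]
```

Removing one feature per round (`step=1`) keeps the ranking stable on this toy problem. With thousands of lncRNA features, a larger `step` would likely be needed to keep the number of refits manageable.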
Results: Each of the three methods was used to select 196 lncRNAs, the optimum number of features determined by RL1MSVM, so that the methods could be compared. A stable set of 91 key lncRNAs was then obtained by taking the union of the pairwise intersections of the three selected sets, that is, the lncRNAs selected by at least two of the three approaches. Of these 91 lncRNAs, 53 have been identified previously, and the remaining 38 are novel.
Significance: The novel and known key lncRNAs could support the development of subtype-specific targeted therapies for breast cancer.
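The stable-set rule from the Results, keeping lncRNAs that appear in the union of the pairwise intersections of the three selected sets, can be illustrated with plain Python sets. The sketch below also shows that this rule is equivalent to keeping every feature that receives at least two of three votes. The function names and the feature IDs (`lnc1` to `lnc6`) are hypothetical placeholders, not lncRNAs from the study.

```python
from collections import Counter


def stable_set(a, b, c):
    """Union of the pairwise intersections of three feature sets."""
    return (a & b) | (a & c) | (b & c)


def at_least_two_votes(*sets):
    """Equivalent rule: keep features chosen by at least two of the methods."""
    votes = Counter(f for s in sets for f in s)
    return {f for f, n in votes.items() if n >= 2}


# Hypothetical selections from the three methods.
rl1msvm = {"lnc1", "lnc2", "lnc3", "lnc4"}
l1msvm = {"lnc2", "lnc3", "lnc5"}
rf = {"lnc1", "lnc3", "lnc6"}

print(sorted(stable_set(rl1msvm, l1msvm, rf)))  # ['lnc1', 'lnc2', 'lnc3']
```

The vote-counting form generalizes more easily if more selection methods are added, while the set expression mirrors the wording of the abstract directly.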