All Seminars

Title: Scalable Computational Pathology: From Interactive to Deep Learning
Defense: Dissertation
Speaker: Michael Nalisnik of Emory University
Contact: Lee Cooper, lee.cooper@emory.edu
Date: 2017-03-30 at 10:00AM
Venue: W306
Abstract:
Advances in microscopy imaging and genomics have created an explosion of patient data in the pathology domain. Whole-slide images of histologic sections contain rich information describing the diverse cellular elements of tissue microenvironments. These images capture, in high resolution, the visual cues that have been the basis of pathologic diagnosis for over a century. Each whole-slide image contains billions of pixels and up to a million or more microanatomic objects whose appearances hold important prognostic information. Combining this information with genomic and clinical data provides insight into disease biology and patient outcomes. Yet, due to the size and complexity of the data, the software tools that would allow scientists and clinicians to extract insight from these resources are either non-existent or limited. Additionally, current methods that rely on human assessment are highly subjective and not repeatable. This work aims to address these shortcomings with a set of open-source computational pathology tools that provide scalable, objective, and repeatable classification of histologic entities such as cell nuclei.
We first present a comprehensive interactive machine learning framework for assembling training sets for the classification of histologic objects within whole-slide images. The system provides a complete infrastructure capable of managing terabytes of images, object features, annotations, and metadata in real time. Active learning algorithms allow the user and the system to work together intuitively, enabling the efficient selection of samples from unlabeled pools of objects numbering in the hundreds of millions. We demonstrate how the system can be used to phenotype microvascular structures in gliomas to predict survival and to explore the molecular pathways associated with these phenotypes. Quantitative metrics are developed to describe these structures.
We also present a scalable, high-throughput deep convolutional learning framework for the classification of histologic objects. Because it relies on representation learning, the framework does not require the images to be segmented; instead, it learns task-specific features from the data in an unbiased manner. To address scalability, the framework's graph-based, parallel architecture allows the processing of large whole-slide image archives consisting of hundreds of slides and hundreds of millions of histologic objects. We explore the efficacy of various deep convolutional network architectures and demonstrate the system's capabilities in classifying cell nuclei in lower-grade gliomas.
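The abstract does not say which active learning criterion the system uses. As a hedged illustration only, the sketch below shows margin-based uncertainty sampling, a common strategy in which the objects whose predicted class probabilities are closest to a tie are queued for the user to label next. The helper names `margin` and `select_queries` and the toy probabilities are hypothetical and are not part of the framework described above.

```python
from typing import Dict, List


def margin(probs: List[float]) -> float:
    """Gap between the two most probable classes; a small gap means an uncertain prediction."""
    top = sorted(probs, reverse=True)
    return top[0] - top[1]


def select_queries(pool: Dict[str, List[float]], k: int) -> List[str]:
    """Hypothetical helper: return the k unlabeled object IDs with the smallest margin."""
    ranked = sorted(pool, key=lambda obj_id: margin(pool[obj_id]))
    return ranked[:k]


if __name__ == "__main__":
    # Toy predicted probabilities (class A, class B) for unlabeled nuclei.
    pool = {
        "nucleus_1": [0.95, 0.05],
        "nucleus_2": [0.52, 0.48],
        "nucleus_3": [0.70, 0.30],
        "nucleus_4": [0.49, 0.51],
    }
    print(select_queries(pool, 2))  # prints ['nucleus_4', 'nucleus_2']
```

In an interactive loop, the selected objects would be shown to the user, their labels added to the training set, and the classifier retrained before the next query round.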
Title: The Artin-Schreier Theorem in Galois Theory
Defense: Honors Thesis
Speaker: Yining Cheng of Emory University
Contact: TBA
Date: 2017-03-30 at 1:00PM
Venue: MSC W303
Abstract:
We first state some basic definitions and theorems from the Galois theory of finite extensions, and state and prove results on Kummer theory and Artin-Schreier extensions as prerequisites. The main part of this thesis is the proof of the Artin-Schreier Theorem, which states that if an algebraically closed field $C$ is a finite extension of a subfield $F$, then the degree $[C:F]$ is at most two, and if $C \neq F$, then $F$ must have characteristic 0. After the proof, we will discuss applications of the Artin-Schreier Theorem.
Title: Theory for New Machine Learning Problems and Applications
Seminar: N/A
Speaker: Yingyu Liang of Princeton University
Contact: James Lu, jlu@mathcs.emory.edu
Date: 2017-03-30 at 4:00PM
Venue: MSC W201
Abstract:
Machine learning has recently achieved great empirical success. This comes with new challenges, such as sophisticated models that lack rigorous analysis, simple algorithms that succeed in practice on hard optimization problems, and the handling of large-scale datasets under resource constraints. In this talk, I will present some of my work addressing such challenges.
The first part of the talk focuses on learning semantic representations for text data. Recent advances in natural language processing build upon the approach of embedding words as low-dimensional vectors. The fundamental observation that empirically justifies this approach is that these vectors can capture semantic relations. A probabilistic model for generating text is proposed to mathematically explain this observation and existing popular embedding algorithms. The model also reveals surprising connections to classical notions such as Pointwise Mutual Information, and enables the design of novel, simple, and practical algorithms for applications such as sentence embedding.
In the second part, I will describe my work on distributed unsupervised learning over large-scale data spread across different locations. For the prototypical tasks of clustering, Principal Component Analysis (PCA), and kernel PCA, I will present algorithms that have provable guarantees on solution quality, communication cost that is nearly optimal in key parameters, and strong empirical performance.
Bio: Yingyu Liang is an associate research scholar in the Computer Science Department at Princeton University. His research interests are providing rigorous analysis for machine learning models and designing efficient algorithms for applications. He received a B.S. in 2008 and an M.S. in 2010 in Computer Science from Tsinghua University, and a Ph.D. in Computer Science from the Georgia Institute of Technology in 2014. He was a postdoctoral researcher in the Computer Science Department at Princeton University from 2014 to 2016.
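Pointwise Mutual Information compares how often a word and a context co-occur with how often they would co-occur if they were independent: PMI(w, c) = log(p(w, c) / (p(w) p(c))). The sketch below estimates it from a toy list of (word, context) pairs using empirical frequencies. The function name `pmi_table` and the data are illustrative assumptions; this is not the speaker's generative model or training data.

```python
import math
from collections import Counter
from typing import Dict, List, Tuple


def pmi_table(pairs: List[Tuple[str, str]]) -> Dict[Tuple[str, str], float]:
    """Estimate PMI(w, c) = log(p(w, c) / (p(w) * p(c))) for every observed (word, context) pair."""
    pair_counts = Counter(pairs)
    word_counts = Counter(w for w, _ in pairs)
    ctx_counts = Counter(c for _, c in pairs)
    total = len(pairs)
    table = {}
    for (w, c), n_wc in pair_counts.items():
        p_wc = n_wc / total          # joint probability of the pair
        p_w = word_counts[w] / total  # marginal probability of the word
        p_c = ctx_counts[c] / total   # marginal probability of the context
        table[(w, c)] = math.log(p_wc / (p_w * p_c))
    return table


if __name__ == "__main__":
    pairs = [("king", "queen"), ("king", "queen"), ("king", "the"), ("apple", "the")]
    for pair, value in sorted(pmi_table(pairs).items()):
        print(pair, round(value, 3))
```

Positive values mark pairs that co-occur more often than chance would predict; negative values mark pairs that co-occur less often.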
Title: Protecting Locations of Individual Movement under Temporal Correlations
Defense: Dissertation
Speaker: Yonghui Xiao of Emory University
Contact: Yonghui Xiao, yonghui.xiao@emory.edu
Date: 2017-03-29 at 11:30AM
Venue: Chemistry 320
Abstract:
Concerns about location privacy frequently arise with the rapid development of GPS-enabled devices and location-based applications. In this dissertation, we study how to protect the locations of individual movement under temporal correlations. First, we propose three privacy notions: location privacy, customizable privacy, and privacy for spatiotemporal events. Location privacy protects the true location of a user at each timestamp. Customizable privacy lets the user configure personalized privacy notions according to different demands. Privacy for spatiotemporal events must be preserved separately because, even if location privacy is guaranteed at each timestamp, spatiotemporal events can still be exposed. Second, we investigate how to preserve these privacy notions. We show that the traditional $\ell_{1}$-norm sensitivity in differential privacy exaggerates the real sensitivity and thus leads to too much noise in the released data. Hence we study the real sensitivity, called the sensitivity hull, of the data release mechanism, and then design the optimal location release mechanism for location privacy. We show that, for customizable privacy, the data release mechanism has to be updated dynamically to guarantee that the privacy is protectable, which is measured by a notion called the degree of protection. To protect spatiotemporal events, we study how to derive their probability for arbitrary initial probabilities of adversaries, and then decide at each timestamp whether to release the location so as to bound the risk of the spatiotemporal events. Third, we implement these algorithms on real-world datasets to demonstrate their efficiency and effectiveness.
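For context, the standard Laplace mechanism, which the abstract argues over-perturbs location data, calibrates noise to the $\ell_{1}$-norm sensitivity Δ of a query: it adds Laplace noise with scale Δ/ε to each released coordinate. For a location query, Δ would be the largest $\ell_{1}$ distance between any two possible locations. The sketch below shows only this baseline for a 2-D location; it is not the sensitivity-hull mechanism described above, and the function names, coordinates, Δ, and ε are illustrative assumptions.

```python
import random
from typing import Tuple


def laplace_noise(scale: float, rng: random.Random) -> float:
    """Sample Laplace(0, scale) as the difference of two i.i.d. exponentials with mean `scale`."""
    return rng.expovariate(1.0 / scale) - rng.expovariate(1.0 / scale)


def release_location(loc: Tuple[float, float], l1_sensitivity: float,
                     epsilon: float, rng: random.Random) -> Tuple[float, float]:
    """Baseline Laplace mechanism: perturb each coordinate with scale = sensitivity / epsilon."""
    scale = l1_sensitivity / epsilon
    return (loc[0] + laplace_noise(scale, rng), loc[1] + laplace_noise(scale, rng))


if __name__ == "__main__":
    rng = random.Random(42)
    # Hypothetical grid coordinates, sensitivity, and privacy budget.
    print(release_location((3.0, 4.0), l1_sensitivity=2.0, epsilon=1.0, rng=rng))
```

The noise scale grows linearly with Δ, which is why an overestimated sensitivity translates directly into noisier released locations.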
Title: A Study of Benford's Law for the Values of Arithmetic Functions
Defense: Honors
Speaker: Letian Wang of Emory University
Contact: Letian Wang, letian.wang@emory.edu
Date: 2017-03-29 at 1:00PM
Venue: MSC E408
Abstract:
Benford's Law characterizes the distribution of initial digits in large datasets across disciplines. Since its discovery by Simon Newcomb in 1881, Benford's Law has inspired a large body of research. In this paper, we will start by introducing the history of Benford's Law and discussing in detail the explanations proposed by mathematicians for why various datasets are Benford. These explanations include the Spread Hypothesis and the Geometric, Scale-Invariance, and Central Limit explanations.
To rigorously define Benford's Law and to motivate criteria for Benford sequences, we will provide fundamental theorems on uniform distribution modulo 1 in Chapter 2. We will state and prove criteria for checking uniform distribution, including Weyl's Criterion and Van der Corput's Difference Theorem, as well as their corollaries.
In Chapter 3, we will introduce the logarithm map, which allows us to reformulate Benford's Law in terms of the uniform distribution modulo 1 studied earlier. We will start by examining base 10 only and then generalize to arbitrary bases.
Finally, we will elaborate on the idea of good functions. We will prove that good functions are Benford, which in turn enables us to find a new class of Benford sequences. We will use this theorem to show that the partition function p(n) and the factorial sequence n! follow Benford's Law.
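In base 10, Benford's Law assigns leading digit d the probability log10(1 + 1/d); through the logarithm map, this is equivalent to the fractional parts of log10 of the terms being uniformly distributed modulo 1. The sketch below is an empirical sanity check, not a proof: it tallies the leading digits of n! for n = 1..1000 and prints them next to the Benford probabilities. The function names and the cutoff of 1000 are illustrative choices.

```python
import math
from collections import Counter
from typing import Dict, Iterable


def benford_expected(d: int) -> float:
    """Benford probability that the leading base-10 digit is d: log10(1 + 1/d)."""
    return math.log10(1 + 1 / d)


def leading_digit_freqs(values: Iterable[int]) -> Dict[int, float]:
    """Empirical frequencies of the leading decimal digit of positive integers."""
    counts = Counter(int(str(v)[0]) for v in values)
    total = sum(counts.values())
    return {d: counts[d] / total for d in range(1, 10)}


if __name__ == "__main__":
    # n! for n = 1..1000; 1000! has 2568 digits, within Python's default int-to-str limit.
    freqs = leading_digit_freqs(math.factorial(n) for n in range(1, 1001))
    for d in range(1, 10):
        print(d, round(freqs[d], 3), round(benford_expected(d), 3))
```

The exact string conversion is used instead of the fractional part of log10(n!), which avoids floating-point rounding near digit boundaries.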
Title: Utility-cost of provable privacy: A case study on US Census data
Seminar: Computer Science
Speaker: Ashwin Machanavajjhala of Duke University
Contact: Li Xiong, lxiong@emory.edu
Date: 2017-03-29 at 4:00PM
Venue: MSC W303
Abstract:
Privacy is an important constraint that algorithms must satisfy when analyzing sensitive data from individuals. Differential privacy has revolutionized the way we reason about privacy and has championed the need for data analysis algorithms with provable privacy guarantees. Differential privacy and its variants have emerged as the gold standard for exploring the tradeoff between the privacy ensured to individuals and the utility of the statistical insights mined from the data, and are in use by many commercial entities (e.g., Google and Apple) and government entities (e.g., the US Census Bureau) for collecting and sharing sensitive user data.
In today's talk I will highlight key challenges in designing differentially private algorithms for emerging applications and present research from our group that tries to address these challenges. In particular, I will describe our recent work on modernizing the data publication process for a US Census Bureau data product called LODES/OnTheMap. In this work, we identified the legal statutes, and their current interpretations, that regulate the publication of LODES/OnTheMap data, formulated these regulations mathematically, and designed algorithms for releasing tabular summaries that provably ensure these privacy requirements. For reasonable settings of the privacy parameters, our solutions release summaries of the data with error comparable to, or even lower than, that of current releases (which are not provably private).
Bio: Ashwin Machanavajjhala is an Assistant Professor in the Department of Computer Science, Duke University. Previously, he was a Senior Research Scientist in the Knowledge Management group at Yahoo! Research. His primary research interests lie in algorithms for ensuring privacy in statistical databases and augmented reality applications. He is a recipient of the National Science Foundation Faculty Early CAREER award in 2013 and of an Honorable Mention for the 2008 ACM SIGMOD Jim Gray Dissertation Award. Ashwin received a Ph.D. from the Department of Computer Science, Cornell University, and a B.Tech. in Computer Science and Engineering from the Indian Institute of Technology, Madras.
Title: On Saturation Spectrum
Defense: Dissertation
Speaker: Jessica Fuller of Emory University
Contact: Jessica Fuller, jfulle@emory.edu
Date: 2017-03-28 at 2:45PM
Venue: MSC E406
Abstract:
Given a graph H, we say a graph G is H-saturated if G does not contain H as a subgraph but adding any edge not already in G creates a copy of H as a subgraph. The minimum number of edges of an H-saturated graph on n vertices, known as the saturation number, and the maximum number of edges of an H-saturated graph on n vertices, known as the Turán number, have been studied for many different types of graphs. We are interested in the existence of H-saturated graphs for each edge count between the saturation number and the Turán number. We determine the saturation spectrum of $(K_t-e)$-saturated graphs and $F_t$-saturated graphs. Let $K_t-e$ denote the complete graph on t vertices minus one edge. We prove that $(K_t-e)$-saturated graphs do not exist for small edge counts and construct $(K_t-e)$-saturated graphs with edge counts in a continuous interval. We then extend these constructions to obtain further $(K_t-e)$-saturated graphs. Let $F_t$ be the graph consisting of t edge-disjoint triangles that intersect at a single vertex v. We prove that $F_2$-saturated graphs do not exist for small edge counts and construct a collection of $F_2$-saturated graphs with edge counts in a continuous interval. We also establish more general constructions that yield a collection of $F_t$-saturated graphs with edge counts in a continuous interval.
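To make the definitions concrete, take the simplest case H = K_3 (the triangle), whose saturation number is n − 1 (attained by the star) and whose Turán number is ⌊n²/4⌋. The brute-force sketch below enumerates every graph on n labelled vertices and collects the edge counts of the triangle-saturated ones, i.e., the saturation spectrum of K_3. It is feasible only for very small n and is an illustration of the definitions, not a method from the dissertation; all function names are hypothetical.

```python
from itertools import combinations
from typing import Set, Tuple

Edge = Tuple[int, int]  # stored as (u, v) with u < v


def has_triangle(n: int, edges: Set[Edge]) -> bool:
    """A triangle exists iff the endpoints of some edge share a common neighbour."""
    adj = {v: set() for v in range(n)}
    for u, v in edges:
        adj[u].add(v)
        adj[v].add(u)
    return any(adj[u] & adj[v] for u, v in edges)


def is_triangle_saturated(n: int, edges: Set[Edge]) -> bool:
    """K3-saturated: triangle-free, but adding any missing edge creates a triangle."""
    if has_triangle(n, edges):
        return False
    missing = [e for e in combinations(range(n), 2) if e not in edges]
    return all(has_triangle(n, edges | {e}) for e in missing)


def saturation_spectrum(n: int) -> Set[int]:
    """Edge counts of all K3-saturated graphs on n labelled vertices (exhaustive search)."""
    all_edges = list(combinations(range(n), 2))
    counts = set()
    for mask in range(1 << len(all_edges)):
        edges = {e for i, e in enumerate(all_edges) if mask >> i & 1}
        if is_triangle_saturated(n, edges):
            counts.add(len(edges))
    return counts


if __name__ == "__main__":
    print(sorted(saturation_spectrum(5)))  # prints [4, 5, 6]
```

For n = 5 the spectrum runs from the star K_{1,4} (4 edges) through the 5-cycle (5 edges) to K_{2,3} (6 edges), so every count between the saturation number and the Turán number occurs; the abstract's question is when such gap-free intervals exist for harder choices of H.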
Title: Finite index for arboreal Galois representations
Seminar: Algebra
Speaker: Andrew Bridy of Texas A&M University
Contact: David Zureick-Brown, dzb@mathcs.emory.edu
Date: 2017-03-28 at 4:00PM
Venue: MSC W201
Abstract:
Let K be a global field of characteristic 0, let $f \in K(x)$ and $b \in K$, and set $K_n := K(f^{-n}(b))$. The projective limit of the groups $\mathrm{Gal}(K_n/K)$ embeds in the automorphism group of an infinite rooted tree. A difficult problem is to find criteria that guarantee that the image has finite index in this automorphism group; a complete answer would give a dynamical analogue of Serre's famous open image theorem. When f is a cubic polynomial over a function field, I prove a set of necessary and sufficient conditions for finite index (for number fields, the proof is conditional on Vojta's conjecture). This is joint work with Tom Tucker.
Title: Compositional Models for Information Extraction
Seminar: N/A
Speaker: Mark Dredze of Johns Hopkins University
Contact: Eugene Agichtein, eugene@mathcs.emory.edu
Date: 2017-03-27 at 4:00PM
Venue: White Hall 207
Abstract:
Information extraction systems are the backbone of many end-user applications, including question answering, web search, and clinical text analysis. These applications depend on underlying technologies that can identify entities and relations expressed in natural language text. For example, Amazon Echo may answer a user question based on a relation extracted from a news article. A clinical decision support system may offer a physician suggestions based on a symptom identified in the clinical notes from a previous patient visit. In political science, we may seek to aggregate opinions expressed in public comments about a new public policy. Advances in machine learning have led to new neural models that learn effective representations directly from data and improve information extraction tasks. Yet for many tasks, years of research have produced hand-engineered features that yield state-of-the-art performance. I will present feature-rich compositional models that combine hand-engineered features with learned text representations to achieve new state-of-the-art results for relation extraction. These models are widely applicable to problems within natural language processing and beyond. Additionally, I will survey how these models fit into my broader research program by highlighting my group's work on developing new machine learning methods for extracting public health information from clinical and social media text.
Title: TBD
Seminar: N/A
Speaker: TBD
Contact: TBA
Date: 2017-03-23 at 0:00AM
Venue: TBA
Abstract: