All Seminars

Title: Novel Geometric Algorithms for Machine Learning Problems
Colloquium: N/A
Speaker: Hu Ding of State University of New York at Buffalo
Contact: Vaidy Sunderam, vss@emory.edu
Date: 2015-02-27 at 3:00PM
Venue: MSC W303
Abstract:
Machine learning is a discipline that concerns the construction and study of algorithms for learning from data, and it plays a critical role in many other fields, such as computer vision, speech recognition, social networks, and bioinformatics. As the scale of data increases dramatically in the big-data era, a number of new challenges arise that require new ideas from other areas. In this talk, I will show that such challenges in a number of fundamental machine learning problems can be resolved by exploiting their geometric properties. In particular, I will present three geometric-algorithm-based results for various machine learning problems: (1) a unified framework for a class of constrained clustering problems in high-dimensional space; (2) a combinatorial algorithm for support vector machines (SVM) with outliers; and (3) algorithms for extracting chromosome association patterns from a population of cells. The first two results are for fundamental problems in machine learning, and the last one is for studying the organization and dynamics of the cell nucleus, an important problem in cell biology. Some geometric-algorithm-based future work in machine learning will also be discussed.
Title: The Foxby-morphism and derived equivalences
Seminar: Algebra
Speaker: Satya Mandal of University of Kansas
Contact: David Zureick-Brown, dzb@mathcs.emory.edu
Date: 2015-02-26 at 4:00PM
Venue: W306
Abstract:
Suppose $X$ is a quasi-projective scheme over a noetherian (Cohen-Macaulay) affine scheme $Spec(A)$, with $\dim X=d$. In $K$-theory and related areas (Witt theory, Grothendieck-Witt theory), bounded chain complexes $G_{\bullet}$ of coherent sheaves or locally free sheaves play an important role. One considers the category $Ch^b(Coh(X))$ (resp. $Ch^b(V(X))$) of bounded chain complexes of coherent sheaves (resp. of locally free sheaves). One also considers the corresponding derived categories $D^b(Coh(X))$, $D^b(V(X))$, which are obtained by inverting the quasi-isomorphisms in the chain complex categories.

Given a chain complex map $L_{\bullet}\to G_{\bullet}$ between two complexes $L_{\bullet}$, $G_{\bullet}$, with extra information on homologies, one complex can be viewed as an approximation to the other. Given one such complex $G_{\bullet}$, constructing such a complex $L_{\bullet}$ with desired properties, and constructing a map $L_{\bullet}\to G_{\bullet}$, would be challenging. In the affine case $X=Spec(A)$, such a map was constructed by Hans-Bjorn Foxby (unpublished), and several other versions of the same were given by others. In this lecture we implement the construction of Foxby in the quasi-affine case and give applications. Intuitively, one can look at this implementation as a "graded" version of Foxby's construction.
Title: Scalable Big Graph Data Processing
Colloquium: N/A
Speaker: Kisung Lee of Georgia Institute of Technology
Contact: Vaidy Sunderam, vss@emory.edu
Date: 2015-02-26 at 4:00PM
Venue: MSC W303
Abstract:
The application of graph data analytics is virtually unlimited because graph data are everywhere, from the friendship graphs of social networks to networks of the human brain. Even though graph data analytics is essential for gaining insight into big graphs, large-scale graph processing is complex because of its graph-specific challenges, including complicated correlations among data entities, highly skewed distribution, various graph operations, and the sheer enormity of graph data. This presentation will focus specifically on two new distributed systems for the scalable processing of big graph data. I will first present a graph system that efficiently supports graph pattern query processing (subgraph matching) by scalable graph partitioning and efficient distributed query processing. I will then describe a distributed system for iterative graph computations that can reduce memory requirements for running iterative graph algorithms while ensuring competitive performance. I will conclude the presentation by introducing a set of challenges for developing a general purpose graph analytics system that can support both efficient graph query processing and fast iterative graph computations under one unified system architecture.

Bio: Kisung Lee is a Ph.D. candidate in the School of Computer Science at Georgia Tech. His research interests lie in the intersection of big data systems and distributed computing systems. Kisung has also worked on research problems in spatial data management and social network analytics. He has been a recipient of the best paper awards of IEEE Cloud 2012 and MobiQuitous 2014. Kisung received his B.S. and M.S. degrees in computer science from KAIST and has served as a reviewer for several conferences and journals.
Title: High Performance Spatial and Spatio-temporal Data Management
Colloquium: N/A
Speaker: Suprio Ray of University of Toronto
Contact: Vaidy Sunderam, vss@emory.edu
Date: 2015-02-24 at 4:15PM
Venue: MSC W303
Abstract:
The rapid growth of spatial data volume and technological trends in storage capacity and processing power are fuelling many emerging spatial and spatio-temporal applications from a wide range of domains. Spatial join is widely used by many of the emerging spatial data analysis applications. However, spatial join processing on even a moderately sized dataset is very time consuming. At the same time, there is a rapid expansion in available processing cores, through multicore machines and Cloud computing. The confluence of these trends points to a need for effective parallelization of spatial query processing. Unfortunately, traditional parallel spatial databases are ill-equipped to deal with the performance heterogeneity that is common in the Cloud. In this talk I present two systems that I developed to parallelize spatial join queries in the Cloud and in a large main memory multicore machine.

With the proliferation of GPS-enabled mobile devices and sensors, Location Based Services (LBS) have become the most prominent among the spatio-temporal applications. These applications are characterized by a high rate of location updates and many concurrent short-running range queries. With a large number of devices, the rate of location updates can easily surpass 1 million updates per second. Traditional relational databases may not be well-suited for this. As the era of the "Internet of things" approaches, this issue is expected to become more pronounced. In my talk I present a parallel in-memory spatio-temporal indexing technique to support the demands of LBS workloads. Our system achieves significantly better performance than existing approaches to indexing spatio-temporal data. I also present a parallel in-memory spatio-temporal topological join approach. Finally, I outline some of my ideas for future research.
Title: Taming the Big Data Elephant with Query Explanations
Colloquium: N/A
Speaker: Sudeepa Roy of University of Washington
Contact: Vaidy Sunderam, vss@emory.edu
Date: 2015-02-23 at 4:00PM
Venue: MSC W303
Abstract:
In recent years, the availability of big data has resulted in a growing number of users from a variety of backgrounds interested in identifying and interpreting the general trends and anomalies of large datasets. This creates a pressing need for sophisticated data analysis tools that can provide qualitative information based on query answers on such datasets. In this talk, I will describe my current research on developing a principled framework for explaining query answers in terms of intervention (explanations are changes in the database that change the observed query answers). I will present our solutions to core challenges in this task such as obtaining concise descriptions of explanations, handling inherent dependencies of database tuples, and achieving real-time efficiency in large explanation spaces. Then, I will briefly talk about my research in the areas of probabilistic databases, provenance, information extraction, and crowdsourcing. The unifying theme of this research is to address defining characteristics of modern datasets: uncertainty, unreliability, lack of structure, and the effects of human participation. I will conclude with my long-term vision of incorporating techniques to handle these challenges in the generic data explanation framework.
Title: Looking for Structure in Real-World Networks
Seminar: Computer Science
Speaker: Blair Sullivan of the Department of Computer Science, North Carolina State University
Contact: Michele Benzi, benzi@mathcs.emory.edu
Date: 2015-02-17 at 4:00PM
Venue: W306
Abstract:
Graphs offer a natural representation of relationships within data -- for example, edges can be defined based on any user-defined measure of similarity (e.g. word frequencies, geographic proximity of observation, gene expression levels, or overlap in sample populations) or interaction (e.g. social friendship, communication, chemical bonds/protein bindings, or migration). As such, network analysis is playing an increasingly important role in understanding the data collected in a wide variety of social, scientific, and engineering settings. Unfortunately, efficient graph algorithms with guaranteed performance and solution quality are impossible in general networks (according to computational complexity theory).

One tantalizing approach to increasing scalability without sacrificing accuracy is to employ a suite of powerful (parameterized) algorithms developed by the theoretical computer science community which exploit specific forms of sparse graph structure to drastically reduce running time. The applicability of these algorithms, however, is unclear, since the (extensive) research effort in network science to characterize the structure of real-world graphs has been primarily focused on either coarse, global properties (e.g., diameter) or very localized measurements (e.g., clustering coefficient) -- metrics which are insufficient for ensuring efficient algorithms.

We discuss recent work on bridging the gap between network analysis and structural graph algorithms, answering questions like: Do real-world networks exhibit structural properties that enable efficient algorithms? Are these properties observable empirically? Can sparse structure be proven for popular random graph models? How does such a framework help? Are the efficient algorithms associated with this structure relevant for common tasks such as evaluating communities, clustering and motifs? Can we reduce the (often super-exponential) dependence of these approaches on their structural parameters? Joint work with E. Demaine, M. Farrell, T. Goodrich, N. Lemons, F. Reidl, P. Rossmanith, F. Sanchez Villaamil and S. Sikdar.
Title: Lattice point counting and the Hodge theory of degenerating hypersurfaces
Seminar: Algebra
Speaker: Eric Katz of Waterloo
Contact: David Zureick-Brown, dzb@mathcs.emory.edu
Date: 2015-02-17 at 4:00PM
Venue: W304
Abstract:
Geometric properties of generic hypersurfaces in projective toric varieties are often determined by the combinatorics of their corresponding Newton polytopes, in particular by the lattice point enumeration of dilates of the Newton polytope. Pioneering work of Danilov-Khovanskii gave combinatorial descriptions of certain topological and Hodge-theoretic invariants. In joint work with Alan Stapledon, we outline an alternative approach. Here, we degenerate the hypersurface into a union of linear subspaces and use the limit mixed Hodge structure to understand the cohomology. In addition, we discuss a theory of the combinatorics of subdivisions of polytopes to understand invariants of degenerating families of hypersurfaces.
Title: Data-Intensive Scientific Discovery in the Big Data Era
Colloquium: N/A
Speaker: James Faghmous of University of Minnesota
Contact: Vaidy Sunderam, vss@emory.edu
Date: 2015-02-13 at 3:00PM
Venue: MSC W303
Abstract:
Data science has become a powerful tool to extract knowledge from large data. However, despite massive data growth in the sciences, it remains unclear whether Big Data can lead to scientific breakthroughs. I will introduce a new knowledge discovery paradigm -- theory-guided data science -- that brings together novel data analysis methods and powerful scientific theory to extract knowledge from complex spatio-temporal data. The principles of this paradigm will be demonstrated with a data mining application to monitor the global ocean system.

Bio: James Faghmous is a Research Associate at the University of Minnesota where he develops new data science methods for data-intensive scientific discovery. In 2015, James received an inaugural NSF CRII Award for junior faculty and his doctoral dissertation received the "Outstanding Dissertation Award" in Science and Engineering at the University of Minnesota. James received his Ph.D. from the University of Minnesota in 2013 where he was part of a 5-year $10M NSF Expeditions in Computing project to understand climate change from data. He graduated Magna Cum Laude in 2006 with a B.Sc. in computer science from the City College of New York where he was a Rhodes and a Gates Scholar nominee.
Title: Text Analytics-from small to BIG-Challenges and Ideas
Seminar: Computer Science
Speaker: John Kuriakose of Infosys Labs
Contact: Jinho Choi, choi@mathcs.emory.edu
Date: 2015-02-06 at 3:00PM
Venue: MSC W303
Abstract:
This talk will explore issues and challenges faced in Text Analytics through the lens of real-world use cases. I will show a demo of our existing news analytics system that leverages Entity and Event extraction and then describe five major challenges that we want to address.
Title: Pencils of quadrics and the arithmetic of hyperelliptic curves
Colloquium: Number Theory
Speaker: Jerry Wang of Princeton University
Contact: David Borthwick, davidb@mathcs.emory.edu
Date: 2015-02-05 at 4:00PM
Venue: MSC W303
Abstract:
Finding integral and rational solutions to polynomial equations with integer coefficients has always been a fascinating subject for mathematicians. In this talk we will look at the hyperelliptic equations $y^2 = f(x)$ and discuss how many solutions they typically have. There have been several results on this recently by Manjul Bhargava and his collaborators via the study of rational orbits of certain representations of reductive groups and by applying the techniques of geometry of numbers to count these orbits. We will discuss our recent joint work with Manjul Bhargava and Benedict Gross on solutions to the hyperelliptic equations over odd degree field extensions of $\mathbb{Q}$ and see how the geometry of pencils of quadrics plays a pivotal role in this work.