Invited Paper Session Abstracts - Big Ideas About Big (and Less Than Big) Data

Thursday, July 27, 2:00 p.m. - 5:00 p.m., Continental Ballroom B

Data analytics is a growing field, with graduate degrees, undergraduate majors and minors, and concentrations popping up at colleges and universities around the country. Data analysis impacts our lives broadly, from predictions of movie rankings on Netflix to targeted marketing by retailers, to name two of many applications. The landscape of data science is broad: its ideas can be applied to smaller datasets from a biometric device such as a Fitbit or Apple Watch, as well as to large datasets in finance or health care. This session will sample areas of data science from a variety of applications, calling on topics in mathematics such as graph theory and linear algebra, as well as statistical modeling. The session will also include presenters from government, academia, and business, demonstrating the inherent interdisciplinarity of studying big and less-than-big data.

Tim Chartier, Davidson College
Jennifer Galovich, St. John's University and the College of St. Benedict

Know Thyself: Introspective Personal Data Mining

Talithia Williams, Harvey Mudd College

The latest breed of high-tech wearable health technology is changing how we monitor personal data. We can quantify everything from heart rate and sleep patterns to body temperature and sex life. But what is the average person to do with the massive amounts of data being collected? This talk makes a compelling case that all of us should be recording simple data about our bodies, and it will help you begin to analyze and understand your body’s data. Surprisingly, your own data can reveal much more than even your doctors may know!
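The talk's own data and methods are not specified here; as a loose illustration of the kind of first-pass analysis one might run on exported wearable data, here is a minimal sketch. The readings and the one-standard-deviation rule for flagging unusual days are invented for this example:

```python
from statistics import mean, stdev

# Hypothetical week of resting heart-rate readings (beats per minute),
# e.g. as exported from a wearable device. Values are invented.
resting_hr = [62, 64, 61, 70, 72, 63, 62]

avg = mean(resting_hr)
spread = stdev(resting_hr)

# Flag days that deviate more than one standard deviation from the mean --
# a crude but common first pass at spotting unusual readings.
unusual_days = [i for i, bpm in enumerate(resting_hr) if abs(bpm - avg) > spread]
print(f"mean={avg:.1f} bpm, sd={spread:.1f}, unusual days={unusual_days}")
```

Even this simple pass surfaces the two elevated days in the middle of the week; a longer record would support trend lines and comparisons against sleep or activity data.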

Using Big and Less-Than-Big Data Sets in Public Health

Martin I. Meltzer, Ph.D., Health Economics and Modeling Unit (HEMU), Division of Preparedness and Emerging Infections, National Center for Emerging and Zoonotic Infectious Diseases, Centers for Disease Control and Prevention (CDC)

The use of different types of databases in public health to estimate the potential burden of disease and the impact of interventions will be illustrated through the following published examples: the use, and attendant problems, of large healthcare insurance databases to assess the risks of specific disease-related physician visits, hospitalizations, and deaths. These databases will be contrasted with published papers demonstrating the relative paucity of relevant epidemiological data during the 2009 H1N1 influenza pandemic and the 2014-2016 Ebola epidemic in West Africa. The overall conclusion to be illustrated is that public health policy makers cannot assume that the relevant data will be available, requiring analysts to work with a wide variety of databases.

Let Me See Your Papers: Using Real-Time Network Graph Traversal to Uncover Suspicious Offshore Activity

Abhishek Mehta, Tresata

As the biggest data leak in history, the release of the Panama Papers rocked the world in 2016, instigating a slew of criminal investigations and, most notably, leading to the resignation of Iceland’s Prime Minister. The International Consortium of Investigative Journalists (ICIJ) made the database associated with the Panama Papers publicly available shortly thereafter. Using OPTIMUS, Tresata's Analytics Operating System, we decided to conduct some investigations of our own: scrutinizing entities within the dataset in real time at a segment of one, discovering their associations, and seeing which interactions were above board.
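OPTIMUS itself is proprietary, but the core idea of traversing a network to surface an entity's associations can be sketched with a plain breadth-first search. The graph below is invented for illustration; the actual ICIJ data links officers, intermediaries, and offshore entities in a similar node-and-edge model:

```python
from collections import deque

# Toy graph in the spirit of the ICIJ offshore-leaks data model:
# nodes are officers, intermediaries, and entities; edges are
# "officer of" / "intermediary of" relationships. All names invented.
edges = {
    "Officer A": ["Shell Co 1", "Shell Co 2"],
    "Shell Co 1": ["Intermediary X"],
    "Shell Co 2": [],
    "Intermediary X": ["Shell Co 3"],
    "Shell Co 3": [],
}

def associations(start, graph):
    """Breadth-first traversal: every node reachable from `start`,
    mapped to its distance in hops."""
    seen = {start: 0}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for nbr in graph.get(node, []):
            if nbr not in seen:
                seen[nbr] = seen[node] + 1
                queue.append(nbr)
    return seen

found = associations("Officer A", edges)
print(found)
```

The hop counts give a first measure of how directly an officer is tied to each entity; a production system layers indexing and streaming updates on top of the same traversal idea.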

Toward Unsupervised Learning for Social Media Using Linear Algebra

Michael Berry, University of Tennessee, Knoxville

In large-scale text mining applications such as tweet classification, there is a need for fast yet robust techniques to summarize or track concepts without prior knowledge of the content. Linear algebra plays a very important role in the design and implementation of the underlying algorithms needed for the automated summarization of time-sensitive documents, especially those from social media. Matrix and tensor factorization methods can greatly facilitate the extraction of key documents (tweets) that summarize a current stream, thereby reducing the exhaustive human effort that would otherwise be needed to read and synthesize an enormous number of documents. The long-term goal of this research is to develop the core numerical algorithms and software needed for unsupervised learning when no prior labels or metadata are available.
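As a toy illustration of the factorization idea (not the speaker's actual algorithms), a truncated SVD of a small term-by-document matrix can pick out a representative document for each dominant concept. The matrix and "tweets" below are invented:

```python
import numpy as np

# Tiny toy term-by-document matrix: rows are terms, columns are "tweets".
# A real pipeline would build this from tokenized text with tf-idf weights;
# the counts here are invented for illustration.
terms = ["flood", "river", "rescue", "game", "score"]
docs = ["t0: flood rescue", "t1: river flood flood",
        "t2: game score", "t3: score game game"]
A = np.array([
    [1, 2, 0, 0],   # flood
    [0, 1, 0, 0],   # river
    [1, 0, 0, 0],   # rescue
    [0, 0, 1, 2],   # game
    [0, 0, 1, 1],   # score
], dtype=float)

# SVD: the leading singular triplets capture the dominant "concepts".
U, s, Vt = np.linalg.svd(A, full_matrices=False)

# The document loading most heavily on each of the top two concepts
# serves as a candidate summary tweet for that concept.
reps = [int(np.argmax(np.abs(Vt[k]))) for k in range(2)]
for k, rep in enumerate(reps):
    print(f"concept {k}: representative document = {docs[rep]}")
```

On this example the two concepts separate cleanly into the flood-related and game-related tweets; at stream scale, the same loadings let an unsupervised system hand a reader a handful of tweets instead of thousands.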

Finding and Telling Data Stories

Dash Davidson, Tableau Software

Hidden in any dataset, from the largest to the smallest, are stories – most often, many of them. In this session you will see how you can employ several different analytical techniques to draw these stories out of your data. Through combining visual analysis with storytelling, you will learn how to bring even the simplest of datasets to life in a compelling way.

Creating Partnerships with Industry and Finding Data Analytics Problems for Students

Michael Dorff, Brigham Young University

Suppose you wanted to develop partnerships with people in business, industry, or government (BIG) to obtain research problems, many of which are data analytics problems, for your students to work on, better preparing them for careers in BIG. How would you form these partnerships? How would you get research problems from industry? What would those problems look like? Answers to these questions can come through the PIC Math program. PIC Math is an MAA/SIAM-supported program funded by the NSF to prepare mathematical sciences undergraduates for industrial careers by engaging them in research problems from industry. In this talk, we will discuss how faculty members like you (many with no experience in applied math or in BIG) develop partnerships with people in industry, how they obtain data analytics research problems through these partnerships, and what those problems look like.