Data Science/Computational Social Science Seminars winter roster announced

Tuesday, 01/12/2021

The University of Michigan Data Science / Computational Social Science (DS/CSS) faculty have announced the speakers for their Winter 2021 seminar series.

The Data Science/Computational Social Science seminar series brings together a vibrant and diverse community of scholars whose cutting-edge research in Information Science, Computer Science, or the Social Sciences aims to broaden our understanding of important social and technological issues.

The events are scheduled for Thursdays at noon ET. Due to current in-person gathering restrictions, all seminar talks will be held online via Zoom.

The organizer for the Winter 2021 series is Paramveer Dhillon, assistant professor of information at the School of Information.

Topics and abstracts will be announced separately, closer to each event.

The Winter 2021 seminar schedule of speakers:

Jan. 21: Veronica Perez-Rosas, University of Michigan - Link to event page

Developing Natural Language Processing Tools for Enhanced Psychotherapy

Abstract:

In recent years there has been an increasing need for psychotherapy to address a wide variety of behavioral and mental health issues. Meeting this need has become a significant challenge because the current mental health workforce is unable to cope with the demand, since a new counselor’s training relies heavily on human supervision and interaction. In this talk, I will present work towards developing NLP tools that can provide support and supervision during counseling interactions in a timely and scalable fashion. In particular, I will describe a counseling dialog system that provides language feedback to counseling trainees using a pretrained transformer architecture and context augmentation techniques inspired by traditional strategies used during counseling training.

Jan. 28: Ingmar Weber, Qatar Comp. Research Institute - Link to event page

Using Advertising Data to Monitor International Development

Abstract:

Most of the big internet companies, such as Facebook, Google or Twitter, generate their revenue from targeted advertising. To offer advertisers advanced targeting capabilities, these companies collect large amounts of user data to build elaborate profiles. Based on these profiles an advertiser can then choose to target only, say, female Facebook users living in Doha, Qatar who are aged 25-29, who used to live in the Philippines, who have a self-declared university degree, and who use an iOS device to access Facebook.

To help advertisers plan their advertising campaigns and the related budget needs, the advertising platforms provide so-called audience estimates of how many of their users match the provided targeting criteria. In the example above, Facebook estimates that there are 3,100 monthly active matching users (as of January 23, 2021). In this talk I’ll describe how, in close collaboration with different UN agencies, we’re tapping into these audience estimates to (i) monitor international migration, (ii) track digital gender gaps, and (iii) map wealth inequalities. We consistently find that, despite fake profiles and noise in the inference algorithms, data derived from the advertising platforms can provide valuable information that is complementary to other data sources. At the same time, our work shows the risk of identifying vulnerable groups, rather than individuals, which is often not adequately considered in discussions focused on individual privacy.

Feb. 4: Munmun De Choudhury, Georgia Tech - Link to recording

Bridging Machine Learning and Collaborative Action Research: A Tale of Engaging with Three Stakeholders in Digital Mental Health

Abstract:

Digital traces, such as social media data, supported with advances in the artificial intelligence (AI) and machine learning (ML) fields, are increasingly being used to understand the mental health of individuals and populations. However, such algorithms do not exist in a vacuum -- there is an intertwined relationship between what an algorithm does and the world it exists in. Consequently, with algorithmic approaches offering promise to change the status quo in mental health for the first time since the mid-20th century, interdisciplinary collaborations are paramount.

But what are some paradigms of engagement for AI/ML researchers that augment existing algorithmic capabilities while minimizing the risk of harm? This talk will describe the experiences from working with three different stakeholders in projects relating to digital mental health – first with a federal public health agency, second with healthcare providers, and third with a non-profit grassroots organization. The talk hopes to present some lessons learned by way of these engagements, and to reflect on approaches that go beyond technical innovations and building technological artifacts to contributions that center humans’ roles, beliefs, needs, and expectations within those innovations and artifacts.

Feb. 11: Andrew Guess, Princeton University - Link to event page

Does Social Influence Shape Online Political Expression?

Abstract:

Expressing opinions on social media has become a standard form of participation in the political process, but we know little about the factors that shape it. In this paper, we investigate the role of social context. Decades after the development of the canonical "Spiral of Silence" model, the public sphere has radically shifted toward a networked space mediated by social platforms. We articulate a theory of social influence in social media expression and test it by analyzing unique datasets linking U.S. survey respondents to their public Twitter accounts. To measure political expression, we develop and validate a supervised classifier of tweet-level ideology and apply it to respondents' tweets and the tweets of people they follow. We find that the ideology of Twitter followees' tweets is predictive of respondents' own expressed ideology on Twitter, even after holding constant self-reported ideological predispositions. Our findings demonstrate a powerful methodological approach for studying these dynamics.

Feb. 18: Susan Athey, Stanford University - Jointly hosted with MIDAS and SBEE seminars - Link to recording

Using Experiments and Observational Data for Policy Learning

Abstract:

In this talk, I will give an overview of several recent papers about designing and analyzing experiments. I consider problems of designing and analyzing experiments with staggered rollouts; combining experimental and observational data to study long-term outcomes; and analyzing data from adaptive experiments.

Feb. 25: Jacob Eisenstein, Google Research - Link to recording

Computational Models of Language Variation and Change

Abstract:

The study of language variation and change offers unique opportunities for interdisciplinary collaboration. On one hand, computational methods can offer new insights to social science and the digital humanities through the analysis of large-scale corpora of text and metadata. Conversely, social science and the humanities provide theoretical frameworks that are essential for natural language processing algorithms to adequately address the diversity of human language. In this talk, I will describe two collaborations, one in each direction. First, I will present work in the digital humanities, in which a computational model of lexical semantic change leads to a network analysis that offers new perspectives on the movement to abolish slavery in the 19th-century United States. Second, I will describe an effort to make natural language processing more robust to dialect variation by building linguistic characterizations of dialect features into automated classifiers through few-shot learning.

March 4: Sandra González-Bailón, University of Pennsylvania - Link to recording

Exposure to News in the Digital Age

Abstract:

The abundance of media options is a central feature of today’s information environment. The current media landscape is more decentralized than it ever was, and information is simultaneously flowing through many parallel channels (i.e., the web, social media, TV). In this talk I will discuss recent research in which we measure exposure to news across channels to (1) test claims of increasing audience fragmentation and ideological segregation and (2) measure the influence of automated accounts in distorting the salience of news sources on social media. Using an unprecedented combination of observed data from the US comprising a five-year time window and involving tens of thousands of panelists, I will show that co-exposure to diverse news is on the rise. And using social media data from two contentious political events in France and Spain, I will show that verified accounts are significantly more visible than unverified bots, and that discrepancies in source salience in social media and the web are generated by both human and bot activity. I will discuss the implications of these findings for how we think about the current communication environment, exposure to news, and ongoing attempts to limit the effects of misinformation, including social media verification policies.

March 11: Deen Freelon, University of North Carolina - Link to recording

Citation inequities in the social sciences: The case of Communication studies

Abstract:

Calls for equity across categories of race, gender, and national identity are nothing new in politics or the academy. Yet the social sciences continue to marginalize both research on and by members of such identity categories. To quantify these inequities, I analyze citation patterns from ten prominent journals in the field of Communication between 2000 and 2019. The data come from Web of Knowledge, and the analysis focuses on each author’s identity and professional characteristics, including race, gender, country of employment, and discipline. As this research is currently in progress, the talk will focus on the methods used to collect and preprocess the data. Also, a preliminary descriptive analysis of gender disparities in Communication citation practices will be presented. This research offers both a model methodology and an empirical baseline for measuring inequities in citation practices across disciplines.

March 18: Kristina Lerman, University of Southern California - Link to recording

Biased Data & Other Threats to Validity of Models

Abstract:

Data is often heterogeneous, generated by subgroups with different traits and behaviors. The correlations between the traits, behaviors, time, and how the data is collected create dependencies that bias analysis. Models trained on biased data will make invalid inferences about individuals – what’s known as the ecological fallacy. The inferences can also be unfair and discriminate against individuals based on their membership in protected groups. I describe common sources of bias in heterogeneous data, including Simpson’s paradox, survivor bias, and longitudinal data fallacy, showing that ignoring these sources of bias can dramatically alter conclusions and lead to wrong policy recommendations. I use an example from the COVID-19 pandemic to show that spatial aggregation of disease statistics exaggerates estimated growth rates. Finally, I describe a mathematical framework for de-biasing data that addresses these threats to the validity of predictive models. The framework creates covariates that do not depend on sensitive features, such as gender or race, and can be used with any model to create fairer, unbiased predictions. The framework promises to learn unbiased models even in analytically challenging data environments.

March 25: Dashun Wang, Northwestern University - Link to recording

Initial Progress on the Science of Science

Abstract:

The increasing availability of large-scale datasets that trace the entirety of the scientific enterprise has created an unprecedented opportunity to explore scientific production and reward. Parallel developments in data science, network science, and artificial intelligence offer us powerful tools and techniques to make sense of these millions of data points. Together, they tell a complex yet insightful story about how scientific careers unfold, how collaborations contribute to discovery, and how scientific progress emerges through a combination of multiple interconnected factors.

These opportunities, and the challenges that come with them, have fueled the emergence of a multidisciplinary community of scientists who are united by their goal of understanding science. These practitioners of the science of science use the scientific method to study themselves, examine projects that work as well as those that fail, quantify the patterns that characterize discovery and invention, and offer lessons to improve science as a whole. In this talk, I’ll highlight some examples of research in this area, hoping to illustrate the promise of the science of science as well as its limitations.

April 1: David Lazer, Northeastern University - Link to recording

The prevalence and sharing patterns of "fake news" in the US in 2016 and 2020

Abstract:

This presentation will discuss the prevalence and sharing patterns of "fake news" in the United States in 2016 (regarding the election) and 2020 (regarding COVID-19). Substantively, the questions will be: How common has fake news, as a specific genre of misinformation, been on Twitter? How concentrated are exposure and sharing patterns? And how does fake news fit into the broader information ecosystem on Twitter? Methodologically, the focus will in part be on the development of panels of accounts that are linked to administrative data as a method to measure aggregate behaviors on social media.

April 8: Ziv Epstein, MIT Media Lab - Link to recording

Shifting attention to accuracy can reduce misinformation online

Abstract:

In recent years, there has been a great deal of concern about the proliferation of false and misleading news on social media. Academics and practitioners alike have asked why people share such misinformation, and sought solutions to reduce the sharing of misinformation. We attempt to address both of these questions. First, we find that the veracity of headlines has little effect on sharing intentions, despite having a large effect on judgments of accuracy. This dissociation suggests that sharing does not necessarily indicate belief. Nonetheless, most participants say it is important to share only accurate news.

To shed light on this apparent contradiction, we carried out four survey experiments and a field experiment on Twitter; the results show that subtly shifting attention to accuracy increases the quality of news that people subsequently share. These findings indicate that people often share misinformation because their attention is focused on factors other than accuracy, and therefore they fail to implement a strongly held preference for accurate sharing. Our results challenge the popular claim that people value partisanship over accuracy, and provide evidence for scalable attention-based interventions that social media platforms could easily implement to counter misinformation online. I will also discuss some practical tips for deploying accuracy prompts on digital platforms and introduce a new platform for studying attention on social media.

April 15: Chenhao Tan, University of Chicago - Link to recording

Towards Human-Centered AI: Understanding Human Production and Consumption of Explanations

Abstract:

AI plays an increasingly prominent role in decision making for societally critical domains such as criminal justice, healthcare and fake news. It is crucial that AI systems be able to explain the basis for the decisions they are recommending in ways that humans can easily comprehend, thus serving as a bridge between humans and AI. While most current computational research in generating explanations focuses on the AI side, this talk focuses on our recent work on how humans provide and interpret explanations. First, I will discuss the distinction between emulation and discovery in building AI and its implications for the role of human explanations. Second, I will show that human explanations can vary substantially across tasks/datasets and advocate rethinking the challenges in obtaining human explanations. Finally, we demonstrate the effectiveness of explanations as real-time assistance in improving human decision-making and as model-driven tutorials to help humans understand model behavior.

April 22: Ed H. Chi, Principal Scientist at Google

Building and Understanding Recommenders for Long-Term User Experiences

Abstract:

How can and should recommender systems shape user experiences? In recent years, our understanding of the objective of recommenders has evolved from making good individual predictions of user interest to creating positive, long-term experiences. This task of enabling positive, long-term experiences is significantly more challenging, making clear both the recommender's responsibility in shaping the user experience and the difficulty of optimizing over long-term user trajectories.

In this talk we'll discuss three recent advances in understanding how recommenders affect the user experience and how reinforcement learning can be used to improve it: (1) We'll start with how traditional recommendation can be framed as an RL task and how off-policy RL can significantly improve user satisfaction. (2) We'll explore the effects of the recommender on the user experience using simulations. (3) We'll discuss how we can control for some effects by framing the recommendation challenge as a multi-objective safe RL problem. Finally, I'll comment on how recommenders are dynamic wicked problems that are socially complex with no stopping points. It is hard or impossible to summarize modeling objectives into loss functions as boundary objects, which implies what we need are learning objects that help us archive and make sense of the evolving problem.