PolMeth XXXVIII Poster Session

Political Methodology Society

The 2021 Annual Summer Meeting of the Political Methodology Society will take place online on July 13-16.


More info: https://polmeth2021.com/


Choosing Imputation Models

Moritz Marbach

Abstract
Imputing missing values is an important preprocessing step in data analysis, but the literature offers little guidance on how to choose between different imputation models. This letter suggests adopting the imputation model that generates a density of imputed values most similar to that of the observed values for an incomplete variable after balancing all other covariates. We recommend stable balancing weights as a practical approach to balancing covariates whose distributions are expected to differ if the values are not missing completely at random. After balancing, discrepancy statistics can be used to compare the densities of imputed and observed values. We illustrate the suggested approach using simulated and real-world survey data from the American National Election Study, comparing popular imputation approaches including random forests, hot-deck imputation, predictive mean matching, and multivariate normal imputation. An R package implementing the suggested approach accompanies this letter.
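
A rough sketch of the comparison step in R (a generic illustration, not the accompanying package; the balancing weights w for the observed cases are assumed to come from a stable-balancing-weights solver):

    # KS-style discrepancy between observed and imputed values of an
    # incomplete variable, weighting observed cases by balancing weights w.
    weighted_ks <- function(obs, imp, w) {
      grid <- sort(unique(c(obs, imp)))
      F_obs <- sapply(grid, function(g) sum(w * (obs <= g)) / sum(w))
      F_imp <- sapply(grid, function(g) mean(imp <= g))
      max(abs(F_obs - F_imp))  # smaller = more similar distributions
    }
    # Pick the imputation model with the smallest discrepancy:
    # sapply(list(pmm = imp_pmm, rf = imp_rf, mvn = imp_mvn),
    #        weighted_ks, obs = x_obs, w = w)
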
Presented by
Moritz Marbach
Institution
Texas A&M

From Parties to Leaders and Back: Voting Behavior Patterns in Western Democracies 1960s-2020s

Alessio Albarello

Abstract
The study of voting behavior has seen a shift from theories centered on social groups to theories centered on individuals. One of the most prominent expectations has been the individualization of politics, according to which individuals (leaders) gained importance for vote choice at the expense of groups (parties). In this paper, I assemble the largest dataset yet used to analyze the relative importance of parties and leaders for voting behavior, comprising 122 national election studies spanning 1961 to 2019 with more than 170,000 unique respondents. Additionally, I account for observed and unobserved heterogeneity at the country, election, and party levels that may not have been fully captured in the existing scholarship. I find no increase in leader importance over time, but rather a marked drop in party relevance in the 1990s followed by a gradual party comeback. I argue that the drop is a consequence of the end of the Cold War and of communist ideology, which left parties without a vital cleavage around which to organize electoral competition. The subsequent revival of parties suggests that they may have successfully reorganized themselves around new programmatic divides. I provide supporting evidence for the theory and rule out possible alternative explanations, including the individualization hypothesis.
Presented by
Alessio Albarello
Institution
University of Rochester

Latent Factor Approach to Missing Not at Random

Naijia Liu

Abstract
Missing data are prevalent in social science datasets. Existing multiple imputation methods assume data are missing at random (MAR), a more restrictive assumption than missing not at random (MNAR). The problem becomes more challenging under the MNAR scenario, which arises, for example, with missingness in sensitive survey questions. This paper confronts MNAR by modeling the latent structure of the missingness to mitigate the influence of the unmeasured confounders that cause the missing values. This approach allows one to assume MAR conditional on the latent factor. The proposed method outperforms standard multiple imputation methods under MNAR. In addition to simulation comparisons, I present an application using the latent factor model to impute missing values in an observational causal inference study, in which imputation significantly altered the estimated causal effects. I offer a theoretical discussion of why MNAR is a serious problem for observational causal inference procedures. I conclude with a discussion of the scope of the method and potential extensions.
Presented by
Naijia Liu <naijial@princeton.edu>
Institution
Princeton University

Near-Optimal Topic Models for Large Scale Text Data

Adam Breuer

Abstract
In this paper, we introduce a new class of topic models called Exemplar Topic Models (ETMs) that give results that are provably more accurate, interpretable, and scalable than the current state of the art. Topic models have rapidly become our workhorse method for inferring the key themes that characterize the content of text datasets. Yet despite their massive popularity, all standard topic models (e.g. LDA, STM, etc.) share two key shortcomings. First, unlike other core methods in the political methodology toolkit, standard topic models have no theoretical guarantees. This means they find topics that are arbitrarily mismeasured (even on ideal data) according to their own models' objective functions; their results are unstable over multiple runs; and they often miss important topics and return spurious ones. Second, standard topic models find topics that lack a rigorous interpretation, so interpreting a topic requires a researcher to manually inspect each topic's distribution over ~20,000 dictionary words and make a subjective judgement (e.g. claim 'this topic is about war').

The ETMs we introduce share the same rich probabilistic model as conventional topic models such as LDA. However, unlike conventional topic models, ETMs find the topics associated with each document in the text dataset that, even in the worst case, nearly maximize the model's a posteriori probability, guaranteeing near-optimal results. Moreover, each topic found by an ETM has a concise and rigorous interpretation related to a single word (e.g. 'war'). Leveraging recent results from theoretical machine learning, we show that these results are surprisingly always achievable in sublinear parallel computation time, which means that ETMs can be applied even to massive new political text datasets containing billions of documents. Beyond their theoretical guarantees, ETMs also consistently outperform standard LDA topic models on standard measures of topic quality across a variety of canonical datasets as well as new original datasets, such as all online political ads in the US election cycle and all posts on Parler.com during the Jan. 6th attack.
Presented by
Adam Breuer
Institution
Harvard

Sensitivity Analysis in the Generalization of Experimental Results

Melody Huang

Abstract
Randomized controlled trials (RCTs) allow researchers to estimate causal effects in an experimental sample with minimal identifying assumptions. However, to generalize a causal effect from an RCT to a target population, researchers must adjust for a set of treatment effect moderators. In practice, it is impossible to know whether the set of moderators has been properly accounted for. In this paper, I propose a three-parameter sensitivity analysis for the generalization of experimental results using weighted estimators, with several advantages over existing methods. First, the framework does not require any parametric assumption on either the selection mechanism or the treatment effect heterogeneity mechanism. Second, I show that the sensitivity parameters are guaranteed to be bounded and propose (1) a diagnostic for researchers to determine how robust a point estimate is to killer confounders, and (2) an adjusted calibration approach for researchers to accurately benchmark the parameters using existing data. Finally, I demonstrate that the proposed framework can be easily extended to the class of doubly robust, augmented weighted estimators. The sensitivity analysis framework is applied to a set of job training program experiments.
Presented by
Melody Huang <m.huang@ucla.edu>
Institution
UCLA


A Logical Model for Predicting Minority Representation: Application to Redistricting, Voting Rights Cases, and More

Yuki Atsusaka

Abstract
The first half of this poster is based on my article, available at https://t.co/58qyzraJwe?amp=1.

Understanding when and why minority candidates emerge and win in particular districts has critical implications for redistricting and the Voting Rights Act. I introduce a quantitatively predictive logical model of minority candidate emergence and electoral success --- a mathematical formula based on deductive logic that can explain and accurately predict the probability with which minority candidates run for office and win in given districts. I show that the logical model can predict about 90% of minority candidate emergence and 95% of electoral success, leveraging unique data on mayoral elections in Louisiana from 1986 to 2016 and state legislative general elections in 36 states in 2012 and 2014. I demonstrate that the logical model can be used to answer many important questions about minority representation in redistricting and voting rights cases. All applications of the model can be easily implemented via open-source software: logical.

In this poster, I further present three extended applications of the model to detecting racially polarized voting, measuring the level of minority descriptive representation, and evaluating the mechanical effect of electoral systems on minority representation.
Presented by
Yuki Atsusaka <atsusaka@rice.edu>
Institution
Rice University

Footprints of Malfeasance: A Simulation-Based Approach to Measuring Corruption in Public Spending

Gustavo Guajardo

Abstract
For decades, political scientists have overwhelmingly relied on expert assessments or citizen perceptions to study corruption. Still, these measures have important limitations and fail to directly observe the levels of corruption within governments. This study proposes a novel approach to measuring corruption in public spending. I argue that corruption in public spending can be conceptualized as an unobserved latent variable, and that discrepancies between a government's budget and expenditure reports hide corrupt adjustments to the budget. I develop a simulation-based approach to uncovering corrupt adjustments, in which the observed discrepancies are contrasted with a simulated counterfactual of "clean" spending. To illustrate, I leverage data from the Mexican government's expense reports (2014-2020), which provide fine-grained tracing of the universe of changes to the official budget by government agency and type of expenditure. The proposed method enables a closer study of the black box of government activity and has the potential to shed light on how, when, and why public resources fall prey to malfeasance.
Presented by
Gustavo Guajardo
Institution
Rice University

Volatility in Party Support

Douglas Rivers, Stephanie A. Nail

Abstract
In this paper, we analyze volatility in support for U.S. political parties among demographic and geographic groups. Traditionally, partisan identities have been thought of as "psychological bonds" that are formed early on in life and are relatively unchanged over time (Campbell et al. 1960; Green, Palmquist, and Schickler 2002). In this view, an individual's partisan identity stems from in-group and out-group associations with social groups (Green, Palmquist, and Schickler 2002; Hetherington and Rudolph 2015; Green, Huber, and Washington 2010).

The traditional view of party alignments as relatively stable over long periods is to some degree contradicted by voting patterns over the past half century. Groups formerly identified with one party now support the other. States that were not long ago safely Democratic are now Republican, and vice versa. These trends extend well beyond the well-known movement of the South to Republicans and of racial minorities to Democrats. At the macro (or group) level, party support is fairly volatile.

At the same time, we see high levels of stability in individual-level partisanship and little party switching between elections. Split-ticket voting has declined, there are fewer swing voters, and only a few people switch between the parties. We analyze this apparent micro-macro discrepancy using multilevel models to estimate the magnitude and sources of change in the U.S. party system. More specifically, we estimate hierarchical models, using empirical Bayes to obtain group-level estimates, with ANES data from 1976-2020 and CPS data from the same period.
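
A minimal sketch of the multilevel specification described above, assuming lme4 and a long data frame with one row per group-year (all variable names are illustrative):

    library(lme4)
    # Group-level Democratic support as a function of time, with
    # group-specific intercepts and trends.
    fit <- lmer(dem_support ~ year_c + (1 + year_c | group), data = panel)
    # ranef() returns the empirical-Bayes (shrunken) deviations for
    # each demographic or geographic group.
    eb <- ranef(fit)$group
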
Presented by
Stephanie Nail <sanail@stanford.edu>
Institution
Stanford University

When Do Voter Files Accurately Measure Turnout? How Transitory Voter File Snapshots Impact Research And Representation

Seo-young Silvia Kim

Abstract
Voter files are an essential tool for both election research and campaigns, but relatively little work has established best practices for using these data. We focus on how the timing of voter file snapshots affects the most commonly cited advantage of voter file data: accurate measures of who votes. Outlining the panel structure inherent in voter file data, we demonstrate that opposing patterns of accretion and attrition in the voter registration list result in temporally-dependent bias in estimates of voter turnout for a given election. This bias impacts samples for surveys, experiments, or campaign activities by skewing estimates of the potential and actual voter populations; racial/ethnic minorities and other low turnout groups are particularly impacted. We provide a sensitivity analysis approach that allows researchers to measure the impact of this bias on their inferences. We then outline methods that measurably reduce this bias, including combining multiple snapshots or using commercial files that preserve the turnout histories of dropped voters.
Presented by
Seo-young Silvia Kim
Institution
American University


Discourse and Policy: Using text as data to capture elite polarization in speech

Cybele Kappos

Abstract
My research uses text analysis to understand the effect of political polarization among policymakers on the passage of new legislation. The questions I seek to answer are: 1) is text analysis a good approach to understanding polarization? 2) what is the effect of polarization on the EU's ability to pass new legislation? Traditionally, political scientists have used roll-call voting to scale political actors, but this method examines the final outcome of deliberation rather than the process of deliberation itself, which is discourse and debate. Analyzing political speech allows us to develop a more nuanced understanding of a political actor's policy preferences. My project makes use of a collection of around 17,000 speeches by political elites in the EU. I analyze this corpus using Wordfish, an approach developed by Slapin and Proksch. By focusing on three different policy areas, I capture heterogeneous effects of polarization. The method assumes that word frequencies follow a Poisson distribution, with the parameter λ determined by the actor's position on the left-right dimension after controlling for fixed effects. I find evidence in support of the hypothesis that polarization reduces the amount of legislation passed. Moreover, I show that text analysis is a promising methodological approach to the study of polarization.
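
For reference, the Wordfish model referenced here takes the standard form

    $y_{ij} \sim \mathrm{Poisson}(\lambda_{ij}), \qquad \log \lambda_{ij} = \alpha_i + \psi_j + \beta_j \omega_i,$

where $\alpha_i$ is an actor (document) fixed effect, $\psi_j$ a word fixed effect, $\beta_j$ the word's discrimination weight, and $\omega_i$ the actor's position on the left-right dimension.
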
Presented by
Cybele Kappos
Institution
University of California, Los Angeles

Embedded Lexica: Extracting Topical Dictionaries from Unlabeled Corpora using Word Embeddings

Patrick Chester

Abstract
The rise of the internet, social media, and the digitization of archives have led to an accumulation of untold quantities of unlabeled text data of relevance to the social sciences. Efficiently extracting information from these corpora frequently involves applying topical dictionaries to identify tweets, news articles, or other documents of interest to researchers. However, human-coded dictionaries are too costly to generate to be practical for specific information extraction tasks. Moreover, existing approaches to dictionary construction, such as supervised machine learning and the semi-automatic WordNet, require many user-provided seed words to generate useful results and do not incorporate the contextual information of natural language. In this paper, I present a novel algorithm, conclust, that applies word embeddings to extract topically related dictionaries from unlabeled text using a small number of user-provided seed words and a fitted word embeddings model. Compared to existing methods of lexicon extraction, conclust requires few seed words, is computationally efficient, and takes word context into account. I describe the algorithm's properties and evaluate its ability to replicate word topics from the WordNet Domains database.
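
The core lookup step can be sketched in a few lines of R, assuming a fitted embedding matrix E with one row per vocabulary word (a generic nearest-neighbor sketch, not the conclust algorithm itself):

    # Expand seed words into a candidate lexicon by cosine similarity
    # in a fitted word-embedding matrix E (rownames = vocabulary).
    expand_seeds <- function(E, seeds, n = 50) {
      E_norm <- E / sqrt(rowSums(E^2))              # unit-normalize rows
      centroid <- colMeans(E_norm[seeds, , drop = FALSE])
      sims <- drop(E_norm %*% centroid)             # cosine similarity
      head(rownames(E)[order(sims, decreasing = TRUE)], n)
    }
    # expand_seeds(E, c("war", "battle", "troops"))
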
Presented by
Patrick Chester
Institution
New York University

Emotions and Flight from Violence: Evidence from Punjabi/English Video Archives

Aidan Milliff

Abstract
When people are exposed to violence, do their emotional experiences help explain what they will do in order to survive? Which emotions predict politically consequential strategies like fleeing violence? I analyze an under-used and widely available source of individual data about experiences of violence -- archives of videotaped oral histories -- to show that the propensity to flee violence is associated with experiencing fear and surprise. I apply a pre-trained computer vision classifier to label basic emotions in over 35,000 video frames from an archive of over 500 oral history videos from the 1984 Living History Project, an organization that collects stories from Sikhs who experienced civil war and pogrom violence in the Indian states of Punjab and Delhi in 1984. Patterns in the emotion labels show that experiencing fear and anger together is associated with conflict-induced migration and flight. I validate these findings with two complementary methods: measurement of appraisals (emotion proxies) in the oral history transcripts via a custom-tuned multilingual NLP model, and qualitative analysis of over 200 histories.
Presented by
Aidan Milliff <milliff@mit.edu>
Institution
Massachusetts Institute of Technology

Machine Assisted Coding for Event Data

Chase Bloch

Abstract
I introduce a method that avoids the concerns about the lower accuracy of machine coding, relative to hand coding, while retaining its efficiency benefits. The idea is to use a machine classification model to code only the observations for which it is most confident; the rest are then coded by humans. Applying this method to articles from the Militarized Interstate Disputes (MIDs) project, I machine-classify around 45% of observations into 5 categories with 95% accuracy. This has the potential to save research groups dozens of hours and thousands of dollars on hand coding while maintaining high accuracy. It can also make the coding of large datasets more accessible to researchers with limited budgets and/or time. The method makes two contributions to the literature on machine-human hybrid coding. First, it allows machines to do more coding, and more accurate coding, than previous approaches, enabling more efficient coding overall. Second, it introduces a system researchers can use to determine how much coding the machine should do, by manually setting the trade-off between the number of machine-coded stories and performance.
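
The triage rule itself is simple. A minimal sketch in R, assuming a fitted classifier that returns class probabilities (object and variable names are illustrative):

    # Machine-code only articles the model is confident about;
    # route the rest to human coders.
    probs <- predict(model, newdata = articles, type = "prob")  # n x 5
    conf  <- apply(probs, 1, max)
    threshold <- 0.95                # tune: machine coverage vs. accuracy
    labels <- colnames(probs)[max.col(probs)]
    machine_coded <- which(conf >= threshold)  # accept labels[machine_coded]
    human_queue   <- which(conf <  threshold)  # send to hand-coders
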
Presented by
Chase Bloch
Institution
Pennsylvania State University


Data Visualization for Difference-in-Differences

Juraj Medzihorsky

Abstract
We present data visualizations for diagnostic checks on difference-in-differences (DiD) techniques. This approach shows that even in the two-period model without covariates, there is information available to assess the performance of standard DiD approaches. In particular, we show (1) that adding distributional information on the outcomes to standard parallel trends plots can reveal situations where quantile-based changes-in-changes (CiC) approaches (Athey and Imbens 2006) are preferable to standard DiD, and (2) that plots of the distribution of imputed potential outcomes can be used to assess the robustness of findings for the ATT. In addition, (3) the ATT can be bounded by a support-adjusted estimate. Because time-invariant and group-invariant confounding are not equivalent assumptions on the quantile scale, (4) simultaneous presentation of CiC and reverse CiC allows a more robust check on DiD when neither assumption is preferred over the other. We illustrate these approaches with the classic analysis of the effect of raising the minimum wage in NJ (Card and Krueger 1994).
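
For reference, with outcomes $Y_{gt}$ for group $g$ and period $t$, the two estimands being compared are (a standard rendering, not specific to this poster):

    $\tau^{\mathrm{DiD}} = (E[Y_{11}] - E[Y_{10}]) - (E[Y_{01}] - E[Y_{00}]), \qquad \tau^{\mathrm{CiC}} = E[Y_{11}] - E\left[F_{Y_{01}}^{-1}\big(F_{Y_{00}}(Y_{10})\big)\right],$

so the two diverge exactly when the distributional shape of the outcome, not just its mean, carries information.
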
Presented by
Juraj Medzihorsky
Institution
London School of Economics and Political Science

Intersectionality and Machine Learning: Relaxing Improbable Independence Assumptions

Melina Much

Abstract
Using the lens of methodological pluralism, we investigate the compatibility of machine learning and intersectionality. Intersectional quantitative methods require the researcher to proactively specify the interwoven relationship between race, gender, and class, particularly as it relates to the political experience of multiply marginalized groups. Certain machine learning algorithms assume independence between predictors, which makes them unsuited to specifying and understanding intersectional group-based heterogeneity. This creates a tension between machine learning methods and intersectionality scholarship. In this piece, we further the literature on quantitative methods for the study of identity by proposing the utility of Bayesian classifiers and k-nearest neighbors with relaxed independence assumptions.
Presented by
Melina Much
Institution
UC Irvine

Subjective Neighborhood Identification and Analysis

Cory McCartan

Abstract
Partisan and racial sorting has emerged in recent years as an important geographic dimension in American politics, and a growing literature demonstrates the increasing sorting of the electorate (Bishop, 2009; Rodden, 2011; Abrams and Fiorina, 2012; Mummolo and Nall, 2017; Nall, 2018; Martin and Webster, 2018; Brown and Enos, 2020), even down to the neighborhood level. However, defining voters’ neighborhoods is a difficult task, as the term is necessarily subjective. We propose a generative Bayesian model for subjective neighborhood identification, based on Census block and voter characteristics, and apply it to neighborhoods collected as part of a survey of voters in the New York, Miami, and Phoenix metropolitan areas. Preliminary findings underscore the continued importance of local demographics in determining voters' perceptions of their neighborhood.
Presented by
Cory McCartan <cmccartan@g.harvard.edu>
Institution
Harvard University

Voted In, Standing Out: Public Response to Immigrants' Political Accession

Guy Grossman and Stephanie Zonszein

Abstract
Nativist policies and exclusionary attitudes are on the rise in Western democracies. At the same time, integrating ethnic minority immigrant communities into their host country's political life has become a challenge for these societies. In this context of nativism and low participation by ethnic minority immigrants, how does the host society react when ethnic minorities succeed at integrating into political institutions? Building on threat theory --- which links political power to hostility against marginalized groups --- we argue that when minority ethnic immigrants win political office, the native-born fear that immigrants pose a threat to their dominant position. This in turn triggers a hostile reaction from the public. We test these dynamics using hate crime police records, public opinion data, and text data from over 500,000 news articles from 350 national, regional, and local UK newspapers, covering the last four general elections. We identify the public's hostile reactions with a regression discontinuity design that leverages close election results between minority-immigrant and dominant-group candidates. Our findings suggest a public backlash against minority immigrants' integration into majority settings.
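
A minimal sketch of the close-election design in R, assuming the rdrobust package and a constituency-level data frame (variable names are illustrative):

    library(rdrobust)
    # Running variable: vote margin of the minority-immigrant candidate
    # over the dominant-group candidate (positive = immigrant wins).
    # Outcome: e.g., post-election hate crimes in the constituency.
    rd <- rdrobust(y = dat$hate_crimes, x = dat$immigrant_margin, c = 0)
    summary(rd)
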
Presented by
Stephanie Zonszein
Institution
University of Pennsylvania


Causal Inference With Bundled Treatments And Moderators

Zachary Markovich

Abstract
Bundled variables are ubiquitous in political science. Concepts like democracy, national power, or effective policy do not exist as uni-dimensional traits observed by the researcher -- instead the analyst is left to parse the influence of many different bundles of distinct but conceptually related traits. Conventional causal estimands are not defined in terms of such bundles. Current approaches to causal inference in such settings require discarding much of the variation in the complete bundle and, consequently, risk understating the magnitude of causal effects.

This poster fills this gap by providing a novel estimand explicitly defined in terms of such bundled variables. Specifically, I first require the researcher to specify a causal estimand, which I term the Causal Set Effect, defined in terms of different sets of bundles. For example, in the bundled-treatments case, this might take the form of the difference in average potential outcomes between units treated with an element of the first set of treatment bundles rather than the second. In the bundled-moderators case, on the other hand, the Causal Set Effect might be defined as the difference in average treatment effects between units that received a moderator bundle in the first set rather than the second. I propose focusing on the maximum of these Causal Set Effects -- an estimand I term the Maximum Causal Set Effect (MCSE). I also limit the sets of bundles considered to those which have at least probability q of occurring, where q is a researcher-specified constant, so that the MCSE is not dominated by unrepresentative edge cases.
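
One way to render this estimand in symbols (my notation, a schematic rather than the poster's formal definition): for pairs of bundle sets $S_1, S_2$, each occurring with probability at least $q$,

    $\mathrm{CSE}(S_1, S_2) = E[Y_i(b) \mid b \in S_1] - E[Y_i(b) \mid b \in S_2], \qquad \mathrm{MCSE}_q = \max_{S_1, S_2 :\, P(S_1), P(S_2) \ge q} \mathrm{CSE}(S_1, S_2).$
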

I propose two estimators for this novel quantity of interest. One is biased upward while the other is biased downward; together they bound the MCSE. I use this framework to analyze the effect of democratic political institutions on the likelihood of civil war onset and identify larger causal effects than past researchers have uncovered, speaking to the broad utility of this quantity of interest for applied researchers.
Presented by
Zachary Markovich
Institution
MIT

Causal Effect of Sending Mail-In Ballots to All Registered Voters on Voter Turnout and Composition

Yimeng Li

Abstract
The impact of universal mail-in voting on voter turnout and voter composition is not yet well understood. Since counties are important units in terms of election administration, existing work analyzing cross-county differences is unlikely to discover the true effect of universal mail-in voting. In this paper, we exploit a state law that requires election officials to send mail ballots to registered voters in some but not other congressional and legislative districts within a large county to estimate the causal effect.
Presented by
Yimeng Li
Institution
California Institute of Technology

On the reliability of published findings using the regression discontinuity design

Drew Stommes

Abstract
The regression discontinuity (RD) design has become a standard method in political science. Researchers use it because it offers identification of causal effects under relatively weak assumptions. But identification does not necessarily imply that the causal effects can be estimated with precision from limited data. In this paper, we highlight that estimation is particularly challenging with the RD design, with inherent concerns about statistical power and researcher discretion. To investigate whether these concerns manifest themselves in the empirical literature, we collect all RD-based findings published in top political science journals from 2009 to 2018. The distribution of published findings exhibits pathological features compatible with publication bias. We reanalyze all studies with available data using a standardized suite of modern estimation tools, but this does not resolve the pathologies. We find that most of these studies were underpowered to detect all but large effects. We argue that these power issues, combined with well-documented selection pressures in academic publishing, give reason for concern that many published findings using the RD design are exaggerated, if not entirely spurious.
Presented by
Drew Stommes
Institution
Yale University

Structural Causal Models and Factorial Experiments: Identification Problems and Applications in Social Sciences

Guilherme Jardim Duarte

Abstract
Despite their cost, randomized controlled trials (RCTs) are widely regarded as gold-standard evidence in disciplines ranging from social science to medicine. In recent decades, researchers have increasingly sought to reduce the resource burden of repeated RCTs with factorial designs that simultaneously test multiple hypotheses, e.g. conjoint experiments that evaluate the effects of many immigrant demographics or product characteristics on host-country or consumer choices, respectively. Treatments found to be effective in factorial RCTs are typically regarded as effective when applied to the experimental population. I prove that this common inference is not generally accurate without major functional-form assumptions. Without those assumptions, surprisingly, treatments with positive effects in factorial RCTs can even have negative effects when applied in isolation. Intuitively, this is because factorial RCTs inherently disrupt the complex ways that interventions interact with or causally affect one another when deployed alone in the wild, producing F-bias when extrapolating from the former setting to the latter. Such extrapolations are nevertheless widespread in applied research, informally suggesting that scholars find factorial RCTs to be at least somewhat useful. I formalize this intuition and show how extrapolation of effect sizes relies on strong, typically unstated assumptions about the data-generating process. Finally, I develop nonparametric sharp bounds---i.e., maximally informative best-/worst-case estimates consistent with limited RCT data---that show when extrapolations about effect sign are empirically justified. These new results are illustrated with applications to common field trials.
Presented by
Guilherme Jardim Duarte
Institution
Princeton University

Sensitivity Analysis for Sequential Outcome Tests

Elisha Cohen

Abstract
This project applies sequential outcome tests to measuring gender bias throughout the multi-stage process of becoming a U.S. House representative. Even as women's educational and professional achievements have changed and mass public opinion has shifted, we are still trying to understand why so few women are elected to Congress. Currently 27% of US House members are women, far below population parity. These low levels make us question whether the three stages of the electoral selection process -- 1) choosing to run, 2) winning the primary election, and 3) winning the general election -- are free from gender bias. Past work has generally applied a multivariate regression approach to individual stages in isolation, focusing on electoral outcomes. To correctly evaluate gender bias as a barrier to more women in Congress, these stages need to be studied as a sequential multi-stage process; a better methodological approach evaluates the bias directly in all three stages. Outcome tests are a method for evaluating bias in selection processes by comparing systematic differences in outcomes across groups. Using data on campaign finances, an important indicator of electoral success, and receipt of federal outlays, an important indicator of legislator success, combined with sequential outcome tests, I evaluate the amount of bias at each stage. Additionally, I use a covariate-adjusted sensitivity analysis to evaluate the bias at each stage when the selection-on-observables assumption is violated. In the first stage, at least 19% of Democratic men and 18.8% of Republican men would not have chosen to run had they been women, given an unmeasured confounder as strong as the indicator for open-seat elections. In the second stage, Democratic women continue to face bias: at least 21.8% of Democratic men would not have won their primary had they been women, given an unmeasured confounder as strong as the indicator for open-seat elections. And in the final stage, Republican women are heavily discriminated against: at least 31.3% of Republican men would not have won their general election had they been women, given an unmeasured confounder as strong as a covariate for being a state-capital district. These results suggest gender bias remains a large part of all stages of the electoral selection process into the US House.
Presented by
Elisha Cohen
Institution
Emory University


A direct approach to understanding electoral system changes

Samuel Baltz

Abstract
Presented by
Samuel Baltz <sbaltz@umich.edu>
Institution
University of Michigan

Reelection can Increase Legislative Cohesion: Evidence from Clientelistic Parties in Mexico

Lucia Motolinia

Abstract
It is often argued that when legislators have electoral incentives to cultivate personal votes, parties are less cohesive. This is because legislators have an alternative principal with whom they must build bonds of accountability: their voters. I offer a theory for why this will not always be the case. I posit that when parties control access to the resources candidates need to cultivate a personal vote, the introduction of personal vote-seeking incentives can increase party cohesion, not decrease it. It can do so because party leaders can condition a legislator's access to the resources they need to cultivate a personal vote on loyalty to the party's agenda. To test this theory, I turn to the case of Mexico, where a 2014 electoral reform introduced the possibility of reelection for state legislators. I estimate the ideological placement of Mexican state legislators by applying correspondence analysis to a new dataset of over half a million speeches in 20 states from 2012 to 2018. Leveraging the staggered implementation of the reform, I conduct a difference-in-differences analysis of its effects on intra-party cohesion. The results accord with the theory and have broad ramifications for work on personal vote-seeking, for Mexican politics, and for countries introducing personal vote-oriented electoral reforms.
Presented by
Lucia Motolinia
Institution
New York University

Studying Language Usage Evolution Using Pretrained and Non-Pretrained Embeddings

Patrick Y. Wu

Abstract
Changing language usage in politics indicates shifts in how specific issues are discussed and in what is politically salient at a given time. Natural language processing researchers have used distributed word embeddings to study the evolution of particular words over time. Typically, separate word embedding spaces are trained using the text from each period of interest. Distributed word embedding approaches offer an advantage over topic models because the researcher can examine how the usage of specific words evolves. However, the corpora political scientists usually work with are much smaller than the extensive corpora used in natural language processing research, and splitting up a corpus into even smaller period-specific corpora leads to poorly trained embeddings. This paper proposes a framework, based on the theory developed in Arora et al. (2018), that uses both pretrained and non-pretrained embeddings to learn time-specific word embeddings. I call this the pretrained-augmented embeddings (PAE) framework. In the first application, I apply the PAE framework to a corpus of New York Times text spanning several decades; it matches human judgments of how specific words evolve in their usage more closely than existing methods. In the second application, I apply the PAE framework to a corpus of tweets about masking published during the COVID-19 pandemic and show that it automatically detects discussions about specific events during the pandemic vis-a-vis the keyword of interest.
Presented by
Patrick Y. Wu
Institution
University of Michigan

Understanding Hong Kong Digital Nationalism: A Topic Network Approach

Justin Chun-ting Ho

Abstract
This paper studies the digital aspect of Hong Kong nationalism. Drawing on data from Facebook, it examines how elements of nationalist discourse were invoked by political actors to advance their agendas. Facebook is selected as the data source due to its key role in Hong Kong political communication. A novel mixed-method approach to studying political discourse is introduced. The analysis begins with the quantitative phase, in which topic modelling is used to identify recurring themes. To identify the key topics, this work introduces a novel method that generates a topic network and uses centrality measures from social network analysis to identify core topics in the topic model. Jointly, these methods serve to discern the overall pattern in the data and to select a meaningful subset for subsequent qualitative analysis. In the qualitative phase, selected texts are analysed discursively to identify the major frames present in the data. The findings reveal that Hong Kong nationalist discourse comprises three frames: the threat frame, which constructs the overarching narrative of a China threat; the identity frame, which engages with the debate on Hong Kong local and national identity; and the action frame, which discusses the actions to be taken in response to the threats.

#HongKong #Nationalism #SocialMedia #TextAsData
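
The network step can be sketched in R with igraph, assuming a symmetric topic-by-topic weight matrix A (e.g., from topic co-occurrence across posts; the construction details are the paper's own):

    library(igraph)
    g <- graph_from_adjacency_matrix(A, mode = "undirected",
                                     weighted = TRUE, diag = FALSE)
    # Core topics = most central nodes of the topic network.
    cent <- eigen_centrality(g)$vector   # or degree(g), betweenness(g)
    core_topics <- names(sort(cent, decreasing = TRUE))[1:5]
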
Presented by
Justin Chun-ting Ho <justin.chunting.ho@sciencespo.fr>
Institution
Sciences Po


Autoregressive Count Models that are Trivial to Estimate

Garrett Vande Kamp and Soren Jordan

Abstract
Presented by
Garrett Vande Kamp
Institution
University of Georgia

How Economic Outcomes Translate into Economic Evaluations

Jan Zilinsky

Abstract
Conventional wisdom suggests that many people have a poor grasp of the state of the economy. By contrast, I theorize that the average citizen perceives the state of the economy as a function of potentially complex combinations of objective economic conditions. I propose a perceptive-crowds hypothesis, and test it using an original approach without imposing expectations about which economic indicators are likely to be salient in citizens' minds, or pre-selecting the most readily available economic indicators. Agnostic models estimated for OECD countries and other large economies between 2002 and 2019 provide significantly more accurate predictions of economic evaluations than models typically employed in the literature.
Presented by
Jan Zilinsky <jz981@nyu.edu>
Institution
NYU

How Modeling Unpredictable Events can Improve Congressional Election Predictability

Daniel Ebanks, Jonathan N. Katz, Gary King

Abstract
We generalize the now-standard statistical approach to modeling district-level congressional election results, which uses systematically measured variables such as lagged vote, incumbency, and partisanship. Our new Gaussian mixture approach adds the prevalence of electoral surprises, even when they cannot be measured systematically -- factors such as unexpectedly strong challengers, political scandals, deaths, retirements, heresthetical maneuvers, etc. We show that allowing these surprises, usually recognized only in qualitative research, into the standard quantitative model improves our understanding of congressional elections, out-of-sample predictions of district election results, and the ability to discover where and when these surprises occur.
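
One schematic way to write such a two-component "surprise" mixture for the district vote share $v_d$ (my notation, not necessarily the paper's):

    $v_d \sim (1 - \pi)\,\mathcal{N}(x_d'\beta, \sigma_1^2) + \pi\,\mathcal{N}(x_d'\beta, \sigma_2^2), \qquad \sigma_2 \gg \sigma_1,$

where $x_d$ collects the systematically measured variables, the wide second component absorbs districts hit by unmeasured surprises, and $\pi$ is the prevalence of such surprises.
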
Presented by
Daniel Ebanks <debanks@caltech.edu>
Institution
California Institute of Technology, Harvard University

Detecting Coverage of Social Unrest on Telegram

Ishita Gopal

Abstract
Telegram has gained popularity as a mobilization platform due to its emphasis on privacy combined with a social-media-like experience, and it is widely used in authoritarian contexts. For example, media reports suggest that Telegram played an instrumental role in coordinating mass protests in Belarus. The platform has been targeted with harsh censorship by multiple governments -- for example, Iran, Russia, and Belarus -- further indicating its importance. But there is no clear understanding of what type of data exists on Telegram, and specifically which social unrest events are discussed and which are not. To shed light on this question, I conduct a cross-national study that maps contentious events recorded in ACLED to conversations on the platform and then tests event- and country-level variables that explain discussion of these events on Telegram. In this poster I present a preliminary analysis for Belarus.
Presented by
Ishita Gopal
Institution
The Pennsylvania State University


Measuring Candidate Ideology from Congressional Tweets and Websites

Michael Bailey

Abstract
Estimating ideology is an important task in the study of American politics. In this paper, we estimate the ideological locations of all candidates in the 2020 elections based on their social media and web content. Several challenges make this difficult. First, the mapping from term use to ideology differs across parties, creating an identification problem. Second, the large number of terms creates the potential for spurious associations between term use and ideology. Third, simultaneously estimating term- and individual-level parameters creates the potential for parameters to spiral in ways that inappropriately place undue weight on some terms. We address these challenges by pre-processing the data and using priors based on roll-call votes by incumbents. The process produces term parameters and ideal point estimates whose face validity compares favorably to other approaches.
Presented by
Michael Bailey
Institution
Georgetown University

Comparative Elite Networks in the Arab World

Omer Yalcin

Abstract
The news media provide an important source of information about political events. I study elite political networks in the Arab world using news article text from aljazeera.net, one of the best-known and oldest Arabic news sites. The fact that Arabic is the official language of all 22 members of the Arab League and that Aljazeera covers the entire Arab world provides an opportunity to study political networks across Arab countries. I use about 35,000 news articles published since 2016 that cover one of 20 countries in the Arab League (I exclude Comoros and Djibouti due to their size). Using the state-of-the-art Farasa NLP software for named-entity recognition, I map the co-occurrence network of influential persons for each country. The "divide-and-rule" model of politics holds that authoritarian rulers try to disrupt elite network formation, producing many smaller components rather than a giant connected component, in order to protect the regime from overthrow or democratization. I examine this theory across two dimensions: authoritarian vs. democratic countries and kingdoms vs. republics. Using Polity IV scores, I find support for the argument that the relative size of the largest connected component is smaller in the region's more authoritarian regimes. I also find even stronger support for the hypothesis that kingdoms have relatively smaller largest connected components than republics.
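
The key quantity, the relative size of the largest connected component, is straightforward to compute once the co-occurrence network is built (a sketch in R with igraph; the NER and edge-construction steps are omitted):

    library(igraph)
    # g: co-occurrence network of influential persons for one country.
    comp <- components(g)
    lcc_share <- max(comp$csize) / vcount(g)  # share of elites in the
                                              # largest connected component
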
Presented by
Omer Yalcin
Institution
Pennsylvania State University

Constructing the Migration Crisis: Automated Text Analysis of European Newspapers 2008-2022

Michelle Reddy

Abstract
This project analyses the discursive construction of a migration and asylum crisis in European media in the mid-2010s. In a first article, using a rich textual dataset extracted from six major French national newspapers and their online editions, we address three empirical questions: How is the crisis constructed as an event and sensationalised in the major French newspapers? How was the crisis polarised and politicised in the press over time? How is the crisis framed according to the nationalities or origins of immigrants and asylum seekers, leading to what we argue is a racialisation of the migration and asylum crisis? Our results first show that the press generated a highly polarised coverage of the crisis, opposing unwanted (irregular) migrants to deserving refugees, in the absence of any increase in immigration or asylum in France. Yet pre-existing media slant only partially explains this politicisation of migration and asylum. We find that certain migration-related events in Europe had a particularly polarising effect on media framing. Finally, our analysis reveals that the framing of the crisis reflects a racialisation of migration and asylum, meaning a differential treatment of migrants and asylum seekers across origins.
Presented by
Michelle Reddy
Institution
Sciences Po Paris

Text Semantics Capture Political and Economic Narratives

Elliott Ash, Germain Gauthier, Philine Widmer

Abstract
The standard computational representations of text used in social science lack information on the actions taken and the associated actors. Yet these semantic roles are the atoms from which spring narratives -- the stories in fiction, politics, and life that shape beliefs, actions, and government policies. In this paper, we provide a novel method for quantifying the latent narrative structures in text, with an application to the U.S. Congressional Record (1994-2015).
Presented by
Germain Gauthier
Institution
CREST, Polytechnique; ETH Zurich; Uni. St. Gallen


Is Depoliticized Propaganda Effective? -- Estimation of a Global Effect

Shiyao Liu

Abstract
Presented by
Shiyao Liu
Institution
Massachusetts Institute of Technology / New York University Abu Dhabi

Measurement and Inference using Satellite Data in Benin

Luke Sanford

Abstract
In this paper I show that satellite imagery can be used to improve estimates of the effects of programs or policies in settings where the location of the treatment is known and the outcome(s) of interest are observable in satellite imagery. We often observe only where and when an intervention occurs but lack an adequate control group for comparison. I combine remote sensing methods with a "double machine learning" strategy to account for confounders that appear (and may only appear) in the satellite record and that may contribute to both selection into treatment and the outcomes of interest. In non-experimental settings, this approach allows researchers to directly control for confounders that would otherwise have to be assumed away. I demonstrate the approach using both convolutional neural networks and random forests and show how both can take advantage of multi-spectral, high-frequency imagery. As a case study, I apply the technique to estimate the effects of land tenure formalization on landowner investment behavior in Benin, West Africa, and find that conventional techniques would overestimate the effects of land titles on conversion from forest to cropland.
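
The partialling-out logic can be sketched in R (a generic double-machine-learning sketch with random forests, not the paper's full pipeline; X stands for features extracted from pre-treatment imagery):

    library(randomForest)
    # Stage 1: predict treatment D and outcome Y from image features X.
    rf_d <- randomForest(x = X, y = D)
    rf_y <- randomForest(x = X, y = Y)
    # Stage 2: regress outcome residuals on treatment residuals; the
    # slope is the de-confounded effect estimate. (A full implementation
    # would add cross-fitting to avoid own-observation overfitting.)
    res_d <- D - predict(rf_d, X)
    res_y <- Y - predict(rf_y, X)
    coef(lm(res_y ~ res_d))["res_d"]
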
Presented by
Luke Sanford <luke.sanford@yale.edu>
Institution
Yale School of the Environment

Praise from Peers Promotes Empathetic Behavior

Adeline Lo, Jonathan Renshon, Lotem Bassan-Nygate

Abstract
Empathy is a powerful tool for shaping policy preferences, promoting cooperative behavior, and warming attitudes towards others. Yet engaging in empathy is costly, and existing interventions to encourage it are themselves expensive and time-consuming. Across five studies, we precisely estimate the magnitude of the costs of engaging in empathy, propose and test an intervention to encourage greater empathy, and trace the causal process through which our treatment works. We motivate and test a light-touch and scalable intervention based on "peer praise" to encourage empathetic behavior. Across our studies, we find that empathy is costly, that peer praise can encourage greater empathy, and that one way it operates is by boosting positive emotions. Supplementary analyses suggest further promise for the intervention, as we document its ability to work broadly across political ideology and race to encourage empathetic behavior.
Presented by
Adeline Lo
Institution
University of Wisconsin Madison

Building Loyalty through Personal Connections: Evidence from the Spanish Empire

Marcos Salgado

Abstract
The personal loyalties of high-ranking officials can help overcome or exacerbate agency problems. The Spanish Empire promoted links between colonial officials and their superiors in Spain and discouraged social ties between them and local elites. I use superiors’ entries and exits as within-official shocks to connections to estimate their effect on promotions and performance. I find that connected ministers were more likely to be promoted and raised more revenue. On the other hand, ministers with more links to local elites collected less revenue. These patterns are explained by personal connections, defined as sustained in-person interactions during their early careers. I also validate the connections measure by showing that they predict active friendships.
Presented by
Marcos Salgado <msal@stanford.edu>
Institution
Stanford University


Scoring Mass Protests In Repressive Settings

Kimberly Turner

Abstract
Canonical binary measures of protest success fail to capture the relative concessions demonstrators might extract from their regimes. I develop a 21-point scale that captures both the gains protests might achieve (in the form of regime concessions) and the costs they pay for those concessions (in the form of state reprisals). Using Mokken scale analysis, I construct success scores that pinpoint a protest's position along a unidimensional continuum from abject failure to transformative change in the body politic. The measure captures regime behavior in the form of 'ignoring', exhibits more accurate point estimates, identifies misclassified cases, and accounts for active state repression. I then use the new measure to evaluate potential interactions between protest features and political/social contexts in shaping protest success. Protest features, such as crowd size and diversity, are often cited as causal determinants of success, and yet most protests in repressive settings fail despite displaying these characteristics, indicating potential interaction effects. Using a moderated mediation model, I simultaneously estimate the mediation and moderation effects of protest features and contextual factors. I find evidence of a large and significant interaction effect that diminishes protest success. The application indicates that success scores offer more accurate estimation of the interactive relationships that reduce the efficacy of success determinants.
Presented by
Kimberly Turner
Institution
Southern Illinois University

Temporal Validity

Kevin Munger

Abstract
The "credibility revolution" has forced social scientists to confront the limits of our methods for creating general knowledge. The current approach aims to aggregate valid but local knowledge. "Temporal validity" is a form of external validity in which the target setting is in the future -- which, of course, is always the case. Positivist social science has until recently been hamstrung by other, more immediate threats to validity and inference, but I argue that the cutting edge of non-parametric statistical approaches to the problem of external validity lays bare the inability of these approaches to inform human decision-making in, or make predictions about, the future. Using a large database of 32,000 RCTs conducted by a media firm between 2013 and 2015, I simulate the process of social science knowledge production and demonstrate that at the current margin, temporal validity is a first-order problem.
Presented by
Kevin Munger <KEVINMUNGER@GMAIL.COM>
Institution
Penn State University

The Costs of Doing Research

Jane Sumner

Abstract
Although the costs associated with doing research are integral to the research process, they are rarely studied. Yet research always has costs, most researchers operate under resource constraints, and those constraints vary considerably across and within departments. Our point of departure for the two papers presented in this poster is the idea that there may be a disconnect between the sometimes-high costs of research and the resources provided to researchers to conduct it. To study this, we conducted two surveys: the first asked the authors of every article published in six journals over a three-year span about the costs of their research, and the second asked every R1 assistant professor in the US and a random sample of non-R1 assistant professors about their start-up packages. We find that the median article in our sample cost just under $700 and that 17% of the assistant professors in our sample could not afford to solo-author that article. Further, we find that certain characteristics of research are associated with higher costs, and we find some evidence that money can buy time. For start-up funds, we find strong evidence that university characteristics predict start-up packages, as do competing offers and the rank of an individual's Ph.D. alma mater.
Presented by
Jane Sumner
Institution
University of Minnesota, Twin Cities


Money Complicates Things: A Mixed-mode Finite Mixture Model of Political Donors

Jay Goodliffe

Abstract
Different types of donors have different motivations, but how can we identify those types? Using data from the 2020 CCES, I fit a mixed-mode finite mixture model to 10 donor targets (latent class analysis/binomial finite mixture model) and donation amount (latent profile analysis/Gaussian finite mixture model) to identify donor types. Applying quantitative measures of fit and substantive interpretability, I find more types than most mixture models do. Including donation amount complicates the model (and increases the computational burden) but yields more homogeneous types.
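
In symbols, a mixed-mode mixture of this kind plausibly takes the form (my notation, not necessarily the paper's exact specification):

    $f(y_i, a_i) = \sum_{k=1}^{K} \pi_k \left[ \prod_{j=1}^{10} p_{kj}^{y_{ij}} (1 - p_{kj})^{1 - y_{ij}} \right] \mathcal{N}(a_i; \mu_k, \sigma_k^2),$

where $y_{ij}$ indicates whether donor $i$ gave to target $j$, $a_i$ is the (possibly logged) donation amount, and $\pi_k$ is the population share of type $k$.
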
Presented by
Jay Goodliffe
Institution
Brigham Young University

Responsiveness in a Fragmented Local Politics

Bryant J. Moy

Abstract
Are local governments responsive given overlapping governing institutions? Current approaches to representation at this level may underestimate the extent of responsiveness because they fail to account for the overlapping nature of local governance. To fill this gap, I first implement a framework that takes into account multiple overlapping governing institutions: cities, counties, school districts, and special districts. Second, I estimate a novel measure of local preferences for cities over time. To assess the impact of ideology on public policy outcomes, I use a within-between random-effects model, which allows researchers to estimate dynamic and cross-sectional effects in a single model. I have three major findings. First, cross-sectional responsiveness exists. Second, I find mixed evidence for dynamic responsiveness. Lastly, I provide suggestive evidence that consolidated governance fosters greater responsiveness. In all, I reframe the responsiveness discussion from a single governing unit to a holistic system of overlapping institutions.
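
A minimal sketch of the within-between specification in R, assuming lme4 and a city-year panel (variable names are illustrative):

    library(lme4)
    # Split city ideology into a time-invariant between part (city mean)
    # and a time-varying within part (deviation from that mean).
    dat$ideo_between <- ave(dat$ideology, dat$city)
    dat$ideo_within  <- dat$ideology - dat$ideo_between
    fit <- lmer(policy ~ ideo_within + ideo_between + (1 | city), data = dat)
    # ideo_within  -> dynamic responsiveness
    # ideo_between -> cross-sectional responsiveness
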
Presented by
Bryant J. Moy
Institution
Washington University in St. Louis

Legislative Support for Environmental Policy Innovation: An Experimental Test for Diffusion through a Cross-State Policy Network

Ishita Gopal and Bruce Desmarais

Abstract
In this registered report, we describe a field experiment designed to provide evidence on the causal micro-foundations of public policy diffusion across the U.S. states. The aim of our study is to test whether and how cross-state connections among legislators serve as a vector through which support for policies diffuses. We construct a novel cross-state legislative network dataset in which two legislators are connected through co-signing environmental policy statements organized by the National Caucus of Environmental Legislators. We propose to survey legislators' support for policies proposed in other states and to randomize the degree of information included in the policy description regarding support by other legislators in the network. Our study is situated to contribute to our understanding of state legislative politics, policy networks, and interest group politics. We focus on environmental policy due to the inherently nationalized consequences of state and local policy innovations.
Presented by
Bruce Desmarais
Institution
Pennsylvania State University

Let them eat pie: addressing the partial contestation problem in multiparty electoral contests

Ali Kagalwala, Thiago Moreira, Guy Whitten

Abstract
Although most scholars acknowledge that district-level election results in multiparty systems are compositional variables, few model them as such. Instead, they tend to model the vote shares of single parties across districts because of partial contestation: situations in which not all parties field candidates in every electoral district. In this project, we argue that partial contestation is an example of a selection process and evaluate potential solutions to the problem. Using Monte Carlo simulations, we show that the conventional approach, as well as easy fixes that slightly alter the data, yields biased estimates when patterns of contestation follow a selection process. The two-stage modeling strategy for compositional data that we propose, in turn, recovers unbiased estimates.
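
A stylized sketch of the two-stage logic under strong simplifications (three parties; party C always contests; selection operates only on party A): stage one models contestation with a probit, and stage two runs a log-ratio compositional regression with a Heckman-style correction term. This illustrates the idea, not the authors' estimator.

    # Stylized two-stage sketch: selection into contestation, then a
    # log-ratio (compositional) regression among contested districts.
    # Assumes a district-level data frame d with shares sA, sB, sC, a
    # contestation dummy contestA, an outcome covariate x, and a
    # selection variable z (all names illustrative).
    sel <- glm(contestA ~ z + x, family = binomial(link = "probit"), data = d)
    d$imr <- dnorm(predict(sel)) / pnorm(predict(sel))  # inverse Mills ratio
    dc   <- subset(d, contestA == 1)                    # contested districts
    fitA <- lm(log(sA / sC) ~ x + imr, data = dc)       # log-ratio outcome
    fitB <- lm(log(sB / sC) ~ x + imr, data = dc)
    summary(fitA)
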
Presented by
Thiago Moreira
Institution
Texas A&M

The Importance of Dyadic Representation: Evidence from America's Opioid Epidemic

Rachel Porter

Abstract
Since the late 1990s, nearly one million Americans have died of an opioid-related drug overdose, and deaths have only continued to surge since the Covid-19 pandemic took hold of the United States. Just this week, the Centers for Disease Control and Prevention (CDC) reported that a record 93,300 Americans died of an opioid overdose in 2020. In this project, I employ an original collection of text data from campaign websites to assess congressional candidates’ proposed solutions to the opioid epidemic. I pair new data on campaign platform text with the keyword-assisted topic model (keyATM) of Eshima, Imai, and Sasaki (2020) to investigate the content of candidates’ discussions of opioid-related issues. I find that partisanship and geographic context play a key role in shaping candidates’ proposed solutions to the opioid epidemic. On their campaign websites, Democrats and Republicans tended to use their discussions of the opioid epidemic as a conduit to tout other “party-owned” issues. Conversely, I show that candidates hailing from states with higher rates of opioid-related deaths were much more likely to focus their opioid-related text on topics like alternative pain management, community-based recovery, and the adoption of drug courts, the kinds of solutions promoted by entities like the CDC, AMA, and HHS. My results demonstrate that elites are responsive to the needs of their constituents and, additionally, highlight the vital nature of dyadic representation. Normatively, these findings signal a lack of collective advocacy for the kinds of solutions best suited to combating the opioid epidemic and may point to future struggles in fighting this public health crisis.
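
A minimal keyATM sketch of the modeling step, assuming the campaign-website texts already sit in a quanteda document-feature matrix named campaign_dfm; the keyword lists below are invented for illustration, not the paper's:

    # Keyword-assisted topic model sketch with the keyATM package.
    library(keyATM)
    docs <- keyATM_read(texts = campaign_dfm)   # campaign_dfm: quanteda dfm
    keywords <- list(                           # illustrative seed words
      treatment   = c("treatment", "recovery", "rehab"),
      enforcement = c("trafficking", "border", "crime"),
      pain        = c("pain", "prescription", "alternative")
    )
    out <- keyATM(docs = docs, model = "base",
                  no_keyword_topics = 5, keywords = keywords)
    top_words(out)  # inspect keyword and no-keyword topics
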
Presented by
Rachel Porter
Institution
University of North Carolina at Chapel Hill

Back to top

Family ties: Fetching Political Dynasties’ Names Using Text as Data

Marcus Vinícius de Sá Torres

Abstract
Presented by
Marcus Vinícius de Sá Torres
Institution
Universidade Federal de Pernambuco

Legislators' Sentiment Analysis Supervised by Legislators

Akitaka Matsuo and Kentaro Fukumoto

Abstract
The sentiment expressed in legislators’ speech is informative, particularly in a legislature with partisan discipline. But extracting legislators’ sentiment requires polarity dictionaries or labeled data, which are labor-intensive to construct and can be subjective. To address this challenge, we propose a research design that exploits closing debates on a bill, where legislators themselves label their speeches as pro or con. We apply our method to the corpus of all speeches in the Japanese national legislature, 1955-2014. After establishing the face validity of our sentiment scores, we show that, to a moderate degree, government backbenchers and opposition members become more polarized as the next election approaches, although both sides come together towards the end of a legislative session.
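
A toy sketch of the design's core step on invented English snippets (the paper uses Japanese Diet speeches): closing-debate speeches arrive pre-labeled pro or con, a regularized classifier is fit to their document-term matrix, and the fitted model can then score unlabeled speeches.

    # Train on self-labeled closing-debate speeches, then score others.
    library(glmnet)
    speeches <- c("we support this bill", "we strongly oppose this bill",
                  "the bill deserves support", "we cannot accept this bill")
    labels <- factor(c("pro", "con", "pro", "con"))  # self-labels
    words  <- strsplit(speeches, " ")
    vocab  <- unique(unlist(words))
    dtm <- t(sapply(words, function(w) table(factor(w, levels = vocab))))
    fit <- glmnet(dtm, labels, family = "binomial", lambda = 0.01)
    predict(fit, dtm, type = "response")  # in practice: unlabeled speeches
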
Presented by
Kentaro Fukumoto
Institution
University of Essex and Gakushuin University

Measuring the Influence of Individual Bureaucrats With Historical Documents

Clara Suong

Abstract
How do individual bureaucrats affect foreign policy outcomes? In this paper, I introduce new measures of individual bureaucrats’ influence on policymaking. Specifically, I measure the different types of influence U.S. ambassadors have on foreign policy outcomes using the Foreign Relations of the United States corpus. These measures disaggregate ambassadorial influence by each ambassador's diplomatic experience, expertise, the advice they provide, and the authority delegated to them by leaders. Using these measures, I show a causal relationship between ambassadorial influence and changes in foreign policy outcomes, finding evidence of significant ambassadorial influence on military aid directed to, and economic sanctions imposed on, host countries.
Presented by
Clara Suong
Institution
Virginia Tech

Sensitivity Analysis to Sample Selection Bias

Oliver Rittmann

Abstract
Social scientists often wish to estimate causal effects from data that suffer from nonrandom sample selection. A well-established finding of previous research is that this may lead to biased estimates if not accounted for. Available methods to account for endogenous sample selection, such as the Heckman selection model, only work under fortunate circumstances: researchers must be able to explicitly model the sample selection mechanism with variables that do not occur in the outcome equation. For many reasons, this is often not possible. Researchers may not know the selection mechanism, they may not observe the variables necessary to model it, or the variables that affect sample selection may also confound the causal effect in the outcome equation.

This article extends existing approaches to sensitivity analysis to enable researchers to assess how violations of the random sampling assumption alter their estimates. The approach combines ideas from sensitivity analysis for omitted variable bias with the Heckman selection model. In the spirit of Heckman, sample selection bias is treated as a special form of omitted variable bias. As a consequence, tools for assessing the sensitivity of estimates to violations of the no-omitted-variables assumption can be adapted to assess the severity of potential sample selection bias. The approach equips researchers with a tool to make educated and concrete statements about how robust their causal estimates are to violations of the random sampling assumption.

Graphical tools facilitate convenient communication of estimation sensitivity to sample selection bias. To demonstrate the approach, I present applications with simulated and empirical data.

MATHEMATICAL APPENDIX: https://www.dropbox.com/s/5r5tntakm3nh9zj/math_appendix.pdf?dl=0
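
One way to make the core idea concrete in a sketch: under the Heckman setup, E[y | x, selected] = x*beta + rho*sigma*lambda(z*gamma), so the unknown rho*sigma can be treated as a sensitivity parameter; subtracting an assumed rho*sigma times the inverse Mills ratio from the observed outcomes and refitting traces out how the estimate moves as the assumed selection correlation grows. Simulated data; this is not the article's exact procedure.

    # Sensitivity-to-selection sketch: vary the assumed rho*sigma on the
    # inverse Mills ratio and trace the coefficient of interest.
    set.seed(7)
    n <- 2000
    z <- rnorm(n)
    x <- 0.5 * z + rnorm(n)                    # correlated with selection
    s <- rbinom(n, 1, pnorm(0.8 * z))          # selection indicator
    y <- 1 + 0.5 * x + rnorm(n)                # true effect of x is 0.5
    probit <- glm(s ~ z, family = binomial(link = "probit"))
    lambda <- dnorm(predict(probit)) / pnorm(predict(probit))
    grid <- seq(-1, 1, by = 0.25)              # hypothetical rho*sigma values
    sens <- sapply(grid, function(rs)
      coef(lm(I(y - rs * lambda) ~ x, subset = s == 1))["x"])
    round(setNames(sens, grid), 3)             # estimate under each assumption
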
Presented by
Oliver Rittmann
Institution
University of Mannheim

The Incumbent-Challenger and the Incumbent-Runner-up Advantage: Regression Discontinuity Estimation and Bounds

Leandro De Magalhaes

Abstract
The causal effect of becoming the incumbent versus being the runner-up can be estimated straightforwardly with regression discontinuity (RD). The effect of being the incumbent versus being the challenger can be approximated with an RD that compares winners and runners-up under certain assumptions. For example, one must impute the counterfactual success rate of runners-up who are compliers (those who rerun only if they become the incumbent). We propose bounds for both estimands and show that the upper bounds for the two concepts are equivalent. This allows us to perform cross-country comparisons and to provide correlates of the incumbency advantage.
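
A minimal local-linear RD sketch for the winner-versus-runner-up comparison, on simulated candidate-level data with a hand-picked bandwidth; packages such as rdrobust provide data-driven bandwidths and robust inference.

    # Local-linear RD: margin of victory as the running variable.
    set.seed(1)
    n <- 5000
    margin  <- runif(n, -0.5, 0.5)             # margin vs. the runner-up
    win     <- as.numeric(margin > 0)
    success <- 0.2 + 0.1 * win + 0.3 * margin + rnorm(n, 0, 0.2)
    d <- data.frame(success, win, margin)
    h <- 0.1                                   # bandwidth (assumption)
    rd <- lm(success ~ win * margin, data = subset(d, abs(margin) < h))
    coef(rd)["win"]                            # jump at the threshold (~0.1)
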
Presented by
Leandro De Magalhaes
Institution
University of Bristol

Back to top

A Subnational Measure of Corruption in China Using News Reports

Rosemary Pang

Abstract
Most existing measures of corruption are at the national level, while subnational measures remain underdeveloped. This paper develops a subnational measure of corruption in China using news reports from 2010 to 2015. After identifying the names of corrupt officials, their positions, the amount of money involved, and the province where the corruption occurred using a natural language processing approach, the paper builds a latent corruption score for each province using a continuous response model. Compared to previous subnational measures of corruption in China, this measure provides greater within-province variation. It also incorporates detailed information about the identity of corrupt officials, which may influence citizens’ perceptions of corruption and further shape their satisfaction with local government. The new measure contributes to multiple corruption-related research domains, including how corruption influences economic development, how social movements form, and how corruption diffuses within a country. While this project focuses on China and covers only a short period, the design is generalizable and scalable and could be extended relatively easily to other settings and time periods.
Presented by
Rosemary Pang
Institution
Pennsylvania State University

Click, click boom: Using Wikipedia metadata to predict changes in battle-related deaths

Christian Oswald, Daniel Ohrenhofer

Abstract
Data and methods development are key to improving our ability to forecast conflict. Relatively new data sources, such as mobile phone and social media data or images, have received widespread attention in conflict research recently. Such data often do not cover substantial parts of the globe, or they are difficult to obtain and manipulate, which makes regular updating challenging. These sometimes vast amounts of data can also be computationally and financially costly. The data source we propose instead is cheap, readily and openly available, updated in real time, and global in coverage: Wikipedia. We argue that the number of country page views can be regarded as a measure of increased interest or salience, whereas the number of page changes can be regarded as a measure of controversy between competing political views. We expect these predictors to be particularly successful in capturing tensions before a conflict escalates or when a period of peace is followed by renewed violence, for instance electoral violence. Predicting fatalities after calm periods is particularly challenging because past violence is then not a suitable predictor. We test our argument by predicting changes in battle-related deaths in Africa at the country-month level. We find evidence that country page views increase predictive performance while page changes do not. Contrary to our expectation, our model seems to capture long-term trends better than sharp short-term changes.
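
A minimal sketch of pulling the raw predictor from the public Wikimedia pageviews REST API; the endpoint pattern is the documented one, while the article and period are arbitrary examples:

    # Monthly page views for one country article (Wikimedia REST API).
    library(jsonlite)
    url <- paste0("https://wikimedia.org/api/rest_v1/metrics/pageviews/",
                  "per-article/en.wikipedia/all-access/all-agents/",
                  "Nigeria/monthly/2015070100/2015123100")
    views <- fromJSON(url)$items
    views[, c("timestamp", "views")]           # country-month series
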
Presented by
Christian Oswald
Institution
Trinity College Dublin

Computational Game Theory to Study Empirical Elections

Fabricio Vasselai

Abstract
We propose a novel, flexible way of simulating electoral results that accounts for strategic voting and strategic abstention. Our technique draws on a subfield of artificial intelligence known as multi-agent systems as a framework for translating canonical game-theoretic models of voting into computer simulations. Specifically, we implement the SMD (Myerson and Weber, 1993), SNTV (Cox, 1994), and run-off (Bouton, 2013) models as iterative discrete-time algorithms, extending them to include strategic abstention, following ideas in Palfrey and Rosenthal (1985) and Demichelis and Dhillon (2010), as well as sincere voters. We then generalize Myerson’s (1998) two-candidate Poisson pivotal probabilities to multiple candidates with multi-way ties and to SNTV and run-off elections, and we propose heuristics for these probability calculations. Finally, we illustrate how the simulations can be used to study actual empirical elections.
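
A toy sketch of the iterative, discrete-time logic, with a crude viability heuristic standing in for the Poisson pivotal probabilities derived in the paper:

    # Iterative strategic voting: voters weight candidate utilities by a
    # viability heuristic from current poll shares and re-vote until the
    # shares stabilize (stand-in for pivotal-probability calculations).
    set.seed(3)
    n <- 1000; k <- 3
    U <- matrix(runif(n * k), n, k)            # voter-by-candidate utilities
    shares <- rep(1 / k, k)
    for (it in 1:100) {
      viability <- shares / max(shares)        # heuristic pivot weight
      votes <- max.col(U * rep(viability, each = n))
      new_shares <- tabulate(votes, k) / n
      if (max(abs(new_shares - shares)) < 1e-8) break
      shares <- new_shares
    }
    shares                                     # Duverger-style concentration
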
Presented by
Fabricio Vasselai
Institution
University of Michigan

A graph-theoretic approach to causal inference under interference

David Puelz

Abstract
Interference exists when a unit’s outcome depends on another unit’s treatment assignment. For example, intensive policing on one street could have a spillover effect on neighboring streets. Classical randomization tests typically break down in this setting because many null hypotheses of interest are no longer sharp under interference. A promising alternative is to instead construct a conditional randomization test on a subset of units and assignments for which a given null hypothesis is sharp. Finding these subsets is challenging, however, and existing methods are limited to special cases or have limited power. In this paper, we propose valid and easy-to-implement randomization tests for a general class of null hypotheses under arbitrary interference between units. Our key idea is to represent the hypothesis of interest as a bipartite graph between units and assignments, and to find an appropriate biclique of this graph. Importantly, the null hypothesis is sharp within this biclique, enabling conditional randomization-based tests. We also connect the size of the biclique to statistical power. Moreover, we can apply off-the-shelf graph clustering methods to find such bicliques efficiently and at scale. We illustrate our approach in settings with clustered interference and show advantages over methods designed specifically for that setting. We then apply our method to a large-scale policing experiment in Medellín, Colombia, where interference has a spatial structure.
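
A schematic sketch of the conditional test once a biclique is in hand: within a biclique (a set of units crossed with a set of assignments on which the null is sharp), the focal units' outcomes are fixed across those assignments, so the randomization distribution can be computed directly. Data, the exposure mapping, and the biclique itself are placeholders here; finding the biclique is the paper's contribution.

    # Conditional randomization test within a given biclique (schematic).
    # y: outcomes of the biclique's units (fixed under the sharp null);
    # A: matrix whose columns are the biclique's assignment vectors;
    # a_obs: column index of the realized assignment.
    set.seed(9)
    y <- rnorm(20)
    A <- replicate(500, rbinom(20, 1, 0.5))    # placeholder assignments
    a_obs <- 1
    exposure <- function(a) a                  # placeholder: direct treatment
    stat <- function(a) abs(cor(exposure(a), y))
    null_dist <- apply(A, 2, stat)
    mean(null_dist >= stat(A[, a_obs]))        # conditional p-value
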
Presented by
David Puelz
Institution
University of Chicago