Data Equity: Six steps beyond data disaggregation
We’ve just wrapped up Women’s History Month. This year’s celebration stood out as having an exciting focus on data. Whether taking stock of “what we know and don’t know about women” globally or asking why sex-disaggregated data aren’t sexy—data were front and center.
When the pursuit of equity touches on data—for women or other groups facing structural barriers to opportunity—there is usually a big focus on disaggregation. Disaggregation means breaking down information into smaller sub-populations, for example by age or gender. This is a natural instinct since we need to see people more clearly in the data to understand their conditions and address their needs. We see disaggregation prioritized in the links above, in calls for data to pave the way for a more equitable recovery from COVID-19, and as a way to advance racial justice. It is central to the Biden-Harris administration’s Executive Order On Advancing Racial Equity, which reads “Many Federal datasets are not disaggregated by race, ethnicity, gender, disability, income, veteran status, or other key demographic variables. This lack of data has cascading effects and impedes efforts to measure and advance equity.” Yet data disaggregation is just one of many aspects of data equity.
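In practice, disaggregation is often a simple group-by operation. A minimal sketch with pandas, using made-up illustrative records (the column names and values here are hypothetical, not from any real dataset), shows how an overall average can mask a gap that subgroup breakdowns reveal:

```python
import pandas as pd

# Hypothetical survey records, purely for illustration.
df = pd.DataFrame({
    "gender": ["woman", "man", "woman", "man", "woman", "man"],
    "age_group": ["18-29", "18-29", "30-44", "30-44", "45+", "45+"],
    "has_internet_access": [1, 1, 0, 1, 0, 1],
})

# The aggregate rate alone hides any disparity.
overall_rate = df["has_internet_access"].mean()

# Disaggregating by gender surfaces the gap.
by_gender = df.groupby("gender")["has_internet_access"].mean()

# Disaggregating by gender AND age moves toward an intersectional view,
# though cell sizes shrink quickly in real data.
by_gender_age = df.groupby(["gender", "age_group"])["has_internet_access"].mean()

print(f"Overall: {overall_rate:.0%}")
print(by_gender)
```

In this toy data the overall rate looks moderate, while the gender breakdown shows one group far behind the other; that gap is exactly what un-disaggregated reporting erases.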
In the spirit of this year’s International Women’s Day theme #ChooseToChallenge, I challenged myself to think more expansively about data equity. Once I recognized the challenge, it felt like answers were all around me[1]. Here I share six aspects of data equity beyond disaggregation that apply to gender equity, to racial equity, and to any effort to see and support a community that struggles for justice.
- Voice. Affected communities should have a say in what data are collected, how data are used, and how their own identities and experiences are captured. In A Human Rights-Based Approach to Data, UN Human Rights advocates for “involving ethnic minorities and marginalized groups in all aspects of data collection activities” (h/t SDSN TReNDS). The data-driven COVID-19 response has inspired new structures and movements to expand the voices involved in decision-making around data use. New York City experimented with a Data Assembly to “solicit diverse, actionable public input on data re-use for crisis response.” With leadership from Open Institute and Amnesty Kenya, civil society groups in Africa have come together around the #RestoreDataRights campaign that demands that “the public is consulted on how sensitive data (e.g. location data derived from mobile phones) are shared and used to tackle COVID-19.”
- Asset-framing. Inspired by a workshop with Trabian Shorters on asset-framing, I have been looking at data with new eyes. He reminds us that, in an effort to describe or gain support for the communities we serve, we often lean on data that highlight their deficits and challenges. We see this in describing the challenges of women and girls around the world. I see it in the work of local officials in my hometown of San Francisco, two of whom recently released a report entitled Investment of Funds to Support the Black Community in San Francisco. The report highlights “recommendations, research and data raised through the community input process to prioritize resources to the Black community.” But every single data point is about a deficit or a struggle—school dropout rates and suspensions, low incomes, high mortality rates, incarcerations, etc. Public officials don’t fall into this deficit-framing trap because they are bad, but because the trap is so easy to fall into. It is the norm. Shorters argues that deficit-framing perpetuates stigma and narrow narratives about people in the very effort to support them. He encourages us to use data that speak to people’s aspirations and assets. For example, did you know that Black Americans are much more likely to serve the nation in military and civilian roles, making them stand-outs for patriotism? And that Black Americans have higher rates of charitable giving and volunteerism than other groups? Data equity requires us to use data that expand narratives and reduce stigma about communities facing injustice.
- Structures vs. individuals. I recently had the good fortune of attending a virtual session with Kimberlé Crenshaw, lawyer, civil rights advocate, philosopher, and a leading scholar of critical race theory who developed the theory of intersectionality. Crenshaw reminds us that structural racism often gets individualized, putting both blame and the burden of improvement on individuals rather than on the racist structures holding them down. There are parallels in the “focus on women and girls” in global development. It is hard to deny the importance of empowering women and girls, yet we cannot let the focus on individuals distract us from addressing the sexist policies and structures that generations of individual women and girls face no matter how empowered they are. As we get better at seeing individuals in data, we must ensure we don’t place all the weight of their liberation on them as individuals.
- Governance. Data governance goes beyond laws, regulations, and headline-grabbing data breaches. It includes ongoing, behind-the-scenes decision-making about who is allowed to access and use data, and for what purpose. As Catherine D’Ignazio and Lauren F. Klein argue in this interview about their book Data Feminism, being seen in the data is liberating if you are being served, and threatening if you are being persecuted. They call this the “paradox of exposure.” To manage this paradox in an equitable way, we need transparent, representative mechanisms for making decisions about data use. Teki Akuetteh Falconer, executive director of Africa Digital Rights Hub, gave a great example of this during a UN World Data Forum session, Balancing data use and data protection – Learning from African Experiences. She described a collaboration among the Ghana Statistical Service, Vodafone Ghana, and Flowminder to use call detail records to understand mobility patterns in the context of COVID-19 restrictions. While the overall project operated within the bounds of the European General Data Protection Regulation, Ghanaian data protection laws, and detailed legal agreements, none of these guided the granular decisions about who could use the data, and for what. Faced with increasing requests for access to the data, a steering committee of governmental and non-governmental actors came together to decide how the data would be shared and used, based on mutually agreed-upon criteria.
- Combating bias. In her stellar piece 5 Steps to Take as an Antiracist Data Scientist, Emily Hadley argues that to address bias in data, we first need to acknowledge that bias exists and is causing real harm. In criminal justice, health, facial recognition, and many other domains, she reminds us, “data and algorithms have been used to perpetuate racism and racist societal structures.” This is what motivates the #NoMoreDataWeapons campaign launched by Data 4 Black Lives and other efforts to sound the alarm about algorithmic bias. It can feel daunting to weed out bias that has such deep roots. As one step, the We All Count Data Equity Framework offers guidance on detecting and countering bias at seven junctures common to data projects, from who funds data collection through to how data and derivative insights are communicated.
- Humility. In the U.S. and around the globe, communities that face structural barriers to opportunity have heard breathless optimism about how [name your favorite solution] will address their woes. In using data to advance equity, we mustn’t fall into the same trap. Data aren’t a panacea. I am inspired by WeRobotics’ blog post “So You’re Thinking of Using Drones in Response to COVID-19? Why?” You can put [insert your favorite tech or data solution] into their framework to consider whether your solution will actually be helpful. Chris Chibwana, former Head of Africa for IDinsight, reminds us in his reflections on three years of informing decisions with data and evidence that we must “recognize the limits of our personal knowledge, especially compared to local and lived experience.” And Emily Hadley puts it plainly: “to be antiracist data scientists (or producers or users or advocates), we must take the steps to be antiracist individuals.”
Whether we bring data into Women’s History Month, Black History and Futures Month, or the day-to-day struggles for equity—data disaggregation is just the first step. Thanks to all the great minds referenced above, I see that a fuller vision for data equity requires (at least) elevating the voices of people reflected in the data, and making sure the data reflect their aspirations and assets, not just their challenges. It includes using data to describe the societal structures that foster injustice, not just to shine a spotlight on the individuals struggling within them. It requires admitting and combating bias within data, and having inclusive and transparent governance over how data are used. And it requires all of us, particularly those of us endowed with enormous privilege, to be humble in our efforts even as we are ambitious in our aspirations for equity.
[1] Special thanks to Ruth Levine who is a master at curating and sharing content and ideas, including on the topic of data equity. Her practice of sharing knowledge is both individually tailored and beautifully field-building in spirit and impact.