
What is Data Erasure and Why It's Important

08.29.24

By: Jennifer Zhang, Senior Research and Data Analyst 

No one should be invisible in data. But data, or the lack of it, can erase certain communities while overrepresenting others. Systemic exclusion from data affects all groups, and there is no true representation without the inclusion of all. Therefore, it’s crucial that those of us who work with data understand how it can exclude, so we can minimize this effect in our analyses. 

Erasure happens when certain groups are left out of narratives due to gaps in the original data. Although data scientists always contend with limitations when they work with existing data, there are techniques that can help enhance representation. 

Catalyst California improves representation in its research by intentionally disaggregating data, or separating it into smaller parts, based on feedback from communities. Without data disaggregation, data can be skewed or inaccurate, erasing real experiences and leading to inadequate policies, especially for communities affected by systemic racism.  

Over time, we have worked with communities to improve representation in the data we use. In this post, we detail what we have learned from that process.  

Thoughtful Definition and Disaggregation of Populations

U.S. Census Bureau data is the largest public source that Catalyst California and other institutions use in research. Below we list the lessons learned from community on how to intentionally improve inclusiveness when analyzing Census Bureau data.   

  • Data approach: Our best practice is to include estimates for both ‘AIAN alone’ and ‘AIAN alone or in combination with another race’. If we are unable to include both, we compare the estimates and decide intentionally with partners which to use, with ‘AIAN alone or in combination’ often being the preferred definition. The Census Bureau also has data on urban and rural AIAN groups. The experiences of indigenous people living in an urban space versus in more rural areas or on a reservation can vary greatly and should be considered when analyzing AIAN data.
  • Why this is important: The Census Bureau’s definition of AIAN still does not include all indigenous groups. Because a very high percentage of Native Americans are multiracial, the Census Bureau’s tendency to use strictly ‘AIAN alone’ excludes a large portion of multiracial AIAN people. In fact, current data collection methods exclude more than 75% of Native Americans who also identify as Latinx or another race. Systemic and cultural barriers also limit data collection for this group. The 2020 Census had a significant undercount of AIAN people due to the agency’s failure to fully partner with tribal nations in data collection, among other issues.
  • Examples of AIAN Data Disaggregation: 
    "We the Resilient," California Native Vote Project, accessed August 19, 2024, https://canativevote.org/what-we-do/research/

Catalyst California calculations of American Community Survey (ACS) table DP05, 2017-2022 5-year estimates, We the Resilient Report Update by CNVP, Publication forthcoming
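To make the comparison between the two AIAN definitions concrete, here is a minimal sketch in Python. The estimates are hypothetical placeholders, not figures from the ACS or from the We the Resilient report, and the function name is our own. It shows how much of the broader ‘alone or in combination’ population would go uncounted if an analysis used ‘AIAN alone’ only.

```python
def share_excluded(alone: float, alone_or_in_combination: float) -> float:
    """Share of the 'alone or in combination' population that is
    left out when only the 'alone' estimate is used."""
    if alone_or_in_combination <= 0:
        raise ValueError("The broader estimate must be positive.")
    if alone > alone_or_in_combination:
        raise ValueError("'Alone' cannot exceed 'alone or in combination'.")
    return (alone_or_in_combination - alone) / alone_or_in_combination


# Hypothetical estimates for a single geography; replace with real values.
aian_alone = 12_000
aian_alone_or_in_combination = 41_000

excluded = share_excluded(aian_alone, aian_alone_or_in_combination)
print(f"Using 'AIAN alone' would leave out {excluded:.1%} of the broader group.")
```

A large gap between the two estimates is a signal to discuss the choice of definition with community partners before publishing.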

The lack of comprehensive data disaggregation for the Asian community is one of the most glaring examples of data erasure. While the Census Bureau’s American Community Survey includes more than 50 Asian subgroups, not all Census Bureau data tables can be disaggregated by these subgroups. Sample sizes for Asian subgroups are often too small to be statistically stable for certain population characteristics (e.g., poverty), or for local geographies, making the data difficult to use. Disaggregating Asian data is necessary to reveal the disparities that different Asian groups experience.  
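One common way to check whether a subgroup estimate is statistically stable is the coefficient of variation (CV), derived from the margin of error that the ACS publishes at a 90% confidence level. The sketch below is illustrative only: the subgroup rows are made up, and the 15% and 30% cutoffs are assumptions chosen for the example, not an official standard or Catalyst California's own rule. Note that it labels unstable estimates instead of dropping them.

```python
Z_90 = 1.645  # ACS margins of error are published at a 90% confidence level


def coefficient_of_variation(estimate: float, moe: float) -> float:
    """Return the CV as a percentage: standard error divided by estimate."""
    if estimate <= 0:
        return float("inf")
    standard_error = moe / Z_90
    return standard_error / estimate * 100


def reliability_label(cv: float, caution: float = 15.0, unreliable: float = 30.0) -> str:
    """Label an estimate using assumed (hypothetical) CV cutoffs."""
    if cv >= unreliable:
        return "unreliable"
    if cv >= caution:
        return "use with caution"
    return "stable"


# Hypothetical rows: (subgroup, estimate, margin of error).
rows = [
    ("Subgroup A", 52_000, 3_100),
    ("Subgroup B", 1_800, 620),
    ("Subgroup C", 240, 210),
]

for name, estimate, moe in rows:
    cv = coefficient_of_variation(estimate, moe)
    # Every subgroup stays in the output; unstable ones are flagged.
    print(f"{name}: estimate={estimate}, CV={cv:.1f}%, {reliability_label(cv)}")
```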

Additional resources on how to get better Asian disaggregated data and why this is important: 

  • Data approach: We use the Census Bureau’s reported ancestry field to create a custom definition for a SWANA category, by taking a list of ancestries that we define to be part of the SWANA region and aggregating estimates for those reported ancestries. This group is also commonly known as Middle Eastern or North African (MENA).   
  • Why this is important: Many people from the Southwest Asian ("Middle East") and North African regions do not see themselves represented in the census, especially under the race question. As a result, they may identify as ‘White’ or ‘Other Race,’ or not identify themselves at all for lack of options. This renders invisible a huge group of people who we know face unique experiences. Wars and conflict in the Southwest Asian and North African region have driven many people to the United States as refugees, where their identities have been negatively stereotyped. Without proper data disaggregation and accurate self-identification, this group's invisibility can hinder effective policymaking and systemic change efforts. 
  • Examples of SWANA data disaggregation: 
    Queer Crescent nation-wide survey of LGBTQI+ Muslims  

Catalyst California, Bold Vision Mid-Term Report, Forthcoming, 2024. www.boldvisionla.org  
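The ancestry-based approach described above can be sketched in Python as follows. This is a simplified illustration, not Catalyst California's actual method: the ancestry list is a small hypothetical subset, the input rows are made up, and the function name is our own. The combined margin of error uses the Census Bureau's standard approximation for a sum of estimates (the square root of the sum of squared margins of error).

```python
import math

# Illustrative subset only; NOT the actual list used to define SWANA.
SWANA_ANCESTRIES = {"Egyptian", "Iranian", "Lebanese", "Syrian"}


def aggregate_ancestries(rows, ancestries):
    """Sum estimates for the selected ancestries and approximate the
    margin of error of the total."""
    selected = [row for row in rows if row["ancestry"] in ancestries]
    total = sum(row["estimate"] for row in selected)
    moe = math.sqrt(sum(row["moe"] ** 2 for row in selected))
    return total, moe


# Hypothetical ancestry estimates for one geography.
rows = [
    {"ancestry": "Egyptian", "estimate": 4_200, "moe": 650},
    {"ancestry": "Iranian", "estimate": 9_800, "moe": 1_100},
    {"ancestry": "Lebanese", "estimate": 3_100, "moe": 540},
    {"ancestry": "German", "estimate": 88_000, "moe": 2_900},
]

total, moe = aggregate_ancestries(rows, SWANA_ANCESTRIES)
print(f"SWANA (custom definition): {total} +/- {moe:.0f}")
```

One caveat with this kind of sum: people who report more than one ancestry may appear in more than one row, so the total can overcount individuals. It is worth stating that limitation alongside the estimate.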

Identifiers for this community have been changing rapidly in recent years. Catalyst California has chosen to use Latinx and will continue monitoring usage to determine if a change is appropriate. 

Call to Action 

Accurate representation of communities of color requires thoughtful data disaggregation in analysis. Using disaggregation techniques informed by community helps ensure these groups are not erased, even within the limits of the data sets. More representative data can promote racial equity and multiracial solidarity by ensuring no groups are minimized. Robust data advocacy can push for systemic changes that allow for better data collection and inclusivity. 

But as we push for those changes, we can all incorporate inclusive principles in how we work with data. Here are a few we use as a starting point:  

  • Be transparent about disaggregation methods and their limitations  
  • Acknowledge community experiences absent from data  
  • Consider including, or at minimum naming, groups with unreliable estimates or small sample sizes rather than excluding them completely 
  • Use proxy variables and methods when data is not available by race or for certain groups 

Disaggregating data is just the first step; how you frame and contextualize it is crucial for telling a complete story. Check our next blog for our best data storytelling practices, or revisit our first blog for a history of census race categories.

 

Resources from other places 

Data disaggregation requires a tailored approach for each project and data source, as each has unique needs and limitations. However, key principles remain consistent. Below are additional guides for data disaggregation and research. When looking to disaggregate data for a specific community, always seek out recommendations directly from groups working in partnership with those communities. 

Thank you to Tessie Borden, our Senior Communications Manager, and to Elycia Mulholland Graves, Chris Ringewald, Maria T. Khan, and Alexandra Baker from our Research and Data Analysis Team for their contributions.