#8M - Women in Data Science | Datasketch


Let's break gender gaps in data science #UnstoppableWomen #8M

Laura Tamia Ortiz. March 07, 2022.

Data science is an interdisciplinary field that uses data, artificial intelligence, and analytics to inform policy and decision-making in the public and private sectors. However, according to recent statistics, women represent only 15% of data scientists worldwide. Furthermore, according to UNESCO, women make up only 35% of students enrolled in STEM-related fields in higher education.

Equal representation is crucial. Data can transform societies, but it can also perpetuate harmful and oppressive biases. Machine learning and AI systems can learn and reproduce human biases, even unconscious ones.

International Women’s Day commemorates the scientific, cultural, political, historical, and socio-economic achievements of women. It also draws attention to persistent challenges, such as the pay gap and the shortage of women in science.

  • According to the World Economic Forum's 2021 Global Gender Gap Report, at the current pace it will take another 135.6 years to close the global gender gap.
  • The 2021 Women in Data Science Report shows that women face numerous roadblocks to entering science, technology, engineering, and mathematics (STEM). These include the absence of role models and mentors and limited scholarships and funding for their professional development.

How to break through the gap

1. Understanding the risks of biased data

Decisions about measuring, collecting, organizing, and analyzing data influence the results obtained and open the door to bias.

Consciously or not, data scientists incorporate their values, interests, and experiences into the data they handle, shaping the results according to their understanding of the world. Unfortunately, the underrepresentation of women in the field increases the risk that data-driven policies will be designed and implemented in ways that harm their interests.

Data are great tools for decision-making, but they do not always provide a comprehensive view of the social, economic, and even labor conditions of the people reflected in them.

Tierra Bills, a transportation equity expert at UCLA (University of California, Los Angeles), offered a simple example at the Women in Data Science (WiDS) conference at Stanford University. She compared the job opportunities available to two people based on their distance from places where jobs are located. The example shows how samples that ignore the characteristics of a group can skew both the results the data produce and the decisions built on them. She recommends using contextual and probabilistic data, such as the likelihood that a particular group participates in the sample, when collecting and analyzing information.

Source: Tierra Bills (UCLA) presentation at Women in Data Science (#WiDS2022)
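Bills' talk is only summarized in words here, so the sketch below is not her example. It is a generic illustration, with invented numbers, of one common way to use participation probabilities: inverse probability weighting. Group A takes part in a hypothetical survey far more often than group B, so a plain average over the sample overstates how many jobs are within reach. Weighting each respondent by one over their group's participation probability recovers the population figure. The names `Respondent`, `jobs_within_reach`, and `participation_prob` are hypothetical.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Respondent:
    group: str              # hypothetical group label, e.g. "A" or "B"
    jobs_within_reach: int  # hypothetical outcome: jobs reachable from home

def naive_mean(sample):
    """Plain average: implicitly assumes every group was equally likely to be sampled."""
    return sum(r.jobs_within_reach for r in sample) / len(sample)

def ipw_mean(sample, participation_prob):
    """Inverse probability weighted mean: each respondent counts 1 / P(their group participates)."""
    weights = [1 / participation_prob[r.group] for r in sample]
    weighted_total = sum(w * r.jobs_within_reach for w, r in zip(weights, sample))
    return weighted_total / sum(weights)

# Invented population: 50 people in group A (10 jobs in reach), 50 in group B (2 jobs).
# The true population mean is therefore 6.
# Group A takes part with probability 0.8, group B with probability 0.2,
# so the sample holds 40 people from A and only 10 from B.
participation_prob = {"A": 0.8, "B": 0.2}
sample = [Respondent("A", 10)] * 40 + [Respondent("B", 2)] * 10

print(f"naive mean:    {naive_mean(sample):.1f}")                   # 8.4
print(f"weighted mean: {ipw_mean(sample, participation_prob):.1f}")  # 6.0
```

The correction is only as good as the participation probabilities themselves. In practice they have to be known from the sampling design or estimated, which is exactly the kind of contextual information Bills asks analysts to collect.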

2. Increasing the training of women in the field

Increasing the number of women in data science means tackling another major problem, one that requires political will and a commitment to inclusive educational models.

Decision-makers must ensure that women and girls have the fundamental skills needed to participate in digital technology, including basic literacy and numeracy.

The participation gap for women in this area stems from multiple factors, such as cultural biases that discourage women and girls from using technology, lower income levels that prevent women from purchasing digital hardware and software, and less access to education.

Fortunately, multiple initiatives are putting data in the hands of women and girls, helping them move up the skills pyramid and participate in the digital economy.

Still, much work remains to recognize and expand the opportunities that data and digital expertise open up for women.

3. Creating diverse teams

Achieving diversity of approaches and viewpoints is critical.

We have already seen that interpreting data sets requires subtlety and that both humans and algorithms can repeat patterns that lead to biased or even dangerous conclusions.

For example, AI algorithms trained on historical data points (such as past hiring decisions) are susceptible to learning and perpetuating existing biases.
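A deliberately naive sketch of that mechanism, using invented records: a "model" that simply counts past hiring outcomes for each combination of gender and qualification. In the hypothetical history, qualified men were hired more often than equally qualified women, and the learned scores reproduce that gap exactly. Real hiring models are far more complex, and `history` and `fit_hire_rates` are hypothetical names, but the principle is the same: a model fitted to biased decisions learns the bias as if it were signal.

```python
from collections import defaultdict

# Invented historical records: (gender, qualified, hired).
# All candidates are qualified, yet qualified women were hired less often.
history = (
    [("man", True, True)] * 8 + [("man", True, False)] * 2
    + [("woman", True, True)] * 4 + [("woman", True, False)] * 6
)

def fit_hire_rates(records):
    """Learn P(hired | gender, qualified) by counting past decisions."""
    hired = defaultdict(int)
    total = defaultdict(int)
    for gender, qualified, was_hired in records:
        total[(gender, qualified)] += 1
        hired[(gender, qualified)] += was_hired  # True counts as 1, False as 0
    return {key: hired[key] / total[key] for key in total}

rates = fit_hire_rates(history)
print(rates[("man", True)])    # 0.8
print(rates[("woman", True)])  # 0.4
```

Two equally qualified candidates receive different scores purely because of the pattern in past decisions, which is how historical bias becomes future bias.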

Once data sets are biased, they are challenging to fix. Instances of bias are more likely when a homogeneous group of people is in charge of all stages of the data value chain.

One way to reduce bias from the outset is to include diverse experiences and perspectives in teams working with data.

4. Women in leadership positions

Women need a seat at the table and decision-making power. Discussions about gender and data have focused on the need to collect data disaggregated by gender and sex. While these efforts are crucial, more attention needs to be paid to how gender biases can enter downstream, including analysis and use.

The gender gap in data science roles and leadership positions remains an urgent problem. It requires conversations at all levels, from educational institutions awarding degrees to CEOs discussing promotions to leadership positions within companies.

It is critical to create opportunities for girls to start and stay on the STEM path. We should encourage them to develop the data skills and digital expertise needed to participate in all stages of data science.

All organizations and companies can benefit from prioritizing diversity, equity, and inclusion initiatives within their internal culture and hiring processes.


If you want to learn more about this topic, we recommend:

📚To read: Invisible Women: Exposing Data Bias in a World Designed for Men by Caroline Criado Perez. The book documents how gender bias in data harms women in areas of day-to-day life that affect everyone.

🔊To learn: Women in Data Science (WiDS) at Stanford University (March 7, 2022). This annual conference brings together hundreds of women in data science.

👩🏽To explore:

🎙️ To listen: Data Bytes Podcast - Women in Data. Data stories, professional interviews, and the latest trends in the data world narrated by women.