Enter your words and observe the differences that exist in the way we perceive gender.

Using machine learning, we can generate word associations present in a given media source. By looking at those associations we can tell how closely words are related to women or men. The Gender Graph project allows users to plot where words lie on a scale of "he" to "she" based on a selected media source.

Observing this chart reveals that the media commonly associates toxic words with women. We consume this media every day and therefore subliminally inherit these biases. Many in our community believe that feminism is no longer relevant because women and men have “equal rights”. Hopefully this scientific evidence will serve as concrete proof of the disparities that exist in the way we perceive gender, and show that we still have a long way to go.

How does it work?

For a computer to understand English words, they need to be converted to numbers. In particular, each word can be represented as a point in a multidimensional space, which can be roughly visualized in two dimensions.
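To make the idea of words as points more concrete, here is a minimal sketch in Python. The two-dimensional vectors are hand-picked for illustration and are not taken from any trained model, and cosine similarity is only one common way to compare word vectors; it is an assumption here, not necessarily the measure this project uses.

```python
import math

# Hypothetical 2-D vectors, hand-picked for illustration only.
# Real models learn vectors with many more dimensions from text.
toy_vectors = {
    "king": [0.9, 0.8],
    "queen": [0.8, 0.9],
    "banana": [-0.7, 0.1],
}


def cosine_similarity(a, b):
    """Return the cosine of the angle between vectors a and b."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


# Words whose points lie in a similar direction score closer to 1.0.
print(cosine_similarity(toy_vectors["king"], toy_vectors["queen"]))
print(cosine_similarity(toy_vectors["king"], toy_vectors["banana"]))
```

With these toy values, "king" and "queen" point in nearly the same direction, while "king" and "banana" do not.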

We use the word2vec tool to generate these word vectors based on semantic relationships between words in a given text source. This collection of word vectors is called a model. We wrote a custom tool that uses this model to score user words in relation to a given pair of words (in our case, “he” and “she”).

To quantify whether a word is more commonly associated with women or men, we measure where it sits relative to “she” and “he”. Mathematically, this is done by finding the direction vector between “she” and “he” and projecting each user word onto that vector using simple vector operations such as the dot product.

The length of the projection onto this axis gives us an association score, where values closer to 0.0 are more strongly associated with “he” and values closer to 1.0 are more strongly associated with “she”.
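The projection described above can be sketched in a few lines of Python. This is an illustration under stated assumptions, not the project's actual tool: the function name `association_score` is hypothetical, the vectors are toy values rather than word2vec output, and dividing by the squared length of the axis is our assumed normalization, chosen so that “he” lands at 0.0 and “she” at 1.0.

```python
def association_score(word_vec, he_vec, she_vec):
    """Project word_vec onto the he -> she axis.

    Returns 0.0 for the "he" point and 1.0 for the "she" point.
    The normalization is an assumption made for this sketch.
    """
    # Direction from "he" to "she".
    axis = [s - h for s, h in zip(she_vec, he_vec)]
    # Position of the word relative to "he".
    offset = [w - h for w, h in zip(word_vec, he_vec)]
    axis_len_sq = sum(a * a for a in axis)
    if axis_len_sq == 0:
        raise ValueError("'he' and 'she' vectors must differ")
    # The dot product gives the projection; dividing rescales it to the axis.
    return sum(o * a for o, a in zip(offset, axis)) / axis_len_sq


# Toy 2-D vectors, hand-picked for illustration only.
he = [1.0, 0.0]
she = [0.0, 1.0]
midpoint = [0.5, 0.5]

print(association_score(he, he, she))
print(association_score(she, he, she))
print(association_score(midpoint, he, she))
```

Under this normalization, a word whose projection falls beyond either endpoint scores below 0.0 or above 1.0, so a real tool might clamp or rescale the result.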

This approach gives us a very good picture of semantic biases in the media. However, it is important to understand that these models are not perfect. Factors such as the quantity and quality of the data, as well as algorithmic imperfections, may introduce noise into the model.