Find out a name’s likely gender using Natural Language Processing in TensorFlow, Plotly Dash, and Heroku.
Choosing a name for your child is one of the most stressful decisions you’ll have to make as a new parent. Especially for a data-driven guy like me, having to decide on a name without any prior data about my child’s character and preferences is a nightmare come true!
Since my first name starts with “Marie,” I’ve had countless experiences of people addressing me as “Miss” in emails and texts, only to be disappointed when we finally meet or talk and they realize that I’m actually a guy 😜. So, when my wife and I were researching names for our baby girl, an important question we asked ourselves was:
Will people be able to identify that the name refers to a girl and not a boy?
It turns out we can use Machine Learning to help us check if potential names would be associated more with boys or girls! To check out the app I’ve built to do exactly this, please head over to https://www.boyorgirl.xyz.
The rest of this post talks about the technical details, including:
- Obtaining a name-to-gender training dataset
- Preprocessing the names to make them compatible with Machine Learning (ML) models
- Developing a Natural Language Processing (NLP) ML model to read in a name and output if it’s a boy’s name or a girl’s name
- Building a simple web app for people to interact with the model
- Publishing the app on the internet
To train any Machine Learning model, we need a large quantity of labeled data. In this case, we need a large number of names and the gender associated with each name. Luckily, Google Cloud’s BigQuery has a free open dataset called USA_NAMES that “contains all names from Social Security card applications for births that occurred in the United States.” The dataset contains roughly 35,000 names and the associated gender, which works very well for our model.
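As a rough illustration of how such a dataset could be turned into one label per name, here is a minimal sketch. The table name `bigquery-public-data.usa_names.usa_1910_current` and its `name`, `gender`, and `number` columns are assumptions about the public dataset and should be checked before use. The majority-vote rule in the hypothetical helper `label_names` is likewise an assumption, not necessarily how the app labels names, and the sample rows are made up.

```python
from collections import defaultdict

# Assumed table and column names for the public dataset; verify before running.
# The query is shown for context only and is not executed in this sketch.
QUERY = """
SELECT name, gender, SUM(number) AS total
FROM `bigquery-public-data.usa_names.usa_1910_current`
GROUP BY name, gender
"""


def label_names(rows):
    """Collapse (name, gender, count) rows into one majority label per name.

    Hypothetical helper: a name recorded under both genders gets the label
    with the larger total count (ties go to "F").
    """
    counts = defaultdict(lambda: {"F": 0, "M": 0})
    for name, gender, count in rows:
        counts[name][gender] += count
    return {
        name: "F" if by_gender["F"] >= by_gender["M"] else "M"
        for name, by_gender in counts.items()
    }


# Made-up rows in the shape the query above would return.
sample_rows = [("Avery", "F", 120), ("Avery", "M", 80), ("James", "M", 500)]
labels = label_names(sample_rows)
print(labels)
```

A hard majority label throws away how ambiguous a name is; keeping the female share of the counts as a soft target would be another reasonable choice.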
Human names are textual data, while ML models can only work with numeric data. To convert our text into a numeric representation, we’ll take the following steps.
- Lowercase the name since each character’s case doesn’t convey any information about a person’s gender.
- Split each character: The basic idea of the ML model we’re building is to read characters in a name to identify…
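The two steps listed so far, lowercasing and splitting into characters, can be sketched in plain Python. Everything past those two steps is an assumption made for illustration: the function name `preprocess`, the a=1 through z=26 encoding, the use of 0 for padding and for non-letters, and the `max_len` parameter are not taken from the original post.

```python
def preprocess(name, max_len=10):
    """Turn a name into a fixed-length list of integers.

    Steps from the text: lowercase the name, then split it into characters.
    Assumed for illustration: map a..z to 1..26, map anything else to 0,
    then truncate or right-pad with 0 to exactly max_len entries.
    """
    chars = list(name.lower())
    codes = [ord(c) - ord("a") + 1 if "a" <= c <= "z" else 0 for c in chars]
    codes = codes[:max_len]
    return codes + [0] * (max_len - len(codes))


encoded = preprocess("Mary", max_len=6)
print(encoded)  # [13, 1, 18, 25, 0, 0]
```

Fixing the length up front means every name becomes a vector of the same size, which is what most model input layers expect.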