OkCupid Data Analysis

Data Processing


Scenario

You are part of a business intelligence team at okcupid.com. The team has been asked to perform an in-depth exploratory analysis of site users. The marketing team's goal is to create micro-segments and personas for future campaigns. Keep in mind that an interesting correlation is not necessarily useful in a marketing context. For example, identifying five users who share specific attributes may be interesting, but five users is hardly a segment worth attracting.
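One way to apply the segment-size test is to count how many users fall into each combination of attributes and discard combinations that are too small to matter. The following is a minimal R sketch; the helper name `viable_segments` and the 1% default threshold are illustrative assumptions, not part of the assignment.

```r
# Hypothetical helper (not part of the assignment): count users in every
# combination of the chosen attribute columns and keep only combinations
# whose share of all users meets an assumed minimum threshold.
viable_segments <- function(df, cols, min_share = 0.01) {
  # Treat NA and empty strings as an explicit "(missing)" level so that
  # unanswered fields are counted instead of silently dropped
  keys <- as.data.frame(
    lapply(df[cols], function(x) {
      v <- as.character(x)
      ifelse(is.na(v) | v == "", "(missing)", v)
    }),
    stringsAsFactors = FALSE
  )
  # Cross-tabulate all combinations; as.data.frame() adds a Freq column
  counts <- as.data.frame(table(keys), stringsAsFactors = FALSE)
  counts$share <- counts$Freq / nrow(df)
  # Keep non-empty combinations at or above the threshold, largest first
  keep <- counts[counts$Freq > 0 & counts$share >= min_share, , drop = FALSE]
  keep <- keep[order(-keep$Freq), , drop = FALSE]
  rownames(keep) <- NULL
  keep
}
```

For instance, `viable_segments(profiles, c("sex", "orientation", "status"))` would list only the combinations that cover at least 1% of users; raising `min_share` trades granularity for segments large enough to justify a campaign.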

You are asked to examine the data, clean it, enrich it with supplemental data, and then identify four or more interesting insights about the users. Organize all relevant cleaning, enrichment, and exploratory data analysis (EDA) steps, along with the four insights, into a presentation. You will present to the head of marketing, who is looking for an “aha” persona or a previously unknown relationship among two or more attributes. The head of marketing consumes information visually rather than in tables, so your presentation should include visualizations where appropriate. You will need to turn in both your code and your PowerPoint slides.
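Enrichment typically means joining outside data onto the profiles. A minimal R sketch, assuming the `location` column holds "city, state" strings and that you have assembled your own supplemental table keyed by state; the helper `enrich_by_state` and the `state_lookup` table are hypothetical names used only for illustration.

```r
# Hypothetical helper: split "city, state" locations and left-join a
# supplemental table keyed by state. state_lookup should have one row per
# state, otherwise matching users are duplicated by the join.
enrich_by_state <- function(df, state_lookup) {
  parts <- strsplit(as.character(df$location), ",\\s*")
  # First piece is treated as the city, last piece as the state/country
  df$city <- vapply(parts, function(p) p[1], character(1))
  df$state <- vapply(parts, function(p) {
    if (length(p) >= 2) p[length(p)] else NA_character_
  }, character(1))
  # all.x = TRUE keeps users with no match; their new columns become NA
  merge(df, state_lookup, by = "state", all.x = TRUE, sort = FALSE)
}
```

A left join is used so that no users are lost during enrichment; unmatched locations simply show up as NA and can be reported as a data-quality step.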


Data

Source: https://www.researchgate.net/publication/309668116_The_OKCupid_dataset_A_very_large_public_dataset_of_dating_site_users

This dataset was scraped from user profiles. At the time, OkCupid had not authorized the data collection. After the data was released as part of the academic literature, OkCupid.com authorized its use.

As a result, there is some moral ambiguity related to the use of the dataset.

The dataset your business analysis team is using has been authorized, cleaned, and anonymized.

The original data, publication, code, and codebook were obtained here: https://github.com/rudeboybert/JSE_OkCupid

To load the data, set your working directory to the folder containing profiles.csv, then run the following in your R console:

profiles <- read.csv('profiles.csv')
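After loading, a few recoding steps make the data easier to analyze. The sketch below is a minimal example under stated assumptions: unanswered text fields load as empty strings, income uses -1 for users who did not report it, and height is recorded in inches. Verify each of these against the codebook. The helper name `clean_profiles` and the plausibility bounds are illustrative choices, not the dataset's official rules.

```r
# Hypothetical cleaning helper; every rule below is an assumption to verify
# against the codebook. Assumes character columns (the default in R >= 4.0).
clean_profiles <- function(df, max_age = 100, height_range = c(48, 96)) {
  # Unanswered text fields load as empty strings; recode them to NA
  chr <- vapply(df, is.character, logical(1))
  if (any(chr)) {
    df[chr] <- lapply(df[chr], function(x) {
      x <- trimws(x)
      x[which(x == "")] <- NA
      x
    })
  }
  # Assumption: income is coded -1 when the user chose not to report it
  if ("income" %in% names(df)) df$income[which(df$income == -1)] <- NA
  # Assumed plausibility bounds: out-of-range values become NA (rows are kept)
  if ("age" %in% names(df)) df$age[which(df$age > max_age)] <- NA
  if ("height" %in% names(df)) {
    bad <- which(df$height < height_range[1] | df$height > height_range[2])
    df$height[bad] <- NA
  }
  df
}
```

For example, `profiles <- clean_profiles(profiles)` followed by `summary(profiles)` is a quick way to confirm how many values each rule recoded. Setting implausible values to NA instead of dropping rows keeps every user available for segment counts.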

Presentation goals

· Organization – Was the presentation well organized?

· Delivery – Was the content delivered clearly and persuasively, with the audience in mind?

· Documentation – Was the data mined to support the conclusions, and were four unique insights identified?

· Data Mining Process – Was the approach to the problem similar (as applicable) to the steps outlined on page 19 of the book?

Example of a public RStudio examination of the data:

https://rstudio-pubs-static.s3.amazonaws.com/209370_b62220c849b946088b463fdbec935848.html

Keynote Presentation


Program insights.R
GitHub Repository
