Social Media Analytics

A partner, Saghar Tamaddon, and I analyzed the corporate social responsibility (CSR) reputations of six companies on social media (Twitter). We then created a platform, called Echolytics, on which to report the findings. The project involved several components: ETL/data collection, natural language processing (NLP), database development and information visualization. It drew on skills ranging from information retrieval, data cleansing, data architecture, programming, data analysis and algorithm development to needs analysis and interface design. My contributions are outlined in the following sections.


+ Information Visualization


Echolytics' graphic visualizations included a pair of two-by-two matrices and several annotated Tableau charts for each of the six companies evaluated. I calculated the figures underlying the Tableau graphics, built the charts in Tableau, investigated prominent spikes and dips, and annotated the charts with analytical insights. I later formatted the Tableau graphics to fit the webpage window and inserted them into the HTML.

I designed the two-by-two matrices and calculated 75% of the matrices' values. Saghar Tamaddon researched the remaining 25%. I created the JSON file that holds these values and populates the online matrices. This image shows the original design, which differs slightly from the online implementation.

Information Visualization Report


Aside from the graphics mentioned above, the bulk of the website was built by a third party, Iris Cheung, on top of a website template. My website-related contributions were sourcing the website template, formatting the Tableau graphics to fit inside the web page window, inserting the charts into the HTML, and providing the JSON data file that populates the values in the two-by-two matrices.

Website (Available for Firefox, Safari & Chrome)

Presentation of Results

My project partner, Saghar Tamaddon, produced a poster, paper and presentation that summarize the project.

+ Natural Language Processing


I wrote a number of Python programs to perform natural language processing (NLP) functions: chunking, part-of-speech tagging, hashtag extraction, and identifying the top words. My partner wrote a program for classification. Coupled with a training set, these programs allowed us to classify three things: whether a tweet was actually about a company (for example, some tweets containing "Chipotle" referred to chipotle sauce, which is unrelated to the Chipotle Mexican Grill restaurant chain); whether a tweet discussed corporate social responsibility (CSR); and the sentiment of each tweet (Positive, Neutral or Negative).
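As an illustration of two of the functions listed above, hashtag extraction and top-word counting, here is a minimal sketch using only the Python standard library. It is not the original project code: the function names, the regular expressions and the tiny stopword list are all assumptions made for this example. Chunking and part-of-speech tagging are left out because they would need an NLP library, and this write-up does not say which one was used.

```python
import re
from collections import Counter

# All names below are hypothetical; the original programs are not shown here.
HASHTAG_RE = re.compile(r"#(\w+)")
WORD_RE = re.compile(r"[a-z]+")
# A deliberately tiny stopword list, for illustration only.
STOPWORDS = {"the", "a", "an", "is", "to", "of", "and", "in", "for", "from"}


def extract_hashtags(text):
    """Return the hashtags in a tweet, lowercased and without the '#'."""
    return [tag.lower() for tag in HASHTAG_RE.findall(text)]


def top_words(texts, n=10):
    """Return the n most frequent non-stopword words across the tweets."""
    counts = Counter()
    for text in texts:
        # Drop URLs, @mentions and #hashtags so only plain words are counted.
        cleaned = re.sub(r"https?://\S+|[@#]\w+", " ", text.lower())
        counts.update(w for w in WORD_RE.findall(cleaned) if w not in STOPWORDS)
    return counts.most_common(n)
```

For example, `extract_hashtags("New #Sustainability report #CSR")` returns `["sustainability", "csr"]`.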

+ ETL/Data Collection


Over the course of this project, I wrote several Python programs that extract, transform and load (ETL) tweet data from Twitter's API into a database. One sample of this code is below.

Python code
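Separately from the author's own sample, the transform step of such a pipeline might look roughly like the sketch below. It is an assumption-laden illustration, not the project's code: the function name `transform_tweet` and the output column names are hypothetical, and the input keys (`id`, `created_at`, `text`, `user.screen_name`) are modeled on the tweet objects returned by Twitter's v1.1 API. The extract step (calling the API) and the load step are omitted.

```python
from datetime import date


def transform_tweet(raw, search_query, retrieved_on):
    """Flatten one raw tweet dict into a flat record ready for loading.

    `raw` is assumed to look like a Twitter v1.1 tweet object; the output
    column names are hypothetical.
    """
    return {
        "tweet_id": str(raw["id"]),
        "created_at": raw.get("created_at"),
        "user_name": raw.get("user", {}).get("screen_name"),
        # Collapse newlines and runs of whitespace into single spaces.
        "text": " ".join(raw.get("text", "").split()),
        # Metadata appended to every record: which query found the tweet
        # and the date on which it was retrieved.
        "search_query": search_query,
        "retrieved_on": retrieved_on.isoformat(),
    }
```

Calling `transform_tweet(raw, "Chipotle", date.today())` on each result of a search would yield records that carry the query and retrieval date alongside the tweet itself.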

+ Database Development

PostgreSQL Database

The PostgreSQL database I created for Echolytics housed 700,000 tweets, gathered over a two-month period for 20 brand names (six of which were used in the final analysis) spanning three industries. The tweets were loaded daily, directly from the Python scripts that extracted them from Twitter's API. Potential duplicates were flagged as such, and metadata, such as the search query that retrieved each result and the date it was retrieved, was appended to each record. Extensive Postgres queries and calculations were performed inside the database to cleanse, organize and summarize the data. Some data was formatted and exported for additional manipulation outside the database.
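The load-and-flag pattern described above can be sketched as follows. This is a simplified illustration under several assumptions, not the project's schema or code: it uses Python's built-in `sqlite3` module as a stand-in for PostgreSQL so that it runs without a database server, the table and column names are hypothetical, and a "potential duplicate" is assumed to mean a tweet whose ID is already in the table (the actual criteria are not stated here).

```python
import sqlite3


def load_tweets(conn, records):
    """Insert records, flagging any whose tweet_id was already loaded."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS tweets (
               tweet_id TEXT, text TEXT, search_query TEXT,
               retrieved_on TEXT, is_duplicate INTEGER)"""
    )
    for rec in records:
        # Assumed duplicate rule: the same tweet ID has been seen before.
        seen = conn.execute(
            "SELECT 1 FROM tweets WHERE tweet_id = ? LIMIT 1",
            (rec["tweet_id"],),
        ).fetchone()
        conn.execute(
            "INSERT INTO tweets VALUES (?, ?, ?, ?, ?)",
            (rec["tweet_id"], rec["text"], rec["search_query"],
             rec["retrieved_on"], 1 if seen else 0),
        )
    conn.commit()


def count_by_query(conn):
    """Summarize inside the database: non-duplicate tweets per query."""
    return conn.execute(
        "SELECT search_query, COUNT(*) FROM tweets "
        "WHERE is_duplicate = 0 GROUP BY search_query ORDER BY search_query"
    ).fetchall()
```

Flagging duplicates rather than discarding them keeps the raw data intact, so the rule for what counts as a duplicate can be revisited later without re-collecting tweets.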