Student project of the Social Media Analytics web-course by Fitech.io & Aalto University

andrejkurusiov/social-media-analytics-2020

Social Media Analytics 2020 - student project

Overview

The project centers on Twitter data analysis because Twitter is the most affordable platform for educational research and several handy analysis tools, such as TAGS, are available for it.

Because historical data cannot be accessed, I focus on current events from 6 to 12 October 2020 (Tuesday to Monday). One of the most prominent is the military conflict between Armenia and Azerbaijan over disputed territories, tracked on Twitter with the search query “#karabakh OR #artsakh”.


Main points of the exercise

  • read a very large Google Sheets data file and clean it
  • write the processed data to an .xlsx file
  • process the resulting .xlsx file locally

All processing is done in Python 3.8 using Jupyter Notebooks.
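
The cleaning step in the list above could look roughly like the sketch below. It is only an illustration, not the notebook's actual code: the column names (`id_str`, `from_user`, `text`, `created_at`, `user_lang`) are assumed to match a typical TAGS archive sheet, and `KEEP_COLUMNS`, `clean_tweets` and the output file name are hypothetical. The raw sheet is assumed to be already loaded into a pandas DataFrame.

```python
import pandas as pd

# Hypothetical subset of columns to keep; names are assumed to match a TAGS export.
KEEP_COLUMNS = ["id_str", "from_user", "text", "created_at"]


def clean_tweets(raw: pd.DataFrame) -> pd.DataFrame:
    """Keep a few columns, then drop tweets without text and duplicate tweet ids."""
    df = raw[KEEP_COLUMNS].copy()
    df = df.dropna(subset=["text"])           # rows with no tweet text
    df = df.drop_duplicates(subset="id_str")  # the same tweet collected twice
    return df.reset_index(drop=True)


# Tiny made-up sample standing in for the real sheet.
raw = pd.DataFrame({
    "id_str": ["1", "1", "2", "3"],
    "from_user": ["a", "a", "b", "c"],
    "text": ["#karabakh news", "#karabakh news", None, "#artsakh update"],
    "created_at": ["2020-10-06"] * 4,
    "user_lang": ["en"] * 4,
})
cleaned = clean_tweets(raw)

# Writing the result requires an Excel engine such as openpyxl:
# cleaned.to_excel("tweets_clean.xlsx", index=False)
```

Dropping unused columns and duplicate rows before saving is one plausible way to get a smaller file; which columns are worth keeping depends on the analysis.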

Obstacles

  1. Very large Google Sheets data file

    When TAGS generates a Google Sheets data file of around 187,000 records, the file is almost impossible to work with in the browser, so it has to be downloaded to a local machine.

  2. It's impossible to download a large Google Sheets data file without errors

    After the file is downloaded to a local machine as an .xlsx file, Excel detects errors in it and, at least in the macOS version, is unable to fix them.

Solutions

  1. Read the Google Sheets data file in Google Colab, clean it, and save it as an .xlsx file (to save space)
  2. Download the resulting (much smaller) data file to a local machine
  3. Process the data locally with pandas and other libraries and generate a WordCloud image
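
As a rough illustration of the last step, a word cloud is usually built from word frequencies. The sketch below uses only the standard library to turn tweet texts into such frequencies. It is a minimal example under stated assumptions, not the project's code: `STOPWORDS`, `tokenize`, `word_frequencies` and the sample tweets are made up, and the simple `[a-z]+` pattern keeps only Latin-alphabet words, so tweets in other scripts would need a different pattern.

```python
import re
from collections import Counter

# Tiny illustrative stop-word list; a real analysis would use a fuller one.
STOPWORDS = {"the", "and", "for", "with", "rt"}


def tokenize(text):
    """Lower-case a tweet, strip URLs and @mentions, return the remaining words."""
    text = re.sub(r"https?://\S+", " ", text.lower())  # drop links
    text = re.sub(r"@\w+", " ", text)                  # drop user mentions
    words = re.findall(r"[a-z]+", text)                # Latin letters only
    return [w for w in words if len(w) > 2 and w not in STOPWORDS]


def word_frequencies(tweets):
    """Count how often each word occurs across all tweets."""
    counts = Counter()
    for tweet in tweets:
        counts.update(tokenize(tweet))
    return counts


# Made-up sample tweets for demonstration only.
sample = [
    "RT @user: Fighting continues in #Karabakh https://t.co/abc",
    "Ceasefire talks on #Karabakh and #Artsakh",
]
freqs = word_frequencies(sample)
```

A mapping like `freqs` can then be passed to the `wordcloud` package, for example through `WordCloud().generate_from_frequencies(freqs)`, to render the image.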

Data and results

  1. Original Google Sheets data sample: data_sample_original_GoogleSheets.png

  2. Downloaded .xlsx data sample: data_sample_downloaded.png

  3. WordCloud resulting image: wordcloud.png

  4. Final project report: PDF