Do you want to go beyond traditional text preprocessing approaches? This deep-dive tutorial on Tweet preprocessing is for you. In our next event, we will host Assistant Professor Steven Wilson from Oakland University. We will examine how informal social media text poses challenges for traditional preprocessing methods and explore several approaches that complement state-of-the-art computational text analyses. We will show several Python libraries and some useful code to apply to real data from Twitter.
Many traditional text preprocessing pipelines remove special characters, out-of-vocabulary words, emojis, URLs, and social media features such as mentions and hashtags. In some cases, these are replaced with standard tokens like <URL> or <OOV> rather than removed. If your goal is only to examine standard language, this may be enough. But is there a way to encode the information carried by these removed parts of the text? This tutorial will show you alternative approaches to handling these features.
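To make the contrast concrete, here is a minimal sketch using only Python's standard `re` module. It is an illustration, not the code from the tutorial: the regular expressions are deliberately simplified, the function names `replace_with_tokens` and `extract_features` are hypothetical, and the `<USER>` placeholder is an assumption modeled on the `<URL>` token mentioned above.

```python
import re

# Simplified patterns (assumptions for illustration; real tweets need more care,
# e.g. email addresses, trailing punctuation, or non-ASCII hashtags).
URL_RE = re.compile(r"https?://\S+")
MENTION_RE = re.compile(r"@\w+")
HASHTAG_RE = re.compile(r"#(\w+)")


def replace_with_tokens(text):
    """Traditional approach: swap URLs and mentions for placeholder tokens."""
    text = URL_RE.sub("<URL>", text)
    text = MENTION_RE.sub("<USER>", text)  # <USER> is a hypothetical token name
    return text


def extract_features(text):
    """Alternative: keep the social media features as separate signals."""
    return {
        "urls": URL_RE.findall(text),
        "mentions": MENTION_RE.findall(text),
        "hashtags": HASHTAG_RE.findall(text),  # captured without the leading '#'
    }


tweet = "Loving the new #NLP tutorial by @steve https://example.com/talk"
cleaned = replace_with_tokens(tweet)
features = extract_features(tweet)
print(cleaned)
print(features)
```

The first function discards the identity of each URL and mention, while the second keeps them so they can be encoded later alongside the cleaned text.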
Oakland University
Asst. Prof.
Steve studies online communication using Natural Language Processing methods, with a focus on understanding and incorporating the social context of text data and the people who create it.
He completed his Ph.D. at the University of Michigan where he was a member of the LIT Lab. After that, he spent a couple of years working as a postdoc in the SMASH group at the University of Edinburgh.
Northeastern University
Istanbul Twitter Developer Community Lead