Python Lambda Functions & Regex for Data Hygiene

Without a rigorous data hygiene process, building data pipelines can stall on avoidable blockers. My team ran into this when we started on natural language processing (NLP) and text analytics work. Handling data that arrived from many different sources, such as email, chat transcripts, social media posts, and reviews, was a challenge. We found it especially helpful to collaborate on a shared collection of Python lambda functions and regex cleaning snippets that we could reuse across datasets. We also learned to be discerning when applying data hygiene protocols, since an overly broad rule can accidentally remove text you intended to keep.
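To make the idea concrete, here is a minimal sketch of what such a shared snippet collection might look like. The helper names (strip_html, strip_urls, mask_emails, squeeze_spaces, clean) and the patterns themselves are illustrative assumptions, not the exact code my team used, and the regexes are deliberately simple rather than exhaustive.

```python
import re

# Hypothetical cleaning helpers; each is a small lambda wrapping one regex.
# The patterns are simplified for illustration and will not cover every case.
strip_html = lambda text: re.sub(r"<[^>]+>", " ", text)        # drop HTML-like tags
strip_urls = lambda text: re.sub(r"https?://\S+", " ", text)   # drop http(s) URLs
mask_emails = lambda text: re.sub(
    r"\b[\w.+-]+@[\w-]+(?:\.[\w-]+)+\b", "[EMAIL]", text       # mask email addresses
)
squeeze_spaces = lambda text: re.sub(r"\s+", " ", text).strip()  # collapse whitespace


def clean(text, steps=(strip_html, strip_urls, mask_emails, squeeze_spaces)):
    """Apply the chosen cleaning steps in order and return the result."""
    for step in steps:
        text = step(text)
    return text


raw = "<p>Hi  there!</p> Visit https://example.com now or mail bob@example.com"
print(clean(raw))
```

Passing the steps as an argument lets each dataset opt in to only the rules it needs. That matters because a broad pattern can delete text you meant to keep: the tag pattern above, for instance, would also swallow the middle of a plain sentence like "x < 5 and y > 3", so it is probably best reserved for sources that actually contain markup.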

