I had the opportunity to work at TD Bank in Mount Laurel, NJ, as a full-time data analyst in the Enterprise Data & Analytics Department. I worked there for 10 weeks (from June 3rd to August 9th).
In the Hadoop big data environment, I mainly used Apache Hive and Impala through Hue or CML to run ad-hoc queries and explore data in support of daily tasks and business questions, including partitioning data sources by date to reduce computing time. I received training on data wrangling with Trifacta and earned the Level 1 certification. Trifacta let me standardize data easily, without the heavy IT effort traditionally required. Through a project and training, I gained knowledge of Hadoop, Apache tools, and a bank's data ecosystem. I learned how data are ingested into the data lake and through which pipelines; how we store, manage, and use data securely; and how we import data into an analytical space and output findings. We used Confluence, Jira, and Bitbucket to share our findings, practice Agile methodology, and store our code. I also received training on PySpark and ran my first machine learning tasks on the Titanic dataset in a Jupyter notebook.
I initiated a project called Olona that will integrate six data sources to produce a clearer omnichannel view of the customer journey, showing how TD customers interact with the bank across channels such as gateway, web, mobile, the teller system, and US BDW. Because there is no enterprise-level primary key to join all the data sources, I identified the relevant keys and a method to join them. I presented my findings to a Canadian team working on a similar project. The project will ultimately bring together all the information about a single customer from the different channels, enhancing our understanding of customers so that we can eliminate their pain points and improve the user experience. One example would be identifying the regions where people visit a branch more often than they use a digital channel, and finding out why. Maintaining a branch is more expensive than maintaining digital channels, so we would like to move more customers to digital channels and thereby realize substantial cost savings.
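As a rough illustration of the branch-versus-digital comparison described above, the sketch below counts interactions per region and flags the regions where branch visits outnumber digital interactions. The records, region names, channel labels, and the function name `branch_heavy_regions` are all invented for illustration; this is not TD data and not the actual Olona logic.

```python
from collections import defaultdict

# Hypothetical (region, channel) interaction records, invented for illustration.
interactions = [
    ("Region A", "branch"),
    ("Region A", "branch"),
    ("Region A", "mobile"),
    ("Region B", "web"),
    ("Region B", "mobile"),
    ("Region B", "branch"),
    ("Region B", "mobile"),
]

# Assumption: web and mobile are treated as the digital channels.
DIGITAL_CHANNELS = {"web", "mobile"}


def branch_heavy_regions(records):
    """Return the regions where branch visits outnumber digital interactions."""
    counts = defaultdict(lambda: {"branch": 0, "digital": 0})
    for region, channel in records:
        if channel == "branch":
            counts[region]["branch"] += 1
        elif channel in DIGITAL_CHANNELS:
            counts[region]["digital"] += 1
    # Sort so the result is deterministic regardless of input order.
    return sorted(
        region for region, c in counts.items() if c["branch"] > c["digital"]
    )


print(branch_heavy_regions(interactions))
```

In practice, a comparison like this would run on the joined customer data in Hive, Impala, or PySpark rather than on an in-memory list; the plain-Python version only shows the shape of the calculation.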