Mastercard’s Ngoc Minh Tran discusses his role as lead data scientist and the growing importance of ‘model explainability’ in AI and analytics.
Ngoc Minh Tran is a lead data scientist at Mastercard. Minh Tran began his career with a PhD in applied statistics/machine learning and has over 15 years of experience working in data science, with a focus on classical machine learning, deep learning and reinforcement learning.
Minh Tran has an impressive portfolio of achievements, including being the lead inventor on 10 US patents in machine learning domains and the co-inventor of nine more. We asked him what a typical day is like for a lead data scientist.
If there is such a thing, can you describe a typical day in the job?
I always start the day by planning how to work through the list of tasks under my responsibility, each with a target result or outcome. I also like to think about which tech book I am going to read if I manage to have some free time.
As a lead data scientist at Mastercard, I spend up to half of my day in meetings, both with my smaller tech team, to help other data scientists solve the technical issues they are facing, and with the wider Mastercard product teams, to report or catch up on projects and plan future ones.
The rest of the time is mainly for technical work such as architecting, designing, coding and problem solving.
What AI/analytics skills do you use on a daily basis?
Currently, machine learning and big data are the two AI skills that I use the most in daily projects, in addition to SQL, which is needed for analytics.
However, the most important elements that I am developing in my current role are management and communication skills. I try to pull key learnings that I see from experienced managers and apply them to manage my smaller tech team.
What are the hardest parts of working in AI/analytics, and how do you navigate them?
The hardest part is the data itself. Getting the correct data, and clean data, for the right project is important. This work cannot be done alone; it needs support from the whole team.
An area of growing importance in the field of AI and data analytics is that of ‘model explainability’ – this concept relates to how we can explain or interpret, in human terms, how a model is reaching its prediction or decision. This is an important part of Mastercard’s commitment to ethical AI, and means that we need to consider how we can achieve an appropriate level of explainability when we embark on developing a new model.
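One common way to approach explainability can be sketched in a few lines. The example below is a minimal illustration of permutation importance, which is a generic technique and not a description of how Mastercard explains its models: it shuffles one input feature at a time and measures how much the model’s error grows. The function `permutation_importance`, the `toy_model` and the data are all hypothetical and exist only for this sketch.

```python
import random


def permutation_importance(predict, rows, targets, n_features, seed=0):
    """Return, per feature, the rise in mean absolute error after shuffling it."""
    rng = random.Random(seed)

    def mean_abs_error(data):
        return sum(abs(predict(r) - t) for r, t in zip(data, targets)) / len(targets)

    baseline = mean_abs_error(rows)
    importances = []
    for j in range(n_features):
        # Shuffle only column j, leaving every other feature untouched.
        column = [r[j] for r in rows]
        rng.shuffle(column)
        shuffled = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
        importances.append(mean_abs_error(shuffled) - baseline)
    return importances


# Hypothetical toy model: it relies on feature 0 and ignores feature 1.
def toy_model(row):
    return 3.0 * row[0]


rows = [[float(i), float(i % 3)] for i in range(20)]
targets = [toy_model(r) for r in rows]
scores = permutation_importance(toy_model, rows, targets, n_features=2)
print(scores)
```

A feature whose shuffling leaves the error unchanged, such as feature 1 here, is one the model does not depend on. A large increase points to a feature that drives the prediction, which is the kind of statement a person can understand and challenge.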
Do you have any productivity tips that help you through the day?
If you have free time, try to read technical books. Reading is a highly productive use of that time, because you keep learning and improving your technical expertise.
I’m a big believer in trying to automate your work as much as possible. Automating repetitive tasks will free you from any mistakes caused by human error while also saving you time and energy that can be used to do other tasks. In my work, I find that I can often use existing tools such as Jenkins and Docker to automate the CI pipeline for running unit tests, or I’ll develop a simple tool in Python or some other scripting language to do it myself.
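As a rough illustration of the ‘simple tool in Python’ idea, the sketch below runs a small set of unit tests and turns the result into an exit code that a CI job could act on. It is an assumption-laden example and not Minh Tran’s actual tooling: the helper `clean_amount`, the test class and the functions `run_tests` and `exit_code` are hypothetical names invented for this sketch. Only the standard-library `unittest` module is used.

```python
import unittest


def clean_amount(text):
    """Hypothetical helper: parse a string such as ' 1,250.50 ' into a float."""
    return float(text.strip().replace(",", ""))


class CleanAmountTest(unittest.TestCase):
    def test_strips_spaces_and_commas(self):
        self.assertEqual(clean_amount(" 1,250.50 "), 1250.5)

    def test_rejects_non_numeric_text(self):
        with self.assertRaises(ValueError):
            clean_amount("n/a")


def run_tests():
    """Run the test case and return True only if every test passed."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(CleanAmountTest)
    result = unittest.TextTestRunner(verbosity=1).run(suite)
    return result.wasSuccessful()


def exit_code():
    """Map the test outcome to a process exit code (0 means success)."""
    return 0 if run_tests() else 1


code = exit_code()
print("exit code:", code)
```

In a pipeline, the script would typically finish with `sys.exit(exit_code())`, since a non-zero exit code is what lets a tool such as Jenkins mark the build as failed without anyone having to read the test output by hand.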
What skills and tools are you using to communicate daily with your colleagues?
Tools like Jira, ALM and Confluence are very efficient for daily work collaboration. While Jira and ALM help to manage the progress of projects efficiently, Confluence is a place where you can store documents and share knowledge with your colleagues.
These tools are very useful for making our collaboration more efficient, alongside other common communication channels such as email, instant messaging and chats at the coffee table.
How has this role changed as the AI/analytics sector has grown and evolved?
In the past, data scientists were often familiar with data science skills but not engineering skills. However, ‘full stack’ data scientists nowadays should train themselves in engineering skills such as Docker containerisation, Kubernetes, AWS cloud and CI/CD (machine learning/DevOps).
It is also recommended that data scientists practise standard coding styles (such as using Git professionally, following PEP 8 rules, writing unit tests, etc).
What do you enjoy most about working in AI/analytics?
Joining an AI project and knowing that I am creating a product that is useful for our community makes me happy. I used to contribute to an open-source AI project which has thousands of users, so I feel like I am helping people to make the world a better place.
I also push myself to continuously innovate and I work hard to secure more and more patents for the company for the newest technology.
What advice would you give to someone who wants to work in AI/analytics?
The most important thing for success in AI work is to practise as much as possible on multiple types of data. For example, joining Kaggle competitions is a good starting point for junior data scientists, as they offer multiple types of free data to practise with. Furthermore, success in these competitions gives people more confidence and advances their career. In addition, practical tutorials for deep learning frameworks such as TensorFlow and PyTorch are a helpful way to practise AI skills.
Don’t forget to always be innovative – writing patents will help refresh your mind and keep you motivated.
In addition, I recommend that people working in AI should read. I like to read good technical books and often read them twice. This habit helps extend your knowledge efficiently, which helps your job and your career in the long term. Pattern Recognition and Machine Learning by Christopher Bishop is an enjoyable book that helps you to master machine learning, and I must have read it at least three times. Deep Learning by Ian Goodfellow, Yoshua Bengio and Aaron Courville is also a book that you cannot miss when working in AI.