Data Scientist

Job description

About us

Who are we? We are Big Data experts working with international clients, creating and leading innovative projects in the Big Data environment. We offer tailor-made solutions, whether that means building a Data Lake, running training in data management, or performing detailed Big Data analysis. We don't focus on just one technology; instead, we specialize in a whole range of open-source and public-cloud tools, which lets us select the Big Data solution that best fits each client. Our team brings together over 130 specialists in their fields. We have participated in dozens of conferences, written countless lines of code, and we organize Big Data Tech Summit Warsaw, the largest Polish conference on Big Data topics. We run webinars, share knowledge on blogs, create whitepapers, and more. Why? Because we believe that Big Data is an indispensable part of the future of business.


Team

The GetInData Advanced Analytics team consists of analytics professionals, including:

  • Data Scientists

  • Machine Learning Engineers

  • Business Intelligence Developers

  • MLOps enthusiasts

While the main goal of GetInData and the team is creating value for clients using their data, we also do internal R&D work to grow our skills, create our own solutions, and practically verify other tools and approaches.

Example projects

  • An ads-personalization engine developed for a mobile app with over 250 million daily active users and 1 billion daily ad impressions. The model predicts which ad category (e.g. healthcare, cars) a given user is most interested in and gives such ads a higher chance of being displayed
  • In another project, we are developing a multilingual Neural Machine Translation model that translates 30 languages into English. The goal is to understand the context in which digital advertising inventory is sold. The technology stack includes AWS, dedicated neural-net architectures, and MLOps solutions for scaling and standardising the process
  • For a leading podcast-hosting app, we are implementing personalised ad-targeting models based on natural language understanding. We use AWS and IBM cloud services, as well as external data providers
  • A user-suspension model developed using Kedro and Vertex AI. The model predicts whether a user is fraudulent and should be denied access to the app. It uses multiple external data sources, such as social media data and user credibility derived from a phone number
  • For a large Asian telecom, we developed time-series forecasting models that are used to predict hourly traffic volume for each of the thousands of base transceiver stations (BTS) in its network

Apart from commercial projects, we also take part in internal initiatives, such as:
  • Kaggle competitions, where we develop solutions using Google Cloud Platform (Vertex AI, BigQuery, Looker) and our MLOps stack (Kedro, MLflow)
  • Creating internal methodologies, such as a Data Science project methodology and support for developing data-driven cultures
  • Knowledge-sharing sessions (Advanced Analytics Guild, Analytics Coffee, etc.)


Responsibilities

  • The starting assumption is involvement in R&D initiatives utilising a modern MLOps stack, with the main goal of delivering value on customer projects
  • Nurture your independence: take calculated risks, bring your own ideas instead of just taking direction, and take ownership of your growth - we promise to help :)
  • Be curious: look for opportunities to have fun with the problem you are solving - read, try new approaches, learn from failure
  • Be honest and open: say what you think is worth doing, be critical of your own biases and open to others' ideas, ask for help when necessary, and give help when possible

Technologies used:

  • Core development: Python, Spark, Jupyter Notebook, Git
  • Model productization stack: Kedro, MLflow, Airflow, Kubeflow
  • Cloud stack: GCP, BigQuery, Google Cloud Vertex AI
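
To give a flavour of how these tools typically fit together, below is a minimal, illustrative sketch - not our production code. It shows a single Kedro node that trains a scikit-learn model and logs the run to MLflow; the dataset names ("features", "labels", "model") are hypothetical placeholders for entries in a Kedro data catalog.

# Illustrative sketch only: a toy Kedro pipeline that trains a
# scikit-learn model and tracks the run with MLflow.
import mlflow
import mlflow.sklearn
import pandas as pd
from kedro.pipeline import node, pipeline
from sklearn.linear_model import LogisticRegression

def train_model(features: pd.DataFrame, labels: pd.Series) -> LogisticRegression:
    """Fit a simple classifier and record the run in MLflow."""
    model = LogisticRegression(max_iter=1000)
    with mlflow.start_run():
        model.fit(features, labels)
        # Log a training metric and the fitted model artifact.
        mlflow.log_metric("train_accuracy", model.score(features, labels))
        mlflow.sklearn.log_model(model, "model")
    return model

# Wire the training step into a Kedro pipeline; inputs/outputs refer to
# hypothetical dataset names defined in the project's data catalog.
training_pipeline = pipeline(
    [node(train_model, inputs=["features", "labels"], outputs="model")]
)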

Requirements

  • Strong analytical and statistical skills
  • Understanding of machine learning models and concepts, e.g. logistic regression, clustering, decision trees, random forests, boosting and regularisation
  • Experience with machine learning libraries, e.g. scikit-learn, Pandas, Spark ML
  • SQL programming skills
  • Data visualization skills

Nice to have:

  • Experiment design and model productization
  • Experience with Big Data technologies
  • Experience with advanced ML: Deep Learning, NLP
  • Familiarity with software development concepts and best practices
  • Experience with Cloud Platforms: GCP, AWS, Azure

We offer
  • Salary: 90-130 PLN net + VAT per hour on a B2B contract (depending on knowledge and experience)

  • 100% remote work

  • Flexible working hours

  • Possibility to work from our office in the heart of Warsaw

  • Opportunity to learn and develop with the best Big Data specialists in Poland

  • International projects

  • Opportunity to conduct workshops and training

  • Clear career path and certifications

  • Co-financed sports card

  • Co-financed health care

  • All equipment needed for work