Please note: the service contract for this position will not be concluded with Boehringer Ingelheim International GmbH but with GULP Consulting Services GmbH.

Boehringer Ingelheim

Senior Data Scientist (m/f/n)

Posted Jun 27, 2022
Project ID: BIJP00000816
Location
Ingelheim am Rhein
Duration
8 months
(Jun 15, 2022 - Feb 28, 2023)
Hours/week
25 hrs/week
Payrate range
Unknown
Application Deadline: Jun 15, 2022 12:00 AM

We are embarking on two separate Data Science topics in our project (Deviations Recurrence Check and Deviations Trending). Both seek to employ Natural Language Processing (NLP) and clustering algorithms to group similar records together in our global Quality Management System (QMS).
The contractor will receive the entire data set from the QMS for the fulfillment of the services.

The Deviations Recurrence Check seeks to replace the existing manual process whereby users determine whether or not a problem is recurrent. For example, for each new deviation, the Deviations workflow user must determine if this deviation is related to existing deviations in order to ascertain whether or not a problem is ongoing. From the QMS perspective, when the user has the basic, required fields for a new deviation filled out, they would click a button that fires a web service using the results of the clustering algorithm to return a list of similar Deviations for user review purposes.
The goal of this Deviations Recurrence Check project is to give our users a function that associates a current/new deviation with existing deviations in order to determine if a problem continues to occur. For example, if I have a deviation focused on broken tablets, we want the function to find all records related to the problem (or trend) "broken tablets" in order to determine if this is a new problem or a continuing problem. By extension, we could use this information to determine if a CAPA was effective.
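
To make the idea of the recurrence lookup concrete, here is a minimal sketch in Python. It is not the project's method: it swaps the clustering step for a plain TF-IDF cosine-similarity ranking, uses only the standard library, and handles English-like text only. The function names, the record IDs, the sample descriptions, and the `top_k`/`min_score` parameters are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep runs of a-z letters (a deliberately naive tokenizer)."""
    return re.findall(r"[a-z]+", text.lower())


def tfidf_vectors(docs):
    """Return one sparse TF-IDF vector (dict: term -> weight) per document."""
    tokenized = [tokenize(d) for d in docs]
    n = len(tokenized)
    # Document frequency: in how many documents each term appears.
    df = Counter(term for toks in tokenized for term in set(toks))
    idf = {term: math.log((1 + n) / (1 + c)) + 1.0 for term, c in df.items()}
    return [{t: f * idf[t] for t, f in Counter(toks).items()} for toks in tokenized]


def cosine(a, b):
    """Cosine similarity between two sparse vectors."""
    dot = sum(w * b.get(t, 0.0) for t, w in a.items())
    norm_a = math.sqrt(sum(w * w for w in a.values()))
    norm_b = math.sqrt(sum(w * w for w in b.values()))
    if norm_a == 0.0 or norm_b == 0.0:
        return 0.0
    return dot / (norm_a * norm_b)


def similar_deviations(new_text, existing, top_k=3, min_score=0.1):
    """Rank existing deviations (dict: id -> description) by similarity to new_text."""
    ids = list(existing)
    vectors = tfidf_vectors([existing[i] for i in ids] + [new_text])
    query = vectors[-1]
    scored = [(i, cosine(query, v)) for i, v in zip(ids, vectors[:-1])]
    scored = [(i, s) for i, s in scored if s >= min_score]
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_k]


# Hypothetical sample records, invented for this sketch.
existing = {
    "DEV-001": "Broken tablets found in blister pack after packaging line stop",
    "DEV-002": "Temperature excursion in cold storage room",
    "DEV-003": "Tablets broken during compression, punch damaged",
    "DEV-004": "Label misprint on secondary packaging",
}
result = similar_deviations("Several broken tablets detected at compression", existing)
for dev_id, score in result:
    print(dev_id, round(score, 2))
```

In a real setup, the ranking would presumably be restricted to the cluster the new deviation falls into, and the tokenizer would need to cope with the multilingual long-text fields.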

The contractor needs to employ one or more clustering algorithms

  • by checking different algorithm methods based on the contractor's experience and best practices,
  • by considering that the QMS contains content in multiple languages (for example in long-text fields) that requires semantic processing,
  • by proposing drafts, which need to be presented to the project team for review of their usability,
  • by adapting the draft based on the feedback delivered by the project team,
  • by handing over the final version for final approval by the project team.
     

The Deviations Trending project seeks to find problems in the sea of data that is the QMS. We envision a model, retrained nightly or weekly, that analyzes the complete set of quality data (Deviations, Complaints, Audits, Investigations, OOX, Supplier Qualifications, Events, Non-Conformities) in order to identify problem trends (e.g., broken tablets caused by a specific model of machine). We want to obtain primary cluster groups that we can drill down into secondary and tertiary cluster groups. To that end, a clustering algorithm is run on the dataset to divide the records into groups of problems/trends. Our plan is to apply HDBSCAN at up to three levels in order to get very granular trends. The contractor will receive an example of the clustering levels. The overarching goal is to automatically analyze our QMS data and find abnormalities that cannot be easily identified with the simple search functionality in the QMS.
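
The drill-down from primary to secondary to tertiary cluster groups could be sketched as follows. This is only an illustration of the nesting, not the planned implementation: to stay dependency-free it replaces HDBSCAN with a much simpler stand-in (single-linkage grouping on Jaccard token overlap, re-run inside each group with a stricter threshold). The record IDs, texts, threshold values, and function names are all assumptions.

```python
import re


def tokens(text):
    """Set of lowercase a-z word tokens in a text."""
    return set(re.findall(r"[a-z]+", text.lower()))


def jaccard(a, b):
    """Jaccard overlap of two token sets."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def threshold_clusters(ids, token_sets, threshold):
    """Single-linkage grouping: records end up together if a chain of
    pairwise overlaps >= threshold connects them."""
    remaining = list(ids)
    clusters = []
    while remaining:
        seed = remaining.pop(0)
        cluster, frontier = [seed], [seed]
        while frontier:
            current = frontier.pop()
            linked = [i for i in remaining
                      if jaccard(token_sets[current], token_sets[i]) >= threshold]
            for i in linked:
                remaining.remove(i)
            cluster.extend(linked)
            frontier.extend(linked)
        clusters.append(cluster)
    return clusters


def drill_down(ids, token_sets, thresholds):
    """Build a nested tree of clusters, one level per threshold."""
    if not thresholds:
        return []
    tree = []
    for cluster in threshold_clusters(ids, token_sets, thresholds[0]):
        # Only groups with more than one record can be split further.
        children = (drill_down(cluster, token_sets, thresholds[1:])
                    if len(cluster) > 1 else [])
        tree.append({"records": cluster, "children": children})
    return tree


# Hypothetical records, invented for this sketch.
records = {
    "R1": "broken tablets press alpha",
    "R2": "broken tablets press alpha hopper",
    "R3": "broken tablets coater beta",
    "R4": "label misprint packaging",
    "R5": "label misprint carton",
}
token_sets = {rid: tokens(text) for rid, text in records.items()}
# Three levels: loose, medium, strict (values chosen arbitrarily).
tree = drill_down(list(records), token_sets, thresholds=(0.3, 0.6, 0.9))
for node in tree:
    print(node["records"], [child["records"] for child in node["children"]])
```

With a density-based algorithm such as HDBSCAN, each level would additionally be able to label records as noise, which this stand-in does not model.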
 

The contractor needs to employ one or more clustering algorithms

  • by checking algorithm methods based on the contractor's experience and best practices,
  • by clustering the records in scope of the algorithm using previously defined keywords over a period of time (e.g., we found 50 "broken tablet" records in the last 12 months) and counting the clustered records per period (e.g., 5 records in January, 7 in February, 8 in March, 10 in April, etc.),
  • by proposing drafts, which need to be presented to the project team for review of their usability,
  • by adapting the draft based on the feedback delivered by the project team,
  • by handing over the final version for final approval by the project team.
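
The keyword-based counting over time described in the list above might look roughly like this in Python. It is a minimal sketch under stated assumptions: records are taken to be simple (creation date, text) pairs, matching is a case-insensitive substring test, and the function name and sample data are invented for illustration.

```python
from collections import Counter
from datetime import date


def monthly_keyword_counts(records, keywords, start, end):
    """Count records per (year, month) whose text contains every keyword
    and whose creation date lies within [start, end]."""
    wanted = [k.lower() for k in keywords]
    counts = Counter()
    for created, text in records:
        lowered = text.lower()
        if start <= created <= end and all(k in lowered for k in wanted):
            counts[(created.year, created.month)] += 1
    # Sort by (year, month) so the trend reads chronologically.
    return dict(sorted(counts.items()))


# Hypothetical records, invented for this sketch.
records = [
    (date(2022, 1, 5), "Broken tablet found in blister"),
    (date(2022, 1, 20), "broken tablet at compression"),
    (date(2022, 2, 3), "Broken tablet in bottle"),
    (date(2022, 2, 10), "Label misprint"),
    (date(2021, 12, 30), "Broken tablet before the reporting window"),
]
counts = monthly_keyword_counts(
    records, ["broken", "tablet"], date(2022, 1, 1), date(2022, 12, 31)
)
for (year, month), n in counts.items():
    print(f"{year}-{month:02d}: {n}")
```

Plain substring matching is the simplest possible choice here; multilingual content would need keyword lists per language or a semantic matching step instead.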
