Biobase is a digital product that gives scientists easy access to the relevant data during the drug development process. It harmonizes data from different data sources and different product stages (development, production, etc.) and links them into a combined data set. This data is, or will be, made available together with the corresponding analyses and graphs.
The product is developed along a roadmap with different use cases that reflect the corresponding needs in the relevant business departments.
The overall goal is to make all existing data available. This reduces the effort of collecting data and creating analyses, and it enables analyses that were not possible in the past, generating additional knowledge in the development and production of biological drugs.
The product is being developed using Scrum, with a dedicated Scrum Master and a Product Owner. We work in three-week Sprints in which the team members work in a self-organized and collaborative way, with the following ceremonies:
1. Daily standup, 15 to 30 minutes every weekday, starting at 09:30: All team members provide an update on their progress, ask for collaboration and share information. After the standup, short discussions on special topics take place in the Daily Parking Lot. After that, the contractor works iteratively on his tasks.
2. The Review takes place twice during the Sprint: once after two weeks and once at the end of the Sprint. Team members can attend according to their needs. After that, the contractor works iteratively on his tasks.
3. Sprint Retrospective, every three weeks, 1.5 hours: All team members discuss how the previous Sprint went and what challenges occurred, and define actions for the next Sprint. The contractor also provides his input to the discussion. After that, the contractor works iteratively on his tasks.
4. Story Refinement, every three weeks: The Stories are discussed in the team, described in more detail and assigned effort estimates. The Stories are then put in priority order for the planning. After that, the contractor works iteratively on his tasks.
5. Sprint Planning, every three weeks: The Product Owner and the Development Team define the priority of each refined Story and may describe sub-tasks. After that, the contractor works iteratively on his tasks.
The main tasks delivered during the assignment, in line with state-of-the-art data quality measures, are:
- Iterative development of a data processing pipeline in the Apache Hadoop ecosystem, based on user stories, using Apache Spark and GraphX. The main programming languages are Scala and Python.
- Definition, implementation and execution of tests for the desired functionality
- Data exploration using SQL, Jupyter, Apache Spark
- Participating in code reviews: Reviewing code from other team members
- Participating in code reviews: Presenting technical details of the implemented solution
- Automation of the data pipeline with Airflow
- Monitoring and troubleshooting of the data pipeline