This article discusses the advantages of the Agile Scrum method in software development and data science projects, highlighting frequent deliveries, regular customer feedback, and adaptation to change. It describes how Scrum, with its iterative approach, promotes continuous exploration, puts the customer at the center, offers visibility and stimulates innovation.

Agile “methods” are popular in software development projects. Scrum is the most widely used Agile framework, adopted in almost 60% of cases, and its benefits are numerous:

  • Frequent and regular deliveries of usable functionalities.
  • A faster, iterative market launch and therefore regular customer feedback.
  • As a result, a product that meets the needs of the customer.

And the list is far from exhaustive!

As a reminder, a Scrum project is broken down into iterations, called Sprints, lasting 4 weeks or less. The Scrum team comprises 3 roles that are involved throughout the project and work together:

  • The Product Owner is the voice of the customer/user and translates the customer need, through User Stories, in such a way that it is understood by the development team.
  • The development team is composed of 3 to 9 people whose combined skills make it possible to transform User Stories into results that can potentially be delivered to the customer.
  • The Scrum Master masters the Scrum framework and ensures that the Scrum team understands, knows, and applies it. The Scrum Master is in turn a trainer, a coach and much more!

The stages of a Data Science project

A Data Science project is not managed in the same way as a software project: it is very exploratory in nature, and its steps are very specific:

Step 1: Definition of the Business Problem

As in any project, the aim is to answer the following questions: What are the customer's needs? What problems do they want to solve? What are the priorities? Can these problems be solved using data? The challenge of this phase, regardless of the method, is to ensure that the business clearly expresses its need and that it is understood by the Data teams.

Step 2: Data collection

This stage is both critical and decisive for the rest of the project. Some projects can end there if we notice that the data we have does not allow us to respond to the problem, or even that the data we need does not exist. There are also questions of data accessibility and GDPR compliance: the fact that data exists does not always mean that it will be usable.

Step 3: Data cleaning

This is the most time-consuming part of the project! It requires a lot of back and forth between the Data team and the business in order to understand the data, which is why the Agile method is well suited here: a poor or partial understanding would result in a biased analysis. In the era of Big Data, we also have to deal with an ever larger volume of data. Data is often incomplete, not standardized (identical data in different formats, for example: France, FR), obsolete, or duplicated. It is therefore necessary to be attentive and rigorous if we want to obtain “clean” and usable data.
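As an illustration, the four defects listed above (incomplete, non-standardized, obsolete and duplicated data) can be handled in a single pass. The sketch below is only one possible approach: the field names (`customer_id`, `country`, `updated`), the alias table and the one-year freshness threshold are assumptions made for the example, not taken from a real project.

```python
from datetime import date

# Hypothetical alias table: every known spelling maps to one canonical code.
COUNTRY_ALIASES = {"france": "FR", "fr": "FR"}
# Hypothetical list of fields a record must contain to be usable.
REQUIRED_FIELDS = ("customer_id", "country", "updated")


def clean_records(records, today, max_age_days=365):
    """Split records into (clean, rejected); each rejected item carries a reason."""
    clean, rejected, seen = [], [], set()
    for rec in records:
        # Incomplete: a required field is missing or empty.
        if any(rec.get(field) in (None, "") for field in REQUIRED_FIELDS):
            rejected.append((rec, "incomplete"))
            continue
        # Not standardized: map "France", "FR", "fr" to the same code.
        country = COUNTRY_ALIASES.get(rec["country"].strip().lower())
        if country is None:
            rejected.append((rec, "unknown country"))
            continue
        # Obsolete: last update is older than the accepted age.
        if (today - rec["updated"]).days > max_age_days:
            rejected.append((rec, "obsolete"))
            continue
        # Duplicate: same customer and country already kept.
        key = (rec["customer_id"], country)
        if key in seen:
            rejected.append((rec, "duplicate"))
            continue
        seen.add(key)
        clean.append({**rec, "country": country})
    return clean, rejected


sample = [
    {"customer_id": 1, "country": "France", "updated": date(2024, 5, 1)},
    {"customer_id": 1, "country": "FR", "updated": date(2024, 5, 2)},
    {"customer_id": 2, "country": "", "updated": date(2024, 5, 1)},
    {"customer_id": 3, "country": "fr", "updated": date(2020, 1, 1)},
]
clean, rejected = clean_records(sample, today=date(2024, 6, 1))
for rec, reason in rejected:
    print(rec["customer_id"], reason)
```

Keeping the rejected records together with a reason, rather than silently dropping them, gives the Data team concrete cases to discuss with the business during the back and forth described above.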

Step 4: Exploring

In possession of a harmonized and correctly formatted data set, the Data Scientist will be able to start the analysis. Again, it is difficult to predict what the result will be, or even if the result will be satisfactory. But at this point we get a first glimpse.

Step 5: Creating the Data Science Solution

The creation of the solution can begin if the exploration has yielded sufficiently conclusive results. It is essential that the Data Scientist always has the initial business problem in mind in order to create the algorithm that will best respond to it and maximize the return on investment.

Step 6: Deployment

The deployment, or production, phase is when customers take possession of the tool that has been developed and start using it.

What is the point of conducting a Data Science project using the Scrum methodology?

The limits of “classical” methodologies

In a project managed with a “waterfall” or “V-cycle” approach, no stage starts until the previous one is completed. Almost all of the project's time or budget may then be consumed before the final stages, forcing the team either to rush them or to increase the initial budget.

Before proceeding with deployment, i.e. delivery to users, the project team carries out technical tests and a sample of users performs functional tests. If the results are satisfactory, the product can be deployed to all users.

After deployment, in projects managed with a waterfall approach, we very often observe that:

  • Users are not trained to use the product.
  • Only a small number of the functionalities that were developed are actually used, while other functionalities that users expected are missing.

Why such a result? In a waterfall project, the needs are collected only once. This stage must be exhaustive and can last anywhere from a few weeks to several months. The customer is heavily involved at that time, then again a few months later for the functional tests... once the product has been developed in its entirety! There is almost no room for manoeuvre if the product is not satisfactory!

The benefits of the Agile approach

To overcome this, the “founders” of Agile methods wanted to put the customer back at the center of the project in order to ensure satisfaction. The 4 values of the Agile Manifesto reflect this:

  • Individuals and interactions over processes and tools.
  • Working software over comprehensive documentation.
  • Customer collaboration over contract negotiation.
  • Responding to change over following a plan.

In addition, the Scrum approach makes it possible to “parallelize” several of the 6 steps seen above during a single Sprint. The needs collection phase and the deployment phase are therefore no longer separated by months. At most 4 weeks pass between the Sprint Planning, during which customer needs are addressed, and the Sprint Review, during which what has been produced is inspected by these same customers.
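To make the difference in feedback frequency concrete, here is a minimal sketch that lists the weeks at which customers inspect the product. It rests on two simplifying assumptions that are ours, not the Scrum framework's: in a waterfall project the only inspection is the functional test at the very end, and in Scrum every Sprint has the same length and ends with a Sprint Review. The function name `feedback_points` is hypothetical.

```python
MAX_SPRINT_WEEKS = 4  # upper bound on Sprint length used in this article


def feedback_points(project_weeks, sprint_weeks=None):
    """Return the week numbers at which customers inspect the product.

    sprint_weeks=None models a waterfall project (assumption: a single
    inspection at the end); otherwise one Sprint Review closes each Sprint.
    """
    if sprint_weeks is None:
        return [project_weeks]
    if not 1 <= sprint_weeks <= MAX_SPRINT_WEEKS:
        raise ValueError("a Sprint lasts between 1 and 4 weeks")
    return list(range(sprint_weeks, project_weeks + 1, sprint_weeks))


print(feedback_points(24))      # waterfall: [24]
print(feedback_points(24, 4))   # Scrum, 4-week Sprints: [4, 8, 12, 16, 20, 24]
```

Over a 24-week project, the customer gives feedback once in the first case and six times in the second, and shorter Sprints increase that number further.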

It is the entire development team that exchanges with the Product Owner during the Sprint Planning, and it is the entire development team that receives feedback from customers/users during the Sprint Reviews! And this happens at every iteration!

With Scrum, investigations start as soon as we have identified a need, even in outline. The Product Owner starts collecting the first information about the customer need and expresses it to the Data Scientists through “User Stories” and daily exchanges. At the same time, the Data Scientists begin to “prepare the ground”: identifying the tools where the data is stored, requesting access to the various tools and databases, and collecting and exploring the data in order to start identifying how they will use it to achieve the expected result.

Likewise, if one of the requests is impossible to satisfy, we will notice it very quickly. The project can then be reoriented at the start of the next sprint. The role of the Product Owner is to collect and prioritize user requests while the development team works to meet the initial needs. Scrum's iterative approach lets the team go beyond merely collecting customer needs from the very start, in order to discover quickly and regularly whether the project is on track.

If the data (or a sample) is available and usable as is, the exploration phase can start from the first Sprint! With the Scrum approach, we can get the first results from the first Sprints. These are not final, but they make it possible either to validate the chosen direction and continue, or to invalidate it and change strategy.

Of course, depending on the complexity of the model to be produced, it may be delivered at the end of one or more sprints. However, the idea is to quickly provide a first result that can be used by the customer; this first result is called an MVP (Minimum Viable Product). The customer can then reap the first benefits and provide feedback that allows the team to improve/complete the model during the following sprints.

Each iteration is therefore an opportunity for the customer to refine, to specify their needs, to prioritize them according to what brings them the most added value. For the Scrum team, each iteration is an opportunity to improve their understanding of customer needs in order to provide them with the best possible solution.
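Prioritizing by added value can be sketched as a simple selection over the backlog. Scrum itself does not prescribe any algorithm for this; the greedy rule below, the `value` and `effort` scales, and the story titles are all hypothetical and only illustrate the idea of picking the most valuable stories that fit in a Sprint.

```python
from dataclasses import dataclass


@dataclass
class UserStory:
    title: str
    value: int   # business value estimated by the Product Owner (hypothetical scale)
    effort: int  # development team estimate in story points (hypothetical scale)


def plan_sprint(backlog, capacity):
    """Pick stories by decreasing value, skipping those that no longer fit."""
    selected, remaining = [], capacity
    for story in sorted(backlog, key=lambda s: s.value, reverse=True):
        if story.effort <= remaining:
            selected.append(story)
            remaining -= story.effort
    return selected


backlog = [
    UserStory("Churn dashboard", value=8, effort=5),
    UserStory("Access CRM data", value=13, effort=3),
    UserStory("Deep learning model", value=5, effort=8),
]
for story in plan_sprint(backlog, capacity=8):
    print(story.title)
```

With a capacity of 8 points, the two highest-value stories fill the Sprint and the third waits for a later iteration, at which point the customer may well have re-estimated its value.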

If, at the end of a sprint, what has been produced is not satisfactory, the Sprint review makes it possible to detect it... after 4 weeks maximum! It is then entirely possible to reorient the next sprint, based on new feedback collected.

Conclusion

In a Scrum project, if the results are not satisfactory, we notice it after 4 weeks at most and can reorient the project as early as the next sprint. If the sprint result is satisfactory, users can benefit from it. The project is not over for all that: requests for improvements and new functionalities keep arriving over time, while the additional functionalities under development go through the phases described earlier.

As you will have understood, each of the 6 phases seen above will be addressed and repeated in each of the sprints in order to produce a partial but potentially usable result! But it is likely that the first sprints will not be able to give results until the data is available.

From a customer/user perspective, Scrum allows them to regularly check that their needs have been understood, through demonstrations and regular deliveries. These regular deliveries allow them to benefit from the results of the project from the first iterations, as soon as the first functionalities are put into service. The project as a whole does not cost them less, but for an equivalent budget, the product meets their expectations far better. Users also benefit from getting hands-on with the product regularly.

From the point of view of the Data Scientist, the advantage is the freedom to explore and to test several approaches and tools. Each project is therefore an opportunity to discover a new tool and to acquire new knowledge and skills, and so to propose ever more innovative solutions. In addition, working in Sprints allows for better visibility and better framing of the actions to be carried out.

Finally, it would be utopian to think that the Scrum framework offers a magic formula for the success of any project! The project can only succeed if the entire Scrum framework is respected and if the principles and values of Agility are understood and embodied by the actors in the project. The Scrum Master is there to support the team in this direction, each Sprint being an opportunity to improve!

So let's be Agile!
