Artificial Intelligence and Big Data to help pharmaceutical innovation

Some people like them, others don't. For some, they represent a dark future in which Man is enslaved by the machine (until Keanu Reeves saves us). For others, they hold the promise of a more serene future. Either way, Artificial Intelligence and Big Data are here, and they are already starting to transform our lives... and especially our medicines.

 

Connected watches, smartphones, laptops: they collect data constantly. And this data is used for purposes that are more or less praiseworthy, from the geolocation that lets you find your lost phone to the targeted ads that push you to buy goods based on your consumption habits. In health care, too, huge amounts of data are collected, but for very different purposes. During a blood donation (the reserves are empty, by the way; a good resolution to take in 2022?), if the donor reports a fever within 24 hours, the bags can be redirected to research. In another setting, clinical trial protocols build collections called biobanks ("biothèques") from the samples of tens, hundreds or thousands of patients. These samples are used to run analyses that are sometimes extremely broad and go far beyond the scope of a single discipline, generating databases that may hold thousands of pieces of information on thousands of anonymized samples. How do we deal with such quantities of data? How can they be useful for pharmaceutical innovation? A journey to the heart of drug development, version 4.0...

 

 

The "omics": genome, transcriptome, proteome, microbiome...

In the human body, there are cells. We are, quite literally, a lot of cells. Some of them are "ours": the cells that result from the fusion of the spermatozoon and the ovum. These cells contain in their nucleus our DNA, our genome (the whole set of our genes), half of which comes from the father and the other half from the mother (with one small exception; see the references for the most curious). Here, the suffix "-ome" refers to a whole. The 22,000 or so protein-coding genes are transcribed into messenger RNAs, and we call that whole the transcript-ome. The messenger RNAs are then translated into proteins, and the set of proteins constitutes the proteome, and so on. But in a human body, human cells are in the minority! That's right: our organism contains more non-human cells than human cells. This is our bacterial flora, our microbiome, and we are only beginning to understand how it interacts with our cells and affects our health. It holds on the order of 500 different species for a given person... And depending on the families of bacteria represented in an individual's microbiome, it is sometimes possible to associate a pathology with it or to predict how well that person will respond to a given treatment.

 

Big Data is like onions...

Besides the fact that it can make you cry (mainly students), the common point between the aromatic bulb and databases is the layers. A database is made of layers of multi-dimensional information that you cannot see through with the naked eye. Take one layer of an onion and put it in front of a light source. You can look at the streaks that run through it, locate them, count them; by superimposing it on the next layer, you may even be able to compare the two. Now put the whole bulb in front of the lamp, and you will see that the eye-and-lamp combo is hopeless for exploring the full thickness of the layers. The term analysis, literally "cutting into pieces", takes on its full meaning in this culinary metaphor. You could, in principle, take each layer, examine its pattern separately, compare the patterns two by two, and finally piece together the complete structure of the bulb; but by the time you finished, your guests would long since have gone out to eat. A very long time since. So Big Data alone is not very useful. It lacks a tool capable of analyzing faster than the human eye: one that makes simple comparisons at great speed without losing focus, making mistakes or having to stop and sleep.
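
To make the metaphor a little more concrete, here is a minimal sketch (not from the article) of what those "layers" look like once they become a data table: rows of samples, thousands of columns of measurements, and far too many comparisons for any pair of eyes. The numbers are random placeholders.

```python
# Each row is one sample (one patient, one onion layer), each column one
# measurement (e.g. the activity of one gene). The values are random
# placeholders; only the scale of the problem matters here.
import numpy as np

rng = np.random.default_rng(1)
data = rng.normal(size=(100, 20000))  # 100 samples x 20,000 measurements

# Comparing every pair of measurement columns by eye is hopeless;
# counting those pairs shows why a faster tool is needed.
n_columns = data.shape[1]
print(n_columns * (n_columns - 1) // 2, "pairwise comparisons")  # ~200 million
```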

 

Algorithms, artificial intelligence, the difference is learning

Algorithms are stupid and nasty. For those of you who know the game-book series "a book in which you are the hero" (choose-your-own-adventure books), each of those books is an algorithm:

1. Take an onion.

2. If it has skin on it, go to 3. If not, go to 6.

3. Check whether the skin is clean. If it is, go to 5; if not, go to 4.

4. Rinse the onion, then go to 5.

5. Remove the skin, then go to 6.

6. Chop the onion. If you like it cooked, go to 7; if not, go to 8.

7. Cook the onion, then go to 8.

8. Wash your hands.
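
Written as code, the same recipe is nothing more than a finite, unambiguous sequence of instructions applied to input data. This is only an illustrative sketch: the Onion class, its attributes and the prepare function are invented for the example.

```python
# A toy translation of the onion game-book above. Every branch is a fixed,
# unambiguous instruction; the result depends only on the input data.
class Onion:
    def __init__(self, has_skin=True, skin_is_clean=False):
        self.has_skin = has_skin
        self.skin_is_clean = skin_is_clean

def prepare(onion, likes_cooked_onion=True):
    if onion.has_skin:                  # step 2
        if not onion.skin_is_clean:     # step 3
            print("Rinse the onion")    # step 4
        print("Remove the skin")        # step 5
    print("Chop the onion")             # step 6
    if likes_cooked_onion:
        print("Cook the onion")         # step 7
    print("Wash your hands")            # step 8

prepare(Onion(has_skin=True, skin_is_clean=False))
```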

According to the CNIL, the Commission Nationale de l'Informatique et des Libertés (the French data protection authority), an algorithm is "a finite and unambiguous sequence of instructions for arriving at a result from input data".

So, what is the difference with Artificial Intelligence? According to the mathematician Cédric Villani, "there is no possible definition". Ah... And the CNIL? They say the same thing. If we had to oversimplify, we could say that the difference lies in learning: rules that observe and take many parameters into account, that keep the previous onion cuts in memory to anticipate how best to wash and cut the next one, or that reorder the algorithm's parameters according to external input; in other words, rules that take context into account, memorize and learn. Except that each of these steps is itself a sum of simple, finite and unambiguous instructions that lead to a result from input data. In short, algorithms...
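
As a minimal sketch (again invented for this article, with made-up numbers), here is what that "memorize and learn" step can look like in code: the rule itself never changes, but one of its parameters is updated from the data already seen.

```python
# The prediction rule is fixed; only its memory grows with each onion.
history = []  # rinse times (in seconds) observed on previous onions, hypothetical data

def predict_rinse_time():
    """Guess how long the next rinse will take, based on what was memorized."""
    if not history:
        return 30.0  # arbitrary starting guess
    return sum(history) / len(history)

def rinse(observed_seconds):
    """Rinse the onion, then memorize the result for future predictions."""
    history.append(observed_seconds)

rinse(25.0)
rinse(35.0)
print(predict_rinse_time())  # 30.0: the guess now reflects the onions already seen
```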

Headache? Eat an onion. It won't help; it's just an excuse to take a break before we get to the heart of the matter: how AI and Big Data can help develop new, safer drugs, or make better use of the ones we already have.

 

AI and Big Data analysis

32 million. That is, give or take, the number of articles indexed in PubMed, the reference database of scientific articles.

The query "onion" on PubMed returns 8,025 results. Reading 15 papers a day, it would take a year and a half to get through them all. Then you just have to remember the first article once you have reached the last one!


The development of a drug, whether it is a new molecule or an old one to be repurposed for a new indication (to treat a disease other than the one it was developed for), starts with a "literature review". This means reading the publications that deal with the subject in order to summarize and contextualize them. For example, if I want to do a literature review on onions, I type "onion" into PubMed and I have 8,025 articles to read. I am a researcher, I read English well, I process about 15 articles a day; at this rate, it takes me a year and a half to cover this part alone. The other solution is to hire 500 researchers and have them read about 16 papers each: the analysis then takes only one day. But before doing that, you have to tell them what to look for and what information to compile, so you have to give them instructions for arriving at a result from input data. An algorithm! And since reading text can be learned (we learn to read, then we learn science), by nesting algorithms within algorithms it is possible to automate the process. And a program that does this is called an Artificial Intelligence.
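
As a toy sketch of that "500 readers" idea, the snippet below gives a program the same instructions we would give the readers and lets it scan every abstract. The abstracts and keywords are hypothetical placeholders, not a real PubMed query.

```python
# The back-of-the-envelope arithmetic from the text:
print(8025 / 15, "days of reading for a single researcher")  # ~535 days, about 1.5 years

# Hypothetical abstracts standing in for the 8,025 real ones.
abstracts = [
    "Onion extract reduces inflammation in a mouse model.",
    "Quercetin from onion modulates the gut microbiome.",
    "Unrelated paper about potato storage conditions.",
]

# The "instructions for the readers": keep any abstract mentioning at least one keyword.
keywords = {"inflammation", "microbiome", "quercetin"}
relevant = [a for a in abstracts
            if keywords & set(a.lower().replace(".", "").split())]
print(len(relevant), "abstracts flagged for a closer look")
```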

 

Applications that go far beyond reading text

An algorithm or an artificial intelligence is not limited to looking for information in text. Interpreting an image, counting, calculating, decrypting... all of these operations are in principle accessible to humans, but they demand time and resources that make the task difficult. Describing the interaction interface between two molecules atom by atom, predicting the affinity of a drug for its target, searching genomic databases for the risk factors of a disease: all of these tasks have become feasible within a relatively short time, on the sole condition of having a computer powerful enough to run artificial intelligence programs. This makes it possible to better anticipate the efficacy profile or the risk of adverse effects of a drug candidate without recruiting a single patient, and using only one mouse...
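
Purely as an illustration of the kind of prediction mentioned above, here is a minimal sketch that trains a standard classifier on synthetic data: made-up 0/1 features (say, the presence or absence of bacterial families in a microbiome) used to predict a made-up "responds to treatment" label. Nothing here is real data; it only shows the shape of the approach.

```python
# Synthetic example: learn a prediction rule from examples, then check it on
# samples the model has never seen. All data is randomly generated.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(0)
X = rng.integers(0, 2, size=(200, 20))      # 200 samples x 20 bacterial families (0/1)
y = (X[:, 3] | X[:, 7]).astype(int)         # toy "responder" rule hidden in the data

model = LogisticRegression().fit(X[:150], y[:150])   # learn from the first 150 samples
print("accuracy on unseen samples:", model.score(X[150:], y[150:]))
```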




References:

  1. Marcos-Zambrano, L. J., Karaduzovic-Hadziabdic, K., Loncar Turukalo, T., et al. Applications of machine learning in human microbiome studies: a review on feature selection, biomarker identification, disease prediction and treatment. Frontiers in Microbiology, 2021, vol. 12, p. 313.
  2. https://www.pourlascience.fr/sd/genetique/de-l-adn-mitochondrial-herite-du-pere-15806.php
  3. https://www.futura-sciences.com/sante/questions-reponses/corps-humain-y-t-il-cellules-corps-humain-7485/
  4. https://www.lepoint.fr/high-tech-internet/algorithmes-et-intelligence-artificielle-des-definitions-15-12-2017-2180361_47.php
