TractoCarlo: Tractography using reinforcement learning powered by Monte Carlo exploration

Extension of TrackToLearn (open source). Source code on GitHub.

For more detailed information about this project, you can read the following report on the subject (only in French):
Final report

Context

The context given in the following sections is greatly simplified and is only meant to briefly introduce the project. There are many nuances, which I encourage you to read about in the articles and courses referenced below.

Introduction

Mapping the human connectome, the complete set of the brain’s neural connections, is an important challenge in neuroscience. Tractography, which relies on diffusion magnetic resonance imaging (dMRI), is the only non-invasive technique that makes it possible to computationally model the structural connectivity of the human brain and of a few other fibrous structures. The technique is gaining popularity and is gradually being adopted as a tool in clinical interventions: beyond supporting brain research, it is also used to plan and conduct epilepsy and oncological surgery. Axons in the brain (white matter fibers) are particularly interesting to visualize in three dimensions, since doing so reveals several important networks that carry information between gray matter regions. Information does not propagate randomly in the brain; it follows well-defined paths within white matter networks.

An MRI image captures three- or four-dimensional anatomical information about a patient at a given resolution, which is determined by the instruments and machines used for the acquisition. The basic element of such an image is the voxel, the three-dimensional equivalent of a pixel. Instead of a single value (intensity) per voxel, as in a 3D image, a voxel can hold several values, which results in a 4D image. In diffusion MRI, for example, the fourth dimension holds several intensities for each voxel: as in a conventional MRI, each one reflects the movement of water inside the patient’s brain, but each is acquired under a different magnetic field gradient. However, the axons that tractography seeks to model are tiny structures, only 1 to 15 \(\mu m\) in diameter¹, whereas the voxels produced by common MRI acquisitions measure a few millimeters (e.g. 2 mm x 2 mm x 2 mm or 1 mm x 1 mm x 1 mm). It is therefore particularly difficult to model the internal connectivity of a brain perfectly when we do not even have accurate information about each individual anatomical fiber.

Diffusion MRI is an imaging technique that leverages the fact that the brain is mostly made of water. By varying the magnetic field gradients during acquisition, it quantifies the displacement of water inside the brain. Through diffusion, water molecules move randomly and continually in all directions. When they are constrained by a fiber, however, their random motion is restricted by the fiber’s structure, so their overall displacement ends up following the fiber itself. Quantifying the displacement of water therefore lets us reconstruct white matter fibers by following the principal direction of displacement in each voxel.
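To make the 4D layout and the scale mismatch described above concrete, here is a small Python sketch. It stores a diffusion volume as nested lists indexed by voxel and gradient, and computes how many axon diameters fit along one voxel edge. The function names and the 64-gradient example are hypothetical; real pipelines would typically load such data from a NIfTI file into an array library instead.

```python
# Illustrative sketch: the volume size and gradient count are hypothetical.


def make_dwi_volume(nx, ny, nz, n_gradients, fill=0.0):
    """Build a 4D diffusion volume as nested lists indexed [x][y][z][g].

    Each voxel (x, y, z) holds n_gradients intensities, one per
    diffusion-encoding gradient applied during the acquisition.
    """
    return [[[[fill] * n_gradients for _ in range(nz)]
             for _ in range(ny)]
            for _ in range(nx)]


def axon_diameters_per_voxel_edge(voxel_mm, axon_um):
    """Number of axon diameters that fit side by side along one voxel edge."""
    return voxel_mm * 1000.0 / axon_um


volume = make_dwi_volume(4, 4, 4, n_gradients=64)
print(len(volume[0][0][0]))  # 64 intensities in a single voxel
for voxel_mm in (1.0, 2.0):
    for axon_um in (1.0, 15.0):
        print(voxel_mm, axon_um, round(axon_diameters_per_voxel_edge(voxel_mm, axon_um), 1))
```

With the figures quoted above, a 2 mm voxel edge spans roughly 133 (15 \(\mu m\) axons) to 2000 (1 \(\mu m\) axons) axon diameters, which is why the signal of a single voxel mixes the contributions of many fibers.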

The diffusion of water in each voxel can be modeled or approximated by several types of models (diffusion tensors, fiber orientation distribution functions, etc.)². These models estimate the displacement of water and, at the same time, the orientation of the anatomical fibers in each voxel. With such estimates, even very simple deterministic tractography methods can reconstruct a number of streamlines (reconstructed fibers) by following the main direction of water displacement from voxel to voxel. However, we must make sure that the streamlines are valid according to our anatomical knowledge. For example, the length of each streamline must fall between a minimum and a maximum length already known from biology. Likewise, the angle between two consecutive steps of a streamline must stay below a predefined maximum (e.g. 45 degrees), and a streamline cannot be traced outside the region defined as white matter, since that is precisely what we are modeling. Many other techniques and criteria try to remove as many invalid streamlines as possible, yet most methods still produce too many anatomically invalid streamlines.
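The deterministic procedure and the stopping criteria described above can be sketched as follows. This is a minimal illustration, not TrackToLearn code: `direction_at` and `in_white_matter` are hypothetical callables standing in for a fitted diffusion model (tensor or fODF peaks) and a white matter mask, and the step size and length bounds are placeholder values.

```python
import math

# Minimal sketch of deterministic tracking; direction_at and in_white_matter
# are hypothetical stand-ins for a diffusion model and a white matter mask.


def angle_deg(u, v):
    """Angle in degrees between two unit vectors."""
    dot = max(-1.0, min(1.0, sum(a * b for a, b in zip(u, v))))
    return math.degrees(math.acos(dot))


def track(seed, direction_at, in_white_matter, step_mm=0.5, max_angle=45.0, max_steps=1000):
    """Follow the main diffusion direction until a stopping criterion fires."""
    points = [tuple(seed)]
    prev = None
    for _ in range(max_steps):
        d = direction_at(points[-1])  # unit vector, or None if no direction
        if d is None:
            break
        # A fiber direction has no sign: flip it to agree with the previous step.
        if prev is not None and sum(a * b for a, b in zip(d, prev)) < 0:
            d = tuple(-c for c in d)
        if prev is not None and angle_deg(prev, d) > max_angle:
            break  # curvature criterion
        nxt = tuple(p + step_mm * c for p, c in zip(points[-1], d))
        if not in_white_matter(nxt):
            break  # never trace outside the white matter
        points.append(nxt)
        prev = d
    return points


def length_mm(points):
    """Total length of a streamline given as a list of 3D points."""
    return sum(math.dist(a, b) for a, b in zip(points, points[1:]))


def has_valid_length(points, min_mm=20.0, max_mm=200.0):
    """Length criterion; the default bounds are placeholders, not anatomical values."""
    return min_mm <= length_mm(points) <= max_mm


# Usage: a straight fiber along x inside a 10 mm long white matter mask.
line = track((0.0, 0.0, 0.0), lambda p: (1.0, 0.0, 0.0), lambda p: 0.0 <= p[0] <= 10.0)
print(len(line), length_mm(line), has_valid_length(line))
```

The length criterion is applied once tracking has finished, whereas the angle and mask criteria stop the streamline during tracking.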

Since artificial neural networks are experiencing a resurgence across computer science, it was proposed to use machine learning to improve the accuracy of classic tractography methods. These methods seem to be slowly reaching a ceiling in terms of false positives, yet there is still room for improvement, and that is where machine learning could help. The first obstacle, however, is that machine learning typically requires a large amount of labeled data to perform well. To label the regions of the brain and identify bundles of streamlines, an actual neuroscientist has to examine each brain in a dataset and carefully label each group of streamlines one by one. There is currently no straightforward alternative to this manual process, which takes a great amount of time. Moreover, if several experts label the same brain, their results will probably differ in most cases, so inter-expert variability is an additional limitation for machine learning methods. Finally, even with labeled data, the labels are limited to the experts’ knowledge, which may be incomplete since some regions of the brain are still poorly understood. Machine learning techniques generally learn to generalize from a dataset and do not really extrapolate to new, valid results that were never seen before [1], which would be a desirable property for obtaining more accurate maps of the brain.

That is where reinforcement learning (RL) comes into play, with the hope of bypassing the need for labeled data to improve tractography. Instead of relying on labeled data, RL methods train an agent and only require us to define the behavior we want the agent to adopt when interacting with its environment.

TrackToLearn

Given a simple definition of the behavior to exhibit, the agent explores the environment by itself, without human supervision. This behavior is defined through a reward function that outputs a high reward when the agent finds itself in a desired state and lower or negative rewards in undesired states. RL methods are notably used for games such as Atari [2], Dota 2 [3] and StarCraft II [4], and more recently to fine-tune LLMs such as ChatGPT [5]. Antoine Théberge formulated tractography as a reinforcement learning problem by developing the TrackToLearn framework in Python [6]. This framework made it possible to assess several reinforcement learning algorithms that have demonstrated good performance in complex environments. Whereas a deterministic method builds a tractogram by following the main direction of water diffusion in each voxel, in this environment the agent is a logical entity that learns by itself to explore the brain’s connectivity. More formally, the state \(s_t\) of the agent is defined by the streamline traced so far, its current spatial position (the tip of the streamline) and the direction of water diffusion in each voxel. At each step of the tracking process, the agent chooses an action \(a_t\), a step in a given direction, so as to maximize the rewards it receives for reaching states that exhibit the predefined desired behavior. In the end, every RL algorithm tested produced competitive results, with Soft Actor-Critic (SAC) [7] performing best.
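The state and action formulation above can be illustrated with a toy environment. This sketch is not the TrackToLearn API: the class name, the state dictionary and the reward (absolute cosine between the step and the local fiber direction) are assumptions chosen to keep the example self-contained.

```python
import math

# Toy tracking MDP for illustration only; NOT the TrackToLearn API.
# The state layout and the alignment reward are simplifying assumptions.


class ToyTrackingEnv:
    def __init__(self, direction_at, in_white_matter, step_mm=0.5, max_steps=200):
        self.direction_at = direction_at        # local fiber direction (unit vector)
        self.in_white_matter = in_white_matter  # mask test for a 3D point
        self.step_mm = step_mm
        self.max_steps = max_steps

    def reset(self, seed):
        self.streamline = [tuple(seed)]
        self.t = 0
        return self._state()

    def _state(self):
        # s_t: the streamline so far, its tip, and the diffusion direction at the tip.
        tip = self.streamline[-1]
        return {"streamline": list(self.streamline), "tip": tip,
                "local_direction": self.direction_at(tip)}

    def step(self, action):
        """a_t is a 3D step direction; returns (state, reward, done)."""
        norm = math.sqrt(sum(c * c for c in action))
        a = tuple(c / norm for c in action)
        tip = self.streamline[-1]
        nxt = tuple(p + self.step_mm * c for p, c in zip(tip, a))
        self.t += 1
        if not self.in_white_matter(nxt):
            return self._state(), 0.0, True  # leaving the mask ends the episode
        # Assumed reward: |cos| between the step and the local fiber direction.
        reward = abs(sum(x * y for x, y in zip(a, self.direction_at(tip))))
        self.streamline.append(nxt)
        return self._state(), reward, self.t >= self.max_steps


# Usage: a hand-written policy that simply copies the local direction.
env = ToyTrackingEnv(lambda p: (1.0, 0.0, 0.0), lambda p: 0.0 <= p[0] <= 10.0)
state, total, done = env.reset((0.0, 0.0, 0.0)), 0.0, False
while not done:
    state, reward, done = env.step(state["local_direction"])
    total += reward
print(total, len(state["streamline"]))
```

Here the policy is hard-coded, so it collects a reward of 1 per step until it exits the mask; an RL agent such as SAC would instead have to learn that behavior from the reward signal alone.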

Théberge quickly realized that the quality and validity of the tractograms produced with such methods depend directly on the reward function given to the agent during its interaction with the environment. This research suggested that the evaluation of streamline validity was probably too simplistic with regard to the stopping criteria. It was therefore proposed to base the reward function on algorithms that classify and validate axonal tracts (e.g. FINTA [8], Extractor [9]). This new function, named TractOracle [10], is a neural network that approximates the quality of each streamline by assigning it a score. Its goal is to stop and reject the tracking of streamlines that head in a wrong direction with respect to known anatomy, which would reduce the number of false positives produced by current methods. TractOracle plays a crucial role in the performance of the algorithm presented in the Methodology section, which is at the core of this project.
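The way an oracle can be used during tracking is sketched below as a scoring function applied in two places: to stop a partial streamline early and to reject finished ones. `toy_oracle` is a hypothetical stand-in (a simple straightness ratio), not TractOracle, which is a trained neural network; the 0.5 threshold is a placeholder.

```python
import math

# `toy_oracle` is a hypothetical stand-in for TractOracle (a trained network):
# here "plausible" simply means "not folding back on itself".


def toy_oracle(streamline):
    """Score in [0, 1]: end-to-end distance divided by path length."""
    if len(streamline) < 2:
        return 0.0
    path = sum(math.dist(a, b) for a, b in zip(streamline, streamline[1:]))
    return math.dist(streamline[0], streamline[-1]) / path if path > 0 else 0.0


def should_stop(partial, oracle, threshold=0.5, min_points=3):
    """Stop tracking once the oracle judges the partial streamline implausible."""
    return len(partial) >= min_points and oracle(partial) < threshold


def filter_streamlines(streamlines, oracle, threshold=0.5):
    """Split finished streamlines into (kept, rejected) by oracle score."""
    kept, rejected = [], []
    for s in streamlines:
        (kept if oracle(s) >= threshold else rejected).append(s)
    return kept, rejected


straight = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (2.0, 0.0, 0.0)]
u_turn = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 0.0, 0.0)]
kept, rejected = filter_streamlines([straight, u_turn], toy_oracle)
print(len(kept), len(rejected), should_stop(u_turn, toy_oracle))
```

Stopping early saves the cost of tracking a streamline that would be rejected anyway, which is the motivation given above for an oracle-based reward.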

Particle Filtering Tractography (PFT)

Particle filtering tractography (PFT) [11] is a technique that reduces the number of streamlines that stop prematurely outside of the white matter. Since the objective of tracing white matter fibers is to connect gray matter regions, a streamline that stops upon reaching the gray matter is not considered to have stopped prematurely: it has reached its objective. PFT is triggered just before a step that would leave the white matter mask, in order to force the failing streamline to continue within the white matter. We denote by \(\delta_b\) the backtracking distance and by \(\delta_f\) the forward propagation distance, both in mm. The objective is to estimate a new streamline segment starting \(\delta_b\) mm before the premature stopping event and to propagate it \(\delta_f\) mm further, to make sure the premature stopping event is resolved. If the streamline reaches gray matter, it is stopped but kept as a valid streamline. Otherwise, if it continues within the white matter, its tracking resumes normally under the tractography algorithm until one of the stopping criteria stops it again. PFT uses a set of N particles, each associated with a weight that varies according to its validity. From the position \(\delta_b\) mm before the original stopping event, the algorithm rolls out several probabilistic streamlines, so that several possible trajectories are evaluated, and penalizes those that end in or pass through the region of interest.
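The backtrack and propagate loop described above can be sketched as a small particle filter. This is a simplified illustration under stated assumptions, not the exact scheme of [11]: \(\delta_b\) and \(\delta_f\) are counted in steps (`n_back`, `n_forward`), each particle’s weight is multiplied by `1 - exclude_prob(point)` at every step, resampling uses the common effective-sample-size rule, and the early stop on reaching gray matter is omitted. `sample_direction` and `exclude_prob` are hypothetical callables.

```python
import random

# Simplified PFT-style particle filter; all callables and the weighting
# scheme are illustrative assumptions.


def pft_sketch(streamline, sample_direction, exclude_prob, n_back=3, n_forward=5,
               n_particles=20, step_mm=0.5, rng=None):
    rng = rng or random.Random(0)
    # Backtrack delta_b (here n_back steps) before the premature stop, keeping the seed.
    base = streamline[:max(1, len(streamline) - n_back)]
    particles = [list(base) for _ in range(n_particles)]
    weights = [1.0 / n_particles] * n_particles
    for _ in range(n_forward):  # propagate delta_f (here n_forward steps)
        for i, particle in enumerate(particles):
            if weights[i] == 0.0:
                continue  # dead particle: never propagated or selected again
            d = sample_direction(particle, rng)  # probabilistic step direction
            nxt = tuple(p + step_mm * c for p, c in zip(particle[-1], d))
            particle.append(nxt)
            weights[i] *= 1.0 - exclude_prob(nxt)  # penalize excluded regions
        total = sum(weights)
        if total == 0.0:
            return None  # every particle failed: the streamline stays stopped
        weights = [w / total for w in weights]
        # Resample when the effective sample size drops below half the particles.
        if 1.0 / sum(w * w for w in weights) < n_particles / 2:
            picks = rng.choices(range(n_particles), weights=weights, k=n_particles)
            particles = [list(particles[j]) for j in picks]
            weights = [1.0 / n_particles] * n_particles
    # Draw the final trajectory from the weighted particles.
    j = rng.choices(range(n_particles), weights=weights, k=1)[0]
    return particles[j]
```

The final trajectory is drawn at random according to the particle weights, which is the distribution-based choice that the project’s goals propose to replace.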

Goals of the project

In line with the research presented above, my objective in this project is to explore a method that increases the ratio of valid streamlines (and thus decreases the ratio of invalid streamlines) during tracking. The idea is to combine particle filtering tractography (PFT) with the oracle-based method that evaluates the quality of each streamline. The PFT algorithm presented above explores several paths in parallel, and the resulting streamline is drawn from the distribution of generated paths, each weighted by its validity. In our case, instead of drawing the resulting streamline from a distribution of paths, we want an approach that uses the oracle to select the best generated path.
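The selection step proposed above can be sketched in a few lines: backtrack, generate several rollouts, and keep the one with the highest oracle score. The function and parameter names (`best_rollout`, `rollout_fn`, `n_rollouts`, `n_back`) are hypothetical, and the toy oracle in the usage example only rewards progress along x; in the project, the score would come from the oracle.

```python
import itertools

# Hypothetical sketch: keep the rollout the oracle scores highest instead of
# sampling from a weighted distribution of paths.


def best_rollout(streamline, rollout_fn, oracle, n_rollouts=5, n_back=3):
    # Backtrack, keeping at least the seed point.
    base = streamline[:max(1, len(streamline) - n_back)]
    candidates = []
    for _ in range(n_rollouts):
        candidate = rollout_fn(list(base))  # full streamline, or None if it failed
        if candidate is not None:
            candidates.append(candidate)
    if not candidates:
        return None  # no rollout survived: keep the original stopping decision
    return max(candidates, key=oracle)


# Usage with toy callables: rollouts cycle through a detour, a straight path and a failure.
seed_path = [(float(i), 0.0, 0.0) for i in range(5)]
continuations = itertools.cycle([[(2.0, 1.0, 0.0)], [(2.0, 0.0, 0.0), (3.0, 0.0, 0.0)], None])


def toy_rollout(base):
    tail = next(continuations)
    return None if tail is None else base + tail


best = best_rollout(seed_path, toy_rollout, oracle=lambda s: s[-1][0])
print(best)
```

Replacing weighted sampling with an argmax makes the choice deterministic given the rollouts, so the quality of the result rests entirely on how well the oracle ranks candidate paths.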

Within the scope of this project, a complete implementation and conclusive results are not expected; we are rather looking for a proof of concept to evaluate the benefit such an approach could bring in a reinforcement learning setting. This proof of concept should make it possible to visualize and render the results so as to clearly show the behavior of the algorithm.

Methodology

Parametrization

Results

NR	BS	FS	Streamlines	VS	VS_ratio	Improvement (pp)
0	0	0	4067	2797	0.6877	-
2	3	5	3949	2958	0.7491	6.1320 %
5	5	10	3987	2945	0.7387	5.0920 %
2	3	10	3985	2935	0.7365	4.8781 %
5	3	5	3973	2915	0.7337	4.5972 %
2	5	10	3948	2891	0.7323	4.4539 %
10	1	5	3974	2885	0.7260	3.8238 %
2	5	5	3926	2840	0.7234	3.5652 %
5	5	5	3929	2840	0.7228	3.5060 %

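Table 1: Best improvements on the FiberCup after using the rollout_env. VS is the number of valid streamlines and VS_ratio is the ratio of valid streamlines to the total number of streamlines. Improvement is the difference in VS_ratio with respect to the first row (all parameters set to 0, the baseline), in percentage points.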
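Because the Improvement column is a difference of ratios rather than a relative gain, the short script below recomputes VS_ratio and the improvement in percentage points directly from the raw counts of Table 1, which makes the table easy to check or extend with new runs. The function name is illustrative.

```python
# Raw counts copied from Table 1; the first row is the baseline.
ROWS = [  # (NR, BS, FS, Streamlines, VS)
    (0, 0, 0, 4067, 2797),
    (2, 3, 5, 3949, 2958),
    (5, 5, 10, 3987, 2945),
    (2, 3, 10, 3985, 2935),
    (5, 3, 5, 3973, 2915),
    (2, 5, 10, 3948, 2891),
    (10, 1, 5, 3974, 2885),
    (2, 5, 5, 3926, 2840),
    (5, 5, 5, 3929, 2840),
]


def improvements(rows):
    """Return (NR, BS, FS, VS_ratio, improvement in pp) relative to the first row."""
    _, _, _, base_total, base_valid = rows[0]
    baseline = base_valid / base_total
    out = []
    for nr, bs, fs, total, valid in rows[1:]:
        ratio = valid / total
        out.append((nr, bs, fs, ratio, 100.0 * (ratio - baseline)))
    return out


for nr, bs, fs, ratio, gain in improvements(ROWS):
    print(f"{nr}\t{bs}\t{fs}\t{ratio:.4f}\t{gain:+.4f} pp")
```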

Conclusion

Even though the results above come from a proof-of-concept implementation, they are promising. They were obtained on the FiberCup, a relatively simple structure compared to the complete connectivity of the brain, whose white matter requires many more streamlines to be reconstructed correctly. If the trend holds in the brain tracking environment, the ratio of valid streamlines could increase significantly. Since the approach is only a simple tweak of the existing tracking algorithm, the same rollout algorithm could probably be reused with minimal effort almost anywhere tracking is performed. However, its effectiveness in other settings should be explored to check whether it could sometimes produce worse results.

Also, a few hundred streamlines stop prematurely right after the first tracking step, which wastes time and memory since they are all discarded afterwards anyway. For now, they are not considered by the backtracking algorithm, to avoid backtracking so far that the initial seed would be erased. It would be interesting to investigate or rework the algorithm to handle these failing streamlines, which could further increase the overall number of valid streamlines.

Finally, it would be interesting to explore whether reinforcement learning for tractography could be improved with model-based methods, which would directly approximate the transition distribution of the brain’s diffusion environment. Since our algorithms depend on the models used to represent the diffusion of water in the brain, could we benefit from the flexibility and capacity of artificial neural networks to approximate it more accurately?
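The current rule for streamlines that fail right after the first step, and one possible rework, can be sketched as follows. Both functions are hypothetical illustrations: `split_for_backtracking` skips any streamline that backtracking would reduce below its seed, while `clamped_backtrack` is only one option for the rework suggested in the conclusion, backtracking at most down to the seed instead of skipping.

```python
# Hypothetical sketch of the backtracking eligibility rule and a clamped variant.


def split_for_backtracking(failing, n_back):
    """Skip streamlines that would lose their seed if n_back points were removed."""
    eligible, skipped = [], []
    for s in failing:
        (eligible if len(s) > n_back else skipped).append(s)
    return eligible, skipped


def clamped_backtrack(streamline, n_back):
    """Alternative: backtrack at most down to the seed point instead of skipping."""
    return streamline[:max(1, len(streamline) - n_back)]


short = [(0.0, 0.0, 0.0), (0.5, 0.0, 0.0)]  # failed right after the first step
long_s = [(0.5 * i, 0.0, 0.0) for i in range(6)]
eligible, skipped = split_for_backtracking([short, long_s], n_back=3)
print(len(eligible), len(skipped), clamped_backtrack(short, 3))
```

With clamping, the short streamlines would restart their rollouts from the seed itself, which keeps them in play without ever erasing the seed.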

References

[1] Poulin, P., Jörgens, D., Jodoin, P. M., & Descoteaux, M. (2019). Tractography and machine learning: Current state and open challenges. Magnetic Resonance Imaging, 64, 37-48.

[2] Schulman, J., Wolski, F., Dhariwal, P., Radford, A., & Klimov, O. (2017). Proximal policy optimization algorithms. arXiv preprint arXiv:1707.06347.

[3] Berner, C., Brockman, G., Chan, B., Cheung, V., Dębiak, P., Dennison, C., … & Zhang, S. (2019). Dota 2 with large scale deep reinforcement learning. arXiv preprint arXiv:1912.06680.

[4] Vinyals, O., Babuschkin, I., Czarnecki, W. M., Mathieu, M., Dudzik, A., Chung, J., … & Silver, D. (2019). Grandmaster level in StarCraft II using multi-agent reinforcement learning. Nature, 575(7782), 350-354.

[5] Ouyang, L., Wu, J., Jiang, X., Almeida, D., Wainwright, C., Mishkin, P., … & Lowe, R. (2022). Training language models to follow instructions with human feedback. Advances in Neural Information Processing Systems, 35, 27730-27744.

[6] Théberge, A., Desrosiers, C., Descoteaux, M., & Jodoin, P. M. (2021). Track-to-learn: A general framework for tractography with deep reinforcement learning. Medical Image Analysis, 72, 102093.

[7] Haarnoja, T., Zhou, A., Abbeel, P., & Levine, S. (2018, July). Soft actor-critic: Off-policy maximum entropy deep reinforcement learning with a stochastic actor. In International Conference on Machine Learning (pp. 1861-1870). PMLR.

[8] Legarreta, J. H., Petit, L., Rheault, F., Theaud, G., Lemaire, C., Descoteaux, M., & Jodoin, P. M. (2021). Filtering in tractography using autoencoders (FINTA). Medical Image Analysis, 72, 102126.

[9] Petit, L., Ali, K. M., Rheault, F., Boré, A., Cremona, S., Corsini, F., … & Sarubbo, S. (2023). The structural connectivity of the human angular gyrus as revealed by microdissection and diffusion tractography. Brain Structure and Function, 228(1), 103-120.

[10] Théberge, A., Descoteaux, M., & Jodoin, P. M. (2023). TractOracle: anatomically-plausible reward function for reinforcement learning-based tractography. Work in progress.

[11] Girard, G., Whittingstall, K., Deriche, R., & Descoteaux, M. (2014). Towards quantitative connectivity analysis: reducing tractography biases. Neuroimage, 98, 266-278.

Footnotes

¹ https://fr.wikipedia.org/wiki/Axone

² Take a look at Maxime Descoteaux’s lecture: https://drive.google.com/file/d/1xrn1NYsFfVCnP0xvlnfxAasbwCpy0krx/view