(a)–(f) Snapshots of the positions of 10 000 gyrotactic particles (red) and 10 000 smart particles (blue) at time for parameters in the regimes (A)–(F) in Fig. 1 in this Letter. Particle positions are plotted modulo . Dynamics for smart particles are obtained using the best policy found in 20 training sessions of 5000 episodes each, with , , and with an optimistic initial condition: all elements of the table were initialized to . The flow has characteristic flow velocity and vorticity (the vorticity field is shown in gray scale). The simulations use translational noise and rotational noise , time step , and running time of an episode to ensure much more than state changes in episodes. (a) , , (b) , , (c) , , (d) , , (e) , , and (f) , . Right: a set of representative trajectories at different stages in the learning process for smart gravitactic particles (blue) compared with typical trajectories for naive gyrotactic particles (red) confined in a trapping dynamics (case C above).
Dependence of the learning gain [see Eq. (3) in this Letter] vs episode number for ten different learning processes (gray curves). Point , region C in Fig. 1. The values of are smoothed using local averages with window sizes depending on . The insets highlight which preferred direction the smart particle takes in each of the 12 states according to three final, approximately optimal strategies, where the chosen direction gives the largest value . Learning is done with a learning rate decaying from and an ε-greedy policy with an exploration rate decaying from .
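The learning setup named in these captions (a table over 12 states with optimistic initialization, an ε-greedy policy, and decaying learning and exploration rates) can be illustrated with a minimal sketch. It assumes the table is a standard Q-table updated by one-step Q-learning. The number of actions, the initial table value, the discount factor, and the decay schedule are hypothetical placeholders, since the captions do not give them here; the function names are illustrative as well.

```python
import random

N_STATES = 12    # from the caption: 12 states
N_ACTIONS = 4    # assumption: number of preferred directions, not stated in the caption
Q_INIT = 100.0   # assumption: placeholder for the optimistic initial value


def make_q_table(n_states=N_STATES, n_actions=N_ACTIONS, q_init=Q_INIT):
    """Optimistic initialization: every table entry starts at the same large value."""
    return [[q_init] * n_actions for _ in range(n_states)]


def decayed(initial, episode, half_life):
    """Hypothetical decay schedule: the rate halves every `half_life` episodes."""
    return initial * 0.5 ** (episode / half_life)


def choose_action(q_table, state, epsilon, rng):
    """epsilon-greedy: explore with probability epsilon, otherwise act greedily."""
    if rng.random() < epsilon:
        return rng.randrange(len(q_table[state]))
    row = q_table[state]
    return max(range(len(row)), key=row.__getitem__)


def q_update(q_table, state, action, reward, next_state, alpha, gamma):
    """One-step Q-learning update (assumed update rule)."""
    target = reward + gamma * max(q_table[next_state])
    q_table[state][action] += alpha * (target - q_table[state][action])


def greedy_policy(q_table):
    """For each state, the action with the largest table value (as in the insets)."""
    return [max(range(len(row)), key=row.__getitem__) for row in q_table]
```

With optimistic initialization, every untried action looks better than any action already tried, so even the greedy choice explores the table early in training; `greedy_policy` then reads off the preferred direction per state once the values have settled.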
Left: learning gain of the best policy [see Eq. (3) in this Letter] found in 20 training sessions for different regions in the parameter space. Values of below the resolution threshold are replaced by in the upper left corner (in this regime, we did not find strategies that are better than the naive one, meaning that is zero within numerical precision). Right: example of the optimal actions for a smart gravitactic particle that succeeded in escaping the confinement, for parameters in region C in Fig. 1. The data are based on the same setup as that leading to Fig. 1.