This page contains supplemental materials for the paper

A. Censi, R. Murray, "A group-theoretic approach to formalizing bootstrapping problems" (PDF)

**Data**: The original data is from the Rawseeds project.

**Video format**: The videos are in MP4 format with H264 encoding. They were first encoded as .avi/mpeg using mencoder, then converted to .mp4/h264 with ffmpeg. They should play on any recent/decent player, so let us know if they don't work for you. Free players that are known to work include MPlayer and VLC.

Click "play" to play the video in the browser using a Flash widget. Or right-click "download" for the direct link to the .mp4 file.

**License**: You are welcome to use these videos under the terms of the Creative Commons Attribution-ShareAlike 3.0 Unported License.

- What the world looks like for a bootstrapping agent
- Range-finder data
- Range-finder data - statistics
- Range-finder data - embedding and population code
- Range-finder data - BGDS tensors learning
- Camera data
- Camera data - mean
- Camera data - variance
- Camera data - gray-scale - BGDS tensors learning
- Camera data - contrast - BGDS tensors learning
- Camera data - grayscale - observations and predictions
- Camera data - grayscale - anomaly detection signal

This animation shows the initial state of knowledge for a bootstrapping agent.
At the beginning, we only have an uninterpreted stream of observations and commands,
and the bootstrapping agent must make sense of this initial confusion. The only
semantics assumed is that the commands somehow have a causal effect on the observations.
Can you tell which sensor this is?

At left, you see the observations, called `y` in the paper (white: low, black: high, whatever "high" and "low" mean for the unknown sensors); in the middle, the derivative `dy/dt` of the observations (red: positive, white: zero, blue: negative); at right, the uninterpreted commands `u`.
In this case, the commands correspond to linear and angular velocities.

Can you tell which changes in the observations are due to the agent's actions (motion), and which to other things moving in the environment? This is the **anomaly detection** task considered in the paper.
This is a *passive* task that can be performed on logged data. By contrast, the servoing task
considered in our previous work is *active*: it is more representative of the learned model, but cannot be evaluated on static data.

This video displays the laser data obtained by the two SICK range-finders, mounted at approximately 0° and 180° with respect to the robot's front.

On the right, the raw readings are displayed by a simple plot. The first 181 readings are from the front laser, and the rest from the back laser.

On the left, the laser readings are plotted in polar form, superimposed on the data from the omnidirectional camera. (Note that the alignment is only approximate.)

In both cases the maximum distance is capped at 20m (for visualization purposes).

This video displays some second-order statistics of the laser data, namely the sample covariance of the readings (left), of the readings derivative (center), and of the sign of the readings derivative (right).

Eventually, when averaged over long trajectories in various environments, all three statistics become functions of the distance between the sensels. However, their convergence rates vary. The covariance of the readings is slow to converge because the robot is driven along stereotypical trajectories (e.g., straight down the middle of a corridor). The derivative of the readings converges faster (informally, differentiating tends to remove the slow phenomena).
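These statistics can be computed directly from a log of readings. A minimal sketch, assuming the log is available as a `(T, n)` array (T time steps, n sensels); the actual pipeline accumulates them incrementally over many logs:

```python
import numpy as np

def second_order_stats(Y):
    """Given a (T, n) array of readings, return the three sample
    covariances discussed above: of the readings, of their time
    derivative, and of the sign of the derivative."""
    Yd = np.diff(Y, axis=0)  # finite-difference approximation of dy/dt
    return (np.cov(Y, rowvar=False),
            np.cov(Yd, rowvar=False),
            np.cov(np.sign(Yd), rowvar=False))
```

Each result is an `n x n` symmetric matrix, indexed by pairs of sensels, which is what the three panels of the video display.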

From second-order statistics, it is possible to infer *similarities* of the sensels, and from those similarities, to obtain an **embedding** of the sensels on the sensel space. The *metric* information is not recovered precisely, but the *topology* can be reliably estimated. In this case, this means that, even starting from shuffled values, it is possible to recover the ordering of the sensels. The remaining uncertainty can be considered a **diffeomorphism nuisance**.

Actually, the most reliable statistic for embedding purposes is the *information distance* between the sensels, which is not shown here. Here is a Python pickle file containing the information distance matrix estimated from the data.
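Given such a distance matrix, one standard way to compute an embedding is classical multidimensional scaling (the paper's estimation procedure may differ; this sketch only assumes a square symmetric distance matrix `D`):

```python
import numpy as np

def classical_mds(D, d=2):
    """Embed n points in R^d from an (n, n) distance matrix via
    classical multidimensional scaling."""
    n = D.shape[0]
    J = np.eye(n) - np.ones((n, n)) / n   # centering matrix
    B = -0.5 * J @ (D ** 2) @ J           # double-centered Gram matrix
    w, V = np.linalg.eigh(B)
    idx = np.argsort(w)[::-1][:d]         # top-d eigenpairs
    return V[:, idx] * np.sqrt(np.maximum(w[idx], 0))
```

As noted above, the metric information in the result is not exact, but the ordering of the sensels along the recovered coordinate is, which is all that is needed to undo the shuffling.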

Using statistics of the data we can recover the ordering of the sensels.
The next step is **percentile normalization**: the value of each sensel is mapped to the [0, 100] range according to its percentile in the sequence. This step normalizes a diffeomorphism nuisance acting on the values. For example, suppose that the data of a range-finder are modified by an invertible nonlinearity, such as `x --> 1/x`, so that the values represent nearness instead of distance. The percentile normalization step cancels the effect of such a nonlinearity.

Also, it has the effect that the data is represented more densely for more probable values, which makes it an efficient representation. For example, the range-finder readings are mostly in the <10m range, while very few samples are >20m. The percentile representation gives more space to the immediate surroundings.
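The normalization can be sketched with ranks. A monotonically increasing nonlinearity leaves the percentiles unchanged; a decreasing one such as `x --> 1/x` reverses their orientation, a residue that falls under the diffeomorphism nuisance discussed above. (Sketch only: ties and streaming estimation of the percentiles are ignored.)

```python
import numpy as np

def percentile_normalize(y):
    """Map each value of the 1-D array y to its percentile in [0, 100]."""
    ranks = np.argsort(np.argsort(y))  # rank of each element in sorted order
    return 100.0 * ranks / (len(y) - 1)
```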

The next step is the **population code** computation. N cells are assigned to each sensel; each cell is activated when the sensel value is close to its reference point, according to a certain kernel.
The result is a 362×N array, which we can display as an image for easy visualization.
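A minimal sketch of this step, using a Gaussian kernel with evenly spaced reference points (the kernel choice and its width are assumptions here; the text only requires "a certain kernel"):

```python
import numpy as np

def population_code(p, N=32, width=8.0):
    """Expand percentile-normalized values p (1-D array in [0, 100]) into
    an (n_sensels, N) activation array: N cells per sensel, each with a
    reference point on [0, 100] and a Gaussian activation kernel."""
    refs = np.linspace(0.0, 100.0, N)  # reference points of the N cells
    return np.exp(-((p[:, None] - refs[None, :]) / width) ** 2)
```

With 362 sensels this produces exactly the 362×N array described above.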

Now, notice that all these operations are generic and data-agnostic. However, for the case of the range-finder, the end result is an image which is diffeomorphic to a polar map of the environment. On the *y* axis we have the sensel position in sensel space, which is the angle (up to a diffeomorphism); on the *x* axis we have the percentile as a population code, which is diffeomorphic to the range.

In the paper we prove that the range-finder data thus preprocessed can be approximated by a BGDS model.

This video shows the tensor H being learned for the range-finder data. Its four slices are shown side-by-side in false colors (red: positive, white: zero, blue: negative). The video is only for 1 log out of 11, so the final results are not as smooth as those shown in the paper's figures.
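A rough sketch of how such a tensor can be accumulated from a log. The class name and the estimator (a plain running correlation between commands, spatial gradients, and the time derivative, with no normalization) are assumptions for illustration; the actual learning procedure in the paper may differ:

```python
import numpy as np

class BGDSLearner:
    """Streaming estimator for a BGDS-style tensor H on 2-D observations.
    H[i, j] has the shape of the image: one slice per (command i, gradient
    direction j) pair; with 2 commands and 2 gradient directions this
    gives the four slices shown in the video."""

    def __init__(self, shape, n_cmd=2):
        self.H = np.zeros((n_cmd, 2) + shape)
        self.count = 0

    def update(self, y, y_dot, u):
        gy, gx = np.gradient(y)  # spatial gradients of the observations
        for i, ui in enumerate(u):
            self.H[i, 0] += ui * gy * y_dot
            self.H[i, 1] += ui * gx * y_dot
        self.count += 1

    def estimate(self):
        return self.H / max(self.count, 1)
```

Because the estimate is an average over samples, it smooths out as more logs are processed, which is why the single-log result in the video is noisier than the paper's figures.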

This is the composite frame used as the data in the experiments.

On the left, the data from the omnidirectional camera. Top right is a frontal camera with a large field of view, and bottom right is one of the three Triclops cameras.

Unfortunately, there might be some glitches due to the fact that the different cameras have different frame rates.

This video shows the computation of the mean value for each pixel. Notice how even simple statistics identify the role of the different parts of the image.

This video shows the computation of the variance for each pixel. Here white = low variance, and dark = high variance. In general, it is not necessarily true that high variance is equivalent to more information, but in this case the extremely low-variance pixels are non-informative and could be discarded. Those correspond to "dead" parts of the image: the borders, or the robot itself reflected in the images.
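Both the per-pixel mean and variance can be computed in a single streaming pass (here via Welford's algorithm), without storing the whole log in memory; a sketch, not the actual implementation:

```python
import numpy as np

class PixelStats:
    """Streaming per-pixel mean and variance via Welford's algorithm."""

    def __init__(self, shape):
        self.n = 0
        self.mean = np.zeros(shape)
        self.M2 = np.zeros(shape)  # sum of squared deviations

    def update(self, frame):
        self.n += 1
        delta = frame - self.mean
        self.mean += delta / self.n
        self.M2 += delta * (frame - self.mean)

    def variance(self):
        return self.M2 / self.n if self.n else self.M2
```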

This video shows the tensor H being learned for the camera data. Its four slices are shown side-by-side in false colors (red: positive, white: zero, blue: negative).

The video is only for 1 log out of 11, so the final results are not as smooth as those shown in the paper's figures.

In this video the false-color image of each tensor slice is generated independently from the other slices. This visualization exaggerates the importance of the (vertical gradient, angular velocity) slice at the bottom right, which is theoretically 0 and just noise in practice; it would appear white if all slices were normalized together, as in the paper's figures.

This is the same as the previous video, with the difference that we are pre-filtering the camera data using a contrast operation before learning. The results are very similar. In general, filtering the data with a local operation should not change the learning result.

This video shows side-by-side the derivative of the data and the prediction based on the learned tensors.

Note that fast rotations cause problems because of motion blur, and because the time resolution becomes relevant: the observations can no longer be explained by a continuous dynamics. Here one should take a multi-scale approach, reducing the resolution of the image for fast motions.

This video shows the anomaly detection signal (white: no anomaly, black: anomaly).

The false colors are normalized per-frame. This means that in the first part of the sequence, when there are no moving objects, you are looking mainly at noise.
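A sketch of how such a signal can be obtained: the per-pixel discrepancy between the observed derivative and the model's prediction, normalized per frame as described above (the detection statistic used in the paper may differ):

```python
import numpy as np

def anomaly_frame(y_dot_obs, y_dot_pred, eps=1e-8):
    """Per-pixel anomaly signal, normalized per frame to [0, 1]
    (0: no anomaly, 1: strongest anomaly in this frame)."""
    e = np.abs(y_dot_obs - y_dot_pred)
    return (e - e.min()) / (e.max() - e.min() + eps)
```

The per-frame normalization explains the behavior noted above: when no true anomaly is present, the largest discrepancy in the frame is just noise, yet it is still stretched to fill the [0, 1] range.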

Skip to time 45s or 105s (in log time, displayed in the bottom right) to see the strong signal when people walk past the robot.

Also, there are genuine anomalies that correspond not to moving objects but to model failures:

- occlusions
- motion blur due to large rotations
- when the robot tilts due to uneven pavement, which produces a motion not explained by linear/angular velocities