Challenge

Data Collection

Hyperspectral Sensor
We use the LightGene Hyperspectral Sensor for the data collect. Fig. 3 is a brief review of the LightGene camera sensor. In particular, the camera can provide hyperspectral dataset in the range of 450 to 950nm with a spectral resolution at 4nm. In total, the camera can provide 128 spectral channels. The spectral resolution of each channel is approximately 1400 by 1800 pixels. And therefore, in total, each frame of the hyperspectral image is with the size: 1400 * 1800 * 128.

Outdoor data collection in Shanghai
The dataset collection is in Shanghai for three days in June. The we collected data in a variety of environment including: crowded traffic area, famous buildings and structures, CBD, highways, quiet suburbs, overpasses and underground parking. The weather condition includes sunny and cloudy days. And the lighting condition includes day, night and sunset. We use standard color board for color calibration. We collecting data, the car is driving at a speed in the range of 20-50km/h. The hyperspectral camera is working at 1fps. The field of view (FOV) is 16 × 12 degrees in current lens configuration. The camera is vertical mounted to enable capturing of a wider dynamic range.

Fig.3. Specifications of the hyperspectral data

Dataset Labeling

Data selection
The V2.0 dataset is focusing on semantic segmentation using coarse labeling. It is aiming to exploit the rich information in the hyperspectral data. Therefore, we manually selected 1054(training) plus 200(testing) hyperspectral images which are considered suitable for the semantic segmentation task.

Coarse Labeling
For the training dataset, we provide only the coarse labeling. The labeling is deliberate conducted in a quick and rough fashion from different people. And the level of detail is various from people to people. Therefore, we are encouraging the users to learn information from the rich hyperspectral information rather than from human labeling. Fig.1 shows an example of the coarse labeling. We label 1054 images from the dataset and consider them as training set.

We use Computer Vision Annotation Tool (CVAT) as the labeling tool. And the labels and colors are consistent with CityScape[1] dataset.

Fine-grained Labeling
For testing purpose, we also fine-grained labeled 300 images. During the challenge, only the input hyperspectral images are available and the groundtruth will not be released.

Hyperspectral Image Compression

Unlike RGB images which only has 3 channels, each hyperspectral images in our dataset has 128 channels. Without compression, each hyperspectral cubic will take more than 1Gigabytes storage and will make it unreasonable for online transfer. In our current release, we compress the image using H.264 encoder and decoder with a quality setting at 90%. The compression ratio is smaller than 2% and the final dataset size is around 150 Gigabytes.