
Computer Vision

Converting Caltech pedestrian dataset for Python

Recently, the Caltech pedestrian dataset (http://www.vision.caltech.edu/Image_Datasets/CaltechPedestrians/) has often been used as a benchmark in computer vision. However, the dataset comes in an unusual format, so it is not easy to handle. It is easier to handle with Matlab, but converting it for use from Python, for example for deep learning, is troublesome. I developed conversion tools, so I publish them here.

Pedestrian detection, and this dataset in particular, is known as a difficult benchmark. The dataset is much larger than other pedestrian datasets, so it suits cases that require a large amount of data, such as deep learning.

Video conversion

Caltech video is stored in the so-called "seq" format. A program that converts it to a format readable by Python programs has been published under the title "reading .seq files from caltech pedestrian dataset".

I used this program and found that it cannot read the last frame of each file correctly, so an error occurs. The other frames are read correctly, however, so I modeled my own file handling on this program.
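
The idea of reading every frame except a broken last one can be sketched as follows. This is only a minimal sketch under assumptions that are not stated in this entry: that each frame in a .seq file is a JPEG image beginning with the JFIF signature used below, and that a caller supplies some `decode` function (hypothetical here) that raises an exception on a frame it cannot read. The names `split_frames` and `decode_frames` are made up for illustration.

```python
# Assumed start-of-frame signature (JPEG/JFIF); not confirmed by this entry.
JFIF_MARKER = b"\xff\xd8\xff\xe0\x00\x10JFIF"


def split_frames(data, marker=JFIF_MARKER):
    """Split the raw bytes of a .seq file into chunks that start with `marker`."""
    chunks = data.split(marker)
    # chunks[0] is whatever precedes the first frame (the file header); drop it.
    # Any bytes stored between frames stay attached to the preceding chunk.
    return [marker + chunk for chunk in chunks[1:]]


def decode_frames(chunks, decode):
    """Yield (index, image) for every chunk that `decode` accepts.

    Chunks for which `decode` raises are skipped, so a frame that cannot
    be read (such as a broken last frame) does not stop the conversion.
    """
    for index, chunk in enumerate(chunks):
        try:
            image = decode(chunk)
        except Exception:
            continue
        yield index, image
```

In practice `decode` could be any image decoder that accepts a byte string. Skipping a failed chunk silently is a design choice of this sketch; logging the skipped index may be preferable.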

Conversion of bounding boxes

The Caltech dataset includes annotation files. They contain bounding box information, that is, rectangles that enclose pedestrians. They are in the so-called "vbb" format, which is a binary Matlab format. Binary files are difficult to handle, so I converted them into text format. The Matlab code called code3, which is linked from the Caltech dataset page, can be used for the conversion. The file named "vbb.m" in its directory provides two functions: one that reads a binary vbb file (A = vbbLoad(file)) and one that writes a text-format vbb file (vbbSaveTxt(A, textFileName, timeStamp)).

The files converted to text format can be processed further by my programs. Because pattern matching in Python is complicated, I used Perl to convert them to Python format. The following Perl program generates a Python program.

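### Bounding box extractor for textual VBB file ###
#   Public domain program
#   coded by Yasusi Kanada
#   2015-6-22
#
#   Arguments: set name and video name.
#   Reads annotations/<set>/<video>vbb.txt and prints a Python list.

open(my $input, '<', "annotations/${ARGV[0]}/${ARGV[1]}vbb.txt")
   or die "cannot open annotation file: $!";
print "${ARGV[0]}_${ARGV[1]}=[\n";
while (<$input>) {
   # Header line of one object: label, first frame, last frame, hide flag
   if (/^lbl='(person(-fa|\?)?|people)'\s+str=(\d+)\s+end=(\d+)\s+hide=(\d+)/) {
      $type = $1; $str = $3; $end = $4; $hide = $5;
      $pos = ''; $posv = ''; $occl = ''; $lock = '';
   }
   # pos / posv: rows separated by ";" become a Python list of lists
   if (/^(pos|posv)\s*=\s*\[(([-\d\w\.\;\s])*)\]/) {
      $name = $1;
      $text = $2;
      $text =~ s/;\s+/\] \[/g;
      $text =~ s/\s+/, /g;
      $text = "[[${text}]]";
      $text =~ s/, \[\]//;     # drop the empty list left by the final ";"
      if ($name eq 'pos') {
          $pos = $text;
      } else {
          $posv = $text;
      }
   # occl / lock: one value per frame becomes a flat Python list
   } elsif (/^(occl|lock)\s*=\s*\[(([-\d\w\.\s])*)\]/) {
      $name = $1;
      $text = $2;
      $text =~ s/\s+/, /g;
      $text = "[${text}]";
      if ($name eq 'occl') {
          $occl = $text;
      } else {
          $lock = $text;

          # "lock" is the last line matched for an object, so emit the entry here
          print " \{'type':'$type', 'firstFrame':$str, 'lastFrame':$end, 'hide':$hide,\n";
          print "  'pos':$pos,\n  'posv':$posv,\n";
          print "  'occluded':$occl,\n  'lock':$lock\},\n";
      }
   }
}
close($input);
print "]\n";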
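
For readers who would rather stay in Python, the same line patterns can also be matched with the standard `re` module. The following is only a rough sketch that mirrors the regular expressions of the Perl program above and builds the list directly instead of printing Python source. The function name `parse_vbb_text` is made up for illustration, the sketch assumes that every matched value is a plain number, and it has not been run against the real annotation files, so it may share the same limitations as the Perl version.

```python
import re

# Same line patterns as in the Perl program, with non-capturing inner groups.
LBL_RE = re.compile(
    r"^lbl='(person(?:-fa|\?)?|people)'\s+str=(\d+)\s+end=(\d+)\s+hide=(\d+)")
ARR_RE = re.compile(r"^(pos|posv|occl|lock)\s*=\s*\[([^\]]*)\]")


def _num(token):
    """Convert a token to int when possible, otherwise to float."""
    try:
        return int(token)
    except ValueError:
        return float(token)


def parse_vbb_text(lines):
    """Return one dictionary per labeled object found in `lines`."""
    records = []
    current = None
    for line in lines:
        m = LBL_RE.match(line)
        if m:
            current = {'type': m.group(1), 'firstFrame': int(m.group(2)),
                       'lastFrame': int(m.group(3)), 'hide': int(m.group(4))}
            continue
        m = ARR_RE.match(line)
        if m is None or current is None:
            continue
        name, body = m.groups()
        if name in ('pos', 'posv'):
            # Rows are separated by ";"; the trailing empty row is ignored.
            value = [[_num(t) for t in row.split()]
                     for row in body.split(';') if row.strip()]
        else:
            value = [_num(t) for t in body.split()]
        current['occluded' if name == 'occl' else name] = value
        if name == 'lock':
            # As in the Perl program, "lock" closes one object.
            records.append(current)
            current = None
    return records
```

A call such as `parse_vbb_text(open(path))` would return the list for one annotation file, where `path` names a text-format vbb file.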

The Python program generated by the Perl program contains a list with one entry (a dictionary) per pedestrian. The original file contains separate data for each pedestrian ("person"), and also contains data labeled "people" and "person-fa". For each pedestrian, the file contains the first frame number, the last frame number, and the bounding box information for the frames in between: the location (x and y) and the size (width and height). However, this program still has a bug; it cannot read some of the files.
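
As an illustration of how such a list might be used, the sketch below looks up the bounding boxes that apply to one frame. It assumes that 'pos' holds one [x, y, width, height] row per frame from 'firstFrame' through 'lastFrame', and that frame numbers are used exactly as they appear in the file. The function name `boxes_in_frame` and the sample values are hypothetical, not taken from the dataset.

```python
def boxes_in_frame(records, frame, types=("person",)):
    """Return the [x, y, width, height] boxes of the given types in `frame`."""
    boxes = []
    for rec in records:
        if rec["type"] not in types:
            continue
        # Row 0 of 'pos' is assumed to belong to 'firstFrame'.
        offset = frame - rec["firstFrame"]
        if frame <= rec["lastFrame"] and 0 <= offset < len(rec["pos"]):
            boxes.append(rec["pos"][offset])
    return boxes


# Made-up sample in the shape that the generated program has.
sample = [
    {"type": "person", "firstFrame": 10, "lastFrame": 11, "hide": 0,
     "pos": [[100.0, 50.0, 20.0, 40.0], [102.0, 50.0, 20.0, 40.0]],
     "posv": [[0, 0, 0, 0], [0, 0, 0, 0]],
     "occluded": [0, 0], "lock": [0, 0]},
    {"type": "people", "firstFrame": 11, "lastFrame": 11, "hide": 0,
     "pos": [[200.0, 60.0, 80.0, 50.0]],
     "posv": [[0, 0, 0, 0]],
     "occluded": [0], "lock": [0]},
]

print(boxes_in_frame(sample, 11))
print(boxes_in_frame(sample, 11, types=("person", "people")))
```

Filtering by type keeps "people" and "person-fa" entries out of the result unless they are requested explicitly.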





This page contains a single entry from the blog posted on June 24, 2015 10:34 PM.


This weblog is licensed under a Creative Commons License.