Images and Video

File Formats

Digital images come in a variety of formats, each with its own properties.

Vector Graphics

Vector formats define a set of points and instructions on how to draw them. To view the image, a program runs these instructions to rasterize it.

Some of the more common vector formats are SVG, EPS, PDF, and AI.

If we open the following SVG file in a text editor, we will notice that it is fairly easy to read the format. It almost reads like a Processing program 😉

Shapes SVG
<?xml version="1.0" encoding="UTF-8" standalone="no"?>
<svg
   ...
   height="512"
   width="512">
  ...
  <g
     transform="translate(0,-161.53332)"
     id="layer1">
    <circle
       style="stroke-width:0.26458332;fill:#00ffff;fill-opacity:1"
       r="52.916664"
       cy="229.26665"
       cx="67.73333"
       id="path3713" />
    <rect
       y="228.20831"
       x="5.2916665"
       height="63.5"
       width="63.5"
       id="rect4520"
       style="fill:#ff0000;fill-opacity:1;stroke-width:0.25843021" />
    <path
       id="path4524"
       d="M 49.514879,171.88985 123.5982,282.2589 Z"
       style="fill:none;stroke:#00b400;stroke-width:2.64583325;stroke-linecap:butt;stroke-linejoin:miter;stroke-miterlimit:4;stroke-dasharray:none;stroke-opacity:1" />
  </g>
</svg>

Pros:

  • Small file sizes, because minimal information is being stored.
  • Images can be scaled up without any quality loss or increase in file size. This is because the instruction set does not change; only the point values do.

Cons:

  • Low level of detail.
  • Limited types of effects, because we don’t have all the image data available in the format.
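
To illustrate why scaling a vector image does not increase its size, here is a minimal sketch. The Circle struct and the scaled() function are hypothetical names made up for this example; they are not part of SVG or of any library, and only mimic how the circle in the SVG above stores a center and a radius.

```cpp
// A hypothetical vector shape: just a center point and a radius.
struct Circle
{
  float cx;
  float cy;
  float r;
};

// Scaling changes the stored values, not how many values are stored.
Circle scaled(const Circle& c, float factor)
{
  return { c.cx * factor, c.cy * factor, c.r * factor };
}
```

However large the factor, the scaled circle is still described by the same three numbers.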

Raster Graphics

Raster formats define pixel values in a rectangular grid of pixels. The bigger the image, the greater the data set, and thus the larger the file size.

Some of the more common raster formats are JPG, PNG, GIF, and TIF.

Shapes PNG

Pros:

  • High quality and detail, especially at high resolutions.
  • More advanced image effects, because every pixel can be edited.

Cons:

  • File sizes tend to be bigger.
  • Images lose quality when scaled up.

In order not to end up with huge file sizes, many raster formats are compressed. Some compression methods are lossy, meaning that some of the data is lost when it is compressed, and others are lossless, meaning that all the data is recovered once the data is uncompressed.
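
As a rough illustration of lossless compression, the sketch below uses run-length encoding, one of the simplest schemes. It is not the method used by PNG or any other specific format mentioned here; the names Run, rleEncode() and rleDecode() are made up for this example.

```cpp
#include <cstdint>
#include <utility>
#include <vector>

// Each run is stored as (count, value).
using Run = std::pair<std::uint32_t, std::uint8_t>;

// Compress a row of pixel values into runs of identical values.
std::vector<Run> rleEncode(const std::vector<std::uint8_t>& pixels)
{
  std::vector<Run> runs;
  for (std::uint8_t value : pixels)
  {
    if (!runs.empty() && runs.back().second == value)
    {
      // Same value as the previous pixel: extend the current run.
      runs.back().first++;
    }
    else
    {
      // New value: start a new run of length 1.
      runs.emplace_back(1u, value);
    }
  }
  return runs;
}

// Expand the runs back into the original pixel values.
std::vector<std::uint8_t> rleDecode(const std::vector<Run>& runs)
{
  std::vector<std::uint8_t> pixels;
  for (const Run& run : runs)
  {
    pixels.insert(pixels.end(), run.first, run.second);
  }
  return pixels;
}
```

Decoding the encoded runs gives back exactly the original values, which is what makes the scheme lossless. A lossy method would instead discard some detail to save more space.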

Video

Videos are just a series of images that need to be processed and displayed very quickly.

Video formats are always rasters and are mostly compressed.

  • Some formats are simply extensions of their image counterparts, like Motion JPG for example, which is just a series of JPG-compressed frames.
  • Others are specific to video, like H.264, which has a form of compression over time, where some pixels are predicted based on known pixels in previous and future key frames. This is called temporal compression.
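
The idea behind temporal compression can be sketched with simple frame differencing: instead of storing every pixel of every frame, we store only the pixels that changed since the previous frame. This is a heavily simplified stand-in and not how H.264 actually works (real codecs use motion prediction and much more); Frame, PixelChange, encodeDelta() and decodeDelta() are hypothetical names for this example.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

using Frame = std::vector<std::uint8_t>;

// One changed pixel: where it is and what its new value is.
struct PixelChange
{
  std::size_t index;
  std::uint8_t value;
};

// Record only the pixels that differ from the previous frame.
// Assumes both frames have the same number of pixels.
std::vector<PixelChange> encodeDelta(const Frame& previous, const Frame& current)
{
  std::vector<PixelChange> changes;
  for (std::size_t i = 0; i < current.size(); i++)
  {
    if (current[i] != previous[i])
    {
      changes.push_back({ i, current[i] });
    }
  }
  return changes;
}

// Rebuild the current frame from the previous frame and the changes.
Frame decodeDelta(const Frame& previous, const std::vector<PixelChange>& changes)
{
  Frame current = previous;
  for (const PixelChange& change : changes)
  {
    current[change.index] = change.value;
  }
  return current;
}
```

When most of the picture stays the same between frames, the list of changes is much smaller than a full frame.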

Efficient compression is necessary for video because of the huge amount of data that it carries. While film traditionally runs at 24 frames per second, high-definition video now commonly runs at 60 frames per second, and sometimes goes as high as 240 fps! Combining these fast frame rates with large resolutions like 4K means that hundreds of millions of pixels need to be processed every second for a video to play smoothly.
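
To put a number on this, the sketch below computes the pixel rate and the uncompressed data rate of a video stream. The function names are made up for this example, and the figures in the note after the code assume that 4K means 3840 x 2160 and that each pixel takes 3 bytes (8-bit RGB).

```cpp
#include <cstdint>

// Number of pixels that must be processed each second.
std::uint64_t pixelsPerSecond(std::uint64_t width, std::uint64_t height, std::uint64_t fps)
{
  return width * height * fps;
}

// Uncompressed data rate in bytes per second.
std::uint64_t bytesPerSecond(std::uint64_t width, std::uint64_t height,
                             std::uint64_t fps, std::uint64_t bytesPerPixel)
{
  return pixelsPerSecond(width, height, fps) * bytesPerPixel;
}
```

Under those assumptions, pixelsPerSecond(3840, 2160, 60) is 497,664,000 pixels, and bytesPerSecond(3840, 2160, 60, 3) is 1,492,992,000 bytes, or about 1.5 GB every second before compression.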

Processing Images

When working with image data, we will usually want to work with rasterized uncompressed images. This is because many algorithms require looping efficiently through all pixels in an image, or doing quick look-ups between neighboring pixels.

The good news is that decompression usually happens in the image loader or video codec, before an image or video frame gets to us. For example, in OF, FreeImage will automatically decompress JPG or PNG images and provide us with the “final” pixels in the frame.

While we will almost never have to worry about decoding an image or a video frame ourselves, we should still be mindful of what format the data comes in, and make sure that it is suitable for our application.

Images in OF

The data Folder

The simplest way to access files in an OF app is to include them in the project’s data folder. If this looks familiar, it’s because this idea is borrowed from Processing. The data folder is located in <project>/bin/data and each project will have its own dedicated data folder.

If we drop our files in the data folder, they can be accessed in the app without having to figure out the full path on disk where the file is located, which can be very handy.

ofImage

ofImage is the general type to use to work with images in openFrameworks. ofImage includes methods to load files from disk, draw images to the screen, access pixel data, etc.

ofImage is a type, which we can create like any other variable type.

// ofApp.h
#pragma once

#include "ofMain.h"

class ofApp : public ofBaseApp
{
public:
  void setup();
  void draw();

  ofImage dogImg;
};

We will load an image named dog-grass.jpg from our data folder in the ofApp::setup() function. We only need to load the image into memory once, so we do it when the app starts up.

We want to draw the image every frame, so we will do that in the ofApp::draw() function.

// ofApp.cpp
#include "ofApp.h"

void ofApp::setup()
{
  dogImg.load("dog-grass.jpg");
}

void ofApp::draw()
{
  dogImg.draw(0, 0);
}

If we navigate under the hood and see what ofImage::load() is actually doing, we see that it calls many functions from the FreeImage library to determine the file’s format, uncompress the data, and load it into values for each pixel.

Image Attributes

An image data structure usually consists of:

  • a size (a width and height)
  • a pixel format
  • a value for each pixel

Pixel Arrays

This structure looks a lot like the arrays we have been exploring in the previous section. This makes arrays great options to represent image data in a computer program.

Even though an image has two dimensions (a width and a height), the pixel array is usually one-dimensional, packing the rows one after the other in sequence.

Grid Pixels

Some frameworks allow accessing pixels using the column x and row y, like PImage.get() in Processing and ofImage.getColor() in openFrameworks. These convenience functions are very useful as they take care of figuring out all the index arithmetic for us.

The following example draws an image one pixel at a time, using nested for-loops to iterate through each row and column.

// ofApp.cpp
#include "ofApp.h"

void ofApp::setup()
{
  // Load the dog image.
  dogImg.load("dog-grass.jpg");

  // Set the window size to match the image.
  ofSetWindowShape(dogImg.getWidth(), dogImg.getHeight());
}

void ofApp::draw()
{
  for (int y = 0; y < dogImg.getHeight(); y++)
  {
    for (int x = 0; x < dogImg.getWidth(); x++)
    {
      ofColor color = dogImg.getColor(x, y);
      ofSetColor(color);
      ofDrawRectangle(x, y, 1, 1);
    }
  }
}
  • ofSetWindowShape() resizes the window to the size of the loaded image. Note that this function can be called any time while the app is running, and can override the starting window dimensions that are set in main.cpp.
  • ofImage.getColor() returns the ofColor value at a specified column and row index. ofColor is a data structure used to access the different channels that make up a color value.

How would we read the value of a pixel under the mouse cursor?

We can use ofImage.getColor() and pass the mouse coordinates as the column and row index.

// ofApp.cpp
#include "ofApp.h"

void ofApp::setup()
{
  // Load the dog image.
  dogImg.load("dog-grass.jpg");

  // Set the window size to match the image.
  ofSetWindowShape(dogImg.getWidth(), dogImg.getHeight());
}

void ofApp::draw()
{
  // Draw the image as the background.
  ofSetColor(255);
  dogImg.draw(0, 0);

  // Get a reference to the image pixels (a reference avoids copying them every frame).
  ofPixels& dogPix = dogImg.getPixels();
  // Get the color value under the mouse.
  ofColor color = dogPix.getColor(mouseX, mouseY);

  // Draw a rectangle under the mouse using the pixel color.
  ofFill();
  ofSetColor(color);
  ofDrawRectangle(mouseX - 25, mouseY - 25, 50, 50);
  // Add an outline so we can see the rectangle better.
  ofNoFill();
  ofSetColor(0);
  ofDrawRectangle(mouseX - 25, mouseY - 25, 50, 50);
}

Note that this only works if the window and image have equal resolutions. If they didn’t, we would need to remap the mouse coordinates from window coordinates to image coordinates. We will cover this in a later class.
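
As a preview, here is a minimal standalone sketch of such a remapping. The remap() and windowToImage() functions are hypothetical helpers written for this example; openFrameworks provides ofMap() for the same kind of linear mapping.

```cpp
#include <algorithm>

// Linearly remap a value from one range to another.
float remap(float value, float inMin, float inMax, float outMin, float outMax)
{
  return outMin + (value - inMin) / (inMax - inMin) * (outMax - outMin);
}

// Convert a window coordinate to an image pixel coordinate.
// The result is clamped so it always lands inside the image.
int windowToImage(int windowPos, int windowSize, int imageSize)
{
  int imagePos = static_cast<int>(remap(windowPos, 0, windowSize, 0, imageSize));
  return std::max(0, std::min(imagePos, imageSize - 1));
}
```

For instance, with a 1280 pixel wide window and a 320 pixel wide image, a mouse at x = 640 maps to column 160 of the image. The same function would be called once for x (using the widths) and once for y (using the heights).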

Size and Scale

In the previous examples, the drawn image is anchored in the top-left corner of the window (0, 0) and by default, it is drawn at full resolution. This means that the image might be smaller or larger than our window.

If we want the image to fill and fit in the exact window bounds, we have two options:

We can resize the window to match the image resolution. This is what we have been doing in the previous examples.

// ofApp.cpp
#include "ofApp.h"

void ofApp::setup()
{
  // Load the dog image.
  dogImg.load("dog-grass.jpg");

  // Set the window size to match the image.
  ofSetWindowShape(dogImg.getWidth(), dogImg.getHeight());
}

void ofApp::draw()
{
  dogImg.draw(0, 0);
}

We can scale the image to match the window size.

// ofApp.cpp
#include "ofApp.h"

void ofApp::setup()
{
  // Load the dog image.
  dogImg.load("dog-grass.jpg");

  ofSetWindowShape(1280, 720);
}

void ofApp::draw()
{
  dogImg.draw(0, 0, ofGetWidth(), ofGetHeight());
}
  • If ofImage::draw() is called with 4 arguments, the first 2 set the top-left coordinates and the last 2 set the width and height.

If we use an image that is smaller than the window, it will be scaled up to fit the window. We can tell OF how to upscale the image by setting a filter on the texture.

When an image is scaled up, it needs additional pixels to fill in the extra resolution. Conversely, when an image is scaled down, it removes some of its original pixels because the resolution is smaller. The min and mag filters define how the renderer should handle these situations.

  • The default mode uses linear interpolation GL_LINEAR. This blends the nearby pixels together to make new pixels and may look blurry.
  • The nearest neighbor mode GL_NEAREST uses the nearest pixel value for the added pixels without any blending. This keeps the image sharp at any resolution, but it may look pixelated.

// ofApp.h
#pragma once

#include "ofMain.h"

class ofApp : public ofBaseApp
{
public:
  void setup();
  void draw();

  void mousePressed(int x, int y, int button);
  void mouseReleased(int x, int y, int button);

  ofImage dogImg;
};

// ofApp.cpp
#include "ofApp.h"

void ofApp::setup()
{
  dogImg.load("dog-grass-low.jpg");

  ofSetWindowShape(1280, 720);
}

void ofApp::draw()
{
  dogImg.draw(0, 0, ofGetWidth(), ofGetHeight());
}

void ofApp::mousePressed(int x, int y, int button)
{
  dogImg.getTexture().setTextureMinMagFilter(GL_NEAREST, GL_NEAREST);
}

void ofApp::mouseReleased(int x, int y, int button)
{
  dogImg.getTexture().setTextureMinMagFilter(GL_LINEAR, GL_LINEAR);
}
Dog using GL_LINEAR
Dog using GL_NEAREST
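
To get a feel for the difference between the two filters, here is a minimal sketch that samples a single row of values at a fractional position. It is a one-dimensional simplification written for this example, not the actual OpenGL implementation; sampleNearest() and sampleLinear() are hypothetical names, and both assume the row is not empty.

```cpp
#include <algorithm>
#include <cmath>
#include <cstddef>
#include <vector>

// Sample a row at a fractional position by picking the nearest value.
float sampleNearest(const std::vector<float>& row, float pos)
{
  long index = std::lround(pos);
  index = std::max(0L, std::min(index, static_cast<long>(row.size()) - 1));
  return row[static_cast<std::size_t>(index)];
}

// Sample a row at a fractional position by blending the two neighbors.
float sampleLinear(const std::vector<float>& row, float pos)
{
  float clamped = std::max(0.0f, std::min(pos, static_cast<float>(row.size() - 1)));
  std::size_t left = static_cast<std::size_t>(clamped);
  std::size_t right = std::min(left + 1, row.size() - 1);
  float t = clamped - static_cast<float>(left);
  return row[left] * (1.0f - t) + row[right] * t;
}
```

With a row holding a black pixel (0) next to a brighter one (100), sampling a quarter of the way between them gives 0 with the nearest approach (a hard edge) and 25 with the linear approach (a soft gradient).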

Pixel Access

A standard color pixel will have 3 color channels: red, green, and blue (RGB). While Processing packs all channels into a single int, this is not common practice.

The color values are usually packed sequentially in the array. Instead of each pixel holding a single value, it will hold 3.

Grid RGB
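
For contrast with the sequential layout, here is a sketch of the packed approach, where all channels of a pixel share a single 32-bit integer laid out as 0xAARRGGBB (the layout Processing uses for its color type). The function names are made up for this example.

```cpp
#include <cstdint>

// Pack four 8-bit channels into one 32-bit value laid out as 0xAARRGGBB.
std::uint32_t packARGB(std::uint8_t a, std::uint8_t r, std::uint8_t g, std::uint8_t b)
{
  return (static_cast<std::uint32_t>(a) << 24) |
         (static_cast<std::uint32_t>(r) << 16) |
         (static_cast<std::uint32_t>(g) << 8) |
          static_cast<std::uint32_t>(b);
}

// Extract a single channel by shifting it down and masking off the rest.
std::uint8_t alphaOf(std::uint32_t color) { return static_cast<std::uint8_t>((color >> 24) & 0xFF); }
std::uint8_t redOf(std::uint32_t color)   { return static_cast<std::uint8_t>((color >> 16) & 0xFF); }
std::uint8_t greenOf(std::uint32_t color) { return static_cast<std::uint8_t>((color >> 8) & 0xFF); }
std::uint8_t blueOf(std::uint32_t color)  { return static_cast<std::uint8_t>(color & 0xFF); }
```

Reading a channel from a packed value takes a shift and a mask, whereas in the sequential layout each channel is simply its own element in the array.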

The pixel array then has total size:

size = width * height * channels

In order to access the pixel in a 1D array using a 2D index, we first need to convert it.

index = y * width + x
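
As a minimal sketch of this conversion, the hypothetical GrayImage struct below stores a single-channel image in one flat array and uses the formula to find each pixel. It is written for this example only and is not an openFrameworks type.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// A hypothetical single-channel image stored as one flat array.
struct GrayImage
{
  std::size_t width;
  std::size_t height;
  std::vector<std::uint8_t> pixels; // width * height values, row by row

  GrayImage(std::size_t w, std::size_t h)
    : width(w), height(h), pixels(w * h, 0) {}

  // Convert a column and row to a position in the flat array.
  std::size_t indexOf(std::size_t x, std::size_t y) const
  {
    return y * width + x;
  }

  std::uint8_t get(std::size_t x, std::size_t y) const
  {
    return pixels[indexOf(x, y)];
  }

  void set(std::size_t x, std::size_t y, std::uint8_t value)
  {
    pixels[indexOf(x, y)] = value;
  }
};
```

In a 4 x 3 image, the pixel at column 2 and row 1 sits at index 1 * 4 + 2 = 6: we skip one full row of 4 pixels, then move 2 pixels into the next row.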

How do we access a pixel index in an RGB image?

Because each pixel has three color values (for each RGB channel), we need to multiply our pixel index by 3 to take that offset into account.

pixel = y * width + x

index = pixel * 3
index = (y * width + x) * 3

ofPixels.getColor() can also accept a single argument for the index (instead of two arguments for the column and row). How can we modify the previous example to use the single index version of getColor()?

We can use the formula above to convert our column and row to an index value in the color array.

// ofApp.cpp

// ...

void ofApp::draw()
{
  // ...

  // Get a reference to the image pixels (a reference avoids copying them every frame).
  ofPixels& dogPix = dogImg.getPixels();
  // Get the color value under the mouse.
  //ofColor color = dogPix.getColor(mouseX, mouseY);
  int index = (mouseY * dogPix.getWidth() + mouseX) * dogPix.getNumChannels();
  ofColor color = dogPix.getColor(index);

  // ...
}

Note the use of ofPixels.getNumChannels() instead of the literal 3. This ensures the code will work with all image types and not just RGB images.

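Conversely, if we want to get a 2D position from a 1D index, we can use the modulo operator and integer division. Here index is the pixel index; for a multi-channel array, first divide the array index by the number of channels.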

x = index % width
y = index / width
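
The two directions of the conversion can be sketched together as follows. PixelCoord, toDataIndex() and toCoord() are hypothetical names for this example, and the channel count is passed in explicitly so the same code works for grayscale, RGB, or RGBA data.

```cpp
#include <cstddef>

struct PixelCoord
{
  std::size_t x;
  std::size_t y;
};

// Column and row -> index of the pixel's first channel in the array.
std::size_t toDataIndex(std::size_t x, std::size_t y, std::size_t width, std::size_t channels)
{
  return (y * width + x) * channels;
}

// Array index -> column and row. The channel count is divided out first,
// so any channel of a pixel maps back to the same coordinate.
PixelCoord toCoord(std::size_t dataIndex, std::size_t width, std::size_t channels)
{
  std::size_t pixel = dataIndex / channels;
  return { pixel % width, pixel / width };
}
```

For a 640 pixel wide RGB image, the pixel at (10, 2) starts at array index (2 * 640 + 10) * 3 = 3870, and converting 3870 back gives (10, 2) again.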

The following example reads the value of a pixel sequentially, based on the sketch frame number.

// ofApp.cpp
#include "ofApp.h"

void ofApp::setup()
{
  // Load the dog image.
  dogImg.load("dog-grass.jpg");

  // Set the window size to match the image.
  ofSetWindowShape(dogImg.getWidth(), dogImg.getHeight());
}

void ofApp::draw()
{
  // Draw the image as the background.
  ofSetColor(255);
  dogImg.draw(0, 0);

  // Cache the image dimensions in variables for easy access.
  int imgWidth = dogImg.getWidth();
  int imgHeight = dogImg.getHeight();

  // Use the modulo operator to make sure the frame index is never 
  // greater than the max number of pixels in the image.
  int frameIndex = ofGetFrameNum() % (imgWidth * imgHeight);
  int x = frameIndex % imgWidth;
  int y = frameIndex / imgWidth;

  // Get a reference to the image pixels (a reference avoids copying them every frame).
  ofPixels& dogPix = dogImg.getPixels();
  // Get the color value for this frame.
  int pixelIndex = frameIndex * dogPix.getNumChannels();
  ofColor color = dogPix.getColor(pixelIndex);

  // Draw a rectangle at the pixel position using the pixel color.
  ofFill();
  ofSetColor(color);
  ofDrawRectangle(x - 25, y - 25, 50, 50);
  // Add an outline so we can see the rectangle better.
  ofNoFill();
  ofSetColor(0);
  ofDrawRectangle(x - 25, y - 25, 50, 50);
}

Data Formats

Image Format

The most common image type we will work with is RGB color images.

We will also work with single-channel formats, usually called grayscale or luminance. These are particularly handy for devices that only capture a brightness level, like infrared cameras or depth sensors.

Some images also have an alpha channel for transparency, like RGBA. Our example image happens to have transparency, but we will encounter this rarely in this class as most sensors do not use the alpha channel.

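Another format worth mentioning is YUV, which is a color encoding that takes advantage of human perception: our eyes are more sensitive to changes in brightness than to changes in color. Instead of using three color channels, it uses one channel for brightness and two for color shift. Because the two color channels can be stored at a lower resolution without much visible difference, YUV gives similar results to RGB at noticeably smaller sizes (often a third to a half smaller), and this is why YUV formats are often used for webcam streams.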
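
The size difference can be estimated with a little arithmetic. The sketch below assumes 8 bits per sample, even image dimensions, and two common YUV layouts that store the color channels at reduced resolution: 4:2:2 (color at half the horizontal resolution) and 4:2:0 (color at half the resolution in both directions). The function names are made up for this example, and a given webcam may use a different layout.

```cpp
#include <cstddef>

// One 8-bit RGB frame: three full-resolution channels.
std::size_t rgbFrameBytes(std::size_t width, std::size_t height)
{
  return width * height * 3;
}

// One 8-bit YUV 4:2:2 frame: full-resolution brightness,
// two color channels at half the horizontal resolution.
std::size_t yuv422FrameBytes(std::size_t width, std::size_t height)
{
  return width * height + 2 * ((width / 2) * height);
}

// One 8-bit YUV 4:2:0 frame: full-resolution brightness,
// two color channels at half the resolution in both directions.
std::size_t yuv420FrameBytes(std::size_t width, std::size_t height)
{
  return width * height + 2 * ((width / 2) * (height / 2));
}
```

For a 1280 x 720 frame this gives 2,764,800 bytes for RGB, 1,843,200 bytes for 4:2:2 (a third smaller), and 1,382,400 bytes for 4:2:0 (half the size).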

Pixel Format

Pixel color values can be stored in a few different formats. The more bits a format can hold, the more range the values can have, and the larger the size of the frame gets.

  • unsigned char is the most common format. It stores each channel as an 8-bit integer, with values ranging from 0 to 255.
  • float uses 32-bit floating-point data. The usual range is from 0.0 to 1.0, but this format can be used for HDR effects, where the values can extend past 1.0, or for storing non-color data, where we can even use negative values. We will use float when working with depth sensors and when storing non-color data inside our pixels.
  • unsigned short is another integer format but with 16 bits of data, meaning values range from 0 to 65535. We will also use this format when working with depth sensors, where precision is very important and we need more than the 256 distinct values that we get from unsigned char.
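
To see how the pixel format affects the size of a frame, here is a small sketch. frameBytes() and integerLevels() are hypothetical helpers written for this example; the float figure in the note below assumes a 4-byte float, which is the case on common desktop platforms.

```cpp
#include <cstddef>
#include <limits>

// Size in bytes of one uncompressed frame for a given pixel type.
template <typename PixelType>
std::size_t frameBytes(std::size_t width, std::size_t height, std::size_t channels)
{
  return width * height * channels * sizeof(PixelType);
}

// Number of distinct values an integer pixel type can hold per channel.
template <typename PixelType>
unsigned long integerLevels()
{
  return static_cast<unsigned long>(std::numeric_limits<PixelType>::max()) + 1;
}
```

For a 640 x 480 single-channel frame, unsigned char takes 307,200 bytes, unsigned short takes 614,400 bytes, and float takes 1,228,800 bytes. In exchange, unsigned short offers 65,536 levels per channel where unsigned char offers 256.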

The following example demonstrates how to access the pixel array data directly, using ofPixels.getData().

This is a bit more complicated, and may not be necessary in most applications. However, it tends to be the fastest way to manipulate pixel values and is the recommended approach when having to process large images pixel by pixel.

// ofApp.cpp

// ...

void ofApp::draw()
{
  // ...

  // Get a pointer to the raw pixel data.
  unsigned char* dogData = dogImg.getPixels().getData();
  // Compute the index of the pixel under the mouse.
  int numChannels = dogImg.getPixels().getNumChannels();
  int pixelIndex = mouseY * dogImg.getWidth() + mouseX;
  ofColor color = ofColor(
    dogData[pixelIndex * numChannels + 0], // R
    dogData[pixelIndex * numChannels + 1], // G
    dogData[pixelIndex * numChannels + 2]  // B
  );

  // ...
}
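
Direct access pays off most when every pixel has to be visited. The sketch below inverts the colors of an interleaved 8-bit buffer in place using only a raw pointer. invertColors() is a hypothetical function written for this example in plain C++; in an OF app, the pointer could come from ofPixels.getData() and the channel count from ofPixels.getNumChannels(), with the pixel count being the width times the height.

```cpp
#include <cstddef>

// Invert every color channel in an interleaved 8-bit pixel buffer, in place.
// If there are 4 channels, the last one is assumed to be alpha and is left untouched.
void invertColors(unsigned char* data, std::size_t numPixels, std::size_t numChannels)
{
  std::size_t colorChannels = (numChannels == 4) ? 3 : numChannels;
  for (std::size_t p = 0; p < numPixels; p++)
  {
    // Pointer to the first channel of pixel p.
    unsigned char* pixel = data + p * numChannels;
    for (std::size_t c = 0; c < colorChannels; c++)
    {
      pixel[c] = static_cast<unsigned char>(255 - pixel[c]);
    }
  }
}
```

Because the loop walks through memory in order and does no per-pixel function calls or bounds checks, it tends to be fast, but it also means we are responsible for passing the correct sizes.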