INFORMATION TECHNOLOGY –

GENERIC CODING OF AUDIO-VISUAL OBJECTS

Part 1: Systems

ISO/IEC 14496-1

Final Committee Draft of International Standard

The Systems part of this Final Committee Draft of International Standard describes a system for communicating interactive audiovisual scenes. Such scenes consist of:

1. the coded representation of natural or synthetic, 2D or 3D objects that can be manifested audibly and/or visually (media objects);

2. the coded representation of the spatio-temporal positioning of media objects as well as their behavior in response to interaction (scene description); and

3. the coded representation of information related to the management of information streams (synchronization, identification, description and association of stream content).

The overall operation of a system communicating such audiovisual scenes is as follows. At the sending side, audiovisual scene information is compressed, supplemented with synchronization information and passed to a delivery layer that multiplexes it into one or more coded binary streams that are transmitted or stored. At the receiver, these streams are demultiplexed and decompressed. The media objects are composed according to the scene description and synchronization information and presented to the end user. The end user may have the option to interact with the presentation. Interaction information can be processed locally or transmitted to the sender. This specification defines the semantic and syntactic rules of bitstreams that convey such scene information, as well as the details of their decoding processes.
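
The sender-to-receiver flow described above can be illustrated with a small, non-normative sketch. Everything in it is an assumption made for illustration: the AccessUnit record, the 9-byte header layout (stream id, timestamp, payload length), the use of zlib as a stand-in for media compression, and the function names send and receive. None of it reflects the actual Sync Layer or FlexMux syntax, which is specified in the normative clauses.

```python
import struct
import zlib
from dataclasses import dataclass


@dataclass
class AccessUnit:
    """Hypothetical unit of stream data; not a structure from the specification."""
    stream_id: int   # assumed identifier of the stream this unit belongs to
    timestamp: int   # assumed synchronization time, in arbitrary clock ticks
    payload: bytes   # uncompressed scene or media data


# Assumed toy header: 1-byte stream id, 4-byte timestamp, 4-byte payload length.
HEADER = struct.Struct(">BII")


def send(units):
    """Compress each unit, attach timing information, and multiplex into one byte stream."""
    out = bytearray()
    for unit in units:
        body = zlib.compress(unit.payload)  # stand-in for media compression
        out += HEADER.pack(unit.stream_id, unit.timestamp, len(body))
        out += body
    return bytes(out)


def receive(stream):
    """Demultiplex and decompress, then order units by timestamp for composition."""
    units = []
    pos = 0
    while pos < len(stream):
        stream_id, timestamp, length = HEADER.unpack_from(stream, pos)
        pos += HEADER.size
        payload = zlib.decompress(stream[pos:pos + length])
        pos += length
        units.append(AccessUnit(stream_id, timestamp, payload))
    # Units sharing a timestamp are ordered by stream id (an arbitrary tie-break).
    return sorted(units, key=lambda u: (u.timestamp, u.stream_id))


if __name__ == "__main__":
    sent = [
        AccessUnit(2, 40, b"audio frame"),
        AccessUnit(1, 0, b"scene description"),
        AccessUnit(3, 40, b"video frame"),
    ]
    for unit in receive(send(sent)):
        print(unit.timestamp, unit.stream_id, unit.payload)
```

The sketch interleaves all units into a single stream and recovers them in presentation order at the receiver. User interaction and the return channel to the sender are left out.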

In particular, the Systems part of this Final Committee Draft of International Standard specifies the following tools:

· a terminal model for time and buffer management;

· a coded representation of interactive audiovisual scene description information (Binary Format for Scenes – BIFS);

· a coded representation of identification and description of audiovisual streams as well as the logical dependencies between stream information (Object and other Descriptors);

· a coded representation of synchronization information (Sync Layer – SL);

· a multiplexed representation of individual streams in a single stream (FlexMux); and

· a coded representation of descriptive audiovisual content information (Object Content Information – OCI).

These various elements are described functionally in this clause and specified in the normative clauses that follow.