Decoding Process
Overview
AV1 contains two operating modes:

General decoding (input is a sequence of OBUs, output is decoded frames)

Large scale tile decoding (input is a tile list OBU plus additional side information, output is a decoded frame)
The general decoding process is specified in General decoding process.
The large scale tile decoding process is specified in Large scale tile decoding process.
General decoding process
When film_grain_params_present is equal to 0, decoders shall produce output frames that are identical in all respects and have the same output order as those produced by the decoding process specified herein.
When film_grain_params_present is equal to 1, a decoder shall implement a film grain synthesis process that modifies the output arrays OutY, OutU, OutV. The reference film grain synthesis process is described in Film grain synthesis process.
When film_grain_params_present is equal to 1, a conformant decoder shall satisfy at least one of the following two options:

A conformant decoder shall produce output frames that are identical in all respects and have the same output order as those produced by the decoding process specified herein including applying the exact film grain synthesis process as specified in Film grain synthesis process.

A conformant decoder shall produce intermediate frames that are identical in all respects and have the same order as the frames produced by the process specified in Intermediate output preparation process. In addition to that, a conformant decoder shall produce output frames that are in the same order and do not have perceptually significant differences with the frames produced by the reference film grain synthesis process specified in Film grain synthesis process when applied to the input frames of the film grain synthesis process with the film grain parameters signaled for these frames. The decoder may also include optional processing steps which are applied to the intermediate frames produced by the process specified in Intermediate output preparation process and before the film grain synthesis process, resulting in the input frames of the film grain synthesis process. Such optional processing steps are beyond the scope of this specification. Otherwise, the intermediate frames are the input frames of the film grain synthesis process. The definition of "perceptually significant differences" is beyond the scope of this specification and may be specified, for example, by a service provider as part of their accreditation program. The film grain synthesis process applied by a conformant decoder should be feature complete with regard to the reference film grain synthesis process of Film grain synthesis process including scaling strength of the film grain as a function of intensity according to the signaled parameters, same maximum AR lag, and similar modeling of correlation between luma and chroma and smoothing of transitions between blocks of grain when applicable.
Note: To ensure conformance, decoder manufacturers are advised to implement the film grain synthesis process as specified in Film grain synthesis process. One reason to choose the second conformance option is implementation of optional processing steps between the output of Intermediate output preparation process and the film grain synthesis process, in which case there could be minor differences in the output with the reference film grain synthesis process of Film grain synthesis process. Examples of these optional processing steps are algorithms improving output picture quality, such as debanding filtering and coding artefacts removal.
Note: Some applications, such as transcoding from AV1 to AV1, may use intermediate output frames of Intermediate output preparation process for transcoding. In such cases, the original film grain synthesis information may be adapted and inserted in the transcoded bitstream.
The input to this process is a sequence of open bitstream units (OBUs).
The output from this process is a sequence of decoded frames.
For each OBU in turn the syntax elements are extracted as specified in OBU syntax.
The syntax tables include function calls indicating when the remaining decode processes are triggered.
Large scale tile decoding process
General
The large scale tile decoding process is used to decode a random subset of tiles taken from a number of coded frames. The list of tiles is specified by a tile list OBU. One possible use case for this process is described in Annex D.
Note: A decoder is recommended to support decoding of tile list OBUs, but this is not a requirement for decoder conformance.
The inputs to this process are:

contents of all syntax elements and variables produced when parsing a sequence header OBU,

contents of all syntax elements and variables produced when parsing a frame header OBU (including CDF tables optionally loaded from a reference frame),

an array AnchorFrames containing up to 128 frames,

a tile list OBU.
The output from this process is:
 an output frame containing decoded tiles in raster order.
Note: The syntax elements from the sequence header and frame header may be produced by decoding a sequence header OBU and a frame header OBU, but this is not a requirement of decoder conformance. The AnchorFrames may be produced by decoding an AV1 bitstream, but this is not a requirement of bitstream conformance.
The following figure shows the arrangement of data required to decode a single tile list OBU. Those data shown on a green background are normatively defined in this specification. Data items shown on a yellow background are defined by a process or processes beyond the scope of this specification.
For each tile list entry in the tile list OBU, the following ordered steps are applied:

Parse the syntax elements within the tile_list_entry

Set the bitstream position indicator to point to the start of the coded_tile_data syntax element

Set the variable last equal to ref_frame_idx[ 0 ]

Set FrameStore[ last ] equal to AnchorFrames[ anchor_frame_idx ]

RefValid[ last ] is set equal to 1.

RefUpscaledWidth[ last ] is set equal to UpscaledWidth.

RefFrameWidth[ last ] is set equal to FrameWidth.

RefFrameHeight[ last ] is set equal to FrameHeight.

RefMiCols[ last ] is set equal to MiCols.

RefMiRows[ last ] is set equal to MiRows.

RefSubsamplingX[ last ] is set equal to subsampling_x.

RefSubsamplingY[ last ] is set equal to subsampling_y.

RefBitDepth[ last ] is set equal to BitDepth.

Invoke the decode camera tile process specified in Decode camera tile process and write the decoded tiles into an output frame in raster order, in the order that they occur in the tile list OBU.
The output from this process is the output frame that is built up in the final step above.
The variable outputW is defined as ( 1 + output_frame_width_in_tiles_minus_1 ) * TileWidth.
The variable outputH is defined as ( 1 + output_frame_height_in_tiles_minus_1 ) * TileHeight.
The operation of writing a decoded tile (with zero-based index given by the variable tile) into the output frame in raster order is defined as follows:
destX = TileWidth * ( tile % (output_frame_width_in_tiles_minus_1 + 1) )
destY = TileHeight * ( tile / (output_frame_width_in_tiles_minus_1 + 1) )
w = TileWidth
h = TileHeight
for ( y = 0; y < h; y++ ) {
  for ( x = 0; x < w; x++ ) {
    OutputFrameY[ y + destY ][ x + destX ] = OutY[ y ][ x ]
  }
}
w = w >> subsampling_x
h = h >> subsampling_y
destX = destX >> subsampling_x
destY = destY >> subsampling_y
for ( y = 0; y < h; y++ ) {
  for ( x = 0; x < w; x++ ) {
    OutputFrameU[ y + destY ][ x + destX ] = OutU[ y ][ x ]
    OutputFrameV[ y + destY ][ x + destX ] = OutV[ y ][ x ]
  }
}
OutputFrameY (representing the luma plane of the output frame) is outputW samples across by outputH samples down.
OutputFrameU (representing the U plane of the output frame) is ( outputW >> subsampling_x ) samples across by ( outputH >> subsampling_y ) samples down.
OutputFrameV (representing the V plane of the output frame) is ( outputW >> subsampling_x ) samples across by ( outputH >> subsampling_y ) samples down.
The bitdepth of each output sample is given by BitDepth.
The output frame may not be fully covered with decoded tiles. The decoder should not modify samples in the output frame outside of the boundaries of the decoded tiles.
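As a rough illustration of the raster-order copy defined above, the following Python sketch places one decoded tile into planes held as nested lists. The function name write_tile, its argument names, and the list-based planes are assumptions made for this example and do not appear in the specification.

```python
def write_tile(out_frame, tile_planes, tile, tiles_wide,
               tile_w, tile_h, sub_x, sub_y):
    """Copy one decoded tile into the output frame in raster order.

    out_frame and tile_planes are hypothetical dicts with keys "Y", "U", "V",
    each holding a 2D list of samples indexed as plane[row][col].
    """
    dest_x = tile_w * (tile % tiles_wide)
    dest_y = tile_h * (tile // tiles_wide)
    # Luma plane: full resolution copy.
    for y in range(tile_h):
        for x in range(tile_w):
            out_frame["Y"][y + dest_y][x + dest_x] = tile_planes["Y"][y][x]
    # Chroma planes: sizes and offsets are shifted by the subsampling factors.
    w, h = tile_w >> sub_x, tile_h >> sub_y
    dest_x >>= sub_x
    dest_y >>= sub_y
    for y in range(h):
        for x in range(w):
            out_frame["U"][y + dest_y][x + dest_x] = tile_planes["U"][y][x]
            out_frame["V"][y + dest_y][x + dest_x] = tile_planes["V"][y][x]


def make_plane(width, height, value=0):
    """Helper for the example: a height-by-width plane filled with value."""
    return [[value] * width for _ in range(height)]
```

Samples outside the written tile are left untouched, which matches the recommendation that the decoder should not modify samples outside the decoded tiles. Tile sizes in a quick experiment can be made small for readability, although such sizes would not satisfy the TileHeight constraint of a conformant bitstream.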
Decoders that support large scale tile decoding shall produce output frames that are identical in all respects to those produced by this decoding process.
It is a requirement of bitstream conformance that the following conditions are met:

enable_superres is equal to 0

enable_order_hint is equal to 0

still_picture is equal to 0

film_grain_params_present is equal to 0

timing_info_present_flag is equal to 0

decoder_model_info_present_flag is equal to 0

initial_display_delay_present_flag is equal to 0

enable_restoration is equal to 0

enable_cdef is equal to 0

mono_chrome is equal to 0

TileHeight is equal to (use_128x128_superblock ? 128 : 64) for all tiles (i.e. the tile is exactly one superblock high)

TileWidth is identical for all tiles and is an integer multiple of TileHeight (i.e. the tile is an integer number of superblocks wide)

FrameWidth is equal to MiCols * MI_SIZE

FrameHeight is equal to MiRows * MI_SIZE

show_existing_frame is equal to 0

frame_type is equal to INTER_FRAME

show_frame is equal to 1

error_resilient_mode is equal to 0

disable_cdf_update is equal to 1

disable_frame_end_update_cdf is equal to 1

delta_lf_present is equal to 0

delta_q_present is equal to 0

frame_size_override_flag is equal to 0

refresh_frame_flags is equal to 0

use_ref_frame_mvs is equal to 0

segmentation_temporal_update is equal to 0

reference_select is equal to 0

loop_filter_level[ 0 ] and loop_filter_level[ 1 ] are equal to 0

tile_count_minus_1 + 1 is less than or equal to (output_frame_width_in_tiles_minus_1 + 1) * (output_frame_height_in_tiles_minus_1 + 1).
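A bitstream checker might test some of these conditions mechanically. The sketch below is only an illustration: it covers the conditions that require a syntax element to equal 0 or 1, plus the tile count limit, and it omits the remaining conditions (tile dimensions, frame dimensions, frame_type, and the loop filter levels). The function name and the dictionary of parsed values are assumptions made for this example.

```python
# Syntax elements that the list above requires to be equal to 0.
REQUIRED_ZERO = (
    "enable_superres", "enable_order_hint", "still_picture",
    "film_grain_params_present", "timing_info_present_flag",
    "decoder_model_info_present_flag", "initial_display_delay_present_flag",
    "enable_restoration", "enable_cdef", "mono_chrome",
    "show_existing_frame", "error_resilient_mode", "delta_lf_present",
    "delta_q_present", "frame_size_override_flag", "refresh_frame_flags",
    "use_ref_frame_mvs", "segmentation_temporal_update", "reference_select",
)
# Syntax elements that the list above requires to be equal to 1.
REQUIRED_ONE = ("show_frame", "disable_cdf_update", "disable_frame_end_update_cdf")


def large_scale_tile_violations(params):
    """Return the names of the checked conditions that params violates.

    params is a hypothetical dict mapping syntax element names to parsed values.
    """
    bad = [name for name in REQUIRED_ZERO if params.get(name) != 0]
    bad += [name for name in REQUIRED_ONE if params.get(name) != 1]
    tiles = params["tile_count_minus_1"] + 1
    capacity = ((params["output_frame_width_in_tiles_minus_1"] + 1)
                * (params["output_frame_height_in_tiles_minus_1"] + 1))
    if tiles > capacity:
        bad.append("tile_count_minus_1")
    return bad
```

An empty result means only that the checked subset holds; it does not establish conformance.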
Decode camera tile process
This process decodes a single tile within a frame.
The outputs of this process are the arrays OutY, OutU, OutV containing the decoded samples for the tile.
Note: The decoding process defined here does not invoke the postprocessing steps of deblock, cdef, superres, loop restoration and reference frame update. Implementations may choose to implement this process by using the general decode process with these tools disabled.
The process is specified as:
CurrentQIndex = base_q_idx
init_symbol( tile_data_size_minus_1 + 1 )
clear_above_context( )
sbSize = use_128x128_superblock ? BLOCK_128X128 : BLOCK_64X64
sbSize4 = Num_4x4_Blocks_Wide[ sbSize ]
MiRowStart = MiRowStarts[ anchor_tile_row ]
MiRowEnd = MiRowStarts[ anchor_tile_row + 1 ]
MiColStart = MiColStarts[ anchor_tile_col ]
MiColEnd = MiColStarts[ anchor_tile_col + 1 ]
for ( r = MiRowStart; r < MiRowEnd; r += sbSize4 ) {
  clear_left_context( )
  for ( c = MiColStart; c < MiColEnd; c += sbSize4 ) {
    ReadDeltas = delta_q_present
    clear_block_decoded_flags( c < ( MiColEnd - 1 ) )
    decode_partition( r, c, sbSize )
  }
}
exit_symbol( )
w = (MiColEnd - MiColStart) * MI_SIZE
h = (MiRowEnd - MiRowStart) * MI_SIZE
x0 = MiColStart * MI_SIZE
y0 = MiRowStart * MI_SIZE
subX = subsampling_x
subY = subsampling_y
xC0 = ( MiColStart * MI_SIZE ) >> subX
yC0 = ( MiRowStart * MI_SIZE ) >> subY
Note: The intention is that the same decoding process for tile data can be used as for the general decoding process.
It is a requirement of bitstream conformance that the following conditions are met whenever the parsing process returns from the read_ref_frames syntax:

RefFrame[ 0 ] = LAST_FRAME

RefFrame[ 1 ] = NONE
Note: Intra blocks are allowed. They are not forbidden by this constraint because intra blocks do not invoke the read_ref_frames syntax.
Arrays OutY, OutU, OutV (representing the decoded samples for the tile) are specified as:

The array OutY is w samples across by h samples down and the sample at location x samples across and y samples down is given by OutY[ y ][ x ] = CurrFrame[ 0 ][ y0 + y ][ x0 + x ] with x = 0..w - 1 and y = 0..h - 1.

The array OutU is (w + subX) >> subX samples across by (h + subY) >> subY samples down and the sample at location x samples across and y samples down is given by OutU[ y ][ x ] = CurrFrame[ 1 ][ yC0 + y ][ xC0 + x ] with x = 0..(w >> subX) - 1 and y = 0..(h >> subY) - 1.

The array OutV is (w + subX) >> subX samples across by (h + subY) >> subY samples down and the sample at location x samples across and y samples down is given by OutV[ y ][ x ] = CurrFrame[ 2 ][ yC0 + y ][ xC0 + x ] with x = 0..(w >> subX) - 1 and y = 0..(h >> subY) - 1.
The output of this process is arrays OutY, OutU, OutV representing the Y, U, and V samples.
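The extraction of OutY, OutU, and OutV from CurrFrame can be sketched as follows. This is a minimal illustration that assumes MI_SIZE is 4 luma samples and that CurrFrame is held as three nested lists indexed as CurrFrame[ plane ][ y ][ x ]; the function name extract_tile is hypothetical.

```python
MI_SIZE = 4  # assumed size of one mode info unit in luma samples


def extract_tile(curr_frame, mi_row_start, mi_row_end,
                 mi_col_start, mi_col_end, sub_x, sub_y):
    """Return (OutY, OutU, OutV) for the tile covering the given MI range."""
    w = (mi_col_end - mi_col_start) * MI_SIZE
    h = (mi_row_end - mi_row_start) * MI_SIZE
    x0 = mi_col_start * MI_SIZE
    y0 = mi_row_start * MI_SIZE
    xc0 = x0 >> sub_x
    yc0 = y0 >> sub_y
    out_y = [[curr_frame[0][y0 + y][x0 + x] for x in range(w)]
             for y in range(h)]
    cw, ch = w >> sub_x, h >> sub_y

    def chroma(plane):
        # Chroma samples start at the subsampled tile origin (xC0, yC0).
        return [[curr_frame[plane][yc0 + y][xc0 + x] for x in range(cw)]
                for y in range(ch)]

    return out_y, chroma(1), chroma(2)
```

Because w and h are multiples of MI_SIZE, the chroma sizes (w + subX) >> subX and w >> subX coincide in this sketch, and likewise for the heights.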
Decode frame wrapup process
This process is triggered by a call to decode_frame_wrapup from within the syntax tables.
At this stage, all the tile level decode has been done, and this process performs any frame level decode that is required.
If show_existing_frame is equal to 0, the process first performs any post processing filtering by the following ordered steps:

If loop_filter_level[ 0 ] is not equal to 0 or loop_filter_level[ 1 ] is not equal to 0, the loop filter process specified in Loop filter process is invoked (this process modifies the contents of CurrFrame).

The CDEF process specified in CDEF process is invoked (this process takes CurrFrame and produces CdefFrame).

The upscaling process specified in Upscaling process is invoked with CdefFrame as input and the output is assigned to UpscaledCdefFrame.

The upscaling process specified in Upscaling process is invoked with CurrFrame as input and the output is assigned to UpscaledCurrFrame.

The loop restoration process specified in Loop restoration process is invoked (this process takes UpscaledCurrFrame and UpscaledCdefFrame and produces LrFrame).

The motion field motion vector storage process specified in Motion field motion vector storage process is invoked.

If segmentation_enabled is equal to 1 and segmentation_update_map is equal to 0, SegmentIds[ row ][ col ] is set equal to PrevSegmentIds[ row ][ col ] for row = 0..MiRows - 1, for col = 0..MiCols - 1.
Otherwise (show_existing_frame is equal to 1), if frame_type is equal to KEY_FRAME, the reference frame loading process as specified in Reference frame loading process is invoked (this process loads frame state from the reference frames into the current frame state variables).
The following ordered steps now apply:

The reference frame update process as specified in Reference frame update process is invoked (this process saves the current frame state into the reference frames).

If show_frame is equal to 1 or show_existing_frame is equal to 1, the output process as specified in Output process is invoked (this will output the current frame or a saved frame).
Note: Although it is specified that all samples in CurrFrame are upscaled, at most 2 lines above and below each stripe (defined by StripeStartY and StripeEndY) will end up being read. Implementations may wish to avoid upscaling the unused lines.
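The order in which the frame level processes are invoked can be summarized with a small sketch. The function below only returns the names of the steps that would run; the step labels, the function name, and the boolean is_key_frame argument are assumptions for this example, and none of the underlying processes is implemented here.

```python
def wrapup_steps(show_existing_frame, is_key_frame, show_frame,
                 loop_filter_level, segmentation_enabled=0,
                 segmentation_update_map=0):
    """List, in order, the processes invoked by decode_frame_wrapup."""
    steps = []
    if show_existing_frame == 0:
        if loop_filter_level[0] != 0 or loop_filter_level[1] != 0:
            steps.append("loop_filter")
        steps += ["cdef", "upscale_cdef_frame", "upscale_curr_frame",
                  "loop_restoration", "motion_field_mv_storage"]
        if segmentation_enabled == 1 and segmentation_update_map == 0:
            steps.append("copy_prev_segment_ids")
    elif is_key_frame:
        steps.append("reference_frame_loading")
    # These two steps apply in both branches.
    steps.append("reference_frame_update")
    if show_frame == 1 or show_existing_frame == 1:
        steps.append("output")
    return steps
```

For a shown inter frame with both loop filter levels equal to 0, the loop filter step is skipped while CDEF, upscaling, and loop restoration are still listed, mirroring the ordered steps above.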
Ordering of OBUs
A bitstream conforming to this specification consists of one or more coded video sequences.
A coded video sequence consists of one or more temporal units. A temporal unit consists of a series of OBUs starting from a temporal delimiter, optional sequence headers, optional metadata OBUs, a sequence of one or more frame headers, each followed by zero or more tile group OBUs as well as optional padding OBUs.
A new coded video sequence is defined to start at each temporal unit which satisfies both of the following conditions:

A sequence header OBU appears before the first frame header.

The first frame header has frame_type equal to KEY_FRAME, show_frame equal to 1, show_existing_frame equal to 0, and temporal_id equal to 0.
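The two conditions can be tested per temporal unit, as in the following sketch. The representation of a temporal unit as a list of dictionaries with a "type" key, and the function name, are assumptions made for this example.

```python
def starts_new_coded_video_sequence(temporal_unit):
    """Return True if this temporal unit starts a new coded video sequence.

    temporal_unit is a hypothetical list of dicts, one per OBU, in bitstream order.
    """
    seen_sequence_header = False
    for obu in temporal_unit:
        if obu["type"] == "sequence_header":
            seen_sequence_header = True
        elif obu["type"] == "frame_header":
            # Only the first frame header in the temporal unit matters.
            return (seen_sequence_header
                    and obu["frame_type"] == "KEY_FRAME"
                    and obu["show_frame"] == 1
                    and obu["show_existing_frame"] == 0
                    and obu.get("temporal_id", 0) == 0)
    return False
```

A temporal unit whose first frame header is a shown key frame but which carries no earlier sequence header does not start a new coded video sequence under this test.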
If scalability is not being used (OperatingPointIdc equal to 0), then all frames are part of the operating point. The following constraints must hold:

The first frame header must have frame_type equal to KEY_FRAME and show_frame equal to 1.

Each temporal unit must have exactly one shown frame.
If scalability is being used (OperatingPointIdc not equal to 0), then only a subset of frames are part of the operating point. For each operating point, the following constraints must hold:

The first frame header that will be decoded must have frame_type equal to KEY_FRAME and show_frame equal to 1.

Every layer that has a coded frame in a temporal unit must have exactly one shown frame that is the last frame of that layer in the temporal unit.
Note: A shown frame is either a frame with show_frame equal to 1, or with show_existing_frame equal to 1.
A frame header and its associated tile group OBUs within a temporal unit must use the same value of obu_extension_flag (i.e., either both include or both not include the optional OBU extension header).
All OBU extension headers that are contained in the same temporal unit and have the same spatial_id value must have the same temporal_id value.
If a coded video sequence contains at least one enhancement layer (OBUs with spatial_id greater than 0 or temporal_id greater than 0) then all frame headers and tile group OBUs associated with base (spatial_id equals 0 and temporal_id equals 0) and enhancement layer (spatial_id greater than 0 or temporal_id greater than 0) data must include the OBU extension header.
OBUs with spatial level IDs (spatial_id) greater than 0 must appear within a temporal unit in increasing order of the spatial level ID values.
The first temporal unit of a coded video sequence must contain one or more sequence header OBUs before the first frame header OBU.
Note: There is not a requirement that every temporal unit with a key frame also contains a sequence header, just that the sequence header has been sent before the first key frame. However, note that temporal units without sequence header OBUs are not considered to be random access points.
Sequence header OBUs may appear in any order within a coded video sequence. Within a particular coded video sequence, the contents of sequence_header_obu must be bit-identical each time the sequence header appears except for the contents of operating_parameters_info. A new coded video sequence is required if the sequence header parameters change. Any sequence header in a bitstream which changes the parameters must be contained in a temporal unit with temporal_id equal to zero.
If a temporal unit contains one or more sequence header OBUs, the first appearance of the sequence header OBU must be before the first frame header OBU.
One or more metadata and padding OBUs may appear in any order within an OBU sequence (unless constrained by semantics provided elsewhere in this specification). Specific metadata types may be required or recommended to be placed in specific locations, as identified in their corresponding definitions.
OBU types that are not defined in this specification can be ignored by a decoder.
Note: Some applications may choose to use bitstreams that are not fully conformant to the requirements described in this section. For example, a bitstream received in a streaming use case may never contain key frames, but instead rely on gradual intra refresh.
Random access decoding
General
In general, random access points are places in a bitstream where decoding can be started.
This section defines the types of random access point that must be supported by all conformant decoders.
The purpose of this section is to define a minimum level of functionality that must be supported, not a maximum. In other words, decoders may choose to support more types of random access point.
The random access points are defined in Definitions.
The conformance requirements are specified in Conformance requirements.
The consequences for encoders are specified in Encoder consequences.
The consequences for decoders are specified in Decoder consequences.
Definitions
This section defines the following terms:

key frame random access point,

delayed random access point,

key frame dependent recovery point.
A key frame random access point is defined as being a frame:

with frame_type equal to KEY_FRAME

with show_frame equal to 1

that is contained in a temporal unit that also contains a sequence header OBU
A delayed random access point is defined as being a frame:

with frame_type equal to KEY_FRAME

with show_frame equal to 0

that is contained in a temporal unit that also contains a sequence header OBU
A key frame dependent recovery point is defined as being a frame:

with show_existing_frame equal to 1

with frame_to_show_map_idx specifying a frame to output that was a delayed random access point
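The three definitions can be expressed as a small classifier. In this sketch, the frame is a hypothetical dictionary of header fields, and delayed_rap_slots is an assumed set of frame store indices currently holding frames that were delayed random access points; neither name comes from the specification.

```python
def classify_random_access(frame, tu_has_sequence_header,
                           delayed_rap_slots=frozenset()):
    """Return the random access term that applies to frame, or None."""
    if frame.get("show_existing_frame") == 1:
        if frame["frame_to_show_map_idx"] in delayed_rap_slots:
            return "key frame dependent recovery point"
        return None
    if frame["frame_type"] == "KEY_FRAME" and tu_has_sequence_header:
        if frame["show_frame"] == 1:
            return "key frame random access point"
        return "delayed random access point"
    return None
```

Tracking which frame store slots hold delayed random access points is left to the caller in this sketch.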
Conformance requirements
Informally, the requirement for decoder conformance is that decoding can start at any key frame random access point or delayed random access point. The rest of this section makes this requirement more precise.
Starting at a key frame random access point is trivial, because if the earlier temporal units are dropped, the remaining temporal units still constitute a valid bitstream.
Starting at a delayed random access point is harder to define because:

if all temporal units before the key frame dependent recovery point are dropped, it is impossible to decode (because the relevant delayed random access point has been dropped)

if all temporal units before the delayed random access point are dropped, it is unclear what should happen for frames between the delayed random access point and the key frame dependent recovery point (some applications may wish these to be dropped, while others may wish them to be displayed)

in either case, the remaining temporal units do not constitute a valid standalone bitstream (because they do not start with a shown key frame)
To support the different modes of operation, a conformant decoder is required to be able to decode bitstreams consisting of:

a temporal unit containing a delayed random access point

immediately followed by a temporal unit containing the associated key frame dependent recovery point

followed by optional additional temporal units
This moves the responsibility for dropping the intermediate temporal units (between the delayed random access point and the key frame dependent recovery point) out of the normatively defined decoding process into application specific behavior. This allows applications to choose which behavior to use depending on the use case and capabilities of the specific decoder implementation.
Note: In practice, decoder implementations are expected to be able to start decoding bitstreams from a delayed random access point when the intermediate temporal units are still present. The decoder should correctly produce all output frames from the next key frame or key frame dependent recovery point onwards, while the preceding frames are implementation defined. For example: a streaming decoder may choose to decode and display all frames even when the reference frames are not available (tolerating some errors in the output), a low latency decoder may choose to decode and display all frames that are guaranteed to be correct (e.g. an inter frame that only uses inter prediction from the delayed random access point), a media player decoder may choose to decode and display only frames starting from a key frame or key frame dependent recovery point (guaranteeing smooth playback once display starts).
Encoder consequences
Random access points introduce no additional conformance requirements on encoders.
Encoders are free to insert any number of random access points.
Decoder consequences
The conformance requirement means that conformant decoders must be able to start decoding at a delayed random access point partway through a valid bitstream.
This is almost the same as decoding a bitstream from the start; the only differences are that:

The first frame has show_frame equal to 0.

If frame_id_numbers_present_flag is equal to 1, for the first frame current_frame_id should not be compared to PrevFrameID (because PrevFrameID is uninitialized).
Frame end update CDF process
This process is triggered when the function frame_end_update_cdf is called from the tile group syntax table.
The frame CDF arrays are set equal to the saved CDF arrays as follows.
A copy is made of the saved CDF values for each of the CDF arrays mentioned in the semantics for init_coeff_cdfs and init_non_coeff_cdfs. The name of the destination for the copy is the name of the CDF array with no prefix. The name of the source for the copy is the name of the CDF array prefixed with "Saved". For example, the array YModeCdf will be updated with values equal to the contents of SavedYModeCdf.
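The naming rule for the copy can be illustrated with a short sketch. Holding the CDF arrays in two dictionaries keyed by array name is an assumption made for this example, as is the function name.

```python
import copy


def frame_end_update_cdf(frame_cdfs, saved_cdfs):
    """Copy every Saved* CDF array over the frame CDF array of the same name."""
    prefix = "Saved"
    for saved_name, values in saved_cdfs.items():
        if not saved_name.startswith(prefix):
            raise ValueError("unexpected CDF name: " + saved_name)
        # Destination name is the source name with the "Saved" prefix removed.
        frame_cdfs[saved_name[len(prefix):]] = copy.deepcopy(values)
```

A deep copy is used so that later changes to the frame CDF arrays do not alter the saved arrays.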
Set frame refs process
This process is triggered if the function set_frame_refs is called while reading the uncompressed header.
The syntax elements in the ref_frame_idx array are computed based on:
 the syntax elements last_frame_idx and gold_frame_idx,
 the values stored within the RefOrderHint array (these values represent the least significant bits of the expected output order of the frames).
The reference frames used for the LAST_FRAME and GOLDEN_FRAME references are sent explicitly and used to set the corresponding entries of ref_frame_idx as follows (the other entries are initialized to -1 and will be overwritten later in this process):
for ( i = 0; i < REFS_PER_FRAME; i++ )
  ref_frame_idx[ i ] = -1
ref_frame_idx[ LAST_FRAME - LAST_FRAME ] = last_frame_idx
ref_frame_idx[ GOLDEN_FRAME - LAST_FRAME ] = gold_frame_idx
An array usedFrame marking which reference frames have been used is prepared as follows:
for ( i = 0; i < NUM_REF_FRAMES; i++ )
  usedFrame[ i ] = 0
usedFrame[ last_frame_idx ] = 1
usedFrame[ gold_frame_idx ] = 1
A variable curFrameHint is set equal to 1 << (OrderHintBits - 1).
An array shiftedOrderHints (containing the expected output order shifted such that the current frame has hint equal to curFrameHint) is prepared as follows:
for ( i = 0; i < NUM_REF_FRAMES; i++ )
  shiftedOrderHints[ i ] = curFrameHint + get_relative_dist( RefOrderHint[ i ], OrderHint )
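A small worked example may help show why the hints are shifted. The sketch assumes that get_relative_dist, which is defined elsewhere in the specification, returns the signed wrap-around difference of two order hints modulo 2^OrderHintBits, and that enable_order_hint is equal to 1. The Python function names are chosen for this example.

```python
def get_relative_dist(a, b, order_hint_bits):
    """Signed wrap-around distance from b to a (assumed definition)."""
    diff = a - b
    m = 1 << (order_hint_bits - 1)
    return (diff & (m - 1)) - (diff & m)


def shifted_order_hints(ref_order_hint, order_hint, order_hint_bits):
    """Shift hints so that the current frame sits at curFrameHint."""
    cur_frame_hint = 1 << (order_hint_bits - 1)
    return [cur_frame_hint + get_relative_dist(h, order_hint, order_hint_bits)
            for h in ref_order_hint]
```

With OrderHintBits equal to 8 and OrderHint equal to 1, reference hints of 254, 0, and 3 become 125, 127, and 130. After the shift, frames earlier in output order compare below curFrameHint (128) and later frames compare at or above it, even though the raw hint 254 is numerically larger than 1.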
The variable lastOrderHint (representing the expected output order for LAST_FRAME) is set equal to shiftedOrderHints[ last_frame_idx ].
It is a requirement of bitstream conformance that lastOrderHint is strictly less than curFrameHint.
The variable goldOrderHint (representing the expected output order for GOLDEN_FRAME) is set equal to shiftedOrderHints[ gold_frame_idx ].
It is a requirement of bitstream conformance that goldOrderHint is strictly less than curFrameHint.
The ALTREF_FRAME reference is set to be a backward reference to the frame with highest output order as follows:
ref = find_latest_backward()
if ( ref >= 0 ) {
  ref_frame_idx[ ALTREF_FRAME - LAST_FRAME ] = ref
  usedFrame[ ref ] = 1
}
where find_latest_backward is defined as:
find_latest_backward() {
  ref = -1
  for ( i = 0; i < NUM_REF_FRAMES; i++ ) {
    hint = shiftedOrderHints[ i ]
    if ( !usedFrame[ i ] &&
         hint >= curFrameHint &&
         ( ref < 0 || hint >= latestOrderHint ) ) {
      ref = i
      latestOrderHint = hint
    }
  }
  return ref
}
The BWDREF_FRAME reference is set to be a backward reference to the closest frame as follows:
ref = find_earliest_backward()
if ( ref >= 0 ) {
  ref_frame_idx[ BWDREF_FRAME - LAST_FRAME ] = ref
  usedFrame[ ref ] = 1
}
where find_earliest_backward is defined as:
find_earliest_backward() {
  ref = -1
  for ( i = 0; i < NUM_REF_FRAMES; i++ ) {
    hint = shiftedOrderHints[ i ]
    if ( !usedFrame[ i ] &&
         hint >= curFrameHint &&
         ( ref < 0 || hint < earliestOrderHint ) ) {
      ref = i
      earliestOrderHint = hint
    }
  }
  return ref
}
The ALTREF2_FRAME reference is set to the next closest backward reference as follows:
ref = find_earliest_backward()
if ( ref >= 0 ) {
  ref_frame_idx[ ALTREF2_FRAME - LAST_FRAME ] = ref
  usedFrame[ ref ] = 1
}
The remaining references are set to be forward references in anti-chronological order as follows:
for ( i = 0; i < REFS_PER_FRAME - 2; i++ ) {
  refFrame = Ref_Frame_List[ i ]
  if ( ref_frame_idx[ refFrame - LAST_FRAME ] < 0 ) {
    ref = find_latest_forward()
    if ( ref >= 0 ) {
      ref_frame_idx[ refFrame - LAST_FRAME ] = ref
      usedFrame[ ref ] = 1
    }
  }
}
where Ref_Frame_List is specified as:
Ref_Frame_List[ REFS_PER_FRAME - 2 ] = {
  LAST2_FRAME, LAST3_FRAME, BWDREF_FRAME, ALTREF2_FRAME, ALTREF_FRAME
}
and find_latest_forward is defined as:
find_latest_forward() {
  ref = -1
  for ( i = 0; i < NUM_REF_FRAMES; i++ ) {
    hint = shiftedOrderHints[ i ]
    if ( !usedFrame[ i ] &&
         hint < curFrameHint &&
         ( ref < 0 || hint >= latestOrderHint ) ) {
      ref = i
      latestOrderHint = hint
    }
  }
  return ref
}
Finally, any remaining references are set to the reference frame with smallest output order as follows:
ref = -1
for ( i = 0; i < NUM_REF_FRAMES; i++ ) {
  hint = shiftedOrderHints[ i ]
  if ( ref < 0 || hint < earliestOrderHint ) {
    ref = i
    earliestOrderHint = hint
  }
}
for ( i = 0; i < REFS_PER_FRAME; i++ ) {
  if ( ref_frame_idx[ i ] < 0 ) {
    ref_frame_idx[ i ] = ref
  }
}
Note: Multiple reference frames can share the same value for OrderHint and care needs to be taken to handle this case consistently. The reference implementation uses an equivalent implementation based on sorting the reference frames based on their expected output order, with ties broken based on the reference frame index.
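The whole process can be condensed into one function, shown here as a sketch rather than a reference implementation. It assumes the usual constant values (NUM_REF_FRAMES equal to 8, REFS_PER_FRAME equal to 7, LAST_FRAME equal to 1 through ALTREF_FRAME equal to 7) and takes the already shifted order hints as input. The helper pick is an assumption of this example: it merges the three find functions into one by parameterizing the direction and the tie-breaking comparison.

```python
NUM_REF_FRAMES, REFS_PER_FRAME = 8, 7  # assumed constant values
(LAST_FRAME, LAST2_FRAME, LAST3_FRAME, GOLDEN_FRAME,
 BWDREF_FRAME, ALTREF2_FRAME, ALTREF_FRAME) = range(1, 8)


def set_frame_refs(last_frame_idx, gold_frame_idx, shifted, cur_frame_hint):
    """Derive ref_frame_idx from the explicit LAST/GOLDEN slots and the hints."""
    ref_frame_idx = [-1] * REFS_PER_FRAME
    ref_frame_idx[LAST_FRAME - LAST_FRAME] = last_frame_idx
    ref_frame_idx[GOLDEN_FRAME - LAST_FRAME] = gold_frame_idx
    used = [False] * NUM_REF_FRAMES
    used[last_frame_idx] = used[gold_frame_idx] = True

    def pick(backward, latest):
        # backward: hint >= cur_frame_hint; latest: prefer larger hint (ties: later slot).
        ref, best = -1, None
        for i, hint in enumerate(shifted):
            if used[i] or (hint >= cur_frame_hint) != backward:
                continue
            if ref < 0 or (hint >= best if latest else hint < best):
                ref, best = i, hint
        return ref

    def assign(frame, ref):
        if ref >= 0:
            ref_frame_idx[frame - LAST_FRAME] = ref
            used[ref] = True

    assign(ALTREF_FRAME, pick(backward=True, latest=True))
    assign(BWDREF_FRAME, pick(backward=True, latest=False))
    assign(ALTREF2_FRAME, pick(backward=True, latest=False))
    for frame in (LAST2_FRAME, LAST3_FRAME, BWDREF_FRAME,
                  ALTREF2_FRAME, ALTREF_FRAME):
        if ref_frame_idx[frame - LAST_FRAME] < 0:
            assign(frame, pick(backward=False, latest=True))
    # Fallback: the slot with the smallest hint, regardless of prior use.
    fallback = min(range(NUM_REF_FRAMES), key=lambda i: shifted[i])
    return [fallback if r < 0 else r for r in ref_frame_idx]
```

The fallback uses min, which returns the first slot with the smallest hint; this matches the strict less-than comparison in the final step above. The tie-breaking in pick follows the comparisons in the find functions, so equal hints resolve to the later slot when searching for the latest frame and to the earlier slot when searching for the earliest.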
Motion field estimation process
General
This process is triggered by a call to motion_field_estimation while reading the uncompressed header.
A linear projection model is employed to create a motion field estimation that is able to capture high velocity temporal motion trajectories.
The motion field is estimated based on the saved motion vectors from the reference frames and the relative frame distances.
As the frame distances depend on the frame being referenced, a separate motion field is estimated for each reference frame used by the current frame.
A motion vector (for each reference frame type) is prepared at each location on an 8x8 luma sample grid.
The variable w8 (representing the width of the motion field in units of 8x8 luma samples) is set equal to MiCols >> 1.
The variable h8 (representing the height of the motion field in units of 8x8 luma samples) is set equal to MiRows >> 1.
As the linear projection can create a field with holes, the motion fields are initialized to an invalid motion vector of -32768, -32768 as follows:
for ( ref = LAST_FRAME; ref <= ALTREF_FRAME; ref++ )
  for ( y = 0; y < h8 ; y++ )
    for ( x = 0; x < w8; x++ )
      for ( j = 0; j < 2; j++ )
        MotionFieldMvs[ ref ][ y ][ x ][ j ] = -1 << 15
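The initialization can be sketched in Python as below. The sketch assumes the invalid marker is -32768 for both components (that is, -1 << 15) and that LAST_FRAME and ALTREF_FRAME are 1 and 7; storing the field as a dictionary keyed by reference frame is a choice made for this example.

```python
LAST_FRAME, ALTREF_FRAME = 1, 7  # assumed constant values
INVALID_MV = -(1 << 15)          # -32768 marks a hole in the projected field


def init_motion_field(mi_rows, mi_cols):
    """Build MotionFieldMvs with every vector marked invalid."""
    h8, w8 = mi_rows >> 1, mi_cols >> 1  # one entry per 8x8 luma block
    return {
        ref: [[[INVALID_MV, INVALID_MV] for _ in range(w8)]
              for _ in range(h8)]
        for ref in range(LAST_FRAME, ALTREF_FRAME + 1)
    }
```

Each position gets its own two-element list, so writing a projected vector at one location does not affect any other location.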
The variable lastIdx (representing which reference frame is used for LAST_FRAME) is set equal to ref_frame_idx[ 0 ].
The variable curGoldOrderHint (representing the expected output order for GOLDEN_FRAME of the current frame) is set equal to OrderHints[ GOLDEN_FRAME ].
The variable lastAltOrderHint (representing the expected output order for ALTREF_FRAME of LAST_FRAME) is set equal to SavedOrderHints[ lastIdx ][ ALTREF_FRAME ].
The variable useLast (representing whether to project the motion vectors from LAST_FRAME) is set equal to ( lastAltOrderHint != curGoldOrderHint ).
If useLast is equal to 1, the projection process in Projection process is invoked with src equal to LAST_FRAME, and dstSign equal to -1. (The output of this process is discarded.)
The variable refStamp (that limits how many reference frames have to be projected) is set equal to MFMV_STACK_SIZE - 2.
The variable useBwd is set equal to get_relative_dist( OrderHints[ BWDREF_FRAME ], OrderHint ) > 0.
If useBwd is equal to 1, the following steps apply:

The projection process in Projection process is invoked with src equal to BWDREF_FRAME, and dstSign equal to 1, and the output assigned to projOutput.

If projOutput is equal to 1, refStamp is set equal to refStamp - 1.
The variable useAlt2 is set equal to get_relative_dist( OrderHints[ ALTREF2_FRAME ], OrderHint ) > 0.
If useAlt2 is equal to 1, the following steps apply:

The projection process in Projection process is invoked with src equal to ALTREF2_FRAME, and dstSign equal to 1, and the output assigned to projOutput.

If projOutput is equal to 1, refStamp is set equal to refStamp - 1.
The variable useAlt is set equal to get_relative_dist( OrderHints[ ALTREF_FRAME ], OrderHint ) > 0.
If useAlt is equal to 1 and (refStamp >= 0), the following steps apply:

The projection process in Projection process is invoked with src equal to ALTREF_FRAME, and dstSign equal to 1, and the output assigned to projOutput.

If projOutput is equal to 1, refStamp is set equal to refStamp - 1.
If ( refStamp >= 0 ), the projection process in Projection process is invoked with src equal to LAST2_FRAME, and dstSign equal to -1. (The output of this process is discarded.)
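Note: The following non-normative Python sketch illustrates how refStamp limits the number of reference frames that are projected. The function name projection_schedule, the project callback (a stand-in for the projection process that returns its boolean output), and the default value of 3 for MFMV_STACK_SIZE are assumptions made for this illustration only.

```python
def projection_schedule(use_last, use_bwd, use_alt2, use_alt, project,
                        mfmv_stack_size=3):
    """Return the reference frames passed to the projection process, in order.

    `project(src)` is a hypothetical callback standing in for the projection
    process; it returns 1 if the source frame was valid and 0 otherwise.
    """
    calls = []

    def invoke(src):
        calls.append(src)
        return project(src)

    if use_last:
        invoke("LAST_FRAME")               # output is discarded
    ref_stamp = mfmv_stack_size - 2
    if use_bwd and invoke("BWDREF_FRAME") == 1:
        ref_stamp -= 1
    if use_alt2 and invoke("ALTREF2_FRAME") == 1:
        ref_stamp -= 1
    if use_alt and ref_stamp >= 0 and invoke("ALTREF_FRAME") == 1:
        ref_stamp -= 1
    if ref_stamp >= 0:
        invoke("LAST2_FRAME")              # output is discarded
    return calls
```

Under these assumptions, when every projection succeeds, ALTREF_FRAME and LAST2_FRAME are skipped once BWDREF_FRAME and ALTREF2_FRAME have both been projected.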
Projection process
The inputs to this process are:

a variable src specifying which reference frame's motion vectors should be projected,

a variable dstSign specifying a negation multiplier for the motion vector direction.
The process projects the motion vectors from a whole reference frame and stores the results in MotionFieldMvs.
The process outputs a single boolean value representing whether the source frame was valid for this operation. If the output is zero, no modification is made to MotionFieldMvs.
The variable srcIdx (representing which reference frame is used) is set equal to ref_frame_idx[ src - LAST_FRAME ].
The variable w8 (representing the width of the motion field in units of 8x8 luma samples) is set equal to MiCols >> 1.
The variable h8 (representing the height of the motion field in units of 8x8 luma samples) is set equal to MiRows >> 1.
If RefMiRows[ srcIdx ] is not equal to MiRows, RefMiCols[ srcIdx ] is not equal to MiCols, or RefFrameType[ srcIdx ] is equal to INTRA_ONLY_FRAME or KEY_FRAME, the process exits at this point, with the output set equal to 0.
The process is specified as follows:
for ( y8 = 0; y8 < h8; y8++ ) {
  for ( x8 = 0; x8 < w8; x8++ ) {
    row = 2 * y8 + 1
    col = 2 * x8 + 1
    srcRef = SavedRefFrames[ srcIdx ][ row ][ col ]
    if ( srcRef > INTRA_FRAME ) {
      refToCur = get_relative_dist( OrderHints[ src ], OrderHint )
      refOffset = get_relative_dist( OrderHints[ src ], SavedOrderHints[ srcIdx ][ srcRef ] )
      posValid = Abs( refToCur ) <= MAX_FRAME_DISTANCE &&
                 Abs( refOffset ) <= MAX_FRAME_DISTANCE &&
                 refOffset > 0
      if ( posValid ) {
        mv = SavedMvs[ srcIdx ][ row ][ col ]
        projMv = get_mv_projection( mv, refToCur * dstSign, refOffset )
        posValid = get_block_position( x8, y8, dstSign, projMv )
        if ( posValid ) {
          for ( dst = LAST_FRAME; dst <= ALTREF_FRAME; dst++ ) {
            refToDst = get_relative_dist( OrderHint, OrderHints[ dst ] )
            projMv = get_mv_projection( mv, refToDst, refOffset )
            MotionFieldMvs[ dst ][ PosY8 ][ PosX8 ] = projMv
          }
        }
      }
    }
  }
}
When the function get_mv_projection is called, the get mv projection process specified in Get MV projection process is invoked and the output assigned to projMv.
When the function get_block_position is called, the get block position process specified in Get block position process is invoked and the output assigned to posValid. This process also sets up the variables PosY8 and PosX8 representing the projected location in the motion field.
The process now exits with the output set equal to 1.
Get MV projection process
The inputs to this process are:

a length 2 array mv specifying a motion vector,

a variable numerator specifying the number of frames to be covered by the projected motion vector,

a variable denominator specifying the number of frames covered by the original motion vector.
The output of this process is a length 2 array projMv containing the projected motion vector.
This process starts with a motion vector mv from a previous frame. This motion vector gives the displacement expected when moving a certain number of frames (given by the variable denominator). In order to use the motion vector for predictions using a different reference frame, the length of the motion vector must be scaled.
The variable clippedDenominator is set equal to Min( MAX_FRAME_DISTANCE, denominator ).
The variable clippedNumerator is set equal to Clip3( -MAX_FRAME_DISTANCE, MAX_FRAME_DISTANCE, numerator ).
The projected motion vector is specified as follows:
for ( i = 0; i < 2; i++ ) {
  scaled = Round2Signed( mv[ i ] * clippedNumerator * Div_Mult[ clippedDenominator ], 14 )
  projMv[ i ] = Clip3( -(1 << 14) + 1, (1 << 14) - 1, scaled )
}
where Div_Mult is a constant lookup table specified as:
Div_Mult[32] = {
  0, 16384, 8192, 5461, 4096, 3276, 2730, 2340, 2048, 1820, 1638,
  1489, 1365, 1260, 1170, 1092, 1024, 963, 910, 862, 819, 780,
  744, 712, 682, 655, 630, 606, 585, 564, 546, 528
}
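Note: The following non-normative Python sketch illustrates the scaling performed by this process. The helper names round2signed and clip3 mirror the Round2Signed and Clip3 functions used above, and the value 31 for MAX_FRAME_DISTANCE is an assumption chosen to match the 32 entries of Div_Mult.

```python
MAX_FRAME_DISTANCE = 31  # assumed value, consistent with the 32-entry table

DIV_MULT = [
    0, 16384, 8192, 5461, 4096, 3276, 2730, 2340, 2048, 1820, 1638,
    1489, 1365, 1260, 1170, 1092, 1024, 963, 910, 862, 819, 780,
    744, 712, 682, 655, 630, 606, 585, 564, 546, 528,
]

def round2signed(x, n):
    """Divide x by 2**n, rounding the magnitude to nearest (n >= 1)."""
    r = (abs(x) + (1 << (n - 1))) >> n
    return r if x >= 0 else -r

def clip3(lo, hi, x):
    """Clamp x into the inclusive range [lo, hi]."""
    return max(lo, min(hi, x))

def get_mv_projection(mv, numerator, denominator):
    """Scale mv by numerator / denominator using the reciprocal table."""
    clipped_den = min(MAX_FRAME_DISTANCE, denominator)
    clipped_num = clip3(-MAX_FRAME_DISTANCE, MAX_FRAME_DISTANCE, numerator)
    proj_mv = []
    for comp in mv:
        scaled = round2signed(comp * clipped_num * DIV_MULT[clipped_den], 14)
        proj_mv.append(clip3(-(1 << 14) + 1, (1 << 14) - 1, scaled))
    return proj_mv

# A vector covering 2 frames, rescaled to cover 1 frame, is halved.
print(get_mv_projection([8, -16], 1, 2))
```

Div_Mult[ d ] approximates 16384 / d, so the multiplication followed by Round2Signed( ..., 14 ) replaces a division by the denominator.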
Get block position process
The inputs to this process are:

variables x8 and y8 specifying a location in units of 8x8 luma samples,

a variable dstSign specifying a negation multiplier for the motion vector direction,

a length 2 array projMv specifying a projected motion vector.
The process generates global variables PosX8 and PosY8 representing the projected location in units of 8x8 luma samples.
The process returns a flag posValid that indicates if the position should be used.
Note: posValid is specified such that only blocks within a certain distance of the current location need to be projected.
The variable posValid is set equal to 1.
The variable PosY8 is set equal to project(y8, projMv[ 0 ], dstSign, MiRows >> 1, MAX_OFFSET_HEIGHT).
The variable PosX8 is set equal to project(x8, projMv[ 1 ], dstSign, MiCols >> 1, MAX_OFFSET_WIDTH).
where the function project is specified as follows:
project( v8, delta, dstSign, max8, maxOff8 ) {
  base8 = (v8 >> 3) << 3
  if ( delta >= 0 ) {
    offset8 = delta >> ( 3 + 1 + MI_SIZE_LOG2 )
  } else {
    offset8 = -( ( -delta ) >> ( 3 + 1 + MI_SIZE_LOG2 ) )
  }
  v8 += dstSign * offset8
  if ( v8 < 0 ||
       v8 >= max8 ||
       v8 < base8 - maxOff8 ||
       v8 >= base8 + 8 + maxOff8 ) {
    posValid = 0
  }
  return v8
}
The project function clears posValid if the resulting position is offset too far.
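Note: The following non-normative Python sketch illustrates the project function. Instead of clearing a shared posValid variable, this sketch returns the validity flag alongside the position; that return convention and the value 2 for MI_SIZE_LOG2 are assumptions made for this illustration.

```python
MI_SIZE_LOG2 = 2  # assumed value: mode-info units are 4 luma samples wide

def project(v8, delta, dst_sign, max8, max_off8):
    """Return (projected position, valid flag) in units of 8x8 luma samples."""
    base8 = (v8 >> 3) << 3            # start of the enclosing group of 8 positions
    shift = 3 + 1 + MI_SIZE_LOG2      # 1/8 sample units to 8x8 block units
    if delta >= 0:
        offset8 = delta >> shift
    else:
        offset8 = -((-delta) >> shift)  # truncate toward zero for negatives
    v8 += dst_sign * offset8
    valid = not (v8 < 0 or
                 v8 >= max8 or
                 v8 < base8 - max_off8 or
                 v8 >= base8 + 8 + max_off8)
    return v8, valid
```

Handling negative delta separately keeps the truncation symmetric about zero, so a displacement of -63 maps to an offset of 0 rather than -1.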
Motion vector prediction processes
General
The following sections define the processes used for predicting the motion vectors.
The entry point to these processes is triggered by the function call to find_mv_stack in the inter block mode info syntax described in Inter block mode info syntax. This function call invokes the Find MV Stack Process specified in Find MV stack process.
Find MV stack process
This process is triggered by a function call to find_mv_stack.
The input to this process is a variable isCompound containing 0 for single prediction, or 1 to signal compound prediction.
This process constructs an array RefStackMv containing motion vector candidates.
The process also prepares the value of the contexts used when decoding inter prediction syntax elements.
The array RefStackMv will be constructed during this process. RefStackMv[ idx ][ list ][ comp ] represents component comp (0 for y or 1 for x) of a motion vector for a particular list (0 or 1) at position idx (0 to MAX_REF_MV_STACK_SIZE - 1) in the stack. No initialization is needed because each entry is always written before it can be read.
The variable bw4 specifying the width of the block in 4x4 luma samples is set equal to Num_4x4_Blocks_Wide[ MiSize ].
The variable bh4 specifying the height of the block in 4x4 luma samples is set equal to Num_4x4_Blocks_High[ MiSize ].
The following ordered steps apply:

The variable NumMvFound (representing the number of motion vector candidates in RefStackMv) is set equal to 0.

The variable NewMvCount (representing the number of candidates found that used NEWMV encoding) is set equal to 0.

The setup global mv process specified in Setup global MV process is invoked with the input 0 and the output is assigned to GlobalMvs[ 0 ].

If isCompound is equal to 1, the setup global mv process specified in Setup global MV process is invoked with the input 1 and the output is assigned to GlobalMvs[ 1 ].

The variable FoundMatch is set equal to 0.

The scan row process in Scan row process is invoked with deltaRow equal to -1 and isCompound as inputs.

The variable foundAboveMatch is set equal to FoundMatch, and FoundMatch is set equal to 0.

The scan col process in Scan col process is invoked with deltaCol equal to -1 and isCompound as inputs.

The variable foundLeftMatch is set equal to FoundMatch, and FoundMatch is set equal to 0.

If Max( bw4, bh4 ) is less than or equal to 16, the scan point process in Scan point process is invoked with deltaRow equal to -1, deltaCol equal to bw4, and isCompound as inputs.

If FoundMatch is equal to 1, the variable foundAboveMatch is set equal to 1.

The variable CloseMatches (representing candidates found in the immediate neighborhood) is set equal to foundAboveMatch + foundLeftMatch.

The variable numNearest (representing the number of motion vectors found in the immediate neighborhood) is set equal to NumMvFound.

The variable numNew (representing the number of times a NEWMV candidate was found in the immediate neighborhood) is set equal to NewMvCount.

If numNearest is greater than 0, WeightStack[ idx ] is incremented by REF_CAT_LEVEL for idx = 0..(numNearest - 1).

The variable ZeroMvContext is set equal to 0.

If use_ref_frame_mvs is equal to 1, the temporal scan process in Temporal scan process is invoked with isCompound as input (the temporal scan process affects ZeroMvContext).

The scan point process in Scan point process is invoked with deltaRow equal to -1, deltaCol equal to -1, and isCompound as inputs.

If FoundMatch is equal to 1, the variable foundAboveMatch is set equal to 1.

The variable FoundMatch is set equal to 0.

The scan row process in Scan row process is invoked with deltaRow equal to -3 and isCompound as inputs.

If FoundMatch is equal to 1, the variable foundAboveMatch is set equal to 1.

The variable FoundMatch is set equal to 0.

The scan col process in Scan col process is invoked with deltaCol equal to -3 and isCompound as inputs.

If FoundMatch is equal to 1, the variable foundLeftMatch is set equal to 1.

The variable FoundMatch is set equal to 0.

If bh4 is greater than 1, the scan row process in Scan row process is invoked with deltaRow equal to -5 and isCompound as inputs.

If FoundMatch is equal to 1, the variable foundAboveMatch is set equal to 1.

The variable FoundMatch is set equal to 0.

If bw4 is greater than 1, the scan col process in Scan col process is invoked with deltaCol equal to -5 and isCompound as inputs.

If FoundMatch is equal to 1, the variable foundLeftMatch is set equal to 1.

The variable TotalMatches (representing all found candidates) is set equal to foundAboveMatch + foundLeftMatch.

The sorting process in Sorting process is invoked with start equal to 0, end equal to numNearest, and isCompound as input.

The sorting process in Sorting process is invoked with start equal to numNearest, end equal to NumMvFound, and isCompound as input.

If NumMvFound is less than 2, the extra search process in Extra search process is invoked with isCompound as input.

The context and clamping process in Context and clamping process is invoked with isCompound and numNew as input.
Setup global MV process
The input to this process is a variable refList specifying which set of motion vectors to predict.
The output is a motion vector mv representing global motion for this block.
The variable ref (specifying the reference frame) is set equal to RefFrame[ refList ].
If ref is not equal to INTRA_FRAME, the variable typ (specifying the type of global motion) is set equal to GmType[ ref ].
The variable bw (representing the width of the block in units of luma samples) is set equal to Block_Width[ MiSize ].
The variable bh (representing the height of the block in units of luma samples) is set equal to Block_Height[ MiSize ].
The output motion vector mv is specified by projecting the central luma sample of the block as follows:
if ( ref == INTRA_FRAME || typ == IDENTITY ) {
  mv[0] = 0
  mv[1] = 0
} else if ( typ == TRANSLATION ) {
  mv[0] = gm_params[ref][0] >> (WARPEDMODEL_PREC_BITS - 3)
  mv[1] = gm_params[ref][1] >> (WARPEDMODEL_PREC_BITS - 3)
} else {
  x = MiCol * MI_SIZE + bw / 2 - 1
  y = MiRow * MI_SIZE + bh / 2 - 1
  xc = (gm_params[ref][2] - (1 << WARPEDMODEL_PREC_BITS)) * x +
       gm_params[ref][3] * y +
       gm_params[ref][0]
  yc = gm_params[ref][4] * x +
       (gm_params[ref][5] - (1 << WARPEDMODEL_PREC_BITS)) * y +
       gm_params[ref][1]
  if ( allow_high_precision_mv ) {
    mv[0] = Round2Signed(yc, WARPEDMODEL_PREC_BITS - 3)
    mv[1] = Round2Signed(xc, WARPEDMODEL_PREC_BITS - 3)
  } else {
    mv[0] = Round2Signed(yc, WARPEDMODEL_PREC_BITS - 2) * 2
    mv[1] = Round2Signed(xc, WARPEDMODEL_PREC_BITS - 2) * 2
  }
}
lower_mv_precision( mv )
where the call to lower_mv_precision invokes the lower precision process specified in Lower precision process.
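Note: The following non-normative Python sketch illustrates how the global motion vector is derived from the warp parameters of one reference frame. The function signature, the values 16 for WARPEDMODEL_PREC_BITS and 4 for MI_SIZE, and the numeric order of the motion type names are assumptions made for this illustration. The final call to the lower precision process is omitted.

```python
WARPEDMODEL_PREC_BITS = 16  # assumed value
MI_SIZE = 4                 # assumed value: luma samples per mode-info unit
IDENTITY, TRANSLATION, ROTZOOM, AFFINE = range(4)  # assumed enumeration order

def round2signed(x, n):
    """Divide x by 2**n, rounding the magnitude to nearest (n >= 1)."""
    r = (abs(x) + (1 << (n - 1))) >> n
    return r if x >= 0 else -r

def setup_global_mv(is_intra, typ, params, mi_row, mi_col, bw, bh,
                    allow_high_precision_mv):
    """Return [mv_y, mv_x] for the central luma sample of the block.

    `params` plays the role of gm_params[ ref ] (six warp parameters).
    """
    if is_intra or typ == IDENTITY:
        return [0, 0]
    if typ == TRANSLATION:
        shift = WARPEDMODEL_PREC_BITS - 3
        return [params[0] >> shift, params[1] >> shift]
    x = mi_col * MI_SIZE + bw // 2 - 1
    y = mi_row * MI_SIZE + bh // 2 - 1
    one = 1 << WARPEDMODEL_PREC_BITS
    # Displacement of the central sample: warped position minus original.
    xc = (params[2] - one) * x + params[3] * y + params[0]
    yc = params[4] * x + (params[5] - one) * y + params[1]
    if allow_high_precision_mv:
        return [round2signed(yc, WARPEDMODEL_PREC_BITS - 3),
                round2signed(xc, WARPEDMODEL_PREC_BITS - 3)]
    return [round2signed(yc, WARPEDMODEL_PREC_BITS - 2) * 2,
            round2signed(xc, WARPEDMODEL_PREC_BITS - 2) * 2]
```

Subtracting 1 << WARPEDMODEL_PREC_BITS from the diagonal parameters removes the identity part of the warp, so xc and yc hold only the displacement of the central sample.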
Scan row process
The inputs to this process are:

a variable deltaRow specifying (in units of 4x4 luma samples) how far above to look for motion vectors,

a variable isCompound containing 0 for single prediction, or 1 to signal compound prediction.
The variable bw4 specifying the width of the block in 4x4 luma samples is set equal to Num_4x4_Blocks_Wide[ MiSize ].
The variable end4 specifying the last block to be scanned in horizontal 4x4 luma samples is set equal to Min( Min( bw4, MiCols - MiCol ), 16 ).
Note: end4 limits the number of locations to be searched for large blocks. There is a similar optimization in that the scan point process for the top-right location is not invoked for large blocks. For example, for a 64 by 64 block, candidates from the row above will be examined at x offsets of -1, 0, 16, 32, 48, 64. (The 0, 16, 32, 48 locations are scanned in this process, while the -1 and 64 are scanned by the scan point process.) However, for a 128 by 64 or 64 by 128 block, candidates from the row above will only be examined at x offsets of -1, 0, 16, 32, 48 because the scan point process for the top-right location is not invoked.
The variable deltaCol is set equal to 0.
The variable useStep16 is set equal to (bw4 >= 16).
Note: useStep16 is equal to 1 when the block is 64 luma samples wide or wider. This means only 4 locations will be searched in this case. However, a 32 luma samples wide block may still search 8 locations.
If Abs(deltaRow) is greater than 1, the offset is adjusted as follows:
deltaRow += MiRow & 1
deltaCol = 1 - (MiCol & 1)
Note: These adjustments reduce the number of motion vectors that need to be kept in memory.
A series of motion vector locations is scanned as follows:
i = 0
while ( i < end4 ) {
  mvRow = MiRow + deltaRow
  mvCol = MiCol + deltaCol + i
  if ( !is_inside(mvRow,mvCol) )
    break
  len = Min(bw4, Num_4x4_Blocks_Wide[ MiSizes[ mvRow ][ mvCol ] ])
  if ( Abs(deltaRow) > 1 )
    len = Max(2, len)
  if ( useStep16 )
    len = Max(4, len)
  weight = len * 2
  add_ref_mv_candidate( mvRow, mvCol, isCompound, weight)
  i += len
}
where the call to add_ref_mv_candidate invokes the process in Add reference motion vector process.
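Note: The following non-normative Python sketch lists the locations visited by the scan row process together with the weight given to each. The function name scan_row_locations and the neighbor_w4 callback (returning the width, in 4x4 units, of the block covering a location) are assumptions made for this illustration, and the is_inside check is omitted for brevity.

```python
def scan_row_locations(bw4, mi_row, mi_col, mi_cols, delta_row, neighbor_w4):
    """Return a list of (mvRow, mvCol, weight) tuples in scan order.

    `neighbor_w4(row, col)` is a hypothetical callback standing in for
    Num_4x4_Blocks_Wide[ MiSizes[ row ][ col ] ].
    """
    end4 = min(min(bw4, mi_cols - mi_col), 16)
    delta_col = 0
    use_step16 = bw4 >= 16
    if abs(delta_row) > 1:
        # Align the far rows to the 8x8 grid on which motion vectors are kept.
        delta_row += mi_row & 1
        delta_col = 1 - (mi_col & 1)
    visited = []
    i = 0
    while i < end4:
        mv_row = mi_row + delta_row
        mv_col = mi_col + delta_col + i
        length = min(bw4, neighbor_w4(mv_row, mv_col))
        if abs(delta_row) > 1:
            length = max(2, length)
        if use_step16:
            length = max(4, length)
        visited.append((mv_row, mv_col, 2 * length))
        i += length
    return visited
```

For example, with neighbors that are all one 4x4 unit wide, a block 64 luma samples wide (bw4 equal to 16) visits 4 locations, whereas a block 32 luma samples wide (bw4 equal to 8) visits 8.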
Scan col process
The inputs to this process are:

a variable deltaCol specifying (in units of 4x4 luma samples) how far left to look for motion vectors,

a variable isCompound containing 0 for single prediction, or 1 to signal compound prediction.
The variable bh4 specifying the height of the block in 4x4 luma samples is set equal to Num_4x4_Blocks_High[ MiSize ].
The variable end4 specifying the last block to be scanned in vertical 4x4 luma samples is set equal to Min( Min( bh4, MiRows - MiRow ), 16 ).
The variable deltaRow is set equal to 0.
The variable useStep16 is set equal to (bh4 >= 16).
If Abs(deltaCol) is greater than 1, the offset is adjusted as follows:
deltaRow = 1 - (MiRow & 1)
deltaCol += MiCol & 1
A series of motion vector locations is scanned as follows:
i = 0
while ( i < end4 ) {
  mvRow = MiRow + deltaRow + i
  mvCol = MiCol + deltaCol
  if ( !is_inside(mvRow,mvCol) )
    break
  len = Min(bh4, Num_4x4_Blocks_High[ MiSizes[ mvRow ][ mvCol ] ])
  if ( Abs(deltaCol) > 1 )
    len = Max(2, len)
  if ( useStep16 )
    len = Max(4, len)
  weight = len * 2
  add_ref_mv_candidate( mvRow, mvCol, isCompound, weight )
  i += len
}
where the call to add_ref_mv_candidate invokes the process in Add reference motion vector process.
Scan point process
The inputs to this process are:

a variable deltaRow specifying (in units of 4x4 luma samples) how far above to look for a motion vector,

a variable deltaCol specifying (in units of 4x4 luma samples) how far left to look for a motion vector,

a variable isCompound containing 0 for single prediction, or 1 to signal compound prediction.
The variable mvRow is set equal to MiRow + deltaRow.
The variable mvCol is set equal to MiCol + deltaCol.
The variable weight is set equal to 4.
If is_inside( mvRow, mvCol ) is equal to 1 and RefFrames[ mvRow ][ mvCol ][ 0 ] has been written for this frame (this checks that the candidate location has been decoded), the add reference motion vector process in Add reference motion vector process is invoked with mvRow, mvCol, isCompound, weight as inputs.
Temporal scan process
The input to this process is a variable isCompound containing 0 for single prediction, or 1 to signal compound prediction.
This process scans the motion vectors in a previous frame looking for candidates which use the same reference frame.
The variable bw4 specifying the width of the block in 4x4 luma samples is set equal to Num_4x4_Blocks_Wide[ MiSize ].
The variable bh4 specifying the height of the block in 4x4 luma samples is set equal to Num_4x4_Blocks_High[ MiSize ].
The variable stepW4 is set equal to ( bw4 >= 16 ) ? 4 : 2.
The variable stepH4 is set equal to ( bh4 >= 16 ) ? 4 : 2.
The process scans the locations within the block as follows:
for ( deltaRow = 0; deltaRow < Min( bh4, 16 ) ; deltaRow += stepH4 ) {
  for ( deltaCol = 0; deltaCol < Min( bw4, 16 ) ; deltaCol += stepW4 ) {
    add_tpl_ref_mv( deltaRow, deltaCol, isCompound)
  }
}
where the call to add_tpl_ref_mv invokes the temporal sample process in Temporal sample process.
The process then scans positions around the block (but still within the same superblock) as follows:
allowExtension = ((bh4 >= Num_4x4_Blocks_High[BLOCK_8X8]) &&
                  (bh4 < Num_4x4_Blocks_High[BLOCK_64X64]) &&
                  (bw4 >= Num_4x4_Blocks_Wide[BLOCK_8X8]) &&
                  (bw4 < Num_4x4_Blocks_Wide[BLOCK_64X64]))
if ( allowExtension ) {
  for ( i = 0; i < 3; i++ ) {
    deltaRow = tplSamplePos[ i ][ 0 ]
    deltaCol = tplSamplePos[ i ][ 1 ]
    if ( check_sb_border( deltaRow, deltaCol ) ) {
      add_tpl_ref_mv( deltaRow, deltaCol, isCompound)
    }
  }
}
where tplSamplePos contains the offsets to search (in units of 4x4 luma samples) and is specified as:
tplSamplePos[3][2] = {
  { bh4, -2 }, { bh4, bw4 }, { bh4 - 2, bw4 }
}
and check_sb_border checks that the position is within the same 64x64 block as follows:
check_sb_border( deltaRow, deltaCol ) {
  row = (MiRow & 15) + deltaRow
  col = (MiCol & 15) + deltaCol
  return ( row >= 0 && row < 16 && col >= 0 && col < 16 )
}
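Note: The following non-normative Python sketch enumerates the ( deltaRow, deltaCol ) offsets that the temporal scan process passes to the temporal sample process. The function name temporal_scan_offsets is hypothetical, and the values 2 and 16 used for the 8x8 and 64x64 block dimensions in 4x4 units are written out directly as an assumption.

```python
def temporal_scan_offsets(bw4, bh4, mi_row, mi_col):
    """Return the (deltaRow, deltaCol) offsets sampled, in scan order."""
    step_w4 = 4 if bw4 >= 16 else 2
    step_h4 = 4 if bh4 >= 16 else 2
    # Locations inside the block (at most a 16x16 area in 4x4 units).
    offsets = [(dr, dc)
               for dr in range(0, min(bh4, 16), step_h4)
               for dc in range(0, min(bw4, 16), step_w4)]
    # Extra locations around the block, for blocks from 8x8 up to (not
    # including) 64 luma samples in each dimension.
    allow_extension = 2 <= bh4 < 16 and 2 <= bw4 < 16
    if allow_extension:
        for dr, dc in [(bh4, -2), (bh4, bw4), (bh4 - 2, bw4)]:
            row = (mi_row & 15) + dr
            col = (mi_col & 15) + dc
            if 0 <= row < 16 and 0 <= col < 16:   # same 64x64 block
                offsets.append((dr, dc))
    return offsets

# A 16x16 block at the top-left corner of a 64x64 block.
print(temporal_scan_offsets(4, 4, 0, 0))
```

In this example the below-left offset is rejected because it falls outside the enclosing 64x64 block, while the other two extension offsets are kept.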
Temporal sample process
The inputs to this process are:

variables deltaRow and deltaCol specifying (in units of 4x4 luma samples) the offset to the candidate location,

a variable isCompound containing 0 for single prediction, or 1 to signal compound prediction.
This process looks up a motion vector from the motion field and adds it into the stack.
The variable mvRow is set equal to (MiRow + deltaRow) | 1.
The variable mvCol is set equal to (MiCol + deltaCol) | 1.
If is_inside( mvRow, mvCol ) is equal to 0, this process terminates immediately.
The variable x8 is set equal to mvCol >> 1.
The variable y8 is set equal to mvRow >> 1.
(x8 and y8 represent the position of the candidate in units of 8x8 luma samples.)
The process is specified as follows:
if ( deltaRow == 0 && deltaCol == 0 ) {
  ZeroMvContext = 1
}
if ( !isCompound ) {
  candMv = MotionFieldMvs[ RefFrame[ 0 ] ][ y8 ][ x8 ]
  if ( candMv[ 0 ] == -1 << 15 )
    return
  lower_mv_precision( candMv )
  if ( deltaRow == 0 && deltaCol == 0 ) {
    if ( Abs( candMv[ 0 ] - GlobalMvs[ 0 ][ 0 ] ) >= 16 ||
         Abs( candMv[ 1 ] - GlobalMvs[ 0 ][ 1 ] ) >= 16 )
      ZeroMvContext = 1
    else
      ZeroMvContext = 0
  }
  for ( idx = 0; idx < NumMvFound; idx++ ) {
    if ( candMv[ 0 ] == RefStackMv[ idx ][ 0 ][ 0 ] &&
         candMv[ 1 ] == RefStackMv[ idx ][ 0 ][ 1 ] )
      break
  }
  if ( idx < NumMvFound ) {
    WeightStack[ idx ] += 2
  } else if ( NumMvFound < MAX_REF_MV_STACK_SIZE ) {
    RefStackMv[ NumMvFound ][ 0 ] = candMv
    WeightStack[ NumMvFound ] = 2
    NumMvFound += 1
  }
} else {
  candMv0 = MotionFieldMvs[ RefFrame[ 0 ] ][ y8 ][ x8 ]
  if ( candMv0[ 0 ] == -1 << 15 )
    return
  candMv1 = MotionFieldMvs[ RefFrame[ 1 ] ][ y8 ][ x8 ]
  if ( candMv1[ 0 ] == -1 << 15 )
    return
  lower_mv_precision( candMv0 )
  lower_mv_precision( candMv1 )
  if ( deltaRow == 0 && deltaCol == 0 ) {
    if ( Abs( candMv0[ 0 ] - GlobalMvs[ 0 ][ 0 ] ) >= 16 ||
         Abs( candMv0[ 1 ] - GlobalMvs[ 0 ][ 1 ] ) >= 16 ||
         Abs( candMv1[ 0 ] - GlobalMvs[ 1 ][ 0 ] ) >= 16 ||
         Abs( candMv1[ 1 ] - GlobalMvs[ 1 ][ 1 ] ) >= 16 )
      ZeroMvContext = 1
    else
      ZeroMvContext = 0
  }
  for ( idx = 0; idx < NumMvFound; idx++ ) {
    if ( candMv0[ 0 ] == RefStackMv[ idx ][ 0 ][ 0 ] &&
         candMv0[ 1 ] == RefStackMv[ idx ][ 0 ][ 1 ] &&
         candMv1[ 0 ] == RefStackMv[ idx ][ 1 ][ 0 ] &&
         candMv1[ 1 ] == RefStackMv[ idx ][ 1 ][ 1 ] )
      break
  }
  if ( idx < NumMvFound ) {
    WeightStack[ idx ] += 2
  } else if ( NumMvFound < MAX_REF_MV_STACK_SIZE ) {
    RefStackMv[ NumMvFound ][ 0 ] = candMv0
    RefStackMv[ NumMvFound ][ 1 ] = candMv1
    WeightStack[ NumMvFound ] = 2
    NumMvFound += 1
  }
}
where the call to lower_mv_precision invokes the lower precision process specified in Lower precision process.
Add reference motion vector process
The inputs to this process are:

variables mvRow and mvCol specifying (in units of 4x4 luma samples) the candidate location,

a variable isCompound containing 0 for single prediction, or 1 to signal compound prediction,

a variable weight specifying the weight attached to this motion vector.
This process examines the candidate to find matching reference frames.
If IsInters[ mvRow ][ mvCol ] is equal to 0, this process terminates immediately.
If isCompound is equal to 0, the following applies for candList = 0..1:
 If RefFrames[ mvRow ][ mvCol ][ candList ] is equal to RefFrame[ 0 ], the search stack process in Search stack process is invoked with mvRow, mvCol, weight, and candList as inputs.
Otherwise (isCompound is equal to 1), the following applies:
 If RefFrames[ mvRow ][ mvCol ][ 0 ] is equal to RefFrame[ 0 ] and RefFrames[ mvRow ][ mvCol ][ 1 ] is equal to RefFrame[ 1 ], the compound search stack process in Compound search stack process is invoked with mvRow, mvCol, and weight as inputs.
Search stack process
The inputs to this process are:

variables mvRow and mvCol specifying (in units of 4x4 luma samples) the candidate location,

a variable candList specifying which list in the candidate matches our reference frame,

a variable weight proportional to the corresponding block width or height for the candidate motion vector.
This process searches the stack for an exact match with a candidate motion vector. If present, the weight of the candidate motion vector is added to the weight of its counterpart in the stack, otherwise the process adds a motion vector to the stack.
The variable candMode is set equal to YModes[ mvRow ][ mvCol ].
The variable candSize is set equal to MiSizes[ mvRow ][ mvCol ].
The variable large is set equal to ( Min( Block_Width[ candSize ],Block_Height[ candSize ] ) >= 8 ).
The candidate motion vector candMv is set as follows:

If ( candMode == GLOBALMV || candMode == GLOBAL_GLOBALMV ) and ( GmType[ RefFrame[ 0 ] ] > TRANSLATION ) and ( large == 1 ), candMv is set equal to GlobalMvs[ 0 ].

Otherwise, candMv is set equal to Mvs[ mvRow ][ mvCol ][ candList ].
The lower precision process specified in Lower precision process is invoked with candMv.
If has_newmv( candMode ) is equal to 1, NewMvCount is set equal to NewMvCount + 1.
The variable FoundMatch is set equal to 1.
The process depends on whether the candidate motion vector is already in the stack as follows:

If candMv is already equal to RefStackMv[ idx ][ 0 ] for some idx less than NumMvFound, then WeightStack[ idx ] is increased by weight

Otherwise, if NumMvFound is less than MAX_REF_MV_STACK_SIZE, the following ordered steps apply:
a. RefStackMv[ NumMvFound ][ 0 ] is set equal to candMv
b. WeightStack[ NumMvFound ] is set equal to weight
c. NumMvFound is set equal to NumMvFound + 1.

Otherwise, (NumMvFound is greater than or equal to MAX_REF_MV_STACK_SIZE), the process has no effect.
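Note: The following non-normative Python sketch illustrates the match-or-append step of the search stack process. The function name search_stack_add and the use of Python lists in place of RefStackMv, WeightStack, and NumMvFound are assumptions made for this illustration, as is the value 8 for MAX_REF_MV_STACK_SIZE.

```python
MAX_REF_MV_STACK_SIZE = 8  # assumed value

def search_stack_add(ref_stack_mv, weight_stack, cand_mv, weight):
    """Merge cand_mv into the stack; len(ref_stack_mv) acts as NumMvFound."""
    for idx, mv in enumerate(ref_stack_mv):
        if mv == cand_mv:
            # Exact match: accumulate the weight on the existing entry.
            weight_stack[idx] += weight
            return
    if len(ref_stack_mv) < MAX_REF_MV_STACK_SIZE:
        ref_stack_mv.append(list(cand_mv))
        weight_stack.append(weight)
    # Otherwise the stack is full and the candidate is dropped.

stack, weights = [], []
search_stack_add(stack, weights, [4, -8], 2)
search_stack_add(stack, weights, [0, 0], 4)
search_stack_add(stack, weights, [4, -8], 6)
print(stack, weights)
```

Accumulating weights in this way means a motion vector seen at several neighboring locations ranks higher when the stack is later sorted.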
Compound search stack process
The inputs to this process are:

variables mvRow and mvCol specifying (in units of 4x4 luma samples) the candidate location,

a variable weight proportional to the corresponding block width or height for the candidate pair of motion vectors.
This process searches the stack for an exact match with a candidate pair of motion vectors. If present, the weight of the candidate pair of motion vectors is added to the weight of its counterpart in the stack, otherwise the process adds the motion vectors to the stack.
The array candMvs (containing two motion vectors) is set equal to Mvs[ mvRow ][ mvCol ].
The variable candMode is set equal to YModes[ mvRow ][ mvCol ].
The variable candSize is set equal to MiSizes[ mvRow ][ mvCol ].
If candMode is equal to GLOBAL_GLOBALMV, for refList = 0..1 the following applies:
 If GmType[ RefFrame[ refList ] ] > TRANSLATION, candMvs[ refList ] is set equal to GlobalMvs[ refList ].
For i = 0..1, the lower precision process specified in Lower precision process is invoked with candMvs[ i ].
The variable FoundMatch is set equal to 1.
The process depends on whether the candidate motion vector pair is already in the stack as follows:

If candMvs[ 0 ] is equal to RefStackMv[ idx ][ 0 ] and candMvs[ 1 ] is equal to RefStackMv[ idx ][ 1 ] for some idx less than NumMvFound, then WeightStack[ idx ] is increased by weight

Otherwise, if NumMvFound is less than MAX_REF_MV_STACK_SIZE, the following ordered steps apply:
a. RefStackMv[ NumMvFound ][ i ] is set equal to candMvs[ i ] for i = 0..1
b. WeightStack[ NumMvFound ] is set equal to weight
c. NumMvFound is set equal to NumMvFound + 1.

Otherwise, (NumMvFound is greater than or equal to MAX_REF_MV_STACK_SIZE), the process has no effect.
If has_newmv( candMode ) is equal to 1, NewMvCount is set equal to NewMvCount + 1.
The function has_newmv is defined as:
has_newmv( mode ) {
  return (mode == NEWMV ||
          mode == NEW_NEWMV ||
          mode == NEAR_NEWMV ||
          mode == NEW_NEARMV ||
          mode == NEAREST_NEWMV ||
          mode == NEW_NEARESTMV)
}
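Note: When has_newmv is called from this process, mode cannot be equal to NEWMV because the candidate uses compound prediction. The same function is also called from the search stack process, where mode can be equal to NEWMV.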
Lower precision process
The input to this process is a reference candMv to a motion vector array.
This process modifies the contents of the input motion vector to remove the least significant bit when high precision is not allowed, and all three fractional bits when force_integer_mv is equal to 1.
If allow_high_precision_mv is equal to 1, this process terminates immediately.
For i = 0..1, the following applies:
if ( force_integer_mv ) {
  a = Abs( candMv[ i ] )
  aInt = (a + 3) >> 3
  if ( candMv[ i ] > 0 )
    candMv[ i ] = aInt << 3
  else
    candMv[ i ] = -( aInt << 3 )
} else {
  if ( candMv[ i ] & 1 ) {
    if ( candMv[ i ] > 0 )
      candMv[ i ]--
    else
      candMv[ i ]++
  }
}
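Note: The following non-normative Python sketch illustrates the lower precision process. Passing allow_high_precision_mv and force_integer_mv as function arguments, rather than reading them as syntax elements, is an assumption made for this illustration.

```python
def lower_mv_precision(cand_mv, allow_high_precision_mv, force_integer_mv):
    """Reduce the precision of cand_mv (a 2-element list) in place."""
    if allow_high_precision_mv:
        return
    for i in range(2):
        if force_integer_mv:
            # Round the magnitude to a multiple of 8 (whole samples).
            a_int = (abs(cand_mv[i]) + 3) >> 3
            if cand_mv[i] > 0:
                cand_mv[i] = a_int << 3
            else:
                cand_mv[i] = -(a_int << 3)
        elif cand_mv[i] & 1:
            # Clear the least significant bit by moving toward zero.
            if cand_mv[i] > 0:
                cand_mv[i] -= 1
            else:
                cand_mv[i] += 1

mv = [5, -5]
lower_mv_precision(mv, 0, 0)
print(mv)
```

Both branches treat positive and negative components symmetrically, so negating a motion vector before or after this process gives the same result.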
Sorting process
The inputs to this process are:

a variable start representing the first position to be sorted,

a variable end representing the position one past the last entry to be sorted,

a variable isCompound containing 0 for single prediction, or 1 to signal compound prediction.
This process performs a stable sort of part of the stack of motion vectors according to the corresponding weight.
Entries in RefStackMv from start (inclusive) to end (exclusive) are sorted.
The sorting process is specified as:
while ( end > start ) {
  newEnd = start
  for ( idx = start + 1; idx < end; idx++ ) {
    if ( WeightStack[ idx - 1 ] < WeightStack[ idx ] ) {
      swap_stack(idx - 1, idx)
      newEnd = idx
    }
  }
  end = newEnd
}
When the function swap_stack is invoked, the entries at locations idx and idx - 1 should be swapped in WeightStack and RefStackMv as follows:
swap_stack( i, j ) {
  temp = WeightStack[ i ]
  WeightStack[ i ] = WeightStack[ j ]
  WeightStack[ j ] = temp
  for ( list = 0; list < 1 + isCompound; list++ ) {
    for ( comp = 0; comp < 2; comp++ ) {
      temp = RefStackMv[ i ][ list ][ comp ]
      RefStackMv[ i ][ list ][ comp ] = RefStackMv[ j ][ list ][ comp ]
      RefStackMv[ j ][ list ][ comp ] = temp
    }
  }
}
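Note: The following non-normative Python sketch illustrates the sorting process. The function name sort_stack and the use of two parallel Python lists in place of WeightStack and RefStackMv are assumptions made for this illustration.

```python
def sort_stack(weight_stack, ref_stack_mv, start, end):
    """Sort entries start..end-1 in place by descending weight (stable)."""
    while end > start:
        new_end = start
        for idx in range(start + 1, end):
            if weight_stack[idx - 1] < weight_stack[idx]:
                # Swap adjacent entries in both arrays.
                weight_stack[idx - 1], weight_stack[idx] = (
                    weight_stack[idx], weight_stack[idx - 1])
                ref_stack_mv[idx - 1], ref_stack_mv[idx] = (
                    ref_stack_mv[idx], ref_stack_mv[idx - 1])
                new_end = idx
        # Everything at or beyond the last swap is already in order.
        end = new_end

weights = [2, 8, 8, 4]
mvs = ["a", "b", "c", "d"]
sort_stack(weights, mvs, 0, 4)
print(weights, mvs)
```

Because only strictly smaller weights are swapped, entries with equal weight keep their original relative order, which is what makes the sort stable.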
Extra search process
The input to this process is a variable isCompound containing 0 for single prediction, or 1 to signal compound prediction.
This process adds additional motion vectors to RefStackMv until it has 2 choices of motion vector by first searching the left and above neighbors for partially matching candidates, and second adding global motion candidates.
When doing single prediction, the motion vectors go straight on the stack.
When doing compound prediction, the motion vectors are added to arrays called RefIdMvs (counting matches from the same frame) and RefDiffMvs (counting matches from different frames).
The number of entries in each of these arrays is initialized to zero as follows:
for ( list = 0; list < 2; list++ ) {
  RefIdCount[ list ] = 0
  RefDiffCount[ list ] = 0
}
A two pass search for the partial matching candidates is specified as:
w4 = Min( 16, Num_4x4_Blocks_Wide[ MiSize ] )
h4 = Min( 16, Num_4x4_Blocks_High[ MiSize ] )
w4 = Min( w4, MiCols - MiCol )
h4 = Min( h4, MiRows - MiRow )
num4x4 = Min( w4, h4 )
for ( pass = 0; pass < 2; pass++ ) {
  idx = 0
  while ( idx < num4x4 && NumMvFound < 2 ) {
    if ( pass == 0 ) {
      mvRow = MiRow - 1
      mvCol = MiCol + idx
    } else {
      mvRow = MiRow + idx
      mvCol = MiCol - 1
    }
    if ( !is_inside( mvRow, mvCol ) )
      break
    add_extra_mv_candidate( mvRow, mvCol, isCompound )
    if ( pass == 0 ) {
      idx += Num_4x4_Blocks_Wide[ MiSizes[ mvRow ][ mvCol ] ]
    } else {
      idx += Num_4x4_Blocks_High[ MiSizes[ mvRow ][ mvCol ] ]
    }
  }
}
The first pass searches the row above, the second searches the column to the left.
The function call to add_extra_mv_candidate invokes the add extra mv candidate process specified in Add extra MV candidate process with mvRow, mvCol, isCompound as inputs.
If isCompound is equal to 1, the candidates in the RefIdMvs and RefDiffMvs arrays are added to the stack as follows (using the temporary array combinedMvs):
for ( list = 0; list < 2; list++ ) {
  compCount = 0
  for ( idx = 0; idx < RefIdCount[ list ]; idx++ ) {
    combinedMvs[ compCount ][ list ] = RefIdMvs[ list ][ idx ]
    compCount++
  }
  for ( idx = 0; idx < RefDiffCount[ list ] && compCount < 2; idx++ ) {
    combinedMvs[ compCount ][ list ] = RefDiffMvs[ list ][ idx ]
    compCount++
  }
  while ( compCount < 2 ) {
    combinedMvs[ compCount ][ list ] = GlobalMvs[ list ]
    compCount++
  }
}
if ( NumMvFound == 1 ) {
  if ( combinedMvs[ 0 ][ 0 ] == RefStackMv[ 0 ][ 0 ] &&
       combinedMvs[ 0 ][ 1 ] == RefStackMv[ 0 ][ 1 ] ) {
    RefStackMv[ NumMvFound ][ 0 ] = combinedMvs[ 1 ][ 0 ]
    RefStackMv[ NumMvFound ][ 1 ] = combinedMvs[ 1 ][ 1 ]
  } else {
    RefStackMv[ NumMvFound ][ 0 ] = combinedMvs[ 0 ][ 0 ]
    RefStackMv[ NumMvFound ][ 1 ] = combinedMvs[ 0 ][ 1 ]
  }
  WeightStack[ NumMvFound ] = 2
  NumMvFound++
} else {
  for ( idx = 0; idx < 2; idx++ ) {
    RefStackMv[ NumMvFound ][ 0 ] = combinedMvs[ idx ][ 0 ]
    RefStackMv[ NumMvFound ][ 1 ] = combinedMvs[ idx ][ 1 ]
    WeightStack[ NumMvFound ] = 2
    NumMvFound++
  }
}
If isCompound is equal to 0, the candidates have already been added to RefStackMv, and this process simply extends with global motion candidates as follows:
for ( idx = NumMvFound; idx < 2; idx++ ) {
  RefStackMv[ idx ][ 0 ] = GlobalMvs[ 0 ]
}
Note: For single prediction, NumMvFound is not incremented by the addition of global motion candidates, whereas for compound prediction NumMvFound will always be greater than or equal to 2 by this point.
Add extra MV candidate process
The inputs to this process are:

variables mvRow and mvCol specifying (in units of 4x4 luma samples) the candidate location,

a variable isCompound containing 0 for single prediction, or 1 to signal compound prediction.
This process may modify the contents of the global variables RefIdMvs, RefIdCount, RefDiffMvs, RefDiffCount, RefStackMv, WeightStack, and NumMvFound.
This process examines the candidate location to find possible motion vectors as follows:
if ( isCompound ) {
  for ( candList = 0; candList < 2; candList++ ) {
    candRef = RefFrames[ mvRow ][ mvCol ][ candList ]
    if ( candRef > INTRA_FRAME ) {
      for ( list = 0; list < 2; list++ ) {
        candMv = Mvs[ mvRow ][ mvCol ][ candList ]
        if ( candRef == RefFrame[ list ] && RefIdCount[ list ] < 2 ) {
          RefIdMvs[ list ][ RefIdCount[ list ] ] = candMv
          RefIdCount[ list ]++
        } else if ( RefDiffCount[ list ] < 2 ) {
          if ( RefFrameSignBias[ candRef ] != RefFrameSignBias[ RefFrame[list] ] ) {
            candMv[ 0 ] *= -1
            candMv[ 1 ] *= -1
          }
          RefDiffMvs[ list ][ RefDiffCount[ list ] ] = candMv
          RefDiffCount[ list ]++
        }
      }
    }
  }
} else {
  for ( candList = 0; candList < 2; candList++ ) {
    candRef = RefFrames[ mvRow ][ mvCol ][ candList ]
    if ( candRef > INTRA_FRAME ) {
      candMv = Mvs[ mvRow ][ mvCol ][ candList ]
      if ( RefFrameSignBias[ candRef ] != RefFrameSignBias[ RefFrame[ 0 ] ] ) {
        candMv[ 0 ] *= -1
        candMv[ 1 ] *= -1
      }
      for ( idx = 0; idx < NumMvFound; idx++ ) {
        if ( candMv == RefStackMv[ idx ][ 0 ] )
          break
      }
      if ( idx == NumMvFound ) {
        RefStackMv[ idx ][ 0 ] = candMv
        WeightStack[ idx ] = 2
        NumMvFound++
      }
    }
  }
}
Context and clamping process
The inputs to this process are:

a variable isCompound containing 0 for single prediction, or 1 to signal compound prediction,

a variable numNew specifying the number of NEWMV candidates found in the immediate neighborhood.
This process computes contexts to be used when decoding syntax elements, and clamps the candidates in RefStackMv.
The variable bw (representing the width of the block in units of luma samples) is set equal to Block_Width[ MiSize ].
The variable bh (representing the height of the block in units of luma samples) is set equal to Block_Height[ MiSize ].
Note: It only matters whether numNew is zero or nonzero because the value is clipped at 1 when it is used. Implementations may therefore choose to implement numNew and NewMvCount as a boolean instead of a counter.
The variable numLists specifying the number of reference frames used for this block is set equal to ( isCompound ? 2 : 1 ).
The array DrlCtxStack is set as follows:
for ( idx = 0; idx < NumMvFound; idx++ ) {
    z = 0
    if ( idx + 1 < NumMvFound ) {
        w0 = WeightStack[ idx ]
        w1 = WeightStack[ idx + 1 ]
        if ( w0 >= REF_CAT_LEVEL ) {
            if ( w1 < REF_CAT_LEVEL ) {
                z = 1
            }
        } else {
            z = 2
        }
    }
    DrlCtxStack[ idx ] = z
}
The motion vectors are clamped as follows:
for ( list = 0; list < numLists; list++ ) {
    for ( idx = 0; idx < NumMvFound; idx++ ) {
        refMv = RefStackMv[ idx ][ list ]
        refMv[ 0 ] = clamp_mv_row( refMv[ 0 ], MV_BORDER + bh * 8 )
        refMv[ 1 ] = clamp_mv_col( refMv[ 1 ], MV_BORDER + bw * 8 )
        RefStackMv[ idx ][ list ] = refMv
    }
}
The variables RefMvContext and NewMvContext are set as follows:
if ( CloseMatches == 0 ) {
    NewMvContext = Min( TotalMatches, 1 ) // 0,1
    RefMvContext = TotalMatches
} else if ( CloseMatches == 1 ) {
    NewMvContext = 3 - Min( numNew, 1 ) // 2,3
    RefMvContext = 2 + TotalMatches
} else {
    NewMvContext = 5 - Min( numNew, 1 ) // 4,5
    RefMvContext = 5
}
Has overlappable candidates process
This process is triggered by a call to has_overlappable_candidates.
It returns 1 to indicate that the block has neighbors suitable for use by overlapped motion compensation, or 0 otherwise.
The process looks to see if there are any inter blocks to the left or above.
The check is only made at 8x8 granularity.
The process is specified as:
has_overlappable_candidates( ) {
    if ( AvailU ) {
        w4 = Num_4x4_Blocks_Wide[ MiSize ]
        for ( x4 = MiCol; x4 < Min( MiCols, MiCol + w4 ); x4 += 2 ) {
            if ( RefFrames[ MiRow - 1 ][ x4 | 1 ][ 0 ] > INTRA_FRAME )
                return 1
        }
    }
    if ( AvailL ) {
        h4 = Num_4x4_Blocks_High[ MiSize ]
        for ( y4 = MiRow; y4 < Min( MiRows, MiRow + h4 ); y4 += 2 ) {
            if ( RefFrames[ y4 | 1 ][ MiCol - 1 ][ 0 ] > INTRA_FRAME )
                return 1
        }
    }
    return 0
}
Find warp samples process
General
This process is triggered when the find_warp_samples function is invoked.
The process examines the neighboring inter predicted blocks and estimates a local warp transformation based on the motion vectors.
The process produces a variable NumSamples containing the number of valid candidates found, and an array CandList containing sorted candidates.
The variables NumSamples and NumSamplesScanned are both set equal to 0.
Note: NumSamplesScanned counts the number of distinct candidates found by the add sample process, even if the motion vectors are too large. NumSamples counts the number of distinct valid candidates found by the add sample process (i.e. only counting cases where the motion vector is small enough to be considered valid). As a special case, if no small motion vectors are found, then the process returns the first large motion vector found (by setting NumSamples to 1).
The variable w4 specifying the width of the block in 4x4 luma samples is set equal to Num_4x4_Blocks_Wide[ MiSize ].
The variable h4 specifying the height of the block in 4x4 luma samples is set equal to Num_4x4_Blocks_High[ MiSize ].
The process is specified as:
doTopLeft = 1
doTopRight = 1
if ( AvailU ) {
    srcSize = MiSizes[ MiRow - 1 ][ MiCol ]
    srcW = Num_4x4_Blocks_Wide[ srcSize ]
    if ( w4 <= srcW ) {
        colOffset = -(MiCol & (srcW - 1))
        if ( colOffset < 0 )
            doTopLeft = 0
        if ( colOffset + srcW > w4 )
            doTopRight = 0
        add_sample( -1, 0 )
    } else {
        for ( i = 0; i < Min( w4, MiCols - MiCol ); i += miStep ) {
            srcSize = MiSizes[ MiRow - 1 ][ MiCol + i ]
            srcW = Num_4x4_Blocks_Wide[ srcSize ]
            miStep = Min( w4, srcW )
            add_sample( -1, i )
        }
    }
}
if ( AvailL ) {
    srcSize = MiSizes[ MiRow ][ MiCol - 1 ]
    srcH = Num_4x4_Blocks_High[ srcSize ]
    if ( h4 <= srcH ) {
        rowOffset = -(MiRow & (srcH - 1))
        if ( rowOffset < 0 )
            doTopLeft = 0
        add_sample( 0, -1 )
    } else {
        for ( i = 0; i < Min( h4, MiRows - MiRow ); i += miStep ) {
            srcSize = MiSizes[ MiRow + i ][ MiCol - 1 ]
            srcH = Num_4x4_Blocks_High[ srcSize ]
            miStep = Min( h4, srcH )
            add_sample( i, -1 )
        }
    }
}
if ( doTopLeft ) {
    add_sample( -1, -1 )
}
if ( doTopRight ) {
    if ( Max( w4, h4 ) <= 16 ) {
        add_sample( -1, w4 )
    }
}
if ( NumSamples == 0 && NumSamplesScanned > 0 )
    NumSamples = 1
where the call to add_sample specifies that the add sample process in Add sample process should be invoked.
Add sample process
The inputs to this process are:

a variable deltaRow specifying (in units of 4x4 luma samples) how far above to look for a motion vector,

a variable deltaCol specifying (in units of 4x4 luma samples) how far left to look for a motion vector.
The output of this process is to add a new sample to the list of candidates if it is a valid candidate and has not been seen before.
If NumSamplesScanned is greater than or equal to LEAST_SQUARES_SAMPLES_MAX, this process immediately exits.
The variable mvRow is set equal to MiRow + deltaRow.
The variable mvCol is set equal to MiCol + deltaCol.
If is_inside( mvRow, mvCol ) is equal to 0, then this process immediately returns.
If RefFrames[ mvRow ][ mvCol ][ 0 ] has not been written for this frame, then this process immediately returns.
If RefFrames[ mvRow ][ mvCol ][ 0 ] is not equal to RefFrame[ 0 ], then this process immediately returns.
If RefFrames[ mvRow ][ mvCol ][ 1 ] is not equal to NONE, then this process immediately returns.
The variable candSz is set equal to MiSizes[ mvRow ][ mvCol ].
The variable candW4 is set equal to Num_4x4_Blocks_Wide[ candSz ].
The variable candH4 is set equal to Num_4x4_Blocks_High[ candSz ].
The variable candRow is set equal to mvRow & ~(candH4 - 1).
The variable candCol is set equal to mvCol & ~(candW4 - 1).
The variable midY is set equal to candRow * 4 + candH4 * 2 - 1.
The variable midX is set equal to candCol * 4 + candW4 * 2 - 1.
The variable threshold is set equal to Clip3( 16, 112, Max( Block_Width[ MiSize ], Block_Height[ MiSize ] ) ).
The variable mvDiffRow is set equal to Abs( Mvs[ candRow ][ candCol ][ 0 ][ 0 ] - Mv[ 0 ][ 0 ] ).
The variable mvDiffCol is set equal to Abs( Mvs[ candRow ][ candCol ][ 0 ][ 1 ] - Mv[ 0 ][ 1 ] ).
The variable valid is set equal to ( ( mvDiffRow + mvDiffCol ) <= threshold ).
Note: candRow and candCol give the top-left position of the candidate block in units of 4x4 blocks. midX and midY give the central position of the candidate block in units of luma samples.
A candidate array (representing source and destination locations in units of 1/8 luma samples) is specified as:
cand[ 0 ] = midY * 8
cand[ 1 ] = midX * 8
cand[ 2 ] = midY * 8 + Mvs[ candRow ][ candCol ][ 0 ][ 0 ]
cand[ 3 ] = midX * 8 + Mvs[ candRow ][ candCol ][ 0 ][ 1 ]
The following ordered steps apply:

NumSamplesScanned is increased by 1.

If valid is equal to 0 and NumSamplesScanned is greater than 1, the process exits.

CandList[ NumSamples ][ j ] is set equal to cand[ j ] for j=0..3.

If valid is equal to 1, NumSamples is increased by 1.
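The candidate geometry above can be sketched in Python as follows. This is a minimal illustration, not the normative process: the helper name warp_candidate is hypothetical, the neighbor's block size and motion vector are passed in directly instead of being read from MiSizes and Mvs, and the bookkeeping of NumSamples, NumSamplesScanned, and CandList is left to the caller.

```python
def warp_candidate(mv_row, mv_col, cand_w4, cand_h4, cand_mv, cur_mv, bw, bh):
    """Return (cand, valid) for a neighbor at (mv_row, mv_col) in 4x4 units.

    cand_w4, cand_h4: neighbor block size in 4x4 units (powers of two).
    cand_mv, cur_mv: (row, col) motion vectors of the neighbor and current block.
    bw, bh: width and height of the current block in luma samples.
    """
    # Snap to the top-left 4x4 position of the neighboring block.
    cand_row = mv_row & ~(cand_h4 - 1)
    cand_col = mv_col & ~(cand_w4 - 1)
    # Central position of the neighboring block in luma samples.
    mid_y = cand_row * 4 + cand_h4 * 2 - 1
    mid_x = cand_col * 4 + cand_w4 * 2 - 1
    # Clip3( 16, 112, Max( bw, bh ) )
    threshold = min(max(max(bw, bh), 16), 112)
    diff = abs(cand_mv[0] - cur_mv[0]) + abs(cand_mv[1] - cur_mv[1])
    valid = diff <= threshold
    # Source and destination positions in 1/8 luma sample units.
    cand = [mid_y * 8, mid_x * 8,
            mid_y * 8 + cand_mv[0], mid_x * 8 + cand_mv[1]]
    return cand, valid


cand, valid = warp_candidate(7, 10, 2, 2, (4, -8), (0, 0), 16, 16)
print(cand, valid)  # [216, 344, 220, 336] True
```

The example uses an 8x8 neighbor (2x2 in 4x4 units) whose motion vector differs from the current one by 12 in total, which is within the threshold of 16 for a 16x16 block.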
Prediction processes
General
The following sections define the processes used for predicting the sample values.
These processes are triggered at points defined by function calls to predict_intra, predict_inter, predict_chroma_from_luma, and predict_palette in the residual syntax table described in Residual syntax.
Intra prediction process
General
The intra prediction process is invoked for intra coded blocks to predict a part of the block corresponding to a transform block. When the transform size is smaller than the block size, this process can be invoked multiple times within a single block for the same plane, and the invocations are in raster order within the block.
This process is triggered by a call to predict_intra.
The inputs to this process are:

a variable plane specifying which plane is being predicted,

variables x and y specifying the location of the top left sample in the CurrFrame[ plane ] array of the current transform block,

a variable haveLeft that is equal to 1 if there are valid samples to the left of this transform block,

a variable haveAbove that is equal to 1 if there are valid samples above this transform block,

a variable haveAboveRight that is equal to 1 if there are valid samples above the transform block to the right of this transform block,

a variable haveBelowLeft that is equal to 1 if there are valid samples to the left of the transform block below this transform block,

a variable mode specifying the type of intra prediction to apply,

a variable log2W specifying the base 2 logarithm of the width of the region to be predicted,

a variable log2H specifying the base 2 logarithm of the height of the region to be predicted.
The process makes use of the already reconstructed samples in the current frame CurrFrame to form a prediction for the current block.
The outputs of this process are intra predicted samples in the current frame CurrFrame.
The variable w is set equal to 1 << log2W.
The variable h is set equal to 1 << log2H.
The variable maxX is set equal to ( MiCols * MI_SIZE ) - 1.
The variable maxY is set equal to ( MiRows * MI_SIZE ) - 1.
If plane is greater than 0, then:

maxX is set equal to ( ( MiCols * MI_SIZE ) >> subsampling_x ) - 1.

maxY is set equal to ( ( MiRows * MI_SIZE ) >> subsampling_y ) - 1.
The array AboveRow[ i ] for i = 0..w + h - 1 is derived as follows:

If haveAbove is equal to 0 and haveLeft is equal to 1, AboveRow[ i ] is set equal to CurrFrame[ plane ][ y ][ x - 1 ].

Otherwise, if haveAbove is equal to 0 and haveLeft is equal to 0, AboveRow[ i ] is set equal to ( 1 << ( BitDepth - 1 ) ) - 1.

Otherwise, the following applies:

The variable aboveLimit is set equal to Min( maxX, x + ( haveAboveRight ? 2 * w : w ) - 1 ).

AboveRow[ i ] is set equal to CurrFrame[ plane ][ y - 1 ][ Min( aboveLimit, x + i ) ].

The array LeftCol[ i ] for i = 0..w + h - 1 is derived as follows:

If haveLeft is equal to 0 and haveAbove is equal to 1, LeftCol[ i ] is set equal to CurrFrame[ plane ][ y - 1 ][ x ].

Otherwise, if haveLeft is equal to 0 and haveAbove is equal to 0, LeftCol[ i ] is set equal to ( 1 << ( BitDepth - 1 ) ) + 1.

Otherwise, the following applies:

The variable leftLimit is set equal to Min( maxY, y + ( haveBelowLeft ? 2 * h : h ) - 1 ).

LeftCol[ i ] is set equal to CurrFrame[ plane ][ Min( leftLimit, y + i ) ][ x - 1 ].

The array AboveRow[ i ] for i = -1 is specified by:

If haveAbove is equal to 1 and haveLeft is equal to 1, AboveRow[ -1 ] is set equal to CurrFrame[ plane ][ y - 1 ][ x - 1 ].

Otherwise if haveAbove is equal to 1, AboveRow[ -1 ] is set equal to CurrFrame[ plane ][ y - 1 ][ x ].

Otherwise if haveLeft is equal to 1, AboveRow[ -1 ] is set equal to CurrFrame[ plane ][ y ][ x - 1 ].

Otherwise, AboveRow[ -1 ] is set equal to 1 << ( BitDepth - 1 ).
The array LeftCol[ i ] for i = -1 is set equal to AboveRow[ -1 ].
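As an illustration of the edge preparation, the following Python sketch builds the row of samples above the block, including the corner entry at index -1. It is a sketch under stated assumptions: the function name prepare_above_row is hypothetical, a single plane is passed as a 2D list, and a dict is used so that the index -1 can be stored directly (a Python list would treat -1 as "last element"). The left column would follow the same pattern with rows and columns exchanged and a fallback value of ( 1 << ( BitDepth - 1 ) ) + 1.

```python
def prepare_above_row(frame, x, y, w, h, have_above, have_left,
                      have_above_right, max_x, bit_depth=8):
    """Return a dict mapping i in -1..w+h-1 to the edge samples above the block.

    frame is a 2D list of reconstructed samples for one plane (frame[row][col]).
    """
    above = {}
    for i in range(w + h):
        if not have_above and have_left:
            above[i] = frame[y][x - 1]
        elif not have_above and not have_left:
            above[i] = (1 << (bit_depth - 1)) - 1
        else:
            # Samples beyond the limit repeat the last available sample.
            limit = min(max_x, x + (2 * w if have_above_right else w) - 1)
            above[i] = frame[y - 1][min(limit, x + i)]
    # Corner sample, stored at index -1.
    if have_above and have_left:
        above[-1] = frame[y - 1][x - 1]
    elif have_above:
        above[-1] = frame[y - 1][x]
    elif have_left:
        above[-1] = frame[y][x - 1]
    else:
        above[-1] = 1 << (bit_depth - 1)
    return above


frame = [[10, 20, 30, 40],
         [50, 0, 0, 0],
         [60, 0, 0, 0],
         [70, 0, 0, 0]]
row = prepare_above_row(frame, 1, 1, 2, 2, True, True, False, 3)
print([row[i] for i in range(-1, 4)])  # [10, 20, 30, 30, 30]
```

With no above-right samples available, entries past the block width repeat the last valid sample (30 in this example).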
A 2D array named pred containing the intra predicted samples is constructed as follows:

If plane is equal to 0 and use_filter_intra is true, the recursive intra prediction process specified in Recursive intra prediction process is invoked with w and h as inputs, and the output is assigned to pred.

Otherwise, if is_directional_mode( mode ) is true, the directional intra prediction process specified in Directional intra prediction process is invoked with plane, x, y, haveLeft, haveAbove, mode, w, h, maxX, maxY as inputs and the output is assigned to pred.

Otherwise if mode is equal to SMOOTH_PRED or SMOOTH_V_PRED or SMOOTH_H_PRED, the smooth intra prediction process specified in Smooth intra prediction process is invoked with mode, log2W, log2H, w, and h as inputs, and the output is assigned to pred.

Otherwise if mode is equal to DC_PRED, the DC intra prediction process specified in DC intra prediction process is invoked with haveLeft, haveAbove, log2W, log2H, w, and h as inputs and the output is assigned to pred.

Otherwise (mode is equal to PAETH_PRED), the basic intra prediction process specified in Basic intra prediction process is invoked with mode, w, and h as inputs, and the output is assigned to pred.
The current frame is updated as follows:
CurrFrame[ plane ][ y + i ][ x + j ] is set equal to pred[ i ][ j ] for i = 0..h-1 and j = 0..w-1.
Basic intra prediction process
The inputs to this process are:

a variable w specifying the width of the region to be predicted,

a variable h specifying the height of the region to be predicted.
The output of this process is a 2D array named pred containing the intra predicted samples.
The process generates filtered samples from the samples in LeftCol and AboveRow as follows:

The following ordered steps apply for i = 0..h-1, for j = 0..w-1:

The variable base is set equal to AboveRow[ j ] + LeftCol[ i ] - AboveRow[ -1 ].

The variable pLeft is set equal to Abs( base - LeftCol[ i ] ).

The variable pTop is set equal to Abs( base - AboveRow[ j ] ).

The variable pTopLeft is set equal to Abs( base - AboveRow[ -1 ] ).

If pLeft <= pTop and pLeft <= pTopLeft, pred[ i ][ j ] is set equal to LeftCol[ i ].

Otherwise, if pTop <= pTopLeft, pred[ i ][ j ] is set equal to AboveRow[ j ].

Otherwise, pred[ i ][ j ] is set equal to AboveRow[ -1 ].

The output of the process is the array pred.
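The selection rule of this process (PAETH_PRED) can be sketched in Python as below. The function name paeth_predict is hypothetical, and the corner sample is passed as a separate top_left argument rather than stored at index -1 of an edge array.

```python
def paeth_predict(above, left, top_left, w, h):
    """Pick, per sample, whichever of left/above/corner is closest to the
    gradient estimate base = above + left - corner."""
    pred = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            base = above[j] + left[i] - top_left
            p_left = abs(base - left[i])
            p_top = abs(base - above[j])
            p_top_left = abs(base - top_left)
            if p_left <= p_top and p_left <= p_top_left:
                pred[i][j] = left[i]
            elif p_top <= p_top_left:
                pred[i][j] = above[j]
            else:
                pred[i][j] = top_left
    return pred


print(paeth_predict([100, 110], [90, 80], 95, 2, 2))  # [[95, 110], [80, 95]]
```

Ties favor the left sample first, then the above sample, matching the order of the conditions in the text.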
Recursive intra prediction process
The inputs to this process are:

a variable w specifying the width of the region to be predicted,

a variable h specifying the height of the region to be predicted.
The output of this process is a 2D array named pred containing the intra predicted samples.
For each block of 4x2 samples, this process first prepares an array p of 7 neighboring samples, and then produces the output block by filtering this array.
The variable w4 is set equal to w >> 2.
The variable h2 is set equal to h >> 1.
The following steps apply for i2 = 0..h2-1, for j4 = 0..w4-1:

The array p is derived as follows for i = 0..6:

If i is less than 5, p[ i ] is derived as follows:

If i2 is equal to 0, p[ i ] is set equal to AboveRow[ ( j4 << 2 ) + i - 1 ].

Otherwise, if j4 is equal to 0 and i is equal to 0, p[ i ] is set equal to LeftCol[ ( i2 << 1 ) - 1 ].

Otherwise, p[ i ] is set equal to pred[ ( i2 << 1 ) - 1 ][ ( j4 << 2 ) + i - 1 ].


Otherwise (i is greater than or equal to 5), p[ i ] is derived as follows:

If j4 is equal to 0, p[ i ] is set equal to LeftCol[ ( i2 << 1 ) + i - 5 ].

Otherwise (j4 is not equal to 0), p[ i ] is set equal to pred[ ( i2 << 1 ) + i - 5 ][ ( j4 << 2 ) - 1 ].



The following steps apply for i1 = 0..1, j1 = 0..3:

The variable pr is set equal to 0.

The variable pr is incremented by Intra_Filter_Taps[ filter_intra_mode ][ ( i1 << 2 ) + j1 ][ i ] * p[ i ] for i = 0..6.

pred[ ( i2 << 1 ) + i1 ][ ( j4 << 2 ) + j1 ] is set equal to Clip1( Round2Signed( pr, INTRA_FILTER_SCALE_BITS ) ).

The output of the process is the array pred.
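The 4x2 recursion can be sketched in Python as follows. Several things here are assumptions made for illustration only: the function names are hypothetical, the tap table is supplied by the caller (the normative Intra_Filter_Taps values are not reproduced), scale_bits stands in for INTRA_FILTER_SCALE_BITS with an assumed default of 4, and the corner sample is passed separately as top_left.

```python
def round2_signed(x, n):
    """Rounding right shift applied to the magnitude of x, keeping its sign."""
    r = (abs(x) + (1 << (n - 1))) >> n
    return r if x >= 0 else -r


def recursive_intra(above, left, top_left, w, h, taps, bit_depth=8, scale_bits=4):
    """taps: 8 rows (one per output sample of a 4x2 block) of 7 coefficients."""
    max_val = (1 << bit_depth) - 1
    pred = [[0] * w for _ in range(h)]

    def above_at(col):
        # Index -1 of the row above the block is the corner sample.
        return top_left if col < 0 else above[col]

    for i2 in range(h >> 1):
        for j4 in range(w >> 2):
            p = [0] * 7
            for i in range(7):
                if i < 5:  # corner plus four samples above the 4x2 block
                    if i2 == 0:
                        p[i] = above_at((j4 << 2) + i - 1)
                    elif j4 == 0 and i == 0:
                        p[i] = left[(i2 << 1) - 1]
                    else:
                        p[i] = pred[(i2 << 1) - 1][(j4 << 2) + i - 1]
                elif j4 == 0:  # two samples to the left of the 4x2 block
                    p[i] = left[(i2 << 1) + i - 5]
                else:
                    p[i] = pred[(i2 << 1) + i - 5][(j4 << 2) - 1]
            for i1 in range(2):
                for j1 in range(4):
                    pr = sum(taps[(i1 << 2) + j1][i] * p[i] for i in range(7))
                    value = round2_signed(pr, scale_bits)
                    pred[(i2 << 1) + i1][(j4 << 2) + j1] = min(max(value, 0), max_val)
    return pred


# Made-up taps (not from the specification): each output copies the sample
# directly above its 4x2 block, with weight 16 so the rounding shift is exact.
copy_taps = [[16 if i == (k % 4) + 1 else 0 for i in range(7)] for k in range(8)]
out = recursive_intra([10, 20, 30, 40], [1, 2, 3, 4], 5, 4, 4, copy_taps)
print(out[3])  # [10, 20, 30, 40]
```

Because later 4x2 blocks read from pred, blocks have to be produced in raster order; the made-up taps make this visible by propagating the above row down through every block.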
Directional intra prediction process
The inputs to this process are:

a variable plane specifying which plane is being predicted,

variables x and y specifying the location of the top left sample in the CurrFrame[ plane ] array of the current transform block,

a variable haveLeft that is equal to 1 if there are valid samples to the left of this transform block,

a variable haveAbove that is equal to 1 if there are valid samples above this transform block,

a variable mode specifying the type of intra prediction to apply,

a variable w specifying the width of the region to be predicted,

a variable h specifying the height of the region to be predicted,

a variable maxX specifying the largest valid x coordinate for the current plane,

a variable maxY specifying the largest valid y coordinate for the current plane.
The output of this process is a 2D array named pred containing the intra predicted samples.
The process uses a directional filter to generate filtered samples from the samples in LeftCol and AboveRow.
The following ordered steps apply:

The variable angleDelta is derived as follows:

If plane is equal to 0, angleDelta is set equal to AngleDeltaY.

Otherwise (plane is not equal to 0), angleDelta is set equal to AngleDeltaUV.


The variable pAngle is set equal to ( Mode_To_Angle[ mode ] + angleDelta * ANGLE_STEP ).

The variables upsampleAbove and upsampleLeft are set equal to 0.

If enable_intra_edge_filter is equal to 1, the following applies:

If pAngle is not equal to 90 and pAngle is not equal to 180, the following applies:

If (pAngle > 90) and (pAngle < 180) and (w + h >= 24), the filter corner process specified in Filter corner process is invoked and the output assigned to both LeftCol[ -1 ] and AboveRow[ -1 ].

The intra filter type process specified in Intra filter type process is invoked with the input variable plane and the output assigned to filterType.

If haveAbove is equal to 1, the following steps apply:

The intra edge filter strength selection process specified in Intra edge filter strength selection process is invoked with w, h, filterType, and pAngle - 90 as inputs, and the output assigned to the variable strength.

The variable numPx is set equal to Min( w, ( maxX - x + 1 ) ) + ( pAngle < 90 ? h : 0 ) + 1.

The intra edge filter process specified in Intra edge filter process is invoked with the parameters numPx, strength, and 0 as inputs.


If haveLeft is equal to 1, the following steps apply:

The intra edge filter strength selection process specified in Intra edge filter strength selection process is invoked with w, h, filterType, and pAngle - 180 as inputs, and the output assigned to the variable strength.

The variable numPx is set equal to Min( h, ( maxY - y + 1 ) ) + ( pAngle > 180 ? w : 0 ) + 1.

The intra edge filter process specified in Intra edge filter process is invoked with the parameters numPx, strength, and 1 as inputs.



The intra edge upsample selection process specified in Intra edge upsample selection process is invoked with w, h, filterType, and pAngle - 90 as inputs, and the output assigned to the variable upsampleAbove.

The variable numPx is set equal to ( w + (pAngle < 90 ? h : 0) ).

If upsampleAbove is equal to 1, the intra edge upsample process specified in Intra edge upsample process is invoked with the parameters numPx and 0 as inputs.

The intra edge upsample selection process specified in Intra edge upsample selection process is invoked with w, h, filterType, and pAngle - 180 as inputs, and the output assigned to the variable upsampleLeft.

The variable numPx is set equal to ( h + (pAngle > 180 ? w : 0) ).

If upsampleLeft is equal to 1, the intra edge upsample process specified in Intra edge upsample process is invoked with the parameters numPx and 1 as inputs.


The variable dx is derived as follows:

If pAngle is less than 90, dx is set equal to Dr_Intra_Derivative[ pAngle ].

Otherwise, if pAngle is greater than 90 and less than 180, dx is set equal to Dr_Intra_Derivative[ 180 - pAngle ].

Otherwise, dx is undefined.


The variable dy is derived as follows:

If pAngle is greater than 90 and less than 180, dy is set equal to Dr_Intra_Derivative[ pAngle - 90 ].

Otherwise, if pAngle is greater than 180, dy is set equal to Dr_Intra_Derivative[ 270 - pAngle ].

Otherwise, dy is undefined.


If pAngle is less than 90, the following steps apply for i = 0..h-1, for j = 0..w-1:

The variable idx is set equal to ( i + 1 ) * dx.

The variable base is set equal to ( idx >> ( 6 - upsampleAbove ) ) + ( j << upsampleAbove ).

The variable shift is set equal to ( ( idx << upsampleAbove ) >> 1 ) & 0x1F.

The variable maxBaseX is set equal to ( w + h - 1 ) << upsampleAbove.

If base is less than maxBaseX, pred[ i ][ j ] is set equal to Round2( AboveRow[ base ] * ( 32 - shift ) + AboveRow[ base + 1 ] * shift, 5 ).

Otherwise (base is greater than or equal to maxBaseX), pred[ i ][ j ] is set equal to AboveRow[ maxBaseX ].


Otherwise, if pAngle is greater than 90 and pAngle is less than 180, the following steps apply for i = 0..h-1, for j = 0..w-1:

The variable idx is set equal to ( j << 6 ) - ( i + 1 ) * dx.

The variable base is set equal to idx >> ( 6 - upsampleAbove ).

If base is greater than or equal to -(1 << upsampleAbove), the following steps apply:

The variable shift is set equal to ( ( idx << upsampleAbove ) >> 1 ) & 0x1F.

pred[ i ][ j ] is set equal to Round2( AboveRow[ base ] * ( 32 - shift ) + AboveRow[ base + 1 ] * shift, 5 ).


Otherwise (base is less than -(1 << upsampleAbove)), the following steps apply:

The variable idx is set equal to ( i << 6 ) - ( j + 1 ) * dy.

The variable base is set equal to idx >> ( 6 - upsampleLeft ).

The variable shift is set equal to ( ( idx << upsampleLeft ) >> 1 ) & 0x1F.

pred[ i ][ j ] is set equal to Round2( LeftCol[ base ] * ( 32 - shift ) + LeftCol[ base + 1 ] * shift, 5 ).



Otherwise, if pAngle is greater than 180, the following steps apply for i = 0..h-1, for j = 0..w-1:

The variable idx is set equal to ( j + 1 ) * dy.

The variable base is set equal to ( idx >> ( 6 - upsampleLeft ) ) + ( i << upsampleLeft ).

The variable shift is set equal to ( ( idx << upsampleLeft ) >> 1 ) & 0x1F.

pred[ i ][ j ] is set equal to Round2( LeftCol[ base ] * ( 32 - shift ) + LeftCol[ base + 1 ] * shift, 5 ).


Otherwise, if pAngle is equal to 90, pred[ i ][ j ] is set equal to AboveRow[ j ] with j = 0..w-1 and i = 0..h-1 (each row of the block is filled with a copy of AboveRow).

Otherwise, if pAngle is equal to 180, pred[ i ][ j ] is set equal to LeftCol[ i ] with j = 0..w-1 and i = 0..h-1 (each column of the block is filled with a copy of LeftCol).
The output of the process is the array pred.
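The interpolation for the case where pAngle is less than 90 can be sketched in Python as below. This covers only that one branch and is not a full implementation: the function names are hypothetical, dx is passed in by the caller instead of being looked up in Dr_Intra_Derivative, and the values 64 and 32 used in the example are illustrative step sizes, not claims about particular table entries.

```python
def round2(x, n):
    """Rounding right shift by n bits."""
    return (x + (1 << (n - 1))) >> n if n else x


def directional_below_90(above, w, h, dx, upsample_above=0):
    """Predict from the row above only; idx is in 1/64 sample units."""
    pred = [[0] * w for _ in range(h)]
    max_base_x = (w + h - 1) << upsample_above
    for i in range(h):
        for j in range(w):
            idx = (i + 1) * dx
            base = (idx >> (6 - upsample_above)) + (j << upsample_above)
            shift = ((idx << upsample_above) >> 1) & 0x1F
            if base < max_base_x:
                # Two-tap interpolation with a 5-bit fractional weight.
                pred[i][j] = round2(
                    above[base] * (32 - shift) + above[base + 1] * shift, 5)
            else:
                pred[i][j] = above[max_base_x]
    return pred


above = [0, 10, 20, 30, 40, 50, 60, 70]
print(directional_below_90(above, 4, 4, 64)[3])    # [40, 50, 60, 70]
print(directional_below_90(above, 4, 4, 32)[0][0])  # 5
```

With a step of 64 each row moves one whole sample to the right, so no interpolation occurs; with a step of 32 the first row falls halfway between neighboring samples.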
DC intra prediction process
The inputs to this process are:

a variable haveLeft that is equal to 1 if there are valid samples to the left of this transform block,

a variable haveAbove that is equal to 1 if there are valid samples above this transform block,

a variable log2W specifying the base 2 logarithm of the width of the region to be predicted,

a variable log2H specifying the base 2 logarithm of the height of the region to be predicted,

a variable w specifying the width of the region to be predicted,

a variable h specifying the height of the region to be predicted.
The output of this process is a 2D array named pred containing the intra predicted samples.
The process averages the available edge samples in LeftCol and AboveRow to generate the prediction as follows:

If haveLeft is equal to 1 and haveAbove is equal to 1, pred[ i ][ j ] is set equal to avg with i = 0..h-1 and j = 0..w-1. The variable avg (the average of the samples in union of AboveRow and LeftCol) is specified as follows:
sum = 0
for ( k = 0; k < h; k++ )
    sum += LeftCol[ k ]
for ( k = 0; k < w; k++ )
    sum += AboveRow[ k ]
sum += ( w + h ) >> 1
avg = sum / ( w + h )
Note: The reference code shows how the division by (w+h) can be implemented with multiplication and shift operations.

Otherwise if haveLeft is equal to 1 and haveAbove is equal to 0, pred[ i ][ j ] is set equal to leftAvg with i = 0..h-1 and j = 0..w-1. The variable leftAvg is specified as follows:
sum = 0
for ( k = 0; k < h; k++ ) {
    sum += LeftCol[ k ]
}
leftAvg = Clip1( ( sum + ( h >> 1 ) ) >> log2H )

Otherwise if haveLeft is equal to 0 and haveAbove is equal to 1, pred[ i ][ j ] is set equal to aboveAvg with i = 0..h-1 and j = 0..w-1. The variable aboveAvg is specified as follows:
sum = 0
for ( k = 0; k < w; k++ ) {
    sum += AboveRow[ k ]
}
aboveAvg = Clip1( ( sum + ( w >> 1 ) ) >> log2W )

Otherwise (haveLeft is equal to 0 and haveAbove is equal to 0), pred[ i ][ j ] is set equal to 1 << ( BitDepth - 1 ) with i = 0..h-1 and j = 0..w-1.
The output of the process is the array pred.
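The four cases of the DC process can be sketched in Python as follows. The function name dc_predict is hypothetical, and the logarithms are derived from w and h (assumed to be powers of two) rather than passed in as log2W and log2H.

```python
def dc_predict(above, left, w, h, have_above, have_left, bit_depth=8):
    """Fill a w x h block with the rounded average of the available edges."""
    max_val = (1 << bit_depth) - 1
    if have_left and have_above:
        total = sum(left[:h]) + sum(above[:w]) + ((w + h) >> 1)
        value = total // (w + h)
    elif have_left:
        log2_h = h.bit_length() - 1  # h is a power of two
        value = min(max((sum(left[:h]) + (h >> 1)) >> log2_h, 0), max_val)
    elif have_above:
        log2_w = w.bit_length() - 1  # w is a power of two
        value = min(max((sum(above[:w]) + (w >> 1)) >> log2_w, 0), max_val)
    else:
        value = 1 << (bit_depth - 1)
    return [[value] * w for _ in range(h)]


above = [10, 20, 30, 40]
left = [50, 60, 70, 80]
print(dc_predict(above, left, 4, 4, True, True)[0][0])    # 45
print(dc_predict(above, left, 4, 4, False, True)[0][0])   # 65
print(dc_predict(above, left, 4, 4, True, False)[0][0])   # 25
print(dc_predict(above, left, 4, 4, False, False)[0][0])  # 128
```

The sketch uses plain integer division for the two-edge case; a production decoder would typically replace it with the multiply-and-shift form mentioned in the note above.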
Smooth intra prediction process
The inputs to this process are:

a variable mode specifying the type of intra prediction to apply,

a variable log2W specifying the base 2 logarithm of the width of the region to be predicted,

a variable log2H specifying the base 2 logarithm of the height of the region to be predicted,

a variable w specifying the width of the region to be predicted,

a variable h specifying the height of the region to be predicted.
The output of this process is a 2D array named pred containing the intra predicted samples.
The process uses linear interpolation to generate filtered samples from the samples in LeftCol and AboveRow as follows:

If mode is equal to SMOOTH_PRED, the following ordered steps apply for i = 0..h-1, for j = 0..w-1:

The array smWeightsX is set dependent on the value of log2W according to the following table:
log2W   smWeightsX
2       Sm_Weights_Tx_4x4
3       Sm_Weights_Tx_8x8
4       Sm_Weights_Tx_16x16
5       Sm_Weights_Tx_32x32
6       Sm_Weights_Tx_64x64
The array smWeightsY is set dependent on the value of log2H according to the following table:
log2H   smWeightsY
2       Sm_Weights_Tx_4x4
3       Sm_Weights_Tx_8x8
4       Sm_Weights_Tx_16x16
5       Sm_Weights_Tx_32x32
6       Sm_Weights_Tx_64x64
The variable smoothPred is set as follows:
smoothPred = smWeightsY[ i ] * AboveRow[ j ] + ( 256 - smWeightsY[ i ] ) * LeftCol[ h - 1 ] + smWeightsX[ j ] * LeftCol[ i ] + ( 256 - smWeightsX[ j ] ) * AboveRow[ w - 1 ]

pred[ i ][ j ] is set equal to Round2( smoothPred, 9 ).


Otherwise if mode is equal to SMOOTH_V_PRED, the following ordered steps apply for i = 0..h-1, for j = 0..w-1:

The array smWeights is set dependent on the value of log2H according to the following table:
log2H   smWeights
2       Sm_Weights_Tx_4x4
3       Sm_Weights_Tx_8x8
4       Sm_Weights_Tx_16x16
5       Sm_Weights_Tx_32x32
6       Sm_Weights_Tx_64x64
The variable smoothPred is set as follows:
smoothPred = smWeights[ i ] * AboveRow[ j ] + ( 256 - smWeights[ i ] ) * LeftCol[ h - 1 ]

pred[ i ][ j ] is set equal to Round2( smoothPred, 8 ).


Otherwise (mode is equal to SMOOTH_H_PRED), the following ordered steps apply for i = 0..h-1, for j = 0..w-1:

The array smWeights is set dependent on the value of log2W according to the following table:
log2W   smWeights
2       Sm_Weights_Tx_4x4
3       Sm_Weights_Tx_8x8
4       Sm_Weights_Tx_16x16
5       Sm_Weights_Tx_32x32
6       Sm_Weights_Tx_64x64
The variable smoothPred is set as follows:
smoothPred = smWeights[ j ] * LeftCol[ i ] + ( 256 - smWeights[ j ] ) * AboveRow[ w - 1 ]

pred[ i ][ j ] is set equal to Round2( smoothPred, 8 ).

The output of the process is the array pred.
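The three smooth modes can be sketched in Python as follows. The function name smooth_predict is hypothetical, the weight arrays are supplied by the caller, and the four weights used in the example are illustrative values assumed for this sketch; the normative Sm_Weights_Tx_* tables are defined elsewhere in the specification and should be consulted for real use.

```python
def smooth_predict(mode, above, left, w, h, weights_x, weights_y):
    """Blend each edge sample toward the far corner of the opposite edge.

    weights_x has w entries and weights_y has h entries, each out of 256.
    """
    pred = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            if mode == "SMOOTH_PRED":
                s = (weights_y[i] * above[j]
                     + (256 - weights_y[i]) * left[h - 1]
                     + weights_x[j] * left[i]
                     + (256 - weights_x[j]) * above[w - 1])
                pred[i][j] = (s + 256) >> 9  # Round2( s, 9 )
            elif mode == "SMOOTH_V_PRED":
                s = weights_y[i] * above[j] + (256 - weights_y[i]) * left[h - 1]
                pred[i][j] = (s + 128) >> 8  # Round2( s, 8 )
            else:  # SMOOTH_H_PRED
                s = weights_x[j] * left[i] + (256 - weights_x[j]) * above[w - 1]
                pred[i][j] = (s + 128) >> 8  # Round2( s, 8 )
    return pred


weights = [255, 149, 85, 64]  # illustrative weights, assumed for this sketch
p = smooth_predict("SMOOTH_V_PRED", [100] * 4, [50] * 4, 4, 4, weights, weights)
print([row[0] for row in p])  # [100, 79, 67, 63]
```

In the vertical example the prediction starts at the above-row value and decays toward the bottom-left sample as the weight falls with the row index.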
Filter corner process
This process uses a three tap filter to compute the value to be used for the top-left corner.
The variable s is set equal to LeftCol[ 0 ] * 5 + AboveRow[ -1 ] * 6 + AboveRow[ 0 ] * 5.
The output of this process is Round2(s, 4).
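As a small worked example, the corner filter can be written in Python as below. The function name filter_corner and its argument names are hypothetical; the three arguments are the first left sample, the corner sample, and the first above sample.

```python
def filter_corner(left0, top_left, above0):
    """Three-tap 5/6/5 filter centered on the corner sample (taps sum to 16)."""
    s = left0 * 5 + top_left * 6 + above0 * 5
    return (s + 8) >> 4  # Round2( s, 4 )


print(filter_corner(100, 120, 140))  # 120
```

Here s = 500 + 720 + 700 = 1920, and ( 1920 + 8 ) >> 4 gives 120.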
Intra filter type process
The input to this process is a variable plane specifying the color plane being processed.
The output of this process is a variable filterType that is set to 1 if either the block above or to the left uses a smooth prediction mode.
The process is specified as follows:
get_filter_type( plane ) {
    aboveSmooth = 0
    leftSmooth = 0
    if ( ( plane == 0 ) ? AvailU : AvailUChroma ) {
        r = MiRow - 1
        c = MiCol
        if ( plane > 0 ) {
            if ( subsampling_x && !( MiCol & 1 ) )
                c++
            if ( subsampling_y && ( MiRow & 1 ) )
                r--
        }
        aboveSmooth = is_smooth( r, c, plane )
    }
    if ( ( plane == 0 ) ? AvailL : AvailLChroma ) {
        r = MiRow
        c = MiCol - 1
        if ( plane > 0 ) {
            if ( subsampling_x && ( MiCol & 1 ) )
                c--
            if ( subsampling_y && !( MiRow & 1 ) )
                r++
        }
        leftSmooth = is_smooth( r, c, plane )
    }
    return aboveSmooth || leftSmooth
}
where the function is_smooth indicates if a prediction mode is one of the smooth intra modes and is specified as:
is_smooth( row, col, plane ) {
    if ( plane == 0 ) {
        mode = YModes[ row ][ col ]
    } else {
        if ( RefFrames[ row ][ col ][ 0 ] > INTRA_FRAME )
            return 0
        mode = UVModes[ row ][ col ]
    }
    return ( mode == SMOOTH_PRED || mode == SMOOTH_V_PRED || mode == SMOOTH_H_PRED )
}
Intra edge filter strength selection process
The inputs to this process are:

a variable w containing the width of the transform in samples,

a variable h containing the height of the transform in samples,

a variable filterType that is 0 or 1 that controls the strength of filtering,

a variable delta containing an angle difference in degrees.
The output is an intra edge filter strength from 0 to 3 inclusive.
The variable d is set equal to Abs( delta ).
The variable blkWh (containing the sum of the dimensions) is set equal to w + h.
The output variable strength is specified as follows:
strength = 0
if ( filterType == 0 ) {
    if ( blkWh <= 8 ) {
        if ( d >= 56 ) strength = 1
    } else if ( blkWh <= 12 ) {
        if ( d >= 40 ) strength = 1
    } else if ( blkWh <= 16 ) {
        if ( d >= 40 ) strength = 1
    } else if ( blkWh <= 24 ) {
        if ( d >= 8 ) strength = 1
        if ( d >= 16 ) strength = 2
        if ( d >= 32 ) strength = 3
    } else if ( blkWh <= 32 ) {
        strength = 1
        if ( d >= 4 ) strength = 2
        if ( d >= 32 ) strength = 3
    } else {
        strength = 3
    }
} else {
    if ( blkWh <= 8 ) {
        if ( d >= 40 ) strength = 1
        if ( d >= 64 ) strength = 2
    } else if ( blkWh <= 16 ) {
        if ( d >= 20 ) strength = 1
        if ( d >= 48 ) strength = 2
    } else if ( blkWh <= 24 ) {
        if ( d >= 4 ) strength = 3
    } else {
        strength = 3
    }
}
Intra edge upsample selection process
The inputs to this process are:

a variable w containing the width of the transform in samples,

a variable h containing the height of the transform in samples,

a variable filterType that is 0 or 1 that controls the strength of filtering,

a variable delta containing an angle difference in degrees.
The output is a flag useUpsample that is true if upsampling should be applied to the edge.
The variable d is set equal to Abs( delta ).
The variable blkWh (containing the sum of the dimensions) is set equal to w + h.
The output variable useUpsample is specified as follows:
if ( d <= 0 || d >= 40 ) {
    useUpsample = 0
} else if ( filterType == 0 ) {
    useUpsample = ( blkWh <= 16 )
} else {
    useUpsample = ( blkWh <= 8 )
}
Intra edge upsample process
The inputs to this process are:

a variable numPx specifying the number of samples to filter,

a variable dir containing 0 when filtering the above samples, and 1 when filtering the left samples.
The outputs of this process are upsampled samples in the AboveRow and LeftCol arrays.
The variable buf is set depending on dir:

If dir is equal to 0, buf is set equal to a reference to AboveRow.

Otherwise (dir is equal to 1), buf is set equal to a reference to LeftCol.
Note: buf is a reference to either AboveRow or LeftCol. "reference" indicates that modifying values in buf modifies values in the original array.
When the process starts, entries -1 to numPx-1 are valid in buf and contain the original values. When the process completes, entries -2 to 2*numPx-2 are valid in buf and contain the upsampled values.
An array dup of length numPx+3 is generated by extending buf by one sample at the start and end as follows:
dup[ 0 ] = buf[ 1 ]
for ( i = 1; i < numPx; i++ ) {
dup[ i + 2 ] = buf[ i ]
}
dup[ numPx + 2 ] = buf[ numPx  1 ]
The upsampling process (modifying values in buf) is specified as follows:
buf[2] = dup[0]
for ( i = 0; i < numPx; i++ ) {
s = dup[i] + (9 * dup[i + 1]) + (9 * dup[i + 2])  dup[i + 3]
s = Clip1( Round2(s, 4) )
buf[ 2 * i  1 ] = s
buf[ 2 * i ] = dup[i + 2]
}
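The upsampling steps can be sketched in Python as follows. This is an illustration under stated assumptions, not a normative implementation: because Python lists have no negative array entries in the sense used here, the input list holds buf[ -1 ] to buf[ numPx - 1 ] and the returned list holds buf[ -2 ] to buf[ 2 * numPx - 2 ]. The names upsample_edge, round2 and clip1 are hypothetical helpers.

```python
def round2(x, n):
    # Round2( x, n ) for n >= 1: rounded right shift.
    return (x + (1 << (n - 1))) >> n


def clip1(x, bit_depth):
    # Clip1( x ): clamp to the valid sample range for the bit depth.
    return max(0, min((1 << bit_depth) - 1, x))


def upsample_edge(buf_in, bit_depth=8):
    """buf_in[k] holds buf[k - 1]; the result[k] holds the new buf[k - 2]."""
    num_px = len(buf_in) - 1
    # dup extends the edge by one sample at each end (length numPx + 3).
    dup = [buf_in[0]] + list(buf_in) + [buf_in[-1]]
    out = [dup[0]]                                   # buf[-2]
    for i in range(num_px):
        s = -dup[i] + 9 * dup[i + 1] + 9 * dup[i + 2] - dup[i + 3]
        out.append(clip1(round2(s, 4), bit_depth))   # buf[2*i - 1], interpolated
        out.append(dup[i + 2])                       # buf[2*i], original sample
    return out
```

A flat edge stays flat, and every second output entry is an original sample, so the process doubles the sampling density of the edge without altering the existing values.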
Intra edge filter process
The inputs to this process are:

a size sz (sz will always be less than or equal to 129),

a filter strength strength between 0 and 3 inclusive,

an edge direction left (when equal to 1, it specifies a vertical edge; when equal to 0, it specifies a horizontal edge).
The process filters the LeftCol (if left is equal to 1) or AboveRow (if left is equal to 0) arrays.
If strength is equal to 0, the process returns without doing anything.
The array edge is derived by setting edge[ i ] equal to ( left ? LeftCol[ i - 1 ] : AboveRow[ i - 1 ] ) for i = 0..sz-1.
Otherwise (strength is not equal to 0), the following ordered steps apply for i = 1..sz-1:

The variable s is set equal to 0.

The following steps now apply for j = 0..INTRA_EDGE_TAPS-1.
a. The variable k is set equal to Clip3( 0, sz - 1, i - 2 + j ).
b. The variable s is incremented by Intra_Edge_Kernel[ strength - 1 ][ j ] * edge[ k ].

If left is equal to 1, LeftCol[ i - 1 ] is set equal to ( s + 8 ) >> 4.

If left is equal to 0, AboveRow[ i - 1 ] is set equal to ( s + 8 ) >> 4.
The array Intra_Edge_Kernel is specified as follows:
Intra_Edge_Kernel[INTRA_EDGE_KERNELS][INTRA_EDGE_TAPS] = {
{ 0, 4, 8, 4, 0 },
{ 0, 5, 6, 5, 0 },
{ 2, 4, 4, 4, 2 }
}
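A minimal Python sketch of the edge filter is given below for illustration. It assumes the caller passes a list whose entry k holds LeftCol[ k - 1 ] or AboveRow[ k - 1 ], so the list corresponds directly to the array edge. The function name filter_edge is hypothetical.

```python
INTRA_EDGE_TAPS = 5
INTRA_EDGE_KERNEL = [
    [0, 4, 8, 4, 0],
    [0, 5, 6, 5, 0],
    [2, 4, 4, 4, 2],
]


def filter_edge(samples, strength):
    """samples[k] corresponds to edge[k]; returns the filtered edge."""
    if strength == 0:
        return list(samples)          # nothing to do
    sz = len(samples)
    edge = list(samples)              # unfiltered copy used as filter input
    out = list(samples)               # entry 0 is never modified
    for i in range(1, sz):
        s = 0
        for j in range(INTRA_EDGE_TAPS):
            k = max(0, min(sz - 1, i - 2 + j))   # Clip3( 0, sz - 1, i - 2 + j )
            s += INTRA_EDGE_KERNEL[strength - 1][j] * edge[k]
        out[i] = (s + 8) >> 4         # each kernel sums to 16
    return out
```

Filtering always reads from the unmodified copy, so earlier outputs do not feed into later ones. An isolated sample of value 16 is spread into the kernel shape itself, for example 4, 8, 4 for strength 1.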
Inter prediction process
General
The inter prediction process is invoked for inter coded blocks and interintra blocks. The inputs to this process are:

a variable plane specifying which plane is being predicted,

variables x and y specifying the location of the top left sample in the CurrFrame[ plane ] array of the region to be predicted,

variables w and h specifying the width and height of the region to be predicted,

variables candRow and candCol specifying the location (in units of 4x4 blocks) of the motion vector information to be used.
The outputs of this process are predicted samples in the current frame CurrFrame.
This process is triggered by a function call to predict_inter.
The variable isCompound is set equal to RefFrames[ candRow ][ candCol ][ 1 ] > INTRA_FRAME.
The prediction arrays are formed by the following ordered steps:

The rounding variables derivation process specified in Rounding variables derivation process is invoked with the variable isCompound as input.

If plane is equal to 0 and motion_mode is equal to LOCALWARP, the warp estimation process in Warp estimation process is invoked.

If plane is equal to 0 and motion_mode is equal to LOCALWARP and LocalValid is equal to 1, the setup shear process specified in Setup shear process is invoked with LocalWarpParams as input, and the output warpValid is assigned to LocalValid (the other outputs are discarded).

The variable refList is set equal to 0.

The variable refFrame is set equal to RefFrames[ candRow ][ candCol ][ refList ].

If (YMode == GLOBALMV || YMode == GLOBAL_GLOBALMV) and GmType[ refFrame ] > TRANSLATION, the setup shear process specified in Setup shear process is invoked with gm_params[ refFrame ] as input, and the output warpValid is assigned to globalValid (the other outputs are discarded).

The variable useWarp (a value of 1 indicates local warping, 2 indicates global warping) is derived as follows:

If w < 8 or h < 8, useWarp is set equal to 0.

Otherwise, if force_integer_mv is equal to 1, useWarp is set equal to 0.

Otherwise, if motion_mode is equal to LOCALWARP and LocalValid is equal to 1, useWarp is set equal to 1.

Otherwise, if all of the following are true, useWarp is set equal to 2.

(YMode == GLOBALMV || YMode == GLOBAL_GLOBALMV).

GmType[ refFrame ] > TRANSLATION.

is_scaled( refFrame ) is equal to 0.

globalValid is equal to 1.


Otherwise, useWarp is set equal to 0.


The motion vector array mv is set equal to Mvs[ candRow ][ candCol ][ refList ].

The variable refIdx specifying which reference frame is being used is set as follows:

If use_intrabc is equal to 0, refIdx is set equal to ref_frame_idx[ refFrame - LAST_FRAME ].

Otherwise (use_intrabc is equal to 1), refIdx is set equal to -1 and RefFrameWidth[ -1 ] is set equal to FrameWidth, RefFrameHeight[ -1 ] is set equal to FrameHeight, and RefUpscaledWidth[ -1 ] is set equal to UpscaledWidth. (These values ensure that the motion vector scaling has no effect.)


The motion vector scaling process in Motion vector scaling process is invoked with plane, refIdx, x, y, mv as inputs and the output being the initial location startX, startY, and the step sizes stepX, stepY.

If use_intrabc is equal to 1, RefFrameWidth[ -1 ] is set equal to MiCols * MI_SIZE, RefFrameHeight[ -1 ] is set equal to MiRows * MI_SIZE, and RefUpscaledWidth[ -1 ] is set equal to MiCols * MI_SIZE. (These values are needed to avoid intrabc prediction being cropped to the frame boundaries.)

If useWarp is not equal to 0, the block warp process in Block warp process is invoked with useWarp, plane, refList, x, y, i8, j8, w, h as inputs and the output is merged into the 2D array preds[ refList ] for i8 = 0..((h-1) >> 3) and for j8 = 0..((w-1) >> 3). (Each invocation fills in a block of output of size w by h at x offset j8 * 8 and y offset i8 * 8.)

If useWarp is equal to 0, the block inter prediction process in Block inter prediction process is invoked with plane, refIdx, startX, startY, stepX, stepY, w, h, candRow, candCol as inputs and the output is assigned to the 2D array preds[ refList ].

If isCompound is equal to 1, then the variable refList is set equal to 1 and steps 5 to 13 are repeated to form the prediction for the second reference.
An array named Mask is prepared as follows:

If compound_type is equal to COMPOUND_WEDGE and plane is equal to 0, the wedge mask process in Wedge mask process is invoked with w, h as inputs.

Otherwise if compound_type is equal to COMPOUND_INTRA, the intra mode variant mask process in Intra mode variant mask process is invoked with w, h as inputs.

Otherwise if compound_type is equal to COMPOUND_DIFFWTD and plane is equal to 0, the difference weight mask process in Difference weight mask process is invoked with preds, w, h as inputs.

Otherwise, no mask array is needed.
If compound_type is equal to COMPOUND_DISTANCE, the distance weights process in Distance weights process is invoked with candRow and candCol as inputs.
The inter predicted samples are then derived as follows:

If isCompound is equal to 0 and IsInterIntra is equal to 0, CurrFrame[ plane ][ y + i ][ x + j ] is set equal to Clip1( preds[ 0 ][ i ][ j ] ) for i = 0..h-1 and j = 0..w-1.

Otherwise if compound_type is equal to COMPOUND_AVERAGE, CurrFrame[ plane ][ y + i ][ x + j ] is set equal to Clip1( Round2( preds[ 0 ][ i ][ j ] + preds[ 1 ][ i ][ j ], 1 + InterPostRound ) ) for i = 0..h-1 and j = 0..w-1.

Otherwise if compound_type is equal to COMPOUND_DISTANCE, CurrFrame[ plane ][ y + i ][ x + j ] is set equal to Clip1( Round2( FwdWeight * preds[ 0 ][ i ][ j ] + BckWeight * preds[ 1 ][ i ][ j ], 4 + InterPostRound ) ) for i = 0..h-1 and j = 0..w-1.

Otherwise, the mask blend process in Mask blend process is invoked with preds, plane, x, y, w, h as inputs.
If motion_mode is equal to OBMC, the overlapped motion compensation in Overlapped motion compensation process is invoked with plane, w, h as inputs.
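The three non-masked cases of the sample derivation can be sketched per sample as shown below. This is an illustrative sketch only: the function name blend_sample, the use of None for the single-reference case, and the string constants standing in for compound_type values are all assumptions made for this example. The mask blend case is not covered.

```python
def round2(x, n):
    # Round2( x, n ); n == 0 leaves the value unchanged.
    return x if n == 0 else (x + (1 << (n - 1))) >> n


def clip1(x, bit_depth=8):
    return max(0, min((1 << bit_depth) - 1, x))


def blend_sample(p0, p1, compound_type, inter_post_round,
                 fwd_weight=8, bck_weight=8, bit_depth=8):
    """Combine predictions p0 and p1 for one sample position."""
    if compound_type is None:
        # Single reference, no interintra: the prediction is used directly.
        return clip1(p0, bit_depth)
    if compound_type == "COMPOUND_AVERAGE":
        return clip1(round2(p0 + p1, 1 + inter_post_round), bit_depth)
    if compound_type == "COMPOUND_DISTANCE":
        total = fwd_weight * p0 + bck_weight * p1
        return clip1(round2(total, 4 + inter_post_round), bit_depth)
    raise ValueError("masked compound types are handled by the mask blend process")
```

With InterPostRound equal to 4, compound predictions carry 4 extra bits of precision, so inputs of 1600 and 1920 correspond to sample values 100 and 120 and average to 110.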
Rounding variables derivation process
The input to this process is a variable isCompound.
The rounding variables InterRound0, InterRound1, and InterPostRound are derived as follows:

InterRound0 (representing the amount to round by after horizontal filtering) is set equal to 3.

InterRound1 (representing the amount to round by after vertical filtering) is set equal to ( isCompound ? 7 : 11).

If BitDepth is equal to 12, InterRound0 is set equal to InterRound0 + 2.

If BitDepth is equal to 12 and isCompound is equal to 0, InterRound1 is set equal to InterRound1 - 2.

InterPostRound (representing the amount to round by at the end of the prediction process) is set equal to 2 * FILTER_BITS - ( InterRound0 + InterRound1 ).
Note: The rounding is chosen to ensure that the output of the horizontal filter always fits within 16 bits.
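The derivation can be tabulated with a short Python sketch. FILTER_BITS is defined elsewhere in the specification; the value 7 used here is an assumption of this example (it matches filter taps that sum to 128), and the function name rounding_variables is hypothetical.

```python
FILTER_BITS = 7  # assumed value of the specification constant


def rounding_variables(bit_depth, is_compound):
    """Return (InterRound0, InterRound1, InterPostRound)."""
    inter_round0 = 3
    inter_round1 = 7 if is_compound else 11
    if bit_depth == 12:
        inter_round0 += 2
    if bit_depth == 12 and not is_compound:
        inter_round1 -= 2
    inter_post_round = 2 * FILTER_BITS - (inter_round0 + inter_round1)
    return inter_round0, inter_round1, inter_post_round
```

Under that assumption, non-compound prediction always ends with InterPostRound equal to 0, while compound prediction keeps 4 extra bits (2 extra bits at a bit depth of 12) until the two predictions are combined.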
Motion vector scaling process
The inputs to this process are:

a variable plane specifying which plane is being predicted,

a variable refIdx specifying which reference frame is being used,

variables x and y specifying the location of the top left sample in the CurrFrame[ plane ] array of the region to be predicted,

a variable mv specifying the clamped motion vector (in units of 1/8 th of a luma sample, i.e. with 3 fractional bits).
The outputs of this process are the variables startX and startY giving the reference block location in units of 1/1024 th of a sample, and variables xStep and yStep giving the step size in units of 1/1024 th of a sample.
This process is responsible for computing the sampling locations in the reference frame based on the motion vector. The sampling locations are also adjusted to compensate for any difference in the size of the reference frame compared to the current frame.
Note: When intra block copy is being used, refIdx will be equal to -1 to signal prediction from the frame currently being decoded. The arrays RefFrameWidth, RefFrameHeight, and RefUpscaledWidth include values at index -1 giving the dimensions of the current frame.
It is a requirement of bitstream conformance that all the following conditions are satisfied:

2 * FrameWidth >= RefUpscaledWidth[ refIdx ]

2 * FrameHeight >= RefFrameHeight[ refIdx ]

FrameWidth <= 16 * RefUpscaledWidth[ refIdx ]

FrameHeight <= 16 * RefFrameHeight[ refIdx ]
A variable xScale is set equal to ( ( RefUpscaledWidth[ refIdx ] << REF_SCALE_SHIFT ) + ( FrameWidth / 2 ) ) / FrameWidth.
A variable yScale is set equal to ( ( RefFrameHeight[ refIdx ] << REF_SCALE_SHIFT ) + ( FrameHeight / 2 ) ) / FrameHeight.
(xScale and yScale specify the size of the reference frame relative to the current frame in units where (1 << 14) is equivalent to both frames having the same size.)
The variables subX and subY are set equal to the subsampling for the current plane as follows:

If plane is equal to 0, subX is set equal to 0 and subY is set equal to 0.

Otherwise, subX is set equal to subsampling_x and subY is set equal to subsampling_y.
The variable halfSample (representing half the size of a sample in units of 1/16 th of a sample) is set equal to ( 1 << ( SUBPEL_BITS - 1 ) ).
The variable origX is set equal to ( (x << SUBPEL_BITS) + ( ( 2 * mv[1] ) >> subX ) + halfSample ).
The variable origY is set equal to ( (y << SUBPEL_BITS) + ( ( 2 * mv[0] ) >> subY ) + halfSample ).
(origX and origY specify the location of the centre of the sample at the top-left corner of the reference block in the current frame's coordinate system in units of 1/16 th of a sample, i.e. with SUBPEL_BITS=4 fractional bits.)
The variable baseX is set equal to (origX * xScale - ( halfSample << REF_SCALE_SHIFT ) ).
The variable baseY is set equal to (origY * yScale - ( halfSample << REF_SCALE_SHIFT ) ).
(baseX and baseY specify the location of the top-left corner of the block in the reference frame in the reference frame's coordinate system with 18 fractional bits.)
The variable off (containing a rounding offset for the filter tap selection) is set equal to ( ( 1 << (SCALE_SUBPEL_BITS - SUBPEL_BITS) ) / 2 ).
The output variable startX is set equal to (Round2Signed( baseX, REF_SCALE_SHIFT + SUBPEL_BITS - SCALE_SUBPEL_BITS) + off).
The output variable startY is set equal to (Round2Signed( baseY, REF_SCALE_SHIFT + SUBPEL_BITS - SCALE_SUBPEL_BITS) + off).
(startX and startY specify the location of the top-left corner of the block in the reference frame in the reference frame's coordinate system with SCALE_SUBPEL_BITS=10 fractional bits.)
The output variable stepX is set equal to Round2Signed( xScale, REF_SCALE_SHIFT - SCALE_SUBPEL_BITS).
The output variable stepY is set equal to Round2Signed( yScale, REF_SCALE_SHIFT - SCALE_SUBPEL_BITS).
(stepX and stepY are the size of one current frame sample in the reference frame's coordinate system with 10 fractional bits.)
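The arithmetic of this process can be followed with the Python sketch below. It is an illustration under assumptions: frame and reference dimensions are passed as plain arguments (ref_w standing in for RefUpscaledWidth[ refIdx ] and ref_h for RefFrameHeight[ refIdx ]), the constant values are those implied by the text above (REF_SCALE_SHIFT of 14, SUBPEL_BITS of 4, SCALE_SUBPEL_BITS of 10), and the function names are hypothetical.

```python
REF_SCALE_SHIFT = 14
SUBPEL_BITS = 4
SCALE_SUBPEL_BITS = 10


def round2signed(x, n):
    # Round2Signed: round the magnitude, then restore the sign.
    half = 1 << (n - 1)
    return (x + half) >> n if x >= 0 else -((-x + half) >> n)


def scale_motion_vector(x, y, mv, frame_w, frame_h, ref_w, ref_h,
                        sub_x=0, sub_y=0):
    """Return (startX, startY, stepX, stepY); mv is (row, column)."""
    # Bitstream conformance limits on the reference scaling ratio.
    assert 2 * frame_w >= ref_w and 2 * frame_h >= ref_h
    assert frame_w <= 16 * ref_w and frame_h <= 16 * ref_h
    x_scale = ((ref_w << REF_SCALE_SHIFT) + frame_w // 2) // frame_w
    y_scale = ((ref_h << REF_SCALE_SHIFT) + frame_h // 2) // frame_h
    half_sample = 1 << (SUBPEL_BITS - 1)
    orig_x = (x << SUBPEL_BITS) + ((2 * mv[1]) >> sub_x) + half_sample
    orig_y = (y << SUBPEL_BITS) + ((2 * mv[0]) >> sub_y) + half_sample
    base_x = orig_x * x_scale - (half_sample << REF_SCALE_SHIFT)
    base_y = orig_y * y_scale - (half_sample << REF_SCALE_SHIFT)
    off = (1 << (SCALE_SUBPEL_BITS - SUBPEL_BITS)) // 2
    shift = REF_SCALE_SHIFT + SUBPEL_BITS - SCALE_SUBPEL_BITS
    start_x = round2signed(base_x, shift) + off
    start_y = round2signed(base_y, shift) + off
    step_x = round2signed(x_scale, REF_SCALE_SHIFT - SCALE_SUBPEL_BITS)
    step_y = round2signed(y_scale, REF_SCALE_SHIFT - SCALE_SUBPEL_BITS)
    return start_x, start_y, step_x, step_y
```

When the reference frame has the same size as the current frame, the step is 1024 (one sample) and a zero motion vector at the origin gives a start position of 32, which is just the rounding offset off.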
Block inter prediction process
The inputs to this process are:

a variable plane,

a variable refIdx specifying which reference frame is being used (or -1 for intra block copy),

variables x and y giving the block location in units of 1/1024 th of a sample,

variables xStep and yStep giving the step size in units of 1/1024 th of a sample,

variables w and h giving the width and height of the block in units of samples,

variables candRow and candCol specifying the location (in units of 4x4 blocks) of the motion vector information to be used.
The output from this process is the 2D array named pred containing inter predicted samples.
A variable ref specifying the reference frame contents is set as follows:

If refIdx is equal to -1, ref is set equal to CurrFrame.

Otherwise (refIdx is greater than or equal to 0), ref is set equal to FrameStore[ refIdx ].
The variables subX and subY are set equal to the subsampling for the current plane as follows:

If plane is equal to 0, subX is set equal to 0 and subY is set equal to 0.

Otherwise, subX is set equal to subsampling_x and subY is set equal to subsampling_y.
The variable lastX is set equal to ( (RefUpscaledWidth[ refIdx ] + subX) >> subX) - 1.
The variable lastY is set equal to ( (RefFrameHeight[ refIdx ] + subY) >> subY) - 1.
(lastX and lastY specify the coordinates of the bottom right sample of the reference plane.)
The variable intermediateHeight specifying the height required for the intermediate array is set equal to (((h - 1) * yStep + (1 << SCALE_SUBPEL_BITS) - 1) >> SCALE_SUBPEL_BITS) + 8.
The subsample interpolation is effected via two one-dimensional convolutions. First a horizontal filter is used to build up a temporary array, and then this array is vertically filtered to obtain the final prediction. The fractional parts of the motion vectors determine the filtering process. If the fractional part is zero, then the filtering is equivalent to a straight sample copy.
The filtering is applied as follows:

The array intermediate is specified as follows:
interpFilter = InterpFilters[ candRow ][ candCol ][ 1 ]
if ( w <= 4 ) {
  if ( interpFilter == EIGHTTAP || interpFilter == EIGHTTAP_SHARP ) {
    interpFilter = 4
  } else if ( interpFilter == EIGHTTAP_SMOOTH ) {
    interpFilter = 5
  }
}
for ( r = 0; r < intermediateHeight; r++ ) {
  for ( c = 0; c < w; c++ ) {
    s = 0
    p = x + xStep * c
    for ( t = 0; t < 8; t++ )
      s += Subpel_Filters[ interpFilter ][ (p >> 6) & SUBPEL_MASK ][ t ] *
           ref[ plane ][ Clip3( 0, lastY, (y >> 10) + r - 3 ) ][ Clip3( 0, lastX, (p >> 10) + t - 3 ) ]
    intermediate[ r ][ c ] = Round2(s, InterRound0)
  }
}

The array pred is specified as follows:
interpFilter = InterpFilters[ candRow ][ candCol ][ 0 ]
if ( h <= 4 ) {
  if ( interpFilter == EIGHTTAP || interpFilter == EIGHTTAP_SHARP ) {
    interpFilter = 4
  } else if ( interpFilter == EIGHTTAP_SMOOTH ) {
    interpFilter = 5
  }
}
for ( r = 0; r < h; r++ ) {
  for ( c = 0; c < w; c++ ) {
    s = 0
    p = (y & 1023) + yStep * r
    for ( t = 0; t < 8; t++ )
      s += Subpel_Filters[ interpFilter ][ (p >> 6) & SUBPEL_MASK ][ t ] *
           intermediate[ (p >> 10) + t ][ c ]
    pred[ r ][ c ] = Round2(s, InterRound1)
  }
}
where the constant array Subpel_Filters is specified as:
Subpel_Filters[ 6 ][ 16 ][ 8 ] = {
  {
    { 0, 0, 0, 128, 0, 0, 0, 0 },
    { 0, 2, -6, 126, 8, -2, 0, 0 },
    { 0, 2, -10, 122, 18, -4, 0, 0 },
    { 0, 2, -12, 116, 28, -8, 2, 0 },
    { 0, 2, -14, 110, 38, -10, 2, 0 },
    { 0, 2, -14, 102, 48, -12, 2, 0 },
    { 0, 2, -16, 94, 58, -12, 2, 0 },
    { 0, 2, -14, 84, 66, -12, 2, 0 },
    { 0, 2, -14, 76, 76, -14, 2, 0 },
    { 0, 2, -12, 66, 84, -14, 2, 0 },
    { 0, 2, -12, 58, 94, -16, 2, 0 },
    { 0, 2, -12, 48, 102, -14, 2, 0 },
    { 0, 2, -10, 38, 110, -14, 2, 0 },
    { 0, 2, -8, 28, 116, -12, 2, 0 },
    { 0, 0, -4, 18, 122, -10, 2, 0 },
    { 0, 0, -2, 8, 126, -6, 2, 0 }
  },
  {
    { 0, 0, 0, 128, 0, 0, 0, 0 },
    { 0, 2, 28, 62, 34, 2, 0, 0 },
    { 0, 0, 26, 62, 36, 4, 0, 0 },
    { 0, 0, 22, 62, 40, 4, 0, 0 },
    { 0, 0, 20, 60, 42, 6, 0, 0 },
    { 0, 0, 18, 58, 44, 8, 0, 0 },
    { 0, 0, 16, 56, 46, 10, 0, 0 },
    { 0, -2, 16, 54, 48, 12, 0, 0 },
    { 0, -2, 14, 52, 52, 14, -2, 0 },
    { 0, 0, 12, 48, 54, 16, -2, 0 },
    { 0, 0, 10, 46, 56, 16, 0, 0 },
    { 0, 0, 8, 44, 58, 18, 0, 0 },
    { 0, 0, 6, 42, 60, 20, 0, 0 },
    { 0, 0, 4, 40, 62, 22, 0, 0 },
    { 0, 0, 4, 36, 62, 26, 0, 0 },
    { 0, 0, 2, 34, 62, 28, 2, 0 }
  },
  {
    { 0, 0, 0, 128, 0, 0, 0, 0 },
    { -2, 2, -6, 126, 8, -2, 2, 0 },
    { -2, 6, -12, 124, 16, -6, 4, -2 },
    { -2, 8, -18, 120, 26, -10, 6, -2 },
    { -4, 10, -22, 116, 38, -14, 6, -2 },
    { -4, 10, -22, 108, 48, -18, 8, -2 },
    { -4, 10, -24, 100, 60, -20, 8, -2 },
    { -4, 10, -24, 90, 70, -22, 10, -2 },
    { -4, 12, -24, 80, 80, -24, 12, -4 },
    { -2, 10, -22, 70, 90, -24, 10, -4 },
    { -2, 8, -20, 60, 100, -24, 10, -4 },
    { -2, 8, -18, 48, 108, -22, 10, -4 },
    { -2, 6, -14, 38, 116, -22, 10, -4 },
    { -2, 6, -10, 26, 120, -18, 8, -2 },
    { -2, 4, -6, 16, 124, -12, 6, -2 },
    { 0, 2, -2, 8, 126, -6, 2, -2 }
  },
  {
    { 0, 0, 0, 128, 0, 0, 0, 0 },
    { 0, 0, 0, 120, 8, 0, 0, 0 },
    { 0, 0, 0, 112, 16, 0, 0, 0 },
    { 0, 0, 0, 104, 24, 0, 0, 0 },
    { 0, 0, 0, 96, 32, 0, 0, 0 },
    { 0, 0, 0, 88, 40, 0, 0, 0 },
    { 0, 0, 0, 80, 48, 0, 0, 0 },
    { 0, 0, 0, 72, 56, 0, 0, 0 },
    { 0, 0, 0, 64, 64, 0, 0, 0 },
    { 0, 0, 0, 56, 72, 0, 0, 0 },
    { 0, 0, 0, 48, 80, 0, 0, 0 },
    { 0, 0, 0, 40, 88, 0, 0, 0 },
    { 0, 0, 0, 32, 96, 0, 0, 0 },
    { 0, 0, 0, 24, 104, 0, 0, 0 },
    { 0, 0, 0, 16, 112, 0, 0, 0 },
    { 0, 0, 0, 8, 120, 0, 0, 0 }
  },
  {
    { 0, 0, 0, 128, 0, 0, 0, 0 },
    { 0, 0, -4, 126, 8, -2, 0, 0 },
    { 0, 0, -8, 122, 18, -4, 0, 0 },
    { 0, 0, -10, 116, 28, -6, 0, 0 },
    { 0, 0, -12, 110, 38, -8, 0, 0 },
    { 0, 0, -12, 102, 48, -10, 0, 0 },
    { 0, 0, -14, 94, 58, -10, 0, 0 },
    { 0, 0, -12, 84, 66, -10, 0, 0 },
    { 0, 0, -12, 76, 76, -12, 0, 0 },
    { 0, 0, -10, 66, 84, -12, 0, 0 },
    { 0, 0, -10, 58, 94, -14, 0, 0 },
    { 0, 0, -10, 48, 102, -12, 0, 0 },
    { 0, 0, -8, 38, 110, -12, 0, 0 },
    { 0, 0, -6, 28, 116, -10, 0, 0 },
    { 0, 0, -4, 18, 122, -8, 0, 0 },
    { 0, 0, -2, 8, 126, -4, 0, 0 }
  },
  {
    { 0, 0, 0, 128, 0, 0, 0, 0 },
    { 0, 0, 30, 62, 34, 2, 0, 0 },
    { 0, 0, 26, 62, 36, 4, 0, 0 },
    { 0, 0, 22, 62, 40, 4, 0, 0 },
    { 0, 0, 20, 60, 42, 6, 0, 0 },
    { 0, 0, 18, 58, 44, 8, 0, 0 },
    { 0, 0, 16, 56, 46, 10, 0, 0 },
    { 0, 0, 14, 54, 48, 12, 0, 0 },
    { 0, 0, 12, 52, 52, 12, 0, 0 },
    { 0, 0, 12, 48, 54, 14, 0, 0 },
    { 0, 0, 10, 46, 56, 16, 0, 0 },
    { 0, 0, 8, 44, 58, 18, 0, 0 },
    { 0, 0, 6, 42, 60, 20, 0, 0 },
    { 0, 0, 4, 40, 62, 22, 0, 0 },
    { 0, 0, 4, 36, 62, 26, 0, 0 },
    { 0, 0, 2, 34, 62, 30, 0, 0 }
  }
}
Note: All the values in Subpel_Filters are even. The last two filter types are used for small blocks and only have four filter taps. The filter at index 4 has a four tap version of the EIGHTTAP filter. The filter at index 5 has a four tap version of the EIGHTTAP_SMOOTH filter.
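To make the indexing of the horizontal pass concrete, the following Python sketch filters a single row of reference samples. It is an illustration, not the full process: it handles one row only, it uses just the bilinear kernels (filter index 3 of Subpel_Filters, regenerated here from their regular pattern), and it assumes SUBPEL_MASK equals 15, matching the 16 filter positions. The name filter_row is hypothetical.

```python
SUBPEL_MASK = 15  # assumed: selects one of the 16 filter positions

# Bilinear kernels, i.e. Subpel_Filters[ 3 ]: { 0, 0, 0, 128 - 8*i, 8*i, 0, 0, 0 }.
BILINEAR = [[0, 0, 0, 128 - 8 * i, 8 * i, 0, 0, 0] for i in range(16)]


def round2(x, n):
    return (x + (1 << (n - 1))) >> n


def filter_row(ref_row, x, x_step, w, taps=BILINEAR, inter_round0=3):
    """Horizontal pass for one row; x and x_step are in 1/1024 sample units."""
    last_x = len(ref_row) - 1
    out = []
    for c in range(w):
        p = x + x_step * c
        kernel = taps[(p >> 6) & SUBPEL_MASK]      # fractional position
        s = 0
        for t in range(8):
            pos = max(0, min(last_x, (p >> 10) + t - 3))  # clamp to the plane
            s += kernel[t] * ref_row[pos]
        out.append(round2(s, inter_round0))
    return out
```

At an integer position the kernel is a single tap of 128, so with InterRound0 equal to 3 each intermediate value is 16 times the reference sample. At a half-sample position the two centre taps of 64 average neighbouring samples.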
Block warp process
The inputs to this process are:

a variable useWarp (equal to 1 for local warp, or 2 for global warp),

a variable plane,

a variable refList specifying that the process should predict from RefFrame[ refList ],

variables x and y specifying the location of the top left sample in the CurrFrame[ plane ] array of the region to be predicted,

variables i8 and j8 specifying the offset (in units of 8 samples) relative to the top left sample,

variables w and h giving the width and height of the block in units of samples.
The output from this process is the 2D array named pred containing warped inter predicted samples.
The process only updates a section of the pred array. The size of the updated section is 8x8 samples, clipped to the size of the block. Variables i8 and j8 give the location of the section to update.
A variable refIdx specifying which reference frame is being used is set equal to ref_frame_idx[ RefFrame[ refList ] - LAST_FRAME ].
A variable ref specifying the reference frame contents is set equal to FrameStore[ refIdx ].
The variables subX and subY are set equal to the subsampling for the current plane as follows:

If plane is equal to 0, subX is set equal to 0 and subY is set equal to 0.

Otherwise, subX is set equal to subsampling_x and subY is set equal to subsampling_y.
The variable lastX is set equal to ( (RefUpscaledWidth[ refIdx ] + subX) >> subX) - 1.
The variable lastY is set equal to ( (RefFrameHeight[ refIdx ] + subY) >> subY) - 1.
(lastX and lastY specify the coordinates of the bottom right sample of the reference plane.)
The variable srcX is set equal to (x + j8 * 8 + 4) << subX.
The variable srcY is set equal to (y + i8 * 8 + 4) << subY.
(srcX and srcY specify a location in the luma plane that will be projected using the warp parameters.)
The array warpParams is specified as follows:

If useWarp is equal to 1, warpParams is set equal to LocalWarpParams.

Otherwise (useWarp is equal to 2), warpParams is set equal to gm_params[ RefFrame[ refList ] ].
The variable dstX is set equal to warpParams[2] * srcX + warpParams[3] * srcY + warpParams[0].
The variable dstY is set equal to warpParams[4] * srcX + warpParams[5] * srcY + warpParams[1].
(dstX and dstY specify the destination location in the luma plane using WARPEDMODEL_PREC_BITS bits of precision).
The setup shear process specified in Setup shear process is invoked with warpParams as input, and the outputs are assigned to warpValid, alpha, beta, gamma, and delta. (warpValid will always be equal to 1 at this point.)
The subsample interpolation is effected via two one-dimensional convolutions. First a horizontal filter is used to build up an intermediate array, and then this array is vertically filtered to obtain the final prediction.
The filtering is applied as follows:

The array intermediate is specified as follows:
x4 = dstX >> subX
y4 = dstY >> subY
ix4 = x4 >> WARPEDMODEL_PREC_BITS
sx4 = x4 & ((1 << WARPEDMODEL_PREC_BITS) - 1)
iy4 = y4 >> WARPEDMODEL_PREC_BITS
sy4 = y4 & ((1 << WARPEDMODEL_PREC_BITS) - 1)
for ( i1 = -7; i1 < 8; i1++ ) {
  for ( i2 = -4; i2 < 4; i2++ ) {
    sx = sx4 + alpha * i2 + beta * i1
    offs = Round2(sx, WARPEDDIFF_PREC_BITS) + WARPEDPIXEL_PREC_SHIFTS
    s = 0
    for ( i3 = 0; i3 < 8; i3++ ) {
      s += Warped_Filters[ offs ][ i3 ] *
           ref[ plane ][ Clip3( 0, lastY, iy4 + i1 ) ][ Clip3( 0, lastX, ix4 + i2 - 3 + i3 ) ]
    }
    intermediate[(i1 + 7)][(i2 + 4)] = Round2(s, InterRound0)
  }
}

The array pred is specified as follows:
for ( i1 = -4; i1 < Min(4, h - i8 * 8 - 4); i1++ ) {
  for ( i2 = -4; i2 < Min(4, w - j8 * 8 - 4); i2++ ) {
    sy = sy4 + gamma * i2 + delta * i1
    offs = Round2(sy, WARPEDDIFF_PREC_BITS) + WARPEDPIXEL_PREC_SHIFTS
    s = 0
    for ( i3 = 0; i3 < 8; i3++ ) {
      s += Warped_Filters[offs][i3] * intermediate[(i1 + i3 + 4)][(i2 + 4)]
    }
    pred[ i8 * 8 + i1 + 4 ][ j8 * 8 + i2 + 4 ] = Round2(s, InterRound1)
  }
}
where the constant array Warped_Filters is specified as:
Warped_Filters[WARPEDPIXEL_PREC_SHIFTS * 3 + 1][8] = {
  { 0, 0, 127, 1, 0, 0, 0, 0 },
  { 0, -1, 127, 2, 0, 0, 0, 0 },
  { 1, -3, 127, 4, -1, 0, 0, 0 },
  { 1, -4, 126, 6, -2, 1, 0, 0 },
  { 1, -5, 126, 8, -3, 1, 0, 0 },
  { 1, -6, 125, 11, -4, 1, 0, 0 },
  { 1, -7, 124, 13, -4, 1, 0, 0 },
  { 2, -8, 123, 15, -5, 1, 0, 0 },
  { 2, -9, 122, 18, -6, 1, 0, 0 },
  { 2, -10, 121, 20, -6, 1, 0, 0 },
  { 2, -11, 120, 22, -7, 2, 0, 0 },
  { 2, -12, 119, 25, -8, 2, 0, 0 },
  { 3, -13, 117, 27, -8, 2, 0, 0 },
  { 3, -13, 116, 29, -9, 2, 0, 0 },
  { 3, -14, 114, 32, -10, 3, 0, 0 },
  { 3, -15, 113, 35, -10, 2, 0, 0 },
  { 3, -15, 111, 37, -11, 3, 0, 0 },
  { 3, -16, 109, 40, -11, 3, 0, 0 },
  { 3, -16, 108, 42, -12, 3, 0, 0 },
  { 4, -17, 106, 45, -13, 3, 0, 0 },
  { 4, -17, 104, 47, -13, 3, 0, 0 },
  { 4, -17, 102, 50, -14, 3, 0, 0 },
  { 4, -17, 100, 52, -14, 3, 0, 0 },
  { 4, -18, 98, 55, -15, 4, 0, 0 },
  { 4, -18, 96, 58, -15, 3, 0, 0 },
  { 4, -18, 94, 60, -16, 4, 0, 0 },
  { 4, -18, 91, 63, -16, 4, 0, 0 },
  { 4, -18, 89, 65, -16, 4, 0, 0 },
  { 4, -18, 87, 68, -17, 4, 0, 0 },
  { 4, -18, 85, 70, -17, 4, 0, 0 },
  { 4, -18, 82, 73, -17, 4, 0, 0 },
  { 4, -18, 80, 75, -17, 4, 0, 0 },
  { 4, -18, 78, 78, -18, 4, 0, 0 },
  { 4, -17, 75, 80, -18, 4, 0, 0 },
  { 4, -17, 73, 82, -18, 4, 0, 0 },
  { 4, -17, 70, 85, -18, 4, 0, 0 },
  { 4, -17, 68, 87, -18, 4, 0, 0 },
  { 4, -16, 65, 89, -18, 4, 0, 0 },
  { 4, -16, 63, 91, -18, 4, 0, 0 },
  { 4, -16, 60, 94, -18, 4, 0, 0 },
  { 3, -15, 58, 96, -18, 4, 0, 0 },
  { 4, -15, 55, 98, -18, 4, 0, 0 },
  { 3, -14, 52, 100, -17, 4, 0, 0 },
  { 3, -14, 50, 102, -17, 4, 0, 0 },
  { 3, -13, 47, 104, -17, 4, 0, 0 },
  { 3, -13, 45, 106, -17, 4, 0, 0 },
  { 3, -12, 42, 108, -16, 3, 0, 0 },
  { 3, -11, 40, 109, -16, 3, 0, 0 },
  { 3, -11, 37, 111, -15, 3, 0, 0 },
  { 2, -10, 35, 113, -15, 3, 0, 0 },
  { 3, -10, 32, 114, -14, 3, 0, 0 },
  { 2, -9, 29, 116, -13, 3, 0, 0 },
  { 2, -8, 27, 117, -13, 3, 0, 0 },
  { 2, -8, 25, 119, -12, 2, 0, 0 },
  { 2, -7, 22, 120, -11, 2, 0, 0 },
  { 1, -6, 20, 121, -10, 2, 0, 0 },
  { 1, -6, 18, 122, -9, 2, 0, 0 },
  { 1, -5, 15, 123, -8, 2, 0, 0 },
  { 1, -4, 13, 124, -7, 1, 0, 0 },
  { 1, -4, 11, 125, -6, 1, 0, 0 },
  { 1, -3, 8, 126, -5, 1, 0, 0 },
  { 1, -2, 6, 126, -4, 1, 0, 0 },
  { 0, -1, 4, 127, -3, 1, 0, 0 },
  { 0, 0, 2, 127, -1, 0, 0, 0 },
  { 0, 0, 0, 127, 1, 0, 0, 0 },
  { 0, 0, -1, 127, 2, 0, 0, 0 },
  { 0, 1, -3, 127, 4, -2, 1, 0 },
  { 0, 1, -5, 127, 6, -2, 1, 0 },
  { 0, 2, -6, 126, 8, -3, 1, 0 },
  { -1, 2, -7, 126, 11, -4, 2, -1 },
  { -1, 3, -8, 125, 13, -5, 2, -1 },
  { -1, 3, -10, 124, 16, -6, 3, -1 },
  { -1, 4, -11, 123, 18, -7, 3, -1 },
  { -1, 4, -12, 122, 20, -7, 3, -1 },
  { -1, 4, -13, 121, 23, -8, 3, -1 },
  { -2, 5, -14, 120, 25, -9, 4, -1 },
  { -1, 5, -15, 119, 27, -10, 4, -1 },
  { -1, 5, -16, 118, 30, -11, 4, -1 },
  { -2, 6, -17, 116, 33, -12, 5, -1 },
  { -2, 6, -17, 114, 35, -12, 5, -1 },
  { -2, 6, -18, 113, 38, -13, 5, -1 },
  { -2, 7, -19, 111, 41, -14, 6, -2 },
  { -2, 7, -19, 110, 43, -15, 6, -2 },
  { -2, 7, -20, 108, 46, -15, 6, -2 },
  { -2, 7, -20, 106, 49, -16, 6, -2 },
  { -2, 7, -21, 104, 51, -16, 7, -2 },
  { -2, 7, -21, 102, 54, -17, 7, -2 },
  { -2, 8, -21, 100, 56, -18, 7, -2 },
  { -2, 8, -22, 98, 59, -18, 7, -2 },
  { -2, 8, -22, 96, 62, -19, 7, -2 },
  { -2, 8, -22, 94, 64, -19, 7, -2 },
  { -2, 8, -22, 91, 67, -20, 8, -2 },
  { -2, 8, -22, 89, 69, -20, 8, -2 },
  { -2, 8, -22, 87, 72, -21, 8, -2 },
  { -2, 8, -21, 84, 74, -21, 8, -2 },
  { -2, 8, -22, 82, 77, -21, 8, -2 },
  { -2, 8, -21, 79, 79, -21, 8, -2 },
  { -2, 8, -21, 77, 82, -22, 8, -2 },
  { -2, 8, -21, 74, 84, -21, 8, -2 },
  { -2, 8, -21, 72, 87, -22, 8, -2 },
  { -2, 8, -20, 69, 89, -22, 8, -2 },
  { -2, 8, -20, 67, 91, -22, 8, -2 },
  { -2, 7, -19, 64, 94, -22, 8, -2 },
  { -2, 7, -19, 62, 96, -22, 8, -2 },
  { -2, 7, -18, 59, 98, -22, 8, -2 },
  { -2, 7, -18, 56, 100, -21, 8, -2 },
  { -2, 7, -17, 54, 102, -21, 7, -2 },
  { -2, 7, -16, 51, 104, -21, 7, -2 },
  { -2, 6, -16, 49, 106, -20, 7, -2 },
  { -2, 6, -15, 46, 108, -20, 7, -2 },
  { -2, 6, -15, 43, 110, -19, 7, -2 },
  { -2, 6, -14, 41, 111, -19, 7, -2 },
  { -1, 5, -13, 38, 113, -18, 6, -2 },
  { -1, 5, -12, 35, 114, -17, 6, -2 },
  { -1, 5, -12, 33, 116, -17, 6, -2 },
  { -1, 4, -11, 30, 118, -16, 5, -1 },
  { -1, 4, -10, 27, 119, -15, 5, -1 },
  { -1, 4, -9, 25, 120, -14, 5, -2 },
  { -1, 3, -8, 23, 121, -13, 4, -1 },
  { -1, 3, -7, 20, 122, -12, 4, -1 },
  { -1, 3, -7, 18, 123, -11, 4, -1 },
  { -1, 3, -6, 16, 124, -10, 3, -1 },
  { -1, 2, -5, 13, 125, -8, 3, -1 },
  { -1, 2, -4, 11, 126, -7, 2, -1 },
  { 0, 1, -3, 8, 126, -6, 2, 0 },
  { 0, 1, -2, 6, 127, -5, 1, 0 },
  { 0, 1, -2, 4, 127, -3, 1, 0 },
  { 0, 0, 0, 2, 127, -1, 0, 0 },
  { 0, 0, 0, 1, 127, 0, 0, 0 },
  { 0, 0, 0, -1, 127, 2, 0, 0 },
  { 0, 0, 1, -3, 127, 4, -1, 0 },
  { 0, 0, 1, -4, 126, 6, -2, 1 },
  { 0, 0, 1, -5, 126, 8, -3, 1 },
  { 0, 0, 1, -6, 125, 11, -4, 1 },
  { 0, 0, 1, -7, 124, 13, -4, 1 },
  { 0, 0, 2, -8, 123, 15, -5, 1 },
  { 0, 0, 2, -9, 122, 18, -6, 1 },
  { 0, 0, 2, -10, 121, 20, -6, 1 },
  { 0, 0, 2, -11, 120, 22, -7, 2 },
  { 0, 0, 2, -12, 119, 25, -8, 2 },
  { 0, 0, 3, -13, 117, 27, -8, 2 },
  { 0, 0, 3, -13, 116, 29, -9, 2 },
  { 0, 0, 3, -14, 114, 32, -10, 3 },
  { 0, 0, 3, -15, 113, 35, -10, 2 },
  { 0, 0, 3, -15, 111, 37, -11, 3 },
  { 0, 0, 3, -16, 109, 40, -11, 3 },
  { 0, 0, 3, -16, 108, 42, -12, 3 },
  { 0, 0, 4, -17, 106, 45, -13, 3 },
  { 0, 0, 4, -17, 104, 47, -13, 3 },
  { 0, 0, 4, -17, 102, 50, -14, 3 },
  { 0, 0, 4, -17, 100, 52, -14, 3 },
  { 0, 0, 4, -18, 98, 55, -15, 4 },
  { 0, 0, 4, -18, 96, 58, -15, 3 },
  { 0, 0, 4, -18, 94, 60, -16, 4 },
  { 0, 0, 4, -18, 91, 63, -16, 4 },
  { 0, 0, 4, -18, 89, 65, -16, 4 },
  { 0, 0, 4, -18, 87, 68, -17, 4 },
  { 0, 0, 4, -18, 85, 70, -17, 4 },
  { 0, 0, 4, -18, 82, 73, -17, 4 },
  { 0, 0, 4, -18, 80, 75, -17, 4 },
  { 0, 0, 4, -18, 78, 78, -18, 4 },
  { 0, 0, 4, -17, 75, 80, -18, 4 },
  { 0, 0, 4, -17, 73, 82, -18, 4 },
  { 0, 0, 4, -17, 70, 85, -18, 4 },
  { 0, 0, 4, -17, 68, 87, -18, 4 },
  { 0, 0, 4, -16, 65, 89, -18, 4 },
  { 0, 0, 4, -16, 63, 91, -18, 4 },
  { 0, 0, 4, -16, 60, 94, -18, 4 },
  { 0, 0, 3, -15, 58, 96, -18, 4 },
  { 0, 0, 4, -15, 55, 98, -18, 4 },
  { 0, 0, 3, -14, 52, 100, -17, 4 },
  { 0, 0, 3, -14, 50, 102, -17, 4 },
  { 0, 0, 3, -13, 47, 104, -17, 4 },
  { 0, 0, 3, -13, 45, 106, -17, 4 },
  { 0, 0, 3, -12, 42, 108, -16, 3 },
  { 0, 0, 3, -11, 40, 109, -16, 3 },
  { 0, 0, 3, -11, 37, 111, -15, 3 },
  { 0, 0, 2, -10, 35, 113, -15, 3 },
  { 0, 0, 3, -10, 32, 114, -14, 3 },
  { 0, 0, 2, -9, 29, 116, -13, 3 },
  { 0, 0, 2, -8, 27, 117, -13, 3 },
  { 0, 0, 2, -8, 25, 119, -12, 2 },
  { 0, 0, 2, -7, 22, 120, -11, 2 },
  { 0, 0, 1, -6, 20, 121, -10, 2 },
  { 0, 0, 1, -6, 18, 122, -9, 2 },
  { 0, 0, 1, -5, 15, 123, -8, 2 },
  { 0, 0, 1, -4, 13, 124, -7, 1 },
  { 0, 0, 1, -4, 11, 125, -6, 1 },
  { 0, 0, 1, -3, 8, 126, -5, 1 },
  { 0, 0, 1, -2, 6, 126, -4, 1 },
  { 0, 0, 0, -1, 4, 127, -3, 1 },
  { 0, 0, 0, 0, 2, 127, -1, 0 },
  { 0, 0, 0, 0, 2, 127, -1, 0 }
}
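The first part of the block warp process, projecting the centre of an 8x8 section through the affine model and splitting the result into integer and fractional parts, can be sketched as below. This is an illustration only: it assumes WARPEDMODEL_PREC_BITS equals 16 (the constant is defined elsewhere in the specification), passes the subsampling as plain arguments, and uses the hypothetical name project_block_centre. The filtering itself is not shown.

```python
WARPEDMODEL_PREC_BITS = 16  # assumed value of the specification constant


def project_block_centre(x, y, i8, j8, warp_params, sub_x=0, sub_y=0):
    """Return (ix4, sx4, iy4, sy4) for the 8x8 section at offset (i8, j8)."""
    # Centre of the section, expressed in luma sample units.
    src_x = (x + j8 * 8 + 4) << sub_x
    src_y = (y + i8 * 8 + 4) << sub_y
    # Affine projection with WARPEDMODEL_PREC_BITS fractional bits.
    dst_x = warp_params[2] * src_x + warp_params[3] * src_y + warp_params[0]
    dst_y = warp_params[4] * src_x + warp_params[5] * src_y + warp_params[1]
    # Back to the coordinates of the current plane.
    x4 = dst_x >> sub_x
    y4 = dst_y >> sub_y
    frac_mask = (1 << WARPEDMODEL_PREC_BITS) - 1
    return (x4 >> WARPEDMODEL_PREC_BITS, x4 & frac_mask,
            y4 >> WARPEDMODEL_PREC_BITS, y4 & frac_mask)
```

With identity parameters (a value of 1 << 16 in entries 2 and 5 and zero elsewhere) the projected centre is the source centre with a zero fractional part. A translation of half a sample in entry 0 appears only in the fractional part sx4.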
Setup shear process
The input for this process is an array warpParams representing an affine transformation.
The outputs from this process are the variable warpValid and variables alpha, beta, gamma, delta representing two shearing operations that combine to make the full affine transformation.
The variable alpha0 is set equal to Clip3( -32768, 32767, warpParams[ 2 ] - (1 << WARPEDMODEL_PREC_BITS) ).
The variable beta0 is set equal to Clip3( -32768, 32767, warpParams[ 3 ] ).
The resolve divisor process specified in Resolve divisor process is invoked with warpParams[ 2 ] as input, and the outputs assigned to divShift and divFactor.
The variable v is set equal to ( warpParams[ 4 ] << WARPEDMODEL_PREC_BITS ).
The variable gamma0 is set equal to Clip3( -32768, 32767, Round2Signed( v * divFactor, divShift ) ).
The variable w is set equal to ( warpParams[ 3 ] * warpParams[ 4 ] ).
The variable delta0 is set equal to Clip3( -32768, 32767, warpParams[ 5 ] - Round2Signed( w * divFactor, divShift ) - (1 << WARPEDMODEL_PREC_BITS) ).
The output variables alpha, beta, gamma, delta are set as follows:
alpha = Round2Signed( alpha0, WARP_PARAM_REDUCE_BITS ) << WARP_PARAM_REDUCE_BITS
beta = Round2Signed( beta0, WARP_PARAM_REDUCE_BITS ) << WARP_PARAM_REDUCE_BITS
gamma = Round2Signed( gamma0, WARP_PARAM_REDUCE_BITS ) << WARP_PARAM_REDUCE_BITS
delta = Round2Signed( delta0, WARP_PARAM_REDUCE_BITS ) << WARP_PARAM_REDUCE_BITS
The output warpValid is set as follows:

If 4 * Abs( alpha ) + 7 * Abs( beta ) is greater than or equal to (1 << WARPEDMODEL_PREC_BITS), warpValid is set equal to 0.

Otherwise, if 4 * Abs( gamma ) + 4 * Abs( delta ) is greater than or equal to (1 << WARPEDMODEL_PREC_BITS), warpValid is set equal to 0.

Otherwise, warpValid is set equal to 1.
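The shear derivation can be sketched in Python as follows. To keep the example short, divShift and divFactor are passed in as arguments rather than computed; in the process above they come from invoking the resolve divisor process on warpParams[ 2 ]. The constant values (WARPEDMODEL_PREC_BITS of 16, WARP_PARAM_REDUCE_BITS of 6) are assumptions of this sketch, and the function names are hypothetical.

```python
WARPEDMODEL_PREC_BITS = 16   # assumed value of the specification constant
WARP_PARAM_REDUCE_BITS = 6   # assumed value of the specification constant


def round2signed(x, n):
    half = 1 << (n - 1)
    return (x + half) >> n if x >= 0 else -((-x + half) >> n)


def clip3(lo, hi, x):
    return max(lo, min(hi, x))


def setup_shear(wp, div_shift, div_factor):
    """Return (warpValid, alpha, beta, gamma, delta) for warp parameters wp."""
    one = 1 << WARPEDMODEL_PREC_BITS
    alpha0 = clip3(-32768, 32767, wp[2] - one)
    beta0 = clip3(-32768, 32767, wp[3])
    v = wp[4] << WARPEDMODEL_PREC_BITS
    gamma0 = clip3(-32768, 32767, round2signed(v * div_factor, div_shift))
    w = wp[3] * wp[4]
    delta0 = clip3(-32768, 32767,
                   wp[5] - round2signed(w * div_factor, div_shift) - one)

    def reduce_precision(t):
        # Drop the low WARP_PARAM_REDUCE_BITS bits with rounding.
        return round2signed(t, WARP_PARAM_REDUCE_BITS) << WARP_PARAM_REDUCE_BITS

    alpha, beta, gamma, delta = (reduce_precision(t)
                                 for t in (alpha0, beta0, gamma0, delta0))
    if 4 * abs(alpha) + 7 * abs(beta) >= one:
        valid = 0
    elif 4 * abs(gamma) + 4 * abs(delta) >= one:
        valid = 0
    else:
        valid = 1
    return valid, alpha, beta, gamma, delta
```

For the identity transform, where warpParams[ 2 ] equals 1 << 16, the divisor outputs are a shift of 30 and a factor of 16384, all four shear parameters are 0, and the model is valid. A large horizontal scale change makes alpha exceed the limit and the model is rejected.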
Resolve divisor process
The input for this process is a variable d.
The outputs for this process are variables divShift and divFactor that can be used to perform an approximate division by d via multiplying by divFactor and shifting right by divShift.
The variable n (representing the location of the most significant bit in Abs(d) ) is set equal to FloorLog2( Abs(d) ).
The variable e is set equal to Abs( d ) - ( 1 << n ).
The variable f is set as follows:

If n > DIV_LUT_BITS, f is set equal to Round2( e, n - DIV_LUT_BITS ).

Otherwise, f is set equal to e << ( DIV_LUT_BITS - n ).
The output variable divShift is set equal to ( n + DIV_LUT_PREC_BITS ).
The output variable divFactor is set as follows:

If d is less than 0, divFactor is set equal to -Div_Lut[ f ].

Otherwise, divFactor is set equal to Div_Lut[ f ].
The lookup table Div_Lut is specified as:
Div_Lut[DIV_LUT_NUM] = {
16384, 16320, 16257, 16194, 16132, 16070, 16009, 15948, 15888, 15828, 15768,
15709, 15650, 15592, 15534, 15477, 15420, 15364, 15308, 15252, 15197, 15142,
15087, 15033, 14980, 14926, 14873, 14821, 14769, 14717, 14665, 14614, 14564,
14513, 14463, 14413, 14364, 14315, 14266, 14218, 14170, 14122, 14075, 14028,
13981, 13935, 13888, 13843, 13797, 13752, 13707, 13662, 13618, 13574, 13530,
13487, 13443, 13400, 13358, 13315, 13273, 13231, 13190, 13148, 13107, 13066,
13026, 12985, 12945, 12906, 12866, 12827, 12788, 12749, 12710, 12672, 12633,
12596, 12558, 12520, 12483, 12446, 12409, 12373, 12336, 12300, 12264, 12228,
12193, 12157, 12122, 12087, 12053, 12018, 11984, 11950, 11916, 11882, 11848,
11815, 11782, 11749, 11716, 11683, 11651, 11619, 11586, 11555, 11523, 11491,
11460, 11429, 11398, 11367, 11336, 11305, 11275, 11245, 11215, 11185, 11155,
11125, 11096, 11067, 11038, 11009, 10980, 10951, 10923, 10894, 10866, 10838,
10810, 10782, 10755, 10727, 10700, 10673, 10645, 10618, 10592, 10565, 10538,
10512, 10486, 10460, 10434, 10408, 10382, 10356, 10331, 10305, 10280, 10255,
10230, 10205, 10180, 10156, 10131, 10107, 10082, 10058, 10034, 10010, 9986,
9963, 9939, 9916, 9892, 9869, 9846, 9823, 9800, 9777, 9754, 9732,
9709, 9687, 9664, 9642, 9620, 9598, 9576, 9554, 9533, 9511, 9489,
9468, 9447, 9425, 9404, 9383, 9362, 9341, 9321, 9300, 9279, 9259,
9239, 9218, 9198, 9178, 9158, 9138, 9118, 9098, 9079, 9059, 9039,
9020, 9001, 8981, 8962, 8943, 8924, 8905, 8886, 8867, 8849, 8830,
8812, 8793, 8775, 8756, 8738, 8720, 8702, 8684, 8666, 8648, 8630,
8613, 8595, 8577, 8560, 8542, 8525, 8508, 8490, 8473, 8456, 8439,
8422, 8405, 8389, 8372, 8355, 8339, 8322, 8306, 8289, 8273, 8257,
8240, 8224, 8208, 8192
}
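Note: The following Python sketch is informative only. It shows how divShift and divFactor can stand in for a division by d. It assumes DIV_LUT_BITS = 8 and DIV_LUT_PREC_BITS = 14 (values not stated in this section). To keep the example short, the table is rebuilt from the rounded quotient 2^22 / (256 + i) rather than copied; that closed form is an assumption that agrees with the entries spot-checked here, and the table as listed above remains the authority. The function names are hypothetical, and d is assumed to be non-zero.

```python
DIV_LUT_BITS = 8         # assumed value
DIV_LUT_PREC_BITS = 14   # assumed value
DIV_LUT_NUM = (1 << DIV_LUT_BITS) + 1

# Assumption: each entry is 2**22 / (256 + i), rounded to nearest.
_SCALE = 1 << (DIV_LUT_BITS + DIV_LUT_PREC_BITS)
DIV_LUT = [(_SCALE + (256 + i) // 2) // (256 + i) for i in range(DIV_LUT_NUM)]


def round2(x, n):
    """Round x / 2**n to nearest for non-negative x."""
    return x if n == 0 else (x + (1 << (n - 1))) >> n


def round2signed(x, n):
    """Apply round2 to the magnitude of x, then restore the sign."""
    return round2(x, n) if x >= 0 else -round2(-x, n)


def resolve_divisor(d):
    """Return (divShift, divFactor) for a non-zero integer d."""
    n = abs(d).bit_length() - 1          # FloorLog2( Abs( d ) )
    e = abs(d) - (1 << n)
    if n > DIV_LUT_BITS:
        f = round2(e, n - DIV_LUT_BITS)
    else:
        f = e << (DIV_LUT_BITS - n)
    div_shift = n + DIV_LUT_PREC_BITS
    div_factor = -DIV_LUT[f] if d < 0 else DIV_LUT[f]
    return div_shift, div_factor


def approx_divide(x, d):
    """Approximate x / d by a multiply and a rounding right shift."""
    div_shift, div_factor = resolve_divisor(d)
    return round2signed(x * div_factor, div_shift)
```

For example, with d equal to 3 the sketch gives divShift = 15 and divFactor = 10923, so 300 * 10923 rounded and shifted right by 15 recovers 100.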
Warp estimation process
This process produces the array LocalWarpParams based on NumSamples candidates in CandList by performing a least squares fit.
It also produces a variable LocalValid indicating whether the process was successful.
A 2x2 matrix A, and two length 2 arrays Bx and By are constructed as follows:
for ( i = 0; i < 2; i++ ) {
for ( j = 0; j < 2; j++ ) {
A[i][j] = 0
}
Bx[i] = 0
By[i] = 0
}
w4 = Num_4x4_Blocks_Wide[MiSize]
h4 = Num_4x4_Blocks_High[MiSize]
midY = MiRow * 4 + h4 * 2 - 1
midX = MiCol * 4 + w4 * 2 - 1
suy = midY * 8
sux = midX * 8
duy = suy + Mv[0][0]
dux = sux + Mv[0][1]
for ( i = 0; i < NumSamples; i++ ) {
sy = CandList[i][0] - suy
sx = CandList[i][1] - sux
dy = CandList[i][2] - duy
dx = CandList[i][3] - dux
if ( Abs(sx - dx) < LS_MV_MAX && Abs(sy - dy) < LS_MV_MAX ) {
A[0][0] += ls_product(sx, sx) + 8
A[0][1] += ls_product(sx, sy) + 4
A[1][1] += ls_product(sy, sy) + 8
Bx[0] += ls_product(sx, dx) + 8
Bx[1] += ls_product(sy, dx) + 4
By[0] += ls_product(sx, dy) + 4
By[1] += ls_product(sy, dy) + 8
}
}
where ls_product is specified as:
ls_product(a, b) {
return ( (a * b) >> 2) + (a + b)
}
Note: The matrix A is symmetric so entry A[1][0] is omitted.
The variable det (containing the determinant of the matrix A) is set equal to A[0][0] * A[1][1] - A[0][1] * A[0][1].
The variable LocalValid is set as follows:

If det is equal to 0, LocalValid is set equal to 0.

Otherwise, LocalValid is set equal to 1.
If det is equal to 0, this process terminates at this point.
The resolve divisor process specified in Resolve divisor process is invoked with det as input, and the outputs assigned to divShift and divFactor.
The local warp parameters in LocalWarpParams are derived as follows:
divShift -= WARPEDMODEL_PREC_BITS
if ( divShift < 0 ) {
divFactor = divFactor << (-divShift)
divShift = 0
}
LocalWarpParams[2] = diag( A[1][1] * Bx[0] - A[0][1] * Bx[1])
LocalWarpParams[3] = nondiag( -A[0][1] * Bx[0] + A[0][0] * Bx[1])
LocalWarpParams[4] = nondiag( A[1][1] * By[0] - A[0][1] * By[1])
LocalWarpParams[5] = diag( -A[0][1] * By[0] + A[0][0] * By[1])
mvx = Mv[ 0 ][ 1 ]
mvy = Mv[ 0 ][ 0 ]
vx = mvx * (1 << (WARPEDMODEL_PREC_BITS - 3)) -
(midX * (LocalWarpParams[2] - (1 << WARPEDMODEL_PREC_BITS)) + midY * LocalWarpParams[3])
vy = mvy * (1 << (WARPEDMODEL_PREC_BITS - 3)) -
(midX * LocalWarpParams[4] + midY * (LocalWarpParams[5] - (1 << WARPEDMODEL_PREC_BITS)))
LocalWarpParams[0] = Clip3(-WARPEDMODEL_TRANS_CLAMP, WARPEDMODEL_TRANS_CLAMP - 1, vx)
LocalWarpParams[1] = Clip3(-WARPEDMODEL_TRANS_CLAMP, WARPEDMODEL_TRANS_CLAMP - 1, vy)
where diag and nondiag are specified to divide and clamp using divFactor and divShift as follows:
nondiag(v) {
return Clip3(-WARPEDMODEL_NONDIAGAFFINE_CLAMP + 1,
WARPEDMODEL_NONDIAGAFFINE_CLAMP - 1,
Round2Signed(v * divFactor, divShift))
}
diag(v) {
return Clip3((1 << WARPEDMODEL_PREC_BITS) - WARPEDMODEL_NONDIAGAFFINE_CLAMP + 1,
(1 << WARPEDMODEL_PREC_BITS) + WARPEDMODEL_NONDIAGAFFINE_CLAMP - 1,
Round2Signed(v * divFactor, divShift))
}
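Note: The following Python sketch is informative only. It covers the accumulation stage of the least squares fit (the matrix A, the arrays Bx and By, and the determinant) and leaves out the division and clamping stage. LS_MV_MAX = 256 is an assumed value that is not stated in this section, and accumulate_samples is a hypothetical name. The offsets suy, sux, duy and dux are passed in directly rather than derived from MiRow, MiCol and Mv.

```python
LS_MV_MAX = 256  # assumed value; not stated in this section


def ls_product(a, b):
    """Product term used by the fit: ((a * b) >> 2) + (a + b)."""
    return ((a * b) >> 2) + (a + b)


def accumulate_samples(cand_list, suy, sux, duy, dux):
    """Accumulate A, Bx, By over the candidates and return them with det.

    Each candidate is (srcY, srcX, dstY, dstX). A[1][0] is left at 0
    because A is symmetric and only A[0][1] is used.
    """
    A = [[0, 0], [0, 0]]
    Bx = [0, 0]
    By = [0, 0]
    for cand in cand_list:
        sy = cand[0] - suy
        sx = cand[1] - sux
        dy = cand[2] - duy
        dx = cand[3] - dux
        # Samples whose displacement is too large are ignored.
        if abs(sx - dx) < LS_MV_MAX and abs(sy - dy) < LS_MV_MAX:
            A[0][0] += ls_product(sx, sx) + 8
            A[0][1] += ls_product(sx, sy) + 4
            A[1][1] += ls_product(sy, sy) + 8
            Bx[0] += ls_product(sx, dx) + 8
            Bx[1] += ls_product(sy, dx) + 4
            By[0] += ls_product(sx, dy) + 4
            By[1] += ls_product(sy, dy) + 8
    det = A[0][0] * A[1][1] - A[0][1] * A[0][1]
    return A, Bx, By, det
```

A det of 0 corresponds to LocalValid equal to 0; this happens, for instance, when every candidate is rejected by the LS_MV_MAX test.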
Overlapped motion compensation process
The inputs to this process are:

a variable plane specifying which plane is being predicted,

variables w and h specifying the width and height of the region to be predicted.
The outputs of this process are modified inter predicted samples in the current frame CurrFrame.
This process blends the inter predicted samples for the current block with inter predicted samples based on motion vectors from the above and left blocks.
The maximum number of overlapped predictions is limited based on the size of the block.
For small blocks, only the left neighbor will be used to form the prediction.
The variables subX and subY describing the subsampling of the current plane are derived as follows:

If plane is equal to 0, subX and subY are set equal to 0.

Otherwise (plane is not equal to 0), subX is set equal to subsampling_x and subY is set equal to subsampling_y.
The process is specified as:
if ( AvailU ) {
if ( get_plane_residual_size( MiSize, plane ) >= BLOCK_8X8 ) {
pass = 0
w4 = Num_4x4_Blocks_Wide[ MiSize ]
x4 = MiCol
y4 = MiRow
nCount = 0
nLimit = Min(4, Mi_Width_Log2[ MiSize ])
while ( nCount < nLimit && x4 < Min( MiCols, MiCol + w4 ) ) {
candRow = MiRow - 1
candCol = x4 | 1
candSz = MiSizes[ candRow ][ candCol ]
step4 = Clip3( 2, 16, Num_4x4_Blocks_Wide[ candSz ] )
if ( RefFrames[ candRow ][ candCol ][ 0 ] > INTRA_FRAME ) {
nCount += 1
predW = Min( w, ( step4 * MI_SIZE ) >> subX )
predH = Min( h >> 1, 32 >> subY )
mask = get_obmc_mask( predH )
predict_overlap( )
}
x4 += step4
}
}
}
if ( AvailL ) {
pass = 1
h4 = Num_4x4_Blocks_High[ MiSize ]
x4 = MiCol
y4 = MiRow
nCount = 0
nLimit = Min(4, Mi_Height_Log2[ MiSize ])
while ( nCount < nLimit && y4 < Min( MiRows, MiRow + h4 ) ) {
candCol = MiCol - 1
candRow = y4 | 1
candSz = MiSizes[ candRow ][ candCol ]
step4 = Clip3( 2, 16, Num_4x4_Blocks_High[ candSz ] )
if ( RefFrames[ candRow ][ candCol ][ 0 ] > INTRA_FRAME ) {
nCount += 1
predW = Min( w >> 1, 32 >> subX )
predH = Min( h, ( step4 * MI_SIZE ) >> subY )
mask = get_obmc_mask( predW )
predict_overlap( )
}
y4 += step4
}
}
When the function predict_overlap is invoked, the following ordered steps apply to form the overlap prediction for a region of size predW by predH based on the candidate motion vector:

The motion vector mv is set equal to Mvs[ candRow ][ candCol ][ 0 ].

The variable refIdx is set equal to ref_frame_idx[ RefFrames[ candRow ][ candCol ][ 0 ] - LAST_FRAME ].

The variable predX is set equal to (x4 * 4) >> subX.

The variable predY is set equal to (y4 * 4) >> subY.

The motion vector scaling process in Motion vector scaling process is invoked with plane, refIdx, predX, predY, mv as inputs and the output being the initial location startX, startY, and the step sizes stepX, stepY.

The block inter prediction process in Block inter prediction process is invoked with plane, refIdx, startX, startY, stepX, stepY, predW, predH, candRow, candCol as inputs and the output is assigned to the 2D array obmcPred.

obmcPred[ i ][ j ] is set equal to Clip1( obmcPred[ i ][ j ] ) for i = 0..predH-1 and j = 0..predW-1.

The blending process in Overlap blending process is invoked with plane, predX, predY, predW, predH, pass, obmcPred, and mask as inputs.
The function get_obmc_mask returns a blending mask as follows:
get_obmc_mask( length ) {
if ( length == 2 ) {
return Obmc_Mask_2
} else if ( length == 4 ) {
return Obmc_Mask_4
} else if ( length == 8 ) {
return Obmc_Mask_8
} else if ( length == 16 ) {
return Obmc_Mask_16
} else {
return Obmc_Mask_32
}
}
The blending masks are defined as follows:
Obmc_Mask_2[2] = { 45, 64 }
Obmc_Mask_4[4] = { 39, 50, 59, 64 }
Obmc_Mask_8[8] = { 36, 42, 48, 53, 57, 61, 64, 64 }
Obmc_Mask_16[16] = { 34, 37, 40, 43, 46, 49, 52, 54,
56, 58, 60, 61, 64, 64, 64, 64 }
Obmc_Mask_32[32] = { 33, 35, 36, 38, 40, 41, 43, 44,
45, 47, 48, 50, 51, 52, 53, 55,
56, 57, 58, 59, 60, 60, 61, 62,
64, 64, 64, 64, 64, 64, 64, 64 }
Overlap blending process
The inputs to this process are:

a variable plane specifying which plane is being predicted,

variables predX and predY specifying the location of the top left sample in the CurrFrame[ plane ] array of the region to be predicted,

variables predW and predH specifying the width and height of the region to be predicted,

a variable pass equal to 0 if blending above samples, or equal to 1 if blending left samples,

a 2d array obmcPred containing the samples predicted from a neighboring motion vector,

an array mask containing the blending weights.
The outputs of this process are modified inter predicted samples in the current frame CurrFrame.
For i = 0..(predH - 1) and j = 0..(predW - 1), the following ordered steps apply:

The variable m specifying the blending factor is specified as follows:

If pass is equal to 0 (blend from above), m is set equal to mask[ i ].

Otherwise (pass is equal to 1 meaning blend from left), m is set equal to mask[ j ].


CurrFrame[ plane ][ predY + i ][ predX + j ] is set equal to Round2( m * CurrFrame[ plane ][ predY + i ][ predX + j ] + (64 - m) * obmcPred[ i ][ j ], 6)
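Note: The following Python sketch is informative only. It applies the blending rule above to a single plane stored as a list of rows, using the Obmc_Mask tables from the overlapped motion compensation process. The function names and the blend_pass parameter name are hypothetical (pass is a reserved word in Python).

```python
OBMC_MASKS = {
    2: [45, 64],
    4: [39, 50, 59, 64],
    8: [36, 42, 48, 53, 57, 61, 64, 64],
    16: [34, 37, 40, 43, 46, 49, 52, 54,
         56, 58, 60, 61, 64, 64, 64, 64],
    32: [33, 35, 36, 38, 40, 41, 43, 44,
         45, 47, 48, 50, 51, 52, 53, 55,
         56, 57, 58, 59, 60, 60, 61, 62,
         64, 64, 64, 64, 64, 64, 64, 64],
}


def get_obmc_mask(length):
    """Return the mask for the given length; other lengths use the 32 entry mask."""
    return OBMC_MASKS.get(length, OBMC_MASKS[32])


def round2(x, n):
    """Round x / 2**n to nearest for non-negative x."""
    return x if n == 0 else (x + (1 << (n - 1))) >> n


def overlap_blend(cur, obmc_pred, pred_x, pred_y, pred_w, pred_h, blend_pass, mask):
    """Blend obmc_pred into cur in place.

    blend_pass 0 indexes the mask by row (blend from above);
    blend_pass 1 indexes it by column (blend from left).
    """
    for i in range(pred_h):
        for j in range(pred_w):
            m = mask[i] if blend_pass == 0 else mask[j]
            y, x = pred_y + i, pred_x + j
            cur[y][x] = round2(m * cur[y][x] + (64 - m) * obmc_pred[i][j], 6)
```

A weight of 64 leaves the current prediction unchanged, so the influence of the neighboring prediction fades out with distance from the shared edge.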
Wedge mask process
The input to this process is:
 variables w and h specifying the width and height of the region to be predicted.
This process sets up a mask array for the luma samples.
The mask is specified as:
for ( i = 0; i < h; i++ ) {
for ( j = 0; j < w; j++ ) {
Mask[ i ][ j ] = WedgeMasks[ MiSize ][ wedge_sign ][ wedge_index ][ i ][ j ]
}
}
where WedgeMasks is a fixed lookup table that is generated by the following function:
initialise_wedge_mask_table( ) {
w = MASK_MASTER_SIZE
h = MASK_MASTER_SIZE
for ( j = 0; j < w; j++ ) {
shift = MASK_MASTER_SIZE / 4
for ( i = 0; i < h; i += 2 ) {
MasterMask[ WEDGE_OBLIQUE63 ][ i ][ j ] = Wedge_Master_Oblique_Even[ Clip3( 0, MASK_MASTER_SIZE - 1, j - shift ) ]
shift -= 1
MasterMask[ WEDGE_OBLIQUE63 ][ i + 1 ][ j ] = Wedge_Master_Oblique_Odd[ Clip3( 0, MASK_MASTER_SIZE - 1, j - shift ) ]
MasterMask[ WEDGE_VERTICAL ][ i ][ j ] = Wedge_Master_Vertical[ j ]
MasterMask[ WEDGE_VERTICAL ][ i + 1 ][ j ] = Wedge_Master_Vertical[ j ]
}
}
for ( i = 0; i < h; i++ ) {
for ( j = 0; j < w; j++ ) {
msk = MasterMask[ WEDGE_OBLIQUE63 ][ i ][ j ]
MasterMask[ WEDGE_OBLIQUE27 ][ j ][ i ] = msk
MasterMask[ WEDGE_OBLIQUE117 ][ i ][ w - 1 - j ] = 64 - msk
MasterMask[ WEDGE_OBLIQUE153 ][ w - 1 - j ][ i ] = 64 - msk
MasterMask[ WEDGE_HORIZONTAL ][ j ][ i ] = MasterMask[ WEDGE_VERTICAL ][ i ][ j ]
}
}
for ( bsize = BLOCK_8X8; bsize < BLOCK_SIZES; bsize++ ) {
if ( Wedge_Bits[ bsize ] > 0 ) {
w = Block_Width[ bsize ]
h = Block_Height[ bsize ]
for ( wedge = 0; wedge < WEDGE_TYPES; wedge++ ) {
dir = get_wedge_direction(bsize, wedge)
xoff = MASK_MASTER_SIZE / 2 - ((get_wedge_xoff(bsize, wedge) * w) >> 3)
yoff = MASK_MASTER_SIZE / 2 - ((get_wedge_yoff(bsize, wedge) * h) >> 3)
sum = 0
for ( i = 0; i < w; i++ )
sum += MasterMask[ dir ][ yoff ][ xoff+i ]
for ( i = 1; i < h; i++ )
sum += MasterMask[ dir ][ yoff+i ][ xoff ]
avg = (sum + (w + h - 1) / 2) / (w + h - 1)
flipSign = (avg < 32)
for ( i = 0; i < h; i++ ) {
for ( j = 0; j < w; j++ ) {
WedgeMasks[ bsize ][ flipSign ][ wedge ][ i ][ j ] = MasterMask[ dir ][ yoff+i ][ xoff+j ]
WedgeMasks[ bsize ][ !flipSign ][ wedge ][ i ][ j ] = 64 - MasterMask[ dir ][ yoff+i ][ xoff+j ]
}
}
}
}
}
}
The 1d lookup tables are defined as:
Wedge_Master_Oblique_Odd[MASK_MASTER_SIZE] = {
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 2, 6, 18,
37, 53, 60, 63, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64,
64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64
}
Wedge_Master_Oblique_Even[MASK_MASTER_SIZE] = {
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 1, 4, 11, 27,
46, 58, 62, 63, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64,
64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64
}
Wedge_Master_Vertical[MASK_MASTER_SIZE] = {
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 2, 7, 21,
43, 57, 62, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64,
64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64, 64
}
The get_wedge functions are defined as:
block_shape(bsize) {
w4 = Num_4x4_Blocks_Wide[bsize]
h4 = Num_4x4_Blocks_High[bsize]
if ( h4 > w4 )
return 0
else if ( h4 < w4 )
return 1
else
return 2
}
get_wedge_direction(bsize, index) {
return Wedge_Codebook[block_shape(bsize)][index][0]
}
get_wedge_xoff(bsize, index) {
return Wedge_Codebook[block_shape(bsize)][index][1]
}
get_wedge_yoff(bsize, index) {
return Wedge_Codebook[block_shape(bsize)][index][2]
}
Wedge_Codebook[3][16][3] = {
{
{ WEDGE_OBLIQUE27, 4, 4 }, { WEDGE_OBLIQUE63, 4, 4 },
{ WEDGE_OBLIQUE117, 4, 4 }, { WEDGE_OBLIQUE153, 4, 4 },
{ WEDGE_HORIZONTAL, 4, 2 }, { WEDGE_HORIZONTAL, 4, 4 },
{ WEDGE_HORIZONTAL, 4, 6 }, { WEDGE_VERTICAL, 4, 4 },
{ WEDGE_OBLIQUE27, 4, 2 }, { WEDGE_OBLIQUE27, 4, 6 },
{ WEDGE_OBLIQUE153, 4, 2 }, { WEDGE_OBLIQUE153, 4, 6 },
{ WEDGE_OBLIQUE63, 2, 4 }, { WEDGE_OBLIQUE63, 6, 4 },
{ WEDGE_OBLIQUE117, 2, 4 }, { WEDGE_OBLIQUE117, 6, 4 },
},
{
{ WEDGE_OBLIQUE27, 4, 4 }, { WEDGE_OBLIQUE63, 4, 4 },
{ WEDGE_OBLIQUE117, 4, 4 }, { WEDGE_OBLIQUE153, 4, 4 },
{ WEDGE_VERTICAL, 2, 4 }, { WEDGE_VERTICAL, 4, 4 },
{ WEDGE_VERTICAL, 6, 4 }, { WEDGE_HORIZONTAL, 4, 4 },
{ WEDGE_OBLIQUE27, 4, 2 }, { WEDGE_OBLIQUE27, 4, 6 },
{ WEDGE_OBLIQUE153, 4, 2 }, { WEDGE_OBLIQUE153, 4, 6 },
{ WEDGE_OBLIQUE63, 2, 4 }, { WEDGE_OBLIQUE63, 6, 4 },
{ WEDGE_OBLIQUE117, 2, 4 }, { WEDGE_OBLIQUE117, 6, 4 },
},
{
{ WEDGE_OBLIQUE27, 4, 4 }, { WEDGE_OBLIQUE63, 4, 4 },
{ WEDGE_OBLIQUE117, 4, 4 }, { WEDGE_OBLIQUE153, 4, 4 },
{ WEDGE_HORIZONTAL, 4, 2 }, { WEDGE_HORIZONTAL, 4, 6 },
{ WEDGE_VERTICAL, 2, 4 }, { WEDGE_VERTICAL, 6, 4 },
{ WEDGE_OBLIQUE27, 4, 2 }, { WEDGE_OBLIQUE27, 4, 6 },
{ WEDGE_OBLIQUE153, 4, 2 }, { WEDGE_OBLIQUE153, 4, 6 },
{ WEDGE_OBLIQUE63, 2, 4 }, { WEDGE_OBLIQUE63, 6, 4 },
{ WEDGE_OBLIQUE117, 2, 4 }, { WEDGE_OBLIQUE117, 6, 4 },
}
}
The wedge direction constants used above are defined as follows:
Constant | Value
--- | ---
WEDGE_HORIZONTAL | 0
WEDGE_VERTICAL | 1
WEDGE_OBLIQUE27 | 2
WEDGE_OBLIQUE63 | 3
WEDGE_OBLIQUE117 | 4
WEDGE_OBLIQUE153 | 5
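Note: The following Python sketch is informative only. It covers the first two stages of initialise_wedge_mask_table (building the six master masks) and omits the per block size extraction into WedgeMasks. MASK_MASTER_SIZE = 64 is inferred from the length of the 1d lookup tables, which are written here in a compressed form equal to the listed values. The function name build_master_masks is hypothetical.

```python
MASK_MASTER_SIZE = 64  # inferred from the length of the 1d tables

(WEDGE_HORIZONTAL, WEDGE_VERTICAL, WEDGE_OBLIQUE27,
 WEDGE_OBLIQUE63, WEDGE_OBLIQUE117, WEDGE_OBLIQUE153) = range(6)

# Compressed forms of the 1d lookup tables (same 64 values as listed).
WEDGE_MASTER_OBLIQUE_ODD = [0] * 28 + [1, 2, 6, 18, 37, 53, 60, 63] + [64] * 28
WEDGE_MASTER_OBLIQUE_EVEN = [0] * 28 + [1, 4, 11, 27, 46, 58, 62, 63] + [64] * 28
WEDGE_MASTER_VERTICAL = [0] * 29 + [2, 7, 21, 43, 57, 62] + [64] * 29


def clip3(lo, hi, x):
    return max(lo, min(hi, x))


def build_master_masks():
    """Return MasterMask[ dir ][ i ][ j ] for the six wedge directions."""
    n = MASK_MASTER_SIZE
    mm = [[[0] * n for _ in range(n)] for _ in range(6)]
    for j in range(n):
        shift = n // 4
        for i in range(0, n, 2):
            # The edge moves by one sample for every two rows.
            mm[WEDGE_OBLIQUE63][i][j] = \
                WEDGE_MASTER_OBLIQUE_EVEN[clip3(0, n - 1, j - shift)]
            shift -= 1
            mm[WEDGE_OBLIQUE63][i + 1][j] = \
                WEDGE_MASTER_OBLIQUE_ODD[clip3(0, n - 1, j - shift)]
            mm[WEDGE_VERTICAL][i][j] = WEDGE_MASTER_VERTICAL[j]
            mm[WEDGE_VERTICAL][i + 1][j] = WEDGE_MASTER_VERTICAL[j]
    for i in range(n):
        for j in range(n):
            msk = mm[WEDGE_OBLIQUE63][i][j]
            # The other directions are transposes, mirrors and complements.
            mm[WEDGE_OBLIQUE27][j][i] = msk
            mm[WEDGE_OBLIQUE117][i][n - 1 - j] = 64 - msk
            mm[WEDGE_OBLIQUE153][n - 1 - j][i] = 64 - msk
            mm[WEDGE_HORIZONTAL][j][i] = mm[WEDGE_VERTICAL][i][j]
    return mm
```

Only the 63 degree and vertical masks are built from the tables; the remaining four directions are derived from them by transposition, mirroring and complementing against 64.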
Difference weight mask process
The input to this process is:

an array preds containing the predicted samples,

variables w and h specifying the width and height of the region to be predicted.
This process prepares an array Mask containing the blending weights for the luma samples.
The process sets the array based on the difference between the two predictions as follows:
for ( i = 0; i < h; i++ ) {
for ( j = 0; j < w; j++ ) {
diff = Abs(preds[ 0 ][ i ][ j ] - preds[ 1 ][ i ][ j ])
diff = Round2(diff, (BitDepth - 8) + InterPostRound)
m = Clip3(0, 64, 38 + diff / 16)
if ( mask_type )
Mask[ i ][ j ] = 64 - m
else
Mask[ i ][ j ] = m
}
}
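Note: The following Python sketch is informative only. It mirrors the loop above with BitDepth, InterPostRound and mask_type passed in as parameters; the function name is hypothetical.

```python
def round2(x, n):
    """Round x / 2**n to nearest for non-negative x."""
    return x if n == 0 else (x + (1 << (n - 1))) >> n


def clip3(lo, hi, x):
    return max(lo, min(hi, x))


def difference_weight_mask(preds, w, h, bit_depth, inter_post_round, mask_type):
    """Return the h by w mask derived from the two predictions in preds."""
    mask = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            diff = abs(preds[0][i][j] - preds[1][i][j])
            # Bring the difference back to an 8-bit sample scale.
            diff = round2(diff, (bit_depth - 8) + inter_post_round)
            m = clip3(0, 64, 38 + diff // 16)
            mask[i][j] = 64 - m if mask_type else m
    return mask
```

Where the two predictions agree the weight is 38 (or 26 when mask_type is non-zero), and it moves toward 64 (or 0) as they diverge.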
Intra mode variant mask process
The input to this process is:
 variables w and h specifying the width and height of the region to be predicted.
This process prepares an array Mask containing the blending weights for the luma samples.
The process sets the array based on the mode used for intra prediction as follows:
sizeScale = MAX_SB_SIZE / Max( h, w )
for ( i = 0; i < h; i++ ) {
for ( j = 0; j < w; j++ ) {
if ( interintra_mode == II_V_PRED ) {
Mask[ i ][ j ] = Ii_Weights_1d[ i * sizeScale ]
} else if ( interintra_mode == II_H_PRED ) {
Mask[ i ][ j ] = Ii_Weights_1d[ j * sizeScale ]
} else if ( interintra_mode == II_SMOOTH_PRED ) {
Mask[ i ][ j ] = Ii_Weights_1d[ Min(i, j) * sizeScale ]
} else {
Mask[ i ][ j ] = 32
}
}
}
where the table Ii_Weights_1d is defined as:
Ii_Weights_1d[MAX_SB_SIZE] = {
60, 58, 56, 54, 52, 50, 48, 47, 45, 44, 42, 41, 39, 38, 37, 35, 34, 33, 32,
31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 22, 21, 20, 19, 19, 18, 18, 17, 16,
16, 15, 15, 14, 14, 13, 13, 12, 12, 12, 11, 11, 10, 10, 10, 9, 9, 9, 8,
8, 8, 8, 7, 7, 7, 7, 6, 6, 6, 6, 6, 5, 5, 5, 5, 5, 4, 4,
4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2,
2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1,
1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
}
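Note: The following Python sketch is informative only. It reproduces the mask derivation above with interintra_mode passed in as a parameter. The numeric values chosen for the II_* constants are assumptions (they are not stated in this section), MAX_SB_SIZE = 128 is inferred from the length of Ii_Weights_1d, and the function name is hypothetical.

```python
MAX_SB_SIZE = 128  # inferred from the length of Ii_Weights_1d

# Assumed numbering of the inter intra modes.
II_DC_PRED, II_V_PRED, II_H_PRED, II_SMOOTH_PRED = range(4)

II_WEIGHTS_1D = [
    60, 58, 56, 54, 52, 50, 48, 47, 45, 44, 42, 41, 39, 38, 37, 35, 34, 33, 32,
    31, 30, 29, 28, 27, 26, 25, 24, 23, 22, 22, 21, 20, 19, 19, 18, 18, 17, 16,
    16, 15, 15, 14, 14, 13, 13, 12, 12, 12, 11, 11, 10, 10, 10, 9, 9, 9, 8,
    8, 8, 8, 7, 7, 7, 7, 6, 6, 6, 6, 6, 5, 5, 5, 5, 5, 4, 4,
    4, 4, 4, 4, 4, 4, 3, 3, 3, 3, 3, 3, 3, 3, 3, 2, 2, 2, 2,
    2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 2, 1, 1, 1, 1, 1, 1, 1, 1,
    1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1,
]


def intra_mode_variant_mask(w, h, interintra_mode):
    """Return the h by w mask for the given inter intra mode."""
    # Smaller blocks step through the 128 entry table faster.
    size_scale = MAX_SB_SIZE // max(h, w)
    mask = [[0] * w for _ in range(h)]
    for i in range(h):
        for j in range(w):
            if interintra_mode == II_V_PRED:
                mask[i][j] = II_WEIGHTS_1D[i * size_scale]
            elif interintra_mode == II_H_PRED:
                mask[i][j] = II_WEIGHTS_1D[j * size_scale]
            elif interintra_mode == II_SMOOTH_PRED:
                mask[i][j] = II_WEIGHTS_1D[min(i, j) * size_scale]
            else:
                mask[i][j] = 32
    return mask
```

For an 8x8 block sizeScale is 16, so the vertical mode uses table entries 0, 16, 32, and so on for successive rows, giving the intra prediction the most weight next to the edge it was predicted from.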
Mask blend process
The inputs to this process are

an array preds containing the predicted samples,

a variable plane specifying which plane is being predicted,

variables dstX and dstY specifying the location of the top left sample in the CurrFrame[ plane ] array of the region to be predicted,

variables w and h specifying the width and height of the region to be predicted.
The process combines two predictions according to the mask. It makes use of an array Mask containing the blending weights to apply (the weights are defined for the current plane samples if compound_type is equal to COMPOUND_INTRA, or the luma plane otherwise).
The variables subX and subY describing the subsampling of the current plane are derived as follows:

If plane is equal to 0, subX and subY are set equal to 0.

Otherwise (plane is not equal to 0), subX is set equal to subsampling_x and subY is set equal to subsampling_y.
The process is specified as follows:
for ( y = 0; y < h; y++ ) {
for ( x = 0; x < w; x++ ) {
if ( (!subX && !subY) ||
(interintra && !wedge_interintra) ) {
m = Mask[ y ][ x ]
} else if ( subX && !subY ) {
m = Round2( Mask[ y ][ 2*x ] + Mask[ y ][ 2*x+1 ], 1 )
} else if ( !subX && subY ) {
m = Round2( Mask[ 2*y ][ x ] + Mask[ 2*y+1 ][ x ], 1 )
} else {
m = Round2( Mask[ 2*y ][ 2*x ] + Mask[ 2*y ][ 2*x+1 ] +
Mask[ 2*y+1 ][ 2*x ] + Mask[ 2*y+1 ][ 2*x+1 ], 2 )
}
if ( interintra ) {
pred0 = Clip1( Round2( preds[ 0 ][ y ][ x ], InterPostRound ) )
pred1 = CurrFrame[plane][y+dstY][x+dstX]
CurrFrame[plane][y+dstY][x+dstX] = Round2( m * pred1 + (64 - m) * pred0, 6 )
} else {
pred0 = preds[ 0 ][ y ][ x ]
pred1 = preds[ 1 ][ y ][ x ]
CurrFrame[plane][y+dstY][x+dstX] = Clip1( Round2( m * pred0 + (64 - m) * pred1, 6 + InterPostRound ) )
}
}
}
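Note: The following Python sketch is informative only. It shows how the weight m is derived from a luma resolution mask for a subsampled plane, and how two inter predictions are combined in the non-interintra branch. The interintra branch is omitted. The function names and the full_res flag (standing in for interintra && !wedge_interintra) are hypothetical.

```python
def round2(x, n):
    """Round x / 2**n to nearest for non-negative x."""
    return x if n == 0 else (x + (1 << (n - 1))) >> n


def clip1(x, bit_depth):
    return max(0, min((1 << bit_depth) - 1, x))


def blend_weight(mask, x, y, sub_x, sub_y, full_res=False):
    """Return the weight for sample (x, y) of the current plane."""
    if (not sub_x and not sub_y) or full_res:
        return mask[y][x]
    if sub_x and not sub_y:
        return round2(mask[y][2 * x] + mask[y][2 * x + 1], 1)
    if not sub_x and sub_y:
        return round2(mask[2 * y][x] + mask[2 * y + 1][x], 1)
    # Both directions subsampled: average the 2x2 group of luma weights.
    return round2(mask[2 * y][2 * x] + mask[2 * y][2 * x + 1] +
                  mask[2 * y + 1][2 * x] + mask[2 * y + 1][2 * x + 1], 2)


def mask_blend_compound(preds, mask, w, h, sub_x, sub_y,
                        bit_depth, inter_post_round):
    """Blend preds[0] and preds[1] and return the h by w result."""
    out = [[0] * w for _ in range(h)]
    for y in range(h):
        for x in range(w):
            m = blend_weight(mask, x, y, sub_x, sub_y)
            p0 = preds[0][y][x]
            p1 = preds[1][y][x]
            out[y][x] = clip1(
                round2(m * p0 + (64 - m) * p1, 6 + inter_post_round),
                bit_depth)
    return out
```

The final shift of 6 + InterPostRound removes both the 6-bit weight scale and the extra precision carried by the intermediate inter predictions in one rounding step.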
Distance weights process
The inputs to this process are variables candRow and candCol specifying the location (in units of 4x4 blocks) of the motion vector information to be used.
This process computes weights to be used for blending predictions together based on the expected output times of the reference frames.
The weights are computed as follows:
for ( refList = 0; refList < 2; refList++ ) {
h = OrderHints[ RefFrames[ candRow ][ candCol ][ refList ] ]
dist[ refList ] = Clip3( 0, MAX_FRAME_DISTANCE, Abs( get_relative_dist( h, OrderHint ) ) )
}
d0 = dist[ 1 ]
d1 = dist[ 0 ]
order = d0 <= d1
if ( d0 == 0 || d1 == 0 ) {
FwdWeight = Quant_Dist_Lookup[ 3 ][ order ]
BckWeight = Quant_Dist_Lookup[ 3 ][ 1 - order ]
} else {
for ( i = 0; i < 3; i++ ) {
c0 = Quant_Dist_Weight[ i ][ order ]
c1 = Quant_Dist_Weight[ i ][ 1 - order ]
if ( order ) {
if ( d0 * c0 > d1 * c1 )
break
} else {
if ( d0 * c0 < d1 * c1 )
break
}
}
FwdWeight = Quant_Dist_Lookup[ i ][ order ]
BckWeight = Quant_Dist_Lookup[ i ][ 1 - order ]
}
where the tables Quant_Dist_Lookup and Quant_Dist_Weight are specified as:
Quant_Dist_Weight[ 4 ][ 2 ] = {
{ 2, 3 }, { 2, 5 }, { 2, 7 }, { 1, MAX_FRAME_DISTANCE }
}
Quant_Dist_Lookup[ 4 ][ 2 ] = {
{ 9, 7 }, { 11, 5 }, { 12, 4 }, { 13, 3 }
}
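Note: The following Python sketch is informative only. It takes the two signed relative distances (the results of get_relative_dist for the two reference lists) as inputs instead of reading OrderHints, and returns the pair (FwdWeight, BckWeight). MAX_FRAME_DISTANCE = 31 is an assumed value that is not stated in this section, and the function name is hypothetical.

```python
MAX_FRAME_DISTANCE = 31  # assumed value; not stated in this section

QUANT_DIST_WEIGHT = [[2, 3], [2, 5], [2, 7], [1, MAX_FRAME_DISTANCE]]
QUANT_DIST_LOOKUP = [[9, 7], [11, 5], [12, 4], [13, 3]]


def clip3(lo, hi, x):
    return max(lo, min(hi, x))


def distance_weights(rel_dist0, rel_dist1):
    """Return (FwdWeight, BckWeight) for the two reference distances."""
    dist = [clip3(0, MAX_FRAME_DISTANCE, abs(rel_dist0)),
            clip3(0, MAX_FRAME_DISTANCE, abs(rel_dist1))]
    d0 = dist[1]
    d1 = dist[0]
    order = 1 if d0 <= d1 else 0
    if d0 == 0 or d1 == 0:
        i = 3
    else:
        # A while loop keeps i equal to 3 when no row matches, as in the
        # C-style loop of the specification.
        i = 0
        while i < 3:
            c0 = QUANT_DIST_WEIGHT[i][order]
            c1 = QUANT_DIST_WEIGHT[i][1 - order]
            if order:
                if d0 * c0 > d1 * c1:
                    break
            else:
                if d0 * c0 < d1 * c1:
                    break
            i += 1
    return QUANT_DIST_LOOKUP[i][order], QUANT_DIST_LOOKUP[i][1 - order]
```

Every row of Quant_Dist_Lookup sums to 16, so the two returned weights always sum to 16.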
Palette prediction process
The palette prediction process is invoked for palette coded intra blocks to predict a part of the block using the limited palette.
The inputs to this process are:
 a variable plane specifying which plane is being predicted,
 variables startX and startY specifying the location of the top left sample in the CurrFrame[ plane ] array of the current transform block,
 variables x and y specifying the location in 4x4 units relative to the top left sample of the current transform block,
 a variable txSz, specifying the size of the current transform block.
The outputs of this process are palette predicted samples in the current frame CurrFrame.
The variable w specifying the width of the transform block is set equal to Tx_Width[ txSz ].
The variable h specifying the height of the transform block is set equal to Tx_Height[ txSz ].
The variable palette is specified as follows:
 If plane is 0, palette is set to palette_colors_y.
 Otherwise, if plane is 1, palette is set to palette_colors_u.
 Otherwise (plane is 2), palette is set to palette_colors_v.
The variable map is specified as follows:
 If plane is 0, map is set to ColorMapY.
 Otherwise (plane is not 0), map is set to ColorMapUV.
The current frame is updated as follows:
 CurrFrame[ plane ][ startY + i ][ startX + j ] is set equal to palette[ map[ y * 4 + i ][ x * 4 + j ] ] for i = 0..h-1 and j = 0..w-1.
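Note: The following Python sketch is informative only. It performs the palette lookup described above for one plane stored as a list of rows. The palette and color map are passed in directly rather than selected by plane, and the function name is hypothetical.

```python
def palette_predict(cur_plane, palette, color_map, start_x, start_y, x, y, w, h):
    """Fill a w by h transform block of cur_plane from the color map.

    x and y locate the block in units of 4x4 within the color map;
    start_x and start_y locate it in samples within cur_plane.
    """
    for i in range(h):
        for j in range(w):
            cur_plane[start_y + i][start_x + j] = \
                palette[color_map[y * 4 + i][x * 4 + j]]
```

For example, with a two entry palette and a transform block at x = 1, the block reads color map columns 4 to 7 and writes the corresponding palette entries into the frame.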
Predict chroma from luma process
The chroma from luma process uses reconstructed luma samples to form a prediction for the chroma samples. The high frequencies are taken from the reconstructed luma samples and combined with DC predicted chroma samples.
The inputs to this process are:

a variable plane (greater than zero) specifying which plane is being predicted,

variables startX and startY specifying the location of the top left sample in the CurrFrame[ plane ] array of the current transform block,

a variable txSz, specifying the size of the current transform block.
The outputs of this process are modified chroma predicted samples in the current frame CurrFrame.
The variable w specifying the width of the transform block is set equal to Tx_Width[ txSz ].
The variable h specifying the height of the transform block is set equal to Tx_Height[ txSz ].
The variable subX is set equal to subsampling_x.
The variable subY is set equal to subsampling_y.
The variable alpha depends on the plane as follows:

If plane is equal to 1, alpha is set equal to CflAlphaU.

Otherwise (plane is equal to 2), alpha is set equal to CflAlphaV.
An array L (containing subsampled reconstructed luma samples with 3 fractional bits of precision) and a variable lumaAvg (representing the average reconstructed luma intensity with 3 fractional bits of precision) are specified as:
lumaAvg = 0
for ( i = 0; i < h; i++ ) {
lumaY = (startY + i) << subY
lumaY = Min( lumaY, MaxLumaH - (1 << subY) )
for ( j = 0; j < w; j++ ) {
lumaX = (startX + j) << subX
lumaX = Min( lumaX, MaxLumaW - (1 << subX) )
t = 0
for ( dy = 0; dy <= subY; dy += 1 )
for ( dx = 0; dx <= subX; dx += 1 )
t += CurrFrame[ 0 ][ lumaY + dy ][ lumaX + dx ]
v = t << ( 3 - subX - subY )
L[ i ][ j ] = v
lumaAvg += v
}
}
lumaAvg = Round2( lumaAvg, Tx_Width_Log2[ txSz ] + Tx_Height_Log2[ txSz ] )
The predicted chroma samples are specified as:
for ( i = 0; i < h; i++ ) {
for ( j = 0; j < w; j++ ) {
dc = CurrFrame[ plane ][ startY + i ][ startX + j ]
scaledLuma = Round2Signed( alpha * ( L[ i ][ j ] - lumaAvg ), 6 )
CurrFrame[ plane ][ startY + i ][ startX + j ] = Clip1( dc + scaledLuma )
}
}
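Note: The following Python sketch is informative only. It applies the two loops above to a luma plane and a chroma plane stored as lists of rows, with the chroma plane initially holding the DC prediction. MaxLumaW, MaxLumaH, alpha and BitDepth are passed in as parameters, the transform block dimensions are assumed to be powers of two, and the function name is hypothetical.

```python
def round2(x, n):
    """Round x / 2**n to nearest for non-negative x."""
    return x if n == 0 else (x + (1 << (n - 1))) >> n


def round2signed(x, n):
    """Apply round2 to the magnitude of x, then restore the sign."""
    return round2(x, n) if x >= 0 else -round2(-x, n)


def clip1(x, bit_depth):
    return max(0, min((1 << bit_depth) - 1, x))


def predict_chroma_from_luma(luma, chroma, start_x, start_y, w, h,
                             sub_x, sub_y, alpha,
                             max_luma_w, max_luma_h, bit_depth):
    """Add the scaled luma AC contribution to the DC prediction in chroma."""
    L = [[0] * w for _ in range(h)]
    luma_avg = 0
    for i in range(h):
        luma_y = min((start_y + i) << sub_y, max_luma_h - (1 << sub_y))
        for j in range(w):
            luma_x = min((start_x + j) << sub_x, max_luma_w - (1 << sub_x))
            t = 0
            for dy in range(sub_y + 1):
                for dx in range(sub_x + 1):
                    t += luma[luma_y + dy][luma_x + dx]
            # Scale so that every format carries 3 fractional bits.
            v = t << (3 - sub_x - sub_y)
            L[i][j] = v
            luma_avg += v
    # Log2 of the block dimensions (w and h are powers of two).
    luma_avg = round2(luma_avg, (w.bit_length() - 1) + (h.bit_length() - 1))
    for i in range(h):
        for j in range(w):
            dc = chroma[start_y + i][start_x + j]
            scaled = round2signed(alpha * (L[i][j] - luma_avg), 6)
            chroma[start_y + i][start_x + j] = clip1(dc + scaled, bit_depth)
```

When the luma block is flat, every L[ i ][ j ] equals lumaAvg and the chroma samples keep their DC predicted values regardless of alpha.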
Reconstruction and dequantization
General
This section details the process of reconstructing a block of coefficients using dequantization and inverse transforms.
Dequantization functions
This section defines the functions get_dc_quant and get_ac_quant that are needed by the dequantization process.
The quantization parameters are derived from lookup tables.
The function dc_q( b ) is specified as Dc_Qlookup[ (BitDepth-8) >> 1 ][ Clip3( 0, 255, b ) ] where Dc_Qlookup is defined as follows:
Dc_Qlookup[ 3 ][ 256 ] = {
{
4, 8, 8, 9, 10, 11, 12, 12, 13, 14, 15, 16,
17, 18, 19, 19, 20, 21, 22, 23, 24, 25, 26, 26,
27, 28, 29, 30, 31, 32, 32, 33, 34, 35, 36, 37,
38, 38, 39, 40, 41, 42, 43, 43, 44, 45, 46, 47,
48, 48, 49, 50, 51, 52, 53, 53, 54, 55, 56, 57,
57, 58, 59, 60, 61, 62, 62, 63, 64, 65, 66, 66,
67, 68, 69, 70, 70, 71, 72, 73, 74, 74, 75, 76,
77, 78, 78, 79, 80, 81, 81, 82, 83, 84, 85, 85,
87, 88, 90, 92, 93, 95, 96, 98, 99, 101, 102, 104,
105, 107, 108, 110, 111, 113, 114, 116, 117, 118, 120, 121,
123, 125, 127, 129, 131, 134, 136, 138, 140, 142, 144, 146,
148, 150, 152, 154, 156, 158, 161, 164, 166, 169, 172, 174,
177, 180, 182, 185, 187, 190, 192, 195, 199, 202, 205, 208,
211, 214, 217, 220, 223, 226, 230, 233, 237, 240, 243, 247,
250, 253, 257, 261, 265, 269, 272, 276, 280, 284, 288, 292,
296, 300, 304, 309, 313, 317, 322, 326, 330, 335, 340, 344,
349, 354, 359, 364, 369, 374, 379, 384, 389, 395, 400, 406,
411, 417, 423, 429, 435, 441, 447, 454, 461, 467, 475, 482,
489, 497, 505, 513, 522, 530, 539, 549, 559, 569, 579, 590,
602, 614, 626, 640, 654, 668, 684, 700, 717, 736, 755, 775,
796, 819, 843, 869, 896, 925, 955, 988, 1022, 1058, 1098, 1139,
1184, 1232, 1282, 1336
},
{
4, 9, 10, 13, 15, 17, 20, 22, 25, 28, 31, 34,
37, 40, 43, 47, 50, 53, 57, 60, 64, 68, 71, 75,
78, 82, 86, 90, 93, 97, 101, 105, 109, 113, 116, 120,
124, 128, 132, 136, 140, 143, 147, 151, 155, 159, 163, 166,
170, 174, 178, 182, 185, 189, 193, 197, 200, 204, 208, 212,
215, 219, 223, 226, 230, 233, 237, 241, 244, 248, 251, 255,
259, 262, 266, 269, 273, 276, 280, 283, 287, 290, 293, 297,
300, 304, 307, 310, 314, 317, 321, 324, 327, 331, 334, 337,
343, 350, 356, 362, 369, 375, 381, 387, 394, 400, 406, 412,
418, 424, 430, 436, 442, 448, 454, 460, 466, 472, 478, 484,
490, 499, 507, 516, 525, 533, 542, 550, 559, 567, 576, 584,
592, 601, 609, 617, 625, 634, 644, 655, 666, 676, 687, 698,
708, 718, 729, 739, 749, 759, 770, 782, 795, 807, 819, 831,
844, 856, 868, 880, 891, 906, 920, 933, 947, 961, 975, 988,
1001, 1015, 1030, 1045, 1061, 1076, 1090, 1105, 1120, 1137, 1153, 1170,
1186, 1202, 1218, 1236, 1253, 1271, 1288, 1306, 1323, 1342, 1361, 1379,
1398, 1416, 1436, 1456, 1476, 1496, 1516, 1537, 1559, 1580, 1601, 1624,
1647, 1670, 1692, 1717, 1741, 1766, 1791, 1817, 1844, 1871, 1900, 1929,
1958, 1990, 2021, 2054, 2088, 2123, 2159, 2197, 2236, 2276, 2319, 2363,
2410, 2458, 2508, 2561, 2616, 2675, 2737, 2802, 2871, 2944, 3020, 3102,
3188, 3280, 3375, 3478, 3586, 3702, 3823, 3953, 4089, 4236, 4394, 4559,
4737, 4929, 5130, 5347
},
{
4, 12, 18, 25, 33, 41, 50, 60,
70, 80, 91, 103, 115, 127, 140, 153,
166, 180, 194, 208, 222, 237, 251, 266,
281, 296, 312, 327, 343, 358, 374, 390,
405, 421, 437, 453, 469, 484, 500, 516,
532, 548, 564, 580, 596, 611, 627, 643,
659, 674, 690, 706, 721, 737, 752, 768,
783, 798, 814, 829, 844, 859, 874, 889,
904, 919, 934, 949, 964, 978, 993, 1008,
1022, 1037, 1051, 1065, 1080, 1094, 1108, 1122,
1136, 1151, 1165, 1179, 1192, 1206, 1220, 1234,
1248, 1261, 1275, 1288, 1302, 1315, 1329, 1342,
1368, 1393, 1419, 1444, 1469, 1494, 1519, 1544,
1569, 1594, 1618, 1643, 1668, 1692, 1717, 1741,
1765, 1789, 1814, 1838, 1862, 1885, 1909, 1933,
1957, 1992, 2027, 2061, 2096, 2130, 2165, 2199,
2233, 2267, 2300, 2334, 2367, 2400, 2434, 2467,
2499, 2532, 2575, 2618, 2661, 2704, 2746, 2788,
2830, 2872, 2913, 2954, 2995, 3036, 3076, 3127,
3177, 3226, 3275, 3324, 3373, 3421, 3469, 3517,
3565, 3621, 3677, 3733, 3788, 3843, 3897, 3951,
4005, 4058, 4119, 4181, 4241, 4301, 4361, 4420,
4479, 4546, 4612, 4677, 4742, 4807, 4871, 4942,
5013, 5083, 5153, 5222, 5291, 5367, 5442, 5517,
5591, 5665, 5745, 5825, 5905, 5984, 6063, 6149,
6234, 6319, 6404, 6495, 6587, 6678, 6769, 6867,
6966, 7064, 7163, 7269, 7376, 7483, 7599, 7715,
7832, 7958, 8085, 8214, 8352, 8492, 8635, 8788,
8945, 9104, 9275, 9450, 9639, 9832, 10031, 10245,
10465, 10702, 10946, 11210, 11482, 11776, 12081, 12409,
12750, 13118, 13501, 13913, 14343, 14807, 15290, 15812,
16356, 16943, 17575, 18237, 18949, 19718, 20521, 21387
}
}
The function ac_q( b ) is specified as Ac_Qlookup[ (BitDepth-8) >> 1 ][ Clip3( 0, 255, b ) ] where Ac_Qlookup is defined as follows:
Ac_Qlookup[ 3 ][ 256 ] = {
{
4, 8, 9, 10, 11, 12, 13, 14, 15, 16, 17, 18,
19, 20, 21, 22, 23, 24, 25, 26, 27, 28, 29, 30,
31, 32, 33, 34, 35, 36, 37, 38, 39, 40, 41, 42,
43, 44, 45, 46, 47, 48, 49, 50, 51, 52, 53, 54,
55, 56, 57, 58, 59, 60, 61, 62, 63, 64, 65, 66,
67, 68, 69, 70, 71, 72, 73, 74, 75, 76, 77, 78,
79, 80, 81, 82, 83, 84, 85, 86, 87, 88, 89, 90,
91, 92, 93, 94, 95, 96, 97, 98, 99, 100, 101, 102,
104, 106, 108, 110, 112, 114, 116, 118, 120, 122, 124, 126,
128, 130, 132, 134, 136, 138, 140, 142, 144, 146, 148, 150,
152, 155, 158, 161, 164, 167, 170, 173, 176, 179, 182, 185,
188, 191, 194, 197, 200, 203, 207, 211, 215, 219, 223, 227,
231, 235, 239, 243, 247, 251, 255, 260, 265, 270, 275, 280,
285, 290, 295, 300, 305, 311, 317, 323, 329, 335, 341, 347,
353, 359, 366, 373, 380, 387, 394, 401, 408, 416, 424, 432,
440, 448, 456, 465, 474, 483, 492, 501, 510, 520, 530, 540,
550, 560, 571, 582, 593, 604, 615, 627, 639, 651, 663, 676,
689, 702, 715, 729, 743, 757, 771, 786, 801, 816, 832, 848,
864, 881, 898, 915, 933, 951, 969, 988, 1007, 1026, 1046, 1066,
1087, 1108, 1129, 1151, 1173, 1196, 1219, 1243, 1267, 1292, 1317, 1343,
1369, 1396, 1423, 1451, 1479, 1508, 1537, 1567, 1597, 1628, 1660, 1692,
1725, 1759, 1793, 1828
},
{
4, 9, 11, 13, 16, 18, 21, 24, 27, 30, 33, 37,
40, 44, 48, 51, 55, 59, 63, 67, 71, 75, 79, 83,
88, 92, 96, 100, 105, 109, 114, 118, 122, 127, 131, 136,
140, 145, 149, 154, 158, 163, 168, 172, 177, 181, 186, 190,
195, 199, 204, 208, 213, 217, 222, 226, 231, 235, 240, 244,
249, 253, 258, 262, 267, 271, 275, 280, 284, 289, 293, 297,
302, 306, 311, 315, 319, 324, 328, 332, 337, 341, 345, 349,
354, 358, 362, 367, 371, 375, 379, 384, 388, 392, 396, 401,
409, 417, 425, 433, 441, 449, 458, 466, 474, 482, 490, 498,
506, 514, 523, 531, 539, 547, 555, 563, 571, 579, 588, 596,
604, 616, 628, 640, 652, 664, 676, 688, 700, 713, 725, 737,
749, 761, 773, 785, 797, 809, 825, 841, 857, 873, 889, 905,
922, 938, 954, 970, 986, 1002, 1018, 1038, 1058, 1078, 1098, 1118,
1138, 1158, 1178, 1198, 1218, 1242, 1266, 1290, 1314, 1338, 1362, 1386,
1411, 1435, 1463, 1491, 1519, 1547, 1575, 1603, 1631, 1663, 1695, 1727,
1759, 1791, 1823, 1859, 1895, 1931, 1967, 2003, 2039, 2079, 2119, 2159,
2199, 2239, 2283, 2327, 2371, 2415, 2459, 2507, 2555, 2603, 2651, 2703,
2755, 2807, 2859, 2915, 2971, 3027, 3083, 3143, 3203, 3263, 3327, 3391,
3455, 3523, 3591, 3659, 3731, 3803, 3876, 3952, 4028, 4104, 4184, 4264,
4348, 4432, 4516, 4604, 4692, 4784, 4876, 4972, 5068, 5168, 5268, 5372,
5476, 5584, 5692, 5804, 5916, 6032, 6148, 6268, 6388, 6512, 6640, 6768,
6900, 7036, 7172, 7312
},
{
4, 13, 19, 27, 35, 44, 54, 64,
75, 87, 99, 112, 126, 139, 154, 168,
183, 199, 214, 230, 247, 263, 280, 297,
314, 331, 349, 366, 384, 402, 420, 438,
456, 475, 493, 511, 530, 548, 567, 586,
604, 623, 642, 660, 679, 698, 716, 735,
753, 772, 791, 809, 828, 846, 865, 884,
902, 920, 939, 957, 976, 994, 1012, 1030,
1049, 1067, 1085, 1103, 1121, 1139, 1157, 1175,
1193, 1211, 1229, 1246, 1264, 1282, 1299, 1317,
1335, 1352, 1370, 1387, 1405, 1422, 1440, 1457,
1474, 1491, 1509, 1526, 1543, 1560, 1577, 1595,
1627, 1660, 1693, 1725, 1758, 1791, 1824, 1856,
1889, 1922, 1954, 1987, 2020, 2052, 2085, 2118,
2150, 2183, 2216, 2248, 2281, 2313, 2346, 2378,
2411, 2459, 2508, 2556, 2605, 2653, 2701, 2750,
2798, 2847, 2895, 2943, 2992, 3040, 3088, 3137,
3185, 3234, 3298, 3362, 3426, 3491, 3555, 3619,
3684, 3748, 3812, 3876, 3941, 4005, 4069, 4149,
4230, 4310, 4390, 4470, 4550, 4631, 4711, 4791,
4871, 4967, 5064, 5160, 5256, 5352, 5448, 5544,
5641, 5737, 5849, 5961, 6073, 6185, 6297, 6410,
6522, 6650, 6778, 6906, 7034, 7162, 7290, 7435,
7579, 7723, 7867, 8011, 8155, 8315, 8475, 8635,
8795, 8956, 9132, 9308, 9484, 9660, 9836, 10028,
10220, 10412, 10604, 10812, 11020, 11228, 11437, 11661,
11885, 12109, 12333, 12573, 12813, 13053, 13309, 13565,
13821, 14093, 14365, 14637, 14925, 15213, 15502, 15806,
16110, 16414, 16734, 17054, 17390, 17726, 18062, 18414,
18766, 19134, 19502, 19886, 20270, 20670, 21070, 21486,
21902, 22334, 22766, 23214, 23662, 24126, 24590, 25070,
25551, 26047, 26559, 27071, 27599, 28143, 28687, 29247
}
}
The function get_qindex( ignoreDeltaQ, segmentId ) returns the quantizer index for the current block and is specified by the following:

If seg_feature_active_idx( segmentId, SEG_LVL_ALT_Q ) is equal to 1 the following ordered steps apply:

Set the variable data equal to FeatureData[ segmentId ][ SEG_LVL_ALT_Q ].

Set qindex equal to base_q_idx + data.

If ignoreDeltaQ is equal to 0 and delta_q_present is equal to 1, set qindex equal to CurrentQIndex + data.

Return Clip3( 0, 255, qindex ).


Otherwise, if ignoreDeltaQ is equal to 0 and delta_q_present is equal to 1, return CurrentQIndex.

Otherwise, return base_q_idx.
Note: When using both delta quantization and lossless segments, care should be taken that get_qindex returns 0 for the lossless segments. One approach is to set FeatureData[ segmentId ][ SEG_LVL_ALT_Q ] to -255 for the lossless segments.
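The selection logic of get_qindex can be sketched in Python as follows. This is an illustrative sketch, not normative text: the bitstream state is passed in as plain arguments, and the parameter names alt_q_active (standing in for seg_feature_active_idx( segmentId, SEG_LVL_ALT_Q )) and alt_q_data (standing in for FeatureData[ segmentId ][ SEG_LVL_ALT_Q ]) are assumptions made for the example.

```python
def clip3(lo, hi, x):
    """Clamp x to the inclusive range [lo, hi]."""
    return max(lo, min(hi, x))


def get_qindex(ignore_delta_q, alt_q_active, alt_q_data,
               base_q_idx, current_q_index, delta_q_present):
    """Return the quantizer index for the current block.

    alt_q_active and alt_q_data are hypothetical stand-ins for the
    segmentation feature lookup described in the text.
    """
    use_delta = (ignore_delta_q == 0 and delta_q_present == 1)
    if alt_q_active:
        qindex = base_q_idx + alt_q_data
        if use_delta:
            qindex = current_q_index + alt_q_data
        return clip3(0, 255, qindex)
    if use_delta:
        return current_q_index
    return base_q_idx
```

With an alt_q_data of -255 the clipped result is 0 regardless of the delta quantizer state, which is the behaviour the note asks for in lossless segments.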
The function get_dc_quant( plane ) returns the quantizer value for the dc coefficient for a particular plane and is derived as follows:

If plane is equal to 0, return dc_q( get_qindex( 0, segment_id ) + DeltaQYDc ).

Otherwise if plane is equal to 1, return dc_q( get_qindex( 0, segment_id ) + DeltaQUDc ).

Otherwise (plane is equal to 2), return dc_q( get_qindex( 0, segment_id ) + DeltaQVDc ).
The function get_ac_quant( plane ) returns the quantizer value for the ac coefficient for a particular plane and is derived as follows:

If plane is equal to 0, return ac_q( get_qindex( 0, segment_id ) ).

Otherwise if plane is equal to 1, return ac_q( get_qindex( 0, segment_id ) + DeltaQUAc ).

Otherwise (plane is equal to 2), return ac_q( get_qindex( 0, segment_id ) + DeltaQVAc ).
Reconstruct process
The reconstruct process is invoked to perform dequantization, inverse transform and reconstruction. This process is triggered at a point defined by a function call to reconstruct in the transform block syntax table described in Transform block syntax.
The inputs to this process are:

a variable plane specifying which plane is being reconstructed,

variables x and y specifying the location of the top left sample in the CurrFrame[ plane ] array of the current transform block,

a variable txSz, specifying the size of the transform block.
The outputs of this process are reconstructed samples in the current frame CurrFrame.
The reconstruction and dequantization process is defined as follows:
The variable dqDenom is derived as follows:

If txSz is equal to TX_32X32, TX_16X32, TX_32X16, TX_16X64, or TX_64X16, dqDenom is set equal to 2.

Otherwise, if txSz is equal to TX_64X64, TX_32X64, or TX_64X32, dqDenom is set equal to 4.

Otherwise, dqDenom is set equal to 1.
The variable log2W (specifying the base 2 logarithm of the width of the transform block) is set equal to Tx_Width_Log2[ txSz ].
The variable log2H (specifying the base 2 logarithm of the height of the transform block) is set equal to Tx_Height_Log2[ txSz ].
The variable w (specifying the width of the transform block) is set equal to 1 << log2W.
The variable h (specifying the height of the transform block) is set equal to 1 << log2H.
The variable tw is set equal to Min( 32, w ).
The variable th is set equal to Min( 32, h ).
The variable flipUD is derived as follows. If PlaneTxType is equal to one of FLIPADST_DCT, FLIPADST_ADST, V_FLIPADST, or FLIPADST_FLIPADST, flipUD is set equal to 1. Otherwise, flipUD is set equal to 0.
The variable flipLR is derived as follows. If PlaneTxType is equal to one of DCT_FLIPADST, ADST_FLIPADST, H_FLIPADST, or FLIPADST_FLIPADST, flipLR is set equal to 1. Otherwise, flipLR is set equal to 0.
The following ordered steps apply:

For i = 0..(th - 1), for j = 0..(tw - 1), the following ordered steps apply:
a. The variable q is derived as follows:

If i is equal to 0 and j is equal to 0, the variable q is set equal to get_dc_quant( plane ).

Otherwise (i, j or both are not equal to 0), the variable q is set equal to get_ac_quant( plane ).
b. The variable q2 is derived as follows:

If using_qmatrix is equal to 1, PlaneTxType is less than IDTX, and SegQMLevel[ plane ][ segment_id ] is less than 15, q2 is set equal to Round2( q * Quantizer_Matrix[ SegQMLevel[ plane ][ segment_id ] ][ plane > 0 ][ Qm_Offset[ txSz ] + i * tw + j ], 5 ).

Otherwise, q2 is set equal to q.
c. The variable dq is set equal to Quant[ i * tw + j ] * q2.
d. The variable sign is set equal to ( dq < 0 ) ? -1 : 1.
e. The variable dq2 is set equal to sign * ( Abs( dq ) & 0xFFFFFF ) / dqDenom.
f. Dequant[ i ][ j ] is set equal to Clip3( -( 1 << ( 7 + BitDepth ) ), ( 1 << ( 7 + BitDepth ) ) - 1, dq2 ).


Invoke the 2D inverse transform block process defined in 2D inverse transform process with the variable txSz as input. The inverse transform outputs are stored in the Residual buffer.

For i = 0..(h - 1), for j = 0..(w - 1), the following applies:

The variable xx is set equal to flipLR ? ( w - j - 1 ) : j.

The variable yy is set equal to flipUD ? ( h - i - 1 ) : i.

CurrFrame[ plane ][ y + yy ][ x + xx ] is set equal to Clip1( CurrFrame[ plane ][ y + yy ][ x + xx ] + Residual[ i ][ j ] ).

If Lossless is equal to 1, it is a requirement of bitstream conformance that the values written into the Residual array in step 2 are representable by a signed integer with 1 + BitDepth bits.
Note: When Lossless is equal to 0, values written into Residual may not be representable by 1 + BitDepth bits (for example, due to quantization noise). The constraints in other parts of the specification ensure that the values will always be representable by a signed integer with Max( BitDepth + 5, 15 ) bits.
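The per-coefficient dequantization in steps c to f can be sketched as follows. This is a minimal illustration under stated assumptions: the function name dequantize is hypothetical, the quantizer q2 (after any quantizer matrix scaling) is passed in directly, and the division is taken to truncate toward zero.

```python
def clip3(lo, hi, x):
    """Clamp x to the inclusive range [lo, hi]."""
    return max(lo, min(hi, x))


def dequantize(coef, q2, dq_denom, bit_depth):
    """Dequantize one coefficient (hypothetical helper for steps c to f)."""
    dq = coef * q2
    sign = -1 if dq < 0 else 1
    # Mask the magnitude to 24 bits, then divide; applying the sign
    # afterwards gives truncation toward zero.
    dq2 = sign * ((abs(dq) & 0xFFFFFF) // dq_denom)
    limit = 1 << (7 + bit_depth)
    return clip3(-limit, limit - 1, dq2)
```

For example, with bit_depth equal to 8 the result is confined to the range -32768 to 32767.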
Inverse transform process
General
This section details the inverse transforms used during the reconstruction processes detailed in Reconstruction and dequantization.
1D transforms
Butterfly functions
This section defines the butterfly functions B and H used by the 1D transform processes.
The inverse transform process works by writing values into an array T.
The function brev( numBits, x ) returns the bit-reversal of numBits of x and is specified as follows:
brev( numBits, x ) {
  t = 0
  for ( i = 0; i < numBits; i++ ) {
    bit = (x >> i) & 1
    t += bit << (numBits - 1 - i)
  }
  return t
}
The function B( a, b, angle, 0, r ) performs a butterfly rotation specified by the following ordered steps:

The variable x is set equal to T[ a ] * cos128( angle ) - T[ b ] * sin128( angle ).

The variable y is set equal to T[ a ] * sin128( angle ) + T[ b ] * cos128( angle ).

T[ a ] is set equal to Round2( x, 12 ).

T[ b ] is set equal to Round2( y, 12 ).
It is a requirement of bitstream conformance that the values saved into the array T by this function are representable by a signed integer using r bits of precision.
The function cos128( angle ) is specified for integer values of the input angle by the following ordered steps:

Set a variable angle2 equal to angle & 255.

If angle2 is greater than or equal to 0 and less than or equal to 64, return Cos128_Lookup[ angle2 ].

If angle2 is greater than 64 and less than or equal to 128, return Cos128_Lookup[ 128 - angle2 ] * -1.

If angle2 is greater than 128 and less than or equal to 192, return Cos128_Lookup[ angle2 - 128 ] * -1.

Otherwise (if angle2 is greater than 192 and less than 256), return Cos128_Lookup[ 256 - angle2 ].
Where Cos128_Lookup is a constant lookup table defined as:
Cos128_Lookup[ 65 ] = {
4096, 4095, 4091, 4085, 4076, 4065, 4052, 4036,
4017, 3996, 3973, 3948, 3920, 3889, 3857, 3822,
3784, 3745, 3703, 3659, 3612, 3564, 3513, 3461,
3406, 3349, 3290, 3229, 3166, 3102, 3035, 2967,
2896, 2824, 2751, 2675, 2598, 2520, 2440, 2359,
2276, 2191, 2106, 2019, 1931, 1842, 1751, 1660,
1567, 1474, 1380, 1285, 1189, 1092, 995, 897,
799, 700, 601, 501, 401, 301, 201, 101, 0
}
The function sin128( angle ) is defined to be cos128( angle - 64 ).
Note: The cos128 function implements the expression 4096 * cos( angle * pi / 128 ) rounded to the nearest integer. The sin128 function implements the expression 4096 * sin( angle * pi / 128 ) rounded to the nearest integer.
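The quadrant folding used by cos128 and sin128 can be sketched in Python as follows. The table is copied from Cos128_Lookup above; the Python names are illustrative only.

```python
COS128_LOOKUP = [
    4096, 4095, 4091, 4085, 4076, 4065, 4052, 4036,
    4017, 3996, 3973, 3948, 3920, 3889, 3857, 3822,
    3784, 3745, 3703, 3659, 3612, 3564, 3513, 3461,
    3406, 3349, 3290, 3229, 3166, 3102, 3035, 2967,
    2896, 2824, 2751, 2675, 2598, 2520, 2440, 2359,
    2276, 2191, 2106, 2019, 1931, 1842, 1751, 1660,
    1567, 1474, 1380, 1285, 1189, 1092, 995, 897,
    799, 700, 601, 501, 401, 301, 201, 101, 0,
]


def cos128(angle):
    """Fixed-point cosine: angle is in units of pi / 128."""
    angle2 = angle & 255  # wraps negative angles as well
    if angle2 <= 64:
        return COS128_LOOKUP[angle2]
    if angle2 <= 128:
        return -COS128_LOOKUP[128 - angle2]
    if angle2 <= 192:
        return -COS128_LOOKUP[angle2 - 128]
    return COS128_LOOKUP[256 - angle2]


def sin128(angle):
    """Fixed-point sine, defined as a quarter-turn shift of cos128."""
    return cos128(angle - 64)
```

Only one quadrant is stored; the other three follow from the symmetries of the cosine.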
When the angle is equal to 32 + 64 * k for integer k the butterfly rotation can be equivalently performed with two fewer multiplications (because the magnitude of cos128( 32 + 64 * k ) is always equal to that of sin128( 32 + 64 * k )) by the following process:

The variable v is set equal to (angle & 64) ? T[ a ] + T[ b ] : T[ a ] - T[ b ].

The variable w is set equal to (angle & 64) ? -T[ a ] + T[ b ] : T[ a ] + T[ b ].

The variable x is set equal to v * cos128( angle ).

The variable y is set equal to w * cos128( angle ).

T[ a ] is set equal to Round2( x, 12 ).

T[ b ] is set equal to Round2( y, 12 ).
It is a requirement of bitstream conformance that the values saved into the array T by this function are representable by a signed integer using r bits of precision.
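The equivalence between the general butterfly rotation and the reduced-multiplication form can be checked with a short sketch. This is an illustration under stated assumptions: cos128 is computed here from the closed-form expression quoted in the note (rather than from Cos128_Lookup), Round2 is taken to be ( x + ( 1 << ( n - 1 ) ) ) >> n with an arithmetic shift, and the function names are hypothetical.

```python
import math


def cos128(angle):
    # Assumption: closed form from the note, standing in for the lookup table.
    return round(4096 * math.cos(angle * math.pi / 128))


def sin128(angle):
    return cos128(angle - 64)


def round2(x, n):
    """Rounding right shift; Python's >> is arithmetic for negative x."""
    return x if n == 0 else (x + (1 << (n - 1))) >> n


def butterfly(T, a, b, angle):
    """General rotation B( a, b, angle, 0, r ), without the range check on r."""
    x = T[a] * cos128(angle) - T[b] * sin128(angle)
    y = T[a] * sin128(angle) + T[b] * cos128(angle)
    T[a], T[b] = round2(x, 12), round2(y, 12)


def butterfly_fast(T, a, b, angle):
    """Two-multiplication form, valid only when angle == 32 + 64 * k."""
    if angle & 64:
        v, w = T[a] + T[b], -T[a] + T[b]
    else:
        v, w = T[a] - T[b], T[a] + T[b]
    c = cos128(angle)
    T[a], T[b] = round2(v * c, 12), round2(w * c, 12)
```

Because the sine and cosine have equal magnitude at these angles, the two forms produce identical integers, not merely close ones.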
The function B( a, b, angle, 1, r ) performs a butterfly rotation and flip specified by the following ordered steps:

The function B( a, b, angle, 0, r ) is invoked.

The contents of T[ a ] and T[ b ] are exchanged.
The function H( a, b, 0, r ) performs a Hadamard rotation specified by the following ordered steps:

The variable x is set equal to T[ a ].

The variable y is set equal to T[ b ].

T[ a ] is set equal to Clip3( -( 1 << ( r - 1 ) ), ( 1 << ( r - 1 ) ) - 1, x + y ).

T[ b ] is set equal to Clip3( -( 1 << ( r - 1 ) ), ( 1 << ( r - 1 ) ) - 1, x - y ).
The function H( a, b, 1, r ) performs a Hadamard rotation with flipped indices and is specified as follows:
 The function H( b, a, 0, r ) is invoked.
Inverse DCT array permutation process
This process performs an in-place permutation of the array T of length 2^{n} for 2 ≤ n ≤ 6 which is required before execution of the inverse DCT process.
The input to this process is a variable n that specifies the base 2 logarithm of the length of the input array.
A temporary array named copyT is set equal to T.
T[ i ] is set equal to copyT[ brev( n, i ) ] for i = 0..((1 << n) - 1).
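The bit-reversal permutation can be sketched in Python as follows; the function name dct_permute is an assumption made for the example.

```python
def brev(num_bits, x):
    """Return the low num_bits bits of x in reversed order."""
    t = 0
    for i in range(num_bits):
        bit = (x >> i) & 1
        t += bit << (num_bits - 1 - i)
    return t


def dct_permute(T, n):
    """Permute the first 2**n entries of T in place by bit-reversed index."""
    copy_t = list(T)
    for i in range(1 << n):
        T[i] = copy_t[brev(n, i)]
```

For an array of length 8 holding its own indices, the permuted order is 0, 4, 2, 6, 1, 5, 3, 7.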
Inverse DCT process
This process performs an in-place inverse discrete cosine transform of the permuted array T which is of length 2^{n} for 2 ≤ n ≤ 6.
The inputs to this process are:

a variable n that specifies the base 2 logarithm of the length of the input array,

a variable r that specifies the intermediate clamping range.
The following ordered steps apply:

Invoke the inverse DCT permutation process as specified in Inverse DCT array permutation process with the input variable n.

If n is equal to 6, invoke B( 32 + i, 63 - i, 63 - 4 * brev( 4, i ), 0, r ) for i = 0..15.

If n is greater than or equal to 5, invoke B( 16 + i, 31 - i, 6 + ( brev( 3, 7 - i ) << 3 ), 0, r ) for i = 0..7.

If n is equal to 6, invoke H( 32 + i * 2, 33 + i * 2, i & 1, r ) for i = 0..15.

If n is greater than or equal to 4, invoke B( 8 + i, 15 - i, 12 + ( brev( 2, 3 - i ) << 4 ), 0, r ) for i = 0..3.

If n is greater than or equal to 5, invoke H( 16 + 2 * i, 17 + 2 * i, i & 1, r ) for i = 0..7.

If n is equal to 6, invoke B( 62 - i * 4 - j, 33 + i * 4 + j, 60 - 16 * brev( 2, i ) + 64 * j, 1, r ) for i = 0..3, for j = 0..1.

If n is greater than or equal to 3, invoke B( 4 + i, 7 - i, 56 - 32 * i, 0, r ) for i = 0..1.

If n is greater than or equal to 4, invoke H( 8 + 2 * i, 9 + 2 * i, i & 1, r ) for i = 0..3.

If n is greater than or equal to 5, invoke B( 30 - 4 * i - j, 17 + 4 * i + j, 24 + (j << 6) + ( ( 1 - i ) << 5 ), 1, r ) for i = 0..1, for j = 0..1.

If n is equal to 6, invoke H( 32 + i * 4 + j, 35 + i * 4 - j, i & 1, r ) for i = 0..7, for j = 0..1.

Invoke B( 2 * i, 2 * i + 1, 32 + 16 * i, 1 - i, r ) for i = 0..1.

If n is greater than or equal to 3, invoke H( 4 + 2 * i, 5 + 2 * i, i, r ) for i = 0..1.

If n is greater than or equal to 4, invoke B( 14 - i, 9 + i, 48 + 64 * i, 1, r ) for i = 0..1.

If n is greater than or equal to 5, invoke H( 16 + 4 * i + j, 19 + 4 * i - j, i & 1, r ) for i = 0..3, for j = 0..1.

If n is equal to 6, invoke B( 61 - i * 8 - j, 34 + i * 8 + j, 56 - i * 32 + ( j >> 1 ) * 64, 1, r ) for i = 0..1, for j = 0..3.

Invoke H( i, 3 - i, 0, r ) for i = 0..1.

If n is greater than or equal to 3, invoke B( 6, 5, 32, 1, r ).

If n is greater than or equal to 4, invoke H( 8 + 4 * i + j, 11 + 4 * i - j, i, r ) for i = 0..1, for j = 0..1.

If n is greater than or equal to 5, invoke B( 29 - i, 18 + i, 48 + ( i >> 1 ) * 64, 1, r ) for i = 0..3.

If n is equal to 6, invoke H( 32 + 8 * i + j, 39 + 8 * i - j, i & 1, r ) for i = 0..3, for j = 0..3.

If n is greater than or equal to 3, invoke H( i, 7 - i, 0, r ) for i = 0..3.

If n is greater than or equal to 4, invoke B( 13 - i, 10 + i, 32, 1, r ) for i = 0..1.

If n is greater than or equal to 5, invoke H( 16 + i * 8 + j, 23 + i * 8 - j, i, r ) for i = 0..1, for j = 0..3.

If n is equal to 6, invoke B( 59 - i, 36 + i, i < 4 ? 48 : 112, 1, r ) for i = 0..7.

If n is greater than or equal to 4, invoke H( i, 15 - i, 0, r ) for i = 0..7.

If n is greater than or equal to 5, invoke B( 27 - i, 20 + i, 32, 1, r ) for i = 0..3.

If n is equal to 6, the following steps apply for i = 0..7:

Invoke H( 32 + i, 47 - i, 0, r ).

Invoke H( 48 + i, 63 - i, 1, r ).


If n is greater than or equal to 5, invoke H( i, 31 - i, 0, r ) for i = 0..15.

If n is equal to 6, invoke B( 55 - i, 40 + i, 32, 1, r ) for i = 0..7.

If n is equal to 6, invoke H( i, 63 - i, 0, r ) for i = 0..31.
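For the smallest case, n equal to 2, only three of the steps above are active: the permutation, the butterfly B( 2 * i, 2 * i + 1, 32 + 16 * i, 1 - i, r ) and the Hadamard step H( i, 3 - i, 0, r ). The following Python sketch walks through that case. It is illustrative only: cos128 is computed from the closed form 4096 * cos( angle * pi / 128 ) as an assumption in place of the lookup table, the conformance range check inside B is omitted, and the function names are hypothetical.

```python
import math


def cos128(angle):
    # Assumption: closed form standing in for Cos128_Lookup.
    return round(4096 * math.cos(angle * math.pi / 128))


def sin128(angle):
    return cos128(angle - 64)


def round2(x, n):
    return x if n == 0 else (x + (1 << (n - 1))) >> n


def clip3(lo, hi, x):
    return max(lo, min(hi, x))


def brev(num_bits, x):
    return sum(((x >> i) & 1) << (num_bits - 1 - i) for i in range(num_bits))


def B(T, a, b, angle, flip):
    """Butterfly rotation, followed by an exchange when flip is 1."""
    x = T[a] * cos128(angle) - T[b] * sin128(angle)
    y = T[a] * sin128(angle) + T[b] * cos128(angle)
    T[a], T[b] = round2(x, 12), round2(y, 12)
    if flip:
        T[a], T[b] = T[b], T[a]


def H(T, a, b, flip, r):
    """Hadamard rotation with clamping to r bits; flip swaps the indices."""
    if flip:
        a, b = b, a
    lo, hi = -(1 << (r - 1)), (1 << (r - 1)) - 1
    x, y = T[a], T[b]
    T[a], T[b] = clip3(lo, hi, x + y), clip3(lo, hi, x - y)


def inverse_dct4(T, r):
    """In-place 4-point inverse DCT: the n == 2 subset of the steps."""
    copy_t = list(T)
    for i in range(4):
        T[i] = copy_t[brev(2, i)]
    for i in range(2):
        B(T, 2 * i, 2 * i + 1, 32 + 16 * i, 1 - i)
    for i in range(2):
        H(T, i, 3 - i, 0, r)
```

A block containing only a DC coefficient produces a flat output, as expected of a DCT.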
Inverse ADST input array permutation process
This process performs the in-place permutation of the array T of length 2^{n} which is required as the first step of the inverse ADST, where 3 ≤ n ≤ 4.
The input to this process is a variable n that specifies the base 2 logarithm of the length of the input array.
The variable n0 is set equal to 1 << n.
A temporary array named copyT is set equal to T.
The following steps apply for i = 0..(n0 - 1):

The variable idx is set equal to ( i & 1 ) ? ( i - 1 ) : ( n0 - i - 1 ).

T[ i ] is set equal to copyT[ idx ].
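The input permutation can be sketched in Python as follows; the function name is an assumption made for the example.

```python
def adst_input_permute(T, n):
    """Permute the first 2**n entries of T in place for the inverse ADST."""
    n0 = 1 << n
    copy_t = list(T)
    for i in range(n0):
        # Odd outputs read the preceding entry; even outputs read from the end.
        idx = (i - 1) if (i & 1) else (n0 - i - 1)
        T[i] = copy_t[idx]
```

For an array of length 8 holding its own indices, the result is 7, 0, 5, 2, 3, 4, 1, 6.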
Inverse ADST output array permutation process
This process performs the in-place permutation of the array T of length 2^{n} which is required before the final step of the inverse ADST, where 3 ≤ n ≤ 4.
The input to this process is a variable n that specifies the base 2 logarithm of the length of the input array.
The variable n0 is set equal to 1 << n.
A temporary array named copyT is set equal to T.
The following steps apply for i = 0..(n0 - 1):

The variable a is set equal to ( ( i >> 3 ) & 1 ).

The variable b is set equal to ( ( i >> 2 ) & 1 ) ^ ( ( i >> 3 ) & 1 ).

The variable c is set equal to ( ( i >> 1 ) & 1 ) ^ ( ( i >> 2 ) & 1 ).

The variable d is set equal to ( i & 1 ) ^ ( ( i >> 1 ) & 1 ).

The variable idx is set equal to ( ( d << 3 ) | ( c << 2 ) | ( b << 1 ) | a ) >> ( 4 - n ).

T[ i ] is set equal to ( i & 1 ) ? ( -copyT[ idx ] ) : copyT[ idx ].
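The output permutation, including the sign change on odd positions, can be sketched in Python as follows; the function name is an assumption made for the example.

```python
def adst_output_permute(T, n):
    """Permute and sign-flip the first 2**n entries of T in place."""
    n0 = 1 << n
    copy_t = list(T)
    for i in range(n0):
        a = (i >> 3) & 1
        b = ((i >> 2) & 1) ^ ((i >> 3) & 1)
        c = ((i >> 1) & 1) ^ ((i >> 2) & 1)
        d = (i & 1) ^ ((i >> 1) & 1)
        idx = ((d << 3) | (c << 2) | (b << 1) | a) >> (4 - n)
        T[i] = -copy_t[idx] if (i & 1) else copy_t[idx]
```

Each source entry is used exactly once, so the step is a signed permutation of the input.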
Inverse ADST4 process
This process performs an in-place inverse ADST process on the array T of size 4.
The input to this process is a variable r, specifying the intermediate clamping range.
The following applies:
s[ 0 ] = SINPI_1_9 * T[ 0 ]
s[ 1 ] = SINPI_2_9 * T[ 0 ]
s[ 2 ] = SINPI_3_9 * T[ 1 ]
s[ 3 ] = SINPI_4_9 * T[ 2 ]
s[ 4 ] = SINPI_1_9 * T[ 2 ]
s[ 5 ] = SINPI_2_9 * T[ 3 ]
s[ 6 ] = SINPI_4_9 * T[ 3 ]
a7 = T[ 0 ] - T[ 2 ]
b7 = a7 + T[ 3 ]
s[ 0 ] = s[ 0 ] + s[ 3 ]
s[ 1 ] = s[ 1 ] - s[ 4 ]
s[ 3 ] = s[ 2 ]
s[ 2 ] = SINPI_3_9 * b7
s[ 0 ] = s[ 0 ] + s[ 5 ]
s[ 1 ] = s[ 1 ] - s[ 6 ]
x[ 0 ] = s[ 0 ] + s[ 3 ]
x[ 1 ] = s[ 1 ] + s[ 3 ]
x[ 2 ] = s[ 2 ]
x[ 3 ] = s[ 0 ] + s[ 1 ]
x[ 3 ] = x[ 3 ] - s[ 3 ]
T[ 0 ] = Round2( x[ 0 ], 12 )
T[ 1 ] = Round2( x[ 1 ], 12 )
T[ 2 ] = Round2( x[ 2 ], 12 )
T[ 3 ] = Round2( x[ 3 ], 12 )
where the constants used are defined as follows:
| Symbol | Value |
|---|---|
| SINPI_1_9 | 1321 |
| SINPI_2_9 | 2482 |
| SINPI_3_9 | 3344 |
| SINPI_4_9 | 3803 |
It is a requirement of bitstream conformance that all values stored in the s and x arrays by this process are representable by a signed integer using r + 12 bits of precision.
It is a requirement of bitstream conformance that values stored in the variable a7 by this process are representable by a signed integer using r + 1 bits of precision.
It is a requirement of bitstream conformance that values stored in the variable b7 by this process are representable by a signed integer using r bits of precision.
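The ADST4 arithmetic can be sketched in Python as follows. This is illustrative only: the conformance range checks that depend on r are omitted, Round2 is assumed to be ( x + ( 1 << ( n - 1 ) ) ) >> n with an arithmetic shift, and the function name is hypothetical.

```python
SINPI_1_9, SINPI_2_9, SINPI_3_9, SINPI_4_9 = 1321, 2482, 3344, 3803


def round2(x, n):
    return x if n == 0 else (x + (1 << (n - 1))) >> n


def inverse_adst4(T):
    """In-place 4-point inverse ADST (range checks on r not modelled)."""
    s = [0] * 7
    s[0] = SINPI_1_9 * T[0]
    s[1] = SINPI_2_9 * T[0]
    s[2] = SINPI_3_9 * T[1]
    s[3] = SINPI_4_9 * T[2]
    s[4] = SINPI_1_9 * T[2]
    s[5] = SINPI_2_9 * T[3]
    s[6] = SINPI_4_9 * T[3]
    a7 = T[0] - T[2]
    b7 = a7 + T[3]
    s[0] = s[0] + s[3]
    s[1] = s[1] - s[4]
    s[3] = s[2]
    s[2] = SINPI_3_9 * b7
    s[0] = s[0] + s[5]
    s[1] = s[1] - s[6]
    x = [s[0] + s[3], s[1] + s[3], s[2], s[0] + s[1] - s[3]]
    for i in range(4):
        T[i] = round2(x[i], 12)
```

An input of 4096 in the first position returns the four SINPI constants themselves, which makes the basis function easy to inspect.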
Inverse ADST8 process
This process performs an in-place inverse ADST process on the array T of size 8.
The input to this process is a variable r, specifying the intermediate clamping range.
The following ordered steps apply:

Invoke the ADST input array permutation process specified in Inverse ADST input array permutation process with the input variable n set to 3.

Invoke B( 2 * i, 2 * i + 1, 60 - 16 * i, 1, r ) for i = 0..3.

Invoke H( i, 4 + i, 0, r ) for i = 0..3.

Invoke B( 4 + 3 * i, 5 + i, 48 - 32 * i, 1, r ) for i = 0..1.

Invoke H( 4 * j + i, 2 + 4 * j + i, 0, r ) for i = 0..1, for j = 0..1.

Invoke B( 2 + 4 * i, 3 + 4 * i, 32, 1, r ) for i = 0..1.

Invoke the ADST output array permutation process specified in Inverse ADST output array permutation process with the input variable n set to 3.
Inverse ADST16 process
This process performs an in-place inverse ADST process on the array T of size 16.
The input to this process is a variable r, specifying the intermediate clamping range.
The following ordered steps apply:

Invoke the ADST input array permutation process specified in Inverse ADST input array permutation process with the input variable n set to 4.

Invoke B( 2 * i, 2 * i + 1, 62 - 8 * i, 1, r ) for i = 0..7.

Invoke H( i, 8 + i, 0, r ) for i = 0..7.

Invoke B( 8 + 2 * i, 9 + 2 * i, 56 - 32 * i, 1, r ) and B( 13 + 2 * i, 12 + 2 * i, 8 + 32 * i, 1, r ) for i = 0..1.

Invoke H( 8 * j + i, 4 + 8 * j + i, 0, r ) for i = 0..3, for j = 0..1.

Invoke B( 4 + 8 * j + 3 * i, 5 + 8 * j + i, 48 - 32 * i, 1, r ) for i = 0..1, for j = 0..1.

Invoke H( 4 * j + i, 2 + 4 * j + i, 0, r ) for i = 0..1, for j = 0..3.

Invoke B( 2 + 4 * i, 3 + 4 * i, 32, 1, r ) for i = 0..3.

Invoke the ADST output array permutation process specified in Inverse ADST output array permutation process with the input variable n set to 4.
Inverse ADST process
This process performs an in-place inverse ADST process on the array T of size 2^{n} for 2 ≤ n ≤ 4.
The inputs to this process are:

a variable n that specifies the base 2 logarithm of the length of the input array.

a variable r that specifies the intermediate clamping range.
The following steps apply:

If n is equal to 2, invoke the inverse ADST4 process in Inverse ADST4 process, with the input variable r.

Otherwise, if n is equal to 3, invoke the inverse ADST8 process in Inverse ADST8 process, with the input variable r.

Otherwise (n is equal to 4), invoke the inverse ADST16 process in Inverse ADST16 process, with the input variable r.
Inverse Walsh-Hadamard transform process
The input to this process is a variable shift that specifies the amount of prescaling.
This process does an in-place transform of the array T (of length 4) by the following ordered steps:
a = T[ 0 ] >> shift
c = T[ 1 ] >> shift
d = T[ 2 ] >> shift
b = T[ 3 ] >> shift
a += c
d -= b
e = (a - d) >> 1
b = e - b
c = e - c
a -= b
d += c
T[ 0 ] = a
T[ 1 ] = b
T[ 2 ] = c
T[ 3 ] = d
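The same lifting steps can be sketched in Python as follows. The function name is an assumption; Python's >> is an arithmetic shift, which is taken here to match the behaviour intended for negative values.

```python
def inverse_wht(T, shift):
    """In-place 4-point inverse Walsh-Hadamard transform."""
    a = T[0] >> shift
    c = T[1] >> shift
    d = T[2] >> shift
    b = T[3] >> shift
    a += c
    d -= b
    e = (a - d) >> 1
    b = e - b
    c = e - c
    a -= b
    d += c
    T[0], T[1], T[2], T[3] = a, b, c, d
```

Every step is an integer addition, subtraction or shift, which is what makes the transform exactly invertible and therefore usable for lossless coding.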
Inverse identity transform 4 process
The process does an in-place transform of the array T (of length 4) by the following calculation for i = 0..3:
T[ i ] = Round2( T[ i ] * 5793, 12 )
Inverse identity transform 8 process
The process does an in-place transform of the array T (of length 8) by the following calculation for i = 0..7:
T[ i ] = T[ i ] * 2
Inverse identity transform 16 process
The process does an in-place transform of the array T (of length 16) by the following calculation for i = 0..15:
T[ i ] = Round2( T[ i ] * 11586, 12 )
Inverse identity transform 32 process
The process does an in-place transform of the array T (of length 32) by the following calculation for i = 0..31:
T[ i ] = T[ i ] * 4
Inverse identity transform process
This process performs an in-place identity transform process (with a size-dependent scaling factor) on the array T of size 2^{n} for 2 ≤ n ≤ 5.
The input to this process is a variable n that specifies the base 2 logarithm of the length of the input array.
The process to invoke depends on n as follows:

If n is equal to 2, invoke the inverse identity transform 4 process in Inverse identity transform 4 process.

Otherwise, if n is equal to 3, invoke the inverse identity transform 8 process in Inverse identity transform 8 process.

Otherwise, if n is equal to 4, invoke the inverse identity transform 16 process in Inverse identity transform 16 process.

Otherwise (n is equal to 5), invoke the inverse identity transform 32 process in Inverse identity transform 32 process.
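The four identity scalings and their dispatch on n can be sketched in Python as follows. The function name is an assumption, and Round2 is taken to be ( x + ( 1 << ( n - 1 ) ) ) >> n with an arithmetic shift.

```python
def round2(x, n):
    return x if n == 0 else (x + (1 << (n - 1))) >> n


def inverse_identity(T, n):
    """Scale the first 2**n entries of T in place; the factor depends on n."""
    for i in range(1 << n):
        if n == 2:
            T[i] = round2(T[i] * 5793, 12)    # about sqrt(2)
        elif n == 3:
            T[i] = T[i] * 2
        elif n == 4:
            T[i] = round2(T[i] * 11586, 12)   # about 2 * sqrt(2)
        else:
            T[i] = T[i] * 4
```

The constants 5793 and 11586 are 12-bit fixed-point approximations of sqrt(2) and 2 * sqrt(2), so the scale factor grows by sqrt(2) with each doubling of the length.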
2D inverse transform process
This process performs a 2D inverse transform for an array of coefficients stored in the 2D array Dequant. The output is placed in the 2D array Residual.
The input to this process is a variable txSz that specifies the transform size.
Set the variable log2W equal to Tx_Width_Log2[ txSz ].
Set the variable log2H equal to Tx_Height_Log2[ txSz ].
Set the variable w equal to 1 << log2W.
Set the variable h equal to 1 << log2H.
Set the variable rowShift equal to Lossless ? 0 : Transform_Row_Shift[ txSz ].
Set the variable colShift equal to Lossless ? 0 : 4.
Set the variable rowClampRange equal to BitDepth + 8.
Set the variable colClampRange equal to Max( BitDepth + 6, 16 ).
The row transforms with i = 0..(h - 1) are applied as follows:

T[ j ] is derived as follows for j = 0..(w - 1):

If i and j are both less than 32, T[ j ] is set equal to Dequant[ i ][ j ].

Otherwise, T[ j ] is set equal to 0.


If Abs( log2W - log2H ) is equal to 1, T[ j ] is set equal to Round2( T[ j ] * 2896, 12 ) for j = 0..(w - 1).

If Lossless is equal to 1, invoke the Inverse WHT process as specified in Inverse Walsh-Hadamard transform process with shift equal to 2.

Otherwise, if PlaneTxType is equal to one of DCT_DCT, ADST_DCT, FLIPADST_DCT or H_DCT, invoke the inverse DCT process as specified in Inverse DCT process with the input variable n equal to log2W and the input variable r equal to rowClampRange.

Otherwise, if PlaneTxType is equal to one of DCT_ADST, ADST_ADST, DCT_FLIPADST, FLIPADST_FLIPADST, ADST_FLIPADST, FLIPADST_ADST, H_ADST, or H_FLIPADST, invoke the inverse ADST process as specified in Inverse ADST process with input variable n equal to log2W and the input variable r equal to rowClampRange.

Otherwise, invoke the inverse identity transform process specified in Inverse identity transform process with the input variable n equal to log2W.

Set Residual[ i ][ j ] equal to Round2( T[ j ], rowShift ) for j = 0..(w - 1).
Between the row and column transforms, Residual[ i ][ j ] is set equal to Clip3( -( 1 << ( colClampRange - 1 ) ), ( 1 << ( colClampRange - 1 ) ) - 1, Residual[ i ][ j ] ) for i = 0..(h - 1), for j = 0..(w - 1).
The column transforms with j = 0..(w - 1) are applied as follows:

Set T[ i ] equal to Residual[ i ][ j ] for i = 0..(h - 1).

If Lossless is equal to 1, invoke the Inverse WHT process as specified in Inverse Walsh-Hadamard transform process with shift equal to 0.

Otherwise, if PlaneTxType is equal to one of DCT_DCT, DCT_ADST, DCT_FLIPADST or V_DCT, invoke the inverse DCT process as specified in Inverse DCT process with the input variable n equal to log2H and the input variable r equal to colClampRange.

Otherwise, if PlaneTxType is equal to one of ADST_DCT, ADST_ADST, FLIPADST_DCT, FLIPADST_FLIPADST, ADST_FLIPADST, FLIPADST_ADST, V_ADST, or V_FLIPADST, invoke the inverse ADST process as specified in Inverse ADST process with input variable n equal to log2H and the input variable r equal to colClampRange.

Otherwise, invoke the inverse identity transform process specified in Inverse identity transform process with the input variable n equal to log2H.

Residual[ i ][ j ] is set equal to Round2( T[ i ], colShift ) for i = 0..(h - 1).
where Transform_Row_Shift is defined as:
Transform_Row_Shift[ TX_SIZES_ALL ] = {
0, 1, 2, 2, 2, 0, 0, 1, 1,
1, 1, 1, 1, 1, 1, 2, 2, 2, 2
}
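The row pass, intermediate clamp and column pass can be illustrated with the lossless path, where rowShift and colShift are both 0 and the Walsh-Hadamard transform is used in both directions. The sketch below is illustrative only and assumes a 4x4 block (the Walsh-Hadamard process above operates on arrays of length 4); the function names are hypothetical.

```python
def clip3(lo, hi, x):
    return max(lo, min(hi, x))


def inverse_wht(T, shift):
    """In-place 4-point inverse Walsh-Hadamard transform."""
    a, c, d, b = (T[0] >> shift, T[1] >> shift, T[2] >> shift, T[3] >> shift)
    a += c
    d -= b
    e = (a - d) >> 1
    b = e - b
    c = e - c
    a -= b
    d += c
    T[0], T[1], T[2], T[3] = a, b, c, d


def inverse_2d_lossless(dequant, bit_depth):
    """Row pass (shift 2), clamp, then column pass (shift 0) on a 4x4 block."""
    col_clamp = max(bit_depth + 6, 16)
    lo, hi = -(1 << (col_clamp - 1)), (1 << (col_clamp - 1)) - 1
    residual = []
    for i in range(4):
        T = list(dequant[i])
        inverse_wht(T, 2)
        # rowShift is 0 when Lossless is 1, so only the clamp remains.
        residual.append([clip3(lo, hi, v) for v in T])
    for j in range(4):
        T = [residual[i][j] for i in range(4)]
        inverse_wht(T, 0)
        for i in range(4):
            residual[i][j] = T[i]  # colShift is 0 when Lossless is 1
    return residual
```

The lossy path follows the same row, clamp, column structure, with the 1D transform chosen from PlaneTxType and the non-zero shifts applied after each pass.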
Loop filter process
General
Input to this process is the array CurrFrame of reconstructed samples.
Output from this process is a modified array CurrFrame containing deblocked samples.
The purpose of the loop filter is to eliminate (or at least reduce) visually objectionable artifacts associated with the semi-independence of the coding of super blocks and their constituent sub-blocks.
The loop filter is applied on all vertical boundaries followed by all horizontal boundaries as follows:
for ( plane = 0; plane < NumPlanes; plane++ ) {
  if ( plane == 0 || loop_filter_level[ 1 + plane ] ) {
    for ( pass = 0; pass < 2; pass++ ) {
      rowStep = ( plane == 0 ) ? 1 : ( 1 << subsampling_y )
      colStep = ( plane == 0 ) ? 1 : ( 1 << subsampling_x )
      for ( row = 0; row < MiRows; row += rowStep )
        for ( col = 0; col < MiCols; col += colStep )
          loop_filter_edge( plane, pass, row, col )
    }
  }
}
When the function loop_filter_edge is called, the edge loop filter process specified in Edge loop filter process is invoked with the variables plane, pass, row, and col as inputs.
Note: The loop filter is an integral part of the decoding process, in that the results of loop filtering are used in the prediction of subsequent frames.
Note: The loop filtering is designed so that any order of filtering for the edges will give identical results, provided that the vertical boundaries are filtered before the horizontal boundaries.
Edge loop filter process
The inputs to this process are:

a variable plane specifying whether the process is filtering Y, U, or V samples,

a variable pass specifying the direction of the edges. pass equal to 0 means the process is filtering vertical block boundaries, and pass equal to 1 means the process is filtering horizontal block boundaries,

variables row and col specifying the location of the edge in units of 4x4 blocks in the luma plane.
The outputs of this process are modified values in the array CurrFrame.
The variables subX and subY describing the subsampling of the current plane are derived as follows:

If plane is equal to 0, subX and subY are set equal to 0.

Otherwise (plane is not equal to 0), subX is set equal to subsampling_x and subY is set equal to subsampling_y.
The variables dx and dy are derived as follows:

If pass is equal to 0, then dx is set equal to 1, dy is set equal to 0.

Otherwise (pass is equal to 1), dy is set equal to 1, dx is set equal to 0.
dx and dy specify the offset between the samples to be filtered.
The variable x is set equal to col * MI_SIZE.
The variable y is set equal to row * MI_SIZE.
x and y contain the location in luma coordinates.
The variables row and col are adjusted as follows:

row is set equal to ( row | subY )

col is set equal to ( col | subX )
The variable onScreen (equal to 1 if the samples on both sides of the boundary lie in the visible area) is derived as follows:

If x is greater than or equal to FrameWidth, onScreen is set equal to 0.

Otherwise, if y is greater than or equal to FrameHeight, onScreen is set equal to 0.

Otherwise, if pass is equal to 0 and x is equal to 0, onScreen is set equal to 0.

Otherwise, if pass is equal to 1 and y is equal to 0, onScreen is set equal to 0.

Otherwise, onScreen is set equal to 1.
If onScreen is equal to 0, then this process immediately returns and no filtering is applied to this edge.
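The onScreen test can be sketched in Python as follows. The function and parameter names are assumptions; x and y are the luma coordinates derived above, and edge_pass stands in for the variable pass (which is a reserved word in Python).

```python
def on_screen(x, y, edge_pass, frame_width, frame_height):
    """Return 1 if the edge at luma position (x, y) should be considered."""
    if x >= frame_width:
        return 0
    if y >= frame_height:
        return 0
    if edge_pass == 0 and x == 0:
        return 0  # no block to the left of the first column
    if edge_pass == 1 and y == 0:
        return 0  # no block above the first row
    return 1
```

The last two checks exclude the left and top frame borders, where there are no samples on the other side of the boundary.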
The variables xP and yP (containing the location in the current plane) are derived as follows:

Set xP equal to x >> subX

Set yP equal to y >> subY
The variables prevRow and prevCol (containing the location of the mode info block on the other side of the boundary) are derived as follows:

Set prevRow equal to row - ( dy << subY )

Set prevCol equal to col - ( dx << subX )
Set the variable MiSize equal to MiSizes[ row ][ col ].
Set the variable txSz equal to LoopfilterTxSizes[ plane ][ row >> subY ][ col >> subX ].
Set the variable planeSize equal to get_plane_residual_size( MiSize, plane )
Set the variable skip equal to Skips[ row ][ col ].
Set the variable isIntra equal to RefFrames[ row ][ col ][ 0 ] <= INTRA_FRAME.
Set the variable prevTxSz equal to LoopfilterTxSizes[ plane ][ prevRow >> subY ][ prevCol >> subX ].
The variable isBlockEdge (equal to 1 if the samples cross a prediction block edge) is derived as follows:

If pass is equal to 0 and xP is an exact multiple of Block_Width[ planeSize ], isBlockEdge is set equal to 1.

Otherwise, if pass is equal to 1 and yP is an exact multiple of Block_Height[ planeSize ], isBlockEdge is set equal to 1.

Otherwise, isBlockEdge is set equal to 0.
The variable isTxEdge (equal to 1 if the samples cross a transform block edge) is derived as follows:

If pass is equal to 0 and xP is an exact multiple of Tx_Width[ txSz ], isTxEdge is set equal to 1.

Otherwise, if pass is equal to 1 and yP is an exact multiple of Tx_Height[ txSz ], isTxEdge is set equal to 1.

Otherwise, isTxEdge is set equal to 0.
The variable applyFilter (equal to 1 if the samples are filtered) is derived as follows:

If isTxEdge is equal to 0, applyFilter is set equal to 0.

Otherwise, if isBlockEdge is equal to 1 or skip is equal to 0 or isIntra is equal to 1, applyFilter is set equal to 1.

Otherwise applyFilter is set equal to 0.
The filter size process specified in Filter size process is invoked with the inputs txSz, prevTxSz, pass, and plane, and the output assigned to the variable filterSize (containing the maximum filter size that can be used).
The adaptive filter strength process specified in Adaptive filter strength process is invoked with the inputs row, col, plane, and pass, and the output assigned to the variables lvl, limit, blimit, and thresh.
If lvl is equal to 0, the adaptive filter strength process specified in Adaptive filter strength process is invoked with the inputs prevRow, prevCol, plane, and pass, and the output assigned to the variables lvl, limit, blimit, and thresh.
For the variable i taking values from 0 to MI_SIZE - 1, the following applies:
 If applyFilter is equal to 1 and lvl is greater than zero, the sample filtering process specified in Sample filtering process is invoked with the input variable x set equal to xP + dy * i, the input variable y set equal to yP + dx * i, and the variables plane, limit, blimit, thresh, dx, dy, and filterSize supplied as inputs.
Note: the vector (dx,dy) represents the direction of the filter, while (dy,dx) represents the direction of the boundary.
Filter size process
The inputs to this process are:

a variable txSz specifying the size of the transform block,

a variable prevTxSz specifying the size of the transform block on the other side of the boundary,

a variable pass specifying the direction of the edges,

a variable plane specifying whether the process is filtering Y, U, or V samples.
The output of this process is the variable filterSize containing the maximum filter size that can be used in samples.
The purpose of this process is to reduce the width of the chroma filters and to ensure that different boundaries can be filtered in parallel.
The variable baseSize is derived as follows:

If pass is equal to 0, baseSize is set equal to Min( Tx_Width[ prevTxSz ], Tx_Width[ txSz ] ),

Otherwise (pass is equal to 1), baseSize is set equal to Min( Tx_Height[ prevTxSz ], Tx_Height[ txSz ] ).
The output variable filterSize is derived as follows:

If plane is equal to 0, filterSize is set equal to Min( 16, baseSize ),

Otherwise, (plane is greater than 0), filterSize is set equal to Min( 8, baseSize ).
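The filter size derivation can be sketched in Python as follows. To keep the example self-contained, the transform dimensions are passed in directly as (width, height) pairs instead of being looked up through Tx_Width and Tx_Height; that calling convention and the function name are assumptions.

```python
def filter_size(tx_dims, prev_tx_dims, edge_pass, plane):
    """Return the maximum filter size in samples.

    tx_dims and prev_tx_dims are (width, height) of the transform blocks
    on either side of the boundary.
    """
    axis = 0 if edge_pass == 0 else 1  # width for vertical edges, else height
    base_size = min(prev_tx_dims[axis], tx_dims[axis])
    return min(16, base_size) if plane == 0 else min(8, base_size)
```

Taking the minimum over both sides keeps the filter from reaching past the smaller transform block.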
Adaptive filter strength process
The inputs to this process are:

the variables row and col specifying the luma location in units of 4x4 blocks,

the variable plane specifying whether the process is filtering Y, U or V samples,

the variable pass specifying the direction of the edge being filtered. pass equal to 0 means the process is filtering vertical block boundaries, and pass equal to 1 means the process is filtering horizontal block boundaries.
The outputs of this process are the variables lvl, limit, blimit, and thresh.
The output variable lvl is derived as follows:

The variable segment is set equal to SegmentIds[ row ][ col ].

The variable ref is set equal to RefFrames[ row ][ col ][ 0 ].

The variable mode is set equal to YModes[ row ][ col ].

The variable modeType is derived as follows:

If mode is greater than or equal to NEARESTMV, and not equal to GLOBALMV, and not equal to GLOBAL_GLOBALMV, modeType is set equal to 1.

Otherwise (if mode is an intra type or GLOBALMV or GLOBAL_GLOBALMV), modeType is set equal to 0.


The variable deltaLF is derived as follows:

If delta_lf_multi is equal to 0, deltaLF is set equal to DeltaLFs[ row ][ col ][ 0 ].

Otherwise (delta_lf_multi is equal to 1), deltaLF is set equal to DeltaLFs[ row ][ col ][ ( plane == 0 ) ? pass : ( plane + 1 ) ].


The adaptive filter strength selection process specified in [Adaptive filter strength selection process] is invoked, with segment, ref, modeType, deltaLF, plane, and pass as inputs, and the output being the output variable lvl.
The variable shift is derived as follows:

If loop_filter_sharpness is greater than 4, shift is set equal to 2.

Otherwise, if loop_filter_sharpness is greater than 0, shift is set equal to 1.

Otherwise, shift is set equal to 0.
The output variable limit is derived as follows:

If loop_filter_sharpness is greater than 0, limit is set equal to Clip3( 1, 9 - loop_filter_sharpness, lvl >> shift ).

Otherwise, limit is set equal to Max( 1, lvl >> shift ).
The output variable blimit is set equal to 2 * (lvl + 2) + limit.
The output variable thresh is set equal to lvl >> 4.
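The derivation of limit, blimit, and thresh from lvl can be sketched as follows. This is a minimal illustration, not normative text: the function name filter_limits and its arguments are hypothetical, and it assumes the upper clip bound is 9 - loop_filter_sharpness.

```python
def clip3(lo, hi, x):
    """Clamp x to the inclusive range [lo, hi]."""
    return lo if x < lo else hi if x > hi else x


def filter_limits(lvl, sharpness):
    """Return (limit, blimit, thresh) for a filter level and sharpness value.

    Hypothetical helper: `sharpness` stands in for loop_filter_sharpness.
    """
    if sharpness > 4:
        shift = 2
    elif sharpness > 0:
        shift = 1
    else:
        shift = 0
    if sharpness > 0:
        # Assumed upper bound: 9 - loop_filter_sharpness.
        limit = clip3(1, 9 - sharpness, lvl >> shift)
    else:
        limit = max(1, lvl >> shift)
    blimit = 2 * (lvl + 2) + limit
    thresh = lvl >> 4
    return limit, blimit, thresh
```

For example, a level of 32 with sharpness 5 is shifted down to 8 and then clipped to 4, so the edge limit tightens as sharpness increases.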
Adaptive filter strength selection process
The inputs to this process are:

The variable segment, specifying the current segment id,

The variable ref, specifying the reference frame type (INTRA_FRAME, LAST_FRAME, etc.),

The variable modeType, specifying the loop filter mode type,

The variable deltaLF, specifying the loop filter delta value,

The variable plane, specifying whether the process is filtering Y, U or V samples,

The variable pass, specifying the direction of the edge being filtered. pass equal to 0 means the process is filtering vertical block boundaries, and pass equal to 1 means the process is filtering horizontal block boundaries.
The output of this process is a filter strength level.
This process is invoked to select a loop filter strength level.
The variable i is set equal to ( plane == 0 ) ? pass : ( plane + 1 ).
The variable baseFilterLevel is set equal to Clip3( 0, MAX_LOOP_FILTER, deltaLF + loop_filter_level[ i ] ).
The following ordered steps apply:

The variable lvlSeg is set equal to baseFilterLevel.

The variable feature is set equal to SEG_LVL_ALT_LF_Y_V + i.

If seg_feature_active_idx( segment, feature ) is equal to 1 the following ordered steps apply:
a. lvlSeg is set equal to FeatureData[ segment ][ feature ] + lvlSeg.
b. lvlSeg is set equal to Clip3( 0, MAX_LOOP_FILTER, lvlSeg ).

If loop_filter_delta_enabled is equal to 1, then the following ordered steps apply:
a. The variable nShift is set equal to lvlSeg >> 5.
b. If ref is equal to INTRA_FRAME, then lvlSeg is set equal to lvlSeg + ( loop_filter_ref_deltas[ INTRA_FRAME ] << nShift ).
c. Otherwise, if ref is not equal to INTRA_FRAME, then lvlSeg is set equal to lvlSeg + ( loop_filter_ref_deltas[ ref ] << nShift ) + ( loop_filter_mode_deltas[ modeType ] << nShift ).
d. lvlSeg is set equal to Clip3(0, MAX_LOOP_FILTER, lvlSeg).

Return lvlSeg.
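The ordered steps above can be sketched as a single function. This is an illustration under stated assumptions: MAX_LOOP_FILTER is taken to be 63 and INTRA_FRAME to be 0, the helper name select_level is hypothetical, and the segmentation feature lookup is replaced by a seg_delta argument that is None when the feature is inactive.

```python
MAX_LOOP_FILTER = 63  # assumed value of the constant
INTRA_FRAME = 0       # assumed value of the constant


def clip3(lo, hi, x):
    """Clamp x to the inclusive range [lo, hi]."""
    return max(lo, min(hi, x))


def select_level(base_level, delta_lf, seg_delta, delta_enabled,
                 ref, mode_type, ref_deltas, mode_deltas):
    """Hypothetical helper returning the filter level lvlSeg.

    base_level plays the role of loop_filter_level[ i ]; seg_delta is
    FeatureData[ segment ][ feature ] when the feature is active, else None.
    """
    lvl_seg = clip3(0, MAX_LOOP_FILTER, delta_lf + base_level)
    if seg_delta is not None:
        lvl_seg = clip3(0, MAX_LOOP_FILTER, lvl_seg + seg_delta)
    if delta_enabled:
        n_shift = lvl_seg >> 5
        if ref == INTRA_FRAME:
            lvl_seg += ref_deltas[INTRA_FRAME] << n_shift
        else:
            lvl_seg += (ref_deltas[ref] << n_shift) + (mode_deltas[mode_type] << n_shift)
        lvl_seg = clip3(0, MAX_LOOP_FILTER, lvl_seg)
    return lvl_seg
```

Note that the deltas are scaled by lvlSeg >> 5, so they have twice the effect once the level reaches 32.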
Sample filtering process
General
The inputs to this process are:

variables x and y specifying the location within CurrFrame[ plane ],

a variable plane specifying whether the block is the Y, U or V plane,

variables limit, blimit, thresh that specify the strength of the filtering operation,

variables dx and dy specifying the direction perpendicular to the edge being filtered,

a variable filterSize specifying the maximum size of filter allowed.
The outputs of this process are modified values in the array CurrFrame.
First the filter mask process specified in Filter mask process is invoked with the inputs x, y, plane, limit, blimit, thresh, dx, dy, and filterSize, and the output is assigned to the variables hevMask, filterMask, flatMask, and flatMask2.
Then the appropriate filter process is invoked with the inputs x, y, plane, dx, dy as follows:

If filterMask is equal to 0, no filter is invoked.

Otherwise, if filterSize is equal to 4 or flatMask is equal to 0, the narrow filter process specified in Narrow filter process is invoked with the additional input variable hevMask.

Otherwise, if filterSize is equal to 8 or flatMask2 is equal to 0, the wide filter process specified in Wide filter process is invoked with the additional input variable log2Size set to 3.

Otherwise, the wide filter process specified in Wide filter process is invoked with the additional input variable log2Size set to 4.
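The selection rules above amount to a short decision chain. The sketch below only names the chosen filter; choose_filter and its string labels are hypothetical and exist purely for illustration.

```python
def choose_filter(filter_mask, flat_mask, flat_mask2, filter_size):
    """Return a label for the filter that the selection rules would invoke."""
    if not filter_mask:
        return "none"
    if filter_size == 4 or not flat_mask:
        return "narrow"
    if filter_size == 8 or not flat_mask2:
        return "wide, log2Size=3"
    return "wide, log2Size=4"
```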
Filter mask process
The inputs to this process are:

variables x and y specifying the location within CurrFrame[ plane ],

a variable plane specifying whether the block is the Y, U or V plane,

variables limit, blimit, thresh that specify the strength of the filtering operation,

variables dx and dy specifying the direction perpendicular to the edge being filtered,

a variable filterSize specifying the maximum size of filter allowed.
The outputs from this process are the variables:

hevMask,

filterMask,

flatMask, (only used if filterSize >= 8),

flatMask2 (only used if filterSize >= 16).
The values output for these masks depend on the differences between samples on either side of the specified boundary. These samples are specified as follows:
q0 = CurrFrame[ plane ][ y ][ x ]
q1 = CurrFrame[ plane ][ y + dy ][ x + dx ]
q2 = CurrFrame[ plane ][ y + dy * 2 ][ x + dx * 2 ]
q3 = CurrFrame[ plane ][ y + dy * 3 ][ x + dx * 3 ]
q4 = CurrFrame[ plane ][ y + dy * 4 ][ x + dx * 4 ]
q5 = CurrFrame[ plane ][ y + dy * 5 ][ x + dx * 5 ]
q6 = CurrFrame[ plane ][ y + dy * 6 ][ x + dx * 6 ]
p0 = CurrFrame[ plane ][ y - dy ][ x - dx ]
p1 = CurrFrame[ plane ][ y - dy * 2 ][ x - dx * 2 ]
p2 = CurrFrame[ plane ][ y - dy * 3 ][ x - dx * 3 ]
p3 = CurrFrame[ plane ][ y - dy * 4 ][ x - dx * 4 ]
p4 = CurrFrame[ plane ][ y - dy * 5 ][ x - dx * 5 ]
p5 = CurrFrame[ plane ][ y - dy * 6 ][ x - dx * 6 ]
p6 = CurrFrame[ plane ][ y - dy * 7 ][ x - dx * 7 ]
Note: Samples q4, q5, q6, p4, p5, and p6 are only used if filterSize is equal to 16.
The value of hevMask indicates whether the sample has high edge variance. It is calculated as follows:
hevMask = 0
threshBd = thresh << (BitDepth - 8)
hevMask |= (Abs( p1 - p0 ) > threshBd)
hevMask |= (Abs( q1 - q0 ) > threshBd)
The variable filterLen, representing the number of taps each side of the central sample in the filter, is derived as follows:

If filterSize is equal to 4, filterLen is set equal to 4.

Otherwise, if plane is not equal to 0, filterLen is set equal to 6.

Otherwise, if filterSize is equal to 8, filterLen is set equal to 8.

Otherwise, filterLen is set equal to 16.
The value of filterMask indicates whether adjacent samples close to the edge (within four samples either side of the specified boundary) vary by less than the limits given by limit and blimit. It is used to determine if any filtering should occur and is calculated as follows:
limitBd = limit << (BitDepth - 8)
blimitBd = blimit << (BitDepth - 8)
mask = 0
mask |= (Abs( p1 - p0 ) > limitBd)
mask |= (Abs( q1 - q0 ) > limitBd)
mask |= (Abs( p0 - q0 ) * 2 + Abs( p1 - q1 ) / 2 > blimitBd)
if ( filterLen >= 6 ) {
  mask |= (Abs( p2 - p1 ) > limitBd)
  mask |= (Abs( q2 - q1 ) > limitBd)
}
if ( filterLen >= 8 ) {
  mask |= (Abs( p3 - p2 ) > limitBd)
  mask |= (Abs( q3 - q2 ) > limitBd)
}
filterMask = (mask == 0)
The value of flatMask is only required when filterSize >= 8. It measures whether samples from each side of the specified boundary are in a flat region, that is, whether those samples are at most (1 << (BitDepth - 8)) different from the sample on the boundary. It is calculated as follows:
thresholdBd = 1 << (BitDepth - 8)
if ( filterSize >= 8 ) {
  mask = 0
  mask |= (Abs( p1 - p0 ) > thresholdBd)
  mask |= (Abs( q1 - q0 ) > thresholdBd)
  mask |= (Abs( p2 - p0 ) > thresholdBd)
  mask |= (Abs( q2 - q0 ) > thresholdBd)
  if ( filterLen >= 8 ) {
    mask |= (Abs( p3 - p0 ) > thresholdBd)
    mask |= (Abs( q3 - q0 ) > thresholdBd)
  }
  flatMask = (mask == 0)
}
The value of flatMask2 is only required when filterSize >= 16. It measures whether at least seven samples from each side of the specified boundary are in a flat region, assuming the first four on each side are (so the full region is flat only if flatMask and flatMask2 are both equal to 1). The value of flatMask2 is calculated as follows:
thresholdBd = 1 << (BitDepth - 8)
if ( filterSize >= 16 ) {
  mask = 0
  mask |= (Abs( p6 - p0 ) > thresholdBd)
  mask |= (Abs( q6 - q0 ) > thresholdBd)
  mask |= (Abs( p5 - p0 ) > thresholdBd)
  mask |= (Abs( q5 - q0 ) > thresholdBd)
  mask |= (Abs( p4 - p0 ) > thresholdBd)
  mask |= (Abs( q4 - q0 ) > thresholdBd)
  flatMask2 = (mask == 0)
}
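The four masks can be computed together from the samples on either side of the edge. The following is a sketch under stated assumptions: the function name filter_masks is hypothetical, the samples are passed as two lists p and q (index k holds pk or qk) rather than read from CurrFrame, and each mask accumulates its conditions with a logical OR.

```python
def filter_masks(p, q, limit, blimit, thresh, filter_size, plane, bit_depth=8):
    """Return (hevMask, filterMask, flatMask, flatMask2) as 0/1 integers.

    flatMask and flatMask2 are None when filter_size is too small to need them.
    """
    shift = bit_depth - 8
    thresh_bd = thresh << shift
    hev = int(abs(p[1] - p[0]) > thresh_bd or abs(q[1] - q[0]) > thresh_bd)

    if filter_size == 4:
        filter_len = 4
    elif plane != 0:
        filter_len = 6
    elif filter_size == 8:
        filter_len = 8
    else:
        filter_len = 16

    limit_bd = limit << shift
    blimit_bd = blimit << shift
    bad = (abs(p[1] - p[0]) > limit_bd or abs(q[1] - q[0]) > limit_bd or
           abs(p[0] - q[0]) * 2 + abs(p[1] - q[1]) // 2 > blimit_bd)
    if filter_len >= 6:
        bad = bad or abs(p[2] - p[1]) > limit_bd or abs(q[2] - q[1]) > limit_bd
    if filter_len >= 8:
        bad = bad or abs(p[3] - p[2]) > limit_bd or abs(q[3] - q[2]) > limit_bd
    filter_mask = int(not bad)

    one = 1 << shift
    flat = flat2 = None
    if filter_size >= 8:
        taps = 3 if filter_len >= 8 else 2
        flat = int(all(abs(s[k] - s[0]) <= one
                       for s in (p, q) for k in range(1, taps + 1)))
    if filter_size >= 16:
        flat2 = int(all(abs(s[k] - s[0]) <= one
                        for s in (p, q) for k in (4, 5, 6)))
    return hev, filter_mask, flat, flat2
```

A sharp step across the edge clears filterMask (so no filtering occurs) even though both sides are individually flat.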
Narrow filter process
The inputs to this filter are:

a variable hevMask specifying whether this is a high edge variance case,

variables x, y specifying the location within CurrFrame[ plane ],

a variable plane specifying whether the block is the Y, U or V plane,

variables dx and dy specifying the direction perpendicular to the edge being filtered.
This process modifies up to two samples on each side of the specified boundary depending on the value of hevMask as follows:

If hevMask is equal to 0 (i.e. the samples do not have high edge variance), this process modifies two samples on each side of the specified boundary, using a filter constructed from just the inner two (one from each side of the specified boundary).

Otherwise (the samples do have high edge variance), this process only modifies the one value on each side of the specified boundary, using a filter constructed from four input samples (two from each side of the specified boundary).
The process subtracts 0x80 << (BitDepth - 8) from the input sample values so that they are in the range -(1 << (BitDepth - 1)) to (1 << (BitDepth - 1)) - 1 inclusive. Intermediate values are made to be in this range by the following function:
filter4_clamp( value ) {
  return Clip3( -(1 << (BitDepth - 1)), (1 << (BitDepth - 1)) - 1, value )
}
The process is specified as follows:
q0 = CurrFrame[ plane ][ y ][ x ]
q1 = CurrFrame[ plane ][ y + dy ][ x + dx ]
p0 = CurrFrame[ plane ][ y - dy ][ x - dx ]
p1 = CurrFrame[ plane ][ y - dy * 2 ][ x - dx * 2 ]
ps1 = p1 - (0x80 << (BitDepth - 8))
ps0 = p0 - (0x80 << (BitDepth - 8))
qs0 = q0 - (0x80 << (BitDepth - 8))
qs1 = q1 - (0x80 << (BitDepth - 8))
filter = hevMask ? filter4_clamp( ps1 - qs1 ) : 0
filter = filter4_clamp( filter + 3 * (qs0 - ps0) )
filter1 = filter4_clamp( filter + 4 ) >> 3
filter2 = filter4_clamp( filter + 3 ) >> 3
oq0 = filter4_clamp( qs0 - filter1 ) + (0x80 << (BitDepth - 8))
op0 = filter4_clamp( ps0 + filter2 ) + (0x80 << (BitDepth - 8))
CurrFrame[ plane ][ y ][ x ] = oq0
CurrFrame[ plane ][ y - dy ][ x - dx ] = op0
if ( !hevMask ) {
  filter = Round2( filter1, 1 )
  oq1 = filter4_clamp( qs1 - filter ) + (0x80 << (BitDepth - 8))
  op1 = filter4_clamp( ps1 + filter ) + (0x80 << (BitDepth - 8))
  CurrFrame[ plane ][ y + dy ][ x + dx ] = oq1
  CurrFrame[ plane ][ y - dy * 2 ][ x - dx * 2 ] = op1
}
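As an illustration, the narrow filter can be applied to four samples taken across the edge. This is a sketch, not the normative process: narrow_filter is a hypothetical helper that takes and returns sample values instead of modifying CurrFrame, and it assumes subtractions where the pseudocode combines ps/qs values and the 0x80 offset.

```python
def clip3(lo, hi, x):
    """Clamp x to the inclusive range [lo, hi]."""
    return max(lo, min(hi, x))


def round2(x, n):
    """Rounding right shift; Python's >> floors like an arithmetic shift."""
    return (x + (1 << (n - 1))) >> n if n else x


def narrow_filter(p1, p0, q0, q1, hev_mask, bit_depth=8):
    """Return the filtered samples as (p1, p0, q0, q1)."""
    lo, hi = -(1 << (bit_depth - 1)), (1 << (bit_depth - 1)) - 1

    def clamp(v):
        return clip3(lo, hi, v)

    off = 0x80 << (bit_depth - 8)  # offset that centres samples on zero
    ps1, ps0, qs0, qs1 = p1 - off, p0 - off, q0 - off, q1 - off
    f = clamp(ps1 - qs1) if hev_mask else 0
    f = clamp(f + 3 * (qs0 - ps0))
    f1 = clamp(f + 4) >> 3
    f2 = clamp(f + 3) >> 3
    oq0 = clamp(qs0 - f1) + off
    op0 = clamp(ps0 + f2) + off
    if hev_mask:
        # High edge variance: only the inner sample on each side changes.
        return p1, op0, oq0, q1
    f = round2(f1, 1)
    return clamp(ps1 + f) + off, op0, oq0, clamp(qs1 - f) + off
```

For a small step from 100 to 108 without high edge variance, all four samples move toward each other, giving a smoother ramp.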
Wide filter process
The inputs to this filter are:

variables x, y specifying the location within CurrFrame[ plane ],

a variable plane specifying whether the block is the Y, U or V plane,

variables dx and dy specifying the direction perpendicular to the edge being filtered,

a variable log2Size specifying the base 2 logarithm of the number of taps.
This filter is only applied when samples from each side of the boundary are detected to be in a flat region.
The variable n (specifying the number of filter taps on each side of the central sample) is set as follows:

If log2Size is equal to 4, n is set equal to 6.

Otherwise if plane is equal to 0, n is set equal to 3.

Otherwise (log2Size is equal to 3 and plane is greater than 0), n is set equal to 2.
The variable n2 (specifying the number of filter taps equal to 2 on each side of the central sample needed to give a unity DC gain) is set as follows:

If log2Size is equal to 3 and plane is equal to 0, n2 is set equal to 0.

Otherwise (log2Size is equal to 4 or plane is greater than 0), n2 is set equal to 1.
This process modifies the samples on each side of the specified boundary by applying a low pass filter as follows:
for ( i = -n; i < n; i++ ) {
  t = 0
  for ( j = -n; j <= n; j++ ) {
    p = Clip3( -( n + 1 ), n, i + j )
    tap = ( Abs( j ) <= n2 ) ? 2 : 1
    t += CurrFrame[ plane ][ y + p * dy ][ x + p * dx ] * tap
  }
  F[ i ] = Round2( t, log2Size )
}
for ( i = -n; i < n; i++ )
  CurrFrame[ plane ][ y + i * dy ][ x + i * dx ] = F[ i ]
where F is an array with indices from -n to n - 1 used to store the filtered results.
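The wide filter is easier to follow on a one-dimensional line of samples. The sketch below is illustrative only: wide_filter is a hypothetical helper operating on a Python list in which line[edge] is q0 and line[edge - 1] is p0, and it assumes the loops run from -n (with the clip lower bound at -(n + 1)).

```python
def wide_filter(line, edge, log2_size, plane):
    """Return a filtered copy of `line` across the boundary before `edge`."""
    n = 6 if log2_size == 4 else (3 if plane == 0 else 2)
    n2 = 0 if (log2_size == 3 and plane == 0) else 1
    out = list(line)
    for i in range(-n, n):
        t = 0
        for j in range(-n, n + 1):
            # Taps that would reach past the filter support reuse the end sample.
            p = max(-(n + 1), min(n, i + j))
            tap = 2 if abs(j) <= n2 else 1
            t += line[edge + p] * tap
        out[edge + i] = (t + (1 << (log2_size - 1))) >> log2_size
    return out
```

Applied to a step from 0 to 8 with the 8-tap luma filter, the step becomes a ramp, while a constant line is left unchanged because the taps sum to 1 << log2Size.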
CDEF process
Input to this process is the array CurrFrame of reconstructed samples.
Output from this process is the array CdefFrame containing deringed samples.
The purpose of CDEF is to perform deringing based on the detected direction of blocks.
CDEF parameters are stored for each 64 by 64 block of luma samples.
The CDEF filter is applied on each 8 by 8 block as follows:
step4 = Num_4x4_Blocks_Wide[ BLOCK_8X8 ]
cdefSize4 = Num_4x4_Blocks_Wide[ BLOCK_64X64 ]
cdefMask4 = ~(cdefSize4 - 1)
for ( r = 0; r < MiRows; r += step4 ) {
for ( c = 0; c < MiCols; c += step4 ) {
baseR = r & cdefMask4
baseC = c & cdefMask4
idx = cdef_idx[ baseR ][ baseC ]
cdef_block(r, c, idx)
}
}
When the cdef_block function is called, the CDEF block process specified in CDEF block process is invoked with r, c, and idx as inputs.
CDEF block process
The inputs to this process are:

variables r and c specifying the location of an 8x8 block in units of 4x4 blocks in the luma plane,

a variable idx specifying which set of CDEF parameters to use, or -1 to signal that no filtering should be applied.
The block is first copied to the CdefFrame as follows:
startY = r * MI_SIZE
endY = startY + MI_SIZE * 2
startX = c * MI_SIZE
endX = startX + MI_SIZE * 2
for ( y = startY; y < endY; y++ ) {
for ( x = startX; x < endX; x++ ) {
CdefFrame[ 0 ][ y ][ x ] = CurrFrame[ 0 ][ y ][ x ]
}
}
if ( NumPlanes > 1 ) {
startY >>= subsampling_y
endY >>= subsampling_y
startX >>= subsampling_x
endX >>= subsampling_x
for ( y = startY; y < endY; y++ ) {
for ( x = startX; x < endX; x++ ) {
CdefFrame[ 1 ][ y ][ x ] = CurrFrame[ 1 ][ y ][ x ]
CdefFrame[ 2 ][ y ][ x ] = CurrFrame[ 2 ][ y ][ x ]
}
}
}
Note: If CDEF filtering turns out to be needed, then the contents of CdefFrame will be overwritten later in this process.
If idx is equal to -1, then the process returns immediately after performing this copy.
The variable coeffShift is set equal to BitDepth - 8.
The variable skip is set equal to ( Skips[ r ][ c ] && Skips[ r + 1 ][ c ] && Skips[ r ][ c + 1 ] && Skips[ r + 1 ][ c + 1 ] ).
If skip is equal to 0, the CDEF direction process specified in CDEF direction process is invoked with r and c as inputs, and the outputs assigned to variables yDir and var.
If skip is equal to 0, the following ordered steps apply:

The variable priStr is set equal to cdef_y_pri_strength[ idx ] << coeffShift.

The variable secStr is set equal to cdef_y_sec_strength[ idx ] << coeffShift.

The variable dir is set equal to ( priStr == 0 ) ? 0 : yDir.

The variable varStr is set equal to ( var >> 6 ) ? Min( FloorLog2( var >> 6 ), 12) : 0.

The variable priStr is set equal to ( var ? ( priStr * ( 4 + varStr ) + 8 ) >> 4 : 0 ).

The variable damping is set equal to CdefDamping + coeffShift.

The CDEF filter process specified in CDEF filter process is invoked with plane equal to 0, r, c, priStr, secStr, damping, and dir as input.

If NumPlanes is equal to 1, the process terminates at this point (i.e. filtering is not done for the U and V planes).

The variable priStr is set equal to cdef_uv_pri_strength[ idx ] << coeffShift.

The variable secStr is set equal to cdef_uv_sec_strength[ idx ] << coeffShift.

The variable dir is set equal to ( priStr == 0 ) ? 0 : Cdef_Uv_Dir[ subsampling_x ][ subsampling_y ][ yDir ].

The variable damping is set equal to CdefDamping + coeffShift - 1.

The CDEF filter process specified in CDEF filter process is invoked with plane equal to 1, r, c, priStr, secStr, damping, and dir as input.

The CDEF filter process specified in CDEF filter process is invoked with plane equal to 2, r, c, priStr, secStr, damping, and dir as input.
Cdef_Uv_Dir is a constant lookup table defined as:
Cdef_Uv_Dir[ 2 ][ 2 ][ 8 ] = {
{ {0, 1, 2, 3, 4, 5, 6, 7},
{1, 2, 2, 2, 3, 4, 6, 0} },
{ {7, 0, 2, 4, 5, 6, 6, 6},
{0, 1, 2, 3, 4, 5, 6, 7} }
}
CDEF direction process
The inputs to this process are variables r and c specifying the location of an 8x8 block in units of 4x4 blocks in the luma plane.
The outputs of this process are:

a variable yDir containing the direction of this block,

a variable var containing the variance for this block.
This process uses luma samples to measure the direction and variance of a block.
The process is specified as:
for ( i = 0; i < 8; i++ ) {
  cost[i] = 0
  for ( j = 0; j < 15; j++ )
    partial[i][j] = 0
}
bestCost = 0
yDir = 0
x0 = c << MI_SIZE_LOG2
y0 = r << MI_SIZE_LOG2
for ( i = 0; i < 8; i++ ) {
  for ( j = 0; j < 8; j++ ) {
    x = (CurrFrame[ 0 ][y0 + i][x0 + j] >> (BitDepth - 8)) - 128
    partial[0][i + j] += x
    partial[1][i + j / 2] += x
    partial[2][i] += x
    partial[3][3 + i - j / 2] += x
    partial[4][7 + i - j] += x
    partial[5][3 - i / 2 + j] += x
    partial[6][j] += x
    partial[7][i / 2 + j] += x
  }
}
for ( i = 0; i < 8; i++ ) {
  cost[2] += partial[2][i] * partial[2][i]
  cost[6] += partial[6][i] * partial[6][i]
}
cost[2] *= Div_Table[8]
cost[6] *= Div_Table[8]
for ( i = 0; i < 7; i++ ) {
  cost[0] += (partial[0][i] * partial[0][i] +
              partial[0][14 - i] * partial[0][14 - i]) *
             Div_Table[i + 1]
  cost[4] += (partial[4][i] * partial[4][i] +
              partial[4][14 - i] * partial[4][14 - i]) *
             Div_Table[i + 1]
}
cost[0] += partial[0][7] * partial[0][7] * Div_Table[8]
cost[4] += partial[4][7] * partial[4][7] * Div_Table[8]
for ( i = 1; i < 8; i += 2 ) {
  for ( j = 0; j < 4 + 1; j++ ) {
    cost[i] += partial[i][3 + j] * partial[i][3 + j]
  }
  cost[i] *= Div_Table[8]
  for ( j = 0; j < 4 - 1; j++ ) {
    cost[i] += (partial[i][j] * partial[i][j] +
                partial[i][10 - j] * partial[i][10 - j]) *
               Div_Table[2 * j + 2]
  }
}
for ( i = 0; i < 8; i++ ) {
  if ( cost[i] > bestCost ) {
    bestCost = cost[i]
    yDir = i
  }
}
var = (bestCost - cost[(yDir + 4) & 7]) >> 10
where the Div_Table is a constant lookup table specified as:
Div_Table[9] = {
0, 840, 420, 280, 210, 168, 140, 120, 105
}
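The direction search can be tried on a standalone 8x8 block. The following is a sketch under stated assumptions: cdef_direction is a hypothetical helper that takes the block as a list of eight rows rather than reading CurrFrame, and it assumes the index expressions use subtraction where the pseudocode combines i, j, and constants (for example 3 + i - j / 2).

```python
DIV_TABLE = [0, 840, 420, 280, 210, 168, 140, 120, 105]


def cdef_direction(block, bit_depth=8):
    """Return (yDir, var) for an 8x8 block given as a list of rows."""
    cost = [0] * 8
    partial = [[0] * 15 for _ in range(8)]
    for i in range(8):
        for j in range(8):
            x = (block[i][j] >> (bit_depth - 8)) - 128
            partial[0][i + j] += x
            partial[1][i + j // 2] += x
            partial[2][i] += x
            partial[3][3 + i - j // 2] += x
            partial[4][7 + i - j] += x
            partial[5][3 - i // 2 + j] += x
            partial[6][j] += x
            partial[7][i // 2 + j] += x
    for i in range(8):
        cost[2] += partial[2][i] ** 2
        cost[6] += partial[6][i] ** 2
    cost[2] *= DIV_TABLE[8]
    cost[6] *= DIV_TABLE[8]
    for i in range(7):
        cost[0] += (partial[0][i] ** 2 + partial[0][14 - i] ** 2) * DIV_TABLE[i + 1]
        cost[4] += (partial[4][i] ** 2 + partial[4][14 - i] ** 2) * DIV_TABLE[i + 1]
    cost[0] += partial[0][7] ** 2 * DIV_TABLE[8]
    cost[4] += partial[4][7] ** 2 * DIV_TABLE[8]
    for i in range(1, 8, 2):
        for j in range(5):
            cost[i] += partial[i][3 + j] ** 2
        cost[i] *= DIV_TABLE[8]
        for j in range(3):
            cost[i] += (partial[i][j] ** 2 + partial[i][10 - j] ** 2) * DIV_TABLE[2 * j + 2]
    best_cost, y_dir = 0, 0
    for i in range(8):
        if cost[i] > best_cost:
            best_cost, y_dir = cost[i], i
    var = (best_cost - cost[(y_dir + 4) & 7]) >> 10
    return y_dir, var
```

Each cost is a weighted sum of squared line sums, so it is largest for the direction whose lines are closest to constant. A block whose rows are constant therefore selects direction 2, and a block whose columns are constant selects direction 6.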
CDEF filter process
The inputs to this process are:

a variable plane specifying which plane is being filtered,

variables r and c specifying the location of an 8x8 block in units of 4x4 blocks in the luma plane,

a variable priStr specifying the primary filter strength,

a variable secStr specifying the secondary filter strength,

a variable damping specifying a shift used for damping,

a variable dir specifying the detected direction of the block.
The process modifies samples in CdefFrame based on filtering samples from CurrFrame.
MiColStart, MiRowStart, MiColEnd, MiRowEnd are set equal to the values they had when the syntax element MiSizes[ r ][ c ] was written.
Note: These variables are used by the is_inside_filter_region function to determine which samples are available for use in filtering.
The variable coeffShift is set equal to BitDepth - 8.
The filtering is applied as follows:
subX = (plane > 0) ? subsampling_x : 0
subY = (plane > 0) ? subsampling_y : 0
x0 = (c * MI_SIZE ) >> subX
y0 = (r * MI_SIZE ) >> subY
w = 8 >> subX
h = 8 >> subY
for ( i = 0; i < h; i++ ) {
  for ( j = 0; j < w; j++ ) {
    sum = 0
    x = CurrFrame[plane][y0 + i][x0 + j]
    max = x
    min = x
    for ( k = 0; k < 2; k++ ) {
      for ( sign = -1; sign <= 1; sign += 2 ) {
        p = cdef_get_at(plane, x0, y0, i, j, dir, k, sign, subX, subY)
        if ( CdefAvailable ) {
          sum += Cdef_Pri_Taps[(priStr >> coeffShift) & 1][k] * constrain(p - x, priStr, damping)
          max = Max(p, max)
          min = Min(p, min)
        }
        for ( dirOff = -2; dirOff <= 2; dirOff += 4) {
          s = cdef_get_at(plane, x0, y0, i, j, (dir + dirOff) & 7, k, sign, subX, subY)
          if ( CdefAvailable ) {
            sum += Cdef_Sec_Taps[(priStr >> coeffShift) & 1][k] * constrain(s - x, secStr, damping)
            max = Max(s, max)
            min = Min(s, min)
          }
        }
      }
    }
    CdefFrame[plane][y0 + i][x0 + j] = Clip3(min, max, x + ((8 + sum - (sum < 0)) >> 4) )
  }
}
where Cdef_Pri_Taps and Cdef_Sec_Taps are constant lookup tables specified as:
Cdef_Pri_Taps[2][2] = {
{ 4, 2 }, { 3, 3 }
}
Cdef_Sec_Taps[2][2] = {
{ 2, 1 }, { 2, 1 }
}
constrain is specified as:
constrain(diff, threshold, damping) {
  if ( !threshold )
    return 0
  dampingAdj = Max(0, damping - FloorLog2( threshold ) )
  sign = (diff < 0) ? -1 : 1
  return sign * Clip3(0, Abs(diff), threshold - (Abs(diff) >> dampingAdj) )
}
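The behaviour of constrain is easiest to see with a few values. The sketch below mirrors the function in Python; floor_log2 is a helper introduced here for illustration, and the code assumes the sign takes the value -1 for negative differences.

```python
def floor_log2(x):
    """Floor of log2(x) for a positive integer x."""
    return x.bit_length() - 1


def constrain(diff, threshold, damping):
    """Limit a sample difference so that large differences contribute nothing."""
    if not threshold:
        return 0
    damping_adj = max(0, damping - floor_log2(threshold))
    mag = threshold - (abs(diff) >> damping_adj)
    mag = max(0, min(abs(diff), mag))  # Clip3(0, Abs(diff), mag)
    return -mag if diff < 0 else mag
```

Small differences pass through unchanged, medium differences are reduced, and differences well above the threshold are mapped to 0, which is what stops the filter from blurring across strong edges.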
cdef_get_at fetches a sample from CurrFrame and sets CdefAvailable according to whether the sample is available. cdef_get_at is specified as:
cdef_get_at(plane, x0, y0, i, j, dir, k, sign, subX, subY) {
y = y0 + i + sign * Cdef_Directions[dir][k][0]
x = x0 + j + sign * Cdef_Directions[dir][k][1]
candidateR = (y << subY) >> MI_SIZE_LOG2
candidateC = (x << subX) >> MI_SIZE_LOG2
if ( is_inside_filter_region( candidateR, candidateC ) ) {
CdefAvailable = 1
return CurrFrame[ plane ][ y ][ x ]
} else {
CdefAvailable = 0
return 0
}
}
where Cdef_Directions is a constant lookup table defined as:
Cdef_Directions[8][2][2] = {
  { { -1, 1 }, { -2, 2 } },
  { { 0, 1 }, { -1, 2 } },
  { { 0, 1 }, { 0, 2 } },
  { { 0, 1 }, { 1, 2 } },
  { { 1, 1 }, { 2, 2 } },
  { { 1, 0 }, { 2, 1 } },
  { { 1, 0 }, { 2, 0 } },
  { { 1, 0 }, { 2, -1 } }
}
Upscaling process
Input to this process is an array inputFrame of width FrameWidth and height FrameHeight.
The output of this process is a horizontally upscaled frame of width UpscaledWidth and height FrameHeight.
If use_superres is equal to 0, no upscaling is required and this process returns inputFrame.
This process is specified as:
for ( plane = 0; plane < NumPlanes; plane++ ) {
  if ( plane > 0 ) {
    subX = subsampling_x
    subY = subsampling_y
  } else {
    subX = 0
    subY = 0
  }
  downscaledPlaneW = Round2(FrameWidth, subX)
  upscaledPlaneW = Round2(UpscaledWidth, subX)
  planeH = Round2(FrameHeight, subY)
  stepX = ((downscaledPlaneW << SUPERRES_SCALE_BITS) + (upscaledPlaneW / 2)) / upscaledPlaneW
  err = (upscaledPlaneW * stepX) - (downscaledPlaneW << SUPERRES_SCALE_BITS)
  initialSubpelX =
    (-((upscaledPlaneW - downscaledPlaneW) << (SUPERRES_SCALE_BITS - 1)) + upscaledPlaneW / 2) / upscaledPlaneW +
    (1 << (SUPERRES_EXTRA_BITS - 1)) - err / 2
  initialSubpelX &= SUPERRES_SCALE_MASK
  miW = MiCols >> subX
  minX = 0
  maxX = miW * MI_SIZE - 1
  for ( y = 0; y < planeH; y++ ) {
    for ( x = 0; x < upscaledPlaneW; x++ ) {
      srcX = -(1 << SUPERRES_SCALE_BITS) + initialSubpelX + x*stepX
      srcXPx = (srcX >> SUPERRES_SCALE_BITS)
      srcXSubpel = (srcX & SUPERRES_SCALE_MASK) >> SUPERRES_EXTRA_BITS
      sum = 0
      for ( k = 0; k < SUPERRES_FILTER_TAPS; k++ ) {
        sampleX = Clip3(minX, maxX, srcXPx + (k - SUPERRES_FILTER_OFFSET))
        px = inputFrame[plane][y][sampleX]
        sum += px * Upscale_Filter[srcXSubpel][k]
      }
      outputFrame[plane][y][x] = Clip1(Round2(sum, FILTER_BITS))
    }
  }
}
where Upscale_Filter is specified as:
Upscale_Filter[SUPERRES_FILTER_SHIFTS][SUPERRES_FILTER_TAPS] = {
  { 0, 0, 0, 128, 0, 0, 0, 0 }, { 0, 0, -1, 128, 2, -1, 0, 0 },
  { 0, 1, -3, 127, 4, -2, 1, 0 }, { 0, 1, -4, 127, 6, -3, 1, 0 },
  { 0, 2, -6, 126, 8, -3, 1, 0 }, { 0, 2, -7, 125, 11, -4, 1, 0 },
  { -1, 2, -8, 125, 13, -5, 2, 0 }, { -1, 3, -9, 124, 15, -6, 2, 0 },
  { -1, 3, -10, 123, 18, -6, 2, -1 }, { -1, 3, -11, 122, 20, -7, 3, -1 },
  { -1, 4, -12, 121, 22, -8, 3, -1 }, { -1, 4, -13, 120, 25, -9, 3, -1 },
  { -1, 4, -14, 118, 28, -9, 3, -1 }, { -1, 4, -15, 117, 30, -10, 4, -1 },
  { -1, 5, -16, 116, 32, -11, 4, -1 }, { -1, 5, -16, 114, 35, -12, 4, -1 },
  { -1, 5, -17, 112, 38, -12, 4, -1 }, { -1, 5, -18, 111, 40, -13, 5, -1 },
  { -1, 5, -18, 109, 43, -14, 5, -1 }, { -1, 6, -19, 107, 45, -14, 5, -1 },
  { -1, 6, -19, 105, 48, -15, 5, -1 }, { -1, 6, -19, 103, 51, -16, 5, -1 },
  { -1, 6, -20, 101, 53, -16, 6, -1 }, { -1, 6, -20, 99, 56, -17, 6, -1 },
  { -1, 6, -20, 97, 58, -17, 6, -1 }, { -1, 6, -20, 95, 61, -18, 6, -1 },
  { -2, 7, -20, 93, 64, -18, 6, -2 }, { -2, 7, -20, 91, 66, -19, 6, -1 },
  { -2, 7, -20, 88, 69, -19, 6, -1 }, { -2, 7, -20, 86, 71, -19, 6, -1 },
  { -2, 7, -20, 84, 74, -20, 7, -2 }, { -2, 7, -20, 81, 76, -20, 7, -1 },
  { -2, 7, -20, 79, 79, -20, 7, -2 }, { -1, 7, -20, 76, 81, -20, 7, -2 },
  { -2, 7, -20, 74, 84, -20, 7, -2 }, { -1, 6, -19, 71, 86, -20, 7, -2 },
  { -1, 6, -19, 69, 88, -20, 7, -2 }, { -1, 6, -19, 66, 91, -20, 7, -2 },
  { -2, 6, -18, 64, 93, -20, 7, -2 }, { -1, 6, -18, 61, 95, -20, 6, -1 },
  { -1, 6, -17, 58, 97, -20, 6, -1 }, { -1, 6, -17, 56, 99, -20, 6, -1 },
  { -1, 6, -16, 53, 101, -20, 6, -1 }, { -1, 5, -16, 51, 103, -19, 6, -1 },
  { -1, 5, -15, 48, 105, -19, 6, -1 }, { -1, 5, -14, 45, 107, -19, 6, -1 },
  { -1, 5, -14, 43, 109, -18, 5, -1 }, { -1, 5, -13, 40, 111, -18, 5, -1 },
  { -1, 4, -12, 38, 112, -17, 5, -1 }, { -1, 4, -12, 35, 114, -16, 5, -1 },
  { -1, 4, -11, 32, 116, -16, 5, -1 }, { -1, 4, -10, 30, 117, -15, 4, -1 },
  { -1, 3, -9, 28, 118, -14, 4, -1 }, { -1, 3, -9, 25, 120, -13, 4, -1 },
  { -1, 3, -8, 22, 121, -12, 4, -1 }, { -1, 3, -7, 20, 122, -11, 3, -1 },
  { -1, 2, -6, 18, 123, -10, 3, -1 }, { 0, 2, -6, 15, 124, -9, 3, -1 },
  { 0, 2, -5, 13, 125, -8, 2, -1 }, { 0, 1, -4, 11, 125, -7, 2, 0 },
  { 0, 1, -3, 8, 126, -6, 2, 0 }, { 0, 1, -3, 6, 127, -4, 1, 0 },
  { 0, 1, -2, 4, 127, -3, 1, 0 }, { 0, 0, -1, 2, 128, -1, 0, 0 },
}
It is a requirement of bitstream conformance that upscaledPlaneW is strictly greater than downscaledPlaneW.
The output of this process is equal to outputFrame.
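The position arithmetic of the upscaling process can be checked in isolation. The sketch below is illustrative and rests on several assumptions that are labeled in the code: the constant values SUPERRES_SCALE_BITS = 14 and SUPERRES_EXTRA_BITS = 8, integer division that truncates toward zero, and a srcX that starts at -(1 << SUPERRES_SCALE_BITS). The helper names are hypothetical.

```python
SUPERRES_SCALE_BITS = 14  # assumed constant value
SUPERRES_EXTRA_BITS = 8   # assumed constant value
SUPERRES_SCALE_MASK = (1 << SUPERRES_SCALE_BITS) - 1


def div_trunc(a, b):
    """Integer division truncating toward zero (assumed meaning of '/')."""
    q = abs(a) // abs(b)
    return q if (a < 0) == (b < 0) else -q


def upscale_positions(down_w, up_w):
    """Return (stepX, initialSubpelX) for one plane."""
    step_x = ((down_w << SUPERRES_SCALE_BITS) + up_w // 2) // up_w
    err = up_w * step_x - (down_w << SUPERRES_SCALE_BITS)
    numerator = -((up_w - down_w) << (SUPERRES_SCALE_BITS - 1)) + up_w // 2
    initial = (div_trunc(numerator, up_w)
               + (1 << (SUPERRES_EXTRA_BITS - 1)) - div_trunc(err, 2))
    return step_x, initial & SUPERRES_SCALE_MASK


def source_position(x, step_x, initial):
    """Return (srcXPx, srcXSubpel) for output column x."""
    src_x = -(1 << SUPERRES_SCALE_BITS) + initial + x * step_x
    return (src_x >> SUPERRES_SCALE_BITS,
            (src_x & SUPERRES_SCALE_MASK) >> SUPERRES_EXTRA_BITS)
```

For a 2x upscale from 100 to 200 samples, the step is half a source sample (8192 in 14-bit fixed point), and the first two output columns land a quarter sample before and a quarter sample after source sample 0.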
Loop restoration process
The inputs to this process are the arrays UpscaledCurrFrame (of reconstructed samples) and UpscaledCdefFrame (of deringed samples).
Output from this process is the array LrFrame of loop restored samples.
Note: Although this process loops over 4x4 blocks, loop restoration is designed to work in stripes 64 luma samples high without needing additional line buffers. Samples within the current stripe are fetched from UpscaledCdefFrame. Samples outside the current stripe are fetched from UpscaledCurrFrame (these samples will be deblocked, but will not have CDEF filtering applied).
The array LrFrame is set equal to a copy of UpscaledCdefFrame. (The contents of LrFrame will later be overwritten for blocks that require restoration filtering.)
If UsesLr is equal to 0, then the process returns immediately after performing this copy.
Otherwise, loop restoration is applied as follows:
for ( y = 0; y < FrameHeight; y += MI_SIZE ) {
for ( x = 0; x < UpscaledWidth; x += MI_SIZE ) {
for ( plane = 0; plane < NumPlanes; plane++ ) {
if ( FrameRestorationType[ plane ] != RESTORE_NONE ) {
row = y >> MI_SIZE_LOG2
col = x >> MI_SIZE_LOG2
loop_restore_block( plane, row, col )
}
}
}
}
When loop_restore_block is called, the loop restore block process in Loop restore block process is invoked with plane, row, and col as inputs.
Loop restore block process
The inputs to this process are:

a variable plane specifying whether the process is filtering Y, U, or V samples,

variables row and col specifying the location of the block in units of 4x4 blocks in the upscaled luma plane.
The outputs of this process are samples in LrFrame[ plane ].
The variable lumaY is set equal to row * MI_SIZE.
The variable stripeNum (specifying the zero-based index of the current stripe) is set equal to (lumaY + 8) / 64.
Note: The stripes are offset upwards by 8 luma samples to make pipelined implementations more efficient. When a row of superblocks has been received, enough rows of deblocked output can be produced to allow loop restoration of the corresponding stripes.
The variables subX and subY are set equal to the subsampling for the current plane as follows:

If plane is equal to 0, subX is set equal to 0 and subY is set equal to 0.

Otherwise, subX is set equal to subsampling_x and subY is set equal to subsampling_y.
The variable StripeStartY (specifying the start of the stripe in units of samples in the current plane) is set equal to ( (-8 + stripeNum * 64) >> subY ).
The variable StripeEndY (specifying the end of the stripe in units of samples in the current plane) is set equal to StripeStartY + (64 >> subY) - 1.
Note: StripeStartY and StripeEndY are used by the get source sample process to decide whether to fetch from UpscaledCurrFrame or UpscaledCdefFrame.
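The stripe arithmetic can be sketched as follows. This is an illustration only: stripe_bounds is a hypothetical helper, MI_SIZE is assumed to be 4, and the stripe start is assumed to be offset by -8 luma samples, matching the upward offset described in the note on stripeNum.

```python
def stripe_bounds(row, sub_y, mi_size=4):
    """Return (stripeNum, StripeStartY, StripeEndY) for a row of 4x4 blocks."""
    luma_y = row * mi_size
    stripe_num = (luma_y + 8) // 64
    # Python's >> floors negative values, like an arithmetic right shift.
    start = (-8 + stripe_num * 64) >> sub_y
    end = start + (64 >> sub_y) - 1
    return stripe_num, start, end
```

The first stripe therefore starts above the frame: for row 0 of the luma plane it spans sample rows -8 to 55.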
The variable unitSize (specifying the size of restoration units in units of samples in the current plane) is set equal to LoopRestorationSize[ plane ].
The variable unitRows (specifying the number of restoration units down the frame) is set equal to count_units_in_frame( unitSize, Round2( FrameHeight, subY) ).
The variable unitCols (specifying the number of restoration units across the frame) is set equal to count_units_in_frame( unitSize, Round2( UpscaledWidth, subX ) ).
Note: The number of restoration units in a frame can be different for chroma and luma.
The variable unitRow (specifying the vertical index of the current loop restoration unit) is set equal to Min( unitRows - 1, ( ( row * MI_SIZE + 8) >> subY ) / unitSize ).
The variable unitCol (specifying the horizontal index of the current loop restoration unit) is set equal to Min( unitCols - 1, ( col * MI_SIZE >> subX ) / unitSize ).
The horizontal extent of the space allowed for filtering is specified as follows:
The variable PlaneEndX (specifying the horizontal extent of the space allowed for filtering) is set equal to Round2( UpscaledWidth, subX ) - 1.
The variable PlaneEndY (specifying the vertical extent of the space allowed for filtering) is set equal to Round2( FrameHeight, subY ) - 1.
The variable x is set equal to ( col * MI_SIZE >> subX ).
The variable y is set equal to ( row * MI_SIZE >> subY ).
The variable w is set equal to Min( MI_SIZE >> subX, PlaneEndX - x + 1 ).
The variable h is set equal to Min( MI_SIZE >> subY, PlaneEndY - y + 1 ).
(Variables x and y represent the position of the block in samples relative to the top-left corner of the current plane. Variables w and h represent the size of the block in samples.)
Note: Although the filter is described as operating on small blocks, the output will be the same if larger blocks are used, provided all contained samples belong to the same loop restoration unit.
The variable rType (specifying the loop restoration type) is set equal to LrType[ plane ][ unitRow ][ unitCol ].
The filter to be used depends on rType as follows:

If rType is equal to RESTORE_WIENER, the Wiener filter process specified in Wiener filter process is invoked with plane, unitRow, unitCol, x, y, w, and h as inputs.

Otherwise, if rType is equal to RESTORE_SGRPROJ, the self guided filter process specified in Self guided filter process is invoked with plane, unitRow, unitCol, x, y, w, and h as inputs.

Otherwise (rType is equal to RESTORE_NONE), no filtering is applied.
Self guided filter process
The inputs to this process are:

a variable plane specifying whether the process is filtering Y, U, or V samples,

variables unitRow and unitCol specifying the position of the loop restoration unit,

variables x and y specifying the position of the block in samples relative to the top-left corner of the current plane,

variables w and h specifying the size of the block in samples.
The arrays flt0 and flt1 are prepared by the following ordered steps:

The variable set is set equal to LrSgrSet[ plane ][ unitRow ][ unitCol ].

The variable pass is set equal to 0.

The box filter process specified in Box filter process is invoked with plane, x, y, w, h, set, and pass as inputs, and the output assigned to flt0.

The variable pass is set equal to 1.

The box filter process specified in Box filter process is invoked with plane, x, y, w, h, set, and pass as inputs, and the output assigned to flt1.
The restoration process is then applied for each sample as follows:
w0 = LrSgrXqd[ plane ][ unitRow ][ unitCol ][ 0 ]
w1 = LrSgrXqd[ plane ][ unitRow ][ unitCol ][ 1 ]
w2 = (1 << SGRPROJ_PRJ_BITS) - w0 - w1
r0 = Sgr_Params[ set ][ 0 ]
r1 = Sgr_Params[ set ][ 2 ]
for ( i = 0; i < h; i++ ) {
for ( j = 0; j < w; j++ ) {
u = UpscaledCdefFrame[ plane ][ y + i ][ x + j ] << SGRPROJ_RST_BITS
v = w1 * u
if ( r0 )
v += w0 * flt0[ i ][ j ]
else
v += w0 * u
if ( r1 )
v += w2 * flt1[ i ][ j ]
else
v += w2 * u
s = Round2( v, SGRPROJ_RST_BITS + SGRPROJ_PRJ_BITS )
LrFrame[ plane ][ y + i ][ x + j ] = Clip1( s )
}
}
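The per-sample projection can be sketched as a small function. This is an illustration under stated assumptions: the constants SGRPROJ_RST_BITS = 4 and SGRPROJ_PRJ_BITS = 7 are assumed values, w2 is assumed to be the remainder (1 << SGRPROJ_PRJ_BITS) - w0 - w1, and sgr_project is a hypothetical helper that takes one sample and its two box filter outputs.

```python
SGRPROJ_RST_BITS = 4  # assumed constant value
SGRPROJ_PRJ_BITS = 7  # assumed constant value


def sgr_project(pixel, f0, f1, w0, w1, r0, r1, bit_depth=8):
    """Blend a sample with its two box filter outputs f0 and f1."""
    u = pixel << SGRPROJ_RST_BITS
    w2 = (1 << SGRPROJ_PRJ_BITS) - w0 - w1
    v = w1 * u
    v += w0 * (f0 if r0 else u)  # a radius of 0 falls back to the unfiltered sample
    v += w2 * (f1 if r1 else u)
    shift = SGRPROJ_RST_BITS + SGRPROJ_PRJ_BITS
    s = (v + (1 << (shift - 1))) >> shift  # Round2
    return max(0, min((1 << bit_depth) - 1, s))  # Clip1
```

Because the three weights sum to 1 << SGRPROJ_PRJ_BITS, the sample is returned unchanged whenever both filter outputs equal the upshifted input.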
Box filter process
The inputs to this process are:

a variable plane specifying whether the process is filtering Y, U, or V samples,

variables x and y specifying the position of the block in samples relative to the top-left corner of the current plane,

variables w and h specifying the size of the block in samples,

a variable set specifying the strength of the filtering,

a variable pass (equal to 0 or 1), specifying if the process is generating the first or second filtered output.
The output of this process is a 2d array F.
The variable r (specifying that the box filters have side length 2*r+1) is set equal to Sgr_Params[ set ][ pass * 2 + 0 ].
If r is equal to 0, then this process immediately terminates.
The variable eps (specifying a scaling for the output) is set equal to Sgr_Params[ set ][ pass * 2 + 1 ].
The 2d arrays A and B (note these arrays are valid for coordinates including an extra sample around the boundary) are prepared as follows:
n = ( 2 * r + 1 ) * ( 2 * r + 1 )
n2e = n * n * eps
s = (((1 << SGRPROJ_MTABLE_BITS) + n2e / 2) / n2e)
for ( i = -1; i < h + 1; i++ ) {
for ( j = -1; j < w + 1; j++ ) {
a = 0
b = 0
for ( dy = -r; dy <= r; dy++ ) {
for ( dx = -r; dx <= r; dx++ ) {
c = get_source_sample( plane, x + j + dx, y + i + dy )
a += c * c
b += c
}
}
a = Round2( a, 2 * (BitDepth - 8) )
d = Round2( b, BitDepth - 8 )
p = Max( 0, a * n - d * d )
z = Round2( p * s, SGRPROJ_MTABLE_BITS )
if ( z >= 255 )
a2 = 256
else if ( z == 0 )
a2 = 1
else
a2 = ((z << SGRPROJ_SGR_BITS) + (z/2)) / (z + 1)
oneOverN = ((1 << SGRPROJ_RECIP_BITS) + (n/2)) / n
b2 = ( (1 << SGRPROJ_SGR_BITS) - a2 ) * b * oneOverN
A[ i ][ j ] = a2
B[ i ][ j ] = Round2( b2, SGRPROJ_RECIP_BITS )
}
}
Note: When pass is equal to 0, only odd rows (i.e. entries A[ i ][ j ] and B[ i ][ j ] with i odd) will be used to generate the output.
where the call to get_source_sample specifies that the get source sample process specified in Get source sample process should be invoked and the output assigned to variable c.
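The computation of one entry of A and B can be sketched in Python as follows. This is an informative illustration under stated assumptions: the constant values (SGRPROJ_MTABLE_BITS = 20, SGRPROJ_SGR_BITS = 8, SGRPROJ_RECIP_BITS = 12) are assumed from the constants section, and the helper box_stats, which takes the already gathered window of source samples as a flat list, is hypothetical.

```python
SGRPROJ_MTABLE_BITS = 20  # assumed values of the spec constants
SGRPROJ_SGR_BITS = 8
SGRPROJ_RECIP_BITS = 12

def round2(x, n):
    return x if n == 0 else (x + (1 << (n - 1))) >> n

def box_stats(window, r, eps, bit_depth=8):
    """Return (A, B) for one position from its (2r+1)*(2r+1) source samples."""
    n = (2 * r + 1) * (2 * r + 1)
    n2e = n * n * eps
    s = ((1 << SGRPROJ_MTABLE_BITS) + n2e // 2) // n2e
    a = sum(c * c for c in window)      # sum of squares
    b = sum(window)                     # plain sum
    a = round2(a, 2 * (bit_depth - 8))
    d = round2(b, bit_depth - 8)
    p = max(0, a * n - d * d)           # n*n times the window variance
    z = round2(p * s, SGRPROJ_MTABLE_BITS)
    if z >= 255:
        a2 = 256
    elif z == 0:
        a2 = 1
    else:
        a2 = ((z << SGRPROJ_SGR_BITS) + z // 2) // (z + 1)
    one_over_n = ((1 << SGRPROJ_RECIP_BITS) + n // 2) // n
    b2 = ((1 << SGRPROJ_SGR_BITS) - a2) * b * one_over_n
    return a2, round2(b2, SGRPROJ_RECIP_BITS)
```

In this sketch a flat window yields A equal to 1, so the output is dominated by the local mean term in B, while a high-contrast window yields A equal to 256 and B equal to 0, so the output follows the sample itself.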
After A and B are prepared, the output array F is generated as follows:
for ( i = 0; i < h; i++ ) {
shift = 5
if ( pass == 0 && ( i & 1 ) ) {
shift = 4
}
for ( j = 0; j < w; j++ ) {
a = 0
b = 0
for ( dy = -1; dy <= 1; dy++ ) {
for ( dx = -1; dx <= 1; dx++ ) {
if ( pass == 0 ) {
if ( (i + dy) & 1 ) {
weight = (dx == 0) ? 6 : 5
} else {
weight = 0
}
} else {
weight = (dx == 0 || dy == 0) ? 4 : 3
}
a += weight * A[ i + dy ][ j + dx ]
b += weight * B[ i + dy ][ j + dx ]
}
}
v = a * UpscaledCdefFrame[ plane ][ y + i ][ x + j ] + b
F[ i ][ j ] = Round2( v, SGRPROJ_SGR_BITS + shift - SGRPROJ_RST_BITS )
}
}
Note: When pass is equal to 0, the weights for even rows of A and B are always equal to 0.
The constant lookup table Sgr_Params is specified as:
Sgr_Params[ (1 << SGRPROJ_PARAMS_BITS) ][ 4 ] = {
{ 2, 12, 1, 4 }, { 2, 15, 1, 6 }, { 2, 18, 1, 8 }, { 2, 21, 1, 9 },
{ 2, 24, 1, 10 }, { 2, 29, 1, 11 }, { 2, 36, 1, 12 }, { 2, 45, 1, 13 },
{ 2, 56, 1, 14 }, { 2, 68, 1, 15 }, { 0, 0, 1, 5 }, { 0, 0, 1, 8 },
{ 0, 0, 1, 11 }, { 0, 0, 1, 14 }, { 2, 30, 0, 0 }, { 2, 75, 0, 0 }
}
Wiener filter process
The inputs to this block are:

a variable plane specifying whether the process is filtering Y, U, or V samples,

variables unitRow and unitCol specifying the position of the loop restoration unit,

variables x and y specifying the position of the block in samples relative to the top-left corner of the current plane,

variables w and h specifying the size of the block in samples.
The output from this process are modified samples in LrFrame.
The Wiener filter is applied as two one-dimensional convolutions. First a horizontal filter is used to build up a temporary array, and then this array is vertically filtered to obtain the final restored samples.
The rounding variables derivation process specified in Rounding variables derivation process is invoked with the input variable isCompound set equal to 0.
The Wiener coefficient process specified in Wiener coefficient process is invoked with an input of LrWiener[ plane ][ unitRow ][ unitCol ][ 0 ] and the output assigned to vfilter.
The Wiener coefficient process specified in Wiener coefficient process is invoked with an input of LrWiener[ plane ][ unitRow ][ unitCol ][ 1 ] and the output assigned to hfilter.
Note: The horizontal filter needs to be applied before the vertical filter, but the horizontal coefficients are sent after the vertical coefficients.
The filtering is applied as follows:

The array intermediate is specified as follows:
offset = (1 << (BitDepth + FILTER_BITS - InterRound0 - 1))
limit = (1 << (BitDepth + 1 + FILTER_BITS - InterRound0)) - 1
for ( r = 0; r < h + 6; r++ ) {
for ( c = 0; c < w; c++ ) {
s = 0
for ( t = 0; t < 7; t++ )
s += hfilter[ t ] * get_source_sample( plane, x + c + t - 3, y + r - 3 )
v = Round2(s, InterRound0)
intermediate[ r ][ c ] = Clip3( -offset, limit - offset, v )
}
}
Where the call to get_source_sample specifies that the get source sample process specified in Get source sample process should be invoked.
Note: The intermediate result is clipped so that ( intermediate[ r ][ c ] + offset ) fits in an unsigned variable with Min(15,BitDepth + 5) bits.

The output samples are written as follows:
for ( r = 0; r < h; r++ ) {
for ( c = 0; c < w; c++ ) {
s = 0
for ( t = 0; t < 7; t++ )
s += vfilter[ t ] * intermediate[ r + t ][ c ]
v = Round2( s, InterRound1 )
LrFrame[ plane ][ y + r ][ x + c ] = Clip1( v )
}
}
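The two filtering passes can be sketched in Python as follows. This is an informative illustration, not the normative process: FILTER_BITS = 7 and the non-compound rounding values (InterRound0 = 3 and InterRound1 = 11, or 5 and 9 when BitDepth is 12) are assumed from other parts of this specification, and the nested sample function is a simplified stand-in for the get source sample process that only clamps coordinates to the block edges. The function name wiener_block is hypothetical.

```python
FILTER_BITS = 7  # assumed value of the spec constant

def round2(x, n):
    return x if n == 0 else (x + (1 << (n - 1))) >> n

def clip3(lo, hi, x):
    return max(lo, min(hi, x))

def wiener_block(src, hfilter, vfilter, bit_depth=8):
    """Apply 7-tap horizontal then vertical filters to a 2D list of samples."""
    # Non-compound rounding values (assumed, see lead-in).
    round0, round1 = (5, 9) if bit_depth == 12 else (3, 11)
    h, w = len(src), len(src[0])
    offset = 1 << (bit_depth + FILTER_BITS - round0 - 1)
    limit = (1 << (bit_depth + 1 + FILTER_BITS - round0)) - 1

    def sample(x, y):
        # Simplification: clamp to the block instead of stripe-aware fetching.
        return src[clip3(0, h - 1, y)][clip3(0, w - 1, x)]

    # Horizontal pass over h + 6 rows (3 extra rows above and below).
    intermediate = []
    for r in range(h + 6):
        row = []
        for c in range(w):
            s = sum(hfilter[t] * sample(c + t - 3, r - 3) for t in range(7))
            row.append(clip3(-offset, limit - offset, round2(s, round0)))
        intermediate.append(row)

    # Vertical pass producing the final clipped samples.
    out = []
    for r in range(h):
        row = []
        for c in range(w):
            s = sum(vfilter[t] * intermediate[r + t][c] for t in range(7))
            row.append(clip3(0, (1 << bit_depth) - 1, round2(s, round1)))
        out.append(row)
    return out
```

With the identity filter { 0, 0, 0, 128, 0, 0, 0 } in both directions the two rounding shifts together remove exactly the 2 * FILTER_BITS of gain, so the block is returned unchanged.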
Wiener coefficient process
The input to this process is an array coeff containing 3 coefficients.
The output from this process is an array containing 7 coefficients.
The Wiener filter is always symmetrical and has a unit DC gain, so there are only three coefficients that need to be explicitly coded.
This process computes the full set of coefficients as follows:
filter[ 3 ] = 128
for ( i = 0; i < 3; i++ ) {
c = coeff[ i ]
filter[ i ] = c
filter[ 6 - i ] = c
filter[ 3 ] -= 2 * c
}
The output of the process is the array filter.
Note: When chroma is being filtered, coeff[ 0 ] will always be equal to 0, therefore filter[ 0 ] and filter[ 6 ] will always be equal to 0. In other words, luma uses a 7-tap filter, while chroma uses a 5-tap filter.
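The expansion from three coded coefficients to the full symmetric filter can be sketched in Python as follows. This is an informative illustration; the function name wiener_coefficients is hypothetical.

```python
def wiener_coefficients(coeff):
    """Expand 3 coded coefficients into a symmetric 7-tap filter.

    The centre tap starts at 128 and is reduced by twice each outer tap,
    so the taps always sum to 128 (unit DC gain at 7 bits of precision).
    """
    filt = [0] * 7
    filt[3] = 128
    for i in range(3):
        c = coeff[i]
        filt[i] = c
        filt[6 - i] = c
        filt[3] -= 2 * c
    return filt
```

For example, the coded values { 3, -7, 15 } expand to { 3, -7, 15, 106, 15, -7, 3 }.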
Get source sample process
The inputs to this process are:

a variable plane specifying whether the process is filtering Y, U, or V samples,

variables x and y specifying the location in the current plane in units of samples.
This process makes sure samples are taken from within the allowed extent for loop restoration filtering.
Samples within the current stripe are taken after Cdef filtering has been applied, samples outside the current stripe are taken before Cdef filtering.
The sample to return is specified as follows:
x = Min(PlaneEndX, x)
x = Max(0, x)
y = Min(PlaneEndY, y)
y = Max(0, y)
if ( y < StripeStartY ) {
y = Max(StripeStartY - 2,y)
return UpscaledCurrFrame[ plane ][ y ][ x ]
} else if ( y > StripeEndY ) {
y = Min(StripeEndY + 2,y)
return UpscaledCurrFrame[ plane ][ y ][ x ]
} else {
return UpscaledCdefFrame[ plane ][ y ][ x ]
}
Note: This process can be called for samples on the three lines above and three lines below the current stripe. However, the coordinates are cropped such that only two lines above and below the stripe need to be fetched. In other words, requests for the third line (above or below) are given a copy of the second line.
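The coordinate handling can be sketched in Python as follows. This is an informative illustration: instead of returning a sample it returns the clamped coordinates together with a flag saying which frame would be read, and the function name source_coords is hypothetical.

```python
def source_coords(x, y, plane_end_x, plane_end_y, stripe_start_y, stripe_end_y):
    """Return (x, y, use_cdef) after clamping.

    use_cdef is True when the sample comes from UpscaledCdefFrame (inside
    the stripe) and False when it comes from UpscaledCurrFrame (outside).
    """
    x = max(0, min(plane_end_x, x))
    y = max(0, min(plane_end_y, y))
    if y < stripe_start_y:
        # At most two rows above the stripe are ever fetched.
        return x, max(stripe_start_y - 2, y), False
    if y > stripe_end_y:
        # At most two rows below the stripe are ever fetched.
        return x, min(stripe_end_y + 2, y), False
    return x, y, True
```

With a stripe covering rows 64 to 127, a request for row 61 (the third line above) is redirected to row 62, matching the note above.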
Output process
General
This process is invoked to prepare output frames.
If scalability is being used (OperatingPointIdc not equal to 0), an application-specific function is called to decide whether this frame will be output. If this function returns a value equal to 0, then this process terminates immediately.
Note: Applications that are displaying the decoded video are expected to only display one frame from each temporal unit within the selected operating point. This frame should be the highest spatial layer that is both within the operating point and present within the temporal unit. Other applications may set their own policy about which frames are output.
The intermediate output preparation process specified in Intermediate output preparation process is invoked to prepare arrays OutY, OutU, and OutV, and the outputs are assigned to w, h, subX, and subY.
If film_grain_params_present is equal to 1 and apply_grain is equal to 1, then the film grain synthesis process specified in Film grain synthesis process is invoked with inputs of w, h, subX, and subY. (This process modifies the output arrays OutY, OutU, OutV).
Finally, the frame to be output is defined to be the arrays OutY, OutU, OutV where the bit depth for each sample is BitDepth.
This frame to be output is the overall output of the decoding process and further processing (such as color conversion) is outside the scope of this specification.
For example, a real implementation might use these arrays to display the frame to the user, or a test system might save the arrays so the output can be verified.
Note: If NumPlanes is equal to 1, then the U and V planes should be ignored.
Intermediate output preparation process
The output of this process are the variables w, h, subX, and subY describing the format of the data in arrays OutY, OutU, and OutV.
If show_existing_frame is equal to 1, then the decoder should copy OutY, OutU, and OutV from a previously decoded frame as follows:

The variable w is set equal to RefUpscaledWidth[ frame_to_show_map_idx ].

The variable h is set equal to RefFrameHeight[ frame_to_show_map_idx ].

The variable subX is set equal to RefSubsamplingX[ frame_to_show_map_idx ].

The variable subY is set equal to RefSubsamplingY[ frame_to_show_map_idx ].

The array OutY is w samples across by h samples down and the sample at location x samples across and y samples down is given by OutY[ y ][ x ] = FrameStore[ frame_to_show_map_idx ][ 0 ][ y ][ x ] with x = 0..w - 1 and y = 0..h - 1.

The array OutU is (w + subX) >> subX samples across by (h + subY) >> subY samples down and the sample at location x samples across and y samples down is given by OutU[ y ][ x ] = FrameStore[ frame_to_show_map_idx ][ 1 ][ y ][ x ] with x = 0..((w + subX) >> subX) - 1 and y = 0..((h + subY) >> subY) - 1.

The array OutV is (w + subX) >> subX samples across by (h + subY) >> subY samples down and the sample at location x samples across and y samples down is given by OutV[ y ][ x ] = FrameStore[ frame_to_show_map_idx ][ 2 ][ y ][ x ] with x = 0..((w + subX) >> subX) - 1 and y = 0..((h + subY) >> subY) - 1.

The variable BitDepth is set equal to RefBitDepth[ frame_to_show_map_idx ].

The bit depth for each sample is BitDepth.
Otherwise (show_existing_frame is equal to 0), then the decoder should copy the current frame as follows:

The variable w is set equal to UpscaledWidth.

The variable h is set equal to FrameHeight.

The variable subX is set equal to subsampling_x.

The variable subY is set equal to subsampling_y.

The array OutY is w samples across by h samples down and the sample at location x samples across and y samples down is given by OutY[ y ][ x ] = LrFrame[ 0 ][ y ][ x ] with x = 0..w - 1 and y = 0..h - 1.

The array OutU is (w + subX) >> subX samples across by (h + subY) >> subY samples down and the sample at location x samples across and y samples down is given by OutU[ y ][ x ] = LrFrame[ 1 ][ y ][ x ] with x = 0..((w + subX) >> subX) - 1 and y = 0..((h + subY) >> subY) - 1.

The array OutV is (w + subX) >> subX samples across by (h + subY) >> subY samples down and the sample at location x samples across and y samples down is given by OutV[ y ][ x ] = LrFrame[ 2 ][ y ][ x ] with x = 0..((w + subX) >> subX) - 1 and y = 0..((h + subY) >> subY) - 1.

The bit depth for each sample is BitDepth.
The output of this process are the variables w, h, subX, and subY.
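The plane dimensions implied by w, h, subX, and subY can be sketched in Python as follows. This is an informative illustration; the function name output_plane_sizes and the num_planes argument (standing for NumPlanes) are hypothetical.

```python
def output_plane_sizes(w, h, sub_x, sub_y, num_planes=3):
    """Return a list of (width, height) pairs for OutY, OutU, and OutV.

    Chroma dimensions round up, so an odd luma size still covers the
    last luma column or row.
    """
    sizes = [(w, h)]
    if num_planes > 1:
        chroma = ((w + sub_x) >> sub_x, (h + sub_y) >> sub_y)
        sizes += [chroma, chroma]
    return sizes
```

For a 1919x1079 frame with subX and subY both equal to 1, each chroma plane is 960 samples across by 540 samples down.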
Film grain synthesis process
General
The inputs to this process are:

variables w and h specifying the width and height of the frame,

variables subX and subY specifying the subsampling parameters of the frame.
The process modifies the arrays OutY, OutU, OutV to add film grain noise as follows:

The variable RandomRegister (used for generating pseudorandom numbers) is set equal to grain_seed.

The variable GrainCenter is set equal to 128 << (BitDepth - 8).

The variable GrainMin is set equal to -GrainCenter.

The variable GrainMax is set equal to (256 << (BitDepth - 8)) - 1 - GrainCenter.

The generate grain process specified in Generate grain process is invoked.

The scaling lookup initialization process specified in Scaling lookup initialization process is invoked.

The add noise process specified in Add noise synthesis process is invoked with w, h, subX, and subY as inputs.
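The derivation of the grain value range in the steps above can be sketched in Python as follows. This is an informative illustration; the function name grain_range is hypothetical.

```python
def grain_range(bit_depth):
    """Return (GrainMin, GrainMax) for the given bit depth.

    Grain values are signed and centred on zero, covering the same
    number of levels as the sample range.
    """
    grain_center = 128 << (bit_depth - 8)
    grain_min = -grain_center
    grain_max = (256 << (bit_depth - 8)) - 1 - grain_center
    return grain_min, grain_max
```

For example, 8-bit content gives the range -128 to 127 and 10-bit content gives -512 to 511.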
Random number process
The input to this process is a variable bits specifying the number of random bits to return.
The output of this process is a pseudorandom number based on the state in RandomRegister.
The process is specified as follows:
get_random_number( bits ) {
r = RandomRegister
bit = ((r >> 0) ^ (r >> 1) ^ (r >> 3) ^ (r >> 12)) & 1
r = (r >> 1) | (bit << 15)
result = (r >> (16 - bits)) & ((1 << bits) - 1)
RandomRegister = r
return result
}
The output of this process is the variable result.
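The random number process can be sketched in Python as follows. This is an informative illustration: the class name GrainRandom is hypothetical, and the state is held in an instance attribute rather than the global RandomRegister. Masking the seed to 16 bits is an assumption that has no effect when the seed is a 16-bit value.

```python
class GrainRandom:
    """16-bit linear feedback shift register used for film grain."""

    def __init__(self, seed):
        self.register = seed & 0xFFFF

    def get(self, bits):
        """Advance the register by one step and return its top `bits` bits."""
        r = self.register
        # Feedback bit is the XOR of taps 0, 1, 3, and 12.
        bit = ((r >> 0) ^ (r >> 1) ^ (r >> 3) ^ (r >> 12)) & 1
        r = (r >> 1) | (bit << 15)
        self.register = r
        return (r >> (16 - bits)) & ((1 << bits) - 1)
```

Starting from a register value of 1, the first call with bits equal to 8 returns 128 and leaves the register at 0x8000.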
Generate grain process
This process generates noise via an autoregressive filter.
First an array LumaGrain 82 samples wide and 73 samples high of white noise is generated for luma as follows:
shift = 12 - BitDepth + grain_scale_shift
for ( y = 0; y < 73; y++ ) {
for ( x = 0; x < 82; x++ ) {
if ( num_y_points > 0 ) {
g = Gaussian_Sequence[ get_random_number( 11 ) ]
} else {
g = 0
}
LumaGrain[ y ][ x ] = Round2( g, shift )
}
}
where the function call get_random_number invokes the random number process specified in Random number process.
Then an autoregressive filter is applied to the white noise as follows:
shift = ar_coeff_shift_minus_6 + 6
for ( y = 3; y < 73; y++ ) {
for ( x = 3; x < 82 - 3; x++ ) {
s = 0
pos = 0
for ( deltaRow = -ar_coeff_lag; deltaRow <= 0; deltaRow++ ) {
for ( deltaCol = -ar_coeff_lag; deltaCol <= ar_coeff_lag; deltaCol++ ) {
if ( deltaRow == 0 && deltaCol == 0 )
break
c = ar_coeffs_y_plus_128[ pos ] - 128
s += LumaGrain[ y + deltaRow ][ x + deltaCol ] * c
pos++
}
}
LumaGrain[ y ][ x ] = Clip3( GrainMin, GrainMax, LumaGrain[ y ][ x ] + Round2( s, shift ) )
}
}
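The luma autoregressive filter can be sketched in Python as follows. This is an informative illustration under stated assumptions: the function name apply_luma_ar is hypothetical, it works on any 2D list rather than the fixed 82 by 73 array, and the coefficients are passed already centred (that is, with 128 subtracted from each ar_coeffs_y_plus_128 entry).

```python
def round2(x, n):
    return x if n == 0 else (x + (1 << (n - 1))) >> n

def clip3(lo, hi, x):
    return max(lo, min(hi, x))

def apply_luma_ar(grain, coeffs, lag, shift, grain_min, grain_max):
    """Filter `grain` in place using previously filtered neighbours.

    Coefficients are consumed in raster order over the rows above and
    the samples to the left on the current row; the scan stops at the
    current sample itself.
    """
    h, w = len(grain), len(grain[0])
    for y in range(3, h):
        for x in range(3, w - 3):
            s = 0
            pos = 0
            for delta_row in range(-lag, 1):
                for delta_col in range(-lag, lag + 1):
                    if delta_row == 0 and delta_col == 0:
                        break
                    s += grain[y + delta_row][x + delta_col] * coeffs[pos]
                    pos += 1
            grain[y][x] = clip3(grain_min, grain_max,
                                grain[y][x] + round2(s, shift))
    return grain
```

With a lag of 1 the filter reads four coefficients per sample, and the last of these applies to the sample immediately to the left, so filtered values propagate along each row.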
If mono_chrome is equal to 0, the chroma grain is generated in a similar way, except the filtering includes a coefficient that introduces a correlation with the luma grain.
The variable chromaW (representing the width of the chroma noise array) is set equal to (subsampling_x ? 44 : 82).
The variable chromaH (representing the height of the chroma noise array) is set equal to (subsampling_y ? 38 : 73).
White noise arrays CbGrain and CrGrain chromaW samples wide and chromaH samples high are generated as follows:
shift = 12 - BitDepth + grain_scale_shift
RandomRegister = grain_seed ^ 0xb524
for ( y = 0; y < chromaH; y++ ) {
for ( x = 0; x < chromaW; x++ ) {
if ( num_cb_points > 0 || chroma_scaling_from_luma ) {
g = Gaussian_Sequence[ get_random_number( 11 ) ]
} else {
g = 0
}
CbGrain[ y ][ x ] = Round2( g, shift )
}
}
RandomRegister = grain_seed ^ 0x49d8
for ( y = 0; y < chromaH; y++ ) {
for ( x = 0; x < chromaW; x++ ) {
if ( num_cr_points > 0 || chroma_scaling_from_luma ) {
g = Gaussian_Sequence[ get_random_number( 11 ) ]
} else {
g = 0
}
CrGrain[ y ][ x ] = Round2( g, shift )
}
}
Then the autoregressive filter is applied as follows:
shift = ar_coeff_shift_minus_6 + 6
for ( y = 3; y < chromaH; y++ ) {
for ( x = 3; x < chromaW - 3; x++ ) {
s0 = 0
s1 = 0
pos = 0
for ( deltaRow = -ar_coeff_lag; deltaRow <= 0; deltaRow++ ) {
for ( deltaCol = -ar_coeff_lag; deltaCol <= ar_coeff_lag; deltaCol++ ) {
c0 = ar_coeffs_cb_plus_128[ pos ] - 128
c1 = ar_coeffs_cr_plus_128[ pos ] - 128
if ( deltaRow == 0 && deltaCol == 0 ) {
if ( num_y_points > 0 ) {
luma = 0
lumaX = ( (x - 3) << subsampling_x ) + 3
lumaY = ( (y - 3) << subsampling_y ) + 3
for ( i = 0; i <= subsampling_y; i++ )
for ( j = 0; j <= subsampling_x; j++ )
luma += LumaGrain[ lumaY + i ][ lumaX + j ]
luma = Round2( luma, subsampling_x + subsampling_y )
s0 += luma * c0
s1 += luma * c1
}
break
}
s0 += CbGrain[ y + deltaRow ][ x + deltaCol ] * c0
s1 += CrGrain[ y + deltaRow ][ x + deltaCol ] * c1
pos++
}
}
CbGrain[ y ][ x ] = Clip3( GrainMin, GrainMax, CbGrain[ y ][ x ] + Round2( s0, shift ) )
CrGrain[ y ][ x ] = Clip3( GrainMin, GrainMax, CrGrain[ y ][ x ] + Round2( s1, shift ) )
}
}
Note: When num_y_points is equal to 0, this process may use uninitialized values within ar_coeffs_y_plus_128 to compute LumaGrain. However, LumaGrain will never be read in this case so it does not matter what values are constructed. Similarly, when num_cr_points/num_cb_points are equal to 0 and chroma_scaling_from_luma is equal to 0, the CbGrain/CrGrain arrays will never be read.
Scaling lookup initialization process
This process computes 3 lookup tables for the different color components.
Each lookup table ScalingLut[ plane ] contains 256 entries constructed by a piecewise linear interpolation of the given points as follows:
for ( plane = 0; plane < NumPlanes; plane++ ) {
if ( plane == 0 || chroma_scaling_from_luma )
numPoints = num_y_points
else if ( plane == 1 )
numPoints = num_cb_points
else
numPoints = num_cr_points
if ( numPoints == 0 ) {
for ( x = 0; x < 256; x++ ) {
ScalingLut[ plane ][ x ] = 0
}
} else {
for ( x = 0; x < get_x( plane, 0 ); x++ ) {
ScalingLut[ plane ][ x ] = get_y( plane, 0 )
}
for ( i = 0; i < numPoints - 1; i++ ) {
deltaY = get_y( plane, i + 1 ) - get_y( plane, i )
deltaX = get_x( plane, i + 1 ) - get_x( plane, i )
delta = deltaY * ( ( 65536 + (deltaX >> 1) ) / deltaX )
for ( x = 0; x < deltaX; x++ ) {
v = get_y( plane, i ) + ( ( x * delta + 32768 ) >> 16 )
ScalingLut[ plane ][ get_x( plane, i ) + x ] = v
}
}
for ( x = get_x( plane, numPoints - 1 ); x < 256; x++ ) {
ScalingLut[ plane ][ x ] = get_y( plane, numPoints - 1 )
}
}
}
where the functions get_x and get_y return the coordinates for a specific point and are specified as:
get_x( plane, i ) {
if ( plane == 0 || chroma_scaling_from_luma )
return point_y_value[ i ]
else if ( plane == 1 )
return point_cb_value[ i ]
else
return point_cr_value[ i ]
}
get_y( plane, i ) {
if ( plane == 0 || chroma_scaling_from_luma )
return point_y_scaling[ i ]
else if ( plane == 1 )
return point_cb_scaling[ i ]
else
return point_cr_scaling[ i ]
}
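The construction of one lookup table can be sketched in Python as follows. This is an informative illustration: the function name build_scaling_lut is hypothetical, the points are passed as two lists in place of get_x and get_y, and the x coordinates are assumed to be strictly increasing so that deltaX is never 0.

```python
def build_scaling_lut(xs, ys):
    """Build a 256-entry table by piecewise linear interpolation.

    xs and ys hold the point coordinates (point_*_value) and scaling
    values (point_*_scaling) for one plane.
    """
    lut = [0] * 256
    if not xs:
        return lut
    # Flat extension before the first point.
    for x in range(xs[0]):
        lut[x] = ys[0]
    # Linear segments using a 16.16 fixed point slope.
    for i in range(len(xs) - 1):
        delta_y = ys[i + 1] - ys[i]
        delta_x = xs[i + 1] - xs[i]
        delta = delta_y * ((65536 + (delta_x >> 1)) // delta_x)
        for x in range(delta_x):
            lut[xs[i] + x] = ys[i] + ((x * delta + 32768) >> 16)
    # Flat extension from the last point onwards.
    for x in range(xs[-1], 256):
        lut[x] = ys[-1]
    return lut
```

For the two points (0, 0) and (128, 64) the table rises linearly to 64 at index 128 and stays at 64 up to index 255.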
Add noise synthesis process
The inputs to this process are:

variables w and h specifying the width and height of the frame,

variables subX and subY specifying the subsampling parameters of the frame.
This process combines the film grain with the image data.
First an array of noise data noiseStripe is generated for each 32 luma sample high stripe of the image.
noiseStripe[ lumaNum ][ 0 ] is 34 samples high and w samples wide (a few additional samples across are actually written to the array, but these are never read) and contains noise for the luma component.
noiseStripe[ lumaNum ][ 1 ] and noiseStripe[ lumaNum ][ 2 ] are (34 >> subY) samples high and Round2(w, subX) samples wide and contain noise for the chroma components.
noiseStripe represents the result of constructing square grain blocks and blending horizontally adjacent blocks together (although blending is only applied if overlap_flag is equal to 1) and is constructed as follows:
lumaNum = 0
for ( y = 0; y < (h + 1)/2 ; y += 16 ) {
RandomRegister = grain_seed
RandomRegister ^= ((lumaNum * 37 + 178) & 255) << 8
RandomRegister ^= ((lumaNum * 173 + 105) & 255)
for ( x = 0; x < (w + 1)/2 ; x += 16 ) {
rand = get_random_number( 8 )
offsetX = rand >> 4
offsetY = rand & 15
for ( plane = 0 ; plane < NumPlanes; plane++ ) {
planeSubX = ( plane > 0) ? subX : 0
planeSubY = ( plane > 0) ? subY : 0
planeOffsetX = planeSubX ? 6 + offsetX : 9 + offsetX * 2
planeOffsetY = planeSubY ? 6 + offsetY : 9 + offsetY * 2
for ( i = 0; i < 34 >> planeSubY ; i++ ) {
for ( j = 0; j < 34 >> planeSubX ; j++ ) {
if ( plane == 0 )
g = LumaGrain[ planeOffsetY + i ][ planeOffsetX + j ]
else if ( plane == 1 )
g = CbGrain[ planeOffsetY + i ][ planeOffsetX + j ]
else
g = CrGrain[ planeOffsetY + i ][ planeOffsetX + j ]
if ( planeSubX == 0 ) {
if ( j < 2 && overlap_flag && x > 0 ) {
old = noiseStripe[ lumaNum ][ plane ][ i ][ x * 2 + j ]
if ( j == 0 ) {
g = old * 27 + g * 17
} else {
g = old * 17 + g * 27
}
g = Clip3( GrainMin, GrainMax, Round2(g, 5) )
}
noiseStripe[ lumaNum ][ plane ][ i ][ x * 2 + j ] = g
} else {
if ( j == 0 && overlap_flag && x > 0 ) {
old = noiseStripe[ lumaNum ][ plane ][ i ][ x + j ]
g = old * 23 + g * 22
g = Clip3( GrainMin, GrainMax, Round2(g, 5) )
}
noiseStripe[ lumaNum ][ plane ][ i ][ x + j ] = g
}
}
}
}
}
lumaNum++
}
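The per-stripe seeding and the mapping from a random value to an offset into the grain arrays can be sketched in Python as follows. This is an informative illustration; the function names stripe_seed and block_offsets are hypothetical.

```python
def stripe_seed(grain_seed, luma_num):
    """Return the RandomRegister value used at the start of a stripe."""
    seed = grain_seed
    seed ^= ((luma_num * 37 + 178) & 255) << 8
    seed ^= (luma_num * 173 + 105) & 255
    return seed

def block_offsets(rand, plane_sub_x, plane_sub_y):
    """Split an 8-bit random value into (planeOffsetX, planeOffsetY)."""
    offset_x = rand >> 4      # upper 4 bits
    offset_y = rand & 15      # lower 4 bits
    plane_offset_x = 6 + offset_x if plane_sub_x else 9 + offset_x * 2
    plane_offset_y = 6 + offset_y if plane_sub_y else 9 + offset_y * 2
    return plane_offset_x, plane_offset_y
```

Because each stripe derives its seed only from grain_seed and lumaNum, stripes can be generated independently of each other. The largest possible offsets (39 without subsampling, 21 with subsampling) keep the 34 or 17 sample window inside the grain arrays.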
Then the noise stripes are blended together to form a noise image noiseImage as follows:
for ( plane = 0; plane < NumPlanes; plane++ ) {
planeSubX = ( plane > 0) ? subX : 0
planeSubY = ( plane > 0) ? subY : 0
for ( y = 0; y < ( (h + planeSubY) >> planeSubY ) ; y++ ) {
lumaNum = y >> ( 5 - planeSubY )
i = y - (lumaNum << ( 5 - planeSubY ) )
for ( x = 0; x < ( (w + planeSubX) >> planeSubX) ; x++ ) {
g = noiseStripe[ lumaNum ][ plane ][ i ][ x ]
if ( planeSubY == 0 ) {
if ( i < 2 && lumaNum > 0 && overlap_flag ) {
old = noiseStripe[ lumaNum - 1 ][ plane ][ i + 32 ][ x ]
if ( i == 0 ) {
g = old * 27 + g * 17
} else {
g = old * 17 + g * 27
}
g = Clip3( GrainMin, GrainMax, Round2(g, 5) )
}
} else {
if ( i < 1 && lumaNum > 0 && overlap_flag ) {
old = noiseStripe[ lumaNum - 1 ][ plane ][ i + 16 ][ x ]
g = old * 23 + g * 22
g = Clip3( GrainMin, GrainMax, Round2(g, 5) )
}
}
noiseImage[ plane ][ y ][ x ] = g
}
}
}
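The overlap blend used for both horizontal and vertical block boundaries can be sketched in Python as follows. This is an informative illustration; the function name blend_overlap is hypothetical.

```python
def blend_overlap(old, new, w_old, w_new, grain_min, grain_max):
    """Blend two grain values with 5-bit weights and clip the result.

    The pseudocode uses the weight pairs (27, 17) and (17, 27) for the
    two overlapping samples without subsampling, and (23, 22) for the
    single overlapping sample with subsampling.
    """
    v = old * w_old + new * w_new
    rounded = (v + 16) >> 5          # Round2( v, 5 )
    return max(grain_min, min(grain_max, rounded))
```

The weights do not sum to 32. Their squares sum to 1018 and 1013, which is close to 32 * 32 = 1024, so the blend appears intended to keep the noise energy roughly constant for uncorrelated inputs rather than to average the values.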
Note: Although this process is specified in terms of full size noiseStripe and noiseImage arrays, the reference code shows how it is possible to implement the grain synthesis with just 2 line buffers for luma, and 1 line buffer for each chroma component.
Finally, the noise is blended with the original image data as follows:
if ( clip_to_restricted_range ) {
minValue = 16 << (BitDepth - 8)
maxLuma = 235 << (BitDepth - 8)
if ( matrix_coefficients == MC_IDENTITY )
maxChroma = maxLuma
else
maxChroma = 240 << (BitDepth - 8)
} else {
minValue = 0
maxLuma = (256 << (BitDepth - 8)) - 1
maxChroma = maxLuma
}
ScalingShift = grain_scaling_minus_8 + 8
for ( y = 0; y < ( (h + subY) >> subY) ; y++ ) {
for ( x = 0; x < ( (w + subX) >> subX) ; x++ ) {
lumaX = x << subX
lumaY = y << subY
lumaNextX = Min( lumaX + 1, w - 1 )
if ( subX )
averageLuma = Round2( OutY[ lumaY ][ lumaX ] + OutY[ lumaY ][ lumaNextX ], 1 )
else
averageLuma = OutY[ lumaY ][ lumaX ]
if ( num_cb_points > 0 || chroma_scaling_from_luma ) {
orig = OutU[ y ][ x ]
if ( chroma_scaling_from_luma ) {
merged = averageLuma
} else {
combined = averageLuma * ( cb_luma_mult - 128 ) + orig * ( cb_mult - 128 )
merged = Clip1( ( combined >> 6 ) + ( (cb_offset - 256 ) << (BitDepth - 8) ) )
}
noise = noiseImage[ 1 ][ y ][ x ]
noise = Round2( scale_lut( 1, merged ) * noise, ScalingShift )
OutU[ y ][ x ] = Clip3( minValue, maxChroma, orig + noise )
}
if ( num_cr_points > 0 || chroma_scaling_from_luma ) {
orig = OutV[ y ][ x ]
if ( chroma_scaling_from_luma ) {
merged = averageLuma
} else {
combined = averageLuma * ( cr_luma_mult - 128 ) + orig * ( cr_mult - 128 )
merged = Clip1( ( combined >> 6 ) + ( (cr_offset - 256 ) << (BitDepth - 8) ) )
}
noise = noiseImage[ 2 ][ y ][ x ]
noise = Round2( scale_lut( 2, merged ) * noise, ScalingShift )
OutV[ y ][ x ] = Clip3( minValue, maxChroma, orig + noise )
}
}
}
for ( y = 0; y < h ; y++ ) {
for ( x = 0; x < w ; x++ ) {
orig = OutY[ y ][ x ]
noise = noiseImage[ 0 ][ y ][ x ]
noise = Round2( scale_lut( 0, orig ) * noise, ScalingShift )
if ( num_y_points > 0 ) {
OutY[ y ][ x ] = Clip3( minValue, maxLuma, orig + noise )
}
}
}
where scale_lut is a function that performs a piecewise linear interpolation into the appropriate scaling table. The scale_lut function is specified as follows:
scale_lut( plane, index ) {
shift = BitDepth - 8
x = index >> shift
rem = index - ( x << shift )
if ( BitDepth == 8 || x == 255 ) {
return ScalingLut[ plane ][ x ]
} else {
start = ScalingLut[ plane ][ x ]
end = ScalingLut[ plane ][ x + 1 ]
return start + Round2( (end - start) * rem, shift )
}
}
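The table lookup and the final luma blend can be sketched in Python as follows. This is an informative illustration: the function names and argument lists are hypothetical, the table is passed in explicitly instead of being selected by plane, and add_luma_grain assumes num_y_points is greater than 0.

```python
def round2(x, n):
    return x if n == 0 else (x + (1 << (n - 1))) >> n

def scale_lut(lut, index, bit_depth):
    """Look up a scaling value, interpolating for bit depths above 8."""
    shift = bit_depth - 8
    x = index >> shift
    rem = index - (x << shift)
    if bit_depth == 8 or x == 255:
        return lut[x]
    start = lut[x]
    end = lut[x + 1]
    return start + round2((end - start) * rem, shift)

def add_luma_grain(orig, noise, lut, bit_depth, scaling_shift,
                   min_value, max_luma):
    """Scale one noise value by the table entry for `orig` and add it."""
    scaled = round2(scale_lut(lut, orig, bit_depth) * noise, scaling_shift)
    return max(min_value, min(max_luma, orig + scaled))
```

With a table where entry n equals n, an 8-bit sample of 100 and a noise value of -20 with ScalingShift equal to 8 gives 92, and results outside the range minValue to maxLuma are clipped.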
Motion field motion vector storage process
This process applies some filtering and reordering to the motion vectors to prepare them for storage as part of the reference frame update process.
The following applies for row = 0..MiRows-1, for col = 0..MiCols-1:
MfRefFrames[ row ][ col ] = NONE
MfMvs[ row ][ col ][ 0 ] = 0
MfMvs[ row ][ col ][ 1 ] = 0
for ( list = 0; list < 2; list++ ) {
r = RefFrames[ row ][ col ][ list ]
if ( r > INTRA_FRAME ) {
refIdx = ref_frame_idx[ r - LAST_FRAME ]
dist = get_relative_dist( RefOrderHint[ refIdx ], OrderHint )
if ( dist < 0 ) {
mvRow = Mvs[ row ][ col ][ list ][ 0 ]
mvCol = Mvs[ row ][ col ][ list ][ 1 ]
if ( Abs( mvRow ) <= REFMVS_LIMIT && Abs( mvCol ) <= REFMVS_LIMIT ) {
MfRefFrames[ row ][ col ] = r
MfMvs[ row ][ col ][ 0 ] = mvRow
MfMvs[ row ][ col ][ 1 ] = mvCol
}
}
}
}
Note: Although this process stores all the motion vectors into MfMvs, only the values where row and col are both odd will affect the decoding process.
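The selection of the stored motion vector for one block can be sketched in Python as follows. This is an informative illustration under stated assumptions: the constant values (NONE = -1, INTRA_FRAME = 0, LAST_FRAME = 1, REFMVS_LIMIT = (1 << 12) - 1) and the body of get_relative_dist are assumed from other parts of this specification with enable_order_hint taken to be 1, and the function name store_block_mv and its argument layout are hypothetical.

```python
NONE = -1                      # assumed values of the spec constants
INTRA_FRAME = 0
LAST_FRAME = 1
REFMVS_LIMIT = (1 << 12) - 1

def get_relative_dist(a, b, order_hint_bits):
    """Signed difference a - b wrapped to order_hint_bits bits."""
    diff = a - b
    m = 1 << (order_hint_bits - 1)
    return (diff & (m - 1)) - (diff & m)

def store_block_mv(ref_frames, mvs, ref_frame_idx, ref_order_hint,
                   order_hint, order_hint_bits):
    """Return (MfRefFrames, MfMvs) for a single block.

    ref_frames and mvs hold the two entries of RefFrames[row][col] and
    Mvs[row][col]; a later qualifying list overwrites an earlier one.
    """
    mf_ref, mf_mv = NONE, [0, 0]
    for lst in range(2):
        r = ref_frames[lst]
        if r > INTRA_FRAME:
            ref_idx = ref_frame_idx[r - LAST_FRAME]
            dist = get_relative_dist(ref_order_hint[ref_idx], order_hint,
                                     order_hint_bits)
            if dist < 0:  # only references that precede the current frame
                mv_row, mv_col = mvs[lst]
                if abs(mv_row) <= REFMVS_LIMIT and abs(mv_col) <= REFMVS_LIMIT:
                    mf_ref, mf_mv = r, [mv_row, mv_col]
    return mf_ref, mf_mv
```

Only motion vectors that point to a reference earlier in output order and that fit within REFMVS_LIMIT are kept; every other block is stored as NONE with a zero vector.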
Reference frame update process
This process is invoked as the final step in decoding a frame.
The inputs to this process are the decoded samples for the current frame LrFrame[ plane ][ y ][ x ].
The output from this process is an updated set of reference frames and previous motion vectors.
For each value of i from 0 to NUM_REF_FRAMES - 1, the following applies if bit i of refresh_frame_flags is equal to 1 (i.e. if (refresh_frame_flags >> i) & 1 is equal to 1):

RefValid[ i ] is set equal to 1.

RefFrameId[ i ] is set equal to current_frame_id.

RefUpscaledWidth[ i ] is set equal to UpscaledWidth.

RefFrameWidth[ i ] is set equal to FrameWidth.

RefFrameHeight[ i ] is set equal to FrameHeight.

RefRenderWidth[ i ] is set equal to RenderWidth.

RefRenderHeight[ i ] is set equal to RenderHeight.

RefMiCols[ i ] is set equal to MiCols.

RefMiRows[ i ] is set equal to MiRows.

RefFrameType[ i ] is set equal to frame_type.

RefSubsamplingX[ i ] is set equal to subsampling_x.

RefSubsamplingY[ i ] is set equal to subsampling_y.

RefBitDepth[ i ] is set equal to BitDepth.

SavedOrderHints[ i ][ j + LAST_FRAME ] is set equal to OrderHints[ j + LAST_FRAME ] for j = 0..REFS_PER_FRAME-1.

FrameStore[ i ][ 0 ][ y ][ x ] is set equal to LrFrame[ 0 ][ y ][ x ] for x = 0..UpscaledWidth-1, for y = 0..FrameHeight-1.

FrameStore[ i ][ plane ][ y ][ x ] is set equal to LrFrame[ plane ][ y ][ x ] for plane = 1..2, for x = 0..((UpscaledWidth + subsampling_x) >> subsampling_x) - 1, for y = 0..((FrameHeight + subsampling_y) >> subsampling_y) - 1.

SavedRefFrames[ i ][ row ][ col ] is set equal to MfRefFrames[ row ][ col ] for row = 0..MiRows-1, for col = 0..MiCols-1.

SavedMvs[ i ][ row ][ col ][ comp ] is set equal to MfMvs[ row ][ col ][ comp ] for comp = 0..1, for row = 0..MiRows-1, for col = 0..MiCols-1.

SavedGmParams[ i ][ ref ][ j ] is set equal to gm_params[ ref ][ j ] for ref = LAST_FRAME..ALTREF_FRAME, for j = 0..5.

SavedSegmentIds[ i ][ row ][ col ] is set equal to SegmentIds[ row ][ col ] for row = 0..MiRows-1, for col = 0..MiCols-1.

The function save_cdfs( i ) is invoked (see below).

If film_grain_params_present is equal to 1, the function save_grain_params( i ) is invoked (see below).

The function save_loop_filter_params( i ) is invoked (see below).

The function save_segmentation_params( i ) is invoked (see below).
For each value of i from 0 to NUM_REF_FRAMES - 1, the following applies if bit i of refresh_frame_flags is equal to 1 (i.e. if (refresh_frame_flags >> i) & 1 is equal to 1):
 RefOrderHint[ i ] is set equal to OrderHint.
save_cdfs( ctx ) is a function call that indicates that all the CDF arrays are saved into frame context number ctx in the range 0 to (NUM_REF_FRAMES - 1). When this function is invoked the following takes place:
 A copy of each CDF array mentioned in the semantics for init_coeff_cdfs and init_non_coeff_cdfs is saved in an area of memory indexed by ctx.
save_grain_params( i ) is a function call that indicates that all the syntax elements that can be read in film_grain_params should be saved into an area of memory indexed by i.
save_loop_filter_params( i ) is a function call that indicates that the values of loop_filter_ref_deltas[ j ] for j = 0 .. TOTAL_REFS_PER_FRAME-1, and the values of loop_filter_mode_deltas[ j ] for j = 0 .. 1 should be saved into an area of memory indexed by i.
save_segmentation_params( i ) is a function call that indicates that the values of FeatureEnabled[ j ][ k ] and FeatureData[ j ][ k ] for j = 0 .. MAX_SEGMENTS-1, for k = 0 .. SEG_LVL_MAX-1 should be saved into an area of memory indexed by i.
Note: Although this process stores all the motion vectors into SavedMvs, only the values where row and col are both odd will affect the decoding process.
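The slot selection driven by refresh_frame_flags can be sketched in Python as follows. This is an informative illustration: NUM_REF_FRAMES = 8 is assumed from the constants section, the function name update_reference_slots is hypothetical, and a single dictionary stands in for all the per-frame state listed above.

```python
import copy

NUM_REF_FRAMES = 8  # assumed value of the spec constant

def update_reference_slots(slots, refresh_frame_flags, current_state):
    """Save the current frame state into every slot whose flag bit is set.

    A deep copy is stored so that later changes to the current frame
    state do not alter a saved reference.
    """
    for i in range(NUM_REF_FRAMES):
        if (refresh_frame_flags >> i) & 1:
            slots[i] = copy.deepcopy(current_state)
    return slots
```

For example, a refresh_frame_flags value of 0x85 updates slots 0, 2, and 7 and leaves the other five slots untouched.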
Reference frame loading process
This process is the reverse of the reference frame update process specified in Reference frame update process. It loads saved values for a previous reference frame back into the current frame variables. The index of the saved reference frame to load is given by the syntax element frame_to_show_map_idx.

current_frame_id is set equal to RefFrameId[ frame_to_show_map_idx ].

UpscaledWidth is set equal to RefUpscaledWidth[ frame_to_show_map_idx ].

FrameWidth is set equal to RefFrameWidth[ frame_to_show_map_idx ].

FrameHeight is set equal to RefFrameHeight[ frame_to_show_map_idx ].

RenderWidth is set equal to RefRenderWidth[ frame_to_show_map_idx ].

RenderHeight is set equal to RefRenderHeight[ frame_to_show_map_idx ].

MiCols is set equal to RefMiCols[ frame_to_show_map_idx ].

MiRows is set equal to RefMiRows[ frame_to_show_map_idx ].

subsampling_x is set equal to RefSubsamplingX[ frame_to_show_map_idx ].

subsampling_y is set equal to RefSubsamplingY[ frame_to_show_map_idx ].

BitDepth is set equal to RefBitDepth[ frame_to_show_map_idx ].

OrderHint is set equal to RefOrderHint[ frame_to_show_map_idx ].

OrderHints[ j + LAST_FRAME ] is set equal to SavedOrderHints[ frame_to_show_map_idx ][ j + LAST_FRAME ] for j = 0..REFS_PER_FRAME-1.

LrFrame[ 0 ][ y ][ x ] is set equal to FrameStore[ frame_to_show_map_idx ][ 0 ][ y ][ x ] for x = 0..UpscaledWidth-1, for y = 0..FrameHeight-1.

LrFrame[ plane ][ y ][ x ] is set equal to FrameStore[ frame_to_show_map_idx ][ plane ][ y ][ x ] for plane = 1..2, for x = 0..((UpscaledWidth + subsampling_x) >> subsampling_x) - 1, for y = 0..((FrameHeight + subsampling_y) >> subsampling_y) - 1.

MfRefFrames[ row ][ col ] is set equal to SavedRefFrames[ frame_to_show_map_idx ][ row ][ col ] for row = 0..MiRows-1, for col = 0..MiCols-1.

MfMvs[ row ][ col ][ comp ] is set equal to SavedMvs[ frame_to_show_map_idx ][ row ][ col ][ comp ] for comp = 0..1, for row = 0..MiRows-1, for col = 0..MiCols-1.

gm_params[ ref ][ j ] is set equal to SavedGmParams[ frame_to_show_map_idx ][ ref ][ j ] for ref = LAST_FRAME..ALTREF_FRAME, for j = 0..5.

SegmentIds[ row ][ col ] is set equal to SavedSegmentIds[ frame_to_show_map_idx ][ row ][ col ] for row = 0..MiRows-1, for col = 0..MiCols-1.

The function load_cdfs( frame_to_show_map_idx ) is invoked.

If film_grain_params_present is equal to 1, the function load_grain_params( frame_to_show_map_idx ) is invoked (see Film grain params semantics).

The function load_loop_filter_params( frame_to_show_map_idx ) is invoked (see below).

The function load_segmentation_params( frame_to_show_map_idx ) is invoked (see below).
load_loop_filter_params( i ) is a function call that indicates that the values of loop_filter_ref_deltas[ j ] for j = 0 .. TOTAL_REFS_PER_FRAME-1, and the values of loop_filter_mode_deltas[ j ] for j = 0 .. 1 should be loaded from an area of memory indexed by i.
load_segmentation_params( i ) is a function call that indicates that the values of FeatureEnabled[ j ][ k ] and FeatureData[ j ][ k ] for j = 0 .. MAX_SEGMENTS-1, for k = 0 .. SEG_LVL_MAX-1 should be loaded from an area of memory indexed by i.