
AES 4k-8k Audio Panel

October 10, 2014, Audio Engineering Society Convention, Los Angeles—A panel of experts from standards organizations and industry considered the issues of matching audio with the emerging 4k and 8k displays. Fred Willard from Univision moderated the panel, and the panelists were Skip Pizzi from NAB, Robert Bleidt from Fraunhofer, Jeff Riedmiller from Dolby, Tim Carroll from Linear Acoustic, Thomas Lund from TC Electronic, and David McIntyre from DTS.

McIntyre suggested five items for standards. First, the 4k and 8k displays will use high frame rates and high dynamic range. The HEVC codecs and new formats will require new silicon, so it is time to redo audio and piggyback on those changes, since the formats will not be backward compatible. Second, the new standards will have to address higher quality to match the displays, even though many are calling for reduced bits for audio.

Third, this transition will be challenging, since the formats will not be backward compatible and users may not want to invest in completely new systems. Video moved from SD to HD and content went from DVD to Blu-ray, but 4k as merely four times HD is not a big enough step to justify changing all the equipment. If 4k includes high frame rate and high dynamic range, the difference may be noticeable enough to warrant the change. Audio is stuck in 5.1-type formats, but the newer formats will offer interactivity.

Fourth, audio objects will be part of the solution for getting the audio to match the video in spatial resolution. The increased flexibility and changes in the fundamentals will be interesting. When stereo moved to 5.1, the tools and playback equipment remained similar, but mixing with objects represents a new paradigm and requires a renderer for the objects. The increased variability in object-based loudness will require a monitoring function in the system. A short-term stop-gap is to pre-render in the studio and monitor in 5.1, and all of this monitoring will call for some standard reference.

Finally, the standards for object-based audio and the ecosystems for making, archiving, and distributing the content don't exist. The whole area needs some standard render mechanism, and this issue is under consideration by many organizations. Ideally, this will result in an open standard to ensure interoperability.

Lund described perception and ear-brain processing, and noted that loudness ranges have dropped while overall loudness has increased. Part of the problem is too much hyper-compression and lossy data processing, which results in unpredictable audio. Every step from ingest to transmission to distribution and consumption can change the quality of the sounds and the mix, and there is no control of dynamic compression at any step. For example, in "Game of Thrones" the average sound level is -24 dB, but playback changes with dynamic range control (DRC) on or off. This situation causes cumulative uncertainty for all facets of the sound.

The fix is simple: don't allow any lossy data reduction for audio, and use intrinsic loudness normalization. These steps will result in predictable, easy-to-operate audio that can enable super-natural envelopment for 3-D sound comparable to a concert hall.
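As a minimal sketch of the loudness normalization Lund advocates: measure the program loudness and apply one static gain to land on a reference target. The RMS stand-in and the -24 dB target below are illustrative assumptions; real systems use an ITU-R BS.1770-style gated measurement.

```python
# Minimal loudness-normalization sketch. RMS is a stand-in for a proper
# BS.1770 gated loudness measurement; the target is illustrative.
import math

TARGET_DB = -24.0  # reference loudness target, as cited for "Game of Thrones"

def measure_loudness_db(samples):
    """Approximate program loudness as RMS in dBFS."""
    rms = math.sqrt(sum(s * s for s in samples) / len(samples))
    return 20.0 * math.log10(max(rms, 1e-12))

def normalize(samples):
    """Apply one static gain so the program lands on the target loudness."""
    gain_db = TARGET_DB - measure_loudness_db(samples)
    gain = 10.0 ** (gain_db / 20.0)
    return [s * gain for s in samples]
```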

Riedmiller stated that in the US, broadcast changed from analog to stereo in only half of the stations, and half of those stations are non-commercial. A lack of standards delayed the roll-out, exacerbated the necessary changes in workflow, and highlighted the limitations of the technologies. Now, 5.1 plus metadata is complicated. SMPTE standard 292 calls for a 1.5 Gb/s channel for HD-SDI that includes 16 channels of 24-bit audio at 48 kHz.
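A quick back-of-the-envelope check of that embedded-audio payload; the arithmetic below only restates the figures in the paragraph above.

```python
# Embedded audio payload in an HD-SDI (SMPTE 292) link, per the figures above.
channels = 16
bit_depth = 24          # bits per sample
sample_rate = 48_000    # Hz

audio_mbps = channels * bit_depth * sample_rate / 1e6
print(f"{audio_mbps:.3f} Mb/s of embedded audio")  # 18.432 Mb/s
print("carried inside a 1.5 Gb/s HD-SDI channel")
```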

Unfortunately, the metadata is not working due to separate channels and a lack of standard timing signals to map to the embedded audio. The entry to 4k needs a 3-D view of the objects to be relevant. One big problem is how to make all this work in a broadcast environment; the data volume creates a large transport problem.

When the audio is defined as objects that have to be mixed to fit the environment and carry information to the receiver, there are many tradeoffs that are not all controllable. For example, with 100 elements you need 10 tracks of metadata, including time, coordinate system, size, and render mode (either zone or speaker-snap). SDI workflows are not capable of handling this volume of data flows.
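A sketch of what one object's metadata record might carry, using the fields named above. The field names and types here are hypothetical illustrations, not from any published standard.

```python
# Hypothetical per-object metadata record; field names are illustrative only.
from dataclasses import dataclass
from enum import Enum

class RenderMode(Enum):
    ZONE = "zone"                   # render the object into a spatial zone
    SPEAKER_SNAP = "speaker_snap"   # snap the object to the nearest speaker

@dataclass
class ObjectMetadata:
    timestamp: float                       # seconds; must track the audio samples
    position: tuple[float, float, float]   # coordinates in the chosen system
    size: float                            # apparent source extent
    mode: RenderMode                       # zone or speaker-snap rendering
```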

The data volumes will grow to unusable sizes. One 24-bit channel at 48 kHz requires 1.152 Mb/s, plus metadata of 1.25-1.5 kb per object per track; with 100 objects per track, this leads to over 70 GB for the audio track of a 90-minute movie. If the metadata are compressed to 4.6 Mb/s, the result is a 3.1 GB file. These files need to be synced with the audio and video within a 10-sample error window or spatial errors occur. Mezzanine compression can bridge and track the metadata and audio data. Standards for object-based audio and metadata, which don't currently exist, need to consider some form of interchange format.
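The data-volume figures quoted above reproduce as straightforward arithmetic:

```python
# Reproducing the data-volume figures quoted above.
bits_per_sample = 24
sample_rate = 48_000
channel_mbps = bits_per_sample * sample_rate / 1e6      # 1.152 Mb/s per channel

objects = 100
movie_seconds = 90 * 60
audio_gb = objects * channel_mbps * 1e6 * movie_seconds / 8 / 1e9
print(f"uncompressed object audio: {audio_gb:.0f} GB")  # ~78 GB ("over 70 GB")

compressed_mbps = 4.6
compressed_gb = compressed_mbps * 1e6 * movie_seconds / 8 / 1e9
print(f"compressed: {compressed_gb:.1f} GB")            # ~3.1 GB
```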

Carroll commented that object-oriented audio still enables personalization. The problem is getting a good end-to-end encode into the house, and this issue needs industry collaboration. The fact that the new formats are not backward compatible means the work can start with a clean sheet. The changes in infrastructure and other technologies need good, workable standards.

Bleidt stated that some of the issues are resolved in the new MPEG-H standard that was developed by Fraunhofer, Qualcomm, and Technicolor. The newly enabled features include personalization and surround. For example, a listener could change the mix, set a different level for the announcer, change the announcer language, and more. The broadcaster only needs to send out one signal with the default mix plus the other objects for alternate tracks and levels.
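As an illustration of the one-signal-plus-objects idea: the broadcaster ships a default mix and alternate objects, and the receiver re-levels or swaps them. The structure below is a hypothetical sketch, not the MPEG-H bitstream syntax.

```python
# Hypothetical sketch: one broadcast signal carrying a default mix plus
# alternate objects; not actual MPEG-H syntax.
broadcast = {
    "default_mix": "5.1 bed",
    "objects": {
        "announcer_en": {"gain_db": 0.0},
        "announcer_es": {"gain_db": None},   # alternate language, off by default
        "crowd":        {"gain_db": 0.0},
    },
}

def personalize(stream, announcer="announcer_es", announcer_gain_db=6.0):
    """Receiver-side tweak: pick an announcer language and boost its level."""
    for name, obj in stream["objects"].items():
        if name.startswith("announcer"):
            obj["gain_db"] = None            # mute all announcer objects
    stream["objects"][announcer]["gain_db"] = announcer_gain_db
    return stream
```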

Ongoing experiments are looking at levels of control for the sound, real-time decode and render, and the type and quantity of presets. In field tests, the sound field is an 11.1 system broken down into 7.1 plus 4 channels for height information. Sound effects are in 5.1 and commentary is in mono. All of this data fits into 448 kb/s.
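Tallying the elements in the field-test signal shows how modest that total is. The channel counts and the 448 kb/s figure are from the talk; the per-channel average is just arithmetic on those figures.

```python
# Field-test signal: an 11.1 bed (7.1 + 4 height), 5.1 effects, mono commentary.
bed = 7 + 1 + 4       # 7.1 bed plus 4 height channels (LFE counted as one)
effects = 5 + 1       # 5.1 sound effects
commentary = 1        # mono commentary
total_channels = bed + effects + commentary  # 19 channels

print(f"{448 / total_channels:.1f} kb/s per channel on average")  # ~23.6 kb/s
```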

The overall result is an immersive 3-D sound. It is possible to do a similar job with a 5.1 + 4 configuration. The new standard allows for up to 128 channels with 128 objects per channel. The recommended speaker configuration is 7.1 + 4 to get the higher-order ambisonics. The idea is to transmit the predominant sounds plus ambience in separate tracks that are independent of the speaker fields. If the sound and picture on UHD diverge by more than 15°, the effect is noticeable and objectionable.

One challenge is that most consumers will not be able to hear any problems because many will still be using 2.0 or 2.1 sound bars. To address this limitation, Fraunhofer has designed a 3-D sound bar for UHD TV that wraps around the TV set and creates a full 3-D sound field.

The underlying technology will use multi-platform control. It will use the existing AC-3 metadata plus additional bit rates for other devices. The system will auto-adapt to the consumer environment, including earbuds on phones. A fool-proof speaker renderer can handle a zero-height configuration by applying psycho-acoustics to create height information. The end system will combine streams over disparate networks, so the full ecosystem will have to include codecs and timing synchronization for broadcast delivery.

They expect deployment to occur in four stages. First, replace the AC-3 codecs with MPEG-H to cut bit rates in half, or to double the total capacity at the same rate. Next, add objects at 20-40 kb/s for each object. Third, add the 5.1 + 4 or 7.1 + 4 configuration to generate height information; the challenge is that the existing SDI will run out of channels. Finally, add dynamic objects, a class that will need more metadata and more bandwidth. The standard is expected to be released in Q1 of '15 and is embedded within the ATSC 3.0 and DTV specifications.
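A rough bit budget for the first two stages; the 384 kb/s AC-3 5.1 baseline and the choice of eight objects are illustrative assumptions, while the halved rate and the 20-40 kb/s per-object figure come from the talk.

```python
# Rough staged bit budget. The 384 kb/s AC-3 baseline and 8-object count are
# assumptions for illustration; 20-40 kb/s per object is from the talk.
ac3_51_kbps = 384
stage1_kbps = ac3_51_kbps / 2            # MPEG-H at roughly half the AC-3 rate

objects = 8
lo_kbps = stage1_kbps + objects * 20     # objects at 20 kb/s each
hi_kbps = stage1_kbps + objects * 40     # objects at 40 kb/s each

print(f"stage 1: {stage1_kbps:.0f} kb/s")
print(f"stage 2: {lo_kbps:.0f}-{hi_kbps:.0f} kb/s with {objects} objects")
```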

Pizzi talked about the ATSC 3.0 audio standards. This specification is still evolving, with full ratification expected in late '15 and product roll-out in '17. The specification has many new requirements and is not backward compatible with any existing standards. The addition of channels and objects will improve audio quality and lower the bandwidth for audio functions.

The standard will support many configurations, from headphones to full 7.1 + 4, and is adaptable. They are not specifying any renderer, but instead define renderer outputs. The new features include personalization and will have to sync broadcast and broadband streams with low latency. The workgroups are addressing accessibility support, loudness normalization, and adaptive dynamic range control.
 

