by Ryan Seghers, September 4, 2013
Intel first built H.264 video encoding hardware (called Intel Quick Sync Video) into their Sandy Bridge CPUs in 2011. From the start Quick Sync was a very fast encoding solution, and Intel's next-generation CPUs (Ivy Bridge) were a significant improvement. But the common perception was that for both Sandy Bridge and Ivy Bridge the quality of the encoded video was not as good as that of alternative software encoders. With the release of the latest Intel CPUs, code-named Haswell, we have a chance to see how Intel has done in their newest Quick Sync implementation. For this article I used q264 for Quick Sync encoding and x264 for software encoding to investigate the performance and quality of Intel Quick Sync Video H.264 encoding on Haswell, Ivy Bridge, and Sandy Bridge chips.
q264 is a Windows command-line program that uses Intel® Quick Sync Video hardware to encode video using the H.264 (AVC) encoding algorithm. Version 0.3.0 (beta) is the latest version and the one used for this test. q264 0.3.0 uses Intel Media SDK 2013.
x264 is the de facto standard open-source H.264 encoder. The test transcodes used this version of x264:
x264 0.130.2273kMod b3065e6
(libswscale 2.2.100) (libavformat 54.63.100) (ffmpegsource 220.127.116.11)
built by Komisar on Feb 27 2013, gcc: 4.7.2 (multilib.generic.Komisar)
configuration: --bit-depth=8 --chroma-format=all
x264 license: GPL version 2 or later
libswscale/libavformat/ffmpegsource license: GPL version 2 or later
This version of x264 can be downloaded here. As of this writing version 2334 is the latest, but 2273 is the most recent version that doesn't crash on the test input file used.
The test machines were:
- sb1: Core i5 2500K 3.3 GHz with Intel HD 3000 Graphics (code name Sandy Bridge), Windows 7 x64 sp1
- ivy1: Core i7 3770K 3.5 GHz with Intel HD 4000 Graphics (code name Ivy Bridge), Windows 7 x64 sp1
- has1: Core i7 4770 3.4 GHz with Intel HD 4600 Graphics (code name Haswell), Windows 8 x64
The ivy1 and has1 systems used Intel HD Graphics driver version 18.104.22.168.3165 (video: 22.214.171.12465) which was the latest as of the test (and this writing). The sb1 system used 126.96.36.199.3062 (video: 188.8.131.5262) which despite being older is apparently the latest version for that platform.
The 1080p test file was Samsung's Above the Earth, a 61 second 29.97 fps m2ts sample with a variety of nature scenery available here.
H.264 and Encoder Parameter Background
There are some things you need to know about H.264 and the encoders tested in order to understand the results. The first is H.264 profiles.
There are many algorithm options the encoder can use during encoding. To provide a simpler way to communicate about these options the standards body created "profiles" which are sets of option values. The profiles are designed to span the range of complexity from simple/fast encoding to complex/slow encoding. The profiles used in this analysis are Baseline, Main, and High. See the Wikipedia H.264 page Profiles section for more information.
The H.264 standard also defines Levels. See the Wikipedia H.264 page Levels section for more information. The level needed is determined mainly by the output resolution and frame rate (each level also caps the bitrate). This test used 1080p at 29.970 fps, so all encodes in this test are Level 4.0.
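To see why 1080p at 29.970 fps lands on Level 4.0, here is a minimal Python sketch of the frame-size and macroblock-rate check. The limits table is a small subset of the level limits in the H.264 specification, the function names are made up for this illustration, and the sketch ignores the per-level bitrate caps (Level 4.1, for example, has the same frame limits as 4.0 but allows a higher bitrate, so it is left out).

```python
import math

# Subset of H.264 level limits: (level, max macroblocks per frame,
# max macroblocks per second). Bitrate caps are deliberately ignored here.
LEVEL_LIMITS = [
    ("3.1", 3600, 108000),
    ("3.2", 5120, 216000),
    ("4.0", 8192, 245760),
    ("4.2", 8704, 522240),
]


def macroblocks(width, height):
    """Number of 16x16 macroblocks needed to cover one frame."""
    return math.ceil(width / 16) * math.ceil(height / 16)


def min_level(width, height, fps):
    """Lowest level in LEVEL_LIMITS whose frame size and rate limits fit."""
    mbs = macroblocks(width, height)
    for name, max_frame_mbs, max_mbs_per_second in LEVEL_LIMITS:
        if mbs <= max_frame_mbs and mbs * fps <= max_mbs_per_second:
            return name
    return None  # larger than anything in this partial table


if __name__ == "__main__":
    print(macroblocks(1920, 1080))       # macroblocks per 1080p frame
    print(min_level(1920, 1080, 29.97))  # level for the test clip
```

A 1080p frame needs 120 x 68 macroblocks because the 1080-line height is rounded up to a multiple of 16, and at 29.97 fps the macroblock rate sits just under the Level 4.0 limit.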
Encoding quality is always relative to the number of bytes used. You cannot compare encodings without looking at the size, because of course you can have perfect fidelity (compared to the original) by simply copying the original, uncompressed. The number of bytes used in encoding is typically expressed as a bitrate, meaning bytes or bits per second, and the normal unit is Kbps (kilobits per second). Note that although "efficiency" is often used to refer to speed, in this analysis, for clarity, efficiency always means quality per byte.
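As a rough illustration of how an actual bitrate can be derived from an encoded file, here is a small Python sketch. The function name and the example file size are hypothetical, not values from this test. It assumes 1 kilobit = 1000 bits and that the file holds only the video stream; container overhead or audio would inflate the result.

```python
def actual_bitrate_kbps(size_bytes, duration_seconds):
    """Average bitrate in kilobits per second (1 kilobit = 1000 bits)."""
    if duration_seconds <= 0:
        raise ValueError("duration must be positive")
    return size_bytes * 8 / 1000 / duration_seconds


if __name__ == "__main__":
    # Hypothetical output size for a 61 second clip like the test file.
    print(actual_bitrate_kbps(30_500_000, 61))
```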
The encoders have various methods of controlling the bitrate. For this test both were given a target bitrate to attempt to hit. For x264 this uses the --bitrate argument. For q264 this uses the -b argument. The encoded video quality of x264 would likely be better on average if its quality-based rate control mechanism (--crf) or 2-pass encoding were used, but that was not done for this test. Note that the charts are based on actual bitrate, not requested bitrate.
In general there is a quality and speed tradeoff with video encoding. The more time the encoder can take per frame the more efficient the encoding can be. So faster encodes usually result in lower efficiency encoding, and vice versa. Both encoders used in this test allow you to specify a parameter that tells the encoder where in this spectrum to operate. The encoders and their speed-quality parameters are:
- x264: The speed-quality parameter is called "preset". The possible values are: placebo, veryslow, slower, slow, medium, fast, faster, veryfast, superfast, and ultrafast. For time and chart clutter reasons this test does not cover all possible values.
- q264: The Intel Quick Sync API defines a "usage" parameter which is an integer from 1 to 7, where 1 is best quality and 7 is highest speed.
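A sweep over x264 presets and target bitrates like the one described here could be scripted along these lines. This is only a sketch, not the script used for this test: the file names and the bitrate list are placeholders, and the preset subset is arbitrary. It uses the x264 --preset, --bitrate, --tune and -o arguments. q264 is left out because the only q264 argument named in this article is -b.

```python
from itertools import product

# Placeholder values for illustration; not the settings used in this test.
PRESETS = ["veryfast", "medium", "slow"]
BITRATES_KBPS = [2000, 4000, 8000]


def x264_command(input_path, output_path, preset, bitrate_kbps):
    """Build an x264 argument list for one target-bitrate encode."""
    return [
        "x264",
        "--preset", preset,
        "--tune", "psnr",
        "--bitrate", str(bitrate_kbps),
        "-o", output_path,
        input_path,
    ]


def sweep_commands(input_path):
    """One command per (preset, bitrate) combination."""
    commands = []
    for preset, kbps in product(PRESETS, BITRATES_KBPS):
        output_path = f"out_{preset}_{kbps}.264"
        commands.append(x264_command(input_path, output_path, preset, kbps))
    return commands


if __name__ == "__main__":
    for command in sweep_commands("input.m2ts"):
        print(" ".join(command))
```

Each list can be passed to subprocess.run as-is, which avoids shell quoting problems with file names.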
There are many command line arguments that affect the encoding process, especially for x264. For this test all the defaults were used unless specified here.
Quality Comparison Caveats
This test uses a single video example (see above) so this test is not representative of different types of video. The example video used here is a nature montage, so animated video content, for example, would be very different and perhaps have very different results. The final frames of the video are all black with a small Samsung logo. These logo frames can bias the results because different encoders and settings take advantage of these identical frames to varying degrees.
There is probably no perfect single automated quality metric. The ideal measure would be one that represents how people perceive the quality of an encoded video across many people with many types of video. This test uses the peak signal-to-noise ratio (PSNR) metric which simply measures the difference between each original frame and its corresponding encoded frame. The more different the encoded frame, the more the encoded video is presumed to be degraded from the original. See the Wikipedia PSNR page for more information.
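For readers who want to see the arithmetic, here is a minimal Python sketch of PSNR for one frame, assuming 8-bit samples (peak value 255) and frames given as flat lists of sample values. It is the textbook formula, 10 * log10(peak^2 / MSE), and not necessarily the exact calculation done by the tools used in this test.

```python
import math


def psnr(original, encoded, max_value=255):
    """PSNR in dB between two equal-length sequences of sample values."""
    if len(original) != len(encoded) or not original:
        raise ValueError("frames must be non-empty and the same size")
    # Mean squared error over all samples in the frame.
    mse = sum((a - b) ** 2 for a, b in zip(original, encoded)) / len(original)
    if mse == 0:
        return math.inf  # identical frames
    return 10 * math.log10(max_value ** 2 / mse)


if __name__ == "__main__":
    reference = [100, 100, 100, 100]
    degraded = [101, 99, 101, 99]  # every sample off by one
    print(round(psnr(reference, degraded), 2))
```

Higher values mean the encoded frame is closer to the original, and identical frames give an infinite PSNR.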
x264 has an argument that tells it to optimize for PSNR ("--tune psnr"). Based on a small test run, that argument does improve the efficiency as measured by PSNR, so it was used for all x264 runs in this test.
This analysis uses the mean PSNR as the measure of quality of an encoded video, where the mean is across all frames in the video. Taking the mean of multiple frames hides potential issues with quality variation among the frames.
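The following Python sketch shows how a mean can hide variation between frames. The per-frame values are made up for illustration and are not measurements from this test.

```python
from statistics import mean


def summarize(per_frame_psnr):
    """Mean, minimum, and maximum of a list of per-frame PSNR values (dB)."""
    return {
        "mean": mean(per_frame_psnr),
        "min": min(per_frame_psnr),
        "max": max(per_frame_psnr),
    }


if __name__ == "__main__":
    steady = [40.0, 40.0, 40.0, 40.0]  # made-up values
    uneven = [46.0, 46.0, 46.0, 22.0]  # one badly degraded frame
    print(summarize(steady))
    print(summarize(uneven))
```

Both lists have the same mean, but the second contains a frame that would look far worse than the rest. Reporting the minimum alongside the mean is one simple way to expose that.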
Speed Comparison Caveats
Intel's Quick Sync video encoder uses special hardware built into recent Intel CPUs, whereas x264 uses the traditional CPU cores. Therefore comparing speed is comparing apples and oranges to a large degree because the x264 speed will depend on how many cores there are and how fast they are, whereas the q264 speed will depend on the Quick Sync hardware.
Also, note that the test machines differ in other components as well (e.g. motherboards and RAM), so the CPU differences are not isolated. The CPU comparisons are therefore not apples to apples either, but may be interesting nevertheless.
q264 currently uses the ffmpeg libraries to decode incoming video. Since the example video is H.264, it is possible that it would be faster to use Quick Sync hardware for both the decode and the encode, though the fact that the CPUs are nearly idle while the Quick Sync hardware is busy encoding would seem to indicate otherwise. Note that this is also a factor in the comparison of x264 to q264: q264 gets the benefit of the CPU as well as the Quick Sync hardware, whereas x264 is using the CPU for both decode and encode.