Historically, GPUs have been the go-to hardware for computer vision, providing excellent performance across many model types. But GPU-optimized computing is not your only option for running computer vision models. Modern server CPUs such as Intel's 3rd Generation Xeon Scalable ("Ice Lake") processors are a promising alternative to GPU-optimized computing.

In this guide, we compare the Intel-powered c6i "Ice Lake" Amazon Web Services (AWS) instance against three common AWS GPU instances. In the video below, we walk through our findings; they are documented in more depth later in the post, along with more information about the Ice Lake processor.

What is the Intel c6i “Ice Lake” CPU?

Amazon EC2 C6i ("Ice Lake") instances are powered by 3rd Generation Intel Xeon Scalable processors and deliver up to 15% better price performance than C5 instances across a wide variety of workloads. C6i is a compute-optimized instance family designed to provide a strong balance of compute resources and cost.

There are many features of the C6i instance that make it a compelling alternative for computer vision applications.

First, C6i instances feature a 2:1 ratio of memory to vCPU, similar to C5 instances, but support up to 128 vCPUs per instance, 33% more than C5. This translates to faster performance on many compute-intensive applications, from training computer vision models to processing data.

C6i instances feature twice the networking bandwidth of C5 instances, making them an ideal fit for compute-intensive workloads. This includes batch processing, distributed analytics, high performance computing (HPC), ad serving, highly scalable multiplayer gaming, and video encoding.

They were released into general availability in October 2021 and come in the following nine sizes:

| Name | vCPUs | Memory (GiB) | Network Bandwidth (Gbps) | EBS Throughput (Gbps) |
|---|---|---|---|---|
| c6i.large | 2 | 4 | Up to 12.5 | Up to 10 |
| c6i.xlarge | 4 | 8 | Up to 12.5 | Up to 10 |
| c6i.2xlarge | 8 | 16 | Up to 12.5 | Up to 10 |
| c6i.4xlarge | 16 | 32 | Up to 12.5 | Up to 10 |
| c6i.8xlarge | 32 | 64 | 12.5 | 10 |
| c6i.12xlarge | 48 | 96 | 18.75 | 15 |
| c6i.16xlarge | 64 | 128 | 25 | 20 |
| c6i.24xlarge | 96 | 192 | 37.5 | 30 |
| c6i.32xlarge | 128 | 256 | 50 | 40 |

All C6i instances offer:

  • Memory capacity: New larger sizes with up to 128 vCPUs and 256 GiB of memory that you can use to consolidate workloads on fewer instances.
  • High storage capacity: Up to 7.6 TB of local NVMe-based SSD block-level storage, which makes for a great instance type for handling large datasets.
  • EBS storage: access to up to 80 Gbps of Amazon Elastic Block Store (EBS) bandwidth.
  • High local storage throughput: up to 2.1 GB/s of local storage throughput.
  • High network throughput: up to 200 Gbps of network bandwidth, up to 2x higher than comparable C5n instances.
  • Enhanced efficiency & security: C6i instances are built on the AWS Nitro System, a combination of dedicated hardware and lightweight hypervisor. AWS Nitro delivers almost all of the compute and memory resources of the host hardware to your instances for better overall performance and security.

Below, we will compare the c6i.2xlarge instance against several of the most commonly used GPU instances. Our goal is to demonstrate the performance of Intel hardware for computer vision inference on CPU compared to GPU.

Testing Process

To ensure that we make fair comparisons, we used the parameters and methods documented below across all of our benchmarking experiments.

Single Inference Tests

First, we performed single inference tests on a single image with the following characteristics:

  • A width of 393px and a height of 487px.
  • One annotation file containing data for a class named “helmet”.
  • Inference was performed on a hosted Roboflow endpoint using the “ROBOFLOW 2.0 OBJECT DETECTION (FAST)” model.
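Under the hood, a single inference against a hosted Roboflow endpoint is an HTTP POST carrying a base64-encoded image. The sketch below illustrates the shape of such a request; the model id (`hard-hats`), version, API key, and image path are placeholders rather than the exact values from our tests, so check the current Roboflow hosted API documentation before relying on it.

```python
import base64
import urllib.parse
import urllib.request

API_ROOT = "https://detect.roboflow.com"  # Roboflow hosted inference endpoint


def build_request(model_id: str, version: int, api_key: str,
                  image_path: str) -> urllib.request.Request:
    """Build a POST request with the image sent as a base64 payload."""
    with open(image_path, "rb") as f:
        payload = base64.b64encode(f.read())
    query = urllib.parse.urlencode({"api_key": api_key})
    url = f"{API_ROOT}/{model_id}/{version}?{query}"
    return urllib.request.Request(
        url,
        data=payload,  # presence of a body makes this a POST
        headers={"Content-Type": "application/x-www-form-urlencoded"},
    )


if __name__ == "__main__":
    # Placeholder model, key, and image. The response is JSON containing
    # predicted boxes, class names (e.g. "helmet"), and confidences.
    req = build_request("hard-hats", 1, "YOUR_API_KEY", "helmet.jpg")
    with urllib.request.urlopen(req) as resp:
        print(resp.read().decode())
```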

Multiple Inference Tests

We then conducted multiple inference tests with the same 100 images across each instance. The testing dataset has the following characteristics:

  • Images varied in size from ~400x400 to ~600x600 pixels
  • The number of annotations in a file ranged from one to three objects.
  • Inference was performed on a hosted Roboflow endpoint using the “ROBOFLOW 2.0 OBJECT DETECTION (FAST)” model.
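The multiple-inference test reduces to timing one request per image, then converting the totals into average latency and throughput. A minimal harness might look like the sketch below, assuming an `infer` callable that wraps whatever endpoint you are benchmarking:

```python
import time
from statistics import mean


def benchmark(infer, images):
    """Run one inference per image; return (avg latency in ms, throughput in FPS)."""
    latencies = []
    for image in images:
        start = time.perf_counter()
        infer(image)  # the call being measured
        latencies.append(time.perf_counter() - start)
    return mean(latencies) * 1000.0, len(latencies) / sum(latencies)


if __name__ == "__main__":
    # Stand-in inference that sleeps ~20 ms; swap in a real endpoint call.
    avg_ms, fps = benchmark(lambda image: time.sleep(0.02), range(100))
    print(f"avg latency: {avg_ms:.2f} ms  throughput: {fps:.1f} FPS")
```

Running the same image set through the same harness on every instance keeps the comparison apples-to-apples.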

We used the "ami-003f25e6e2d2db8f1" AWS GPU image for GPU testing. We used the "ami-0574da719dca65348" Ice Lake image for testing with the Intel Ice Lake CPU.

Findings

After completing our benchmarks using the specifications above, we recorded the results documented in the table below.

| Instance Type | Ice Lake c6i.2xlarge | g4dn.2xlarge | g5.2xlarge | p3.2xlarge |
|---|---|---|---|---|
| GPU | None (3rd Gen Intel Xeon) | T4 | A10G | V100 |
| Single inference speed (ms) | 19.23 | 17.84 | 16.89 | 15.07 |
| Multiple inference speed (seconds) | 2.16 | 1.98 | 1.51 | 1.38 |
| Multiple inference speed (FPS) | 51.8 | 71 | 116 | 122 |
| Cost of instance (on-demand, us-east-1, $/hour) | $0.34 | $0.752 | $1.212 | $3.06 |
| FPS / cost of instance | 152 | 94 | 95 | 40 |
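The FPS-per-cost row is derived directly from the two rows above it: throughput divided by the hourly on-demand price. Reproducing the calculation from the table values:

```python
# (throughput in FPS, on-demand price in $/hour, us-east-1) from the table above
results = {
    "Ice Lake c6i.2xlarge": (51.8, 0.34),
    "g4dn.2xlarge": (71.0, 0.752),
    "g5.2xlarge": (116.0, 1.212),
    "p3.2xlarge": (122.0, 3.06),
}

for instance, (fps, price_per_hour) in results.items():
    print(f"{instance}: {fps / price_per_hour:.0f} FPS per dollar-hour")
```

The c6i.2xlarge tops this metric by a wide margin despite posting the slowest raw throughput.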

The numbers show that the c6i.2xlarge does not offer the fastest raw inference speeds. But the c6i.2xlarge provides the best cost-to-performance ratio of the instances tested, making it an excellent workhorse for general computer vision inference needs.

When bumping up to a more expensive instance, keep in mind that cost does not scale linearly with performance: the p3.2xlarge costs nine times as much as the c6i.2xlarge but delivers only about 2.4x the throughput. Higher spend brings diminishing returns.

Conclusion

The Intel-powered c6i "Ice Lake" instance is a great alternative to NVIDIA GPUs for consumers looking for good performance at a reasonable price. The c6i offers a strong balance of price to performance without the added overhead of renting a GPU instance, and it comes in a standard range of instance sizes to meet your specific usage requirements.

From an AWS management perspective, running CPU instances instead of GPU instances keeps things simpler and sidesteps the availability shortages that the most in-demand GPUs frequently face. But if you need the highest possible inference speed, a GPU-based instance may be a more appropriate choice.