Threaded and scalable to utilize multiple CPUs
Vectorized for efficient use of multiple FPUs
Tuned to take advantage of non-uniform memory architectures and caches