Less Slow C++
Earlier this year, I took a month to reexamine my coding habits and rethink some past design choices. I hope to rewrite and improve my FOSS libraries this year, and I needed answers to a few questions first. Perhaps some of these questions will resonate with others in the community, too. - Are coroutines viable for high-performance work? - Should I use SIMD intrinsics for clarity or drop to assembly for easier library distribution? - Has hardware caught up with vectorized scatter/gather in AVX-512 & SVE? - How do secure enclaves & pointer tagging differ on Intel, Arm, & AMD? - What's the throughput gap between CPU and GPU Tensor Cores (TCs)? - How costly are misaligned memory accesses & split-loads, and what gains do non-temporal loads/stores offer? - Which parts of the standard library hit performance hardest? - How do error-handling strategies compare overhead-wise? - What's the compile-time vs. run-time trade-off for lazily evaluated ranges? - Wha
DeepCamp AI