TCP is widely used for client-server communication in modern data centers, but TCP packet handling is notoriously CPU-intensive. Our goal is to make data center TCP processing in a multi-tenant setting efficient, scalable, flexible, and secure. We propose a refactoring of TCP functionality that splits processing between a streamlined fast path for common operations and an out-of-band slow path. Protocol processing executes in the kernel on dedicated cores that enforce correctness and resource isolation. Applications communicate asynchronously with the kernel through event queues, improving parallelism and cache utilization. We show that our approach can increase RPC throughput by up to 3x compared to Linux. The fast path can also be offloaded to a programmable NIC to further improve performance and minimize the CPU time spent on network processing. With hardware offload, data packets are delivered directly from application to application, while the NIC and kernel cooperate to enforce correctness and resource isolation. Using an emulation-based methodology, we show that our approach can increase per-core packet throughput by 8.2x compared to the Linux kernel TCP implementation.