You’re on cable. That means that you and all your neighbours are connected to one local node. The cable from the node to your homes is like a shared wire: with the right tool you could, in principle, see every packet sent to everyone else as well as your own (although modern cable systems encrypt each subscriber’s traffic). You posted this on Sunday and said “last night”, which means it was Saturday night. In normal (non-covid) times, this would be a busy night for entertainment, as most people would be home from work and wanting to relax. If everyone is surfing the net, watching Netflix/HBO/Prime Video, or downloading torrents (as an example of heavy usage), then there is likely to be a lot of congestion at the node.
While the node is probably uplinked into Comcast’s network via fibre and has a lot of capacity, that capacity is not infinite. If there is a lot of traffic originating from or destined for your node, some of it may simply be dropped so that the “more important” traffic can get through. This is known as quality of service (QoS) management, and as you no doubt noticed, Comcast gives their own traffic a higher QoS tag than the traffic destined for your Roku. I’m not sure that’s very fair, but it is their network and their box, and some would argue it is their right to give their content preferential treatment.
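To make the “more important traffic wins” idea concrete, here is a toy sketch in Python. It is not how Comcast or any real router does it (real gear uses packet markings and queueing disciplines that are far more involved); the function name `forward`, the packet names and the priority numbers are all made up for illustration.

```python
def forward(packets, capacity):
    """Decide which packets get through a congested link (toy model).

    packets:  list of (name, priority) tuples; a higher number means
              "more important". The values are hypothetical.
    capacity: how many packets the link can carry in this interval.
    Returns a (forwarded, dropped) pair of lists.
    """
    # sorted() is stable, so packets with equal priority keep arrival order.
    ranked = sorted(packets, key=lambda p: p[1], reverse=True)
    return ranked[:capacity], ranked[capacity:]


# Five packets arrive, but the link only has room for three.
arrivals = [
    ("roku-1", 1),
    ("isp-tv-1", 5),
    ("roku-2", 1),
    ("isp-tv-2", 5),
    ("email", 1),
]
forwarded, dropped = forward(arrivals, capacity=3)
print("forwarded:", [name for name, _ in forwarded])
print("dropped:  ", [name for name, _ in dropped])
```

In this made-up run, both high-priority packets get through along with one Roku packet, while the second Roku packet and the email are dropped. When the link has room for everything, nothing is dropped and the tags make no difference, which is why you only notice this at busy times.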
Depending on where your content originates, or where an upload (if you were doing one) was destined, it may have to pass through one of many central locations in the internet known as an interchange. At the interchange, the major ISPs interconnect with each other in what is known as a peering arrangement. These arrangements are physical hardware, backed by contractual obligations. Funds change hands based on who handles more traffic. Services like Netflix, HBO and others mostly originate a lot of traffic, so according to ISPs like Comcast, they need to pay for Comcast to accept the deluge. Depending on contract negotiations, it may be strategic for an ISP to limit throughput at the interchange by not upgrading the equipment to the level it would need to avoid congestion.
As you can see, it’s not really any one thing, device, place or process that causes the problem. The Internet was designed from the very beginning to allow packets to be dropped at any point along a path where congestion occurs. This works fine for an email, a file download, or a torrent transfer (upload and download at the same time), but it poses a problem for any traffic that is time sensitive. You, watching a movie, are expecting your packets of video and audio to arrive at a minimum speed, so that they can be reassembled, decompressed, processed and delivered to your TV without annoying pauses.
To deal with this unpredictability, your device would normally have a buffer of some sort. This allows it to receive and assemble enough of the movie ahead of the point you’re currently watching that a minor glitch doesn’t become apparent to you. Say that buffer is 10 seconds: in theory, that would allow a congestion event of up to 9 seconds of blocked or delayed traffic to happen without you noticing it. In average times that may be plenty, but at the busiest times it may not be enough. The players/devices are supposed to be smart about the buffer size, and increase or decrease it based on conditions. The problem, of course, is that a buffer is basically RAM, which is fixed in size. Your Roku may have a smaller buffer than your Comcast box, for example.
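If it helps to see the buffer idea in numbers, here is a small Python sketch. It is a simplified model, not how any particular player works: the function name `simulate_playback` is made up, it assumes the buffer starts full, and it counts time in whole seconds.

```python
def simulate_playback(buffer_seconds, arrivals):
    """Count the seconds in which playback stalls (toy model).

    buffer_seconds: how many seconds of video the player can hold ahead.
    arrivals:       seconds of video received during each second of real
                    time (1.0 = keeping pace, 0.0 = completely blocked).
    Assumes the buffer is full when the simulation starts.
    """
    buffered = float(buffer_seconds)
    stalls = 0
    for received in arrivals:
        # New video arrives, but the buffer cannot hold more than its size.
        buffered = min(float(buffer_seconds), buffered + received)
        if buffered >= 1.0:
            buffered -= 1.0   # play one second of video
        else:
            stalls += 1       # nothing to play: the picture freezes
    return stalls


# A 9-second outage is absorbed by a 10-second buffer...
print(simulate_playback(10, [0.0] * 9))    # 0
# ...but a 12-second outage causes 2 seconds of freezing.
print(simulate_playback(10, [0.0] * 12))   # 2
# A smaller 5-second buffer fares worse in the same 12-second outage.
print(simulate_playback(5, [0.0] * 12))    # 7
```

The last line is the point about different devices: with the same congestion on the wire, the box with the smaller buffer freezes for longer.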
When you really stop to think of all the tech involved, you start to marvel that any of it works at all… 