@rkenmi - Home

Traditional Message Queues vs. Log-based Message Brokers

Traditional Message Queues: Traditional message queues are based on the JMS and AMQP standards. These message brokers focus on a pub/sub model in which publishers write messages to a queue and subscribers consume from that queue.
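As a rough illustration of the publish/consume flow described above, here is a minimal in-memory sketch. The `ToyQueueBroker` class and its method names are hypothetical and are not part of JMS, AMQP, or any real broker; the toy simply hands each message out once, in arrival order.

```python
from collections import deque


class ToyQueueBroker:
    """Hypothetical in-memory stand-in for a queue-based broker."""

    def __init__(self):
        self._queue = deque()

    def publish(self, message):
        # A publisher appends a message to the tail of the queue.
        self._queue.append(message)

    def consume(self):
        # A subscriber takes the oldest message; in this toy it is
        # removed from the queue as soon as it is handed out.
        return self._queue.popleft() if self._queue else None


broker = ToyQueueBroker()
broker.publish("order-created")
broker.publish("order-paid")

first = broker.consume()
second = broker.consume()
third = broker.consume()  # queue is empty at this point
```

A real broker adds acknowledgements, persistence, and routing on top of this basic shape; none of that is modeled here.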

message queue, event streaming, message broker

Distributed scaling with Relational Databases

Background: Many articles discuss how to scale databases. Typically, they cover the purpose and general idea of sharding and replication, but these topics are often explained separately rather than in conjunction.

horizontal partitioning, sql

2PC - Two Phase Commit and Why it Sucks

Background: Two Phase Commit (abbreviated 2PC) is a protocol used to achieve atomic writes in distributed systems. It was a novel concept in the 1970s and had good intentions, but in practice its implementations fall short.

two phase commit, 2pc

Local Secondary Index vs. Global Secondary Index

Secondary Index: A secondary index is used in databases to speed up queries that fetch data from frequently queried columns or that need an efficient key range lookup. Secondary indices are used in relational databases (e.

databases, horizontal partitioning, sharding, global secondary index, local secondary index

Big Data Processing: Batching vs. Streaming

Intro: In data processing, we often have to work with large amounts of data. This data is gathered in a few ways: batching, where we aggregate a collection of data (e.g., by the hour); streaming, for data that needs to be processed in real time; and a unified variant that does not distinguish between batching and streaming, allowing you to use the same API for both.
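To make the batching and streaming distinction concrete, here is a small sketch in plain Python. The event data and the function names `batch_hourly_totals` and `stream_running_totals` are made up for illustration and do not come from Spark, Beam, Flink, or any other framework.

```python
from collections import defaultdict

# Hypothetical events as (timestamp in seconds, value) pairs.
events = [(0, 5), (1800, 3), (3600, 7), (5400, 1)]


def batch_hourly_totals(events):
    """Batching: collect everything first, then aggregate per hour."""
    totals = defaultdict(int)
    for timestamp, value in events:
        totals[timestamp // 3600] += value
    return dict(totals)


def stream_running_totals(events):
    """Streaming: emit an updated result as each event arrives."""
    total = 0
    for _, value in events:
        total += value
        yield total


batch_result = batch_hourly_totals(events)
stream_result = list(stream_running_totals(events))
```

The batch function only produces output once the whole collection is available, while the generator yields a result after every event. A unified API would let the same aggregation logic run in either mode.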

Spark, streaming, apache, Apache Spark, Apache Hadoop, batch, batching, big data, Apache Beam, Apache Flink