We want to design a unique ID generator in a distributed computing setting. This is not possible through a database’s auto increment id feature, as one database will not support the load.
Requirements
- IDs have to be unique and sortable
- Must fit into 64 bits
- Must be incremental in time, an IDs generated later in the day is larger
Multi-Master setup
If we have N servers in the cluster, then a server with number i
will generate the IDs i, i + N, i + 2N, ...
- Hard to scale up with multiple data centers
- IDs across multiple servers don’t go up with time
- Hard to scale when adding or removing servers
UUID
A Universally Unique Identifier (UUID) is a 128-bit number that’s generated to have a very low probability of collision.
- Very easy to generate, no communication needed between servers
- Doesn’t increase with time
- It is 128-bits, we want 64
Ticket Server
We could have one server which issues IDs. This has the issue that it is a single point of failure and also it might not be able to handle the load
Twitter Snowflake
This divides the 64 bits of the number into multiple sections:
<-1 bit reserved-><----41 bits for timestamp----><--5 bits datacenter id--><--5 bits machine id--><--12 bits sequence number-->
We need a timestamp that can fit in 41 bits. We can take a timestamp with millisecond precision that is offset from a nearby epoch. This is to make sure it fits into 41 bits. We can increase the allocation to timestamp if we want older data.
The sequence number is machine specific and is incremented on each ID issued. It is reset after every millisecond. This allows us to issue multiple IDs in each ms. 12 bits allows us to issue 4096 IDs every millisecond.
Observations:
- Increases with time and doesn’t need communication between machines
- The bits allocated to machine id or timestamp can be adjusted as per requirements.
- This assumes time is in sync across servers, which can be done through NTP.