Notes on How to Time-Stamp a Digital Document

This is a summary of Stuart Haber and W. Scott Stornetta's paper, "How to Time-Stamp a Digital Document," referred to here as "HS." In A.J. Menezes and S.A. Vanstone (eds.), Advances in Cryptology - CRYPTO '90, LNCS 537, pp. 437-455 (1991), Springer-Verlag, Berlin, Heidelberg.

Original Paper

Paper Setup

In many situations, there is a need to certify the date a document was created or last modified. Methods for time-stamping physical documents rest on two assumptions: (1) records can be examined for telltale signs of tampering; (2) there is another party, whose integrity or impartiality is trusted, to vouch for the claim. These assumptions do not hold for digital documents, which can be tampered with undetectably. To address this, HS propose methods of time-stamping digital documents with two requirements: the data itself must be time-stamped (so that any change, even to a single bit, is apparent), and it must be impossible to stamp a document with a time and date different from the actual one.

They write:

“A good solution to the time-stamping problem is one for which, under reasonable assumptions about the computational abilities of the users of the scheme and about the complexity of a computational problem, and possibly about the trustworthiness of the users, it is difficult or impossible to produce false time-stamps.” (pp. 439-40)

Naive Solution

The naive approach relies on a trusted Time-Stamping Service (TSS) acting as a “digital safety-deposit box.” In its basic form, the client sends the full document to the TSS for time-stamping.

Here is how it works:

The client transmits the document $x$ to the TSS; the TSS records the date and time $t$ at which it received $x$ and retains a copy. If there is later need for verification: compare the provided $x$ against the TSS's stored copy and check the recorded time $t$.
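The naive scheme can be sketched in a few lines; the class and method names here are illustrative, not from the paper:

```python
class NaiveTSS:
    """A 'digital safety-deposit box': stores each full document with the
    time it was received."""
    def __init__(self):
        self.records = []  # (document, time) pairs, stored in full

    def stamp(self, document: bytes, t: int) -> int:
        """Record the full document together with the receipt time."""
        self.records.append((document, t))
        return t

    def verify(self, document: bytes, claimed_t: int) -> bool:
        """Compare the provided document against the stored copy and time."""
        return (document, claimed_t) in self.records

tss = NaiveTSS()
t = tss.stamp(b"my document", t=500)
assert tss.verify(b"my document", t)
assert not tss.verify(b"my document, altered", t)  # any change is caught
```

Note that the TSS must receive, store, and later compare every document in full, which is exactly where the problems below arise.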

The naive solution has several problems: privacy (the TSS sees the full document, and transmission can be intercepted), bandwidth and storage (every document must be sent and stored in its entirety), incompetence (the TSS's copy could be lost or the time recorded incorrectly), and trust (nothing prevents the TSS from issuing a false time-stamp).

Two improvements on the naive solution

Hash

To address privacy, bandwidth, and storage issues, HS introduce collision-free one-way hash functions $h$. Instead of sending the full document $x$, the client computes a fixed-length hash $y=h(x)$ (e.g., 128 bits) and sends only $y$ to the TSS. Since finding collisions (two different documents hashing to the same $y$) or pre-images (reversing $y$ to find $x$) is computationally infeasible, time-stamping $y$ is equivalent to time-stamping $x$ without revealing the document’s content. This makes the process efficient and secure against exposure.

They write, “Instead of transmitting his document $x$ to the TSS, a client will send its hash value $h(x)=y$ instead. For the purposes of authentication, time-stamping $y$ is equivalent to time-stamping $x$.” (p. 442)

Updated Process: the client computes $y = h(x)$ and sends only $y$ to the TSS; the TSS time-stamps $y$ just as it would have time-stamped $x$; to verify, the document holder presents $x$, and the verifier checks that $h(x)$ matches the certified $y$.
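The hash-and-send step can be sketched as follows. `hashlib.sha256` stands in for the paper's collision-free hash function $h$ (the paper's era suggests 128-bit functions like MD4; SHA-256 is the modern choice):

```python
import hashlib

def hash_for_timestamping(document: bytes) -> str:
    """Compute y = h(x); only this fixed-length digest is sent to the TSS."""
    return hashlib.sha256(document).hexdigest()

# The client sends y, never x itself, so the document's content stays
# private and the TSS stores only a short digest.
y = hash_for_timestamping(b"my confidential contract text")
assert len(y) == 64  # fixed length regardless of document size
```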

Digital Signatures for Authenticity and Integrity

To make the certificate verifiable and tamper-proof without needing to re-contact the TSS, HS incorporate a secure digital signature scheme (e.g., public-key based). They say that a “signature scheme is an algorithm for a party, the signer, to tag messages in a way that uniquely identifies the signer” (p. 442). The authors assume the scheme is secure, meaning it can be broken only with negligible probability. The TSS signs the pair $(y, t)$ using its private key, producing a signature σ. The full certificate includes $y$, $t$, and σ. “By checking the signature, the client is assured that the TSS actually did process the request, that the hash was correctly received, and that the correct time is included” (p. 443).

When the TSS receives the hash $y$, it chooses the current time $t$ (based on its own clock/system), forms $(y, t)$, hashes it, and signs the hash with its private key to create the signature σ.

The signature proves that the TSS actually processed the request, that the hash was correctly received, and that the correct time is included in the certificate.

Upon receiving the certificate, the client (or any verifier) can immediately check the signature using the TSS's public key. If the check fails, the certificate is invalid, perhaps due to a processing error (e.g., corruption during signing). This catches "present incompetence," in the sense of botched certificate creation or transmission, right at issuance time.
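A minimal sketch of issuance and checking under a toy assumption: a shared-key HMAC plays the role of the TSS's signature scheme. (The paper assumes a real public-key signature σ, checkable by anyone holding the TSS's public key; the function names here are illustrative.)

```python
import hashlib
import hmac

TSS_KEY = b"tss-secret-key"  # stand-in; a real TSS would hold an asymmetric key pair

def tss_issue(y: str, now: int) -> dict:
    """TSS forms (y, t), signs it, and returns the certificate."""
    message = f"{y}|{now}".encode()
    sigma = hmac.new(TSS_KEY, message, hashlib.sha256).hexdigest()
    return {"y": y, "t": now, "sigma": sigma}

def client_verify(cert: dict) -> bool:
    """Client checks the signature immediately upon receiving the certificate."""
    message = f"{cert['y']}|{cert['t']}".encode()
    expected = hmac.new(TSS_KEY, message, hashlib.sha256).hexdigest()
    return hmac.compare_digest(cert["sigma"], expected)

cert = tss_issue(hashlib.sha256(b"my document").hexdigest(), now=676_215_000)
assert client_verify(cert)      # a well-formed certificate passes
cert["t"] += 60                 # altering the certified time...
assert not client_verify(cert)  # ...invalidates the signature
```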

Two Time-Stamping Schemes

The above improvements still permit the TSS to issue false time-stamps. HS tell us that the mechanism they want would, ideally, "guarantee that no matter how unscrupulous the TSS is, the times it certifies will always be the correct ones," leaving the TSS unable to issue incorrect time-stamps even if it tried (p. 443).

There are two high-level strategies: one centralized and one distributed. Even the centralized model makes use of ideas now associated with blockchains.

Centralized / Linking Scheme

The centralized model builds on the observation that the sequence of clients requesting time-stamps, and the hashes they submit, cannot be known in advance (p. 444). The key insight is, "if we include bits from the previous sequence of client requests in the signed certificate, then we know that the time-stamp occurred after these requests."

Prevents backdating: You cannot fake an earlier time-stamp because doing so would break the chain. If you tried to insert a false earlier time-stamp, it would reference previous requests that either don’t exist or are already linked to by genuine later time-stamps, creating an inconsistent chain that reveals the forgery.

Prevents postdating: You cannot fake a later time-stamp because you cannot predict or forge the hashes of future client requests that haven’t been submitted yet. The linking information must come from actual prior requests in the sequence.

This creates a temporal constraint in both directions: the time-stamp must have occurred after the previous requests it references, but before any subsequent requests that reference it.

HS say there are two variants on this linking scheme.

The simpler variant links each certificate only to the single previous request; the stronger variant links each certificate to the previous $k$ requests. In the stronger variant, losing one certificate in the middle does not break verifiability, because each certificate is referenced by up to $k$ later ones, which act as backups.

Let:

$$C_n = (n, t_n, ID_n, y_n, L_n) $$

$n$ is the sequence number; $t_n$ the time; $ID_n$ the client number; $y_n$ the hash value from the request; and $L_n$ the linking information, which comes from the previously issued certificate:

$$L_n = (t_{n-1}, ID_{n-1}, y_{n-1}, H(L_{n-1}))$$

where $H$ is a collision-free hash function. Notice that in this simpler model, the linking information refers only to the previously issued certificate.

The TSS issues signed, sequentially numbered time-stamp certificates.

Let’s look at an example of what the certificate might look like. The hex digests below are purely illustrative; the paper suggests hash functions of its era, such as MD4 (p. 441), but you would use something more modern (e.g., SHA-256) today.

{
  "sequence_number_n": 42,
  "client_id": "ID_42",
  "document_hash_yn": "a591a6d40bf420404a011733cfb3b190d62c65bf0bcda32b57b277d9abd9eb38", // h(x) where x is your document
  "time_stamp_tn": "1991-08-18 14:30:00", // Time from TSS (paper doesn't specify format)
  "linking_info_Ln": {
    "previous_time_tn_minus_1": "1991-08-18 14:29:00",
    "previous_client_id": "ID_41",
    "previous_document_hash_yn_minus_1": "e99f33420f577ee8ce54f4de1971cdb0b02b6a57b8b8e7b0c84c9c7e3f8a8e7",
    "hash_of_previous_Ln_minus_1": "123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef"
  },
  "signature_sigma": "3045022100a2b3c4d5e6f7890abcdeff0123456789abcdef0123456789abcdef012345678902200123456789abcdef0123456789abcdef0123456789abcdef0123456789abcdef", // Fictional signature on H(n + tn + IDn + yn + Ln)
  "hash_function": "MD4" // As suggested in the paper (p. 441); use SHA-256 today
}

In the simple model, clients would need to retain all of these certificates for the sequence to be auditable, and the chain forms only a linear sequence.
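The simple linking scheme can be sketched as follows; the class and function names are illustrative, and SHA-256 stands in for the paper's hash function $H$:

```python
import hashlib

def H(data: str) -> str:
    """Stand-in for the paper's collision-free hash function."""
    return hashlib.sha256(data.encode()).hexdigest()

class ToyTSS:
    """Toy TSS for the simple linking scheme: certificate n carries
    L_n = (t_{n-1}, ID_{n-1}, y_{n-1}, H(L_{n-1}))."""
    def __init__(self):
        self.n = 0
        self.prev = None  # most recently issued certificate

    def issue(self, client_id: str, y: str, t: int) -> dict:
        if self.prev is None:
            link = ("genesis",)  # first request: nothing earlier to link to
        else:
            p = self.prev
            link = (p["t"], p["id"], p["y"], H(repr(p["link"])))
        cert = {"n": self.n, "t": t, "id": client_id, "y": y, "link": link}
        self.n += 1
        self.prev = cert
        return cert

def verify_chain(certs: list) -> bool:
    """Each certificate's linking info must match its predecessor."""
    for prev, cur in zip(certs, certs[1:]):
        expected = (prev["t"], prev["id"], prev["y"], H(repr(prev["link"])))
        if cur["link"] != expected:
            return False
    return True

tss = ToyTSS()
chain = [tss.issue(f"ID_{i}", H(f"document {i}"), t=1000 + i) for i in range(3)]
assert verify_chain(chain)
chain[0]["y"] = H("backdated forgery")  # tamper with an early request...
assert not verify_chain(chain)          # ...and the chain no longer verifies
```

The final two lines show the backdating argument in miniature: changing any earlier certificate breaks the linking information embedded in every later one.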

In the more robust variant, the linking information includes not just the previous request but the previous $k$ requests, and each certificate is in turn referenced by the next $k$ requests.

Distributed Trust Scheme

Instead of one central TSS authority, users run a pseudorandom generator. It is an algorithm that “stretches short input seeds to output sequences that are indistinguishable by any feasible algorithm from random sequences” (p. 446). The key property is deterministic consistency: given the same document hash $y$, the pseudorandom generator $G(y)$ will always produce the exact same set of witness client $ID$s. This ensures that anyone verifying the time-stamp later will contact the same witnesses who originally signed it. However, for different document hashes, the algorithm produces outputs that appear random and unpredictable.

The output is a tuple of IDs of clients in the system:

$$G(y) = (ID_1, ID_2, \dots, ID_n)$$

In the distributed scheme, a client sends the document hash $y$ and their client identifier to all of the clients returned by the pseudorandom generator. Each of these clients in turn appends its current time to the document hash $y$ and the sending client's $ID$, signs the result, and sends it back. A consequence of there being multiple witnesses is that their times will differ slightly. The distributed model is not after one perfect time; it is after consensus. The nodes are not clones but witnesses that something happened within some window.
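Deterministic witness selection can be sketched as follows. SHA-256 with a counter stands in for the paper's pseudorandom generator, and the directory size and witness count are assumed values for illustration:

```python
import hashlib

DIRECTORY_SIZE = 1000  # assumed size of the public client directory (IDs 0..999)
NUM_WITNESSES = 5      # number of witness clients per request (illustrative)

def G(y: str) -> list:
    """Deterministically expand the document hash y into witness client IDs.
    The same y always yields the same witnesses, so a later verifier
    contacts exactly the clients that originally signed."""
    ids, counter = [], 0
    while len(ids) < NUM_WITNESSES:
        digest = hashlib.sha256(f"{y}:{counter}".encode()).hexdigest()
        candidate = int(digest, 16) % DIRECTORY_SIZE
        if candidate not in ids:  # skip repeats so witnesses are distinct
            ids.append(candidate)
        counter += 1
    return ids

y = hashlib.sha256(b"some document").hexdigest()
assert G(y) == G(y)  # deterministic: verification reaches the same witnesses
```

The determinism is the point: a forger cannot choose friendly witnesses, because the witness set is fixed by the document hash itself.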

In this scheme, "it must be possible to call up other clients at will and receive from them the required signatures," and there is a public directory of clients (p. 447).

Remarks

Tradeoffs

The distributed-trust scheme has all processing take place when a request is made. However, it also requires the ability to call up other clients and receive responses at will.

The linking scheme has a short delay in certificate issuance. This delay enables stronger time constraints by incorporating information from subsequent requests.

The linking scheme does not depend on the use of digital signatures, because there is only one service, the TSS, that issues time-stamps.

Time constraints

HS note that their schemes constrain the time-stamping event in both the forward and backward directions. But if there is a gap between the document-creation event and the time-stamping event, the creation time is constrained in only the forward direction. Write $D_t$ for a document created at time $t$ and $T_t$ for a time-stamp issued at time $t$, and suppose the actual history is:

$D_t, T_{t+1}$

There is nothing in the certificate to determine which of these is true:

$D_{t-2}, T_{t+1}$ or $D_{t+1}, T_{t+1}$

But we can rule out:

$T_t,D_{t+1}$

since the document could not have been created after it was time-stamped.

According to HS, if the time-stamping event were made part of the document-creation event, then the constraint would hold in both directions. They write that this idea of making time-stamping part of document creation "could apply to sequential financial transactions, such as stock trades or currency exchanges, or any sequence of electronic interactions that take place over a given physical interaction" (p. 449).

Theoretical Considerations

Security

HS propose a complexity-theoretic definition of security following Goldwasser, Micali, and others. They want to specify precisely how secure a scheme is against adversaries attempting to create false time-stamps.

The time-stamping and verification procedures depend on a security parameter $p$. "A time-stamp scheme would be polynomially secure if the success probability of a polynomially bounded adversary who tries to manufacture a bogus time-stamp is smaller than any given polynomial in $1/p$ for sufficiently large $p$." (p. 449) The parameter $p$ represents the strength or level of the security. A "polynomially bounded adversary" means an attacker whose computational resources (time and space) are limited to polynomial functions of $p$: essentially, realistic attackers using efficient algorithms, as opposed to those with unlimited computational power. The security parameter, sometimes denoted $k$ or $\lambda$, is explicitly an input that the scheme designer or user chooses to set the desired level of security. It is not an output of the scheme.

As a parameter, $p$ determines the size and complexity of the system's components. For example, in a hash function, $p$ might set the output length (e.g., 256 bits for SHA-256, implying $p \approx 256$). In signatures or encryption, it scales key sizes (e.g., RSA with 3072-bit keys for $p \approx 128$ bits of security against attacks). The larger $p$ is, the stronger the security, but the more computationally expensive the scheme.

Returning to their quote: a time-stamp scheme is "polynomially secure" if a "polynomially bounded adversary" (an attacker limited to computations that grow polynomially with $p$, like real-world efficient algorithms) has only a negligible probability of successfully forging a time-stamp. Negligible means practically impossible, though not zero.

Application to Time-Stamping Schemes

HS apply the above formal account to the two schemes, stating the assumptions under which each can be proven polynomially secure:

Linking Scheme: if there exist one-way claw-free permutations.

Distributed-trust Scheme: if at most a constant fraction of clients is corruptible and one-way functions exist (which imply the existence of pseudorandom generators and a secure signature scheme).

The distributed-trust scheme's assumptions look the most complex, but most of them are just the components or algorithms of the scheme itself. The new concepts here are one-way functions and claw-free permutations. A one-way function is a mathematical function that is "easy" to compute in one direction but "hard" (computationally infeasible) to reverse or invert. Claw-free permutations are a stronger cryptographic primitive: a pair of permutations (bijective functions, meaning one-to-one and onto) for which it is hard to find a "claw," that is, two different inputs that map to the same output under the two permutations. To understand this, imagine you have two algorithms for shuffling cards. Given one deck of cards for each algorithm, it should be very hard to order the decks so that their shuffled outputs come out the same. The "claw" refers to a pair of different starting decks (inputs $x_0$ and $x_1$) that, when shuffled by the two different algorithms ($f_0$ and $f_1$), result in the exact same shuffled order (output $y$), so that $f_0(x_0)=f_1(x_1)=y$. The claw-free property makes finding such a pair computationally infeasible.

It appears that HS prefer the distributed-trust scheme's assumptions when they write, "we would like to reduce the assumptions we require for secure time-stamping to the simple assumption that one-way functions exist." (p. 450) The linking scheme requires claw-free permutations, a stronger assumption.

Practical Considerations

As a practical matter, time-stamping places a heavy demand on the reliability of the hash function. If the hash function is broken at some point, all time-stamps issued before that point are called into question. This is because someone could then forge old documents to match recorded hashes (e.g., via collisions), making every past time-stamp suspect. Not every domain has such a massive blast radius when a hash function breaks.

I think HS are answering an objection here although they did not present it as one. It seems risky to rely on a system that can have all of its data become questionably true. This is why HS offer a “partial answer.” (It is “partial” because renewal is preventive—it must happen before breakage and requires ongoing migrations—but doesn’t fix already-broken hashes retroactively.)

Time-stamps can be renewed. If there were a second time-stamping service using a different hash function, it can take in both the original document and the time-stamp certificate (including signature) from the first service. So the second service, in other words, can time-stamp the document plus the prior time-stamp before the first hash is broken.
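The renewal idea can be sketched as re-certifying the pair (document, old certificate) under a new hash function before the old one is broken. Both "services" below are toy stand-ins, and the function names are illustrative:

```python
import hashlib
import json

def old_hash(x: bytes) -> str:
    """Stand-in for the aging hash function (the paper's era suggests MD4)."""
    return hashlib.md5(x).hexdigest()

def new_hash(x: bytes) -> str:
    """Stand-in for the stronger replacement function."""
    return hashlib.sha256(x).hexdigest()

def timestamp(y: str, t: int) -> dict:
    """Toy certificate; a real TSS would also sign this pair."""
    return {"y": y, "t": t}

document = b"contract text"
original = timestamp(old_hash(document), t=100)

# Renewal: before old_hash is broken, a second service time-stamps the
# document PLUS the original certificate. If old_hash later falls, the
# renewed certificate still attests that this document carried this
# certificate at the time of renewal.
bundle = document + json.dumps(original, sort_keys=True).encode()
renewed = timestamp(new_hash(bundle), t=200)
```

The key design point is that the second certificate covers the first one, so the evidence chain survives the first hash function's retirement.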

In other words, there is a mechanism in place that helps prevent the easy corruption of all data, provided there is another time-stamping service that can take earlier time-stamps as inputs.

Also, HS say not just any collision is sufficient to break a function: the colliding documents have to be "meaningful." If, for example, $H($ "snow is white" $)$ has the same value as $H($ "[enter extremely unlikely string]" $)$, then so what? The authors leave the formalization of this to future work.

Applications

This summary does not cover them comprehensively, but HS close the paper with some applications and benefits of time-stamping.

This ends the summary notes of their paper.

Relevance of their paper

Satoshi Nakamoto directly cited Haber and Stornetta’s work in the Bitcoin whitepaper (2008), specifically referencing their linking scheme as a foundational concept for blockchain’s immutable time-stamping. While Bitcoin uses proof-of-work rather than HS’s pseudorandom generator approach, the distributed trust scheme’s core insight—that multiple independent parties can collectively provide time-stamping without a central authority—became fundamental to blockchain design.

The paper’s goal of preventing false time-stamps translates directly to blockchain’s promise of immutable transaction history.