November 8, 2002

Ideas for low-overhead anonymous publishing.

Freenet problem:
In a system where a data retrieval can take up to 10 network hops, redirects
are ridiculous (10 hops to find the pointer file, 10 more hops to find
the actual data). In Freenet, since SSKs (signed subspace keys) are common,
redirects are used for almost every piece of data in the system.

From the start, the network should be designed to handle the most common
usage scenario (fetching the latest version of an anonymous author's content),
not to handle all sorts of other possible scenarios.

The goal of such a system should be usable anonymous publication and
retrieval. We should not focus (as Freenet does) on side issues such as
retrieval of past document versions or elimination of data redundancy.


Simplest solution:

Sign(K,D) signs data D with key K.
Encr(K,D) encrypts data D with key K.

Hash(D) produces a fixed-length hash of data D.

A | B concatenates A and B.

An anonymous author generates a private key R and a public key U.

The author wants to post a subspace file "index.html", with content C and
timestamp T.

The author generates a random symmetric key K.

The author generates the following pointer URL:

K/U/index.html

The author generates the following insertion data (signed with the private
key R, so that anyone holding U can verify it):

T | Encr(K,C) | Sign( R, T | Encr(K,C) )

The author generates the following insertion URL:

U/index.html

Problems: U, the author's public key, is visible in the insertion URL. Thus,
nodes can selectively block particular authors. The second problem is
that the "file name" is visible in the insertion URL, allowing nodes to
selectively block certain files.

What if the insertion URL looks like this:
Hash(U) + Hash("index.html")

Problem:
There is no way for storage nodes to verify that the inserted data actually
matches the insertion URL.

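A minimal sketch of the hashed insertion URL, assuming SHA-256 for Hash, hex
encoding, and "+" read as concatenation (none of which these notes specify;
hash_hex and hashed_insertion_url are hypothetical helpers). It illustrates
why the URL hides both U and the file name, and why a node that holds only
this URL and the insertion data described above, which does not carry U, has
nothing to check the signature against.

```python
import hashlib


def hash_hex(data: bytes) -> str:
    """Hash(D): fixed-length digest. SHA-256 is an assumption of this sketch."""
    return hashlib.sha256(data).hexdigest()


def hashed_insertion_url(public_key: bytes, file_name: str) -> str:
    """Hash(U) + Hash(file name), with + read as concatenation."""
    return hash_hex(public_key) + hash_hex(file_name.encode("utf-8"))


# Placeholder bytes standing in for an author's public key U.
U = b"example-public-key-bytes"
url = hashed_insertion_url(U, "index.html")

# The URL has a fixed length whatever its inputs are, and it contains
# neither U nor the file name in the clear.
print(len(url))
print("index.html" in url)
```

The same inputs always produce the same URL, so readers who know U and the
file name can still find the data; the node simply cannot run the mapping
backwards.
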
We have several reasons for assuming that we have one priv/pub key pair per
author:

1. This allows authors to build an anonymous publishing identity and
reputation, since all URLs that contain a particular U certainly correspond
to data posted by a particular author.

2. Generating priv/pub key pairs is computationally expensive, so we
want to do this as infrequently as possible.

3. Authors only need to manage one priv/pub key pair to publish in the
network.

Thus, given all of the assumptions and issues raised above, we can see why
the Freenet paradigm uses redirects.

However, we should note the following points about this paradigm:

1. Using redirects *doubles* data fetch time in the worst case.

2. Generating key pairs, though expensive, only takes a few seconds on
modern computing hardware.

3. Freenet data fetch times, in practice, are tens of seconds.

4. Each data post operation may be followed by many fetch operations for
that data.

5. Readers, in general, do not compare the public keys in URLs to judge
whether or not two pieces of content were posted by the same author. They
generally look at whether or not the two content "links" were grouped together
on the same "homepage". I.e., they assume that all data grouped together
into a "freesite" was posted by one author. (This is a fair assumption to make,
since only one author posted the main page of the freesite, so it is safe
to assume that the author chose to group the content links together.)


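Points 1 through 4 can be combined into a rough cost comparison. The sketch
below is only an illustration: the function names are hypothetical, the
numbers are assumed stand-ins for "a few seconds", "tens of seconds", and
"many fetch operations" rather than measurements, and the model considers
only the worst case in which every fetch pays for the redirect.

```python
def redirect_total_seconds(fetch_seconds: float, fetches: int) -> float:
    """Worst case with redirects: every fetch pays for two lookups."""
    return 2 * fetch_seconds * fetches


def per_document_key_total_seconds(
    keygen_seconds: float, fetch_seconds: float, fetches: int
) -> float:
    """One key generation at post time, then a single lookup per fetch."""
    return keygen_seconds + fetch_seconds * fetches


# Illustrative values only (assumptions, not measurements).
KEYGEN_SECONDS = 5.0    # "a few seconds"
FETCH_SECONDS = 20.0    # "tens of seconds"
FETCHES_PER_POST = 10   # "many fetch operations"

with_redirects = redirect_total_seconds(FETCH_SECONDS, FETCHES_PER_POST)
without_redirects = per_document_key_total_seconds(
    KEYGEN_SECONDS, FETCH_SECONDS, FETCHES_PER_POST
)
print(with_redirects, without_redirects)  # 400.0 205.0
```

Under this model, paying for a key pair at post time wins whenever the key
generation time is smaller than the fetch time multiplied by the number of
fetches, which holds even for a single fetch with the values assumed here.
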
Dropping the assumptions about keys stated earlier, we can devise a publishing
mechanism with better properties:

An anonymous author wants to post a new file "index.html", with content C and
timestamp T.

The author generates a new private key R and a public key U.

The author generates a random symmetric key K.

The author generates the following pointer URL:

K/U

The author generates the following insertion data (signed with the private
key R, so that anyone holding U can verify it):

T | Encr(K,C) | Sign( R, T | Encr(K,C) )

The author generates the following insertion URL:

U

Whenever the author wants to post an updated version of this file, s/he uses
the same R/U pair, inserting new content C' with a new timestamp T'. When two
insertions collide on the same key U, storage nodes can compare their
timestamps and keep only the latest version of a content unit.

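The update rule can be sketched as follows. This is a minimal illustration
under assumptions: Insertion, Verifier, and StorageNode are hypothetical names,
the store is an in-memory dictionary keyed by U, and signature checking is
passed in as a callable because these notes do not fix a signature algorithm.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Optional


@dataclass(frozen=True)
class Insertion:
    timestamp: int      # T
    ciphertext: bytes   # Encr(K,C)
    signature: bytes    # signature over T | Encr(K,C)


# Hypothetical verifier: (public key U, insertion) -> True if the signature holds.
Verifier = Callable[[bytes, Insertion], bool]


class StorageNode:
    def __init__(self, verify: Verifier) -> None:
        self._verify = verify
        self._store: Dict[bytes, Insertion] = {}

    def insert(self, public_key: bytes, item: Insertion) -> bool:
        """Store item under U if it verifies and is newer than what is held."""
        if not self._verify(public_key, item):
            return False
        current = self._store.get(public_key)
        if current is not None and item.timestamp <= current.timestamp:
            return False  # stale or replayed version: keep the existing one
        self._store[public_key] = item
        return True

    def fetch(self, public_key: bytes) -> Optional[Insertion]:
        return self._store.get(public_key)


# Stub verifier that accepts everything, used only to exercise the logic.
node = StorageNode(verify=lambda public_key, item: True)
U = b"demo-public-key"
print(node.insert(U, Insertion(100, b"v1", b"sig1")))   # True: first version
print(node.insert(U, Insertion(200, b"v2", b"sig2")))   # True: newer version
print(node.insert(U, Insertion(150, b"old", b"sig3")))  # False: older than held
print(node.fetch(U).ciphertext)                         # b'v2'
```

Because the timestamp is covered by the signature, a node that checks the
signature first cannot be tricked into replacing content by someone who merely
claims a later timestamp.
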
This scheme has the following properties:

1. Storage nodes can verify that content matches a key by checking the
signature included with the content.

2. Storage nodes can obtain a secure timestamp for each unit of content.

3. Storage nodes do not have access to unencrypted content.

4. Storage nodes cannot tie a piece of content to any particular author.

5. Readers, using the K/U URL form, can fetch the content (using U), verify
its signature and timestamp, and decrypt it (using K).


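The author and reader sides of this scheme can be sketched end to end. The
code below is a toy under loudly stated assumptions, not a secure
implementation: signing is textbook RSA over a SHA-256 hash with a small fixed
demo key (a real author would generate a fresh, much larger key pair per
document), Encr is an XOR against a SHA-256 keystream, and T is encoded as
8 bytes. All function names are hypothetical. It requires Python 3.8 or later
for the modular inverse via pow.

```python
import hashlib
import os

# Fixed demo RSA key built from two known Mersenne primes (toy size).
P = 2**127 - 1
Q = 2**89 - 1
N = P * Q                           # public modulus, part of U
E = 65537                           # public exponent, part of U
D = pow(E, -1, (P - 1) * (Q - 1))   # private exponent, plays the role of R


def digest_int(data: bytes) -> int:
    return int.from_bytes(hashlib.sha256(data).digest(), "big") % N


def sign(data: bytes) -> int:
    """Sign(R, D): textbook RSA over a hash, for illustration only."""
    return pow(digest_int(data), D, N)


def verify(data: bytes, signature: int) -> bool:
    """Check a signature using only the public values N and E."""
    return pow(signature, E, N) == digest_int(data)


def encr(key: bytes, data: bytes) -> bytes:
    """Encr(K, D): XOR with a SHA-256 keystream; applying it twice decrypts."""
    stream = b""
    counter = 0
    while len(stream) < len(data):
        stream += hashlib.sha256(key + counter.to_bytes(8, "big")).digest()
        counter += 1
    return bytes(a ^ b for a, b in zip(data, stream))


def make_insertion(key: bytes, content: bytes, timestamp: int):
    """Author side: build T | Encr(K,C) and sign it with R."""
    signed_part = timestamp.to_bytes(8, "big") + encr(key, content)
    return signed_part, sign(signed_part)


def read_insertion(key: bytes, signed_part: bytes, signature: int):
    """Reader side: verify with U, then decrypt with K."""
    if not verify(signed_part, signature):
        raise ValueError("signature does not match")
    timestamp = int.from_bytes(signed_part[:8], "big")
    return timestamp, encr(key, signed_part[8:])


K = os.urandom(16)  # random symmetric key, known only to author and readers
signed_part, signature = make_insertion(K, b"<html>hello</html>", 1036713600)
print(read_insertion(K, signed_part, signature))
```

A storage node would run only verify, which needs the public values but never
K, so it can check properties 1 and 2 while still seeing nothing but
ciphertext.
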
We might observe that, with this scheme, readers have lost the ability to
associate a collection of content with a particular author, since each
unit of content is signed with a different private key.

However, we propose the following simple solution to this problem:

Authors can insert a "collection" document with links to all of their work.
This is similar to a homepage on the web.

Since an author can securely update and maintain their collection document,
readers can be sure that all of the work pointed to by the collection document
was actually collected by the same author.

Note that with *any* scheme, there is no way to guarantee authorship, so
nothing really is lost here. With Freenet, all we can know for sure is that
the same person *posted* each of a particular series of documents. In our
system, we know for sure only that the same person *linked to* each of a
particular series of documents. Since there is nothing all that sacred about
posting a document, we claim that nothing sacred is lost.

However, for each document posted, a new priv/pub key pair must be generated.
This can be computationally expensive and time-consuming for a poster. To
deal with this issue, each node can build a collection of fresh priv/pub key
pairs using spare computation cycles (or a thread with low priority) so that
fresh keys are ready whenever an author wishes to publish new content.

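One way to sketch such a key pool, assuming a bounded queue filled by a
background thread. KeyPool and placeholder_keygen are hypothetical names, the
placeholder returns random bytes rather than a real key pair, and Python's
standard threading module offers no thread priorities, so a plain daemon
thread stands in for the low-priority thread mentioned above.

```python
import os
import queue
import threading
from typing import Callable, Tuple

KeyPair = Tuple[bytes, bytes]  # (private key R, public key U)


def placeholder_keygen() -> KeyPair:
    """Stand-in for real key-pair generation; returns random bytes only."""
    return os.urandom(32), os.urandom(32)


class KeyPool:
    """Keeps up to `size` fresh key pairs ready, filled by a background thread."""

    def __init__(self, keygen: Callable[[], KeyPair], size: int = 4) -> None:
        self._keygen = keygen
        self._ready = queue.Queue(maxsize=size)
        worker = threading.Thread(target=self._fill, daemon=True)
        worker.start()

    def _fill(self) -> None:
        while True:
            # put() blocks while the pool is full, so the thread idles
            # until an author takes a key.
            self._ready.put(self._keygen())

    def take(self, timeout: float = 10.0) -> KeyPair:
        """Return a fresh pair, waiting only if the pool is empty."""
        return self._ready.get(timeout=timeout)


pool = KeyPool(placeholder_keygen)
first = pool.take()
second = pool.take()
print(first != second)
```

Each pair leaves the queue exactly once, so no two documents end up sharing a
key unless the author deliberately reuses one to post an update.
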
We should note the following trade-offs:

We gain:
The ability to publish and retrieve content securely and anonymously without
using redirects, while still allowing for reputation building and anonymous
identity.

We lose:
The ability to factor redundant content out of the network.

However, the only way to factor redundant content out of the network while
still allowing for high-level pointers to content is by using redirects
(a high-level pointer that redirects you to a low-level, content hash key).