November 8, 2002

Ideas for low-overhead anonymous publishing.

Freenet problem: In a system where a data retrieval can take up to 10 network hops, redirects are ridiculous (10 hops to find the pointer file, 10 more hops to find the actual data). In Freenet, since SSKs are common, redirects are used for almost every piece of data in the system. From the start, the network should be designed to handle the most common usage scenario (fetching the latest version of an anonymous author's content), not to handle all sorts of other possible scenarios.

The goal of such a system should be usable anonymous publication and retrieval. We should not focus (as Freenet does) on side issues such as retrieval of past document versions or elimination of data redundancy.

Simplest solution: Sign(K,D) signs data D with key K. Encr(K,D) encrypts data D with key K. Hash(D) produces a fixed-length hash of data D. A | B concatenates A and B.

An anonymous author generates a private key R and a public key U. The author wants to post a subspace file "index.html", with content C and timestamp T. The author generates a random symmetric key K.

The author generates the following pointer URL:

    K/U/index.html

The author generates the following insertion data:

    T | Encr(K,C) | Sign( U, T | Encr(K,C) )

The author generates the following insertion URL:

    U/index.html

Problems: U, the author's public key, is visible in the insertion URL. Thus, nodes can selectively block particular authors. The second problem is that the "file name" is visible in the insertion URL, allowing nodes to selectively block certain files.

What if the insertion URL looks like this:

    Hash(U) + Hash("index.html")

Problem: There is no way for storage nodes to verify that the inserted data actually matches the insertion URL.
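To make the construction concrete, here is a minimal sketch in Python. The note fixes no concrete algorithms, so everything below is an assumption: RSA-PSS stands in for Sign(), Fernet (AES-based) for Encr(), SHA-256 for Hash(), and the URLs are rendered as plain strings purely for illustration.

    import time
    import hashlib
    from cryptography.fernet import Fernet
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import rsa, padding
    from cryptography.hazmat.primitives.serialization import Encoding, PublicFormat

    # The author's key pair: R is private, U is public.
    R = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    U = R.public_key()
    U_bytes = U.public_bytes(Encoding.DER, PublicFormat.SubjectPublicKeyInfo)

    K = Fernet.generate_key()                 # random symmetric key K
    C = b"<html>...</html>"                   # content C
    T = int(time.time()).to_bytes(8, "big")   # timestamp T

    # Insertion data: T | Encr(K,C) | Sign( U, T | Encr(K,C) ).
    # Sign(U, D) in the note's notation denotes a signature verifiable
    # with U; it is actually produced with the private key R.
    enc = Fernet(K).encrypt(C)
    pss = padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                      salt_length=padding.PSS.MAX_LENGTH)
    insertion_data = T + enc + R.sign(T + enc, pss, hashes.SHA256())

    # Pointer URL K/U/index.html and insertion URL U/index.html.
    pointer_url = "%s/%s/index.html" % (K.decode(), U_bytes.hex())
    insertion_url = "%s/index.html" % U_bytes.hex()

    # The hashed variant discussed above: it hides U and the file name
    # from nodes, but leaves them unable to verify data against the URL.
    hashed_insertion_url = (hashlib.sha256(U_bytes).hexdigest() +
                            hashlib.sha256(b"index.html").hexdigest())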
We have several reasons for assuming that we have one priv/pub key pair per author:

1. This allows authors to build an anonymous publishing identity and reputation, since all URLs that contain a particular U certainly correspond to data posted by a particular author.

2. Generating priv/pub key pairs is computationally expensive, so we want to do this as infrequently as possible.

3. Authors only need to manage one priv/pub key pair to publish in the network.

Thus, given all of the assumptions and issues raised above, we can see why the Freenet paradigm uses redirects. However, we should note the following points about this paradigm:

1. Using redirects *doubles* data fetch time in the worst case.

2. Generating key pairs, though expensive, only takes a few seconds on modern computing hardware.

3. Freenet data fetch times, in practice, take tens of seconds.

4. Each data post operation may be followed by many fetch operations for that data.

5. Readers, in general, do not compare the public keys in URLs to judge whether or not two pieces of content were posted by the same author. They generally look at whether or not the two content "links" were grouped together on the same "homepage". I.e., they assume that all data grouped together into a "freesite" was posted by one author. (This is a fair assumption to make, since only one author posted the main page of the freesite, so it is safe to assume that the author chose to group the content links together.)

Dropping the assumptions about keys stated earlier, we can devise a publishing mechanism with better properties:

An anonymous author wants to post a new file "index.html", with content C and timestamp T. The author generates a new private key R and a public key U. The author generates a random symmetric key K.

The author generates the following pointer URL:

    K/U

The author generates the following insertion data:

    T | Encr(K,C) | Sign( U, T | Encr(K,C) )

The author generates the following insertion URL:

    U

Whenever the author wants to post an updated version of this file, s/he uses the same R/U pair, inserting new content C' with a new timestamp T'. Storage nodes can check timestamps for key collisions and keep only the latest version of a content unit. (A sketch of this whole scheme, from insertion through retrieval, appears at the end of this note.)

This scheme has the following properties:

1. Storage nodes can verify that content matches a key by checking the signature included with the content.

2. Storage nodes can obtain a secure timestamp for each unit of content.

3. Storage nodes do not have access to unencrypted content.

4. Storage nodes cannot tie a piece of content to any particular author.

5. Readers, using the K/U URL form, can fetch the content (using U), verify its signature and timestamp, and decrypt it (using K).

We might observe that, with this scheme, readers have lost the ability to associate a collection of content with a particular author, since each unit of content is signed with a different private key. However, we propose the following simple solution to this problem: Authors can insert a "collection" document with links to all of their work. This is similar to a homepage on the web. Since an author can securely update and maintain their collection document, readers can be sure that all of the work pointed to by the collection document was actually collected by the same author.

Note that with *any* scheme, there is no way to guarantee authorship, so nothing really is lost here. With Freenet, all we can know for sure is that the same person *posted* each of a particular series of documents. In our system, we know for sure only that the same person *linked to* each of a particular series of documents. Since there is nothing all that sacred about posting a document, we claim that nothing sacred is lost.

However, for each document posted, a new priv/pub key pair must be generated. This can be computationally expensive and time-consuming for a poster. To deal with this issue, each node can build a collection of fresh priv/pub key pairs using spare computation cycles (or a thread with low priority) so that fresh keys are ready whenever an author wishes to publish new content (see the key-pool sketch below).

We should note the following trade-offs:

We gain: The ability to publish and retrieve content securely and anonymously without using redirects, while still allowing for reputation building and anonymous identity.

We lose: The ability to factor redundant content out of the network. However, the only way to factor redundant content out of the network while still allowing for high-level pointers to content is by using redirects (a high-level pointer that redirects you to a low-level, content hash key).
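Here is a sketch of the whole per-document scheme under the same assumed primitives as before (RSA-PSS for Sign, Fernet for Encr), with an in-memory dict standing in for a storage node. The function names and the fixed 8-byte timestamp / 256-byte signature framing are illustrative assumptions, not part of the design above.

    import time
    from cryptography.fernet import Fernet
    from cryptography.hazmat.primitives import hashes
    from cryptography.hazmat.primitives.asymmetric import rsa, padding
    from cryptography.hazmat.primitives.serialization import (
        Encoding, PublicFormat, load_der_public_key)

    PSS = padding.PSS(mgf=padding.MGF1(hashes.SHA256()),
                      salt_length=padding.PSS.MAX_LENGTH)
    SIG_LEN = 256  # RSA-2048 signatures are 256 bytes

    def author_insert(R, K, content):
        # Build T | Encr(K,C) | Sign( U, T | Encr(K,C) ) for a new version.
        T = int(time.time()).to_bytes(8, "big")
        enc = Fernet(K).encrypt(content)
        return T + enc + R.sign(T + enc, PSS, hashes.SHA256())

    def node_store(store, U_bytes, blob):
        # Storage node: verify the signature against the insertion URL (U),
        # then keep only the version with the newest timestamp.
        T, enc, sig = blob[:8], blob[8:-SIG_LEN], blob[-SIG_LEN:]
        load_der_public_key(U_bytes).verify(sig, T + enc, PSS,
                                            hashes.SHA256())  # raises on forgery
        old = store.get(U_bytes)
        if old is None or old[:8] < T:
            store[U_bytes] = blob

    def reader_fetch(store, K, U_bytes):
        # Reader: fetch by U, verify signature and timestamp, decrypt with K.
        blob = store[U_bytes]
        T, enc, sig = blob[:8], blob[8:-SIG_LEN], blob[-SIG_LEN:]
        load_der_public_key(U_bytes).verify(sig, T + enc, PSS, hashes.SHA256())
        return int.from_bytes(T, "big"), Fernet(K).decrypt(enc)

    # A fresh key pair per document, then an update under the same pair.
    store = {}
    R = rsa.generate_private_key(public_exponent=65537, key_size=2048)
    U_bytes = R.public_key().public_bytes(Encoding.DER,
                                          PublicFormat.SubjectPublicKeyInfo)
    K = Fernet.generate_key()
    node_store(store, U_bytes, author_insert(R, K, b"version 1"))
    time.sleep(1)  # ensure a strictly newer timestamp
    node_store(store, U_bytes, author_insert(R, K, b"version 2"))
    T, content = reader_fetch(store, K, U_bytes)
    assert content == b"version 2"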
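The key pre-generation idea might look like the following sketch. The pool size is an arbitrary choice, and since Python threads have no real priority control, a small bounded queue approximates "spare cycles": the generator thread blocks whenever the pool is full and only works when keys are being consumed.

    import queue
    import threading
    from cryptography.hazmat.primitives.asymmetric import rsa

    key_pool = queue.Queue(maxsize=16)  # bound is an arbitrary choice

    def generate_keys():
        # Runs forever; put() blocks while the pool is already full.
        while True:
            key_pool.put(rsa.generate_private_key(public_exponent=65537,
                                                  key_size=2048))

    threading.Thread(target=generate_keys, daemon=True).start()

    # At publish time the author takes a ready-made pair instead of waiting:
    #   R = key_pool.get()
    #   U = R.public_key()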