diff --git a/www/contribute/producing-an-ebook-step-by-step.php b/www/contribute/producing-an-ebook-step-by-step.php
index 16c254f8..ec25c3ab 100644
--- a/www/contribute/producing-an-ebook-step-by-step.php
+++ b/www/contribute/producing-an-ebook-step-by-step.php
@@ -48,7 +48,7 @@ require_once('Core.php');
Each of those sources allows you to filter results by publication date, so make sure you select <?= PD_YEAR ?> and earlier to ensure they’re in the U.S. public domain.
-If you can’t find scans of your book at the above sources, and you’re using a Project Gutenberg transcription as source material, there’s a good chance that PGDP (the sister project of Project Gutenberg that does the actual transcriptions) has a copy of the scans they used accessible in their archives. You should only use the PGDP archives as a last resort; because their scans are not searchable, verifying typos becomes extremely time-consuming.
+If you can’t find scans of your book at the above sources, and you’re using a Project Gutenberg transcription as source material, there’s a good chance that PGDP (the sister project of Project Gutenberg that does the actual transcriptions) has a copy of the scans they used accessible in their archives. You should only use the PGDP archives as a last resort; because their scans are not searchable, verifying typos becomes extremely time-consuming.
Please keep the following important notes in mind when searching for page scans:
The file we downloaded contains the entire work. Jekyll is a short work, but for longer works it quickly becomes impractical to have the entire text in one file. Not only is it a pain to edit, but ereaders often have trouble with extremely large files.
The next step is to split the file at logical places; that usually means at each chapter break. For works that contain their chapters in larger “parts,” the part division should also be its own file. For example, see Treasure Island.
To split the work, we use se split-file. se split-file takes a single file and breaks it into a new file every time it encounters the markup <!--se:split-->. se split-file automatically includes basic header and footer markup in each split file.
Notice that in our source file, each chapter is marked with an <h2> element. We can use that to our advantage and save ourselves the trouble of adding the <!--se:split--> markup by hand:
sed --in-place "s|<h2|<\!--se:split--><h2|g" src/epub/text/body.xhtml
- (Note the backslash before the ! for compatibility with some shells.)
Notice that in our source file, each chapter is marked with an <h2> element. We can use that to our advantage and save ourselves the trouble of adding the <!--se:split--> markup by hand:
sed --in-place "s|<h2|<!--se:split--><h2|g" src/epub/text/body.xhtml
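Before running the substitution on the real file, it can help to try it on a throwaway copy. The following is a minimal sketch, assuming GNU sed (for --in-place) and a non-interactive bash script, where the ! inside double quotes is not subject to history expansion. The sample file and its contents are hypothetical stand-ins for src/epub/text/body.xhtml.

```shell
#!/usr/bin/env bash
# Dry run of the marker insertion on a temporary sample file.
set -euo pipefail

# Hypothetical sample standing in for src/epub/text/body.xhtml.
sample="$(mktemp)"
trap 'rm -f "$sample"' EXIT

cat > "$sample" <<'EOF'
<h2>Chapter 1</h2>
<p>First chapter text.</p>
<h2>Chapter 2</h2>
<p>Second chapter text.</p>
EOF

# Same substitution as in the guide: prefix every <h2 with the split marker.
sed --in-place "s|<h2|<!--se:split--><h2|g" "$sample"

# Count the lines containing a marker and the lines containing a heading.
# If the numbers differ, some heading was missed or a marker was doubled.
markers="$(grep -c -- '<!--se:split-->' "$sample")"
headings="$(grep -c '<h2' "$sample")"
echo "markers=$markers headings=$headings"
```

If the two counts match on the real file as well, every chapter heading has received exactly one marker and the file is ready to split.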
Now that we’ve added our markers, we split the file. se split-file puts the results in our current directory and conveniently names them by chapter number.
se split-file src/epub/text/body.xhtml
mv chapter* src/epub/text/
Once we’re happy that the source file has been split correctly, we can remove it.
rm src/epub/text/body.xhtml
Typography rules for times. Wrap a.m. and p.m. in <abbr class="time"> and add a no-break space between digits and a.m. or p.m.
Words or phrases in foreign languages should always be marked up with <i xml:lang="TAG">, where TAG is an IETF language tag. This app can help you look them up. If the text uses fictional or unspecific languages, use the x- prefix and make up a subtag yourself.
Words or phrases in foreign languages should always be marked up with <i xml:lang="TAG">, where TAG is an IETF language tag. This website can help you look them up. If the text uses fictional or unspecific languages, use the x- prefix and make up a subtag yourself.
Semantics for poetry, verse, and song: Many Gutenberg productions use the <pre> element to format poetry, verse, and song. This is, of course, semantically incorrect. See the Poetry section of the SEMoS for templates on how to semantically format poetry, verse, and song.
Do run this tool on prose. Don’t run this tool on poetry.
se modernize-spelling .
After you run the tool, you must check what the tool did to confirm that each removed hyphen is correct. Sometimes the tool will remove a hyphen that needs to be included for clarity, or one that changes the meaning of the word, or it may result in a word that just doesn’t seem right. Re-introducing a hyphen is OK in these cases.
-Here’s a real-world example of where se modernize-spelling made the wrong choice: In The Picture of Dorian Gray chapter 11, Oscar Wilde writes:
Here’s a real-world example of where se modernize-spelling made the wrong choice: In The Picture of Dorian Gray chapter 11, Oscar Wilde writes:
@@ -300,7 +299,7 @@ proceed to seal up my confession, I bring the life of that unhappy Henry Jekyll
He possessed a gorgeous cope of crimson silk and gold-thread damask…