diff --git a/www/contribute/producing-an-ebook-step-by-step.php b/www/contribute/producing-an-ebook-step-by-step.php
index b913dd6d..71ae20b4 100644
--- a/www/contribute/producing-an-ebook-step-by-step.php
+++ b/www/contribute/producing-an-ebook-step-by-step.php
@@ -111,7 +111,7 @@ proceed to seal up my confession, I bring the life of that unhappy Henry Jekyll
Now that we’ve removed all the cruft from the top and bottom of the file, we’re ready for our first commit.
Each commit has an accompanying message describing the changes we are making. Please use the commit messages as they are written here in this guide as the editors rely on these messages when they review the work.
-Also, try to make one commit per type of change, for example: “fixing typos in chapters 1-18” or “worked on letter formatting.”
+Also, try to make one commit per type of change, for example: “fixing typos in chapters 1-18” or “worked on letter formatting.”
For this first commit:
git add -A
git commit -m "Initial commit"
Once we’re happy that the source file has been split correctly, we can remove it.
rm src/epub/text/body.xhtml
If you open up any of the chapter files we now have in the src/epub/text/
folder, you’ll notice that the code isn’t very clean. Paragraphs are split over multiple lines, indentation is all wrong, and so on.
If you try opening a chapter in a web browser, you’ll also likely get an error if the chapter includes any HTML entities, like &mdash;. This is because Gutenberg uses plain HTML, which allows entities, but epub uses XHTML, which doesn’t.
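If you’d like to see which files are affected before cleaning, a sketch along these lines may help. The function name find_named_entities is hypothetical and not part of the S.E. toolset. It assumes a grep that supports -r, -n, -o, and -E, and it skips the five entities that XML itself predefines (amp, lt, gt, quot, apos), since those are valid in XHTML.

```shell
# Hypothetical helper, not part of the S.E. toolset.
# Lists named entities (file:line:entity) under a directory,
# ignoring the five entities predefined by XML.
find_named_entities() {
  grep -rnoE '&[A-Za-z][A-Za-z0-9]*;' "$1" \
    | grep -vE '&(amp|lt|gt|quot|apos);$'
}

# Example usage, from the root of the ebook directory:
#   find_named_entities src/epub/text
```

An empty result suggests there are no named HTML entities left to trip up an XHTML parser.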
We can fix all of this pretty quickly using se clean
. se clean
accepts as its argument the root of a Standard Ebook directory. We’re already in the root, so we pass it .
.
se clean .
@@ -148,9 +148,10 @@ proceed to seal up my confession, I bring the life of that unhappy Henry Jekyll
</html>
If you look carefully, you’ll notice that the <html>
element has the xml:lang="en-US"
attribute, even though our source text uses British spelling! We have to change the xml:lang
attribute for the source files to match the actual language, which in this case is en-GB. Let’s do that now:
perl -pi -e "s|en-US|en-GB|g" src/epub/text/chapter*
Note that we don’t change the language for the metadata or front/back matter files, like content.opf
, titlepage.xhtml
, or colophon.xhtml
. Those must always be in American spelling, so they’ll always have the en-US language tag.
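To double-check the result of the substitution, we can count the language tags that remain. This is only a sketch: list_lang_tags is a hypothetical helper name, and it assumes each file declares its language as an xml:lang="…" attribute.

```shell
# Hypothetical helper, not part of the S.E. toolset.
# Prints each distinct xml:lang value found under a directory,
# prefixed by the number of times it occurs.
list_lang_tags() {
  grep -rhoE 'xml:lang="[^"]*"' "$1" | sort | uniq -c
}

# Example usage, from the root of the ebook directory:
#   list_lang_tags src/epub/text
```

For a British-spelling work, we’d expect en-GB to account for the chapter files and en-US to remain only for files like the titlepage and colophon.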
Once the file split and cleanup is complete, you can perform your second commit.
git add -A
git commit -m "Split files and clean"
Now that we have a clean starting point, we can start getting the real work done. se typogrify
can do a lot of the heavy lifting necessary to bring an ebook up to Standard Ebooks typography standards.
Like se clean
, se typogrify
accepts as its argument the root of a Standard Ebook directory.
se typogrify .
Among other things, se typogrify
does the following:
Normalizes spacing in em-, en-, and double-em-dashes, as well as between nested quotation marks, and adds word joiners.
While se typogrify
does a lot of work for you, each ebook is totally different so there’s almost always more work to do that can only be done by hand. In Jekyll, you’ll notice that the chapter titles are in all caps. The S.E. standard requires chapter titles to be in title case, and se titlecase
can do that for us. se titlecase
accepts a string as its argument, and outputs the string in title case.
While se typogrify
does a lot of work for you, each ebook is totally different so there’s almost always more work to do that can only be done by hand. However, you will do a third commit first, to put the automated changes in a separate commit from any manual changes.
git add -A
git commit -m "Typogrify"
+ As an example of manual changes that might be needed, in Jekyll, you’ll notice that the chapter titles are in all caps. The S.E. standard requires chapter titles to be in title case, and se titlecase
can do that for us. se titlecase
accepts a string as its argument, and outputs the string in title case.
Once you’ve run se typogrify
and you’ve searched the work for the common issues above, you can perform your second commit.
git add -A
git commit -m "Typogrify"
+ Once you’ve searched the work for the common issues above, if any manual changes were necessary, you should perform the fourth commit.
git add -A
git commit -m "Manual typography changes"
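Several of the commits in this guide are only needed if a step actually changed something. One way to handle that, sketched here under the assumption that you’re working inside the ebook’s Git repository, is a small wrapper that stages everything and commits only when the index differs from the last commit. The name commit_if_changed is hypothetical and not part of the S.E. toolset.

```shell
# Hypothetical helper, not part of the S.E. toolset.
# Stages all changes, then commits with the given message
# only if something is actually staged.
commit_if_changed() {
  git add -A
  # `git diff --cached --quiet` exits 0 when nothing is staged.
  if git diff --cached --quiet; then
    echo "Nothing to commit"
  else
    git commit -m "$1"
  fi
}

# Example usage:
#   commit_if_changed "Manual typography changes"
```

The commit message is still passed in by hand, so the messages the editors rely on stay exactly as written in this guide.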
Transcriptions often have errors, because the O.C.R. software might confuse letters for other, more unusual characters, or because the ebook’s character set got mangled somewhere along the way from the source to your repository. You’ll find most transcription errors when you proofread the text, but right now you can use the se find-unusual-characters
tool to see a list of any unusual characters in the transcription. If the tool outputs any, check the source to make sure those characters aren’t errors.
se find-unusual-characters .
+ If any errors had to be corrected, a commit is needed as well.
git add -A
git commit -m "Correct transcription errors"
Works often include footnotes, either added by an annotator or as part of the work itself. Since ebooks don’t have a concept of a “page,” there’s no place for footnotes to go. Instead, we convert footnotes to a single endnotes file, which will provide popup references in the final epub.
The endnotes file and the format for endnote links are standardized in the SEMoS.
If you find that you accidentally mis-ordered an endnote, never fear! se shift-endnotes
will allow you to quickly rearrange endnotes in your ebook.
If any footnotes were present and moved to endnotes, do another commit.
git add -A
git commit -m "Move footnotes to endnotes"
+ Jekyll doesn’t have any footnotes or endnotes, so we skip this step.
+If a work has illustrations besides the cover and title pages, we include a “list of illustrations” at the end of the book, after the endnotes but before the colophon. The LoI file is also standardized.
-Jekyll doesn’t have any footnotes, endnotes, or illustrations, so we skip this step.
+If an LOI is created, do a corresponding commit.
git add -A
git commit -m "Add LOI"
+ Jekyll doesn’t have any illustrations, so we skip this step.
Use se semanticate
to handle some common cases for you:
se semanticate .
se semanticate
tries its best to correctly add semantics, but sometimes it’s wrong. For that reason you should review the changes it made before accepting them:
git difftool
+ As we did with typogrify
, we want the automated portion of adding semantics to be in its own commit. After running semanticate
, do another commit.
git commit -am "Semanticate"
Beyond that, adding semantics is mostly a by-hand process. See the SEMoS for a detailed list of the kinds of semantics we expect in a Standard Ebook.
Here’s a short list of some of the more common semantic issues you’ll encounter:
Semantics for poetry, verse, and song: Many Gutenberg productions use the <pre>
element to format poetry, verse, and song. This is, of course, semantically incorrect. See the Poetry section of the SEMoS for templates on how to semantically format poetry, verse, and song.
After you’ve added semantics according to the SEMoS, do another commit.
git commit -am "Semanticate"
+ After you’ve added semantics according to the SEMoS, do another commit.
git commit -am "Manually add additional semantics"
se find-mismatched-diacritics .
+ If any changes had to be made, a corresponding editorial commit should be done as well.
git commit -am "[Editorial] Correct mismatched diacritics"
Similar to se find-mismatched-diacritics
, se find-mismatched-dashes
lists instances where a compound word is spelled both with and without a dash. Dashes in words should be normalized to one or the other style.
se find-mismatched-dashes .
+ If corrections were made, another commit is needed.
git commit -am "[Editorial] Correct mismatched dashes"
<title>
elements