diff --git a/www/contribute/producing-an-ebook-step-by-step.php b/www/contribute/producing-an-ebook-step-by-step.php
index e58c35f2..5508c150 100644
--- a/www/contribute/producing-an-ebook-step-by-step.php
+++ b/www/contribute/producing-an-ebook-step-by-step.php
@@ -115,7 +115,7 @@ proceed to seal up my confession, I bring the life of that unhappy Henry Jekyll
The file we downloaded contains the entire work. Jekyll is a short work, but for longer works it quickly becomes impractical to have the entire text in one file. Not only is it a pain to edit, but ereaders often have trouble with extremely large files.
The next step is to split the file at logical places; that usually means at each chapter break. For works that contain their chapters in larger “parts,” the part division should also be its own file. For example, see Treasure Island.
To split the work, we use se split-file. se split-file takes a single file and breaks it into a new file every time it encounters the markup <!--se:split-->. It automatically includes basic header and footer markup in each split file.
Notice that in our source file, each chapter is marked with an <h2> element. We can use that to our advantage and save ourselves the trouble of adding the <!--se:split--> markup by hand:
sed --in-place 's|<h2|<!--se:split--><h2|g' src/epub/text/body.xhtml
+ Notice that in our source file, each chapter is marked with an <h2> element. We can use that to our advantage and save ourselves the trouble of adding the <!--se:split--> markup by hand:
perl -pi -e 's|<h2|<\!--se:split--><h2|g' src/epub/text/body.xhtml
Now that we’ve added our markers, we split the file. se split-file puts the results in our current directory and conveniently names them by chapter number.
se split-file src/epub/text/body.xhtml
mv chapter* src/epub/text/
Once we’re happy that the source file has been split correctly, we can remove it.
rm src/epub/text/body.xhtml
@@ -142,7 +142,7 @@ proceed to seal up my confession, I bring the life of that unhappy Henry Jekyll
</section>
</body>
</html>
- If you look carefully, you’ll notice that the <html> element has the xml:lang="en-US" attribute, even though our source text uses British spelling! We have to change the xml:lang attribute for the source files to match the actual language, which in this case is en-GB. Let’s do that now:
sed --in-place "s|en-US|en-GB|g" src/epub/text/chapter*
+ If you look carefully, you’ll notice that the <html> element has the xml:lang="en-US" attribute, even though our source text uses British spelling! We have to change the xml:lang attribute for the source files to match the actual language, which in this case is en-GB. Let’s do that now:
perl -pi -e "s|en-US|en-GB|g" src/epub/text/chapter*
Note that we don’t change the language for the metadata or front/back matter files, like content.opf, titlepage.xhtml, or colophon.xhtml. Those must always be in American spelling, so they’ll always have the en-US language tag.
Semantics for italics: <em> should be used when a passage is emphasized, as in when dialog is shouted or whispered. <i> is used for all other italics, with the appropriate semantic inflection. Older transcriptions usually use just <i> for both, so you must change them manually if necessary.
Sometimes, transcriptions from Project Gutenberg may use ALL CAPS instead of italics. To replace these, you can use sed:
sed --regexp-extended --in-place "s|([A-Z’]{2,})|<em>\L\1</em>|g" src/epub/text/*
+ Sometimes, transcriptions from Project Gutenberg may use ALL CAPS instead of italics. To replace these, you can use:
+perl -CSD -pi -e "use utf8;s|([A-Z’]{2,})|<em>\L\1</em>|g" src/epub/text/*
This will unfortunately replace language tags like en-US, so fix those up with this:
sed --regexp-extended --in-place "s|en-<em>([a-z]+)</em>|en-\U\1|g" src/epub/text/*
- These replacements don’t take Title Caps into account, so use git diff to review the changes and fix errors before committing.
perl -pi -e "use utf8;s|en-<em>([a-z]+)</em>|en-\U\1|g" src/epub/text/*
+ These replacements don’t take Title Caps or roman numerals into account, so use git diff to review the changes and fix errors before committing.