diff --git a/templates/Header.php b/templates/Header.php
index 571fe3f9..a8ad910a 100644
--- a/templates/Header.php
+++ b/templates/Header.php
@@ -18,12 +18,15 @@ if(!isset($manual)){
 # We hash with crc32 because it's faster than md5 and "good enough" for this simple cache-busting use case
-?>
-
+header('content-type: application/xhtml+xml');
+print('');
+print("\n");
+?>
+
All Standard Ebooks source folders have the same basic structure. It looks a little like this:
Carefully review the entirety of the Standard Ebooks Manual of Style.
+Once we’ve OK’d your selection and you’ve read the style manuals, you can get started! Follow the steps in our step-by-step guide to producing an ebook to take your ebook from start to finish.
If you inspect the folder we just created, you’ll see it looks something like this:
You can learn more about what the files in a basic Standard Ebooks source folder are all about before you continue.
Now that we’ve got the source text, we have to do some very broad cleanup before we perform our first commit:
@@ -209,7 +209,7 @@ proceed to seal up my confession, I bring the life of that unhappy Henry Jekyll
se british2american
attempts to automate the conversion. Your work must already be typogrified (the previous step in this guide) for the script to work.
se british2american .
While se british2american
tries its best, thanks to the quirkiness of English punctuation rules it’ll invariably mess some stuff up. Proofreading is required after running the conversion.
After you’ve run the conversion, do another commit.
git add -A
git commit -m "Convert from British-style quotation to American style"
- This regex is useful for spotting incorrectly converted quotes next to em dashes: “[^”‘]+’—(?=[^”]*?</p>)
+ This regex is useful for spotting incorrectly converted quotes next to em dashes: “[^”‘]+’—(?=[^”]*?</p>)
Semantics for italics: <em>
should be used when a passage is emphasized, as in when dialog is shouted or whispered. <i>
is used for all other italics, with the appropriate semantic inflection. Older transcriptions usually use just <i>
for both, so you must change them manually if necessary.
Sometimes, transcriptions from Project Gutenberg may use ALL CAPS instead of italics. To replace these, you can use sed
:
sed --regexp-extended --in-place "s|[A-Z’]{2,}|<em>\L&</em>|g" src/epub/text/*
+ sed --regexp-extended --in-place "s|([A-Z’]{2,})|<em>\L\1</em>|g" src/epub/text/*
This will unfortunately replace language tags like en-US
, so fix those up with this:
sed --regexp-extended --in-place "s|en-<em>([a-z]+)</em>|en-\U\1|g" src/epub/text/*
These replacements don’t take Title Caps into account, so use git diff
to review the changes and fix errors before committing.
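As a rough illustration of the two `sed` passes above, the sketch below runs them on a throwaway file rather than on `src/epub/text/*`. The sample sentence and the temporary file are made up for this example, and it assumes GNU `sed`, since `\L` and `\U` are GNU extensions.

```shell
#!/bin/sh
# Hypothetical sample file; a real run would target src/epub/text/* instead.
tmp="$(mktemp)"
printf '%s\n' '<p xml:lang="en-US">He was VERY angry.</p>' > "$tmp"

# Pass 1: wrap runs of two or more capitals in <em> and lowercase them.
# This also hits the "US" in the language tag, which is not what we want.
sed --regexp-extended --in-place "s|[A-Z’]{2,}|<em>\L&</em>|g" "$tmp"

# Pass 2: repair language tags such as en-US that pass 1 mangled.
sed --regexp-extended --in-place "s|en-<em>([a-z]+)</em>|en-\U\1|g" "$tmp"

result="$(cat "$tmp")"
rm -f "$tmp"
printf '%s\n' "$result"
```

Single capitals such as the "H" in "He" are left alone because the pattern requires at least two characters, but any capitalization that the original ALL CAPS text should have kept is lost, which is why the manual review with `git diff` is still needed.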
Semantics rules for abbreviations. Abbreviations should always be wrapped in the <abbr>
tag with the correct class
attribute.
Specifically, see the typography rules for initials. Wrap people’s initials in <abbr class="name">
. This regex helps match initials: [A-Z]\.\s*([A-Z]\.\s*)+
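To see what the initials regex catches, it can be tried against a sample line with `grep`. This is only a sketch: the sample sentence is invented, and it assumes GNU `grep`, where `\s` is available as a whitespace shorthand.

```shell
#!/bin/sh
# Hypothetical sample text; a real search would run over src/epub/text/*.
sample='Works by H. G. Wells and P.G. Wodehouse, but not by Mr. Smith.'

# -E: extended regex; -o: print only the matched text, one match per line.
# \s is a GNU grep extension for whitespace.
matches="$(printf '%s\n' "$sample" | grep -Eo '[A-Z]\.\s*([A-Z]\.\s*)+')"
printf '%s\n' "$matches"
```

The pattern needs at least two capital-plus-period pairs in a row, so "Mr." is not reported. Each match may include trailing whitespace because of the final `\s*`, so trim it before wrapping the initials in `<abbr class="name">`.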
Typography rules for times. Wrap a.m. and p.m. in <abbr class="time">
and add a no-break space between digits and a.m. or p.m.
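The times rule can be sketched as a single substitution. This is an illustration of the rule only, not how the Standard Ebooks toolset implements it; the sample sentence and the `nbsp` variable are assumptions introduced for the example.

```shell
#!/bin/sh
# UTF-8 no-break space (U+00A0), built from its two bytes in octal.
nbsp="$(printf '\302\240')"

# Hypothetical sample sentence.
sample='We met at 5 p.m. and left at 11a.m. sharp.'

# Capture the final digit and the a.m./p.m. marker, drop any plain space
# between them, then join them with a no-break space and wrap the marker.
result="$(printf '%s\n' "$sample" |
  sed -E "s|([0-9]) ?([ap]\.m\.)|\1${nbsp}<abbr class=\"time\">\2</abbr>|g")"
printf '%s\n' "$result"
```

The substitution does not check whether a time is already wrapped, so running it twice on the same text would nest the `<abbr>` tags. Review the result before committing.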
Words or phrases in foreign languages should always be marked up with <i xml:lang="TAG">
, where TAG is an IETF language tag. This app can help you look them up. If the text uses fictional or unspecific languages, use the “x-” prefix and make up a subtag yourself.
Semantics for poetry, verse, and song: Many Gutenberg productions use the <pre>
tag to format poetry, verse, and song. This is, of course, semantically incorrect. See the Poetry section of the SEMOS for templates on how to semantically format poetry, verse, and song.
Once you’ve verified the titles look good, commit:
git add -A
git commit -m "Add titles"
Many older works use outdated spelling and hyphenation that would distract a modern reader. (For example, “to-night” instead of “tonight.”) se modernize-spelling
automatically removes hyphens from words that used to be compounded, but aren’t anymore in modern English spelling.
Vathek by William Beckford
Poetry by Mark Akenside (make sure this collection is a complete corpus of his works.)
+The Life of Lazarillo de Tormes by Anonymous