Remove reference to now-obsolete 'se clean --single-lines' option, and reference to removing spaces around opening/closing tags

Alex Cabal 2020-04-28 12:44:13 -05:00
parent a887b886c0
commit bdc627ce65


@@ -124,8 +124,7 @@ proceed to seal up my confession, I bring the life of that unhappy Henry Jekyll
<h2>Clean up the source text</h2> <h2>Clean up the source text</h2>
<p>If you open up any of the chapter files we now have in the <code class="path">src/epub/text/</code> folder, you’ll notice that the code isn’t very clean. Paragraphs are split over multiple lines, indentation is all wrong, and so on.</p> <p>If you open up any of the chapter files we now have in the <code class="path">src/epub/text/</code> folder, you’ll notice that the code isn’t very clean. Paragraphs are split over multiple lines, indentation is all wrong, and so on.</p>
<p>If you try opening a chapter in a web browser, you’ll also likely get an error if the chapter includes any HTML entities, like <code class="html">&amp;mdash;</code>. This is because Gutenberg uses plain HTML, which allows entities, but epub uses XHTML, which doesn’t.</p> <p>If you try opening a chapter in a web browser, you’ll also likely get an error if the chapter includes any HTML entities, like <code class="html">&amp;mdash;</code>. This is because Gutenberg uses plain HTML, which allows entities, but epub uses XHTML, which doesn’t.</p>
<p>We can fix all of this pretty quickly using <code class="bash"><b>se</b> clean</code>. <code class="bash"><b>se</b> clean</code> accepts as its argument the root of a Standard Ebook directory, and with the <code class="bash">--single-lines</code> option it’ll remove the hard line wrapping that Gutenberg is fond of. We’re already in the root, so we pass it <code class="path">.</code>.</p><code class="terminal"><span><b>se</b> clean --single-lines <u>.</u></span></code> <p>We can fix all of this pretty quickly using <code class="bash"><b>se</b> clean</code>. <code class="bash"><b>se</b> clean</code> accepts as its argument the root of a Standard Ebook directory. We’re already in the root, so we pass it <code class="path">.</code>.</p><code class="terminal"><span><b>se</b> clean <u>.</u></span></code>
<p>Things look much better now, but we’re not perfect yet. If you open a chapter you’ll notice that the <code class="html"><span class="p">&lt;</span><span class="nt">p</span><span class="p">&gt;</span></code> and <code class="html"><span class="p">&lt;</span><span class="nt">h2</span><span class="p">&gt;</span></code> tags have a space between the tag and the text. We can clean that up with a few <code class="bash"><b>perl</b></code> commands.</p><code class="terminal"><span><b>perl</b> -pi -e <i>"s|&lt;(p|h2)&gt;\s+|&lt;\1&gt;|g"</i> src/epub/text/chapter<i class="glob">*</i></span> <span><b>perl</b> -pi -e <i>"s|\s+&lt;/(p|h2)&gt;|&lt;/\1&gt;|g"</i> src/epub/text/chapter<i class="glob">*</i></span></code>
<p>Finally, we have to do a quick runthrough of each file by hand to cut out any lingering Gutenberg markup that doesn’t belong. In <i>Jekyll</i>, notice that each chapter ends with some extra empty <code class="html"><span class="p">&lt;</span><span class="nt">div</span><span class="p">&gt;</span></code>s and <code class="html"><span class="p">&lt;</span><span class="nt">p</span><span class="p">&gt;</span></code>s. These were used by the original transcriber to put spaces between the chapters, and they’re not necessary anymore, so remove them before continuing.</p> <p>Finally, we have to do a quick runthrough of each file by hand to cut out any lingering Gutenberg markup that doesn’t belong. In <i>Jekyll</i>, notice that each chapter ends with some extra empty <code class="html"><span class="p">&lt;</span><span class="nt">div</span><span class="p">&gt;</span></code>s and <code class="html"><span class="p">&lt;</span><span class="nt">p</span><span class="p">&gt;</span></code>s. These were used by the original transcriber to put spaces between the chapters, and they’re not necessary anymore, so remove them before continuing.</p>
<p>Now our chapter 1 source looks like this:</p> <p>Now our chapter 1 source looks like this:</p>
<figure><code class="html full"><span class="cp">&lt;?xml version="1.0" encoding="utf-8"?&gt;</span> <figure><code class="html full"><span class="cp">&lt;?xml version="1.0" encoding="utf-8"?&gt;</span>