A Case Study in Electronic Publishing Theory
Editorial Declaration of Method: An Introduction
Texts are alive. With every passing moment, even physically static texts mutate. They gain new theoretical implications through engagement with each new moment’s cultural attitudes. They change intertextually through their interaction with other published texts and through the alterations in each new moment’s semiotic field. The very paper and ink react chemically with the world around them, discoloring, growing brittle, and fading with age. And those texts that are honored with republishing go through an editorial process which inevitably involves interpretation and re-alignment to fit with the newly-published artifact; even if these republishing alterations are as minor as changes to pagination or typeface, to colophon or cover art, the text has changed. The notion of “purity” in republishing a text – particularly into a radically different medium, such as from codex to hypertext – is impossible to the point of irrelevance.
However, despite the inevitability of editorial influence on a republication, that influence must be carefully calculated, both in order to preserve the text-as-itself (if the changes become wildly radical, why even preserve the text’s title or author as distinguishing characteristics of it?), and because an editor, like an author, is creating an aesthetic object of critical consideration. Every design decision, every change, must be accounted for – from the most complex tags to the smallest typesetting elements. With this ethos, I proceeded to encode Graham R. Tomson’s “Vespertilia” from The Yellow Book.
In digitally encoding this poem, I indexed those elements which I considered of greatest utility for a reader with a general interest – the author’s name/pseudonym and its relevance to her biography, names of publishers, place names for locations of publication, and the Latin influences on the title and the related tree on line 4 of the first stanza.
(Note: I ran out of time on tagging and indexing, and as such the site index material at the bottom of the document is currently limited. This will be developed further as time permits.)
Visually, I sought to document and preserve the aesthetic qualities of the physical copy I used (housed at the University of Calgary library) as accurately as possible. Spacing between punctuation and text (highly irregular in this poem) is uniformly preserved, and line indentations are tagged. The initial drop cap is identified with a tag for “illuminated” text (industry best practices as indicated through online TEI-encoding forums). The sizing and placing on all title, author, header, and footer text has been indicated and differentiated as accurately as possible, as well as page margins on all four sides of the full document; however, margins around page breaks were left untagged, as proximity to the page break gives context to what the header and footer information is doing, should it be coded for rendering in the XSLT. The background color of the page has been identified as a faded orange-yellow color which matches with the most common background color of the copy of The Yellow Book used (based on a digital RGB analysis and translation into hexadecimal of the original first page); but as it would be difficult to match with the exact color-mottling of the source and make this commonly renderable by the majority of available web browsers, the color is instead left tagged as uniform across the whole document – even then, despite the strictness of RGB levels, different browsers may render the color differently.
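The translation of an RGB reading into hexadecimal mentioned above can be illustrated with a minimal Python sketch. The function name and the sample channel values are assumptions chosen for illustration; they are not the values recorded for the copy of The Yellow Book used here.

```python
def rgb_to_hex(red: int, green: int, blue: int) -> str:
    """Convert 0-255 RGB channel values to a CSS hexadecimal color string."""
    for channel in (red, green, blue):
        if not 0 <= channel <= 255:
            raise ValueError(f"channel value out of range: {channel}")
    # Each channel becomes two lowercase hexadecimal digits.
    return f"#{red:02x}{green:02x}{blue:02x}"


# Hypothetical sample of a faded orange-yellow; NOT the edition's actual color.
sample_page_color = rgb_to_hex(240, 214, 150)
print(sample_page_color)
```

A single hexadecimal value of this kind can only describe a uniform background, which is why the mottling of the physical page cannot be carried over by this method.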
However, as noted above, texts are living documents, and should therefore be able to be re-translated with alterations to the specific methods of indicating the tags. As a TEI-XML document, this is primarily an archival documentation of the aesthetics of the codex version which can be altered through XSLT into a published version which may – inevitably must – alter the appearance of the original. Aesthetic information is noted in @rend attributes, but the translation of these into HTML is still, as it must be, left open to alteration in (re)production.
Literary Gesamtkunstwerk: A Brief Note on Paratext and Meaning
In Gerard Genette’s seminal text Paratexts: Thresholds of Interpretation, he argues that, though a piece of literature is generally defined as “a more or less long sequence of verbal statements that are more or less endowed with significance,” this “text” is always accompanied by any number of paratextual materials which affect “its ‘reception’ and consumption” (1). The materials bordering on the text (which Genette terms an “edge”) form what Philippe Lejeune, in Le Pacte Autobiographique, terms “a fringe of the printed text which in reality controls one’s whole reading of the text” (qtd. in Genette 2). As an example of the effect of paratext on text, Genette posits a question: “[L]imited to the text alone and without a guiding set of directions, how would we read Joyce’s Ulysses if it were not entitled Ulysses?” (2). For Genette, the title, front matter, back matter, typeface, illustrations, etc. all affect the hermeneutics of the text (though he focuses upon authorial intent far too much for a tried-and-true post-structuralist like myself).
Extending a bit beyond Genette’s limitations on the paratext, but with a similar consideration of its influence, I tend to consider the limitations of “the text” more in correlation to Richard Wagner’s ideology of Gesamtkunstwerk in opera. In brief, Wagner’s ideology (and the basic theory behind Gesamtkunstwerk, translating as something like “The collective artwork”) is that everything from the costume to the libretto, the lighting to the violins, even the physical aesthetic of the opera house itself (he had the Bayreuth Festspielhaus built specifically for Der Ring des Nibelungen), all function collectively as a part of a unified artistic production – none are ancillary, all are fundamental to the produced artwork.
Though in a given work, the “text-proper” is undoubtedly of more overt value in a hermeneutic experience, the “meaning” of a text or the message conveyed by that text is influenced by a variety of elements, even beyond those considered by Genette. The difference in paper texture and ply between the super-thin Norton Anthology pages and the heavy, rough-edged ply of a Penguin Classics Deluxe edition of Jack Kerouac’s Dharma Bums gives to one the haptic quality of turning pages in the Christian Bible, to the other of thumbing through a traveling bhikkhu’s rucksack journal; and brilliant white paper gives a sense of freshness and newness – of purity – whereas yellowed paper suggests age and, particularly when blotched or mottled, heavy use (and thereby importance to those who used it before).
As we move away from the codex, of course, the technology of the medium and its control over navigation of the text also come into heavy consideration. The standard codex insists upon a very specific method of navigation: turning pages, one after another, to move through the text – with certain exceptions, of course, in texts which specifically seek to undermine standardized codex navigation (cf. Mark Z. Danielewski’s Only Revolutions: A Novel). Digitized texts, however, create a new method of reading. The viewed surface does not physically change; new material is rendered upon it based on the reader using peripheral devices (e.g. keyboard, mouse) to activate encoded navigation methods – hyperlinks, scroll bars, “mouseover” text, etc. This change to the navigation technology paratextually affects the reader’s method of engaging with the text. As Jessica Pressman points out with respect to both digital texts of various formats and Danielewski’s novel, “reading is a deeply medial activity dependent upon the specificities of reading technologies” (161). Thus even in access to PDF versions of texts, with all their verisimilitude to the printed codex, the reading technology mediates the reading. The skeuomorphic “turning page” aesthetic common to Archive.org and Kindle serves as an informative example of conservative resistance: in attempting to mimic the reading of a codex, the reader is made aware of just how different the acts of reading the two media are. Clicking a mouse and turning a page are very different methods of navigation.
Web-based texts additionally have a series of other more Genette-based paratextual elements. Web sites often have headers, footers, navigation tools, sidebars, and any number of other visual elements which circumscribe the textual material on the page-proper – all of which function much akin to the front and back matter in Genette’s paratextual theory. In my (re)presentation of Tomson’s “Vespertilia,” this has been limited as much as possible; but I cannot eliminate the header or footer material of the reader’s browser (toolbars, address bar, title bar, status bar), the operating system’s task bar (Windows) or Dock (macOS), nor precisely mimic the original codex’s technology of navigation – paratextual alterations are inevitable. Thus it is important to understand that a republication – particularly when the medium change is radical, but nonetheless true in any situation – is always a change in mediation which considerably affects the text being published.
Publication as Mediation: The HTML Rendering of “Vespertilia”
Publication is mediation, in whatever format the published piece may take; and mediation is never devoid of influence on what is being mediated. As Bruno Latour suggests in We Have Never Been Modern, the very idea of an intermediary – a “void” medium which “simply transports, transfers, transmits” what it carries, which does not influence the transported material (77) – is a modern fallacy. Every medium, Latour argues, is rather a “mediator, […] an original event [which] creates what it translates” (78), an original object in its own right which shapes and, in effect, works symbiotically with the “translated” originary message it delivers to create a “new” message. For a highly simplistic analogy, a text written in crude hand-printed letters elicits a very different initial response from its reader than the same text typed out on academic letterhead; and these function hermeneutically quite differently than the same text presented with verse-style line breaks within a book of poetry, or preached verbally during a religious service, or presented one word at a time in a Flash presentation on an art museum wall. According to W. Terrence Gordon, this is the foundation of Marshall McLuhan’s oft-quoted statement, “the medium is the message”: McLuhan was trying to describe “how media operate and how they shape and control the speed, scale, and forms of human association” with information (vi). Hermeneutically, one cannot separate the word from its presentation.
Any (re)presentation of a text, any (re)mediation of a message, thus presents a fundamental changing of the text itself. As Julia Kristeva notes, any text is not itself a unique, purely “new” text; rather, every text is a sort of collage built out of the semiotic field of that text’s temporal moment, a space within which “several utterances, taken from other texts, intersect” to form the “new” material (36). Understanding this idea – that every text represents a process of intertextual appropriation – one must then consider the process of writing itself as a sort of editorial process; and, conversely, the process of editing as a sort of authorship. This is not to suggest that the ur-author ceases to be the author of the (re)presented work, nor even that she goes under erasure; rather, the (re)presented work becomes the product of a collaborative writing.
In this light, Marjorie Stone’s and Keith Lawson’s observations about internet users and their attitudes about information found online are troubling. Stone’s and Lawson’s research indicates that “decontextualized” texts – texts devoid of scholarly annotations, contexts, or critical apparatus – are privileged because they present the text with an appearance of purity, of being unmediated (119). Paul Conway similarly notes in Google users a desire for “access to information without human mediation” (63) – a desire for information through a Latourean intermediary, which, as argued above, is a fallacy. Digital readers, according to these studies, do not recognize that the accessed texts are always already mediated. Even PDF facsimiles of the texts in question are mediated through the methods of access to them, as the header/footer material of the webpage in which the PDF appears, the web browser “window dressing,” and even the very fact that the PDF is being viewed as radiant light from a computer screen and navigated using computer peripheral equipment, all work as paratextual elements mediating the reading.
This is not to say that the ur-text and ur-author are irrelevant or of little consequence to digital (re)presentations. As with any “proper” approach to writing, to choosing the semiotic materials with which an author intends to craft her message, editing cannot be approached with careless disdain for the ur-text, as it is, for the editor, the semiotic material she is using to express what she sees as the message of the text-to-be-(re)created. The editor also owes a certain level of respect to the authority of the ur-text’s author as the “co-author” of the (re)presented text. Any changes to the (re)presented text need to follow a certain ethos both of “how can I present this the way I think it should be presented to the reader?” and “how did the ur-author – my co-author – want the text to be presented?”
Thus my digital rendering of Tomson’s “Vespertilia” from The Yellow Book, while keeping in mind the influential “(re)” in (re)presentation, seeks to preserve Tomson’s aesthetic qualities as closely as possible, particularly in terms of typesetting, page layout, and coloration, as I consider these paratextual elements to be fundamental to an aesthetic experience of the text. The initial drop-cap “I,” for instance, though this was a typesetting style applied to all the works in this era of The Yellow Book, is nonetheless a part of the poem as it appears in this publication, and thus should be considered as a paratextual element of the text as much as the general typeface (which I have left Times New Roman – not a precise replica of the original, but the closest one I could find which would be commonly accepted by a variety of web browsers). Indented lines and the distance of the indent have been preserved as accurately as possible with respect to the placement of the indented leading letter and the letters of the words above and below – this specifically out of respect for Tomson’s (and/or her previous editors’) aesthetic decision to indent these lines only slightly, as I do not find some of these to function heavily in the reading of the piece; but without specific reservations about doing so, I yield to the aesthetic tastes of my “co-authors.” The surrounding margins create a blank frame around the text and thereby make the lines of the text seem intimately closed off and isolated, mirroring the intimacy of the narrator and the “stranger-woman” (line 7) in the poem’s desolate landscape. The yellowed page color – which had to be made uniform rather than mottled like the naturally-aged physical pages in the hard copy for sake of ease in coding and rendering – additionally establishes a certain aesthetic of age and mood in the piece, like a photograph cast in sepia tone.
The catchwords and header information (page numbers, as well as “Vespertilia” and “By Graham R. Tomson” on alternating pages) maintain size and location in relation to the page – as web pages are resizable, it is impossible to locate these accurately in relation to both the page and the text; and it appears more obvious what the floating catchwords (in particular) are doing when their location is tied to the page/window rather than the text. Similarly, horizontal-rule page breaks are added, both to reflect the aesthetic of the codex version of the text and to further identify these as page headers and catchwords rather than random pieces of text floating seemingly haphazardly between stanzas (cf. The Yellow Nineties Online version of this poem for an example of what I was avoiding in this respect). The smaller, less-effectual (in my estimation) stylistic preservations from the ur-text are too many to list individually; but with all formatting decisions, representing the aesthetic of the text as it appears in my edition of The Yellow Book was the stylistic modus operandi of encoding the rendered text.
Yet, despite all the steps taken to preserve the aesthetics of the ur-text, it is by no means a facsimile of the original in a variety of aspects, even beyond those issues listed above in terms of precise typeface, page/text relation to headers and catchwords, and unified color. All of the aesthetic decisions, no matter how accurately an editor may try, are inevitably approximations which are further mediated by browser interpretations of the HTML. Inescapably, the medium affects the message.
Archiving vs. Enabling: A Justification of HTML and TEI-encoding
So why publish digitally? Why transcribe texts into an electronic format in the first place? What is the value? As Martha Nell Smith argues, “Editing is a physical as well as a philosophical act, and the medium in which an edition is produced […] is both part of and contains the message of the editorial philosophies at work.” Thus if one is to (re)produce a text in digital format, the very decision to use that digital format needs to be justified.
The values of digitization are various, though only a few are obvious. First is the function of a broadly-accessible and easily-preserved archive. Books decay, binary does not; books are expensive to store and reproduce, binary is cheap in both regards; books sit still, binary travels all over the globe at the speed of light (give or take some bandwidth). The value of this increased longevity, cheapness, and accessibility is self-evident.
But these characteristics are true of PDF copies of texts, which are far more capable of preserving the aesthetic qualities of the copied codex than HTML; so why encode in TEI (for rendering into HTML or any other non-image-based format) in the first place? As Stone and Lawson lament, the paratextual materials (even reader-inscribed palimpsests and marginalia) of older books are lost in this kind of digitization (110), but PDF versions can, with a fairly high level of verisimilitude, preserve much of this.
In defense of TEI encoding and HTML rendering as a publishing format, I refer back to the previous section: publication, of any sort, is a mediation, so “purity” is implicitly a futile pursuit. However, if an editor is interested in preserving the artistic ethos of the ur-author in the (re)presented work along with the editor’s own – if she is interested in making the text a collaborative work with the ur-author rather than moving toward a more singularly-directed work – some steps must be taken to preserve that author’s ethos in publication; hence, I strove to render Tomson’s poem as accurately as possible, with the understanding that adjustments needed to be made for the new medium and that my preservation of textual and paratextual elements is inherently based on my interpretation of the piece’s aesthetic.
Additionally, it should be noted that TEI is not itself renderable by a common web browser. TEI documents (and all XML, for that matter) must be rendered or “transformed” by a rendering program from XML into HTML using an XSLT (note: TEI Boilerplate is an exception to this: by limiting the available tagging and display abilities – by making the aesthetic, eponymously, “boilerplate” – it allows TEI files to be rendered directly by a web browser). A TEI document is rather a database of sorts which stores information about the visual information on the page (among other things). The TEI encoding, though it should contain a great deal of information on how the aesthetic of the ur-pages appears, does not strictly define how the tagged materials will be rendered. For instance, XML page breaks (<pb />) on The Yellow Nineties Online XML of “Vespertilia” were rendered by their XSLT into empty paragraph tags (<p> </p>) in the HTML, which means the page delineations are invisible except for a small amount of blank space; but the XSLT could easily have transformed these page break tags (as my mock-rendering does) into horizontal lines (<hr />) to demonstrate the break between pages, more akin to the visual rendering of a Microsoft Word document. Thus the TEI is not itself the (re)presentation, but rather the information used to create any number of (re)presentations – it is the raw data which future editors may use in order to create a text of interpretatively-varied verisimilitude to the ur-text from The Yellow Book. In order to facilitate future editors (myself included) in deciding on the level of verisimilitude they would like – typesetting, page layout, etc. – I believe it is important to encode as much information about the aesthetic qualities of the text as possible in the TEI, which can then either be rendered as I have, rendered differently, or not rendered at all as befits the aesthetic interpretation of the future editors (or, as I have suggested previously, “co-authors” along with myself and Tomson). PDF is perhaps better as an archival project; but if future editors want to “do” something with the text in terms of its aesthetic presentation, HTML is a preferable medium; and it is easier to create the HTML (particularly a whole series of them, if one is dealing with, say, the entire Yellow Book) if the document information is already XML-encoded according to easily-translatable TEI standards.
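The point that one encoding can yield different renderings of the page break may be illustrated with a short sketch. It is written in Python rather than XSLT, purely as a stand-in for the transformation step, and it is not the stylesheet used by The Yellow Nineties Online or by this edition. The sample fragment, its placeholder lines, and the function name render_body are all hypothetical.

```python
import xml.etree.ElementTree as ET
from html import escape

TEI_NS = "{http://www.tei-c.org/ns/1.0}"  # the standard TEI namespace

# Hypothetical fragment; the line text is placeholder, not Tomson's poem.
SAMPLE_TEI = """<text xmlns="http://www.tei-c.org/ns/1.0"><body>
<lg type="stanza"><l>first placeholder line</l><l>second placeholder line</l></lg>
<pb n="50"/>
<lg type="stanza"><l>third placeholder line</l></lg>
</body></text>"""


def render_body(tei_xml: str, page_break_html: str = "<hr />") -> str:
    """Render stanzas as paragraphs and page breaks as the chosen HTML."""
    root = ET.fromstring(tei_xml)
    parts = []
    for element in root.iter():  # walks the tree in document order
        if element.tag == TEI_NS + "pb":
            # The editor's rendering decision lives here, not in the TEI.
            parts.append(page_break_html)
        elif element.tag == TEI_NS + "lg":
            lines = [escape(line.text or "") for line in element.findall(TEI_NS + "l")]
            parts.append("<p>" + "<br />".join(lines) + "</p>")
    return "\n".join(parts)


print(render_body(SAMPLE_TEI))              # page break shown as a horizontal rule
print(render_body(SAMPLE_TEI, "<p> </p>"))  # page break shown as blank space
```

The same source fragment produces either a visible rule or a nearly invisible gap depending on a single argument, which is the sense in which the TEI stores the fact of the page break while leaving its appearance to each (re)presentation.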
Speaking of “doing” something with the text, I come to what I would say is the most important benefit of HTML/XML: as a medium, it has capabilities that are very difficult in PDF. With the exception of OCR-enabled PDFs – which have rather specific requirements of the PDF image and require the user to trust the accuracy of OCR in deciphering the visual aspects of the text – PDFs are not easily searched, copied-and-pasted, or manipulated. This makes finding quotations within a work or using specific text within a work for a given purpose very difficult: try finding the place where Leopold Bloom mentions Esperanto in James Joyce’s Ulysses in a non-searchable document (PDF or codex), then try finding it in an HTML version. Even a Modernist scholar who might recall it as being toward the beginning of the “Circe” chapter will (and did) take a considerably longer amount of time searching for it than a search function in HTML.
Furthermore, even searchable PDFs are very difficult to copy in their entirety or manipulate for any specific purpose – they archive well, but enable use poorly for a large section of DH tools. The quantification abilities and subsequent visualizations of tools such as Tableau or Voyant, for instance, require ASCII text, not images of text; and if a researcher is trying to quantify data from the entirety of Joyce’s Ulysses (used again merely for continuity), even a searchable PDF would require a click-and-drag selection method which would take drastically longer (and have room for error) than two clicks with the shift key held down on an HTML document – not to mention using the PDF is trusting a computer program to have translated the text accurately: consider, in this example, the full-page-sized S of “Stately, plump” which opens the novel.
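To illustrate why an HTML rendering is more readily quantified than an image of a page, here is a minimal sketch that strips the markup from an HTML fragment and counts word frequencies using only the Python standard library. It does not reproduce what Tableau or Voyant do internally; the sample fragment, the class name, and the function names are hypothetical.

```python
import re
from collections import Counter
from html.parser import HTMLParser

# Hypothetical fragment standing in for a rendered poem page.
SAMPLE_HTML = (
    "<p>placeholder line one<br />placeholder line two</p>"
    "<hr /><p>placeholder line three</p>"
)


class TextExtractor(HTMLParser):
    """Collect the text between tags and discard the tags themselves."""

    def __init__(self):
        super().__init__()
        self.chunks = []

    def handle_data(self, data):
        if data.strip():
            self.chunks.append(data.strip())


def html_to_plain_text(html_source: str) -> str:
    extractor = TextExtractor()
    extractor.feed(html_source)
    extractor.close()
    return "\n".join(extractor.chunks)


def word_frequencies(text: str) -> Counter:
    # A deliberately crude tokenizer: runs of letters and apostrophes.
    return Counter(re.findall(r"[a-z']+", text.lower()))


plain_text = html_to_plain_text(SAMPLE_HTML)
print(plain_text)
print(word_frequencies(plain_text).most_common(3))
```

No comparable shortcut exists for an image-only PDF, where the characters must first be guessed by OCR before any counting can begin.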
TEI in particular adds to the potential DH functionality of the text through specialized tags which, though they do not need to render overtly into the document, may be used as identifiable traits. Consider, for example, my TEI-encoded version of “Vespertilia.” Each line is coded as a separate line, grouped into line groups identified as stanzas (<lg type="stanza">). The number of these line groups indicates the number of stanzas in the poem, and the number of line tags indicates the number of lines. Thus, if a researcher were interested in quantifying the average lengths of stanzas not just in this poem, but in a selection of literature by Tomson, a digital tool could easily scan through and calculate the number of lines or stanza line groups in a series of TEI/XML files before the researcher could count to two. TEI also allows for documenting meter, scansion, rhyme, and specific poetic forms, such as sonnets; and within a poetic form, different kinds of stanzas can be identified, such as quatrains and couplets; and within individual lines, caesuras, internal rhymes, even enjambment. There are literally too many options to name here, all of which allow for digital tools to scan the file, collect data, and provide it for research purposes. PDFs are, again, more useful in terms of preservative archiving; but the utility of TEI/XML (re)production of a text – both in terms of the HTML publishing mentioned previously and in terms of its research facilitation – is unspeakably high.
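The stanza and line counting described above can be sketched in a few lines of Python. This is a minimal illustration that assumes a file uses the standard TEI namespace and the <lg type="stanza"> and <l> tagging described here; the sample fragment, its placeholder lines, and the function name are hypothetical.

```python
import xml.etree.ElementTree as ET
from typing import List

TEI_NS = {"tei": "http://www.tei-c.org/ns/1.0"}  # the standard TEI namespace

# Hypothetical fragment; the line text is placeholder, not Tomson's poem.
SAMPLE_TEI = """<text xmlns="http://www.tei-c.org/ns/1.0"><body>
<lg type="stanza"><l>placeholder</l><l>placeholder</l></lg>
<lg type="stanza"><l>placeholder</l><l>placeholder</l><l>placeholder</l><l>placeholder</l></lg>
</body></text>"""


def stanza_line_counts(tei_xml: str) -> List[int]:
    """Return the number of <l> elements in each stanza line group."""
    root = ET.fromstring(tei_xml)
    stanzas = root.findall(".//tei:lg[@type='stanza']", TEI_NS)
    return [len(stanza.findall("tei:l", TEI_NS)) for stanza in stanzas]


counts = stanza_line_counts(SAMPLE_TEI)
print("stanzas:", len(counts))
print("lines:", sum(counts))
print("average stanza length:", sum(counts) / len(counts))
```

Running the same function over a folder of TEI files would extend the count to a selection of poems, provided each file follows the same tagging conventions.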
(NOTE: Due to time constraints, the TEI tagging of “Vespertilia” in the poetic stylistics mentioned in the above paragraph was never realized beyond lines and stanza line groups. I hope to complete the further coding soon (summer 2016). XSLT-translation of the XML into HTML is also yet to be realized; but, as noted above, I have composed an HTML facsimile of what it should look like when it is completed. Caveat: this version was coded by my fingers into Notepad based on, not using, any rendering of the XML, so verisimilitude is not guaranteed.)
Conway, Paul. “Preservation in the Age of Google: Digitization, Digital Preservation, and Dilemmas.” Library Quarterly 80.1 (Jan. 2010): 61-79.
Genette, Gerard. Paratexts: Thresholds of Interpretation. Trans. Jane E. Lewin. New York and Cambridge: Cambridge UP, 1997.
Gordon, W. Terrence. McLuhan: A Guide for the Perplexed. New York and London: Continuum, 2010.
Kristeva, Julia. “The Bounded Text.” Desire in Language. Trans. Thomas Gora, et al. Ed. Leon S. Roudiez. New York: Columbia UP, 1980: 36-63.
Latour, Bruno. We Have Never Been Modern. Trans. Catherine Porter. Cambridge, MA: Harvard UP, 1993.
Pressman, Jessica. Digital Modernism: Making it New in New Media. New York: Oxford UP, 2014.
Smith, Martha Nell. “Electronic Scholarly Editing.” A Companion to Digital Humanities. Eds. Susan Schreibman, et al. Oxford: Blackwell, 2004. Digitalhumanities.org. Web. 4/6/2015.
Stone, Marjorie, and Keith Lawson. “‘One Hot Electric Breath’: EBB’s Technology Debate with Tennyson, Systemic Digital Lags in Nineteenth-Century Literary Scholarship, and the EBB Archive.” Victorian Review 38.2 (Fall 2012): 101-125.
Tomson, Graham R. “Vespertilia.” The Yellow Book 4 (Jan. 1895): 49-52.