This post will hopefully help you to get a general understanding of what encoding a text as a TEI file in XML means and what it’s good for. Please note that I am definitely not an expert on this topic, and I might make some mistakes along the way… Please be patient with me and feel free to give me feedback in the comments!
First things first: what is TEI? And what does XML mean?
TEI stands for “Text Encoding Initiative”, and this already tells us one crucial thing: calling TEI proper coding would be a mistake, since it is not programming. It really serves the purpose of transcription: it is a set of guidelines for marking up a text so that it is clear how to read and interpret the text or chunk of information we have on our hands.
What can be encoded in TEI?
- Text
- Pictures
- Audio
- Video
As in our human experience, we can digitally use different languages to encode something. And, again, similarly to how we, as humans, work, not every computing device understands every language there is. Interesting, right?
As far as my understanding goes: computers are both very intelligent and very dumb. Why, you ask? Because they are able to process big chunks of information extremely quickly, but you have to tell them exactly how. And God forbid you make a mistake along the way, because then the computer will be completely lost and refuse to do what you expect from it.
Well, in this case, we will try to learn the how-tos surrounding a language called XML.
XML stands for “Extensible Markup Language”. It is the markup language in which TEI files are written, and it gives us a strict, consistent way of tagging a text so that a computer can read it. To write it, we use our coding program of choice: Visual Studio Code.
Let’s get started.
TEI Structure
A TEI file needs some information that describes it, doesn’t it?
All the info we are looking for is contained in the header, the <teiHeader> element.
The header contains precious metadata that we need in order to understand what follows. Only <fileDesc> is mandatory; the other two elements are optional, but you will meet them often:
- <fileDesc> (file description): bibliographic description of the electronic text (the most important of them all, and the only mandatory one)
- <encodingDesc> (encoding description): description of the relation of the electronic text to its source
- <profileDesc> (profile description): description of the context in which the electronic text was created + classification of information
The <fileDesc> itself contains a subsection of three mandatory elements of its own. They are listed below:
- <titleStmt> (title statement)
- <publicationStmt> (publication statement)
- <sourceDesc> (source description)
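To make the header a bit more concrete, here is a minimal sketch of how I understand it to look. The title, author and descriptions are invented placeholders, not taken from a real project:

```xml
<teiHeader>
  <fileDesc>
    <!-- titleStmt: what the electronic text is called and who wrote it -->
    <titleStmt>
      <title>A sample story: a digital edition</title>
      <author>Unknown</author>
    </titleStmt>
    <!-- publicationStmt: how (or whether) the electronic text is published -->
    <publicationStmt>
      <p>Unpublished classroom exercise.</p>
    </publicationStmt>
    <!-- sourceDesc: where the encoded text comes from -->
    <sourceDesc>
      <p>Transcribed from a printed handout.</p>
    </sourceDesc>
  </fileDesc>
</teiHeader>
```

Each of the three elements inside <fileDesc> is opened and closed in turn, and the plain-prose <p> inside <publicationStmt> and <sourceDesc> is the simplest way I know to fill them in.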
Once we have established all of the previous information, we can move on to the body, the <body> element inside <text>, in which the actual material you want to encode is contained. For this part, we also have some basic elements that help organize our chosen text, or whatever else you want to encode. Here is a brief overview with some quick explanations:
- <div>: text divisions, needed to divide everything into sections such as chapters, so it looks neater and clearer to the audience
- <head>: heading (title or subtitle) of a division; every good section needs a title, so we know what’s going on
- <p>: paragraph
- <q>: spoken words / quotes
- <l>: lines in poems / songs
- <hi>: highlighting things in e.g. italics, bold, superscript …
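Here is a small sketch of how these elements could be combined inside the body. All headings and sentences are made up for illustration, and the values of type and rend are just ones I picked, not fixed keywords:

```xml
<body>
  <!-- All titles and sentences here are made-up placeholders. -->
  <div type="story">
    <head>The First Evening</head>
    <p>The storyteller looked at the children and said
      <q>Listen closely</q>, and then the song began.</p>
  </div>
  <div type="song">
    <head>The Song</head>
    <!-- One <l> per line of verse -->
    <l>First line of the song,</l>
    <l>second line, with one <hi rend="italic">stressed</hi> word.</l>
  </div>
</body>
```

Notice that <q> and <hi> sit inside a sentence, while <div>, <head>, <p> and <l> give the text its larger shape.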
BUT BEWARE!! To use those tags effectively, you will need to close them accordingly. How do you close a tag? Well, my version of Visual Studio Code does it for me, but if you need to do it manually: just repeat the tag, adding a slash right after the opening angle bracket!
Here’s how it could look: <fileDesc> The file description goes here </fileDesc>. And that’s basically it! You’re good to go!
This is an example of how some coding could look:
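The following is a minimal sketch of a complete file, assuming the usual TEI skeleton of a header followed by the text. The title and the sentences are invented placeholders:

```xml
<?xml version="1.0" encoding="UTF-8"?>
<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <!-- Part 1: the header with the metadata -->
  <teiHeader>
    <fileDesc>
      <titleStmt>
        <title>My first TEI file</title>
      </titleStmt>
      <publicationStmt>
        <p>Unpublished classroom exercise.</p>
      </publicationStmt>
      <sourceDesc>
        <p>Written directly in the editor, no earlier source.</p>
      </sourceDesc>
    </fileDesc>
  </teiHeader>
  <!-- Part 2: the text itself -->
  <text>
    <body>
      <div>
        <head>A very short chapter</head>
        <p>This paragraph is the whole chapter.</p>
      </div>
    </body>
  </text>
</TEI>
```

Every tag that is opened is closed again in the reverse order, which is exactly the kind of thing the computer is picky about.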
From here on, the possibilities are endless! You can even cross-reference parts of the text to other ones: for example, if you want to encode a footnote or a glossary entry that you can actually click on, you can!
I didn’t quite get how it really works, but here you go, I’ll give you an example and some references, so you can get an understanding of what I’m talking about!
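As far as I understand it, the basic idea is that one element gets a name through xml:id, and another element points to that name with a target that starts with #. Here is a minimal sketch of a footnote; the sentence and the identifier note1 are made up, and whether the reference ends up clickable depends on how the file is displayed later on:

```xml
<div>
  <!-- The reference in the running text points to the note by its xml:id -->
  <p>The griot<ref target="#note1">1</ref> opened the evening with a praise song.</p>
  <!-- The note itself carries the identifier that the reference points to -->
  <note xml:id="note1" place="foot">A griot is a West African storyteller,
    singer and oral historian.</note>
</div>
```

The same pattern should work for a glossary entry: give the entry an xml:id and point to it from the word in the text.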
Did I spark your interest? Here’s a link to an interesting and useful page that can help you better understand everything about TEI and XML:
Student: Carlotta Cilia – cacil100
Facility: Heinrich Heine Universität Düsseldorf
Course: “Demarginalizing Orature: Translating Minor Forms into the Digital Age”
Teaching and Supervising: Anne Schulzki, Michael Zane Brose, Emmanuel Tasun Tidorchibe, Prof. Dr. Eva Ulrike Pirker