Discovering the world of TEI & XML: My first university lesson!

Hey everyone! I just attended my first university lesson on TEI & XML and I must say, it’s not just about code and brackets. It’s like unveiling the secret language behind digital humanities! Let’s dive in.

So, what’s the difference between TEI & XML?

XML (Extensible Markup Language) is a language that defines rules for encoding documents in a format readable by both machines and humans. On the other hand, TEI (Text Encoding Initiative) is a standard for representing texts in digital form, which uses XML as its basis. Think of XML as the foundational framework, and TEI as a specific application of it!
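To see the "TEI is just XML underneath" idea in action, here is a tiny sketch of my own (not from the lecture). It assumes Python and its built-in xml.etree.ElementTree module; the sample document and its title are made up for illustration.

```python
import xml.etree.ElementTree as ET

# Hypothetical minimal TEI document, invented for this example.
sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader>
    <fileDesc>
      <titleStmt><title>A sample tale</title></titleStmt>
      <publicationStmt><p>Unpublished class exercise.</p></publicationStmt>
      <sourceDesc><p>Invented for this example.</p></sourceDesc>
    </fileDesc>
  </teiHeader>
  <text>
    <body><p>Once upon a time.</p></body>
  </text>
</TEI>"""

# A general-purpose XML parser reads the file without knowing anything about TEI.
root = ET.fromstring(sample)

# ElementTree writes namespaced tag names as "{namespace}localname".
print(root.tag)  # {http://www.tei-c.org/ns/1.0}TEI
```

The parser only checks that the file is well-formed XML. Whether it is valid TEI (the right elements in the right places) is a separate check against the TEI schema.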

Why is this important for our project?

With TEI & XML, we can structure, store, and analyze textual data in a standardized way, ensuring that our work remains interoperable and understandable in the world of digital humanities. So, if we’re collaborating or sharing our work, it’s going to be super handy.

Alright, what about metadata?

The header of a TEI file (<teiHeader>) contains the metadata. This is like the ID card of a document, giving detailed information about it, such as the title, author, publisher, and more. It helps people (and machines) understand the context and specifics of the text.

Deep dive into TEI structure: From my notes

  1. TEI File: It consists of two major parts:
    • Header (<teiHeader>): Contains the metadata (more on this soon).
    • Text (<text>): The main content of your electronic text, with the <body> element inside it.
  2. Nodes: Remember the family tree? An element that contains other elements is a parent node, and the elements inside it are its child nodes. It’s a hierarchy!
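The family-tree idea can be sketched in a few lines. This is my own illustration, assuming Python's built-in xml.etree.ElementTree; the helper names local_name and outline are made up, and the sample is a bare skeleton that is well-formed XML but too empty to pass as valid TEI.

```python
import xml.etree.ElementTree as ET

# Skeleton only: well-formed XML, but too bare to validate as TEI.
sample = """<TEI xmlns="http://www.tei-c.org/ns/1.0">
  <teiHeader><fileDesc/></teiHeader>
  <text><body><p>Hello</p></body></text>
</TEI>"""


def local_name(element):
    """Strip the '{namespace}' prefix ElementTree puts on tag names."""
    return element.tag.split("}", 1)[-1]


def outline(element, depth=0):
    """Return one indented line per node, each parent before its children."""
    lines = ["  " * depth + local_name(element)]
    for child in element:
        lines.extend(outline(child, depth + 1))
    return lines


tree_lines = outline(ET.fromstring(sample))
print("\n".join(tree_lines))
```

Each extra level of indentation in the printed outline is one generation further down the tree.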

Key Elements in the Header:

  1. <fileDesc> (File Description): The most important section, and the only one TEI strictly requires! It has:
    • <titleStmt>: Here we mention the title, author, editor, and others who contributed to the electronic text.
    • <respStmt>: This is where you give credit by encoding responsibilities. For example, who did what. It goes inside <titleStmt>.
    • <publicationStmt>: Information about the publisher, publication status, address, availability, and date.
    • <sourceDesc>: This is a tricky one. It describes, in bibliographic terms, the source our electronic text was derived from, NOT the circumstances in which the text was created. For our purposes, think title, storyteller, and translator.
  2. <encodingDesc>: Gives a low-down on the editorial practices, project description, and other technical details.
  3. <profileDesc>: A spotlight on non-bibliographic info. It tells about the creation of the text, the languages used, and the setting and participants behind it.
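Here is a small sketch for sanity-checking a header. It is my own, assuming Python's built-in xml.etree.ElementTree, and it looks for the three children the TEI Guidelines require inside <fileDesc>: <titleStmt>, <publicationStmt>, and <sourceDesc>. The title, the name, and the helper missing_parts are all hypothetical.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical header; title and encoder name are invented.
header = """<teiHeader xmlns="http://www.tei-c.org/ns/1.0">
  <fileDesc>
    <titleStmt>
      <title>The Fox and the Well</title>
      <respStmt>
        <resp>Encoded by</resp>
        <name>A. Student</name>
      </respStmt>
    </titleStmt>
    <publicationStmt>
      <p>Unpublished class exercise.</p>
    </publicationStmt>
    <sourceDesc>
      <p>Transcribed from a hypothetical printed collection.</p>
    </sourceDesc>
  </fileDesc>
</teiHeader>"""

REQUIRED = ["titleStmt", "publicationStmt", "sourceDesc"]


def missing_parts(header_xml):
    """List the required <fileDesc> children that are absent from the header."""
    root = ET.fromstring(header_xml)
    file_desc = root.find("tei:fileDesc", NS)
    if file_desc is None:
        return ["fileDesc"]
    return [name for name in REQUIRED
            if file_desc.find(f"tei:{name}", NS) is None]


print(missing_parts(header))  # []
title = ET.fromstring(header).find(".//tei:title", NS).text
print(title)  # The Fox and the Well
```

This only checks that the three elements are present. It is not a replacement for validating the file against the TEI schema, which also checks order and content.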

<body> Basics:

  1. <div>: Think of it as the boundaries for sections, chapters, etc.
  2. <head>: For titles and subtitles.
  3. <p>: Good old paragraphs.
  4. <q>: Spoken words or quotes. Ready for some drama!
  5. <l>: For those poetic souls, this represents lines in poems and songs.
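To see the five elements working together, here is one more sketch of my own, again assuming Python's built-in xml.etree.ElementTree. The story text is invented, and the type="chapter" attribute on <div> is just one plausible way to label a section.

```python
import xml.etree.ElementTree as ET

NS = {"tei": "http://www.tei-c.org/ns/1.0"}

# Hypothetical body fragment using only <div>, <head>, <p>, <q> and <l>.
body = """<body xmlns="http://www.tei-c.org/ns/1.0">
  <div type="chapter">
    <head>The Fox at the Well</head>
    <p>The fox looked down and said, <q>Come and join me!</q></p>
    <l>Down in the well the water is sweet,</l>
    <l>come down, come down, and rest your feet.</l>
  </div>
</body>"""

root = ET.fromstring(body)

# The heading of the first section.
heading = root.find("tei:div/tei:head", NS).text

# Every quoted passage, wherever it sits in the tree.
quotes = [q.text for q in root.iterfind(".//tei:q", NS)]

# Every verse line, in document order.
verse = [line.text for line in root.iterfind(".//tei:l", NS)]

print(heading)
print(quotes)
print(verse)
```

Because the quotes and verse lines are tagged, pulling them out takes a single query each. That is the kind of analysis the markup makes possible.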


Remember the sample code mistake I mentioned? Well, it’s always good to cross-check your code. A misspelled tag name or a missing closing tag is enough to break a file, so little nuances can make a huge difference in the digital realm!

That’s a wrap for today! Join our class next time as we decode more of the digital humanities universe. Until then, keep exploring and happy coding!
