Spring 2026 Syllabus (Schedule)
Classes meet M W F 1:25 - 2:15pm in Lilley Library 006.
The Syllabus contains a detailed explanation of course policies and the basis for
grades.
The Schedule link jumps to the day closest to today's date. Review the schedule as we get
started to get a sense of how this course will work on a daily basis.
All the Tools You Need As We Begin:
Download and install the following software on your own personal computer(s) on
or before the first day of class. These software tools are available in our
campus computing labs, too.
- <oXygen/>. (You will probably have this installed from DIGIT
100 or 110.) The DIGIT program has purchased a site license for this
software, which is installed in Burke 153, Kochel 77, the Lilley Library
computers, and Witkowski 109, as well as the computer labs in Hammermill.
The license also permits students enrolled in the course to install the
software on their home computers (for course-related use only). When
installing this on your own computers, you will need the license
key, which we have posted on our course Announcements section
of Canvas.
- AntConc:
(You may have this installed from DIGIT 100.) Free corpus text analysis
tool.
- We will ask you to install Python version 3.9 or higher on your computer,
and to install PyCharm Edu to assist in learning and writing Python code with
syntax checking. Follow the instructions and links in the PyCharm quick start guide (https://www.jetbrains.com/help/pycharm/quick-start-guide.html#meet),
paying attention to what you need for your own computer system. Feel free
to download and explore PyCharm Edu on your own before we start working with
it together: https://www.jetbrains.com/pycharm-edu/. Also, configure Anaconda so
it is available within PyCharm, following this guide: https://www.jetbrains.com/help/pycharm/conda-support-creating-conda-virtual-environment.html.
(We will provide guidance on this in class.)
- Zoom: Make sure your Zoom installation is up to date and
you are ready to connect. Sometimes we will record portions of class
meetings and tutorial sessions over Zoom to share for future reference. Look
for these in Canvas Announcements and use the Zoom menu option in Canvas to
access these meetings.
- We will use GitHub for sharing code and for project management. Create
an account (choose the free options) at https://github.com and install the GitHub client software for your
operating system on your own computer. (We will explain how
to use git and GitHub in our course.)
- We will use the Slack chat platform for discussion and for asking questions
(see https://slack.com/help/articles/218080037-Getting-started-for-new-members).
Download and install the Slack client, configuring your account to use
your Penn State email address (the official address, which looks like
xyz123@psu.edu, and not an alias based on your name that you may have set
up), so you can join our Slack workspace: DIGIT-coders. When you receive an
invitation to join this workspace, please accept it.
- Later in the semester we may ask you to install a local copy of the eXist-db
XML database, which you can download from https://exist-db.org/.
- Not much coding experience? Don’t worry! Past students in this course who
had never seen anything like markup or XML code have designed projects (like these) and even spoken about them at academic conferences! You
will learn to develop your own digital tools and to manage digital
projects as a team.
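Once Python is installed, a quick check like the following can confirm that your interpreter meets the 3.9 minimum named in the list above. This is only a minimal sketch, not an official course script: the function name `meets_minimum` and the constant `MINIMUM_VERSION` are hypothetical names chosen for illustration.

```python
import sys

# Assumption: the minimum comes from the list above ("Python version 3.9 or higher").
MINIMUM_VERSION = (3, 9)


def meets_minimum(version_info=sys.version_info, minimum=MINIMUM_VERSION):
    """Return True when the (major, minor) version is at least `minimum`."""
    return tuple(version_info[:2]) >= minimum


if __name__ == "__main__":
    # Build a readable version string such as "3.12.1" from the running interpreter.
    found = ".".join(str(part) for part in sys.version_info[:3])
    if meets_minimum():
        print(f"Python {found} meets the course minimum.")
    else:
        print(f"Python {found} is older than 3.9; please install a newer version.")
```

You could save this under any filename and run it from a terminal or from inside PyCharm. On some systems the command is `python3` rather than `python`.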
Class Web Resources:
| Week 1 |
Class topics
|
Do before class
|
M 01-12
|
- Welcome! Intro to the course and its themes of text analysis,
re-mediation, and visualization.
- Hands-on warm-up with Scalable Vector Graphics: SVG in oXygen
XML Editor.
- Genuary activities.
|
Respond to Dr. B’s Canvas announcement,
install/update oXygen XML Editor. |
W 01-14
|
Data to numbers to shapes with legible, human-readable SVG
code. |
- Install/update oXygen XML Editor if you have not done so
already.
- SVG Exercise 1: Orientation
- Join / reactivate the DIGIT-coders Slack
|
F 01-16
|
- Class protocols for handling code files: GitHub and
version-controlled file management. Making a branch on the
textAnalysis-Hub. Review pulling, adding, committing,
and pushing.
- Gentle XPath orientation / review: Pulling data from DIGIT 110
projects to plot in SVG
|
|
| Week 2 |
Class topics
|
Do before class
|
M 01-19
|
Martin Luther King Jr. Day: No
classes. |
... |
W 01-21
|
Git Branching and Pull Requests |
SVG + Git Branching Exercise |
F 01-23
|
Orientation: Programming your visual design: XSLT to SVG |
SVG Exercise 3 |
| Week 3 |
Class topics
|
Do before class
|
M 01-26
|
- Contemplating the flow of text to image via code.
- Improving designs / layout on websites: XSLT to SVG
- Preview Regular Expressions (Regex) unit
|
XSLT to SVG Orientation Exercise, with Git PR Practice |
W 01-28
|
- Structuring and regularizing data from documents with
markup.
- Introduce document analysis with Regular Expressions: the dot,
the backslash, numbers (\d), repetition indicators,
matching on lines, and autotagging. Greedy and non-greedy
matching.
- Preview Intro to Regular Expressions
- Choosing a license for
your project GitHub repo.
|
XSLT to SVG Orientation Exercise 2, with Git PR
Practice |
F 01-30
|
- Regular Expressions: Thinking (and writing) in markdown,
algorithmically. Regex resources: Character sets, symbols, capturing groups.
|
- XSLT to SVG Orientation Exercise 3, with Git PR
Practice
- Watch Regex Orientation Videos:
- Regex Orientation Exercise
|
| Week 4 |
Class topics
|
Do before class
|
M 02-02
|
Regex greedy and non-greedy matches. |
Regex Exercise 1 |
W 02-04
|
Debugging, simplifying, optimizing regex |
- Regex Exercise 2
- (By the end of the day): Git/GitHub Test Part
1: Record completion on Canvas as part of GitHub
Test
|
F 02-06
|
- Regex in XSLT:
xsl:analyze-string
- Semester project ideas
|
- Regex Exercise 3
- Git/GitHub Test Part 2: Record completion on
Canvas as part of GitHub Test
|
| Week 5 |
Class topics
|
Do before class
|
M 02-09
|
- XSLT / Regex review
- Validity for a project: what is a schema? What is schema
validation?
- Validation for Google Sheets
- How to write a Relax NG schema (review for some / intro for
others)
|
- Regex Exercise 4: applying xsl:analyze-string
- Git/GitHub Test Part 3: Record completion on
Canvas as part of GitHub Test
|
W 02-11
|
Creating shell aliases: shortcuts for your shell. Shell practice. Regex in the shell with grep. |
- Looking ahead: Building project text corpora: Resources and
approaches to
scraping
- What is Invisible XML (iXML) and how could it be useful?
Orientation to iXML via John Lumley's workbench
- Git/GitHub Test Part 4: Record completion on
Canvas as part of GitHub Test
|
F 02-13
|
- Copyright, proprietary ownership, legality issues
- Relax NG schemas for project management
- Project ideas
|
- Make your shell alias file.
- Git/GitHub Test Part 5: Record completion on
Canvas as part of GitHub Test
|
| Week 6 |
Class topics
|
Do before class
|
M 02-16
|
Invisible XML (iXML): crafting your own grammar |
- Installations for iXML and XProc: Calabash and CoffeePot
- Git/GitHub Test Part 6 (and last): Record completion on
Canvas as part of GitHub Test
|
W 02-18
|
Pattern matching algorithms and pipeline processes for digital text: regex patterns, iXML, XSLT / XProc |
- Installations for iXML and XProc: Markup Blitz and Morgana
- Read Norm Tovey-Walsh’s Invisible XML introductory tutorial and annotate with
Hypothes.is
|
F 02-20
|
- iXML grammars and applications. Debugging and coping with ambiguity in iXML.
- Introduce Regex Test
|
iXML Exercise 1 |
| Week 7 |
Class topics
|
Do before class
|
M 02-23
|
Eliminating ambiguity in iXML and making a pipeline with XProc. Project applications for iXML. |
iXML Exercise 2 |
W 02-25
|
XSLT transformations in XProc |
|
F 02-27
|
XProc pipelines, scripting outputs and inputs: examples |
|
| Week 8 |
Class topics
|
Do before class
|
M 03-02
|
XProc for projects / GitHub websites review |
XProc Exercise 2: Revised pipeline |
W 03-04
|
Project team work day |
Project GitHub web development |
F 03-06
|
... |
Project milestone |
Sun 03-08 – Sat 03-14
|
Spring Break
|
Enjoy this week! |
| Week 10 |
Class topics
|
Do before class
|
M 03-16
|
- iXML / XProc vs. Python in projects: Pipelines for text
processing, discussion of next steps
- Checking / troubleshooting PyCharm and Python installations
- Working through the PyCharm Edu tutorial together. Manipulating strings with
Python, and Pythonic data structures (lists, tuples,
dictionaries).
|
|
W 03-18
|
- Python tutorial Q/A: tinkering.
- Python at the command line vs. in the PyCharm IDE (or oXygen, VS
Code, etc.)
|
- PyCharm Edu tutorials: Complete through the Strings
unit (submit evidence of completion via screen
capture on Canvas).
|
F 03-20
|
Getting started with Natural Language Processing (NLP)
with Python: installations/imports: nltk, spaCy, gensim |
PyCharm Edu Community tutorials: Complete the tutorial
through the Condition expressions unit (submit evidence
of completion via screen capture on Canvas). |
| Week 11 |
Class topics
|
Do before class
|
M 03-23
|
- Word embeddings and the concept of cosine similarity: a
humanities perspective
- NLP and large language models, vs. customized, specialized
modeling.
|
PyCharm Edu Community tutorials: Get at least
partway through the Classes and Objects unit. |
W 03-25
|
Writing your own Python: Exploring LLM outputs to a prompt for similarity |
Finish the PyCharm Edu Intro to Python tutorials: Classes
and objects, Modules and packages, File input and output. Submit
evidence of completion via screen capture on Canvas. |
F 03-27
|
Python for Natural Language Processing: Introducing Beautiful Soup for web scraping |
Python and GitHub configuration (.gitignore file) |
| Week 12 |
Class topics
|
Do before class
|
M 03-30
|
lxml etree vs. Beautiful Soup. NLTK book and practice |
Python exercise: Web scraping with Beautiful Soup |
W 04-01
|
NLTK: lexical diversity, frequency distributions |
Python exercise: Exploring NLTK on project files |
F 04-03
|
WordNet and NLP on project files |
Python exercise: (NLTK and LLM output analysis) |
| Week 13 |
Class topics
|
Do before class
|
M 04-06
|
- Network analysis with text data
- When to work with XSLT vs. Python
- Pipelines for both
|
- Words to Network Data: TSV output (via XSLT or Python)
- Install Cytoscape
|
W 04-08
|
Cytoscape network analysis styling |
Cytoscape exercise: Create network from TSV |
F 04-10
|
Returning to SVG: make your own viz or use a library? |
Cytoscape exercise 2: Visualizing the network: edges, nodes, weights |
| Week 14 |
Class topics
|
Do before class
|
M 04-13
|
Python and XML together: Small Language Models with Model Context Protocol to read and refine project data |
SLM + MCP Readings + installation |
W 04-15
|
Python and XML together: Small Language Models as Agents with Model Context Protocol to read and refine project data |
SLM + MCP Configuration / Customization for projects |
F 04-17
|
LLMs and AI: Vector embeddings, statistical approximation, vs. ground truth |
Readings + Annotations on LLMs and AI |
| Week 15 |
Class topics
|
Do before class
|
M 04-20
|
Documenting processes: Mermaid flowcharts from markdown |
... |
W 04-22
|
- Data Visualizations in the Pipelines
- Python and XML handshake: Saxon C Library: XPath, XSLT,
XQuery in Python
- Customizing your SLM + MCP for projects
|
Mermaid chart exercise |
F 04-24
|
- Project documentation and reflection: What do you know? What is
not certain? Disclosing the limits and possibilities of text analysis / Using text analysis to reveal gaps and what isn't known.
Example: Blues Analysis project.
- Documenting your methods, software, tools
|
Project Milestone: Python, SLM + MCP |
| Week 16 |
Class topics
|
Do before class
|
M 04-27
|
Putting it all together: Discussion, analysis,
documentation, web work. Ethics in public-facing digital data
representation. |
Project development sprint, prep for DIGIT Works
presentation |
W 04-29
|
Team sprint in class |
Project development sprint, prep for DIGIT Works
presentation |
F 05-01
|
Last Day! Project Milestone: Teams deliver DIGIT
Works presentations |
Prep for presentations |
|
Finals Week: May 4 – 8
|
To Complete
|
W 05-06
|
Semester projects due by 11:59pm.
Finish developing projects, and send a post to me on GitHub and
Canvas to indicate your team is finished. |