Resource name | URL | Creator | Chronological and geographical scope | Scope of text types | Search and browsing capabilities | Interface type | File format and encoding | License | Linking possibilities | Metadata | Annotation | Size | Description contributor
ALIM - Archivio della Latinità Italiana del Medioevo | | Smiraglia (Union Académique Internationale); Gian Carlo Alessio (Università di Venezia); Antonio De Prisco (Università di Verona) | XI-XV centuries, Italy, Latin | Charters and literary texts, provided that they are available in a critical philological edition | By genre, author, period, work, word, phrase | Exclusively web service | Plain texts ready for XML encoding | Free consultation, licensed | | | | About 250 texts, many millions of characters; more than 300,000 visits up to 2012 | Francesco Stella
ANDEl – The Anglo-Norman Dictionary | http://www.anglo-norman.net | David Trotter (technical: Michael Beddow) | 1100-1500, British Isles (documents in French) | All text types | Search by headword; by graphical form (built-in concordancer: starting with, exact match, ending with); proximity of forms (x before/after y, within n words of y); regular expressions allowed; search by translation (= definition) | Web | TEI-compliant XML; delivered to the user as HTML (a variety of browsers and hardware platforms supported, e.g. tablet, netbook, PC) | Freely accessible, no limitations | Stable URLs | Some online already | | c. 30 MB of XML in total | David Trotter
BFM – Base de Français Médiéval | | ICAR UMR5191 ENS LSH / CNRS, École normale supérieure de Lyon: Céline Guillot, Serge Heiden, Alexei Lavrentiev, Christiane Marchello-Nizia, Sophie Prevost | 842-1467; entire “oïl” domain | As many genres as possible (literary and non-literary), in prose and in verse | Concordancer (on words and POS); co-occurrence search; contrastive statistics on subsets; syntactic search (via SRCMF) | TXM online GUI (ANR project Textométrie) | XML-TEI (a manual is available) | Various licenses, depending on the original publishers | Each text has a unique identifier; in some texts, words are identified by a unique URI | Extensive metadata | CATTEX POS annotation | 3.3 million tokens | Nicolas Mazziotta
CORPUS CORPORUM, repositorium operum Latinorum apud Universitatem Turicensem | www.mlat.uzh.ch | Philipp Roelli, Max Bänziger, Mittellateinisches Seminar, Universität Zürich | Platform: depends on the individual corpus | Depends on the individual corpus, mostly medieval Latin | Full-text search, complex searches, word lists, concordances; planned: lemmatised search, chronological search | Web interface | TEI XML fed into a MySQL database | Free | Needs to be discussed | | Currently simplified TEI: header, text structure, edition page numbers, typeface, occasionally text-critical notes | Recently exceeded 100 million words | Philipp Roelli
Corpus de testaments de Saint-Dié-des-Vosges | -- | David Trotter, Aberystwyth | End of the 13th century – 1450 | Documents (mainly wills) emanating (mostly) from members of the clergy in Saint-Dié, typically bequeathing their belongings to the cathedral | Plain-text search, though see annotation (could be converted to XML) | -- | At present MS Word, but with visible encoding for abbreviations etc., i.e. proto-XML | n/a | n/a | Various descriptive and analytical articles based on this corpus have been published | Identifies recipients, place names, persons, notaries, dates | 80 documents | David Trotter
Corpus Diacrónico del Español (CORDE) | | Academia Española | A historical corpus of 250 million records covering the history of Spanish from its origins to 1975 | All genres (literary and non-literary) written in Spanish | Search by word (exact or inflected); by author, work, chronological period, medium (book, journal, review, oral, etc.), geography (the whole Spanish-speaking domain), genre (lyric, narrative, encyclopedias, literature, mathematics, science, etc.) | Web service | All texts are encoded in SGML, following the recommendations of the TEI Guidelines | Free | Linking only between occurrence and text | Year, author, title, country, genre, publication | Morphological, syntactic and pragmatic annotation | 250 million forms (April 2005) | Susanna Allés
Corpus OVI dell'Italiano antico (Old Italian Database) | | Istituto Opera del Vocabolario Italiano (The Italian Dictionary) del Consiglio Nazionale delle Ricerche (Italian National Research Council); responsible persons: Pär Larson and Elena Artale | Texts from ca. 800 to ca. 1400; all Italo-Romance varieties written in medieval Italy | All texts of any genre available in modern editions | Search by word form, by lemma, by grammatical category; co-occurrence search (with punctuation) | Web | SQLite database; proprietary encoding (XML in the forthcoming version, under development) | Free access | Searches through the Corpus OVI can be run from any web interface (the search can be set up by word form or lemma, or restricted to a subset) | All samples (search results) are linked to a bibliographical record (philological information will be appended to these records in the forthcoming version) | Lemmatisation performed with the GATTO software (see the manual at http://www.ovi.cnr.it) | 2,315 texts (files), 23,178,540 occurrences | Pietro Beltrami
Corpus Rhythmorum Musicum | www.corimu.unisi.it | Francesco Stella, Università di Siena/Arezzo | Early medieval Latin poems set to music | Songs and lyrics | By text, manuscript transcriptions, musical transcriptions, apparatuses, linguistic features, language statistics, codicological information, bibliography, audio recordings; cross-search | A web service and also a CD-ROM version with manuscript reproductions, which can be purchased together with the printed volume | MySQL, very light XML encoding, rich database annotation | Free consultation, downloadable data, license by L. Tessarolo and F. Stella; CD-ROM and book sold by SISMEL | | | Multi-level, no TEI | | Francesco Stella
DÉCT – Dictionnaire Électronique de Chrétien de Troyes | | Kunstmann (Laboratoire de Français Ancien, University of Ottawa), together with Hiltrud Gerner [ATILF, CNRS Université de Lorraine], May Plouzeau [Université de Provence], Ineke Hardy [LFA, University of Ottawa], Gilles Souvay [ATILF, CNRS Université de Lorraine, IT] | 5 romances of Chrétien de Troyes (Érec, Cligès, Lancelot or the Chevalier à la Charrette, Yvain or the Chevalier au Lion, Perceval or the Conte du Graal); April 2013: letters A-Q; semi-diplomatic transcription of a single manuscript: BN fr. 794 [champ., ca. 1235] | See above | Definition of the corpus; text search and dictionary search; search by lemma, form, two-word collocates, locutions, proverbs, etyma, etc.; reading of entire texts; links to bitmap images of the manuscript | Web; dictionary articles A-K and texts downloadable | Texts: XML-TEI; lexicon: XML | Open access | No | (Lexicon bibliography in progress) | Lemmatised (based on the Tobler-Lommatzsch lemmata); so far only qualifying adjectives, adverbs ending in -ment, nouns and verbs; an upgraded version of DÉCT is planned | 5 texts, 220,262 words; 3,394 entries A-Q, to be completed (R-Z) in June 2013 | Gilles Souvay, Sabine Tittel
Diccionario del castellano del siglo XV en la Corona de Aragón | | Lleal Galcerán (dir.) / Servei de Tecnologia Lingüística – Universitat de Barcelona | DiCCA-XV is a historical dictionary of the Spanish used in the Crown of Aragon in the 15th century | The corpus, consisting of original works and translations, contains both literary and non-literary texts | Search by lemma (by browsing the list on the left) and by “Onomastics” (same method, listing all forms on the left); numerous lists (word list by source, word list of 15th-century neologisms, word list by function, etc.); browsing lemmas through their collocations in the corpus | Web service | Database implemented with FileMaker; further information not available | Free | Few linking possibilities, and only within the DiCCA-XV dictionary | Information not available | Information not available | 1,422,000 occurrences | Susanna Allés
Dictionnaire étymologique de l’ancien français (DEAF) | www.deaf-page.de | Heidelberg Academy of Sciences and Humanities; founded 1968 by Kurt Baldinger; Frankwalt Möhren (1971; director 1984-07/2007); director since 08/2007: Thomas Städtler | Primary texts from 842 to ca. 1350 (the dating of senses traces their lifespan up to Modern French); Old French, domaine d’oïl | Primary literature (all text types), secondary literature, dictionaries | DEAFpré: search by lemma and graphical variant; DEAFplus: currently only search by lemma; a concept for complex search functions exists (ca. 20 functions for DEAFpré and DEAFplus); publication of DEAFplus: currently only bitmap images of G-K; letters F-K will be published in digital form by De Gruyter (publication delayed; originally planned by De Gruyter for spring 2013; the publication will also include DEAFpré) | Web | MySQL database with integrated XML-encoded data; export to XML, XHTML, LaTeX | DEAFpré: open access; DEAFplus: subscription (includes DEAFpré) [DEAFplus bitmap images currently open access] | DEAFpré: currently links to the article ID; DEAFplus: to be defined by De Gruyter, currently links to the bitmap version | Sigla linked with the bibliography (with information on title, author, date, scriptae, manuscripts, manuscript dates, manuscript scriptae, editor/publisher, concordances with 13 other dictionaries/bibliographies) | Lemmatisation | DEAFpré: ca. 18,000 articles (as of 2012); DEAFplus: 10,509 (as of 2012) | Sabine Tittel
DMF – Dictionnaire du Moyen Français | | Martin, ATILF CNRS | 1330-1500 (Middle French) | Literature, non-fictional texts, documents (via dictionaries) | Dictionary: form, lemma, etymon, word within the dictionary structure (definition, syntagm, example…), combinations of elements…; texts: form, lemma | Web | Dictionary: XML; texts: XML/TEI | Free access | Source of examples | Texts: author, title, date, localisation, DEAF ID (concordance with DEAFBiblEl possible for every text) | | Dictionary, 2012 edition: 62,371 entries, 455,969 examples, 185 MB; textual database: 229 texts, 6,121,994 words; morphological lexicon: 850,000 entries (form, lemma, part of speech) | Gilles Souvay
DocLing – Les plus anciens documents linguistiques de la France | | Gleßgen [in collaboration with Frédéric Duval and Paul Videsott, formerly with Françoise Vielliard and Olivier Guyotjeannin; founded by Jacques Monfrin] | 1204-1335, Northern France | Legal documents | Display of text without XML markup; upload/display of image(s) associated with a text; search for occurrences (with regular expressions) by type (word surface) or by lemma; annotation of occurrence(s): lemma, morphology, grapheme; export of occurrences (plus context and some metadata) as CSV or XLS | Web-based (XHTML/PHP/JavaScript) | Idiosyncratic XML with three subtypes/XSDs, UTF-8; relational database (MySQL), UTF-8 | Open source | SOAP service with two main functions: 1. getOccurrences(Lemma) => OccurrenceCollection; 2. getOccurrenceDetails(OccurrenceID) => OccurrenceDetails; a test server can be reached via SOAP/WSDL, and documentation is available (WSDL, data types) | Title; date; location; social position of the editor; genre; regest; author (diplomatic sense); disposant (diplomatic sense); seal; beneficiary; actors; editor; scribe; form (material); place of conservation; edition; analysis; observations on the writing; language; text on the back of the charter (verso) | Token type (punctuation, number, word); lemma; morphology (part of speech [Cattex09], gender, number, tense, aspect, voice, person, degree, case); grapheme | 2,185 deeds in 14 subcorpora | Samuel Läubli
Du Cange (et al.), Glossarium mediæ et infimæ latinitatis | http://ducange.enc.sorbonne.fr | École nationale des chartes; ANR Omnia; development: Frédéric Glorieux et al. | Medieval Europe in the broad sense (from Late Antiquity to the 18th century) | All text types | Search of headwords and sub-headwords; full-text queries or queries restricted to quotations; all with or without fuzzy search | Web service | Source files: XML (TEI), UTF-8; web: SQLite | CC BY-NC; LGPL 3 | Linking possible (at least) to headwords | | Annotation model: TEI; typographical and lexicographical encoding; most important annotation: languages of the quotations and author of the interventions | 90,000+ articles; source: 24 XML files, 80 MB in total; SQLite: 224 MB | Renaud Alexandre
Frantext | | ATILF (Analyse et Traitement Informatique de la Langue Française) and Université de Lorraine | 1180-2012; divided into several parts: Frantext Intégral (whole texts), Frantext catégorisé (tagged), Frantext Moyen Français (Middle French), Frantext AFNOR (technical), Frantext CTLF (linguistic texts in German, English, Spanish, French, Italian and Latin) | All | Search for words (exact match; modern inflection; regular expressions; medieval or 17th-century inflection and variants); search for collocations, frequencies, distribution, a word's neighbourhood; advanced search using ‚expression de séquence‘, a complex query language based on syntactic patterns and grammar rules [advanced search capabilities are not intuitive; defining the search corpus is difficult] | Web | XML/TEI | Subscription for Intégral and catégorisé; open access for démonstration, AFNOR, Middle French and CTLF; bibliography open access | No, due to subscription and publishers' rights | Genre (bibliography with little information; poor search capabilities; a new version is in progress); concordance with DEAFBiblEl for Middle French texts | Lemma and part of speech for Frantext catégorisé | Intégral: 4,248 texts, 262,000,000 words; Catégorisé: 1,940 texts, 127,500,000 words; CTLF: 234 texts, 8,360,000 words; Moyen Français: 218 texts, 6,790,794 words | Gilles Souvay, Sabine Tittel
MCVF – Modéliser le changement: les voies du français / Modelling Change: the Paths of French | http://www.voies.uottawa.ca | France Martineau (dir.), University of Ottawa | Old French, Middle French, 16th-century French, Classical French (17th-18th century) | All genres | PhiloLogic; the texts are not accessible in full, but users can search them and obtain results as a concordance or a tree structure (depending on the search engine) for different query types | Web | XML-TEI | Free access after registration | | Title, author, date, form, domain, genre, region, manuscripts, date of manuscripts, manuscript origin, bibliographical data (no sigla, no concordance with DEAFBiblEl for the Old French texts) | Morphological and syntactic annotation, but the search tool does not give access to the morphological or syntactic tags | 24 texts up to 1350 (of a total of 270 texts); number of words unknown | Sabine Tittel
NCA – Nouveau Corpus d'Amsterdam | | Dees (Corpus d'Amsterdam); enriched by Achim Stein, Pierre Kunstmann and Martin-Dietrich Gleßgen; currently maintained by Achim Stein, Stuttgart | Entire “oïl” domain, 11th-14th c. | Mainly literary; some manuscripts | Concordancer, TigerSearch | TWIC (web GUI), TigerSearch | Idiosyncratic XML | Special license | XML structure (to do) | Extensive bibliographic reference; see Gleßgen, Martin-Dietrich & Vachon, Claire (2010): Répertoire bibliographique du Nouveau Corpus d'Amsterdam, établi par Anthonij Dees et Piet Van Reenen (Amsterdam 1987), revu et élargi par M.-D.G. et C.V., 3rd ed., Stuttgart: Institut für Linguistik/Romanistik | Dees POS annotation and TreeTagger automatic annotation | 140 texts (in 301 manuscripts), > 3 million words | Nicolas Mazziotta
TFA – Textes de Français Ancien | | Kunstmann, Laboratoire de français ancien (Ottawa); data hosted by ARTFL – American and French Research on the Treasury of the French Language (Ottawa / Chicago), Mark Olsen | 12th-15th century (see bibliography); no geographical limitation | Romances, chansons de geste, etc. (especially the works of Chrétien de Troyes [5 romances ≈ DÉCT] and the cycle de Guillaume); mostly based on existing text editions plus manuscript transcriptions (by P. Kunstmann: BN fr. 794 → DÉCT) | Search and reporting software PhiloLogic; results are shown as a KWIC report or concordance report; links lead to full texts; word counts for texts (results sortable by word or frequency); the search corpus can be defined; wildcards and Boolean operators; [gross index of each TFA text; few lemmatised indexes; for some texts, notices (explanations and commentaries) can be read: the information is given, but hard to find] | Web | PhiloLogic uses a combination of the (unqualified) Dublin Core 1.0 metadata specification with very basic HTML (SGML tagging that is not specifically described is ignored): „In contrast to the TEI and other specifications, which work from the top-down to provide an infrastructure for all possible encodings in a system independent fashion, ATE is system dependent and built from the bottom-up. We specify only tagging that we actually have in production or that we are planning to treat in PhiloLogic. Extensions to this basic specification will be, as always, completely optional and will use existing schemes, preferably TEI specifications, whenever possible.“ | DB (works under copyright are marked with an asterisk in the bibliography; for these, words are shown in a more limited context) | Example of a link to an attestation: (problem of subscription) | Title of text, creator (= medieval author), date of text, publisher, genre, type (verse/prose), identifier (= siglum [no concordance with DEAFBiblEl]) | Lemmatised, according to the information given, but search results do not include inflected forms | 3 million words | Sabine Tittel
TLIO - Tesoro della Lingua Italiana delle Origini (Historical Dictionary of Old Italian) | | Opera del Vocabolario Italiano (The Italian Dictionary) del Consiglio Nazionale delle Ricerche (Italian National Research Council); responsible persons: Pietro Beltrami, Pär Larson, Paolo Squillacioti; responsible persons for the software: Domenico Iorio Fili, Andrea Boccellari | Texts from ca. 800 to ca. 1400; all Italo-Romance varieties written in medieval Italy | All texts of any genre available in modern editions | Search of entries, word forms, text in definitions | Web | MySQL database; HTML file format | Free access | TLIO entries can be linked from any web interface | All samples (search results) are linked to a bibliographical record (philological information will be appended to these records in the forthcoming version) | | In progress: 25,889 entries | Pietro Beltrami
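
Several of the resources above present search results as a concordance or KWIC (keyword-in-context) report. These include BFM, NCA, Corpus Corporum, MCVF and TFA. Some resources also accept regular expressions in queries, namely ANDEl, DocLing and Frantext. The following is a minimal Python sketch of what such a report computes. It assumes a corpus that has already been split into whitespace-separated tokens. The function name kwic and the toy sample are illustrative only. This is not the implementation used by PhiloLogic, TXM or any other tool listed here.

```python
import re
from typing import List, Tuple


def kwic(tokens: List[str], pattern: str, width: int = 3) -> List[Tuple[str, str, str]]:
    """Return (left context, keyword, right context) for every token that
    fully matches the regular expression `pattern` (case-insensitive)."""
    regex = re.compile(pattern, re.IGNORECASE)
    hits = []
    for i, token in enumerate(tokens):
        if regex.fullmatch(token):
            # Up to `width` tokens on each side, clipped at the text boundaries.
            left = " ".join(tokens[max(0, i - width):i])
            right = " ".join(tokens[i + 1:i + 1 + width])
            hits.append((left, token, right))
    return hits


if __name__ == "__main__":
    # Toy sample only; a real corpus would be tokenised from its XML-TEI source.
    sample = "li chevaliers vint au chastel et li rois le salua".split()
    for left, keyword, right in kwic(sample, r"li|le", width=2):
        # Right-align the left context so that keywords line up in one column.
        print(f"{left:>20} [{keyword}] {right}")
```

Matching whole tokens with re.fullmatch corresponds to an "exact match" query. A pattern such as r"chevali.*" approximates the "starting with" search that ANDEl describes.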