In a tagged text, every word is marked with grammatical information (e.g. tense, voice and mood of verbs; case, gender and number of nouns). Usually the lemma (dictionary form) of each word is also included, and sometimes its meaning.
Greek words change their spelling (morphology) based on the function of the word in a sentence. Thus if a noun is the direct object (accusative case) it will be spelled differently than if it is the subject (nominative). English does the same thing to a lesser extent, particularly with pronouns (I, me, mine, my) and verbs (go, going, went). Thus if you want to find a word that is a direct object (accusative) in a Greek sentence, you cannot search for the dictionary form of the word in an untagged Bible text such as those in the Online Bible.
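The difference between a tagged and an untagged search can be sketched in a few lines of Python. The token layout (form, lemma, pos, case) and the small transliterated sample are assumptions made for illustration only; they do not reflect the internal format of any program discussed here.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Token:
    ref: str    # verse reference
    form: str   # inflected spelling as it appears in the text
    lemma: str  # dictionary form
    pos: str    # part of speech
    case: str   # grammatical case

# Hypothetical hand-tagged sample (transliterated).
TEXT = [
    Token("Jn 1:1", "logos", "logos", "noun", "nominative"),
    Token("Jn 1:1", "theon", "theos", "noun", "accusative"),
    Token("Jn 1:1", "theos", "theos", "noun", "nominative"),
    Token("Mk 4:33", "logon", "logos", "noun", "accusative"),
]

def find(tokens, lemma, case):
    """Tagged search: match on lemma and case, whatever the spelling."""
    return [t for t in tokens if t.lemma == lemma and t.case == case]

def find_untagged(tokens, spelling):
    """Untagged search: match only the exact inflected spelling."""
    return [t for t in tokens if t.form == spelling]

accusative_logos = find(TEXT, "logos", "accusative")
plain_logos = find_untagged(TEXT, "logos")
```

The tagged search finds the accusative form logon in Mk 4:33, while the plain string search for the dictionary form finds only the nominative in Jn 1:1.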
Many Hebrew words are composed of several syntactical units (morphemes), such as a conjunction, preposition, article and noun. For example, be-re-shith in Gen 1:1 consists of a preposition and noun and wa-ha-eretz in Gen 1:2 consists of a conjunction, article and noun. It is essential to be able to search each component part separately to do accurate searches in Hebrew.
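As a rough illustration, a text that stores each Hebrew word as a list of tagged morphemes can be searched by component. The data layout below is an assumption for the sake of the example, using the two words just mentioned.

```python
# Each verse holds a list of words; each word is a list of
# (morpheme, part_of_speech) pairs. The layout is hypothetical.
WORDS = {
    "Gen 1:1": [[("be", "preposition"), ("reshith", "noun")]],
    "Gen 1:2": [[("wa", "conjunction"), ("ha", "article"), ("eretz", "noun")]],
}

def verses_with(pos):
    """Return verses containing at least one morpheme tagged `pos`."""
    return [
        ref
        for ref, words in WORDS.items()
        if any(tag == pos for word in words for _, tag in word)
    ]

prepositions = verses_with("preposition")
nouns = verses_with("noun")
```

If the words were stored as undivided units, a search for prepositions could not find Gen 1:1 at all.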
It is important that Hebrew morphemes be parsed and searchable separately. For example, be-re-shith ("in the beginning") should be searchable as a preposition ("in") as well as a noun ("beginning"). Programs vary in their treatment of this tricky issue:
The grammatical tagging systems used for Greek texts differ significantly between programs. The selection of grammatical tags is based on subtle and often unstated assumptions. Although the function of a Greek word is indicated largely by its spelling (its "morphology"), at times the function of a word must be determined by its relation to the context. There is always a tension between purely morphological analysis based on word forms and a more functional analysis based on the interaction of a word with other words in the sentence.
Grammatical tagging schemes range along a spectrum from formal (morphological) to functional classifications. No scheme for classifying Greek words is purely formal or purely functional, since the function of a word is determined both by its morphology and its relation to the context. However, the more a tagging system tends toward the functional end of the spectrum, the more subjective the classifications become.
For example, Bauer's lexicon classifies óu as an adverb of place. However, in 5 instances (Mt 18:20; Rom 4:15; 5:20; 1 Cor 16:6; 2 Cor 3:17), the Friberg text, which is a largely functional system, classifies it as a conjunction, based on the nuances of the word in the context. A user would need to be aware of such functional classifications in order to find every occurrence of a particular part of speech.
There are four major morphologically-tagged Greek NT texts used in Bible-search software:
There are two versions of the grammatically tagged Septuagint text available:
These variations are due to several major factors:
The following discussion focuses on the Greek New Testament, but the principles are applicable to searching the Hebrew Bible and Septuagint.
As has been shown, there is considerable variation in the tagging schemes used in Greek New Testament texts. The Friberg texts use a more functional classification method than other texts. Even the Friberg 2 text still has many functional and unusual classifications. The Gramcord and CCAT texts use largely formal classifications.
Unfortunately, except for Gramcord, the manuals for popular Bible-search programs rarely discuss the assumptions used in the classification of words. Yet it is essential that researchers understand the nature of the underlying machine-readable biblical text if their analysis of the text is to be meaningful.
The print edition of the Friberg 1 text has an appendix outlining the criteria used for the tags (Barbara and Timothy Friberg, eds., Analytical Greek New Testament, Grand Rapids: Baker, 1981). Unfortunately there is no similar book explaining the classification philosophy of the revised Friberg text. In many instances TheWord deviates from the Friberg 1 tags, without documenting the differences. No program makes use of more than one of the Friberg multiple classifications of ambiguous words and no program documents the selection criteria.
Although users assume the accuracy of Bible-search tools, the underlying texts are rarely completely free from error. When the databases are created, the classifications of lemmas (dictionary forms) and grammatical forms are often performed initially by an automatic parsing program. Sometimes the human proofreaders may fail to catch errors.
Most errors fall into three classes:
Many tagged texts have some functional or unusual classifications of words which can produce unexpected search results.
In Gramcord, many foreign words such as hosanna are classified as interjections. However, foreign proper nouns are classified as nouns and parsed by function in context. By contrast, Bible Windows and Bible Works classify hosanna as a particle.
Conjunctions and particles are particularly difficult words to classify. A beginning user might miss many occurrences of kai if he only searches for the word as a conjunction. Since kai also functions as an adverb in some cases, most programs will sometimes classify it as an adverb. However, as the following chart shows, the classification choices in individual instances vary considerably:
Program: Conjunction: Adverb: No subclass: Copulative: Correlative:
GRAMCORD 8049 words 187 words 656 words
BWorks 8214 words 801 words 4727 verses 733 verses
Bwin 750 words (?) 750 words (?)
TheWord 5126 verses 753 verses
Bible Windows was unable to report the total number of occurrences of kai, because it only allows 750 matches in a search. Since it is hard to predict how a program will classify the word in any given passage, the safest approach is to search for all possible classifications and manually eliminate invalid matches. The Gramcord manual documents how many times each word is classified as a conjunction, particle or adverb, which makes it easier to define searches that will find all occurrences of such words.
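The "search every classification" approach amounts to taking the union of the results for each possible tag. A minimal sketch, with invented references and tags that are not taken from any of the programs above:

```python
from collections import Counter

# Hypothetical tagged occurrences of kai: (reference, part_of_speech).
KAI = [
    ("ref 1", "conjunction"),
    ("ref 2", "adverb"),
    ("ref 3", "conjunction"),
    ("ref 4", "particle"),
]

def search(occurrences, classes):
    """Return references whose tag is one of the requested classes."""
    return [ref for ref, pos in occurrences if pos in classes]

only_conjunctions = search(KAI, {"conjunction"})
all_classes = search(KAI, {"conjunction", "adverb", "particle"})

# A per-class tally, similar in spirit to what the Gramcord manual documents.
by_class = Counter(pos for _, pos in KAI)
```

Searching for conjunctions alone finds two of the four occurrences in this sample; the remainder surface only when the other classes are included.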
Since the Friberg text (Bible Works and TheWord) attempts to classify many words by function based on discourse analysis, some classifications may be surprising to users. Friberg 1 uses the category of "substantive adjective" to refer to adjectives which are used as nouns in context. For example, agathos ("good") is classified as a substantive adjective in Mt 5:45 ("he makes the sun shine on the evil and the good"). This type of classification affects 4131 occurrences of 1068 words in 3009 verses! While adjectives can certainly function as substantives, the term "substantive adjective" is not a part of speech used by most Greek grammars. It would be easy for a user to accidentally miss many important occurrences of adjectives unless he searches both for "adjectives" and "substantive adjectives". The Friberg 2 text eliminates the substantive adjective classification, but it introduces other surprising functional classifications. For example, in most cases Friberg 2 classifies relative pronouns as adjectives, with an adjective subtype of "relative." It introduces a category of participial imperative (168 occurrences of 120 words in 135 verses) and (7813 occurrences of 1726 words in 4792 verses).
Functional classifications such as those frequently used in Friberg's text are more subjective than formal classifications. Their value depends largely on the accuracy of the classifier's interpretation of the text. While they appear to be objective raw data, in fact they contain the prior conclusions of another researcher, which tends to skew the search results to fit the classifier's own viewpoint.
Even the strictest formal classification method must classify certain words by function in context, since the morphology of these words is inconclusive. While in most cases the meaning is clear in the context, in some instances the grammatical classification is subject to scholarly debate. For example, the gender of ponerou could be either neuter or masculine. In Mt 6:13 the meaning is debated: Does the Lord's Prayer ask for deliverance from "evil" (neuter) or "the evil one" (masculine)? Since Bible Windows 2, Gramcord and Accordance classify ponerou in Mt 6:13 as neuter, a search for masculine adjectives will not find the verse. By contrast, TheWord and Bible Works classify the word as masculine and do not allow the word to be found in a search for neuter adjectives! Only Bible Windows 3 acknowledges both possible parsings and allows the word to be found with either search.
Bible-search programs would be more useful if they marked such words as ambiguous and allowed searching on the multiple classifications. The print version of the Friberg text includes multiple classifications in many instances. However, at this time only Bible Windows 3 allows searching on Friberg's multiple classifications. Although Bible Works and TheWord both remove the multiple parsings in Friberg 1, the documentation does not explain the criteria used to make these choices.
Gramcord makes a good attempt at handling ambiguous classifications. In many cases, it tags words in multiple ways and flags the ambiguous classification in the resulting concordance. The documentation lists all ambiguous classifications which are used. However, even Gramcord could be improved in this area. For example, it does not include the ambiguous classification of ponerou in Mt 6:13.
Wildcards are symbols that indicate that any letter or letters will be accepted at a certain point in a word. Thus a search for "apo*" will find apoluw, apodidwmi and other words which begin with "apo".
Some programs place a limit on the number of words that can match wildcards. TheWord allows a maximum of 300 words to match a wildcard and may not warn if this limit is exceeded. Logos 1.6 allows only 32 words to match a wildcard. Logos 2 has no limit and even lets you choose multiple words from a pick list matching the wildcards.
Most programs (e.g. TheWord, Gramcord, Bible Works, Logos) assume that the search expression includes the whole word, unless wildcards are explicitly included. However, Bible Windows uses full word searches for grammatical searches and double wildcard searches for word and phrase searches. (In a double wildcard search, the search letters can be found anywhere within a word.) This inconsistent behavior in Bible Windows can easily confuse users and result in erroneous searches.
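The difference between whole-word, prefix and double wildcard matching can be illustrated with Python's standard fnmatch module. The word list is a small sample chosen for the example, and the pattern syntax is that of fnmatch, not of any Bible-search program.

```python
from fnmatch import fnmatchcase

LEMMAS = ["apoluw", "apodidwmi", "luw", "didwmi", "kataluw"]

def wildcard(pattern, words):
    """Shell-style matching: * stands for any run of letters."""
    return [w for w in words if fnmatchcase(w, pattern)]

whole_word = wildcard("luw", LEMMAS)     # no wildcard: exact word only
prefix = wildcard("apo*", LEMMAS)        # words beginning with "apo"
anywhere = wildcard("*luw*", LEMMAS)     # double wildcard: letters anywhere
```

The same three letters return one word as a whole-word search and three words as a double wildcard search, which is why a program that silently switches between the two modes is easy to misread.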
Programs also differ in how they interpret grammatical wildcards. For example, Gramcord finds participles and infinitives in a wildcard search for verbs. In Logos 2, participles and infinitives are not found in a wildcard search for verbs. You must explicitly set up a different set of wildcards for infinitives, participles and finite verbs. Both programs use the Gramcord Greek NT database, but make different search assumptions.
Bible Windows has an undocumented limit of 750 matches per search. Since there is no error message that warns that the maximum number of matches has been exceeded, this can lead to misleading conclusions.
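A capped search need not be misleading if it reports that the cap was reached. The sketch below is a hypothetical design, not a description of how any program works internally; the 750 default simply mirrors the limit mentioned above.

```python
def capped_search(items, predicate, limit=750):
    """Return (matches, truncated).

    `truncated` is True when at least one further match existed beyond
    the cap, so the caller can warn the user instead of failing silently.
    """
    matches = []
    for item in items:
        if predicate(item):
            if len(matches) == limit:
                return matches, True
            matches.append(item)
    return matches, False

# 1000 even numbers exist below 2000, but only 750 are returned.
hits, truncated = capped_search(range(2000), lambda n: n % 2 == 0)
```

A count of exactly 750 with no warning is indistinguishable from a genuine total of 750, which is the problem with a silent limit.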
TheWord reports matches in terms of the number of verses which contain the desired construction. Bible Windows and Gramcord report the number of occurrences of the desired construction. Bible Works and Logos report a count of occurrences and verses.
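The two kinds of statistics can differ substantially for the same result set. A small sketch with invented match data:

```python
# Hypothetical matches, one entry per occurrence, keyed by verse reference.
MATCHES = ["Mt 5:3", "Mt 5:3", "Mt 5:4", "Lk 6:20", "Lk 6:20", "Lk 6:20"]

occurrences = len(MATCHES)     # what an occurrence-based report shows
verses = len(set(MATCHES))     # what a verse-based report shows
```

Here the same search would be reported as 6 matches by one program and 3 by another, so counts from different programs should not be compared without checking which unit is being counted.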
For some searches, word order is very important. For example, a search for substantival adjectives should find all occurrences of an article in agreement with an adjective only when the article appears just prior to the adjective, not after the adjective. In other cases, it is important to find all permutations of word order. For example, a search for genitive absolutes should allow either the genitive noun or the participle to appear first.
Programs differ in the importance they place on word order in search expressions. Gramcord requires an exact match of the order of the search elements. However, searches can be defined that include several combinations of word order and distinguish them in the resulting concordance. Bible Works and TheWord do not distinguish the word order of search elements. This can result in many false matches. For example, a search for "men . . . de" finds 10 verses in which the order of words is "de . . . men". Bible Windows is sensitive to word order in grammatical searches but not in word searches, which produces inconsistent search results. By default Logos 2 is not sensitive to word order. However, search expressions can require that certain elements precede or follow others.
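The effect of ignoring word order can be sketched as follows. The two "verses" are schematic token lists (x and y stand for arbitrary words), not quotations.

```python
def cooccur(words, first, second):
    """Order-insensitive: both terms appear somewhere in the verse."""
    return first in words and second in words

def in_order(words, first, second):
    """Order-sensitive: some `second` follows some `first`."""
    if first not in words:
        return False
    return second in words[words.index(first) + 1:]

verse_a = ["x", "men", "y", "de"]   # men ... de
verse_b = ["de", "x", "men", "y"]   # de ... men (a false match)

loose = [cooccur(v, "men", "de") for v in (verse_a, verse_b)]
strict = [in_order(v, "men", "de") for v in (verse_a, verse_b)]
```

The order-insensitive test accepts both verses, while the order-sensitive test rejects the second.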
Many grammatical constructions require that the same search term appear more than once (e.g. "de . . . de"). Gramcord, Bible Windows and TheWord allow the same term to appear more than once. They properly find verses in which the word de occurs twice. However, Bible Works simply finds all verses in which the word de occurs at least once. Logos 2 also finds all verses in which de occurs at least once. Although a search expression can require that a certain word precede another, Logos 2.0a does not accurately execute such a search if the search terms are identical.
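One plausible explanation for this behavior is that repeated terms are collapsed into a set before matching; this is a guess, not something the documentation states. The sketch below contrasts a set-style check with a count-aware check on schematic token lists.

```python
from collections import Counter

def contains_all(words, terms):
    """Set-style check: each distinct term appears at least once."""
    return all(t in words for t in terms)

def contains_each_occurrence(words, terms):
    """Count-aware check: a term listed twice must occur at least twice."""
    need = Counter(terms)
    have = Counter(words)
    return all(have[t] >= n for t, n in need.items())

once = ["x", "de", "y"]
twice = ["de", "x", "de", "y"]

naive = [contains_all(v, ["de", "de"]) for v in (once, twice)]
counted = [contains_each_occurrence(v, ["de", "de"]) for v in (once, twice)]
```

The set-style check accepts the verse with a single de, which is the false match described above.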
False matches can frequently be eliminated by specifying terms that must not appear between search elements. For example, a future perfect periphrastic construction requires a future tense of eimi and the perfect participle of another verb in the same clause. Since it is highly unlikely that a finite verb will occur between these two search terms, the search can be improved if it specifies that no finite verb can intervene.
Gramcord and Accordance allow multiple intervening exclusion and inclusion terms. They can specify words and parts of speech that may occur between search terms as well as words and parts of speech that may not occur between search terms. Other programs do not include a true "exclude intervening term" option. At first glance it would appear that the "and not" Boolean operator which is available in Bible Windows, Logos 2 and TheWord would accomplish the same thing. However, the "and not" operator defines what a search term may not be, not the types of words that cannot appear between search terms. Thus this feature can produce undesired interactions between the search terms.
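The distinction between "what may not intervene" and "what a search term may not be" can be sketched for the future perfect periphrastic. The token fields and the treatment of "participle" as a mood value are assumptions of this example; the first clause models estai dedemenon (Mt 16:19), and the second is a schematic false match.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Tok:
    lemma: str
    pos: str
    mood: str = ""
    tense: str = ""

def is_finite(t):
    return t.pos == "verb" and t.mood not in ("participle", "infinitive")

def periphrastic(clause, exclude_intervening_finite=True):
    """Find (i, j): future of eimi at i, perfect participle at j > i."""
    hits = []
    for i, a in enumerate(clause):
        if not (a.lemma == "eimi" and a.tense == "future"):
            continue
        for j in range(i + 1, len(clause)):
            b = clause[j]
            if b.mood == "participle" and b.tense == "perfect":
                # Only the words strictly between i and j are tested,
                # so eimi itself (a finite verb) never blocks the match.
                between = clause[i + 1:j]
                if exclude_intervening_finite and any(map(is_finite, between)):
                    continue
                hits.append((i, j))
    return hits

valid = [Tok("eimi", "verb", "indicative", "future"),
         Tok("dew", "verb", "participle", "perfect")]
invalid = [Tok("eimi", "verb", "indicative", "future"),
           Tok("legw", "verb", "indicative", "present"),
           Tok("dew", "verb", "participle", "perfect")]
```

Because the exclusion applies only to the words between the two search terms, the finite form of eimi does not rule out its own clause, which is what goes wrong when an "and not" operator is used as a substitute.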
The following chart illustrates the effect of excluding intervening terms in a search for future perfect periphrastics:
Source:                                Matches:  Invalid:  Missing:
Nigel Turner, Syntax, p. 89[1]         6
GRAMCORD
  Not exclude intervening verbs        12        6         0
  Exclude intervening finite verbs     8         2         0
BWin
  Not exclude intervening verbs        9[2]      3         0
  Use "and not" indicative verbs       0         0         6
Bible Windows has no true exclusion command. When the second search term is set to "and not an indicative verb", there are no matches, because any verse with a future eimi also has a finite verb (i.e. eimi).
Many grammatical constructions require that two or more words be in close proximity, though not necessarily side by side. A search program should allow restriction of search expressions to a definable number of words.
Gramcord allows the user to specify that up to 200 words span from beginning to end of a construction and Bible Windows allows specifying up to 20 words. By default both programs assume that all elements in a search expression are juxtaposed. If this number is not set appropriately many valid examples of a construction will be missed.
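The effect of the span setting can be sketched with a simple window test. The function and the schematic token list are illustrative assumptions; the span is counted from the first element of the construction to the last, inclusive.

```python
def within(words, a, b, max_span):
    """True if some a and some b fit inside a window of max_span words."""
    pos_a = [i for i, w in enumerate(words) if w == a]
    pos_b = [i for i, w in enumerate(words) if w == b]
    return any(abs(i - j) + 1 <= max_span for i in pos_a for j in pos_b)

clause = ["men", "x", "y", "z", "de"]

juxtaposed = within(clause, "men", "de", 2)   # default: side by side
widened = within(clause, "men", "de", 5)      # span widened to 5 words
```

With the default of juxtaposed terms the construction is missed; it is found only once the span is widened to cover all five words.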
Logos 2 allows you to specify that one word occurs within a certain number of words from another word. By default it assumes that multiple words can occur anywhere in a verse.
Bible Works has less flexibility than either of these programs. By default, all words must appear somewhere in the same verse. The user can specify a maximum number of verses in which to find the search terms. This is far less valuable for grammatical searches than a limit by number of words, though it can have value for discourse-level research.
For grammatical searches it is more valuable to set the search boundaries by sentences or clauses than by verses. A program based on verse boundaries would have difficulty with sentences that span several verses (e.g. Eph. 1:3-14). For discourse analysis, search boundaries should be set at the paragraph, chapter or book level. An ideal program would allow setting search boundaries by clause, sentence, a specific number of verses, paragraph, chapter, book. It would also allow the option of stopping at or ignoring various types of punctuation marks.
Accordance allows boundaries to be clause, sentence, verse, paragraph, chapter or book. TheWord allows boundaries to be verse, paragraph, chapter or book. Most other programs are more restricted. Bible Windows does not allow specifying boundaries, though it will cross verse boundaries if the word proximity is set high enough. Bible Works and Logos use verses as boundaries, but search expressions can cross verse boundaries if the proximity is set to 2 verses. Gramcord uses the sentence as a boundary, so it is more likely than a verse-oriented program to find all occurrences of a grammatical construction.
Logos 2 is inconsistent about search boundaries. For searches with Boolean operators (AND, OR, NOT), multiple search terms must occur in the same verse. However, if you specify that one word must occur within a certain number of words of another word (or before or after the word), the default search boundary is a sentence, not a verse.
Programs differ in how they handle a conflict between the number of words in
proximity and the search boundary. Bible Windows will cross verse boundaries
in an effort to compare the specified number of words. Gramcord will never
cross a full stop (period, semi-colon (raised dot) or question mark),
regardless of the maximum number of words allowed in the proximity. This
subtle difference can produce significantly different search results.
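The two priorities can be compared in a short sketch. The ASCII punctuation tokens stand in for the Greek full stops, punctuation is counted as part of the span, and the token list is schematic; all of these are assumptions of the example.

```python
FULL_STOPS = {".", ";", "?"}

def find_pair(tokens, a, b, max_span, respect_boundary):
    """True if a is followed by b within max_span tokens (inclusive).

    With respect_boundary=True the scan stops at the first full stop,
    so the boundary takes priority over the proximity setting.
    """
    for i, tok in enumerate(tokens):
        if tok != a:
            continue
        for j in range(i + 1, min(i + max_span, len(tokens))):
            if respect_boundary and tokens[j] in FULL_STOPS:
                break
            if tokens[j] == b:
                return True
    return False

tokens = ["men", "x", ".", "y", "de"]

proximity_first = find_pair(tokens, "men", "de", 5, respect_boundary=False)
boundary_first = find_pair(tokens, "men", "de", 5, respect_boundary=True)
```

With identical proximity settings, the proximity-first search reports a match across the full stop and the boundary-first search does not.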
Gramcord misses 12 examples of
Many grammatical constructions require either that certain grammatical features agree or not agree. For example, a genitive absolute requires a clause with a genitive noun and a genitive participle that agree in gender and number. If agreement cannot be required between search elements, many false matches must be manually removed.
Since Logos does not allow specifying agreement of grammatical features, it makes it difficult to find genitive absolutes without substantial manual labor. Bible Windows and Bible Works allow specifying agreement, but they cannot limit the agreement to specific search terms. Thus a search for genitive absolutes is quite simple with Bible Windows or Bible Works, but it is difficult to find constructions in which individual pairs of search terms agree. For example, it would be difficult to find the common Greek construction "article1 article2 noun2 noun1", where article1 and noun1 agree with each other and article2 and noun2 are genitive and agree with each other (e.g. ó tou tektonos úios in Mt 13:55). Gramcord and Accordance allow great flexibility in agreement. Any grammatical feature of selected pairs of words can be required or forbidden to agree. As a result, these programs can find very complex grammatical constructions with relatively few false matches.
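Pairwise agreement for the "article1 article2 noun2 noun1" construction can be sketched as follows. The record layout and function names are hypothetical; the sample words are the four from Mt 13:55, transliterated here as ho tou tektonos huios.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class W:
    form: str
    pos: str
    case: str
    gender: str
    number: str

def agree(a, b, features=("case", "gender", "number")):
    return all(getattr(a, f) == getattr(b, f) for f in features)

def nested_genitive(words):
    """Find article1 article2 noun2 noun1 with pairwise agreement."""
    hits = []
    for i in range(len(words) - 3):
        a1, a2, n2, n1 = words[i:i + 4]
        pattern = (a1.pos, a2.pos, n2.pos, n1.pos)
        if pattern != ("article", "article", "noun", "noun"):
            continue
        if a2.case == "genitive" and agree(a2, n2) and agree(a1, n1):
            hits.append(i)
    return hits

MT_13_55 = [
    W("ho", "article", "nominative", "masculine", "singular"),
    W("tou", "article", "genitive", "masculine", "singular"),
    W("tektonos", "noun", "genitive", "masculine", "singular"),
    W("huios", "noun", "nominative", "masculine", "singular"),
]

pairwise = nested_genitive(MT_13_55)
# Requiring ALL search terms to agree fails, because the cases differ.
global_agreement = all(agree(MT_13_55[0], w) for w in MT_13_55[1:])
```

A program that can only require agreement among all search terms cannot express this construction, since the outer and inner pairs differ in case.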
TheWord requires that Greek accents and breathing marks be entered as they appear in the biblical text. This makes entry of search expressions tedious and error prone. It also results in missed matches, where the context changes the accents.
On the other hand, required entry of breathing marks is desirable, since otherwise it is difficult to distinguish similar word pairs such as eis and éis. Bible Windows includes breathing marks in the word pick list for grammatical searches and gives the option of including them in word searches. If they are not included in word searches, breathing marks are ignored. With Gramcord ambiguities can be resolved by specifying the desired part of speech. Bible Works has no way to easily distinguish óu (an adverb of place) from ou (the negative particle), since it classifies both as adverbs. Logos 2 ignores diacritical marks in word searches. A morphological search is necessary to distinguish words which only differ by diacritical marks.
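One way to ignore accents while keeping breathing marks is Unicode normalization, sketched below with Python's standard unicodedata module. This is an illustration on Unicode Greek, not a description of how any of these programs stores its text; the strings are written as escapes so that the code points are unambiguous.

```python
import unicodedata

ACCENTS = {"\u0300", "\u0301", "\u0342"}   # grave, acute, circumflex
BREATHINGS = {"\u0313", "\u0314"}          # smooth, rough

def normalize(word, keep_breathing=True):
    """Drop accents; optionally drop breathing marks as well."""
    drop = ACCENTS if keep_breathing else ACCENTS | BREATHINGS
    decomposed = unicodedata.normalize("NFD", word)
    kept = "".join(ch for ch in decomposed if ch not in drop)
    return unicodedata.normalize("NFC", kept)

KAI_ACUTE = "\u03ba\u03b1\u03af"   # kai with acute accent
KAI_GRAVE = "\u03ba\u03b1\u1f76"   # kai with grave accent (changed by context)
EIS = "\u03b5\u1f30\u03c2"         # eis, preposition (smooth breathing)
HEIS = "\u03b5\u1f37\u03c2"        # heis, "one" (rough breathing, circumflex)

same_word = normalize(KAI_ACUTE) == normalize(KAI_GRAVE)
still_distinct = normalize(EIS) != normalize(HEIS)
merged = normalize(EIS, keep_breathing=False) == normalize(HEIS, keep_breathing=False)
```

Dropping accents removes the context-dependent variation, while retaining breathing marks keeps eis and éis apart; dropping both makes the two words indistinguishable.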
A search for
Source:                          Matches:  Invalid:  Duplicate:  Missing:
Nigel Turner, Syntax, p. 332     110
GRAMCORD
  Not exclude intervening de     112       0         14          12
  Exclude intervening de         98        0         0           12
BWorks
  Context of 1 verse             104       11        0           17
  Context of 2 verses            157       47        0           0
BWin
  Word search mode               134       27        0           3
  Grammatical search mode        97        0         0           13
Logos 2
  Terms up to 20 words apart     667 (!)   0         567 (?)     13
TheWord
  Context of 1 verse             97        0         0           13
The following table summarizes the search capabilities of several programs:
Feature:                GRAMCORD:       BWorks:         BWin:           BWin:           TheWord:        Logos 2:
                                                        (Gram. Search)  (Word Search)
Wildcards:              optional        optional        none            implicit        optional        optional
Match limit:            unlimited       unlimited       750             750             unlimited       unlimited
Statistics reported:    occurrences     occurrences     occurrences     occurrences     verses          occurrences
                                        and verses                                                      and verses
Word order sensitivity: yes             yes             yes             no              no              optional
Allow duplicate terms:  yes             no              yes             no              yes             no
Exclude intervening
  terms:                yes             no              no              no              no              no
Proximity:
  Type:                 words           verses          words           words           verse,          words or
                                                                                        paragraph,      verses
                                                                                        chapter, book
  Limit:                200             unlimited       20              20              1               unlimited
Boundary:
  Type:                 full stop       specified       none            none            verse,          full stop
                        (period,        number of                                       paragraph,
                        question mark,  verses                                          chapter, book
                        semicolon) or
                        weak stop
                        (comma, colon)
  Priority:             boundary        proximity       proximity       proximity       proximity       boundary
Agreement of
  grammatical features: any combination none            all search      none            none            none
                        of search terms                 terms
Diacritical marks:      ignored         ignored         required        optional        required        ignored
Historical and Theological Texts with Search Software:
Using Text Analysis Software to Study the Bible and Theological Texts:
Types of analysis
General purpose text analysis programs can be used for analyzing biblical and
theological texts. Programs are available that allow textual studies such
as:
Some Texts You May Want to Study:
Obtaining Texts to Study
Many biblical and non-biblical texts are available over the Internet. Usually
there is no charge if the texts are used for personal research. Here are some
useful places to look:
Some Useful Text Analysis Programs
A description of several general purpose text analysis programs is
available from
http://info.ox.ac.uk/departments/humanities/general.html.
Some of the more useful ones are:
Here are some useful starting places for information on computer-assisted literary research: