Bibulous Module

Bibulous is a drop-in replacement for BibTeX, with the primary advantage that the bibliography template format is compact and very easy to modify.

The basic program flow is as follows:

  1. Read the .aux file and get the names of the bibliography databases (.bib files), the style templates (.bst files) to use, and the entire set of citations.
  2. Read in all of the bibliography database files into one long dictionary (bibdata), replacing any abbreviations with their full form. Cross-referenced data is not yet inserted at this point. That is delayed until the time of writing the BBL file in order to speed up parsing.
  3. Read in the Bibulous style template file as a dictionary (bstdict).
  4. Now that all the information is collected, go through each citation key and find the corresponding entry in bibdata. From the entry type, select a template from bstdict and insert the variables one by one into the template. If any data is missing, check for cross-references and use the crossref data to fill in the missing values.
bibulous.get_bibfilenames(filename, debug=False)[source]

If the input is a filename ending in ‘.aux’, then read through the .aux file and locate the lines \bibdata{...} and \bibstyle{...} to get the filename(s) for the bibliography database and style template.

If the input is a list of filenames, then assume that this is the complete list of files to use.

Parameters :

filename : str

The “auxiliary” file, containing citation info, TOC info, etc.

Returns :

filedict : dict

A dictionary with keys ‘bib’ and ‘bst’, each entry of which contains a list of filenames.
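
The lookup described above can be illustrated with a minimal sketch. This is not the Bibulous implementation: the function name sketch_get_bibfilenames is hypothetical, it takes the text of the .aux file rather than a filename, and it assumes that the names inside the braces are comma-separated and lack their file extensions.

```python
import re

def sketch_get_bibfilenames(aux_text):
    """Collect database and style names from the text of an .aux file."""
    filedict = {'bib': [], 'bst': []}
    for line in aux_text.splitlines():
        line = line.strip()
        # \bibdata{a,b} names the bibliography databases.
        m = re.match(r'\\bibdata\{([^}]*)\}', line)
        if m:
            # Assumption: the extension is omitted in the .aux file.
            filedict['bib'] += [n.strip() + '.bib' for n in m.group(1).split(',')]
        # \bibstyle{a} names the style template.
        m = re.match(r'\\bibstyle\{([^}]*)\}', line)
        if m:
            filedict['bst'] += [n.strip() + '.bst' for n in m.group(1).split(',')]
    return filedict

aux = "\\relax\n\\citation{knuth1984}\n\\bibdata{refs,extra}\n\\bibstyle{mystyle}\n"
print(sketch_get_bibfilenames(aux))
```

The returned dictionary has the same two keys, ‘bib’ and ‘bst’, that the documentation describes for filedict.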

bibulous.sentence_case(s)[source]

Reduce the case of the string to lower case, except for the first character in the string, and except if any given character is at nonzero brace level.

Parameters :

s : str

The string to be modified.

Returns :

t : str

The resulting modified string.
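
As an illustration of the rule stated above, here is a minimal sketch (the name sketch_sentence_case is hypothetical, and this is not the Bibulous source). It assumes that only unescaped curly braces change the brace level.

```python
def sketch_sentence_case(s):
    """Lowercase everything except the first character and braced text."""
    level = 0
    out = []
    for i, ch in enumerate(s):
        if ch == '{':
            level += 1
        elif ch == '}':
            level -= 1
        # Keep the character as-is if it is first or inside braces.
        if i == 0 or level > 0:
            out.append(ch)
        else:
            out.append(ch.lower())
    return ''.join(out)

print(sketch_sentence_case('The {NASA} Guide To {M}ars'))
```

Text protected by braces, such as {NASA}, keeps its capitalization, which is the usual BibTeX way of shielding acronyms and proper nouns.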

bibulous.stringsplit(s, sep=u' |(?<!\\\\)~')[source]

Split a string into tokens, taking care not to allow the separator to act unless at brace level zero.

Parameters :

s : str

The string to split.

Returns :

tokens : list of str

The list of tokens.
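
A brace-aware split of this kind might look like the sketch below. It is a simplification under stated assumptions: the name sketch_stringsplit is hypothetical, the separators are single characters rather than a regular expression, empty tokens are dropped, and an escaped tilde (which the default sep pattern excludes) is not treated specially.

```python
def sketch_stringsplit(s, seps=(' ', '~')):
    """Split s on separator characters that occur at brace level zero."""
    tokens, current, level = [], [], 0
    for ch in s:
        if ch == '{':
            level += 1
        elif ch == '}':
            level -= 1
        if ch in seps and level == 0:
            # A separator outside braces ends the current token.
            if current:
                tokens.append(''.join(current))
            current = []
        else:
            current.append(ch)
    if current:
        tokens.append(''.join(current))
    return tokens

print(sketch_stringsplit('Ludwig {van Beethoven} Jr.'))
```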

bibulous.finditer(a_str, sub)[source]

A generator replacement for re.finditer() that matches plain substrings rather than regular expressions.

Parameters :

a_str : str

The string to query.

sub : str

The substring to match to.
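
A generator of this kind can be built on str.find(). The sketch below is illustrative only: the name sketch_finditer is hypothetical, and it assumes that the generator yields the start index of each non-overlapping match, which the description above does not specify.

```python
def sketch_finditer(a_str, sub):
    """Yield the start index of each non-overlapping occurrence of sub."""
    if not sub:
        return  # An empty substring would otherwise loop forever.
    start = a_str.find(sub)
    while start != -1:
        yield start
        # Continue searching just past the match that was found.
        start = a_str.find(sub, start + len(sub))

print(list(sketch_finditer('a and b and c', ' and ')))
```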

bibulous.namefield_to_namelist(namefield, key=None, nameabbrev=None, disable=None)[source]

Parse a name field (“author” or “editor”) of a BibTeX entry into a list of dicts, one for each person.

Parameters :

namefield : str

Either the “author” field of a database entry or the “editor” field.

key : str

The bibliography data entry key.

nameabbrev : dict

The dictionary for translating names to their abbreviated forms.

disable : list of int, optional

The list of warning message numbers to ignore.

Returns :

namelist : list

A list of dictionaries, with one dictionary for each name. Each dict has keys “first”, “middle”, “prefix”, “last”, and “suffix”. The “last” key is the only one that is required.

bibulous.namedict_to_formatted_namestr(namedict, options=None, use_firstname_initials=True, namelist_format=u'first_name_first', nameabbrev=None)[source]

Convert a name dictionary into a formatted name string.

Parameters :

namedict : dict

The name dictionary (contains a required key “last” and optional keys “first”, “middle”, “prefix”, and “suffix”).

options : dict, optional

Includes formatting options such as ‘use_name_ties’: Whether to use ‘~’ instead of spaces to tie together author initials. ‘terse_inits’: Whether to format initialized author names like “RMA Azzam” instead of the default form “R. M. A. Azzam”. ‘french_initials’: Whether to initialize digraphs with two letters instead of the default of one. For example, if use_french_initials==True, then “Christian” -> “Ch.”, not “C.”. ‘period_after_initial’: Whether to include a ‘.’ after the author initial.

use_firstname_initials : bool

Whether or not to initialize first names.

namelist_format : str

The type of format to use for name formatting. (“first_name_first”, “last_name_first”)

nameabbrev : list of str

A list of names and their abbreviations.

Returns :

namestr : str

The formatted string of the name.

bibulous.initialize_name(name, options=None, debug=False)[source]

Convert an input name element (first, middle, prefix, last, or suffix) to its initials.

Parameters :

name : str

The name element to be converted.

options : dict, optional

Includes formatting options such as ‘use_name_ties’: Whether to use ‘~’ instead of spaces to tie together author initials. ‘terse_inits’: Whether to format initialized author names like “RMA Azzam” instead of the default form “R. M. A. Azzam”. ‘french_initials’: Whether to initialize digraphs with two letters instead of the default of one. For example, if use_french_initials==True, then “Christian” -> “Ch.”, not “C.”. ‘period_after_initial’: Whether to include a ‘.’ after the author initial.

Returns :

new_name : str

The name element converted to its initials form.

bibulous.get_delim_levels(s, delims=(u'{', u'}'), operator=None)[source]

Generate a list of level numbers for each character in a string.

Parameters :

s : str

The string to characterize.

ldelim : str

The left-hand-side delimiter.

rdelim : str

The right-hand-side delimiter.

is_regex : bool

Whether the delimiters are expressed as regular expressions or as simple strings.

Returns :

oplevels : list of ints

A list giving the operator delimiter level (or “brace level” if no operator is given) of each character in the string.
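
The idea of a per-character level list can be shown with a short sketch. The name sketch_brace_levels is hypothetical, the operator argument is left out, and the convention that each delimiter character carries the level of the group it opens or closes is an assumption, not something the description above confirms.

```python
def sketch_brace_levels(s, delims=('{', '}')):
    """Return the nesting level of every character in s."""
    ldelim, rdelim = delims
    levels, level = [], 0
    for ch in s:
        if ch == ldelim:
            level += 1
            levels.append(level)   # Opening delimiter gets the inner level.
        elif ch == rdelim:
            levels.append(level)   # Closing delimiter gets the inner level.
            level -= 1
        else:
            levels.append(level)
    return levels

print(sketch_brace_levels('a{b{c}}d'))
```

The output list has the same length as the input string, so it can be indexed in parallel with it when deciding whether a given character sits at level zero.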

bibulous.get_quote_levels(s, disable=None, debug=False)[source]

Return a list which gives the “quotation level” of each character in the string.

Parameters :

s : str

The string to analyze.

disable : list of int, optional

The list of warning message numbers to ignore.

Returns :

alevels : list

The double-quote level for (``,'') pairs in the string.

blevels : list

The single-quote level for (`,') pairs in the string.

clevels : list

The neutral-quote level for (",") pairs in the string.

Notes

When using double-quotes, it is easy to break the parser, so they should be used only sparingly.

bibulous.splitat(s, ilist)[source]

Split a string at locations given by a list of indices.

This can be used more flexibly than Python’s native string split() function, when the character you are splitting on is not always a valid splitting location.

Parameters :

s : str

The string to split.

ilist : list

The list of string index locations at which to split the string.

Returns :

slist : list of str

The list of split substrings.
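
Splitting at explicit indices can be sketched with slices. The name sketch_splitat is hypothetical, and the choice to keep the character at each index (as the first character of the following piece) is an assumption; the description above does not say whether that character is kept or dropped.

```python
def sketch_splitat(s, ilist):
    """Split s into pieces at the given index positions."""
    # Add the two ends so that every piece has a start and a stop.
    bounds = [0] + sorted(ilist) + [len(s)]
    return [s[a:b] for a, b in zip(bounds[:-1], bounds[1:])]

print(sketch_splitat('alpha,beta', [5]))
```

Because the indices are computed by the caller, a brace-level list can be used first to discard separator positions that fall inside braces.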

bibulous.multisplit(s, seps)[source]

Split a string using more than one separator.

Copied from http://stackoverflow.com/questions/1059559/python-strings-split-with-multiple-separators.

Parameters :

s : str

The string to split.

seps : list of str

The list of separators.

Returns :

res : list

The list of split substrings.
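
One common way to split on several separators is to join them into a single regular expression. The sketch below shows that approach; the name sketch_multisplit is hypothetical, and it is not necessarily how the Bibulous function is written.

```python
import re

def sketch_multisplit(s, seps):
    """Split s wherever any of the separator strings occurs."""
    # Escape each separator so it is matched literally, and try
    # longer separators first so that prefixes do not win.
    ordered = sorted(seps, key=len, reverse=True)
    pattern = '|'.join(re.escape(sep) for sep in ordered)
    return re.split(pattern, s)

print(sketch_multisplit('a, b; c', [', ', '; ']))
```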

bibulous.enwrap_nested_string(s, delims=(u'{', u'}'), odd_operator=u'\\textbf', even_operator=u'\\textrm', disable=None)[source]

This function will return the input string if it finds there are no nested operators inside (i.e. when the number of delimiters found is < 2).

Parameters :

s : str

The string whose nested operators are to be modified.

delims : tuple of two strings

The left and right delimiters for all matches.

odd_operator : str

The nested operator (applied to the left of the delimiter) currently used within string “s”.

even_operator : str

The operator used to replace the currently used one for all even nesting levels.

disable : list of int, optional

The list of warning message numbers to ignore.

Returns :

s : str

The modified string.

bibulous.enwrap_nested_quotes(s, disable=None, debug=False)[source]

Find nested quotes within strings and, if necessary, replace them with the proper nesting (i.e. outer quotes use ``...'' while inner quotes use `...').

Parameters :

s : str

The string to modify.

disable : list of int, optional

The list of warning message numbers to ignore.

Returns :

s : str

The new string in which nested quotes have been properly reformatted.

bibulous.purify_string(s)[source]

Remove the LaTeX-based formatting elements from a string so that a sorting function can use alphanumerical sorting on the string.

Parameters :

s : str

The string to “purify”.

Returns :

p : str

The “purified” string.

Notes

Currently purify_string() does not allow LaTeX markup such as \'i to refer to the Unicode character which is correctly written as \'\i. Add functionality to allow that?

bibulous.latex_to_utf8(s)[source]

Translate LaTeX-markup special characters to their Unicode equivalents.

Parameters :

s : str

The string to translate.

Returns :

s : str

The translated version of the input.

bibulous.parse_bst_template_str(bst_template_str, bibentry, variables, undefstr=u'???')[source]

From an “options train” [...|...|...], find the first fully defined block in the train.

A Bibulous-type bibliography style template string contains grammatical features called options trains, of the form [...|...|...]. Each “block” in the train is divided from the others by a | symbol and contains fields. The first block whose fields are all defined replaces the entire options train in the returned string.

Parameters :

bst_template_str : str

The string containing a complete entrytype bibliography style template.

variables : list of str

The list of variables defined within the template string.

bibentry : dict

An entry from the bibliography database.

undefstr : str

The string to replace undefined required fields with.

Returns :

arg : str

The string giving the entrytype block to replace an options train.
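
The selection of the first fully defined block can be sketched as follows. Several things here are assumptions made for illustration: the name sketch_select_block is hypothetical, fields are assumed to be written as <name> inside the template, nested options trains are not handled, and a block with no fields at all (such as an empty trailing block) is treated as always defined.

```python
import re

def sketch_select_block(train, bibentry, undefstr='???'):
    """Return the first block of an options train whose fields all exist."""
    # Strip the enclosing [ ] and separate the blocks.
    blocks = train.strip()[1:-1].split('|')
    for block in blocks:
        fields = re.findall(r'<(\w+)>', block)
        if all(f in bibentry for f in fields):
            # Substitute each field with its value from the entry.
            return re.sub(r'<(\w+)>', lambda m: bibentry[m.group(1)], block)
    return undefstr

entry = {'startpage': '101', 'eid': 'e12'}
print(sketch_select_block('[<startpage>--<endpage>|<startpage>|<eid>]', entry))
```

In this example the first block is skipped because endpage is missing, so the second block is used.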

bibulous.namestr_to_namedict(namestr, disable=None)[source]

Take a BibTeX string representing a single person’s name and parse it into its first, middle, last, and other parts.

BibTeX allows three different styles of author formats.
  1. A space-separated list, [firstname middlenames suffix lastname].
  2. A two-element comma-separated list, [prefix lastname, firstname middlenames].
  3. A three-element comma-separated list, [prefix lastname, suffix, firstname middlenames].

So, we can separate these three categories by counting the number of commas that appear.

Parameters :

namestr : str

The string containing a single person’s name, in BibTeX format.

disable : list of int, optional

The list of warning message numbers to ignore.

Returns :

namedict : dict

A dictionary with keys “first”, “middle”, “prefix”, “last”, and “suffix”.
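
The comma-counting approach described above can be sketched as below. This is a simplified illustration, not the Bibulous parser: the name sketch_namestr_to_namedict is hypothetical, braces are ignored, prefixes are not separated from the last name, and in the comma-free form no suffix is detected.

```python
def sketch_namestr_to_namedict(namestr):
    """Parse one person's name by counting the commas it contains."""
    parts = [p.strip() for p in namestr.split(',')]
    namedict = {}
    if len(parts) == 1:
        # Form 1: the final token is taken as the last name.
        tokens = parts[0].split()
        namedict['last'] = tokens[-1]
        given = tokens[:-1]
    elif len(parts) == 2:
        # Form 2: "last, first middle"
        namedict['last'] = parts[0]
        given = parts[1].split()
    elif len(parts) == 3:
        # Form 3: "last, suffix, first middle"
        namedict['last'] = parts[0]
        namedict['suffix'] = parts[1]
        given = parts[2].split()
    else:
        raise ValueError('Too many commas in name: ' + namestr)
    if given:
        namedict['first'] = given[0]
    if len(given) > 1:
        namedict['middle'] = ' '.join(given[1:])
    return namedict

print(sketch_namestr_to_namedict('King, Jr., Martin Luther'))
```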

bibulous.search_middlename_for_prefixes(namedict)[source]

From the middle name of a single person, check if any of the names should be placed into the “prefix” and move them there.

Parameters :

namedict : dict

The dictionary containing the key “middle”, containing the string with the person’s middle names/initials.

Returns :

namedict : dict

The dictionary augmented with the key “prefix” if a prefix is indeed found.

bibulous.create_edition_ordinal(bibentry, disable=None)[source]

Given a bibliography entry’s edition number, format it as an ordinal (i.e. “1st”, “2nd” instead of “1”, “2”) in the way that it will appear on the formatted page.

Parameters :

bibentry : dict

The bibliography database entry.

disable : list of int, optional

The list of warning message numbers to ignore.

Returns :

editionstr : str

The formatted form of the edition, with ordinal attached.
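
The number-to-ordinal step can be sketched as follows. The name sketch_ordinal is hypothetical, the function takes the edition number directly rather than a bibliography entry, and it covers plain English ordinals only; the exact output format that Bibulous produces is not assumed here.

```python
def sketch_ordinal(n):
    """Format an integer (or integer string) as an English ordinal."""
    n = int(n)
    # 11th, 12th, and 13th are exceptions to the last-digit rule.
    if 10 <= n % 100 <= 20:
        suffix = 'th'
    else:
        suffix = {1: 'st', 2: 'nd', 3: 'rd'}.get(n % 10, 'th')
    return str(n) + suffix

print([sketch_ordinal(n) for n in ('1', '2', '3', '4', '11', '22')])
```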

bibulous.export_bibfile(bibdata, filename)[source]

Write a bibliography database dictionary into a .bib file.

Parameters :

filename : str

The filename of the file to write.

bibdata : dict

The bibliography dictionary to write out.

bibulous.parse_pagerange(pages_str, citekey=None, disable=None)[source]

Given a string containing the “pages” field of a bibliographic entry, figure out the start and end pages.

Parameters :

pages_str : str

The string to parse.

citekey : str, optional

The citation key (useful for debugging messages).

disable : list of int, optional

The list of warning message numbers to ignore.

Returns :

startpage : str

The starting page.

endpage : str

The ending page. If endpage==startpage, then endpage is set to None.
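
A minimal sketch of the page-range logic is given below. The name sketch_parse_pagerange is hypothetical, and the sketch assumes that the range separator is one or more hyphens or an en dash. Page labels that themselves contain hyphens are not handled.

```python
import re

def sketch_parse_pagerange(pages_str):
    """Return (startpage, endpage) from a "pages" field string."""
    # Split once on a run of hyphens or on an en dash.
    parts = [p.strip() for p in re.split('-+|\u2013', pages_str.strip(), maxsplit=1)]
    startpage = parts[0]
    endpage = parts[1] if len(parts) > 1 and parts[1] else None
    # A range that starts and ends on the same page has no end page.
    if endpage == startpage:
        endpage = None
    return startpage, endpage

print(sketch_parse_pagerange('101--110'))
```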

bibulous.parse_nameabbrev(abbrevstr)[source]

Given a string containing either a single “name” > “abbreviation” pair or a list of such pairs, parse the string into a dictionary of names and abbreviations.

Parameters :

abbrevstr : str

The string containing the abbreviation definitions.

Returns :

nameabbrev_dict : dict

A dictionary with names for keys and abbreviations for values.

bibulous.make_sortkey_unique(sortkey, sortdict)

Given a key that matches an already-present key in the input dictionary, generate a new key by appending zeros to the key string.

Parameters :

sortkey : str

The key to be modified.

sortdict : dict

The dictionary whose keys we can query to check for uniqueness.

Returns :

newkey : str

The new (and unique) key.
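
The zero-appending rule can be sketched in a few lines. The name sketch_make_sortkey_unique is hypothetical; the only behavior assumed is the one stated above, namely that zeros are appended until the key no longer collides.

```python
def sketch_make_sortkey_unique(sortkey, sortdict):
    """Append zeros to sortkey until it is not a key of sortdict."""
    newkey = sortkey
    while newkey in sortdict:
        newkey += '0'
    return newkey

existing = {'smith2001': 'a', 'smith20010': 'b'}
print(sketch_make_sortkey_unique('smith2001', existing))
```

Appending characters to the end keeps the new key adjacent to the original one in sorted order.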

bibulous.filter_script(line)

Remove elements from a Python script which pose the most egregious security risks; also replace some identifiers with their correct namespace representation.

Parameters :

line : str

The line of source code to filter.

Returns :

filtered : str

The filtered line of source code.

bibulous.str_is_integer(s)

Check whether an input string represents an integer value. Although a trivial function, it will be useful for user scripts.

Parameters :

s : str

The input string to test.

Returns :

is_integer : bool

Whether the string represents an integer value.
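
One idiomatic way to write such a test in Python is to attempt the conversion and catch the failure. The sketch below uses the hypothetical name sketch_str_is_integer and may accept or reject slightly different inputs than the Bibulous function does (for example, int() tolerates surrounding whitespace).

```python
def sketch_str_is_integer(s):
    """Return True if the string s can be converted by int()."""
    try:
        int(s)
    except ValueError:
        return False
    return True

print([sketch_str_is_integer(x) for x in ('42', '-7', '4.2', 'iv')])
```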

bibulous.warn(msg, disable=None)

Print a warning message, with the option to disable any given message.

Parameters :

msg : str

The warning message to print.

disable : list of int, optional

The list of warning message numbers that the user wishes to disable (i.e. ignore).

class bibulous.Bibdata(filename, disable=[], debug=False)[source]

Bibdata is a class to hold all data related to a bibliography database, a citation list, and a style template.

To initialize the class, either call it with the filename of the “.aux” file containing the relevant file locations (for the “.bib” database files and the “.bst” template files) or simply call it with a list of all filenames to be used (“.bib”, “.bst” and “.aux”). The output file (the LaTeX-formatted bibliography) is assumed to have the same filename root as the “.aux” file, but with “.bbl” as its extension.

Attributes

abbrevs dict The list of abbreviations given in the bibliography database file(s). The dictionary keys are the abbreviations, and the values are their full forms.
bibdata dict The database of bibliography entries and fields derived from parsing the bibliography database file(s).
bstdict dict The style template for formatting the bibliography. The dictionary keys are the entrytypes, with the dictionary values their string template.
citedict dict The dictionary of citation keys and their corresponding numerical order of citation.
debug bool Whether to turn on debugging features.
filedict dict The dictionary of filenames associated with the bibliographic data. The dictionary consists of keys bib, bst, aux, tex, and bbl. The first two are lists of filenames, while the others contain only a single filename.
filename str (For error messages and debugging) The name of the file currently being parsed.
i int (For error messages and debugging) The line of the file currently being parsed.
options dict The dictionary containing the various option settings from the style template (BST) files.
abbrevkey_pattern compiled regular expression object The regex used to search for abbreviation keys.
anybrace_pattern compiled regular expression object The regex used to search for curly braces { or }.
anybraceorquote_pattern compiled regular expression object The regex used to search for curly braces or for double-quotes, i.e. {, }, or ".
endbrace_pattern compiled regular expression object The regex used to search for an ending curly brace, i.e. }.
quote_pattern compiled regular expression object The regex used to search for a double-quote, i.e. ".
startbrace_pattern compiled regular expression object The regex used to search for a starting curly brace, {.

Methods

parse_bibfile(filename) Parse a “.bib” file to generate a dictionary representing a bibliography database.
parse_bibentry(entrystr, entrytype) Given a string representing the entire contents of the BibTeX-format bibliography entry, parse the contents and place them into the bibliography preamble string, the set of abbreviations, and the bibliography database dictionary.
parse_bibfield(entrystr) For a given string representing the raw contents of a BibTeX-format bibliography entry, parse the contents into a dictionary of key:value pairs corresponding to the field names and field values.
parse_auxfile(filename[, debug]) Read in an “.aux” file and convert the \citation{} entries found there into a dictionary of citekeys and citation order number.
parse_bstfile(filename) Convert a Bibulous-type bibliography style template into a dictionary.
write_bblfile([filename, write_preamble, ...]) Given a bibliography database bibdata, a dictionary citedict containing the citations called out, and a bibliography style template bstdict, write the LaTeX-format file for the formatted bibliography.
create_citation_list() Create the list of citation keys, sorted into the proper order.
format_bibitem(citekey[, debug]) Create the “\bibitem{...}” string to insert into the “.bbl” file.
generate_sortkey(citekey) From a bibliography entry and the formatting template options, generate a sorting key for the entry.
create_namelist(key, nametype) Deconstruct the bibfile string following “author = ...” (or “editor = ...”), and create a new field authorlist or editorlist that is a list of dictionaries (one dict for each person).
format_namelist(namelist, nametype) Format a list of dictionaries (one dict for each person) into a long string, with the format according to the directives in the bibliography style template.
insert_crossref_data(entrykey[, fieldname]) Insert crossref info into a bibliography database dictionary.
write_citeextract(outputfile[, debug]) Extract a sub-database from a large bibliography database, with the former containing only those entries cited in the .aux file.
write_authorextract(searchname[, ...]) Extract a sub-database from a large bibliography database, with the former containing only those entries citing the given author/editor.
replace_abbrevs_with_full(fieldstr, resultstr) Given an input str, locate the abbreviation key within it and replace the abbreviation with its full form.
create_citation_list()[source]

Create the list of citation keys, sorted into the proper order.

create_namelist(key, nametype)[source]

Deconstruct the bibfile string following “author = ...” (or “editor = ...”), and create a new field authorlist or editorlist that is a list of dictionaries (one dict for each person).

Parameters :

key : str

The key in bibdata defining the current entry being formatted.

nametype : str, {‘author’,’editor’}

Which bibliography field to use for parsing names.

format_bibitem(citekey, debug=False)[source]

Create the “\bibitem{...}” string to insert into the “.bbl” file.

This is the workhorse function of Bibulous. For a given key, find the corresponding entry in the bibliography database. From the entry’s entrytype, look up the relevant template in bstdict and replace the template variables with formatted elements of the database entry. Once all template variables have been replaced, the entry is fully formatted.

This function is also where we compile any scripts present in the BST files.

Parameters :

citekey : str

The citation key.

Returns :

itemstr : str

The string containing the \bibitem{} citation key and LaTeX-formatted string for the formatted bibliography. (That is, this string is designed to be inserted directly into the LaTeX document.)

format_namelist(namelist, nametype)[source]

Format a list of dictionaries (one dict for each person) into a long string, with the format according to the directives in the bibliography style template.

Parameters :

namelist : list of dict

The list of dictionaries containing all of the names to be formatted.

nametype : str, {‘author’, ‘editor’}

Whether the names are for authors or editors.

Returns :

namestr : str

The formatted form of the “name string”. This is generally a list of authors or list of editors.

generate_sortkey(citekey)[source]

From a bibliography entry and the formatting template options, generate a sorting key for the entry.

Parameters :

citekey : str

The key for the current entry.

Returns :

sortkey : str

A string to use as a sorting key.

insert_crossref_data(entrykey, fieldname=None)[source]

Insert crossref info into a bibliography database dictionary.

Loop through a bibliography database dictionary and, for each entry which has a “crossref” field, locate the crossref entry and insert any missing bibliographic information into the main entry’s fields.

Parameters :

entrykey : str

The key of the bibliography entry to query.

fieldname : str, optional

The name of the field to check. If fieldname==None, then check all fields.

Returns :

foundit : bool

Whether the function found a crossref for the queried field. If multiple fieldnames were input, then foundit will be True if a crossref is located for any one of them.
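
The crossref fill-in can be sketched on a plain dictionary of entries. This is an illustration under assumptions: the name sketch_insert_crossref_data is hypothetical, it is a free function that takes the database as an argument rather than a method of Bibdata, and it copies a field only when the main entry lacks it.

```python
def sketch_insert_crossref_data(bibdata, entrykey, fieldname=None):
    """Copy missing fields from an entry's crossref target into the entry."""
    entry = bibdata[entrykey]
    parent = bibdata.get(entry.get('crossref'))
    if parent is None:
        return False   # No crossref field, or the target is missing.
    if fieldname is None:
        candidates = [f for f in parent if f != 'crossref']
    else:
        candidates = [fieldname]
    foundit = False
    for field in candidates:
        # Fill in only what the main entry does not already define.
        if field not in entry and field in parent:
            entry[field] = parent[field]
            foundit = True
    return foundit

bibdata = {
    'chap1': {'title': 'Intro', 'crossref': 'book1'},
    'book1': {'title': 'Big Book', 'publisher': 'ACME', 'year': '1999'},
}
print(sketch_insert_crossref_data(bibdata, 'chap1'), bibdata['chap1'])
```

Fields that the main entry already defines, such as title in this example, are left untouched.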

parse_auxfile(filename, debug=False)[source]

Read in an “.aux” file and convert the \citation{} entries found there into a dictionary of citekeys and citation order number.

Parameters :

filename : str

The filename of the “.aux” file to parse.

parse_bibentry(entrystr, entrytype)[source]

Given a string representing the entire contents of the BibTeX-format bibliography entry, parse the contents and place them into the bibliography preamble string, the set of abbreviations, and the bibliography database dictionary.

Parameters :

entrystr : str

The string containing the entire contents of the bibliography entry.

entrytype : str

The type of entry (“article”, “preamble”, etc.).

parse_bibfield(entrystr)[source]

For a given string representing the raw contents of a BibTeX-format bibliography entry, parse the contents into a dictionary of key:value pairs corresponding to the field names and field values.

Parameters :

entrystr : str

The string containing the entire contents of the bibliography entry.

parse_bibfile(filename)[source]

Parse a “.bib” file to generate a dictionary representing a bibliography database.

Parameters :

filename : str

The filename of the .bib file to parse.

parse_bstfile(filename)[source]

Convert a Bibulous-type bibliography style template into a dictionary.

The resulting dictionary consists of keys which are the various entrytypes, and values which are the template strings. In addition, any formatting options are stored in the “options” key as a dictionary of option_name:option_value pairs.

Parameters :

filename : str

The filename of the Bibulous style template to use.

replace_abbrevs_with_full(fieldstr, resultstr)

Given an input str, locate the abbreviation key within it and replace the abbreviation with its full form.

Once the abbreviation key is found, remove it from the “fieldstr” and add the full form to the “resultstr”.

Parameters :

fieldstr : str

The string to search for the abbreviation key.

resultstr : str

The string that holds the abbreviation’s full form. (Note that it might not be empty on input.)

Returns :

fieldstr : str

The input field string with the abbreviation key removed.

resultstr : str

The result string with the abbreviation’s full form added.

end_of_field : bool

Whether the abbreviation key was at the end of the current field.

write_authorextract(searchname, outputfile=None, debug=False)[source]

Extract a sub-database from a large bibliography database, with the former containing only those entries citing the given author/editor.

Parameters :

searchname : str or dict

The string or dictionary for the author’s name. This can be, for example, “Bugs E. Bunny” or {'first':'Bugs', 'last':'Bunny', 'middle':'E'}.

outputfile : str, optional

The filename of the extracted BIB file to write.

write_bblfile(filename=None, write_preamble=True, write_postamble=True, bibsize=None, verbose=False)[source]

Given a bibliography database bibdata, a dictionary citedict containing the citations called out, and a bibliography style template bstdict, write the LaTeX-format file for the formatted bibliography.

Start with the preamble and then loop over the citations one by one, formatting each entry in turn, and put \end{thebibliography} at the end when done.

Parameters :

filename : str, optional

The filename of the “.bbl” file to write. (Default is to take the AUX file and change its extension to “.bbl”.)

write_preamble : bool, optional

Whether to write the preamble. (Setting this to False can be useful when writing the bibliography in separate steps, as in the testing suite.)

write_postamble : bool, optional

Whether to write the postamble. (Setting this to False can be useful when writing the bibliography in separate steps, as in the testing suite.)

bibsize : str, optional

A string the length of which is used to determine the label margin for the bibliography.

write_citeextract(outputfile, debug=False)[source]

Extract a sub-database from a large bibliography database, with the former containing only those entries cited in the .aux file.

Parameters :

filedict : dict

The dictionary of filenames, which must have keys “aux”, “bst”, and “bib”.

outputfile : str, optional

The filename to use for writing the extracted BIB file.