anchorman.main module

anchorman.main.annotate(text, elements, own_validator=[], config={'markup': {'highlight': {'pre': '${{', 'post': '}}'}, 'tag': {'attributes': ['style color:blue;cursor:pointer;', 'class anchorman'], 'value_key': 'href', 'tag': 'a', 'exclude_keys': ['score']}, 'coreferencer': {'attribute': 'token', 'value_key': 'text'}}, 'setting': {'mode': 'tag', 'element_identifier': 'entity', 'text_unit': {'name': 'html-paragraph', 'key': 'p', 'number_of_items': None}, 'longest_match_first': True, 'replaces_at_all': None, 'case_sensitive': True}})

Find and annotate elements in text.

Create an interval tree from the elements and the units of text, validate the rules that decide which elements to apply, and augment the text with the result.
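The flow described above can be sketched in plain Python. This is a minimal illustration and not anchorman's implementation: it keeps a sorted list of (start, end) spans where the library uses an interval tree, and the names `find_spans` and `annotate_sketch`, the `links` mapping, and the validator signature `(token, span, text)` are all assumptions made for this example.

```python
import re


def find_spans(text, tokens):
    """Collect (start, end, token) for every whole-word occurrence of each token."""
    spans = []
    for token in tokens:
        pattern = r'\b' + re.escape(token) + r'\b'
        for match in re.finditer(pattern, text):
            spans.append((match.start(), match.end(), token))
    return spans


def annotate_sketch(text, links, validators=()):
    """Wrap each accepted token in an anchor tag.

    links maps a token to its href (hypothetical, simplified input format).
    validators are callables taking (token, span, text) and returning True
    to keep a candidate (hypothetical signature).
    """
    # Longest candidates first, so 'brown fox' wins over 'fox'.
    candidates = sorted(find_spans(text, links), key=lambda s: s[0] - s[1])
    accepted = []
    for start, end, token in candidates:
        overlaps = any(start < e and s < end for s, e, _ in accepted)
        valid = all(check(token, (start, end), text) for check in validators)
        if not overlaps and valid:
            accepted.append((start, end, token))
    # Replace from right to left so earlier offsets stay valid.
    for start, end, token in sorted(accepted, reverse=True):
        tag = '<a href="{}">{}</a>'.format(links[token], text[start:end])
        text = text[:start] + tag + text[end:]
    return text
```

Overlap detection is the step an interval tree makes efficient for large inputs; the linear scan here trades speed for brevity.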

Parameters:
  • text (str) – The text to annotate.
  • elements (list) – A list of element dicts, each mapping a token to its attributes, for example: {'fox': {'value': '/wiki/fox', 'data-type': 'animal'}}
  • own_validator (list) – A list of functions applied during validation to decide whether an element is applied to the text.
  • config (dict) – The rules to use. Load the default config from etc/, or fetch it with get_config, and update it with your own rules.
Returns:

text – The annotated text.

Return type:

str
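As a rough sketch of adapting the default config to your own rules, the snippet below copies the default values shown in the signature and merges nested overrides into the copy. The helper `with_overrides` and the constant `DEFAULT_CONFIG` are hypothetical names for this example and are not part of anchorman. Copying before modifying is a cautious habit, since the signature shows a dict as the default argument.

```python
import copy

# Default values copied from the annotate() signature.
DEFAULT_CONFIG = {
    'markup': {
        'highlight': {'pre': '${{', 'post': '}}'},
        'tag': {
            'attributes': ['style color:blue;cursor:pointer;', 'class anchorman'],
            'value_key': 'href',
            'tag': 'a',
            'exclude_keys': ['score'],
        },
        'coreferencer': {'attribute': 'token', 'value_key': 'text'},
    },
    'setting': {
        'mode': 'tag',
        'element_identifier': 'entity',
        'text_unit': {'name': 'html-paragraph', 'key': 'p', 'number_of_items': None},
        'longest_match_first': True,
        'replaces_at_all': None,
        'case_sensitive': True,
    },
}


def with_overrides(base, overrides):
    """Return a deep copy of base with nested overrides merged in."""
    merged = copy.deepcopy(base)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            # Recurse so sibling keys in the nested dict are preserved.
            merged[key] = with_overrides(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


# Override two settings; everything else keeps its default value.
cfg = with_overrides(DEFAULT_CONFIG, {'setting': {'replaces_at_all': 1, 'case_sensitive': False}})
```

The resulting `cfg` could then be passed as the `config` argument, leaving `DEFAULT_CONFIG` untouched.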

Examples

Basic example with config overwrite:

>>> from anchorman.main import annotate
>>> # get_config must also be in scope; its import path is not shown here.
>>> text = 'The quick brown fox jumps over the lazy dog.'
>>> elements = [
...     {'fox': {
...         'value': '/wiki/fox', 'data-type': 'animal'}},
...     {'dog': {
...         'value': '/wiki/dog', 'data-type': 'animal'}}]
>>> cfg = get_config()
>>> cfg['setting']['replaces_at_all'] = 1
>>> print(annotate(text, elements, config=cfg))
'The quick brown <a href="/wiki/fox" data-type="animal">fox</a> jumps over the lazy dog .'
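
To make the mapping from an element dict to the tag in the output above more concrete, here is a small sketch. It assumes the 'value' entry is emitted under the configured `value_key` ('href'), that keys listed in `exclude_keys` are dropped, and that the remaining keys become attributes unchanged. `render_tag` is a hypothetical helper and not anchorman's API, and it ignores the 'attributes' list from the default config.

```python
import html


def render_tag(token, attrs, tag='a', value_key='href', exclude_keys=('score',)):
    """Build an opening/closing tag pair around token from an attribute dict."""
    # The element's 'value' goes first, under the configured attribute name.
    pairs = [(value_key, attrs['value'])]
    # Remaining keys follow in sorted order, skipping excluded ones.
    for key in sorted(attrs):
        if key != 'value' and key not in exclude_keys:
            pairs.append((key, attrs[key]))
    rendered = ' '.join(
        '{}="{}"'.format(name, html.escape(str(val), quote=True))
        for name, val in pairs
    )
    return '<{0} {1}>{2}</{0}>'.format(tag, rendered, html.escape(token))
```

Escaping attribute values is a defensive choice in this sketch; whether the library escapes them is not stated here.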

anchorman.main.clean(text, config={'markup': {'highlight': {'pre': '${{', 'post': '}}'}, 'tag': {'attributes': ['style color:blue;cursor:pointer;', 'class anchorman'], 'value_key': 'href', 'tag': 'a', 'exclude_keys': ['score']}, 'coreferencer': {'attribute': 'token', 'value_key': 'text'}}, 'setting': {'mode': 'tag', 'element_identifier': 'entity', 'text_unit': {'name': 'html-paragraph', 'key': 'p', 'number_of_items': None}, 'longest_match_first': True, 'replaces_at_all': None, 'case_sensitive': True}})

Remove elements from text.

Use config to identify elements.
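
One way to picture "use config to identify elements" is the sketch below, which unwraps tags that carry a marker class and keeps their inner text. It assumes annotated tags carry class="anchorman" (taken from the 'attributes' entry of the default config) and uses a regular expression, which is fragile on real-world HTML. `clean_sketch` is a hypothetical name and does not describe the library's behavior.

```python
import re


def clean_sketch(text, tag='a', css_class='anchorman'):
    """Replace every <tag ... class="...css_class..."> element with its inner text."""
    pattern = re.compile(
        r'<{tag}\b[^>]*\bclass="[^"]*\b{cls}\b[^"]*"[^>]*>(.*?)</{tag}>'.format(
            tag=re.escape(tag), cls=re.escape(css_class)),
        re.DOTALL,
    )
    # Keep only the captured inner text of each matching element.
    return pattern.sub(r'\1', text)
```

Tags without the marker class are left in place, so links that were already in the text before annotation are not affected.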

Todo

Implement me ...clean(text .
