Auto docs

anchorman package

Subpackages

anchorman.generator package

Submodules
anchorman.generator.candidate module
anchorman.generator.candidate.data_val(item, replaces_per_attribute)[source]
anchorman.generator.candidate.elements_of_unit(intervaltree, unit, setting)[source]

Get all items (elements) of the current unit to validate.

anchorman.generator.candidate.retrieve_hits(intervaltree, units, config, own_validator)[source]

Loop over the units and validate each item in each unit.

anchorman.generator.candidate.validate(item, candidates, this_unit, setting, own_validator)[source]

Apply the rules specified in setting to the item.

Take care of candidates already validated and the items already added to this_unit.

Todo

Check the context of a replacement: do not add links inside links or inside overlapping elements, ... Replace only one item per entity, e.g. A. Merkel, Mum Merkel, ...
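The kind of rule described here can be illustrated with a minimal sketch. It is not anchorman's implementation: the function name select_hits and the (start, end, token) tuple layout are assumptions made for illustration. It shows one plausible reading of the replaces_at_all setting (an overall limit on replacements) together with the overlap check mentioned in the todo.

```python
def select_hits(hits, replaces_at_all=None):
    """Pick hits in text order, skipping overlaps and honoring an overall limit.

    `hits` is a list of (start, end, token) tuples. This layout is an
    assumption for illustration, not anchorman's internal format.
    """
    accepted = []
    for start, end, token in sorted(hits):
        # Stop once the overall limit is reached (None means no limit).
        if replaces_at_all is not None and len(accepted) >= replaces_at_all:
            break
        # Skip a hit that starts before the last accepted hit has ended.
        if accepted and start < accepted[-1][1]:
            continue
        accepted.append((start, end, token))
    return accepted


hits = [(40, 43, 'dog'), (16, 19, 'fox')]
print(select_hits(hits, replaces_at_all=1))  # [(16, 19, 'fox')]
print(select_hits(hits))                     # [(16, 19, 'fox'), (40, 43, 'dog')]
```

Because the hits are processed in sorted order, comparing against the last accepted hit is enough to detect any overlap.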

anchorman.generator.element module
anchorman.generator.element.create_element(element_pattern, item, mode, markup)[source]

Create the element that will be inserted in the text.

anchorman.generator.element.create_element_pattern(mode, markup)[source]

Create the basic element pattern based on mode and markup.

anchorman.generator.highlight module
anchorman.generator.highlight.augment_highlight(highlight, item)[source]

Fill the base highlight element with data of the item.

anchorman.generator.highlight.create_highlight(highlight_markup)[source]

Use format to create a base highlight element.
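As a rough illustration of a format-based highlight, the sketch below wraps a token between the pre and post strings of the default highlight markup ('${{' and '}}'). The names HIGHLIGHT_PATTERN and highlight are hypothetical and do not belong to the anchorman API.

```python
# Hypothetical pattern: the markup values are passed as format arguments,
# so the braces inside them are inserted literally.
HIGHLIGHT_PATTERN = '{pre}{token}{post}'


def highlight(token, markup):
    """Wrap token between the 'pre' and 'post' strings of a markup dict."""
    return HIGHLIGHT_PATTERN.format(
        pre=markup['pre'], token=token, post=markup['post'])


print(highlight('fox', {'pre': '${{', 'post': '}}'}))  # ${{fox}}
```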

anchorman.generator.tag module
anchorman.generator.tag.augment_bs4tag(bs4tag, item, tag_markup)[source]

Fill the base bs4tag element with data of the item.

anchorman.generator.tag.create_bs4tag(tag_markup)[source]

Use BeautifulSoup to create a base tag element.

anchorman.generator.text module
anchorman.generator.text.augment(text, to_be_applied)[source]

Augment the text with the elements in to_be_applied.
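One common way to apply several replacements to a string is to work from the end of the text towards the start, so that the offsets of earlier replacements stay valid. The sketch below assumes a list of non-overlapping (start, end, new_text) tuples. This layout and the name apply_replacements are illustrative and may differ from what augment actually receives.

```python
def apply_replacements(text, replacements):
    """Replace text[start:end] with new_text for each (start, end, new_text).

    Replacements are assumed not to overlap. They are applied from the
    end of the text backwards so that earlier offsets remain correct.
    """
    for start, end, new_text in sorted(replacements, reverse=True):
        text = text[:start] + new_text + text[end:]
    return text


text = 'The quick brown fox jumps over the lazy dog.'
link = '<a href="/wiki/fox">fox</a>'
print(apply_replacements(text, [(16, 19, link)]))
# The quick brown <a href="/wiki/fox">fox</a> jumps over the lazy dog.
```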

Module contents

anchorman.positioner package

Submodules
anchorman.positioner.interval module
anchorman.positioner.interval.intervals(text, elements, setting)[source]

From the slices of elements and units create an intervaltree.

anchorman.positioner.interval.to_intervaltree(data, t=None)[source]

Create an intervaltree of all elements (elements, units, ...).

anchorman.positioner.interval.unit_intervals(intervaltree, text_unit)[source]

Loop over the intervaltree to get the text unit interval items.
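The idea of relating element intervals to text unit intervals can be shown without the intervaltree library. The sketch below swaps in a plain linear scan over (start, end, data) tuples. The helper name items_in_unit and the tuple layout are assumptions, and a real interval tree answers the same query more efficiently.

```python
def items_in_unit(intervals, unit):
    """Return the (start, end, data) intervals fully enclosed by the unit span."""
    unit_start, unit_end = unit
    return [
        (start, end, data)
        for (start, end, data) in intervals
        if unit_start <= start and end <= unit_end
    ]


elements = [(16, 19, 'fox'), (40, 43, 'dog')]
print(items_in_unit(elements, (0, 20)))   # [(16, 19, 'fox')]
print(items_in_unit(elements, (20, 44)))  # [(40, 43, 'dog')]
```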

anchorman.positioner.slices module
anchorman.positioner.slices.element_slices(text, elements, element_identifier)[source]

Get slices of all elements in text.
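A slice here can be thought of as the start and end offset of a token in the text. The following sketch finds such offsets with the standard re module. The function name find_slices and its return layout are hypothetical, and the case_sensitive flag only mirrors the setting of the same name in the default config.

```python
import re


def find_slices(text, tokens, case_sensitive=True):
    """Return sorted (start, end, token) tuples for whole-word matches."""
    flags = 0 if case_sensitive else re.IGNORECASE
    slices = []
    for token in tokens:
        # re.escape keeps tokens with special characters literal;
        # \b restricts matches to whole words.
        pattern = r'\b' + re.escape(token) + r'\b'
        for match in re.finditer(pattern, text, flags):
            slices.append((match.start(), match.end(), token))
    return sorted(slices)


text = 'The quick brown fox jumps over the lazy dog.'
print(find_slices(text, ['dog', 'fox']))  # [(16, 19, 'fox'), (40, 43, 'dog')]
```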

anchorman.positioner.slices.unit_slices(text, text_unit)[source]

Get slices of the text units specified in setting.

Module contents

Submodules

anchorman.configure module

anchorman.configure.get_config(project_conf=True)

Load default configuration.
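When adapting the default configuration, it can be safer to merge overrides into a copy rather than mutate a shared dict. The sketch below is a generic pattern, not part of anchorman: with_overrides is a hypothetical helper, and DEFAULT_CONFIG reproduces only a few keys of the defaults shown in the annotate signature.

```python
import copy

# Excerpt of the documented defaults, trimmed for illustration.
DEFAULT_CONFIG = {
    'setting': {
        'mode': 'tag',
        'replaces_at_all': None,
        'case_sensitive': True,
    },
}


def with_overrides(defaults, overrides):
    """Return a deep copy of defaults with nested overrides merged in."""
    merged = copy.deepcopy(defaults)
    for key, value in overrides.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = with_overrides(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


cfg = with_overrides(DEFAULT_CONFIG, {'setting': {'replaces_at_all': 1}})
print(cfg['setting']['replaces_at_all'])             # 1
print(DEFAULT_CONFIG['setting']['replaces_at_all'])  # None, defaults untouched
```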

anchorman.main module

anchorman.main.annotate(text, elements, own_validator=[], config={'markup': {'highlight': {'pre': '${{', 'post': '}}'}, 'tag': {'attributes': ['style color:blue;cursor:pointer;', 'class anchorman'], 'value_key': 'href', 'tag': 'a', 'exclude_keys': ['score']}, 'coreferencer': {'attribute': 'token', 'value_key': 'text'}}, 'setting': {'mode': 'tag', 'element_identifier': 'entity', 'text_unit': {'name': 'html-paragraph', 'key': 'p', 'number_of_items': None}, 'longest_match_first': True, 'replaces_at_all': None, 'case_sensitive': True}})[source]

Find and annotate elements in text.

Create an intervaltree from the elements and text units, validate the rules for applying elements, and augment the text with the result.

Parameters:
  • text (str) – The text to annotate.
  • elements (list) – A list of element dicts like the following: {'fox': {'value': '/wiki/fox', 'data-type': 'animal'}}
  • own_validator (list) – A list of functions that are applied when validating whether an element will be applied in the text.
  • config (dict) – The rules to apply. Load the default config from etc/, or get the default config with get_config and update it with your own rules.
Returns:

text – The annotated text.

Return type:

str

Examples

Basic example with config overwrite:

>>> from anchorman.configure import get_config
>>> from anchorman.main import annotate
>>> text = 'The quick brown fox jumps over the lazy dog.'
>>> elements = [
...     {'fox': {
...         'value': '/wiki/fox', 'data-type': 'animal'}},
...     {'dog': {
...         'value': '/wiki/dog', 'data-type': 'animal'}}]
>>> cfg = get_config()
>>> cfg['setting']['replaces_at_all'] = 1  # only one element is linked below
>>> print(annotate(text, elements, config=cfg))
The quick brown <a href="/wiki/fox" data-type="animal">fox</a> jumps over the lazy dog .

anchorman.main.clean(text, config={'markup': {'highlight': {'pre': '${{', 'post': '}}'}, 'tag': {'attributes': ['style color:blue;cursor:pointer;', 'class anchorman'], 'value_key': 'href', 'tag': 'a', 'exclude_keys': ['score']}, 'coreferencer': {'attribute': 'token', 'value_key': 'text'}}, 'setting': {'mode': 'tag', 'element_identifier': 'entity', 'text_unit': {'name': 'html-paragraph', 'key': 'p', 'number_of_items': None}, 'longest_match_first': True, 'replaces_at_all': None, 'case_sensitive': True}})[source]

Remove elements from text.

Use config to identify elements.

Todo

Implement me: clean is not implemented yet.

anchorman.utils module

Module contents

Uml view

digraph "packages_anchorman" {
charset="utf-8"
rankdir=TB
ratio = auto
# rankdir=LR
"1" [label="{anchorman.configuration|get_config(project_conf=True)\l}", shape=record];
"2" [label="anchorman.generator", shape="box"];

"3" [label="{anchorman.generator.candidate|validate(item, candidates, this_unit, setting, own_validator)\lelements_of_unit(intervaltree, unit, setting)\lretrieve_hits(intervaltree, units, config, own_validator)\l}", shape=record];

"4" [label="{anchorman.generator.element|create_element_pattern(mode, markup)\lcreate_element(element_pattern, item, mode, markup)\l}", shape=record];

"5" [label="{anchorman.generator.highlight|augment_highlight(highlight, item)\lcreate_highlight(highlight_markup)\l}", shape=record];

"6" [label="{anchorman.generator.tag|augment_bs4tag(bs4tag, item, tag_markup)\lcreate_bs4tag(tag_markup)\l}", shape=record];

"7" [label="{anchorman.generator.text|augment(text, to_be_applied)\l}",  shape=record];

"8" [label=<<table BORDER="1" CELLBORDER="0" CELLSPACING="0" CELLPADDING="4"><tr><td ALIGN="CENTER" BGCOLOR="#dddddd" HREF="/indext.html" FACE="times-bold">anchorman.main</td></tr><tr><td ALIGN="LEFT">annotate(text, elements)</td></tr></table>>, shape=plaintext];

"9" [label="anchorman.positioner", shape="box"];

"10" [label="{anchorman.positioner.interval|to_intervaltree(data, t=None)\lunit_intervals(intervaltree, text_unit)\lintervals(text, elements, setting)\l}", shape=record];


"11" [label="{anchorman.positioner.slices|element_slices(text, elements, element_identifier)\lunit_slices(text, text_unit)\l}", shape=record];

#"12" [label="anchorman.utils", shape="box"];

"8" -> "9" [arrowhead="open", arrowtail="none"];
"8" -> "2" [arrowhead="open", arrowtail="none"];
"3" -> "4" [arrowhead="open", arrowtail="none"];
"4" -> "5" [arrowhead="open", arrowtail="none"];
"4" -> "6" [arrowhead="open", arrowtail="none"];
"8" -> "1" [arrowhead="open", arrowtail="none"];
"2" -> "3" [arrowhead="open", arrowtail="none"];
"2" -> "7" [arrowhead="open", arrowtail="none"];
"9" -> "10" [arrowhead="open", arrowtail="none"];
"10" -> "11" [arrowhead="open", arrowtail="none"];
{rank=same; "1"; "8";}
}

Danger

Beware killer rabbits!

************* Module anchorman.configuration
C: 10, 0: Line too long (85/80) (line-too-long)
C: 13, 0: Line too long (86/80) (line-too-long)
************* Module anchorman.main
C: 38, 0: Line too long (112/80) (line-too-long)
W:  8, 0: Dangerous default value [] as argument (dangerous-default-value)
************* Module anchorman.utils
C: 14, 0: Line too long (121/80) (line-too-long)
C: 17, 0: Final newline missing (missing-final-newline)
************* Module anchorman.generator.candidate
C: 97, 0: Line too long (82/80) (line-too-long)
C:  5, 0: Empty function docstring (empty-docstring)
R: 12, 0: Too many local variables (18/15) (too-many-locals)
************* Module anchorman.generator.text
C:  7, 4: Invalid variable name "x" (invalid-name)
************* Module anchorman.positioner.interval
C:  6, 0: Invalid argument name "t" (invalid-name)
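The dangerous-default-value warning reported for anchorman.main most likely refers to the own_validator=[] default in the annotate signature. The sketch below uses made-up function names to show why pylint flags this pattern and the usual None-sentinel alternative. It does not reproduce anchorman's code.

```python
def collect_bad(item, seen=[]):
    """Anti-pattern: the default list is created once and shared by all calls."""
    seen.append(item)
    return seen


def collect_good(item, seen=None):
    """A fresh list is created whenever the caller does not pass one."""
    if seen is None:
        seen = []
    seen.append(item)
    return seen


first = collect_bad('fox')
second = collect_bad('dog')
print(second)           # ['fox', 'dog'], state leaked from the first call
print(first is second)  # True

print(collect_good('fox'), collect_good('dog'))  # ['fox'] ['dog']
```

The shared default only causes trouble if the function mutates it, but the None sentinel removes the risk entirely.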

Report

184 statements analysed.

Statistics by type

type      number  old number  difference  %documented  %badname
module    13      13          =           30.77        0.00
class     0       0           =           0            0
method    0       0           =           0            0
function  18      17          +1.00       94.44        0.00

External dependencies

anchorman
  \-configuration (anchorman.main)
  \-generator
  | \-candidate (anchorman.main)
  | \-element (anchorman.generator.candidate)
  | \-highlight (anchorman.generator.element)
  | \-tag (anchorman.generator.element)
  | \-text (anchorman.main)
  \-positioner
    \-interval (anchorman.main)
bs4 (anchorman.generator.tag,anchorman.positioner.slices)
intervaltree (anchorman.positioner.interval)
yaml (anchorman.configuration)

Raw metrics

type       number  %      previous  difference
code       200     50.38  188       +12.00
docstring  54      13.60  57        -3.00
comment    41      10.33  72        -31.00
empty      102     25.69  108       -6.00

Duplication

                          now    previous  difference
nb duplicated lines       0      0         =
percent duplicated lines  0.000  0.000     =

Messages by category

type        number  previous  difference
convention  9       10        -1.00
refactor    1       1         =
warning     1       1         =
error       0       0         =

% errors / warnings by module

module                         error  warning  refactor  convention
anchorman.main                 0.00   100.00   0.00      11.11
anchorman.generator.candidate  0.00   0.00     100.00    22.22
anchorman.utils                0.00   0.00     0.00      22.22
anchorman.configuration        0.00   0.00     0.00      22.22
anchorman.positioner.interval  0.00   0.00     0.00      11.11
anchorman.generator.text       0.00   0.00     0.00      11.11

Messages

message id               occurrences
line-too-long            5
invalid-name             2
too-many-locals          1
missing-final-newline    1
empty-docstring          1
dangerous-default-value  1

Global evaluation

Your code has been rated at 9.40/10 (previous run: 9.31/10, +0.09)
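The rating can be reproduced from the figures in this report, assuming the run used pylint's default evaluation expression, which weights errors five times as heavily as other messages and normalizes by the number of statements. The function name pylint_score is only illustrative.

```python
def pylint_score(error, warning, refactor, convention, statement):
    """Default pylint evaluation: 10 minus the weighted messages per statement, times 10."""
    return 10.0 - (float(5 * error + warning + refactor + convention) / statement) * 10


# Counts taken from "Messages by category" and "184 statements analysed".
score = pylint_score(error=0, warning=1, refactor=1, convention=9, statement=184)
print('%.2f' % score)  # 9.40
```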

Todo list

Todo

Check the context of a replacement: do not add links inside links or inside overlapping elements, ... Replace only one item per entity, e.g. A. Merkel, Mum Merkel, ...

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/anchorman/checkouts/latest/anchorman/generator/candidate.py:docstring of anchorman.generator.candidate.validate, line 6.)

Todo

Implement me: clean is not implemented yet.

(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/anchorman/checkouts/latest/anchorman/main.py:docstring of anchorman.main.clean, line 5.)

Credits and contributions

We published this on GitHub and PyPI to share our solution with others, to get feedback, and to find contributors in the open source community.

Thanks to Tarn Barford for the inspiration and first steps.

Anchorman

turns your text into hypertext.