Auto docs
anchorman package
Subpackages
anchorman.generator package
Submodules
anchorman.generator.candidate module
-
anchorman.generator.candidate.data_val(item, replaces_per_attribute)
-
anchorman.generator.candidate.elements_of_unit(intervaltree, unit, setting)
Get all items / elements of the current unit to validate.
-
anchorman.generator.candidate.retrieve_hits(intervaltree, units, config, own_validator)
Loop over the units and validate each item in each unit.
-
anchorman.generator.candidate.validate(item, candidates, this_unit, setting, own_validator)
Apply the rules specified in setting to the item.
Take into account the candidates already validated and the items already
added to this_unit.
Todo
- Check the context of a replacement: do not add links inside links or inside overlapping elements, ...
- Replace only one item per entity, e.g. A. Merkel, Mum Merkel, ...
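One of the rules in setting, replaces_at_all, appears to cap the total number of replacements: the annotate example sets it to 1 and only the first element is linked. A minimal sketch of such a cap, assuming candidates arrive in text order. The function name and the shape of the candidates are hypothetical and not taken from anchorman's code:

```python
def cap_replacements_sketch(candidates, replaces_at_all=None):
    """Keep candidates in order until the overall cap is reached.

    Hypothetical helper: `candidates` is any ordered list of items and
    `replaces_at_all=None` means "no limit", mirroring the default config.
    """
    if replaces_at_all is None:
        return list(candidates)
    return list(candidates[:replaces_at_all])


candidates = ["fox", "dog"]
print(cap_replacements_sketch(candidates, replaces_at_all=1))  # ['fox']
print(cap_replacements_sketch(candidates))  # ['fox', 'dog']
```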
anchorman.generator.element module
-
anchorman.generator.element.create_element(element_pattern, item, mode, markup)
Create the element that will be inserted in the text.
-
anchorman.generator.element.create_element_pattern(mode, markup)
Create the basic element pattern based on mode and markup.
anchorman.generator.highlight module
-
anchorman.generator.highlight.augment_highlight(highlight, item)
Fill the base highlight element with data of the item.
-
anchorman.generator.highlight.create_highlight(highlight_markup)
Use format to create a base highlight element.
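The default config wraps a highlighted token between the pre and post markers '${{' and '}}'. A minimal sketch of building such an element with str.format, assuming a single template with three fields. The template and the function name are illustrative assumptions, not the library's implementation:

```python
# Hypothetical template; the field names are chosen for this sketch only.
HIGHLIGHT_TEMPLATE = "{pre}{token}{post}"


def highlight_sketch(token, highlight_markup):
    """Wrap `token` in the pre/post markers from the highlight markup."""
    return HIGHLIGHT_TEMPLATE.format(
        pre=highlight_markup["pre"],
        token=token,
        post=highlight_markup["post"],
    )


# Markers copied from the default config shown in the annotate signature.
markup = {"pre": "${{", "post": "}}"}
print(highlight_sketch("fox", markup))  # ${{fox}}
```

Values passed to str.format are inserted literally, so the braces inside the markers need no escaping.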
anchorman.generator.tag module
-
anchorman.generator.tag.augment_bs4tag(bs4tag, item, tag_markup)
Fill the base bs4tag element with data of the item.
-
anchorman.generator.tag.create_bs4tag(tag_markup)
Use BeautifulSoup to create a base tag element.
anchorman.generator.text module
-
anchorman.generator.text.augment(text, to_be_applied)
Augment the text with the elements in to_be_applied.
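A common way to apply several replacements to one string is to work from the end of the text backwards, so that earlier offsets stay valid. The sketch below illustrates that idea under the assumption that to_be_applied is a list of (start, end, replacement) tuples; the actual structure used by anchorman may differ:

```python
def augment_sketch(text, to_be_applied):
    """Apply (start, end, replacement) tuples to `text`.

    Replacements are applied from the last position to the first so the
    offsets of the not-yet-applied replacements remain correct.
    """
    for start, end, replacement in sorted(to_be_applied, reverse=True):
        text = text[:start] + replacement + text[end:]
    return text


text = "The quick brown fox jumps over the lazy dog."
to_be_applied = [
    (16, 19, '<a href="/wiki/fox">fox</a>'),
    (40, 43, '<a href="/wiki/dog">dog</a>'),
]
print(augment_sketch(text, to_be_applied))
```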
Module contents
anchorman.positioner package
Submodules
anchorman.positioner.interval module
-
anchorman.positioner.interval.intervals(text, elements, setting)
From the slices of elements and units create an intervaltree.
-
anchorman.positioner.interval.to_intervaltree(data, t=None)
Create an intervaltree of all elements (elements, units, ...).
-
anchorman.positioner.interval.unit_intervals(intervaltree, text_unit)
Loop over the intervaltree to get the text unit interval items.
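The functions above use the intervaltree package to relate element positions to text units. The plain-Python stand-in below shows the same kind of containment query with a linear scan. It is not the intervaltree API, and the tuple layouts are assumptions made for this sketch:

```python
def elements_in_unit_sketch(unit, elements):
    """Return the elements whose slice lies fully inside the unit slice.

    `unit` is a (start, end) pair and each element is a
    (start, end, data) tuple; both layouts are hypothetical.
    """
    unit_start, unit_end = unit
    return [
        (start, end, data)
        for (start, end, data) in elements
        if unit_start <= start and end <= unit_end
    ]


units = [(0, 20), (20, 44)]
elements = [(16, 19, "fox"), (40, 43, "dog")]
for unit in units:
    print(unit, elements_in_unit_sketch(unit, elements))
```

An interval tree answers the same question without scanning every element, which matters once a text contains many elements and units.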
anchorman.positioner.slices module
-
anchorman.positioner.slices.element_slices(text, elements, element_identifier)
Get slices of all elements in text.
-
anchorman.positioner.slices.unit_slices(text, text_unit)
Get slices of the text units specified in setting.
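A slice here is a start/end position in the text. As a rough illustration, the sketch below finds element and unit slices with regular expressions. This is a simplified substitute chosen for the example: the dependency report lists bs4 for this module, and the function names and return shapes below are hypothetical:

```python
import re


def element_slices_sketch(text, tokens):
    """Return sorted (start, end, token) tuples for whole-word matches."""
    slices = []
    for token in tokens:
        pattern = r"\b" + re.escape(token) + r"\b"
        for match in re.finditer(pattern, text):
            slices.append((match.start(), match.end(), token))
    return sorted(slices)


def unit_slices_sketch(text, key="p"):
    """Return (start, end) pairs for each <key>...</key> block (no attributes)."""
    pattern = r"<{0}>.*?</{0}>".format(re.escape(key))
    return [(m.start(), m.end()) for m in re.finditer(pattern, text, re.DOTALL)]


text = "The quick brown fox jumps over the lazy dog."
print(element_slices_sketch(text, ["fox", "dog"]))
# [(16, 19, 'fox'), (40, 43, 'dog')]
print(unit_slices_sketch("<p>fox</p><p>dog</p>"))
# [(0, 10), (10, 20)]
```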
Module contents
Submodules
anchorman.main module
-
anchorman.main.annotate(text, elements, own_validator=[], config={'markup': {'highlight': {'pre': '${{', 'post': '}}'}, 'tag': {'attributes': ['style color:blue;cursor:pointer;', 'class anchorman'], 'value_key': 'href', 'tag': 'a', 'exclude_keys': ['score']}, 'coreferencer': {'attribute': 'token', 'value_key': 'text'}}, 'setting': {'mode': 'tag', 'element_identifier': 'entity', 'text_unit': {'name': 'html-paragraph', 'key': 'p', 'number_of_items': None}, 'longest_match_first': True, 'replaces_at_all': None, 'case_sensitive': True}})
Find and annotate elements in text.
Create an intervaltree from the elements and units of the text, validate the
rules for applying elements, and augment the text with the result.
Parameters:
- text (str) – The text to annotate.
- elements (list) – A list of element dicts like the following:
{'fox': {'value': '/wiki/fox', 'data-type': 'animal'}}
- own_validator (list) – A list of functions that are applied when validating
whether an element will be applied in the text.
- config (dict) – Load the default config from etc/ or via get_config, then
update it with your own rules.
Returns: text – The annotated text.
Return type: str
Examples
Basic example with config overwrite:
>>> text = 'The quick brown fox jumps over the lazy dog.'
>>> elements = [
...     {'fox': {'value': '/wiki/fox', 'data-type': 'animal'}},
...     {'dog': {'value': '/wiki/dog', 'data-type': 'animal'}}]
>>> cfg = get_config()
>>> cfg['setting']['replaces_at_all'] = 1
>>> annotate(text, elements, config=cfg)
'The quick brown <a href="/wiki/fox" data-type="animal">fox</a> jumps over the lazy dog .'
-
anchorman.main.clean(text, config={'markup': {'highlight': {'pre': '${{', 'post': '}}'}, 'tag': {'attributes': ['style color:blue;cursor:pointer;', 'class anchorman'], 'value_key': 'href', 'tag': 'a', 'exclude_keys': ['score']}, 'coreferencer': {'attribute': 'token', 'value_key': 'text'}}, 'setting': {'mode': 'tag', 'element_identifier': 'entity', 'text_unit': {'name': 'html-paragraph', 'key': 'p', 'number_of_items': None}, 'longest_match_first': True, 'replaces_at_all': None, 'case_sensitive': True}})
Remove elements from text.
Use config to identify elements.
Todo
Implement me ...clean(text .
anchorman.utils module
Module contents
Uml view
************* Module anchorman.configuration
C: 10, 0: Line too long (85/80) (line-too-long)
C: 13, 0: Line too long (86/80) (line-too-long)
************* Module anchorman.main
C: 38, 0: Line too long (112/80) (line-too-long)
W: 8, 0: Dangerous default value [] as argument (dangerous-default-value)
************* Module anchorman.utils
C: 14, 0: Line too long (121/80) (line-too-long)
C: 17, 0: Final newline missing (missing-final-newline)
************* Module anchorman.generator.candidate
C: 97, 0: Line too long (82/80) (line-too-long)
C: 5, 0: Empty function docstring (empty-docstring)
R: 12, 0: Too many local variables (18/15) (too-many-locals)
************* Module anchorman.generator.text
C: 7, 4: Invalid variable name "x" (invalid-name)
************* Module anchorman.positioner.interval
C: 6, 0: Invalid argument name "t" (invalid-name)
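The dangerous-default-value warning for anchorman.main matches the own_validator=[] default in the annotate signature. A mutable default is created once and shared across calls. The sketch below shows the general pitfall and the usual None-default idiom. The function names are hypothetical and this is not a patch to anchorman:

```python
def collect_unsafe_sketch(validator, validators=[]):
    # The same list object is reused on every call that omits `validators`.
    validators.append(validator)
    return validators


def collect_safe_sketch(validator, validators=None):
    # Build a fresh list per call; copy a caller-supplied list before use.
    validators = [] if validators is None else list(validators)
    validators.append(validator)
    return validators


print(len(collect_unsafe_sketch(len)))  # 1
print(len(collect_unsafe_sketch(len)))  # 2, state leaked from the first call
print(len(collect_safe_sketch(len)))  # 1
print(len(collect_safe_sketch(len)))  # 1
```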
Report
184 statements analysed.
Statistics by type
type | number | old number | difference | %documented | %badname
module | 13 | 13 | = | 30.77 | 0.00
class | 0 | 0 | = | 0 | 0
method | 0 | 0 | = | 0 | 0
function | 18 | 17 | +1.00 | 94.44 | 0.00
External dependencies
anchorman
  \-configuration (anchorman.main)
  \-generator
  | \-candidate (anchorman.main)
  | \-element (anchorman.generator.candidate)
  | \-highlight (anchorman.generator.element)
  | \-tag (anchorman.generator.element)
  | \-text (anchorman.main)
  \-positioner
    \-interval (anchorman.main)
bs4 (anchorman.generator.tag, anchorman.positioner.slices)
intervaltree (anchorman.positioner.interval)
yaml (anchorman.configuration)
Raw metrics
type | number | % | previous | difference
code | 200 | 50.38 | 188 | +12.00
docstring | 54 | 13.60 | 57 | -3.00
comment | 41 | 10.33 | 72 | -31.00
empty | 102 | 25.69 | 108 | -6.00
Duplication
metric | now | previous | difference
nb duplicated lines | 0 | 0 | =
percent duplicated lines | 0.000 | 0.000 | =
Messages by category
type | number | previous | difference
convention | 9 | 10 | -1.00
refactor | 1 | 1 | =
warning | 1 | 1 | =
error | 0 | 0 | =
% errors / warnings by module
module | error | warning | refactor | convention
anchorman.main | 0.00 | 100.00 | 0.00 | 11.11
anchorman.generator.candidate | 0.00 | 0.00 | 100.00 | 22.22
anchorman.utils | 0.00 | 0.00 | 0.00 | 22.22
anchorman.configuration | 0.00 | 0.00 | 0.00 | 22.22
anchorman.positioner.interval | 0.00 | 0.00 | 0.00 | 11.11
anchorman.generator.text | 0.00 | 0.00 | 0.00 | 11.11
Messages
message id | occurrences
line-too-long | 5
invalid-name | 2
too-many-locals | 1
missing-final-newline | 1
empty-docstring | 1
dangerous-default-value | 1
Global evaluation
Your code has been rated at 9.40/10 (previous run: 9.31/10, +0.09)
Todo list
Todo
- Check the context of a replacement: do not add links inside links or inside overlapping elements, ...
- Replace only one item per entity, e.g. A. Merkel, Mum Merkel, ...
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/anchorman/checkouts/latest/anchorman/generator/candidate.py:docstring of anchorman.generator.candidate.validate, line 6.)
Todo
Implement me ...clean(text .
(The original entry is located in /home/docs/checkouts/readthedocs.org/user_builds/anchorman/checkouts/latest/anchorman/main.py:docstring of anchorman.main.clean, line 5.)
Credits and contributions
We published this on GitHub and PyPI to share our solution with others, to get feedback, and to find contributors in the open-source community.
Thanks to Tarn Barford for the inspiration and the first steps.