Ontology Class
Working With Ontologies¶
skills-ml is introducing the CompetencyOntology class, for a rich, flexible representation of competencies, occupations, and their relationships with each other. The CompetencyOntology class is backed by JSON-LD, and based on Credential Engine's CTDL-ASN format for Competencies. The goal is to be able to read in any CTDL-ASN framework and produce a CompetencyOntology object for use throughout the skills-ml library.
Furthermore, skills-ml contains pre-mapped versions of open frameworks like ONET for use out of the box.
Competency¶
A competency, in the CTDL-ASN context, refers to some knowledge, skill, or ability that a person can possess or learn. Each competency contains:
- A unique identifier within the ontology. If you're familiar with ONET, think the table of contents identifiers (e.g. '1.A.1.a.1')
- Some basic textual information: a name (e.g. Oral Comprehension) and/or description (e.g. 'The ability to listen to and understand information and ideas presented through spoken words and sentences.'), and maybe a general textual category (e.g. Ability)
- Associative information with other competencies. A basic example is a parent/child relationship, for instance ONET's definition of 'Oral Comprehension' as the child of another competency called 'Verbal Abilities'. CTDL-ASN encodes this using the 'hasChild' and 'isChildOf' properties, and skills-ml uses the same properties. There are many other types of associations competencies can have with each other that the Competency class in skills-ml does not yet address; you can read more at the Credential Engine's definition of Competency.
The Competency class tracks all of this. It can be created using either keyword arguments in the class constructor or through a class method that loads from JSON-LD.
Basic Example¶
Using Python Constructor
from skills_ml.ontologies import Competency
dinosaur_riding = Competency(
    identifier='12345',
    name='Dinosaur Riding',
    description='Using the back of a dinosaur for transportation'
)
Using JSON-LD
from skills_ml.ontologies import Competency
dinosaur_riding = Competency.from_jsonld({
    '@type': 'Competency',
    '@id': '12345',
    'name': 'Dinosaur Riding',
    'description': 'Using the back of a dinosaur for transportation'
})
To aid in bi-directional searching, the Competency object is meant to include a parent/child relationship on both the parent and child objects. The add_parent and add_child methods modify both the parent and child objects to easily maintain this bi-directional relationship.
Example parent/child relationship¶
Using Python Constructor
from skills_ml.ontologies import Competency
dinosaur_riding = Competency(
    identifier='12345',
    name='Dinosaur Riding',
    description='Using the back of a dinosaur for transportation'
)
extreme_transportation = Competency(
    identifier='123',
    name='Extreme Transportation',
    description='Comically dangerous forms of transportation'
)
dinosaur_riding.add_parent(extreme_transportation)
print(dinosaur_riding.parents)
print(extreme_transportation.children)
Using JSON-LD
from skills_ml.ontologies import Competency
# The child references its parent by '@id' through 'isChildOf'
dinosaur_riding = Competency.from_jsonld({
    '@type': 'Competency',
    '@id': '12345',
    'name': 'Dinosaur Riding',
    'description': 'Using the back of a dinosaur for transportation',
    'isChildOf': [{'@type': 'Competency', '@id': '123'}]
})
# The parent references its child by '@id' through 'hasChild'
extreme_transportation = Competency.from_jsonld({
    '@type': 'Competency',
    '@id': '123',
    'name': 'Extreme Transportation',
    'description': 'Comically dangerous forms of transportation',
    'hasChild': [{'@type': 'Competency', '@id': '12345'}]
})
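To make the bi-directional bookkeeping concrete, here is a minimal sketch of how a single call could record a link on both objects. It uses only the standard library; the Node class is a hypothetical stand-in and is not the skills-ml Competency implementation.

```python
from dataclasses import dataclass, field


@dataclass(eq=False)  # keep identity-based hashing so nodes can be stored in sets
class Node:
    """Hypothetical stand-in for a competency; not the skills-ml class."""
    identifier: str
    name: str
    parents: set = field(default_factory=set, repr=False)
    children: set = field(default_factory=set, repr=False)

    def add_parent(self, parent):
        # Record the link on both objects so either side can be searched.
        self.parents.add(parent)
        parent.children.add(self)

    def add_child(self, child):
        # Mirror image of add_parent: delegate so the logic lives in one place.
        child.add_parent(self)


riding = Node('12345', 'Dinosaur Riding')
transport = Node('123', 'Extreme Transportation')
riding.add_parent(transport)

print([p.name for p in riding.parents])
print([c.name for c in transport.children])
```

Because both sets are updated in one call, looking up a node's parents and looking up a node's children never disagree.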
Occupation¶
An Occupation is a job or profession that a person can hold. CTDL-ASN does not define this, so skills-ml models the Occupation similarly to the Competency, albeit with far less detail. Each occupation contains:
- A unique identifier within the ontology. If you're familiar with ONET, think of an ONET SOC code (11-1011.00)
- Some basic textual information: a name (e.g. Civil Engineer), maybe a description.
- Associative information with other occupations. So far the only relationship modeled in skills-ml between occupations is a parent/child one, similarly to Competency. Going back to the ONET example, an occupation representing the major group (identifier 11) may be thought of as the parent of SOC code 11-1011.00.
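As an illustration of the major group relationship described above, the following sketch derives the parent identifier from a SOC-style code. The helper name soc_major_group is hypothetical and is not part of skills-ml; it only assumes the "two digits, hyphen, rest" shape shown in the example code 11-1011.00.

```python
def soc_major_group(soc_code):
    """Return the two-digit major group prefix of a code like '11-1011.00'."""
    major, _, rest = soc_code.partition('-')
    if not (major.isdigit() and len(major) == 2 and rest):
        raise ValueError(f'not a SOC-style code: {soc_code!r}')
    return major


print(soc_major_group('11-1011.00'))
```

An occupation with identifier '11' could then be attached as the parent of the occupation with identifier '11-1011.00'.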
Basic Example¶
Using Python Constructor
from skills_ml.ontologies import Occupation
dinosaur_rider = Occupation(
    identifier='9999',
    name='Dinosaur Rider',
)
Using JSON-LD
from skills_ml.ontologies import Occupation
dinosaur_rider = Occupation.from_jsonld({
    '@type': 'Occupation',
    '@id': '9999',
    'name': 'Dinosaur Rider'
})
CompetencyOccupationEdge¶
A CompetencyOccupationEdge is simply a relationship between a Competency and an Occupation. Currently, there are no further properties defined on this edge, though this will likely change in the future.
Basic Example¶
Using Python Constructor
from skills_ml.ontologies import CompetencyOccupationEdge
CompetencyOccupationEdge(
    occupation=dinosaur_rider,
    competency=dinosaur_riding
)
Using JSON-LD
from skills_ml.ontologies import CompetencyOccupationEdge
CompetencyOccupationEdge.from_jsonld({
    '@type': 'CompetencyOccupationEdge',
    '@id': 'competency=12345;occupation=9999',
    'competency': {'@type': 'Competency', '@id': '12345'},
    'occupation': {'@type': 'Occupation', '@id': '9999'}
})
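The JSON-LD example above uses an '@id' of the form 'competency=12345;occupation=9999'. The sketch below shows one way such an identifier could be composed and split again. Both helper names are hypothetical, and the format is inferred only from that example, so treat this as an assumption rather than the library's own logic.

```python
def edge_identifier(competency_id, occupation_id):
    """Build an identifier in the 'competency=...;occupation=...' shape."""
    return f'competency={competency_id};occupation={occupation_id}'


def parse_edge_identifier(edge_id):
    """Split such an identifier back into (competency_id, occupation_id)."""
    parts = dict(part.split('=', 1) for part in edge_id.split(';'))
    return parts['competency'], parts['occupation']


edge_id = edge_identifier('12345', '9999')
print(edge_id)
print(parse_edge_identifier(edge_id))
```

Note that this simple parser would break if an identifier itself contained a semicolon.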
CompetencyFramework¶
A CompetencyFramework represents a collection of competencies and some metadata about them. The identifiers for given Competencies are used to disambiguate between them. The metadata exists so any code that uses the CompetencyFramework object can pass on useful knowledge about the framework to its output.
The metadata has only two pieces of data:
- name: A machine-readable name. Should be in snake case (e.g. onet_ksat)
- description: A human-readable description.
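The snake-case convention for the name can be checked before a framework is built. This is a small standard-library sketch; is_snake_case and to_snake_case are hypothetical helpers, and skills-ml is not known to enforce the convention itself.

```python
import re

# Lowercase letters and digits, in groups separated by single underscores.
SNAKE_CASE = re.compile(r'[a-z0-9]+(?:_[a-z0-9]+)*')


def is_snake_case(name):
    """True if the whole string follows the snake_case pattern above."""
    return SNAKE_CASE.fullmatch(name) is not None


def to_snake_case(name):
    """Lowercase the alphanumeric words in a label and join them with underscores."""
    words = re.findall(r'[A-Za-z0-9]+', name)
    return '_'.join(word.lower() for word in words)


print(is_snake_case('onet_ksat'))
print(is_snake_case('Sample Framework'))
print(to_snake_case('Sample Framework'))
```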
Basic Example¶
Using Python Constructor
from skills_ml.ontologies import Competency, CompetencyFramework
framework = CompetencyFramework(
    name='sample_framework',  # machine-readable, snake case
    description='A few basic competencies',
    competencies=[
        Competency(identifier='a', name='Organization'),
        Competency(identifier='b', name='Communication Skills'),
        Competency(identifier='c', name='Cooking')
    ]
)
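The framework relies on identifiers to tell competencies apart. One way to picture that, as a sketch independent of skills-ml, is a value object whose equality and hash depend only on its identifier. The Item class below is hypothetical.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Item:
    """Hypothetical stand-in: equality and hashing use the identifier only."""
    identifier: str
    name: str = field(compare=False)


items = {
    Item('a', 'Organization'),
    Item('a', 'Organisation'),  # same identifier, so treated as the same item
    Item('b', 'Cooking'),
}
print(len(items))  # 2
```

With this design, adding an item whose identifier is already present does not create a duplicate, regardless of how its name is spelled.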
CompetencyOntology¶
An ontology represents a collection of competencies, a collection of occupations, and a collection of all relationships between competencies and occupations. The CompetencyOntology class represents each of these three collections using a set object. The identifiers for all of those objects are used to disambiguate between items in each of these sets. The JSON-LD representation of the ontology mirrors this internal structure.
Below is an example of the objects defined above arranged into a CompetencyOntology. For brevity, the descriptions are omitted.
Note in the Python example that importing the CompetencyOccupationEdge class is not necessary when using the ontology; the add_edge method of CompetencyOntology can simply take a competency and occupation directly.
Basic Example¶
Using Python Constructor
from skills_ml.ontologies import Competency, Occupation, CompetencyOntology
ontology = CompetencyOntology(
    competency_name='caveman_games',
    competency_description='Competencies Useful to Characters in NES title Caveman Games'
)
dinosaur_riding = Competency(identifier='12345', name='Dinosaur Riding')
extreme_transportation = Competency(identifier='123', name='Extreme Transportation')
dinosaur_riding.add_parent(extreme_transportation)
dinosaur_rider = Occupation(identifier='9999', name='Dinosaur Rider')
ontology.add_competency(dinosaur_riding)
ontology.add_competency(extreme_transportation)
ontology.add_occupation(dinosaur_rider)
ontology.add_edge(occupation=dinosaur_rider, competency=dinosaur_riding)
Using JSON-LD
from skills_ml.ontologies import CompetencyOntology
# The string must be valid JSON: one enclosing object, with double-quoted keys and values
jsonld_string = """{
    "name": "test_ontology",
    "competencies": [{
        "@type": "Competency",
        "@id": "12345",
        "name": "Dinosaur Riding",
        "description": "Using the back of a dinosaur for transportation",
        "isChildOf": [{"@type": "Competency", "@id": "123"}]
    }, {
        "@type": "Competency",
        "@id": "123",
        "name": "Extreme Transportation",
        "description": "Comically dangerous forms of transportation",
        "hasChild": [{"@type": "Competency", "@id": "12345"}]
    }],
    "occupations": [{
        "@type": "Occupation",
        "@id": "9999",
        "name": "Dinosaur Rider"
    }],
    "edges": [{
        "@type": "CompetencyOccupationEdge",
        "@id": "competency=12345;occupation=9999",
        "competency": {"@type": "Competency", "@id": "12345"},
        "occupation": {"@type": "Occupation", "@id": "9999"}
    }]
}"""
ontology = CompetencyOntology(jsonld_string=jsonld_string)
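Before handing a JSON-LD string to the constructor, it can help to confirm that it parses and that every edge points at a competency and occupation that are actually listed. The following is a standard-library sketch under the assumption that the document has the 'competencies', 'occupations' and 'edges' keys shown above; dangling_edges is a hypothetical helper, not a skills-ml function.

```python
import json

jsonld_string = """{
  "name": "test_ontology",
  "competencies": [{"@type": "Competency", "@id": "12345", "name": "Dinosaur Riding"}],
  "occupations": [{"@type": "Occupation", "@id": "9999", "name": "Dinosaur Rider"}],
  "edges": [{
    "@type": "CompetencyOccupationEdge",
    "@id": "competency=12345;occupation=9999",
    "competency": {"@type": "Competency", "@id": "12345"},
    "occupation": {"@type": "Occupation", "@id": "9999"}
  }]
}"""


def dangling_edges(document):
    """Return the @id of every edge that references an unlisted node."""
    competency_ids = {c['@id'] for c in document.get('competencies', [])}
    occupation_ids = {o['@id'] for o in document.get('occupations', [])}
    return [
        edge['@id']
        for edge in document.get('edges', [])
        if edge['competency']['@id'] not in competency_ids
        or edge['occupation']['@id'] not in occupation_ids
    ]


document = json.loads(jsonld_string)  # raises ValueError on malformed JSON
print(dangling_edges(document))
```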
Using URL
If you have access to a URL that contains compatible JSON-LD, you can send this URL right to the constructor.
ontology = CompetencyOntology(url='https://myhost.com/ontology.json')
Using Research Hub
The Data@Work Research Hub hosts a number of ontologies publicly. The Research Hub ontology base URL is bundled into skills-ml, so you can also just initialize one with the saved name on the Research Hub.
You can view the list of available ontologies at the Research Hub.
ontology = CompetencyOntology(research_hub_name='esco')
Creating CompetencyOntology from CandidateSkills¶
To evaluate a method of skill extraction, it can be useful to format the output (a collection of CandidateSkill objects) as a CompetencyOntology. Importing skills_ml.ontologies.from_candidate_skills.ontology_from_candidate_skills enables this conversion. At present, ontology_from_candidate_skills simply adds each of the found competencies to a bare ontology, and optionally associates them with the source object's occupation if it is tagged with one.
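The behaviour described above can be pictured with plain Python sets. This sketch does not call skills-ml: Found is a hypothetical stand-in for a CandidateSkill that carries only a skill name and an optional occupation code, and collect is a hypothetical helper.

```python
from collections import namedtuple

# Hypothetical stand-in for a CandidateSkill: a skill name plus an optional occupation.
Found = namedtuple('Found', ['skill_name', 'occupation'])


def collect(candidates):
    """Gather competencies, and link them to occupations when one is tagged."""
    competencies, occupations, edges = set(), set(), set()
    for found in candidates:
        competencies.add(found.skill_name)
        if found.occupation is not None:
            occupations.add(found.occupation)
            edges.add((found.skill_name, found.occupation))
    return competencies, occupations, edges


candidates = [
    Found('python', '15-1132.00'),
    Found('python', None),       # no occupation tagged, so no edge is added
    Found('cooking', '35-2014.00'),
]
competencies, occupations, edges = collect(candidates)
print(sorted(competencies))
print(sorted(edges))
```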
Included Ontologies¶
ONET¶
The skills_ml.ontologies.onet module contains an Onet class that inherits from CompetencyOntology. This class can be built from a hosted JSON-LD file on the Research Hub; by default, however, it builds the ontology during instantiation from a variety of files on the ONET site, using the latest version of ONET at the time of writing (db_v22_3):
- Content Model Reference.txt
- Knowledge.txt
- Skills.txt
- Abilities.txt
- Tools and Technology.txt
- Occupation Data.txt
from skills_ml.ontologies.onet import Onet
# This will take a while, as it downloads the relatively large files and processes them
ONET = Onet()
ONET.filter_by(lambda edge: 'forklift' in edge.competency.name)
If you pass in an ONET cache object, the raw ONET files can be cached on your filesystem so that building it the second time will be faster.
from skills_ml.storage import FSStore
from skills_ml.datasets.onet_cache import OnetSiteCache
from skills_ml.ontologies.onet import Onet
ONET = Onet(OnetSiteCache(FSStore('onet_cache')))
To build from the Research Hub, pass manual_build=False. This may be slightly quicker, and will be resilient to potential changes to the ONET site format.
ONET = Onet(manual_build=False)
ESCO¶
The skills_ml.ontologies.esco module contains an Esco class that inherits from CompetencyOntology and represents the European Skills, Competences, Qualifications and Occupations (ESCO) classification. By default, this class is built from a hosted JSON-LD file on the Research Hub, but you can also have it build directly from the ESCO site. Building directly from the ESCO site involves thousands of API calls, so we recommend building it from the Research Hub JSON-LD.
from skills_ml.ontologies.esco import Esco
ESCO = Esco() # will build from premade Research Hub JSON-LD
ESCO = Esco(manual_build=True) # will build from ESCO site, may take hours
Uses of Ontologies¶
Filtering¶
You can filter the graph to produce subsets based on the list of edges. This will return another CompetencyOntology object, so any code that takes an ontology as input will work on the subsetted graph.
You can optionally supply competency_name and competency_description keyword arguments to apply to the CompetencyFramework in the returned ontology object. This is necessary if you wish to send the CompetencyFramework object in the resulting ontology to algorithms in skills-ml.
# Return an ontology that consists only of competencies with 'python' in the name, along with their related occupations
ontology.filter_by(lambda edge: 'python' in edge.competency.name.lower())
# Return an ontology that consists only of occupations with 'software' in the name, along with their associated competencies
ontology.filter_by(lambda edge: 'software' in edge.occupation.name.lower())
# Return an ontology that is the intersection of 'python' competencies and 'software' occupations
ontology.filter_by(lambda edge: 'software' in edge.occupation.name.lower() and 'python' in edge.competency.name.lower())
# Return only competencies that have a parent competency containing 'software'
ontology.filter_by(lambda edge: any('software' in parent.name.lower() for parent in edge.competency.parents))
# Return an ontology with only 'python' competencies, and set a name/description for the resulting CompetencyFramework
ontology.filter_by(lambda edge: 'python' in edge.competency.name.lower(), competency_name='python', competency_description='Python-related competencies')
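Since filtering works on the list of edges, the resulting subset can be thought of as "the edges that pass the predicate, plus whatever competencies and occupations those edges mention". The sketch below illustrates that idea with plain tuples. It is an assumption about the semantics described above, not the filter_by implementation, and Edge and filter_edges are hypothetical names.

```python
from collections import namedtuple

# Hypothetical edge carrying only names, to keep the example short.
Edge = namedtuple('Edge', ['competency', 'occupation'])


def filter_edges(edges, predicate):
    """Keep matching edges and derive both node sets from what remains."""
    kept = {edge for edge in edges if predicate(edge)}
    competencies = {edge.competency for edge in kept}
    occupations = {edge.occupation for edge in kept}
    return kept, competencies, occupations


edges = {
    Edge('Python Programming', 'Software Developer'),
    Edge('Python Programming', 'Data Analyst'),
    Edge('Cooking', 'Chef'),
}
kept, competencies, occupations = filter_edges(
    edges, lambda edge: 'python' in edge.competency.lower()
)
print(sorted(competencies))
print(sorted(occupations))
```

Under this reading, a competency or occupation that appears in no surviving edge drops out of the filtered result.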
Skill Extraction: competencies-only¶
Many list-based skill extractors require a CompetencyFramework as input. This can be retrieved directly from the CompetencyOntology object.
from skills_ml.algorithms.skill_extractors import ExactMatchSkillExtractor
from skills_ml.job_postings.common_schema import JobPostingCollectionSample
skill_extractor = ExactMatchSkillExtractor(ontology.competency_framework)
for candidate_skill in skill_extractor.candidate_skills(JobPostingCollectionSample()):
    print(candidate_skill)
Skill Extraction: filtered competencies¶
If you wish to filter a CompetencyOntology and then use it for skill extraction, you must make sure it has a name and description, either through the optional filter_by keyword arguments or through modifying the CompetencyFramework instance directly.
from skills_ml.algorithms.skill_extractors import ExactMatchSkillExtractor
from skills_ml.job_postings.common_schema import JobPostingCollectionSample
# Option 1: Using filter_by keyword arguments (recommended)
competency_framework = ontology.filter_by(
    lambda edge: 'python' in edge.competency.name.lower(),
    competency_name='python',
    competency_description='Python-related competencies'
).competency_framework
# Option 2: Modifying competency_framework afterwards
competency_framework = ontology.filter_by(lambda edge: 'python' in edge.competency.name.lower()).competency_framework
competency_framework.name = 'python'
competency_framework.description = 'Python-related competencies'
skill_extractor = ExactMatchSkillExtractor(competency_framework)
for candidate_skill in skill_extractor.candidate_skills(JobPostingCollectionSample()):
    print(candidate_skill)
Skill Extraction: full ontology¶
The SocScopedExactMatchSkillExtractor requires both occupation and competency data, so it takes in the entire CompetencyOntology as input.
from skills_ml.algorithms.skill_extractors import SocScopedExactMatchSkillExtractor
from skills_ml.job_postings.common_schema import JobPostingCollectionSample
skill_extractor = SocScopedExactMatchSkillExtractor(ontology)
for candidate_skill in skill_extractor.candidate_skills(JobPostingCollectionSample()):
    print(candidate_skill)
Exporting as JSON-LD¶
You can export an ontology as a JSON-LD object for storage and import it again later:
import json
with open('out.json', 'w') as f:
    json.dump(ontology.jsonld, f)
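The round trip can be sketched with the standard library alone. The dictionary below is only a stand-in for ontology.jsonld, which is assumed here to be JSON-serializable as the json.dump call implies. The temporary path is likewise illustrative.

```python
import json
import os
import tempfile

# Stand-in for ontology.jsonld (assumed JSON-serializable).
exported = {'name': 'test_ontology', 'competencies': [], 'occupations': [], 'edges': []}

path = os.path.join(tempfile.mkdtemp(), 'out.json')
with open(path, 'w') as f:
    json.dump(exported, f)

# Read the file back as text, the form the jsonld_string keyword shown earlier takes.
with open(path) as f:
    jsonld_string = f.read()

restored = json.loads(jsonld_string)
print(restored == exported)
```

The text read back from the file is what would be passed to CompetencyOntology(jsonld_string=jsonld_string) to rebuild the ontology.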