
skills_ml.datasets.cbsa_shapefile

Use the Census CBSA shapefile

download_shapefile

download_shapefile(cache_dir)

Download the TIGER 2015 CBSA shapefile.

Downloads the zip archive and unzips its contents.

Args: cache_dir (string) local path to download files to

Returns: (string) path to the extracted shapefile
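The unzip-and-return-path step described above can be sketched as follows. This is a minimal illustration, not the library's implementation: `extract_shapefile` is a hypothetical helper, and it takes the archive as bytes (a tiny stand-in archive built in memory) so that it runs without network access.

```python
import io
import os
import tempfile
import zipfile


def extract_shapefile(archive_bytes, cache_dir):
    """Unzip an in-memory archive into cache_dir and return the .shp path.

    Hypothetical helper: it only mirrors the documented behaviour of
    unzipping the archive and returning the path to the shapefile.
    """
    os.makedirs(cache_dir, exist_ok=True)
    with zipfile.ZipFile(io.BytesIO(archive_bytes)) as archive:
        archive.extractall(cache_dir)
        names = archive.namelist()
    shp_names = [name for name in names if name.endswith(".shp")]
    if not shp_names:
        raise ValueError("archive does not contain a .shp file")
    return os.path.join(cache_dir, shp_names[0])


# Build a small placeholder archive; the file names are made up.
buffer = io.BytesIO()
with zipfile.ZipFile(buffer, "w") as archive:
    archive.writestr("example_cbsa.shp", b"placeholder")
    archive.writestr("example_cbsa.dbf", b"placeholder")

demo_cache_dir = tempfile.mkdtemp()
shapefile_path = extract_shapefile(buffer.getvalue(), demo_cache_dir)
```

Returning the path rather than an open file leaves the choice of shapefile reader to the caller.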

skills_ml.datasets.cousub_ua

Retrieve County Subdivision->Urbanized Area crosswalk

cousub_ua

cousub_ua(city_cleaner)

Construct a County Subdivision->UA lookup table from Census data.

Returns: dict { StateCode: { CountySubdivisionName: UA Code } }

skills_ml.datasets.job_titles

Process lists of job titles into a common format

skills_ml.datasets.job_titles.elasticsearch

Index job title/occupation pairs in Elasticsearch.

JobTitlesMasterIndexer

JobTitlesMasterIndexer(self, job_title_generator, alias_name, **kwargs)

Args: job_title_generator (iterable). Each record is expected to be a dict with keys 'Title' for the job title and 'Original Title' for the occupation
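As an illustration of the record shape described above, the following sketch builds an iterable of dicts with the 'Title' and 'Original Title' keys. The helper name `job_title_records` and the sample rows are hypothetical; only the two key names come from the documentation.

```python
def job_title_records(rows):
    """Yield dicts with the two keys the indexer is documented to expect."""
    for title, occupation in rows:
        yield {"Title": title, "Original Title": occupation}


# Made-up (job title, occupation) pairs, for illustration only.
sample_rows = [
    ("Example Job Title A", "Example Occupation 1"),
    ("Example Job Title B", "Example Occupation 1"),
]

records = list(job_title_records(sample_rows))
```

A generator like this could presumably be passed as `job_title_generator`, since the argument is documented only as an iterable.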

skills_ml.datasets.job_titles.onet

Process ONET job titles into a common format

Onet_Title

Onet_Title(self, onet_cache)

An object representing job title data from different ONET files

Originally written by Kwame Porter Robinson

OnetTitleExtractor

OnetTitleExtractor(self, output_filename, onet_source, hash_function)

An object that creates a job titles CSV based on ONET data

skills_ml.datasets.nber_county_cbsa

Retrieve county->CBSA crosswalk file from the NBER

cbsa_lookup

cbsa_lookup()

Construct a County->CBSA lookup table from NBER data.

Returns: dict. Each key is a (State Code, County FIPS code) tuple; each value is a (CBSA FIPS code, CBSA Name) tuple.
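A lookup against a table of this shape might look like the sketch below. The sample dict uses placeholder codes and names rather than real NBER values, and `find_cbsa` is a hypothetical helper.

```python
# Placeholder table in the documented shape:
# (State Code, County FIPS code) -> (CBSA FIPS code, CBSA Name)
sample_cbsa_lookup = {
    ("ST", "001"): ("10000", "Example Metro Area"),
    ("ST", "003"): ("10000", "Example Metro Area"),
}


def find_cbsa(lookup, state_code, county_fips):
    """Return the (CBSA FIPS, CBSA name) tuple, or None if the county is absent."""
    return lookup.get((state_code, county_fips))


found = find_cbsa(sample_cbsa_lookup, "ST", "001")
missing = find_cbsa(sample_cbsa_lookup, "ST", "999")
```

Using `dict.get` keeps counties that fall outside any CBSA from raising a `KeyError`.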

skills_ml.datasets.negative_positive_dict

negative_positive_dict

negative_positive_dict()

Construct a dictionary of terms that are considered not to be part of a job title, including state names, state abbreviations, and cities.

Returns: dict of sets
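One way such a dictionary of sets could be used is to drop place terms from a raw title. The sketch below is an assumption about usage, not the library's method: the key names, sample terms, and the `strip_place_terms` helper are all hypothetical.

```python
# Hypothetical key names and terms; the real dictionary's keys are not documented here.
sample_terms = {
    "states": {"examplestate"},
    "state_abbreviations": {"ex"},
    "cities": {"exampleville"},
}


def strip_place_terms(title, term_sets):
    """Lowercase the title and drop any token found in one of the term sets."""
    excluded = set().union(*term_sets.values())
    tokens = title.lower().split()
    return " ".join(token for token in tokens if token not in excluded)


cleaned_title = strip_place_terms("Nurse Exampleville EX", sample_terms)
```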

skills_ml.datasets.onet_cache

OnetCache

OnetCache(self, s3_conn, s3_path, cache_dir)

An object that downloads and caches ONET files from S3

OnetSiteCache

OnetSiteCache(self, storage=None)

An object that downloads files from the ONET site

skills_ml.datasets.onet_source

Download ONET files from their site

OnetToMemoryDownloader

OnetToMemoryDownloader(self, /, *args, **kwargs)

Downloads newest version of ONET as of time of writing and returns it as text

OnetToDiskDownloader

OnetToDiskDownloader(self, /, *args, **kwargs)

skills_ml.datasets.partner_updaters

Update raw job postings from external partners

skills_ml.datasets.partner_updaters.usa_jobs

Update raw job postings from the USAJobs API

USAJobsUpdater

USAJobsUpdater(self, auth_key, key_email, session=None)

skills_ml.datasets.place_ua

Retrieve Census Place->Urbanized Area crosswalk

place_ua

place_ua(city_cleaner)

Construct a Place->UA lookup table from Census data.

Returns: dict { StateCode: { PlaceName: UA Code } }
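The nested return shape (also used by `cousub_ua`) can be queried in two steps, state first and then place. In the sketch below the table contents are placeholders, and it is an assumption that `city_cleaner` is a callable that normalizes a place name; `clean_city` and `find_ua` are hypothetical names.

```python
def clean_city(name):
    """Hypothetical stand-in for a city_cleaner: trim and lowercase the name."""
    return name.strip().lower()


# Placeholder table in the documented shape: { StateCode: { PlaceName: UA Code } }
sample_place_ua = {
    "ST": {"exampleville": "00001", "sampletown": "00002"},
}


def find_ua(lookup, state_code, place_name, city_cleaner):
    """Return the UA code for a place, or None if the state or place is absent."""
    return lookup.get(state_code, {}).get(city_cleaner(place_name))


ua_code = find_ua(sample_place_ua, "ST", "  Exampleville ", clean_city)
unknown_state = find_ua(sample_place_ua, "ZZ", "Exampleville", clean_city)
```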

skills_ml.datasets.sba_city_county

Retrieve county lookup tables from the SBA for each state

county_lookup

county_lookup()

Retrieve county lookup tables if they are not already cached

Returns: (dict) each key is a state, each value is a dict {city_name: (fips_county_code, county_name)}
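Because each city maps to a (fips_county_code, county_name) tuple, the table can also be scanned in the other direction, for example to list the cities in one county. The sample data and the `cities_in_county` helper below are hypothetical.

```python
# Placeholder table in the documented shape:
# state -> {city_name: (fips_county_code, county_name)}
sample_county_lookup = {
    "ST": {
        "Exampleville": ("001", "Example County"),
        "Sampletown": ("001", "Example County"),
        "Otherburg": ("003", "Other County"),
    }
}


def cities_in_county(lookup, state, fips_county_code):
    """Return the sorted city names in a state that share a county FIPS code."""
    cities = lookup.get(state, {})
    return sorted(
        city
        for city, (fips, _county_name) in cities.items()
        if fips == fips_county_code
    )


example_cities = cities_in_county(sample_county_lookup, "ST", "001")
```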

skills_ml.datasets.skill_importances

Process lists of occupation skill importances into a common format

skills_ml.datasets.skill_importances.onet

Process ONET data to create a dataset with occupations and their skill importances

OnetSkillImportanceExtractor

OnetSkillImportanceExtractor(self, storage, output_dataset_name, hash_function=None)

An object that creates a skills importance CSV based on ONET data

Originally written by Kwame Porter Robinson

skills_ml.datasets.skills

Process lists of skills into a common format

skills_ml.datasets.skills.ceasn_from_onet

skills_ml.datasets.skills.onet_ksat

Process ONET skill lists of various types into a common format

OnetSkillListProcessor

OnetSkillListProcessor(self, onet_source, output_filename, hash_function, ksa_types=None)

An object that creates a skills CSV based on ONET data

Originally written by Kwame Porter Robinson

skills_ml.datasets.ua_cbsa

Retrieve Urbanized Area->CBSA crosswalk

ua_cbsa

ua_cbsa()

Construct a UA->CBSA lookup table from Census data.

Returns: dict { UA Fips: [(CBSA FIPS, CBSA Name)] }
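Each value here is a list, so one urbanized area may map to several CBSAs. A minimal sketch of reading such a table, with placeholder codes and a hypothetical `cbsa_names_for_ua` helper:

```python
# Placeholder table in the documented shape: { UA Fips: [(CBSA FIPS, CBSA Name)] }
sample_ua_cbsa = {
    "00001": [
        ("10000", "Example Metro Area"),
        ("20000", "Second Example Metro Area"),
    ],
    "00002": [("10000", "Example Metro Area")],
}


def cbsa_names_for_ua(lookup, ua_fips):
    """Return the names of every CBSA listed for a UA (empty list if unknown)."""
    return [name for _cbsa_fips, name in lookup.get(ua_fips, [])]


names_for_first_ua = cbsa_names_for_ua(sample_ua_cbsa, "00001")
```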