skills_ml.datasets.cbsa_shapefile
Use the Census CBSA shapefile
download_shapefile
download_shapefile(cache_dir)
Download the TIGER 2015 CBSA shapefile
Downloads the zip archive and unzips its contents
Args
cache_dir (string) local path to download files to
Returns: (string) path to the extracted shapefile
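The unzip-into-cache step described above can be sketched with the standard library alone. This is a minimal illustration, not the library's implementation: `extract_shapefile` is a hypothetical helper, the download step is left out, and a tiny stand-in archive replaces the real Census file so the sketch runs offline.

```python
import os
import tempfile
import zipfile


def extract_shapefile(zip_path, cache_dir):
    """Unzip an archive into cache_dir and return the path of its .shp member.

    Hypothetical helper for illustration; not part of skills_ml.
    """
    os.makedirs(cache_dir, exist_ok=True)
    with zipfile.ZipFile(zip_path) as archive:
        archive.extractall(cache_dir)
        shp_members = [n for n in archive.namelist() if n.endswith(".shp")]
    if not shp_members:
        raise ValueError("archive contains no .shp file")
    return os.path.join(cache_dir, shp_members[0])


# Build a stand-in archive so the sketch needs no network access.
work_dir = tempfile.mkdtemp()
demo_zip = os.path.join(work_dir, "demo.zip")
with zipfile.ZipFile(demo_zip, "w") as archive:
    archive.writestr("demo.shp", b"placeholder bytes, not a real shapefile")
    archive.writestr("demo.dbf", b"placeholder bytes")

shapefile_path = extract_shapefile(demo_zip, os.path.join(work_dir, "cache"))
```

Returning the path to the `.shp` member mirrors the documented return value, so a caller can hand the result straight to a shapefile reader.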
skills_ml.datasets.cousub_ua
Retrieve County Subdivision->Urbanized Area crosswalk
cousub_ua
cousub_ua(city_cleaner)
Construct a County Subdivision->UA lookup table from Census data
Returns: dict { StateCode: { CountySubdivisionName: UA Code } }
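A nested dict of this shape is typically read with two chained lookups. The sketch below assumes string keys and uses placeholder values; the sample table and the `find_ua` helper are hypothetical and only illustrate the documented structure.

```python
# Placeholder data shaped like the documented return value:
# { StateCode: { CountySubdivisionName: UA Code } }
sample_cousub_ua = {
    "XX": {"example township": "00001"},
}


def find_ua(lookup, state_code, subdivision_name):
    """Return the UA code for a subdivision, or None if either key is missing."""
    return lookup.get(state_code, {}).get(subdivision_name)


ua_code = find_ua(sample_cousub_ua, "XX", "example township")
missing = find_ua(sample_cousub_ua, "YY", "example township")
```

Using `.get` with an empty-dict default avoids a `KeyError` when a state is absent from the table.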
skills_ml.datasets.job_titles
Process lists of job titles into a common format
skills_ml.datasets.job_titles.elasticsearch
Index job title/occupation pairs in Elasticsearch.
JobTitlesMasterIndexer
JobTitlesMasterIndexer(self, job_title_generator, alias_name, **kwargs)
Args: job_title_generator (iterable). Each record is expected to be a dict with keys 'Title' for the job title and 'Original Title' for the occupation
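As a rough illustration of the record shape described above, the generator below yields dicts with the two documented keys. The `job_title_records` helper and the sample pair are assumptions made for this sketch; how the indexer consumes the records is not shown in this reference.

```python
def job_title_records(pairs):
    """Yield dicts with the keys the indexer is documented to expect.

    Hypothetical helper; each input pair is (job_title, occupation).
    """
    for title, occupation in pairs:
        yield {"Title": title, "Original Title": occupation}


# Illustrative input only.
records = list(job_title_records([("Line Cook", "Cooks, Restaurant")]))
```

Because the argument only needs to be iterable, a generator like this lets large title lists be streamed rather than held in memory.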
skills_ml.datasets.job_titles.onet
Process ONET job titles into a common format
Onet_Title
Onet_Title(self, onet_cache)
An object representing job title data from different ONET files
Originally written by Kwame Porter Robinson
OnetTitleExtractor
OnetTitleExtractor(self, output_filename, onet_source, hash_function)
An object that creates a job titles CSV based on ONET data
skills_ml.datasets.nber_county_cbsa
Retrieve county->CBSA crosswalk file from the NBER
cbsa_lookup
cbsa_lookup()
Construct a County->CBSA lookup table from NBER data
Returns: dict. Each key is a (State Code, County FIPS code) tuple; each value is a (CBSA FIPS code, CBSA Name) tuple
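Since the keys are tuples, a lookup passes both parts together. The sketch below uses placeholder values and assumes the codes are strings, which this reference does not confirm; `county_to_cbsa` is a hypothetical helper.

```python
# Placeholder data shaped like the documented return value:
# (State Code, County FIPS code) -> (CBSA FIPS code, CBSA Name)
sample_cbsa_lookup = {
    ("XX", "001"): ("00000", "Example Metro Area"),
}


def county_to_cbsa(lookup, state_code, county_fips):
    """Return (cbsa_fips, cbsa_name), or None if the county has no entry."""
    return lookup.get((state_code, county_fips))


match = county_to_cbsa(sample_cbsa_lookup, "XX", "001")
no_match = county_to_cbsa(sample_cbsa_lookup, "XX", "999")
```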
skills_ml.datasets.negative_positive_dict
negative_positive_dict
negative_positive_dict()
Construct a dictionary of terms that are not considered part of a job title, including states, state abbreviations, and cities
Returns: dictionary of sets
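One plausible use of such a dictionary of sets is to drop place words from a raw title. The key names, sample terms, and the `strip_place_terms` helper below are all assumptions for illustration; the actual keys returned by `negative_positive_dict` are not listed in this reference.

```python
# Hypothetical keys and terms; only the dict-of-sets shape comes from the docs.
sample_terms = {
    "states": {"illinois", "texas"},
    "states_abv": {"il", "tx"},
    "cities": {"chicago", "austin"},
}


def strip_place_terms(title, term_sets):
    """Remove words that appear in any of the term sets (case-insensitive)."""
    excluded = set().union(*term_sets.values())
    kept = [word for word in title.split() if word.lower() not in excluded]
    return " ".join(kept)


cleaned = strip_place_terms("Nurse Chicago IL", sample_terms)
```

Sets make the membership test constant-time per word, which matters when cleaning many titles.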
skills_ml.datasets.onet_cache
OnetCache
OnetCache(self, s3_conn, s3_path, cache_dir)
An object that downloads and caches ONET files from S3
OnetSiteCache
OnetSiteCache(self, storage=None)
An object that downloads files from the ONET site
skills_ml.datasets.onet_source
Download ONET files from their site
OnetToMemoryDownloader
OnetToMemoryDownloader(self, /, *args, **kwargs)
Downloads newest version of ONET as of time of writing and returns it as text
OnetToDiskDownloader
OnetToDiskDownloader(self, /, *args, **kwargs)
skills_ml.datasets.partner_updaters
Update raw job postings from external partners
skills_ml.datasets.partner_updaters.usa_jobs
Update raw job postings from the USAJobs API
USAJobsUpdater
USAJobsUpdater(self, auth_key, key_email, session=None)
skills_ml.datasets.place_ua
Retrieve Census Place->Urbanized Area crosswalk
place_ua
place_ua(city_cleaner)
Construct a Place->UA lookup table from Census data
Returns: dict { StateCode: { PlaceName: UA Code } }
skills_ml.datasets.sba_city_county
Retrieve county lookup tables from the SBA for each state
county_lookup
county_lookup()
Retrieve county lookup tables if they are not already cached
Returns: (dict) each key is a state, each value is a dict {city_name: (fips_county_code, county_name)}
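The city->county table can be chained with a county->CBSA table such as the one `cbsa_lookup` is documented to return. The sketch below is written under a loud assumption: that the state key and county code use the same format in both tables, which this reference does not confirm. All data and the `city_to_cbsa` helper are placeholders.

```python
# Placeholder tables shaped like the documented return values.
sample_county_lookup = {"XX": {"example city": ("001", "Example County")}}
sample_cbsa_table = {("XX", "001"): ("00000", "Example Metro Area")}


def city_to_cbsa(state, city_name, counties, cbsas):
    """Resolve a city to its CBSA via its county; None if any step fails.

    Assumes both tables share state and county code formats (unverified).
    """
    county = counties.get(state, {}).get(city_name)
    if county is None:
        return None
    county_fips, _county_name = county
    return cbsas.get((state, county_fips))


resolved = city_to_cbsa("XX", "example city", sample_county_lookup, sample_cbsa_table)
unresolved = city_to_cbsa("XX", "nowhere", sample_county_lookup, sample_cbsa_table)
```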
skills_ml.datasets.skill_importances
Process lists of occupation skill importances into a common format
skills_ml.datasets.skill_importances.onet
Process ONET data to create a dataset with occupations and their skill importances
OnetSkillImportanceExtractor
OnetSkillImportanceExtractor(self, storage, output_dataset_name, hash_function=None)
An object that creates a skills importance CSV based on ONET data
Originally written by Kwame Porter Robinson
skills_ml.datasets.skills
Process lists of skills into a common format
skills_ml.datasets.skills.ceasn_from_onet
skills_ml.datasets.skills.onet_ksat
Process ONET skill lists of various types into a common format
OnetSkillListProcessor
OnetSkillListProcessor(self, onet_source, output_filename, hash_function, ksa_types=None)
An object that creates a skills CSV based on ONET data
Originally written by Kwame Porter Robinson
skills_ml.datasets.ua_cbsa
Retrieve Urbanized Area->CBSA crosswalk
ua_cbsa
ua_cbsa()
Construct a UA->CBSA lookup table from Census data
Returns: dict { UA Fips: [(CBSA FIPS, CBSA Name)] }
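Each value here is a list, so one urbanized area may map to more than one CBSA and callers should iterate rather than expect a single match. A minimal sketch with placeholder data follows; `cbsa_names_for_ua` is a hypothetical helper.

```python
# Placeholder data shaped like the documented return value:
# { UA Fips: [(CBSA FIPS, CBSA Name)] }
sample_ua_cbsa = {
    "00001": [
        ("00000", "Example Metro Area"),
        ("99999", "Another Example Area"),
    ],
}


def cbsa_names_for_ua(lookup, ua_fips):
    """Return the names of every CBSA linked to a UA (empty list if none)."""
    return [name for _code, name in lookup.get(ua_fips, [])]


names = cbsa_names_for_ua(sample_ua_cbsa, "00001")
```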