deid.dicom package

Submodules

deid.dicom.fields module

class deid.dicom.fields.DicomField(element, name, uid, is_filemeta=False)[source]

Bases: object

A dicom field.

A dicom field holds the element, and a string that represents the entire nested structure (e.g., SequenceName__CodeValue).

name_contains(expression)[source]

Determine if a name contains a pattern or expression.

Use re to search a field for a regular expression, meaning the name, the keyword (nested) or the string tag.

  • name.lower: includes nested keywords (e.g., Sequence_Child)
  • self.tag: is the string version of the tag
  • self.element.name: is the human friendly name “Sequence Child”
  • self.element.keyword: is the name without nesting “Child”
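
As a rough illustration of the matching described above, the sketch below runs re.search over several name forms of one field. The FieldNames class and its attributes are hypothetical stand-ins, not deid's DicomField, and the case-insensitive matching is an assumption.

```python
import re
from dataclasses import dataclass


@dataclass
class FieldNames:
    """Hypothetical stand-in for the name forms a field carries."""
    nested_name: str  # e.g. "Sequence__Child"
    tag: str          # string version of the tag
    human_name: str   # human friendly name
    keyword: str      # keyword without nesting


def name_contains(field, expression):
    """Return True if the expression is found in any of the name forms."""
    candidates = (
        field.nested_name.lower(),
        field.tag,
        field.human_name,
        field.keyword,
    )
    # Assumption: matching ignores case.
    return any(re.search(expression, c, re.IGNORECASE) for c in candidates)


field = FieldNames(
    nested_name="ReferencedStudySequence__PatientID",
    tag="(0010, 0020)",
    human_name="Patient ID",
    keyword="PatientID",
)
print(name_contains(field, "patient"))    # found in the nested name
print(name_contains(field, "0010"))       # found in the string tag
print(name_contains(field, "StudyDate"))  # found in none of the forms
```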

select_matches(expression)[source]

Determine whether the element has a specific selected attribute

property stripped_tag

Return the stripped element tag

property tag

Return a string of the element tag.

value_contains(expression)[source]

Use re to search a field value for a regular expression

deid.dicom.fields.expand_field_expression(field, dicom, contenders=None)[source]

Get a list of fields based on an expression.

If no expression is found, return the single field. Options for fields include:

  • endswith: filter to fields that end with the expression
  • startswith: filter to fields that start with the expression
  • contains: filter to fields that contain the expression
  • select: filter based on DICOM element properties
  • allfields: include all fields
  • exceptfields: filter to all fields except those listed ( | separated)

Returns: a list of DicomField objects
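
The options above can be sketched over a plain list of field names. This is only an illustration: the "option:value" syntax, the case-insensitive comparison, and the expand_names helper are assumptions, and the select option is left out because it depends on DICOM element properties.

```python
import re


def expand_names(expression, names):
    """Expand an expression into the matching field names (illustrative only)."""
    if expression.lower() == "allfields":
        return list(names)
    option, sep, value = expression.partition(":")
    if not sep:
        # No option found: treat the expression as a single field name.
        return [expression]
    option = option.lower()
    if option == "startswith":
        return [n for n in names if n.lower().startswith(value.lower())]
    if option == "endswith":
        return [n for n in names if n.lower().endswith(value.lower())]
    if option == "contains":
        return [n for n in names if re.search(value, n, re.IGNORECASE)]
    if option == "exceptfields":
        excluded = {v.strip().lower() for v in value.split("|")}
        return [n for n in names if n.lower() not in excluded]
    raise ValueError(f"unsupported option in this sketch: {option}")


names = ["PatientID", "PatientName", "StudyDate", "SeriesDate"]
print(expand_names("endswith:Date", names))
print(expand_names("exceptfields:PatientID|StudyDate", names))
print(expand_names("PatientID", names))
```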

deid.dicom.fields.extract_item(item, prefix=None, entry=None)[source]

Extract values from a dicom sequence depending on the type.

A helper for extracting sequences: values are pulled from an item of a dicom sequence according to the item's type.

Parameters:item (an item from a sequence.) –
deid.dicom.fields.extract_sequence(sequence, prefix=None)[source]

Extract a sequence recursively.

Return a pydicom.sequence.Sequence, recursively, as a flattened collection of items. For example, a FieldB nested under FieldA would be returned as:

{'FieldA__FieldB': '111111'}

Parameters:
  • sequence (the sequence to extract, should be pydicom.sequence.Sequence) –
  • prefix (the parent name) –
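
The recursive flattening can be sketched with plain dictionaries and lists standing in for datasets and sequences. The flatten helper is hypothetical and does not use pydicom; it only shows how a parent prefix is joined to a child name with a double underscore.

```python
def flatten(item, prefix=None):
    """Recursively flatten nested dicts/lists into {'Parent__Child': value}."""
    flat = {}
    if isinstance(item, dict):
        for key, value in item.items():
            name = f"{prefix}__{key}" if prefix else key
            flat.update(flatten(value, prefix=name))
    elif isinstance(item, list):
        # A "sequence": every entry shares the parent's prefix, so a later
        # entry with the same child name overwrites an earlier one here.
        for entry in item:
            flat.update(flatten(entry, prefix=prefix))
    else:
        flat[prefix] = item
    return flat


nested = {"FieldA": [{"FieldB": "111111"}], "FieldC": "222222"}
print(flatten(nested))
# {'FieldA__FieldB': '111111', 'FieldC': '222222'}
```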
deid.dicom.fields.get_fields(dicom, skip=None, expand_sequences=True, seen=None)[source]

Expand all dicom fields into a list.

Each entry is a DicomField. If we find a sequence, we unwrap it and represent the location with the name (e.g., Sequence__Child)

deid.dicom.filter module

deid.dicom.filter.apply_filter(dicom, field, filter_name, value)[source]

Essentially a switch statement that applies a filter to a dicom file.

Parameters:
  • dicom (the pydicom.dataset Dataset (pydicom.read_file)) –
  • field (the name of the field to apply the filter to) –
  • filter_name (the name of the filter to apply (e.g., contains)) –
  • value (the value to set, if filter_name is valid) –
deid.dicom.filter.compareBase(self, field, expression, func, ignore_case=True)[source]

Search a field for an expression.

compareBase takes either re.search (for contains) or re.match (for matches) and returns True if the given regular expression is contained or matched.
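
The difference between the two functions can be seen in a small standalone sketch. The compare helper below is hypothetical and works on a plain string rather than a dicom field.

```python
import re


def compare(value, expression, func, ignore_case=True):
    """Apply re.search or re.match to a value and report whether it hit."""
    flags = re.IGNORECASE if ignore_case else 0
    return func(expression, str(value), flags) is not None


value = "Dr. SMITH"
print(compare(value, "smith", re.search))                     # found anywhere in the value
print(compare(value, "smith", re.match))                      # re.match anchors at the start
print(compare(value, "smith", re.search, ignore_case=False))  # case now matters
```

re.search scans the whole value, while re.match only succeeds if the pattern matches from the first character.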

deid.dicom.filter.contains(self, field, expression)[source]

Determine if a field value contains an expression.

contains returns true if the value of the identifier contains the string argument anywhere within it; otherwise, it returns false.

deid.dicom.filter.empty(self, field)[source]

Determine if the value is empty.

Empty returns True if the value is found to be “”. If the field is not present for the dicom, then we return False (missing != empty)
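
The distinction between a missing field and an empty one can be sketched with a dictionary standing in for the dicom header. Both helpers are hypothetical illustrations of the rule stated above, not the library's code.

```python
def is_missing(header, field):
    """True when the field is absent entirely (the lookup yields None)."""
    return header.get(field) is None


def is_empty(header, field):
    """True only when the field is present and its value is ""."""
    return field in header and header[field] == ""


header = {"PatientID": "12345", "PatientName": ""}
print(is_empty(header, "PatientName"), is_missing(header, "PatientName"))
print(is_empty(header, "StudyDate"), is_missing(header, "StudyDate"))
```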

deid.dicom.filter.endsWith(self, field, term)[source]

Determine if a field value ends with an expression.

endsWith returns true if the value of the identifier ends with the string argument; otherwise, it returns false.

deid.dicom.filter.equals(self, field, term)[source]

Returns true if the value of the identifier exactly equals the string argument; otherwise, it returns false.

deid.dicom.filter.equalsBase(self, field, term, ignore_case=True, not_equals=False)[source]

Base of equals, with a variable to ignore case (default True).

deid.dicom.filter.matches(self, field, expression)[source]

Determine if a field value matches an expression.

matches returns true if the value of the identifier matches the regular expression specified in the string argument; otherwise, it returns false.

deid.dicom.filter.missing(self, field)[source]

Determine if the dicom is missing a field.

Missing returns True if the dicom is missing the field entirely. This means that the entire field is None.

deid.dicom.filter.notContains(self, field, expression)[source]

Determine if a field value does not contain an expression.

notContains returns true if the value of the identifier does not contain the string argument anywhere within it; otherwise, it returns false.

deid.dicom.filter.notEquals(self, field, term)[source]
deid.dicom.filter.startsWith(self, field, term)[source]

Determine if a field value starts with an expression.

startsWith returns true if the value of the identifier starts with the string argument; otherwise, it returns false.

deid.dicom.groups module

deid.dicom.groups.extract_fields_list(dicom, actions, fields=None)[source]

Given a list of actions for a named group (a list) extract values from the dicom based on the list of actions provided. This function always returns a list intended to update some lookup to be used to further process the dicom.

deid.dicom.groups.extract_values_list(dicom, actions, fields=None)[source]

Given a list of actions for a named group (a list) extract values from the dicom based on the list of actions provided. This function always returns a list intended to update some lookup to be used to further process the dicom.

deid.dicom.header module

deid.dicom.header.get_identifiers(dicom_files, force=True, config=None, strip_sequences=False, remove_private=False, disable_skip=False, expand_sequences=True)[source]

Extract all identifiers from a dicom image.

This function returns a lookup by file name, where each value indexed includes a dictionary of nested fields (indexed by nested tag).

Parameters:
  • dicom_files (the dicom file(s) to extract from) –
  • force (force reading the file (default True)) –
  • config (if None, uses default in provided module folder) –
  • strip_sequences (if True, remove all sequences) –
  • remove_private (remove private tags) –
  • disable_skip (do not skip over protected fields) –
  • expand_sequences (if True, expand sequences. otherwise, skips) –
deid.dicom.header.remove_private_identifiers(dicom_files, save=True, overwrite=False, output_folder=None, force=True)[source]

Remove private identifiers.

remove_private_identifiers is a wrapper for the simple call to dicom.remove_private_tags. It reads in the files for the user and saves them accordingly.

deid.dicom.header.replace_identifiers(dicom_files, ids=None, deid=None, save=False, overwrite=False, output_folder=None, force=True, config=None, strip_sequences=False, remove_private=False, disable_skip=False)[source]

Replace identifiers.

Replace identifiers using pydicom. This can be slow when writing and saving new files. If you want to replace sequences, they need to be extracted with get_identifiers with expand_sequences set to True.

deid.dicom.parser module

class deid.dicom.parser.DicomParser(dicom_file, recipe=None, config=None, force=True, disable_skip=False)[source]

Bases: object

Parse a dicom, performing one or more actions on fields.

A dicom parser serves as a cache to read in all fields from a dicom file. For each, we store the element and child elements.

add_field(field, value)[source]

Add a field to the dicom.

If it’s already present, update the value.

blank_field(field)[source]

Blank a field

define(name, value)[source]

Add a function or variable to the lookup for later usage.

This can be used for functions, lists, or variables.

delete_field(field)[source]

Delete a field from the dicom.

We do this by way of parsing all nested levels of a tag into actual tags, and deleting the child node.

property excluded_from_deletion

Return a once-evaluated list of fields that are not removed by REMOVE ALL or REMOVE SomeField, because they later have to be changed by REPLACE or JITTER. This allows fields to be whitelisted from REMOVE ALL/SomeField so that they can still be changed if needed (e.g., for obfuscation).
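
One way to picture this whitelisting is the sketch below, which assumes actions are dictionaries with "action", "field" and "value" keys and uses a plain dictionary as the header. The helper names and data are hypothetical, and a real recipe is parsed differently.

```python
def excluded_from_deletion(actions):
    """Fields that a later REPLACE or JITTER action still needs."""
    return {a["field"] for a in actions if a["action"] in ("REPLACE", "JITTER")}


def apply_remove_all(header, actions):
    """Drop every field except those excluded from deletion."""
    protected = excluded_from_deletion(actions)
    return {name: value for name, value in header.items() if name in protected}


actions = [
    {"action": "REMOVE", "field": "ALL"},
    {"action": "REPLACE", "field": "PatientID", "value": "anon-001"},
    {"action": "JITTER", "field": "StudyDate", "value": "31"},
]
header = {"PatientID": "12345", "PatientName": "Doe^Jane", "StudyDate": "20200101"}
remaining = apply_remove_all(header, actions)
print(remaining)
# {'PatientID': '12345', 'StudyDate': '20200101'}
```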

find_by_name(name)[source]

Find fields by name.

Given a string, find all field objects that contain the name. Name can correspond to:

  • a string of the tag, with or without the parens and comma/space
  • a keyword
  • a field name
find_by_values(values)[source]

Find fields by values.

Given a list of values, find fields in the dicom that contain any of those values, as determined by a regular expression search.

get_fields(expand_sequences=True)[source]

Expand all dicom fields into a list, where each entry is a DicomField. If we find a sequence, we unwrap it and represent the location with the name (e.g., Sequence__Child).

get_nested_field(field, return_parent=False)[source]

Retrieve a nested field.

Based on a DicomField, return the one referenced in self.dicom. If a delete is needed, then the parent should be returned as well.

property keep

Return a list of fields to keep in their original form, as defined by all KEEP actions in the recipe. Those fields are not impacted by REPLACE/JITTER actions.

load(dicom_file, force=True)[source]

Load the dicom file.

Ensure that the dicom file exists, and use full path. Here we load the file, and save the dicom, dicom_file, and dicom_name.

parse(strip_sequences=False, remove_private=False)[source]

Parse the dicom.

The parse action corresponds to iterating through fields, and for each one, saving a data structure with the full element, the string (with nested representation of the keywords) and the tag. We want to save all three in a flat list that is easy to search over, and also build up actions for the lookup on the first parsing.

perform_action(field, value, action, filemeta=False)[source]

Perform an action on a field.

perform action takes an action (dictionary with field, action, value) and performs the action on the loaded dicom.

Parameters:
  • field (a field for expand) –
  • value (field value) –
  • action (the action from the parsed deid to take) –
      • “field” (eg, PatientID): the header field to process
      • “action” (eg, REPLACE): what to do with the field
      • “value”: if needed, the field from the response to replace with
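
A minimal sketch of dispatching on such an action dictionary follows. It uses a plain dictionary as the header, the apply_action helper is hypothetical, and only ADD, REPLACE, BLANK and REMOVE are covered; treating ADD and REPLACE identically is taken from the replace_field description in this reference.

```python
def apply_action(header, action):
    """Apply one action dictionary to a header dictionary, in place."""
    field = action["field"]
    kind = action["action"].upper()
    if kind in ("ADD", "REPLACE"):
        # Both set the value; ADD also covers fields not yet present.
        header[field] = action.get("value")
    elif kind == "BLANK":
        if field in header:
            header[field] = ""
    elif kind == "REMOVE":
        header.pop(field, None)
    else:
        raise ValueError(f"unsupported action in this sketch: {kind}")
    return header


header = {"PatientID": "12345", "PatientName": "Doe^Jane", "StudyDate": "20200101"}
apply_action(header, {"field": "PatientID", "action": "REPLACE", "value": "anon-001"})
apply_action(header, {"field": "PatientName", "action": "BLANK"})
apply_action(header, {"field": "StudyDate", "action": "REMOVE"})
print(header)
# {'PatientID': 'anon-001', 'PatientName': ''}
```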
remove_private()[source]

Remove private tags from the loaded dicom

replace_field(field, value)[source]

Replace a value in a field.

This uses the same function as ADD, but likely the dicom has the value.

reset_preamble()[source]

Reset the preamble.

save(filename, overwrite=False)[source]

Save a dicom to file.

property skip

Return a list of fields to skip, as defined in the self.config

deid.dicom.tags module

deid.dicom.tags.add_tag(identifier, VR='ST', VM=None, name=None, keyword=None)[source]

Add tag will take a string for a tag (e.g., ) and define a new tag for it. By default, we give the type “Short Text.”

deid.dicom.tags.find_tag(term, VR=None, VM=None, retired=False)[source]

find_tag will search over tags in the DicomDictionary and return the tags found to match some term.

deid.dicom.tags.get_private(dicom)[source]

Get private tags.

Parameters:dicom (the pydicom.dataset Dataset (pydicom.read_file)) –
deid.dicom.tags.get_tag(field)[source]

get_tag will return a dictionary with tag indexed by field. For each entry, a dictionary lookup is included with VR.

Parameters:field (the keyword to get tag for, eg "PatientIdentityRemoved") –
deid.dicom.tags.has_private(dicom)[source]

has_private will return True if the header has private tags

Parameters:dicom (the pydicom.dataset Dataset (pydicom.read_file)) –
deid.dicom.tags.remove_sequences(dicom)[source]

Remove sequences from a dicom by removing the associated tag. We use dicom.iterall() to get all nested sequences.

Parameters:dicom (the loaded dicom to remove sequences) –
deid.dicom.tags.update_tag(dicom, field, value)[source]

update_tag will update a value in the header if it exists; if not, nothing is added. This check is the only difference between this function and change_tag. If the user wants to add a value (that might not exist), the function add_tag should be used with a private identifier as a string.

Parameters:
  • dicom (the pydicom.dataset Dataset (pydicom.read_file)) –
  • field (the name of the field to update) –
  • value (the value to set, if name is a valid tag) –
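
The "update only if present" rule can be sketched with a dictionary standing in for the header. The update_if_present helper is a hypothetical name, not the library function.

```python
def update_if_present(header, field, value):
    """Set the value only when the field already exists; never add it."""
    if field in header:
        header[field] = value
    return header


header = {"PatientID": "12345"}
update_if_present(header, "PatientID", "anon-001")   # existing field is updated
update_if_present(header, "PatientName", "Unknown")  # absent field is not added
print(header)
# {'PatientID': 'anon-001'}
```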

deid.dicom.utils module

deid.dicom.utils.get_files(contenders, check=True, pattern=None, force=False, tempdir=None)[source]

Get a generator for files.

get_files will take a list of single dicom files or directories, and return a generator that yields complete paths to all files

Parameters:
  • contenders (a list of files or directories (contenders!)) –
  • check (boolean to indicate if we should validate dicoms (default True)) –
  • pattern (A pattern to use with fnmatch. If None, * is used) –
  • force (force reading of the files, if some headers are invalid.) – Not recommended, as many non-dicom files will come through
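
The generator behaviour can be sketched with the standard library alone. The iter_files helper is hypothetical, it skips the dicom validation step (check) entirely, and applying the pattern to single files as well as directory contents is an assumption.

```python
import fnmatch
import os
import tempfile


def iter_files(contenders, pattern=None):
    """Yield complete paths for files given directly or found in directories."""
    pattern = pattern or "*"
    for contender in contenders:
        if os.path.isdir(contender):
            for root, _dirs, files in os.walk(contender):
                for name in sorted(files):
                    if fnmatch.fnmatch(name, pattern):
                        yield os.path.abspath(os.path.join(root, name))
        elif os.path.isfile(contender):
            if fnmatch.fnmatch(os.path.basename(contender), pattern):
                yield os.path.abspath(contender)


# Demonstrate on a throwaway folder holding two .dcm files and one text file.
with tempfile.TemporaryDirectory() as folder:
    for name in ("a.dcm", "b.dcm", "notes.txt"):
        open(os.path.join(folder, name), "w").close()
    found = [os.path.basename(p) for p in iter_files([folder], pattern="*.dcm")]
print(found)
# ['a.dcm', 'b.dcm']
```

Because the function is a generator, nothing is read from disk until the caller iterates over it.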
deid.dicom.utils.load_dicom(dcm_file)[source]
deid.dicom.utils.save_dicom(dicom, dicom_file, output_folder=None, overwrite=False)[source]

Save a dicom file to an output folder.

We make sure to not overwrite unless the user has enforced it

Parameters:
  • dicom (the pydicom Dataset to save) –
  • dicom_file (the path to the dicom file to save (we only use basename)) –
  • output_folder (the folder to save the file to) –
  • overwrite (overwrite any existing file? (default is False)) –
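
The overwrite guard and the use of the basename can be sketched as below. The resolve_output helper is hypothetical, no dicom is actually written, and raising FileExistsError is an assumption about how a refusal might be reported.

```python
import os
import tempfile


def resolve_output(dicom_file, output_folder, overwrite=False):
    """Build the output path from the basename; refuse to clobber by default."""
    target = os.path.join(output_folder, os.path.basename(dicom_file))
    if os.path.exists(target) and not overwrite:
        raise FileExistsError(f"{target} exists; pass overwrite=True to replace it")
    return target


with tempfile.TemporaryDirectory() as folder:
    first = resolve_output("/data/in/image.dcm", folder)
    open(first, "w").close()  # simulate a file saved earlier
    try:
        resolve_output("/other/image.dcm", folder)
        blocked = False
    except FileExistsError:
        blocked = True
    allowed = resolve_output("/other/image.dcm", folder, overwrite=True)
    same_target = allowed == first
print(blocked, same_target)
# True True
```

Only the basename of the input path is used, so two inputs from different folders with the same file name resolve to the same output path.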

deid.dicom.validate module

deid.dicom.validate.validate_dicoms(dcm_files, force=False)[source]

Validate that dicom files can open and return valid set.

validate_dicoms will test opening one or more dicom files, and return a list of valid files.

Parameters:dcm_files (one or more dicom files to test) –

Module contents