FAQ anonymization

What is DICOM?

DICOM (Digital Imaging and Communications in Medicine) is a standard for handling medical images from all kinds of medical imaging devices.  The DICOM standard describes how medical image files are stored and how they are transferred over computer networks.  The complexity of the standard can be appreciated from its home page http://medical.nema.org/standard.html.

What is Dicom2USB?

Dicom2USB is a hardware device used to export and anonymize DICOM data from within the hospital network.  It is a small, standalone, networked device that acts as a DICOM receiver for images sent from any radiographic workstation, scanner or PACS client at the hospital.  The main benefits are:

  • The easiest way to export and anonymize DICOM images from workstations and PACS clients to your personal USB drive
  • The most complete anonymization: 3000 DICOM elements are known and handled.

What kind of information is in a DICOM file?

A DICOM file contains the image itself, but also data elements (the standard defines thousands of them) describing the medical image in terms of who, what, when, and why.  This is similar to how modern digital cameras store information such as time, date and GPS coordinates within the image files.

The purpose of this extra data, stored in so-called elements, is to make sure that a patient is diagnosed and treated correctly.  Some elements directly state the patient’s name, phone number, ethnic group, occupation, religious preference, mother’s birth name, and so on.  Information about the referring physician may also be stored, as well as information about the hospital/institution where the image was scanned.

The date and time of the image scanning is stored, together with information about the technical conditions and how different images are related to each other.

Every single piece of information described above can be used to identify a patient, either directly (names, addresses) or indirectly (hospital, date, time, type of scanner).
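To make this concrete: every data element is identified by a tag, a (group, element) pair of hexadecimal numbers, together with a keyword. The Python sketch below lists the tags of some elements mentioned above and sorts them into direct and indirect identifiers. The tags and keywords come from the DICOM data dictionary; the direct/indirect labels are only our illustration of the reasoning above, not an official classification.

```python
# Some data elements mentioned above, keyed by their (group, element) tag.
# The "direct"/"indirect" labels illustrate the text; they are not an official list.
IDENTIFYING_ELEMENTS = {
    (0x0010, 0x0010): ("PatientName", "direct"),
    (0x0010, 0x2154): ("PatientTelephoneNumbers", "direct"),
    (0x0010, 0x1060): ("PatientMotherBirthName", "direct"),
    (0x0010, 0x2160): ("EthnicGroup", "indirect"),
    (0x0010, 0x2180): ("Occupation", "indirect"),
    (0x0010, 0x21F0): ("PatientReligiousPreference", "indirect"),
    (0x0008, 0x0090): ("ReferringPhysicianName", "indirect"),
    (0x0008, 0x0080): ("InstitutionName", "indirect"),
    (0x0008, 0x0020): ("StudyDate", "indirect"),
    (0x0008, 0x0030): ("StudyTime", "indirect"),
    (0x0008, 0x0070): ("Manufacturer", "indirect"),
}


def format_tag(tag):
    """Render a tag the way DICOM documents write it, e.g. (0010,0010)."""
    group, element = tag
    return f"({group:04X},{element:04X})"


def elements_of_kind(kind):
    """Return 'tag keyword' strings for all elements of the given kind, sorted by tag."""
    return [
        f"{format_tag(tag)} {keyword}"
        for tag, (keyword, k) in sorted(IDENTIFYING_ELEMENTS.items())
        if k == kind
    ]


for line in elements_of_kind("direct"):
    print(line)
```

Note that even the "indirect" list is long: none of these elements is harmless on its own once it can be combined with the others.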

May I use DICOM files on my computer?

No!  Most countries in the world have laws that prohibit moving medical images containing patient information outside the hospital.

I have to use DICOM files on my computer

Because the need exists, it is common (and often legal) practice to de-identify the images.  In practice this means de-identifying (here used as a synonym for anonymizing) the extra data stored in the image files.

Typical uses of DICOM images outside the hospital are for research or teaching, but also giving a patient access to their scans.

What is de-identification, anonymization, pseudonymization?

The first two definitions below are quoted from Wikipedia (2013-OCT-10):

“De-identification is also a severing of a data set from the identity of the data contributor, but may include preserving identifying information which could only be re-linked by a trusted party in certain situations”

“Anonymization refers to irreversibly severing a data set from the identity of the data contributor in a study to prevent any future re-identification, even by the study organizers under any condition”

Pseudonymization means de-identification and replacement of identifiers with a pseudonym, so that data from the same individual always have the same identity (within a specific research context).
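One common way to get a stable pseudonym (an assumption here, not necessarily how Dicom2USB does it) is a keyed hash such as HMAC-SHA256 over the original patient ID, with one secret key per research project. A plain hash would not be enough, because patient IDs are short and could be recovered by hashing every possible ID. The key, the `ANON` prefix and the function name below are hypothetical.

```python
import hashlib
import hmac


def pseudonym(patient_id: str, project_key: bytes, prefix: str = "ANON") -> str:
    """Map a patient ID to a stable pseudonym for one research project.

    The same patient ID and the same project key always give the same pseudonym,
    so exams from one patient stay linked. A different key (another project)
    gives unrelated pseudonyms. Whoever holds the key can recompute the mapping,
    so the key must be kept secret by the trusted party.
    """
    digest = hmac.new(project_key, patient_id.encode("utf-8"), hashlib.sha256).hexdigest()
    return prefix + digest[:16].upper()


key = b"hypothetical-secret-for-project-A"
print(pseudonym("12345", key))
print(pseudonym("12345", key))  # identical to the line above
```

Because the trusted key holder can re-link the data, this corresponds to de-identification in the sense of the first definition above, not to irreversible anonymization.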

Can I anonymize data myself?

Guidance outlining the US rules has been published [Guidance on De-identification].

The HIPAA booklet describes two acceptable methods [HIPAA booklet]:

  1. Remove all data defined by 18 classes of identifiers (the “safe harbor” method, see link)
  2. Obtain confirmation from a qualified statistician that the risk of identification is very small (the “expert determination” method)

Thus, deciding what information could identify a person should not be taken lightly, and is not up to an individual to decide.

RSNA and the DICOM standard have worked to clarify which data are included in these 18 classes of identifiers.  RSNA lists about 200 different DICOM elements that should be de-identified in different ways.  The question is how you should handle the 3000 different DICOM elements that exist!
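De-identifying elements “in different ways” means that each element gets an action. The sketch below borrows simplified action codes from the DICOM confidentiality profiles (PS3.15 Annex E: X remove, Z replace with an empty value, D replace with a dummy value, K keep); the rule table itself is a small hypothetical example, not the RSNA or DICOM table. Elements without a rule are removed, a conservative default for the thousands of elements no list mentions.

```python
# Simplified action codes in the style of DICOM PS3.15 Annex E:
#   "X" remove, "Z" empty value, "D" dummy value, "K" keep.
# This small rule table is hypothetical; real profiles list hundreds of elements.
RULES = {
    "PatientName": "D",
    "PatientBirthDate": "Z",
    "PatientTelephoneNumbers": "X",
    "EthnicGroup": "X",
    "Occupation": "X",
    "InstitutionName": "X",
    "ReferringPhysicianName": "Z",
    "Modality": "K",
    "Rows": "K",
    "Columns": "K",
    "PixelData": "K",  # the image itself must survive a remove-by-default policy
}
DUMMY = "ANONYMIZED"


def apply_rules(dataset, rules=RULES, default="X"):
    """Return a new dataset with every element handled by its rule.

    Elements without a rule get the default action; removing them ("X") is the
    conservative choice, since unknown elements may hold patient data.
    """
    result = {}
    for keyword, value in dataset.items():
        action = rules.get(keyword, default)
        if action == "K":
            result[keyword] = value
        elif action == "Z":
            result[keyword] = ""
        elif action == "D":
            result[keyword] = DUMMY
        elif action != "X":
            raise ValueError(f"unsupported action {action!r} for {keyword}")
        # action "X": the element is simply not copied
    return result


original = {"PatientName": "Doe^Jane", "Occupation": "Pilot",
            "Modality": "CT", "StationName": "CT-ROOM-3"}
print(apply_rules(original))  # {'PatientName': 'ANONYMIZED', 'Modality': 'CT'}
```

A fixed dummy value gives every patient the same identity; to keep exams from one patient linked, the patient ID would instead be replaced by a pseudonym, and UIDs need their own remapping.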

So, do not attempt to anonymize data yourself unless you plan to implement at least all the rules that RSNA has listed.  These range from less obvious problems, such as remapping unique identifiers inside sequences and changing dates, to the more obvious task of creating a new unique identifier for each new file.
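Changing dates without breaking follow-up is usually done by shifting every date of a patient by the same number of days, so that intervals between exams are preserved. The sketch below handles only DICOM DA values (YYYYMMDD); the offset is an arbitrary example, and how a real system chooses and stores one offset per patient is an assumption left open. Whether shifted dates are acceptable depends on the rule set: Safe Harbor, for example, requires removing all date elements except the year.

```python
from datetime import datetime, timedelta


def shift_da(value: str, offset_days: int) -> str:
    """Shift a DICOM DA value (YYYYMMDD) by a whole number of days."""
    day = datetime.strptime(value, "%Y%m%d").date()
    return (day + timedelta(days=offset_days)).strftime("%Y%m%d")


def days_between(a: str, b: str) -> int:
    """Number of days from DA value a to DA value b."""
    fmt = "%Y%m%d"
    return (datetime.strptime(b, fmt) - datetime.strptime(a, fmt)).days


# One offset per patient, chosen once and reused for every exam of that patient
# (the value here is an arbitrary example).
offset = -100
first, follow_up = "20130110", "20130704"
new_first, new_follow_up = shift_da(first, offset), shift_da(follow_up, offset)
print(new_first, new_follow_up)
print(days_between(first, follow_up) == days_between(new_first, new_follow_up))  # True
```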

You also need to register your own base UID, since every new image must get its own unique identifier (UID).
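The sketch below shows why UID remapping must be consistent: one old-to-new mapping is shared across all files of an export and applied recursively inside sequences, so references between images still point to the right place. For illustration it generates UUID-derived UIDs (“2.25.” followed by a UUID as an integer, as defined in DICOM PS3.5) instead of UIDs under a registered root; the class name and the short list of UID elements are our assumptions, and real data contains many more UID elements.

```python
import uuid

# Elements whose UIDs identify this particular study, series, image, etc.
# SOPClassUID is deliberately absent: it names a standard image type and must be kept.
UID_KEYWORDS = {
    "StudyInstanceUID",
    "SeriesInstanceUID",
    "SOPInstanceUID",
    "FrameOfReferenceUID",
    "ReferencedSOPInstanceUID",
}


class UidRemapper:
    """Replace instance UIDs consistently, including inside nested sequences."""

    def __init__(self):
        self.mapping = {}  # old UID -> new UID, shared across all files of an export

    def new_uid(self, old):
        if old not in self.mapping:
            # UUID-derived UID ("2.25." + UUID as integer), at most 44 characters.
            self.mapping[old] = f"2.25.{uuid.uuid4().int}"
        return self.mapping[old]

    def remap(self, dataset):
        result = {}
        for keyword, value in dataset.items():
            if keyword in UID_KEYWORDS:
                result[keyword] = self.new_uid(value)
            elif isinstance(value, list) and all(isinstance(i, dict) for i in value):
                # A sequence: a list of nested datasets, remapped recursively.
                result[keyword] = [self.remap(item) for item in value]
            else:
                result[keyword] = value
        return result


CT_IMAGE_STORAGE = "1.2.840.10008.5.1.4.1.1.2"
remapper = UidRemapper()
a = remapper.remap({"SOPClassUID": CT_IMAGE_STORAGE, "SOPInstanceUID": "1.2.3.4.5.6.1"})
b = remapper.remap({
    "SOPClassUID": CT_IMAGE_STORAGE,
    "SOPInstanceUID": "1.2.3.4.5.6.2",
    "ReferencedImageSequence": [{"ReferencedSOPInstanceUID": "1.2.3.4.5.6.1"}],
})
print(b["ReferencedImageSequence"][0]["ReferencedSOPInstanceUID"] == a["SOPInstanceUID"])  # True
```

The original UIDs in the example (1.2.3.4.5.6.x) are made up; the bug described in the next answer corresponds to skipping the recursive step for sequences.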

Can I trust anonymization software?

A comparison of different anonymization programs showed that “most DICOM anonymizers have bugs, which cause them to be unusable in some fashion or another.” [Comparison].

Even the best software had bugs that broke unique identifiers (UIDs) inside a class of DICOM elements called sequences.  This is troublesome because images then lose their links to other images in the data set.

Even software from commercial vendors has been found to destroy important information.

What are the dangers of anonymization software?

Quite a few!

Common to all anonymization software:

  • Often not doing a good job (see above: Can I trust anonymization software?)
  • The user needs to manually enter which DICOM fields are going to be modified, often for each exam
  • The user needs to manually invent a new identity for each exam
  • Data may become inconsistent: two exams from the same patient may get different identities (due to manual edits)
  • Dates may or may not be changed.  If they are edited manually, the intervals between the dates of different exams are likely to be wrong, creating uncertainty in follow-up exams.
  • Private tags in DICOM may contain non-standard information which reveals patient information.  Some private tags are important and should be kept.  Typically you can only choose between keeping or removing all private tags.
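In DICOM, private elements are those with an odd group number, and each block of private elements is reserved by a private creator element (gggg,00xx) whose value names the owner of the block. A sketch of “remove all private tags except an explicit keep list” could look like this; the keep list, the tag numbers and the creator string are hypothetical. A robust keep list would match on the private creator string rather than on the raw tag, since the same private element can land in different blocks in different files.

```python
def is_private(tag):
    """Private data elements have an odd group number."""
    group, _ = tag
    return group % 2 == 1


def filter_private(dataset, keep=frozenset()):
    """Drop private elements except those in `keep` (and their private creators)."""
    # Private element (gggg,xxyy) belongs to the block reserved by creator (gggg,00xx).
    creators = {(group, element >> 8) for group, element in keep}
    return {
        tag: value
        for tag, value in dataset.items()
        if not is_private(tag) or tag in keep or tag in creators
    }


dataset = {
    (0x0008, 0x0060): "MR",                   # Modality (standard element)
    (0x0019, 0x0010): "HYPOTHETICAL VENDOR",  # private creator for block 0x10
    (0x0019, 0x100C): "1000",                 # private element worth keeping
    (0x0029, 0x0010): "HYPOTHETICAL VENDOR",  # another private creator
    (0x0029, 0x1010): "Doe^Jane",             # private element hiding a name
}
print(sorted(filter_private(dataset, keep={(0x0019, 0x100C)})))
```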

Medical workstations often include anonymization software.  It is limited by:

  • It may not support files from other scanner manufacturers.
  • This means that a study involving equipment from two manufacturers has to be anonymized at two different workstations.
  • Long hours of burning CDs and entering data manually.
  • If USB devices are supported, they may introduce viruses etc.  Connecting USB devices to instruments used in the clinic is not recommended.  It is better to use a standalone unit such as Dicom2USB, so that USB devices never come into contact with medical devices.
  • USB devices may not be (and should not be) allowed on medical equipment

Standalone software on your PC has similar dangers:

  • PCs are often portable devices that can be lost or stolen.  Below, we use the term laptop for such a PC.
  • Original DICOM files with patient information have to be moved to the laptop, probably on CDs that have to be burned at the workstation (time-consuming) and read into the laptop (time-consuming)
  • Original DICOM files are probably left on the laptop after anonymization
  • Even if you remember to delete the original DICOM files, they can often easily be recovered from the hard disk.  Modern Windows systems include features that make this easy.
  • A USB device may pick up a virus on the laptop and carry it to the medical workstation the next time it is used.

Is Dicom2USB anonymization better?

Yes.  It is better than the competition because all the problems listed in the previous answer have been resolved.

The default settings give complete anonymization, unlike many programs that rely on the user to decide what to anonymize and what to keep.

With default settings, Dicom2USB does not store any original data anywhere.  Data is received over the network and anonymized on the fly before being written to the USB device.

Dicom2USB effectively works as a firewall between the USB device and the medical workstations, image archives and scanning equipment.  Thus, Dicom2USB acts as anti-virus insurance for your medical equipment.

What are the benefits of Dicom2USB?

  • Almost no time spent!
  • No trouble handling CDs
  • No USB devices on medical devices
  • No manual intervention needed (leads to better quality)
  • Rules for handling (anonymizing) more than 3000 elements
  • Image transfers using hospital standard (DICOM network)
  • All modalities and manufacturers supported (patient data from mixed manufacturers will be consistent)
  • A consistent folder structure, making it easy for people to find the correct files

What about text in images?

Text that is “burned in” into the pixels of an image is a problem, since it reveals this information to anyone viewing the image.  This is referred to as burned-in annotation.

Dicom2USB is the only software known to us that rejects images that contain text.

Sometimes the DICOM element Burned In Annotation states whether an image contains burned-in annotations, but sometimes this element is missing.  Images without this flag are potentially dangerous.

Dicom2USB handles this in two ways:

  1. Typical user – only gets images known to be safe.  Dangerous, and potentially dangerous images are discarded.
  2. Advanced user – may retrieve the images.  They are then stored in a QUARANTINE directory on the USB device.

Thus, Dicom2USB does not allow images containing text to be stored without special approval from an informed user.
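The decision described above can be sketched as a small routing function. It reads the Burned In Annotation element (0028,0301), whose defined values are YES and NO. The function name, the route names and the `advanced_user` switch are hypothetical, and the sketch is our reading of the policy, not Dicom2USB’s code.

```python
from enum import Enum


class Route(Enum):
    EXPORT = "export"          # written to the normal folders on the USB device
    QUARANTINE = "quarantine"  # written to the QUARANTINE directory
    DISCARD = "discard"        # not written at all


def route_image(dataset: dict, advanced_user: bool) -> Route:
    """Decide where an image goes based on its Burned In Annotation flag."""
    flag = dataset.get("BurnedInAnnotation")  # (0028,0301): "YES" or "NO"
    if flag is not None and flag.strip().upper() == "NO":
        return Route.EXPORT
    # "YES" (dangerous), or missing/unexpected (potentially dangerous).
    return Route.QUARANTINE if advanced_user else Route.DISCARD


print(route_image({"BurnedInAnnotation": "NO"}, advanced_user=False))  # Route.EXPORT
print(route_image({}, advanced_user=False))                            # Route.DISCARD
```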

Is it possible to make patient data untraceable?

A test of the Safe Harbor method was done by the independent research organization NORC at the University of Chicago.  They found a match rate of only 0.01% when the HIPAA Safe Harbor rules were used [link].  Thus, 99.99% of the records were not correctly re-identified, and whoever attempted the re-identification would have no way of knowing which patients made up the 0.01% that were correctly identified.

Open source licenses

Dicom2USB is a hardware device with integrated software, used under the licenses described in Open source credits.