EMBED Overview

Feature name	Description
`empi_anon`	Unique anonymized patient ID; all exams for a patient share the same ID.
`acc_anon`	Unique ID per exam; all rows for an exam share the same acc_anon (and the same empi_anon). Negative exams and exams with only one finding have a single row per acc_anon; exams with multiple findings have multiple rows.
`study_date_anon`	Anonymized date on which the exam was signed. This may differ slightly from the date the exam was acquired. Dates are shifted by a random offset that differs across patients but is constant within a patient, which preserves the temporal order of a patient's exams and pathology results.
`desc`	The study description such as screening or diagnostic mammogram
`tissueden`	BIRADS breast density. 1: The breasts are almost entirely fat (BIRADS A); 2: Scattered fibroglandular densities (BIRADS B); 3: Heterogeneously dense (BIRADS C); 4: Extremely dense (BIRADS D); 5: Normal male**
`asses`	The BI-RADS score of the exam. This is assigned globally for an exam but repeated in each finding row. BIRADS 0: A - Additional evaluation; BIRADS 1: N - Negative; BIRADS 2: B - Benign; BIRADS 3: P - Probably benign; BIRADS 4: S - Suspicious; BIRADS 5: M - Highly suggestive of malignancy; BIRADS 6: K - Known biopsy proven. Screening exams may have BIRADS 0, 1, 2, or 3. Diagnostic exams may have BIRADS 4, 5, or 6.
`numfind`	Index of the finding number for an exam beginning with 1
`side`	Side of the finding described in the current row. L: left; R: right; B: bilateral
`total_l_find`	Number of unique findings for the left breast for a given exam.
`total_r_find`	Number of unique findings for the right breast for a given exam.
`massshape`	Mass shape according to BIRADS descriptors. Also includes asymmetries and architectural distortion (see ./tables/clinical_legend.csv)
`massmargin`	Mass margin according to BIRADS descriptors (see ./tables/clinical_legend.csv)
`massdens`	Mass density according to BIRADS descriptors (see ./tables/clinical_legend.csv)
`calcfind`	Type of calcification according to BIRADS descriptors (see ./tables/clinical_legend.csv)
`calcdistri`	Distribution of calcifications according to BIRADS descriptors (see ./tables/clinical_legend.csv)
`bside`	Laterality of any pathology result. L: left; R: right
`procdate_anon`	Date of pathology result.
`type`	Source of the tissue specimen obtained: biopsy, FNA, lumpectomy, etc. (see ./tables/clinical_legend.csv). This is helpful in cases where a biopsy is followed by a lumpectomy. The pathology entries will contain information from both events; however, the lumpectomy pathology results typically supersede the biopsy results.
`path1` - `path10`	Individual pathologic diagnoses from a given specimen. For example, a given specimen may contain invasive ductal carcinoma (ID), ductal carcinoma in situ (DC), and radial scar (RS). This row would contain these entries in path1 through path3 (see ./tables/clinical_legend.csv)
`path_severity`	The most severe pathology result from a given specimen, abstracted from path1 through path10 (see ./tables/pathology_legend.csv for the classification schema). 0: invasive cancer; 1: non-invasive cancer; 2: high-risk lesion; 3: borderline lesion; 4: benign findings; 5: negative (normal breast tissue); 6: non-breast cancer
`RACE_DESC`	Patient Race
`ETHNIC_GROUP_DESC`	Patient Ethnicity
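
Because the clinical table holds one row per finding rather than one row per exam, exam-level analyses usually need the rows collapsed by `acc_anon` first. The following is a minimal sketch of that step using only the Python standard library. It assumes that `asses` stores the letter codes listed above; the function name `exam_level_summary` and the sample rows are hypothetical and are not taken from the dataset.

```python
# Letter codes for `asses`, copied from the table above.
# Assumption: the column stores the letter, not the number.
ASSES_TO_BIRADS = {"A": 0, "N": 1, "B": 2, "P": 3, "S": 4, "M": 5, "K": 6}


def exam_level_summary(rows):
    """Collapse finding-level rows into one record per acc_anon.

    `rows` is an iterable of dicts keyed by the clinical column names,
    for example the output of csv.DictReader.
    """
    exams = {}
    for row in rows:
        exam = exams.setdefault(
            row["acc_anon"],
            {
                "empi_anon": row["empi_anon"],
                # `asses` is assigned per exam and repeated on every
                # finding row, so the first row seen is sufficient.
                "birads": ASSES_TO_BIRADS.get(row["asses"]),
                "n_rows": 0,
            },
        )
        exam["n_rows"] += 1
    return exams


# Hypothetical rows: one single-row exam and one exam with two findings.
rows = [
    {"empi_anon": "p1", "acc_anon": "e1", "asses": "N", "numfind": "1"},
    {"empi_anon": "p1", "acc_anon": "e2", "asses": "S", "numfind": "1"},
    {"empi_anon": "p1", "acc_anon": "e2", "asses": "S", "numfind": "2"},
]
summary = exam_level_summary(rows)
```

Here `summary` holds one entry per exam, so patient-level or exam-level counts are not inflated by exams that have several findings.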

Feature name	Description
`empi_anon`	Unique anonymized patient ID; all exams for a patient share the same ID.
`acc_anon`	Unique ID per exam; all images within an exam share the same acc_anon (and the same empi_anon).
`anon_dicom_path`	Anonymized full dicom file path
`png_path`	Full png image path (not relevant for AWS Open Data release)
`png_filename`	Filename only of the png (not relevant for AWS Open Data release). Filenames are hashed and therefore are unique across all files, allowing this field to be used as an index if desired.
`study_date_anon`	Anonymized date of acquisition of the exam. This may differ slightly from study_date_anon in the clinical data, which represents the date the report was signed.
`StudyDescription`	The exam type, for example a screening or diagnostic mammogram. On occasion, the study type may differ from what was recorded in the clinical data sheet due to mistakes in data entry at the time of acquisition. Images and other fields can be reviewed to troubleshoot, or these exams can be discarded.
`SeriesDescription`	The name of the series for an exam. This can vary depending on 2D, 3D, or C-view images but typically contains the type of view (CC, MLO, etc) and/or the laterality. This field is not frequently required. See FinalImageType and ImageLateralityFinal
`FinalImageType`	Derived by combining information from several other DICOM fields to ascertain the image type. 2D: standard 2D digital mammogram; 3D: digital breast tomosynthesis (DBT), which cannot currently be used because the images are locked in a proprietary container and are not part of the AWS Open Data release; C-view: synthetic 2D image derived from the DBT; ROI_SSC/ROI_SS: screensave images annotated by the radiologist with a circular ROI burned into the pixel data. These ROIs have already been extracted and mapped to their source image. The screensaves are not part of the AWS Open Data release.
`ImageLateralityFinal`	Derived by combining information from several other DICOM fields. L: left breast; R: right breast
`ViewPosition`	Type of view acquired such as CC or MLO
`spot_mag`	Indicates whether the image is a special view such as spot compression or magnification. 0: image is a full-field digital mammogram (FFDM); all screening studies are FFDM. 1: image is a special view, often used in diagnostic exams; this should not occur for screening exams.
`num_roi`	Number of ROIs for a given file
`ROI_coords`	Coordinate(s) of any detected ROI on the image, represented as a list of lists. Each sublist contains the corner coordinates of one ROI in the format `ymin, xmin, ymax, xmax`. For 2D and C-view images, this field is the location of the ROI on the image. For the screensave (ROI_SSC/ROI_SS) images, this field is the location of the burned-in ROI on the screensave image, which serves as the source of the ROI location. Screensaves are not part of the AWS Open Data release.
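
When the metadata file is read as text, `ROI_coords` arrives as a string and has to be parsed before the boxes can be used. The sketch below shows one way to do that with the standard library. It assumes the cell is serialized as a Python-style literal such as `[[ymin, xmin, ymax, xmax], ...]`; the helper names `parse_roi_coords` and `to_xywh` and the sample value are hypothetical.

```python
import ast


def parse_roi_coords(text):
    """Parse a ROI_coords cell into a list of (ymin, xmin, ymax, xmax) tuples.

    Assumption: the cell holds a Python-style literal (list of lists or
    tuple of tuples). An empty or blank cell yields an empty list.
    """
    if not text or not text.strip():
        return []
    boxes = ast.literal_eval(text)
    return [tuple(int(v) for v in box) for box in boxes]


def to_xywh(box):
    """Convert (ymin, xmin, ymax, xmax) to (x, y, width, height)."""
    ymin, xmin, ymax, xmax = box
    return (xmin, ymin, xmax - xmin, ymax - ymin)


# Hypothetical cell value containing two ROIs.
cell = "[[100, 200, 300, 450], [10, 20, 30, 40]]"
boxes = parse_roi_coords(cell)
first_xywh = to_xywh(boxes[0])
```

Note the row-first order: each box lists y before x, so it is worth converting explicitly before passing coordinates to a library that expects `x, y, width, height`.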

Emory Breast Imaging Dataset

Summary

Primer on Mammography

Summary Statistics

File Structure

Clinical Data

Data Types

Examples

Key fields

Image Metadata

Key fields

Special Note: Merging Clinical and Metadata files

Special Note: Pathology Results in Clinical Data