Standards for Privacy of Individually Identifiable Health Information
G. Section 164.514--Other Requirements Relating to Uses and Disclosures
of Protected Health Information
2. Limited Data Sets
March 2002 NPRM
As noted above, the Department heard many concerns that the de-identification
standard in the Privacy Rule could curtail important research, public
health, and health care operations activities. Specific concerns
were raised by State hospital associations regarding their current
role in using patient information from area hospitals to conduct
and disseminate analyses that are useful for hospitals in making
decisions about quality and efficiency improvements. Similarly,
researchers raised concerns that the impracticality of using de-identified
data would significantly increase the workload of IRBs because waivers
of individual authorization would need to be sought more frequently
for research studies even though no direct identifiers were needed
for the studies. Many of these activities and studies were also
being pursued for public health purposes. Some commenters urged
the Department to permit covered entities to disclose protected
health information for research if the protected health information
is facially de-identified, that is, stripped of direct identifiers,
so long as the research entity provides assurances that it will
not use or disclose the information for purposes other than research
and will not identify or contact the individuals who are the subjects
of the information.
In response to these concerns, the Department, in the NPRM, requested
comments on an alternative approach that would permit uses and disclosures
of a limited data set which would not include direct identifiers
but in which certain potentially identifying information would remain.
The Department proposed limiting the use or disclosure of any such
limited data set to research, public health, and health care operations
purposes only.
From the de-identification safe harbor list of identifiers, we
proposed the following as direct identifiers that would have to
be removed from any limited data set: name, street address, telephone
and fax numbers, e-mail address, social security number, certificate/
license number, vehicle identifiers and serial numbers, URLs and
IP addresses, and full face photos and any other comparable images.
The proposed limited data set could include the following identifiable
information: admission, discharge, and service dates; date of death;
age (including age 90 or over); and five-digit zip code.
The Department solicited comment on whether one or more other geographic
units smaller than State, such as city, county, precinct, neighborhood
or other unit, would be needed in addition to, or be preferable
to, the five-digit zip code. In addition, to address concerns raised
by commenters regarding access to birth date for research or other
studies relating to young children or infants, the Department clarified
that the Privacy Rule de-identification safe harbor allows disclosure
of the age of an individual, including age expressed in months,
days, or hours. Given that the limited data set could include all
ages, including age in months, days, or hours (if preferable), the
Department requested comment on whether date of birth would be needed
and, if so, whether the entire date would be needed, or just the
month and year.
In addition, to further protect privacy, the Department proposed
to condition the disclosure of the limited data set on covered entities
obtaining from the recipients a data use or similar agreement, in
which the recipient would agree to limit the use of the limited
data set to the purposes specified in the Privacy Rule, to limit
who can use or receive the data, and agree not to re-identify the
data or contact the individuals.
Overview of Public Comments
The following discussion provides an overview of the public comment
received on this proposal. Additional comments received on this
issue are discussed below in the section entitled, "Response
to Other Public Comments."
Almost all those who commented on this issue supported the basic
premise of the limited data set for research, public health, and
health care operations. Many of these commenters used the opportunity
to reiterate their opposition to the safe harbor and statistical
de-identification methods, and some misinterpreted the limited
data set proposal as creating another safe-harbor form of de-identified
data. In general, commenters agreed with the list of direct identifiers
proposed in the preamble of the NPRM; some recommended changes.
The requirement of a data use agreement was similarly widely supported,
although a few commenters viewed it as unnecessary and others offered
additional terms which they argued would make the data use agreement
more effective. Others questioned the enforceability of the data
use agreements.
A few commenters argued that the limited data set would present
a significant risk of identification of individuals because of the
increased ability to use the other demographic variables (e.g.,
race, gender) in such data sets to link to other publicly available
data. Some of these commenters also argued that the development
of computer-based solutions to support the statistical method of
de-identification is advancing rapidly and can support, in some
cases better than the limited data set, many of the needs for research,
public health and health care operations. These commenters asserted
that authorization of the limited data set approach would undermine
incentives to further develop statistical techniques for de-identification
that may be more protective of privacy.
Most commenters who supported the limited data set concept favored
including the five-digit zip code, but also wanted other geographic
units smaller than a State to be included in the limited data set.
Examples of other geographic units that commenters argued are needed
for research, public health or health care operational purposes
were county, city, full zip code, census tract, and neighborhood.
Various analytical needs were cited to support these positions,
such as tracking the occurrence of a particular disease to the neighborhood
level or using county level data for a needs assessment of physician
specialties. A few commenters opposed inclusion of the 5-digit zip
code in the limited data set, recommending that the current Rule,
which requires data aggregation at the 3-digit zip code level, remain
the standard.
Similarly, the majority of commenters addressing the issue supported
inclusion of the full birth date in the limited data set. These
commenters asserted that the full birth date was needed for longitudinal
studies, and similar research, to assure accuracy of data. Others
stated that while they preferred access to the full birth date,
their data needs would be satisfied by inclusion of at least the
month and year of birth in the limited data set. A number of commenters
also opposed inclusion of the date of birth in the limited data
set as unduly increasing the risk of identification of individuals.
Final Modifications
In view of the support in the public comments for the concept of
a limited data set, the Department determines that adoption of standards
for the use and disclosure of protected health information for this
purpose is warranted. Therefore, the Department adds at Sec. 164.514(e)
a new standard and implementation specifications for a limited data
set for research, public health, or health care operations purposes
if the covered entity (1) uses or discloses only a "limited
data set" as defined at Sec. 164.514(e)(2), and (2) obtains
from the recipient of the limited data set a "data use agreement"
as defined at Sec. 164.514(e)(4). In addition, the Department adds
to the permissible uses and disclosures in Sec. 164.502(a) express
reference to the limited data set standards.
The implementation specifications do not delineate the data that
can be released through a limited data set. Rather, the Rule specifies
the direct identifiers that must be removed for a data set to qualify
as a limited data set. As with the de-identification safe harbor
provisions, the direct identifiers listed apply to protected health
information about the individual or about relatives, employers,
or household members of the individual. The direct identifiers include
all of the facial identifiers proposed in the preamble to the NPRM:
(1) Name; (2) street address (renamed postal address information,
other than city, State and zip code); (3) telephone and fax numbers;
(4) e-mail address; (5) social security number; (6) certificate/license
numbers; (7) vehicle identifiers and serial numbers; (8) URLs and
IP addresses; and (9) full face photos and any other comparable
images. The public comment generally supported the removal of this
facially identifying information.
In addition to these direct identifiers, the Department designates
the following information as direct identifiers that must be removed
before protected health information will be considered a limited
data set: (1) Medical record numbers, health plan beneficiary numbers,
and other account numbers; (2) device identifiers and serial numbers;
and (3) biometric identifiers, including finger and voice prints.
Only a few commenters specifically stated a need for some or all
of these identifiers as part of the limited data set. For example,
one commenter wanted an (encrypted) medical record number to be
included in the limited data set to support disease management planning
and program development to meet community needs and quality management.
Another commenter wanted the health plan beneficiary number included
in the limited data set to permit researchers to ensure that results
indicating sex, gender or ethnic differences were not influenced
by the participant's health plan. And a few commenters wanted device
identifiers and serial numbers included in the limited data set,
to facilitate product recalls and patient safety initiatives. However,
the Department has not been persuaded that the need for these identifiers
outweighs the potential privacy risks to the individual by their
release as part of a limited data set, particularly when the Rule
makes other avenues available for the release of information that
may directly identify an individual.
The Department does not include in the list of direct identifiers
the "catch-all" category from the de-identification safe
harbor of "any other unique identifying number, characteristic
or code." While this requirement is essential to assure that
the de-identification safe harbor does in fact produce a de-identified
data set, it is difficult to define in advance in the context of
a limited data set. Since our goal in establishing a limited data
set is not to create de-identified information and since the data
use agreement constrains further disclosure of the information,
we determined that it would only add complexity to implementation
of the limited data set with little added protection.
In response to wide public support, the Department does not designate
as a direct identifier any dates related to the individual or any
geographic subdivision other than street address. Therefore, as
part of a limited data set, researchers and others involved in public
health studies will have access to dates of admission and discharge,
as well as dates of birth and death for the individual. We agree
with commenters who asserted that birth date is critical for certain
research, such as longitudinal studies where there is a need to
track individuals across time and for certain infant-related research.
Rather than adding complexity to the Rule by trying to carve out
an exception for these specific situations, and other justifiable
uses, we rely on the minimum necessary requirement to keep the Rule
simple while avoiding abuse. Birth date should only be disclosed
where the researcher and covered entity agree that it is needed
for the purpose of the research. Further, even though birth date
may be included with a limited data set, the Department clarifies,
as it did in the preamble to the proposed rulemaking, that the Privacy
Rule allows the age of an individual to be expressed in years or
in months, days, or hours as appropriate.
Moreover, the limited data set may include the five-digit zip code
or any other geographic subdivision, such as State, county, city,
precinct and their equivalent geocodes, except for street address.
We substitute for street address the term postal address information,
other than city, State and zip code in order to make clear that
individual elements of postal address such as street name by itself
are also direct identifiers. Commenters identified a variety of
needs for various geographical codes (county, city, neighborhood,
census tract, precinct) to support a range of essential research,
public health and health care operations activities. The examples
provided included the need to analyze local geographic
variations in disease burdens or in the provision of health services,
to conduct research on pathogens or patterns of health risks
that may require comparing areas within a single zip code, and
to examine data by county or neighborhood when looking for external
causes of disease, as would be the case for illnesses and diseases
such as bladder cancer that may have environmental links. The Department
agrees with these commenters that a variety of geographical designations
other than five-digit zip code are needed to permit useful and
significant studies and other research to go forward unimpeded.
So long as an appropriate data use agreement is in place, the Department
does not believe that there is any greater privacy risk in including
in the limited data set such geographic codes than in releasing
the five-digit zip code.
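The identifier rules described above can be sketched in code. This is a minimal illustration, not a compliance tool: the field names, the sample record, and the function `to_limited_data_set` are hypothetical, and the sketch assumes a flat record in which each direct identifier sits in its own named field. It does not detect identifiers embedded in free text, nor identifiers of relatives, employers, or household members stored under other field names.

```python
from typing import Any, Dict

# Hypothetical field names standing in for the direct identifiers that the
# final Rule requires to be removed. A real schema will use different names.
DIRECT_IDENTIFIER_FIELDS = frozenset({
    "name",
    "street_address",  # postal address information other than city, State, zip code
    "telephone_number",
    "fax_number",
    "email_address",
    "social_security_number",
    "certificate_license_number",
    "vehicle_identifier",
    "url",
    "ip_address",
    "full_face_photo",
    "medical_record_number",
    "health_plan_beneficiary_number",
    "account_number",
    "device_identifier",
    "biometric_identifier",
})


def to_limited_data_set(record: Dict[str, Any]) -> Dict[str, Any]:
    """Return a copy of the record without the listed direct-identifier fields.

    Dates and geographic subdivisions other than street address are kept,
    because the final Rule does not designate them as direct identifiers.
    """
    return {
        key: value
        for key, value in record.items()
        if key not in DIRECT_IDENTIFIER_FIELDS
    }


# Made-up sample record for illustration only.
record = {
    "name": "Jane Example",
    "street_address": "1 Main St",
    "city": "Springfield",
    "state": "IL",
    "zip_code": "62701",
    "county": "Sangamon",
    "birth_date": "1950-04-02",
    "admission_date": "2002-03-01",
    "discharge_date": "2002-03-05",
    "medical_record_number": "MRN-0001",
    "diagnosis": "example diagnosis",
}

limited = to_limited_data_set(record)
```

In this sketch `limited` keeps the city, State, zip code, county, and all dates, while the name, street address, and medical record number are dropped. Whether a retained element such as birth date should actually be disclosed remains subject to the minimum necessary requirement discussed in this preamble.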
Finally, the implementation specifications adopted at Sec. 164.514(e)
require a data use agreement between the covered entity and the
recipient of the limited data set. The need for a data use agreement
and the core elements of such an agreement were widely supported
in the public comment.
In the NPRM, we asked whether additional conditions should be added
to the data use agreement. In response, a few commenters made specific
suggestions. These included prohibiting further disclosure of the
limited data set except as required by law, prohibiting further
disclosure without the written consent of the covered entity, requiring
that the recipient safeguard the information received in the limited
data set, prohibiting further disclosure unless the data has been
de-identified utilizing the statistical or safe harbor methods
of the Privacy Rule, and limiting use of the data to the purpose
for which it was received.
In response to these comments, in the final Rule we specify that
the covered entity must enter into a data use agreement with the
intended recipient which establishes the permitted uses and disclosures
of such information by the recipient, consistent with the purposes
of research, public health, or health care operations, limits who
can use or receive the data, and requires the recipient to agree
not to re-identify the data or contact the individuals. In addition,
the data use agreement must contain adequate assurances that the
recipient will use appropriate safeguards to prevent use or disclosure
of the limited data set other than as permitted by the Rule and
the data use agreement, or as required by law. These adequate assurances
are similar to the existing requirements for business associate
agreements.
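As a rough illustration of the elements just described, the checklist below models a data use agreement as a simple record and reports which of those elements appear to be missing. Everything here is an assumption made for illustration: the `DataUseAgreement` class, its field names, and the `missing_elements` function are hypothetical, and passing the check is in no way a determination that an agreement satisfies the Rule.

```python
from dataclasses import dataclass
from typing import List, Set

# The three purposes for which the final Rule permits a limited data set.
PERMITTED_PURPOSES = frozenset(
    {"research", "public health", "health care operations"}
)


@dataclass
class DataUseAgreement:
    """Hypothetical summary of the terms a reviewer has found in an agreement."""
    purposes: Set[str]
    authorized_users: List[str]
    prohibits_reidentification: bool
    prohibits_contacting_individuals: bool
    requires_safeguards: bool


def missing_elements(dua: DataUseAgreement) -> List[str]:
    """List the elements described in the preamble that the summary lacks."""
    problems: List[str] = []
    if not dua.purposes:
        problems.append("no permitted uses or disclosures stated")
    elif not dua.purposes <= PERMITTED_PURPOSES:
        problems.append(
            "purpose outside research, public health, or health care operations"
        )
    if not dua.authorized_users:
        problems.append("does not limit who can use or receive the data")
    if not dua.prohibits_reidentification:
        problems.append("recipient has not agreed not to re-identify the data")
    if not dua.prohibits_contacting_individuals:
        problems.append("recipient has not agreed not to contact individuals")
    if not dua.requires_safeguards:
        problems.append("no assurance of appropriate safeguards")
    return problems


complete = DataUseAgreement(
    purposes={"research"},
    authorized_users=["study team"],
    prohibits_reidentification=True,
    prohibits_contacting_individuals=True,
    requires_safeguards=True,
)
incomplete = DataUseAgreement(
    purposes={"payment"},
    authorized_users=["study team"],
    prohibits_reidentification=False,
    prohibits_contacting_individuals=True,
    requires_safeguards=True,
)

complete_problems = missing_elements(complete)
incomplete_problems = missing_elements(incomplete)
```

Here `complete_problems` is empty, while `incomplete_problems` flags the payment purpose and the missing re-identification term. The sketch covers only the elements named in the paragraph above; the Rule's regulatory text remains the authority on what an agreement must contain.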
Since the data use agreement already requires the recipient to
limit who can use or receive the data, and to prevent uses and disclosures
beyond those stated in the agreement, and since we could not anticipate
all the possible scenarios under which a limited data set with a
data use agreement would be created, the Department concluded that
adding any of the other suggested restrictions would bring only
marginal additional protection while potentially impeding some of
the purposes intended for the limited data set. The Department believes
the provisions of the data use agreement provide a firm foundation
for protection of the information in the limited data set, but encourages
and expects covered entities and data recipients to further strengthen
their agreements to conform to current practices.
We do not specify the form of the data use agreement. Thus, private
parties might choose to enter into a formal contract, while two
government agencies might use a memorandum of understanding to specify
the terms of the agreement. In the case of a covered entity that
wants to create and use a limited data set for its own research
purposes, the requirements of the data use agreement could be met
by having affected workforce members sign an agreement with the
covered entity, comparable to confidentiality agreements that employees
handling sensitive information frequently sign.
A few commenters questioned the enforceability of the data use
agreements. The Department clarifies that, if the recipient breaches
a data use agreement, HHS cannot take enforcement action directly
against that recipient unless the recipient is a covered entity.
Where the recipient is a covered entity, the final modifications
provide that such covered entity is in noncompliance with the Rule
if it violates a data use agreement. See Sec. 164.514(e)(4)(iii)(B).
Additionally, the Department clarifies that the disclosing covered
entity is not liable for breaches of the data use agreement by the
recipient of the limited data set. However, similar to business
associate agreements, if a covered entity knows of a pattern of
activity or practice of the data recipient that constitutes a material
breach or violation of the data recipient's obligation under the
data use agreement, then it must take reasonable steps to cure the
breach or end the violation, as applicable, and, if unsuccessful,
discontinue disclosure of protected health information to the recipient
and report the problem to the Secretary. And the recipient is required
to report to the covered entity any improper uses or disclosures
of limited data set information of which it becomes aware. We also
clarify that the data use agreement requirements apply to disclosures
of the limited data set to agents and subcontractors of the original
limited data set recipient.
In sum, we have created the limited data set option because we
believe that this mechanism provides a way to allow important research,
public health and health care operations activities to continue
in a manner consistent with the privacy protections of the Rule.
We agree with those commenters who stated that the limited data
set is not de-identified information, as retention of geographical
and date identifiers measurably increases the risk of identification
of the individual through matching of data with other public (or
private) data sets. However, we believe that the limitations on
the specific uses of the limited data set, coupled with the requirements
of the data use agreement, will provide sufficient protections for
privacy and confidentiality of the data. The December 2000 Privacy
Rule preamble on the statistical method for de-identification discussed
the data use agreement as one of the techniques identified that
can be used to reduce the risk of disclosure. A number of Federal
agencies that distribute data sets for research or other uses routinely
employ data use agreements successfully to protect and otherwise
restrict further use of the information.
We note that, while disclosures of protected health information
for certain public health purposes are already allowed under Sec.
164.512(b), the limited data set provision may permit disclosures
for some public health activities not allowed under that section.
These might include disease registries maintained by private organizations
or universities or other types of studies undertaken by the private
sector or non-profit organizations for public health purposes.
In response to comments, the Department clarifies that, when a
covered entity discloses protected health information in a limited
data set to a researcher who has entered into an appropriate data
use agreement, the covered entity does not also need to have documentation
from an IRB or a Privacy Board that individual authorization has
been waived for the purposes of the research. However, the covered
entity may not disclose any of the direct identifiers listed in
Sec. 164.514(e) without either the individual's authorization or
documentation of an IRB or Privacy Board waiver of that authorization.
The Department further clarifies that there are other requirements
in the Privacy Rule that apply to disclosure of a limited data set,
just as they do to other disclosures. For example, any use, disclosure,
or request for a limited data set must also adhere to the minimum
necessary requirements of the Rule. The covered entity could accomplish
this by, for example, requiring the data requestor, in the data
use agreement, to specify not only the purposes of the limited data
set, but also the particular data elements, or categories of data
elements, requested. The covered entity may reasonably rely on a
requested disclosure as the minimum necessary, consistent with the
provisions of Sec. 164.514(d)(3)(iii). As an example of the use
of the minimum necessary standard, a covered entity that believes
that another covered entity's request to include date of birth in
the limited data set is not warranted is free to negotiate with
the recipient about that requirement. If the entity requesting a
limited data set including date of birth is not one on whose request
a covered entity may reasonably rely under Sec. 164.514(d)(3)(iii),
and the covered entity believes inclusion of date of birth is not
warranted, the covered entity must either negotiate a reasonably
necessary limited data set or not make a disclosure.
The Department amends Sec. 164.514(e)(3)(ii) to make clear that
a covered entity may engage a business associate to create a limited
data set, in the same way it can use a business associate to create
de-identified data. As with de-identified data, a business associate
relationship arises even if the limited data set is not being created
for the covered entity's own use. For instance, if a researcher
needs county data, but the covered entity's data contains only the
postal address of the individual, a business associate may be used
to convert the covered entity's geographical information into that
needed by the researcher. The covered entity may hire the intended
recipient of the limited data set as a business associate for this
purpose. That is, the covered entity may provide protected health
information, including direct identifiers, to a business associate
who is also the intended data recipient, to create a limited data
set of the information responsive to the business associate's request.
Finally, the Department amends Sec. 164.528 to make clear that
the covered entity does not need to include disclosures of protected
health information in limited data sets in any accounting of disclosures
provided to the individual. Although the Department does not consider
the limited data set to constitute de-identified information, all
direct identifiers are removed from the limited data set and the
recipient of the data agrees not to identify or contact the individual.
The burden of accounting for these disclosures in these circumstances
is not warranted, given that the data may not be used in any way
to gain knowledge about a specific individual or to take action
in relation to that individual.
Response to Other Public Comments
Comment: A small number of commenters argued that the development
of computer-based solutions to support the statistical method of
de-identification is advancing rapidly and can support, in some
cases better than the limited data set, many of the needs for research,
public health and health care operations. They also asserted that
authorization of the limited data set approach will undermine incentives
to further develop statistical techniques that will be more protective
of privacy than the limited data set. They proposed imposing a sunset
clause on the limited data set provision in order to promote use
of de-identification tools.
Response: We agree that progress is being made in the development
of electronic tools to de-identify protected health information.
However, the information presented by commenters did not convince
us that current techniques meet all the needs identified or are
easy enough to use that they can have the broad application needed
to support key research, public health and health care operations
needs. Where de-identification can provide better outcomes than
a limited data set, purveyors of such de-identification tools will
have to demonstrate to covered entities the applicability and ease
of use of their products. We do not believe a sunset provision on
the limited data set authority is appropriate. Rather, as part of
its ongoing review of the Privacy Rule in general, and the de-identification
provisions in particular, the Office for Civil Rights will periodically
assess the need for these provisions.
Comment: Some commenters said that if HHS clearly defines
direct identifiers and facially identifiable information, there
is no need for a data use agreement.
Response: We disagree. As previously noted, the resulting
limited data set is not de-identified; it still contains individually
identifiable health information. As a means to assure continued
protection of the information once it leaves the control of the
covered entity, we believe a data use agreement is essential.
Comment: Several commenters wanted to be able to have a
single coordinated data use agreement between a State hospital association
and its member hospitals where data collection is coordinated through
the hospital association. In addition, there was concern that requiring
a data use agreement and a business associate agreement in this
circumstance would create an excessive and unnecessary burden.
Response: Nothing in the requirement for a data use agreement
prevents a State hospital association and its member hospitals from
being parties to a common data use agreement. Furthermore, that
data use agreement can be combined with a business associate agreement
into a single agreement that meets the requirements of both Privacy
Rule provisions.
Comment: A few commenters argued that a data use agreement
should not be required for data users getting a limited data set
and performing data analysis as part of the Medicaid rebate validation
process under which third-party data vendors, working for pharmaceutical
companies, collect prescription claims data from State agencies
and analyze the results for errors and discrepancies. They argued
that State agencies often find entering into such contracts difficult
and time consuming. Consequently, if States have to establish data
use or similar agreements, then the Medicaid rebate validation process
could be adversely impacted.
Response: We are not persuaded that there is a compelling
reason to exempt this category of limited data set use from the
requirements for a data use agreement, as compared to other important
uses. The data use agreement is key to ensuring the integrity of
the limited data set process and avoiding inappropriate further
uses and disclosures.
Comment: One commenter stated that allowing disclosure of
the limited data set without IRB or Privacy Board review would create
a loophole in the Privacy Rule, with Federally funded research continuing
to undergo IRB review while private research would not.
Response: The Rule continues to make no distinction between
disclosure of protected health information to Federally and privately
funded researchers. To obtain a limited data set from a covered
entity, both Federally-funded and privately-funded researchers must
enter into a data use agreement with the covered entity. One of
the reasons for establishing the limited data set provisions is
that the concept of "personally identifiable information"
that triggers IRB review of research that is subject to the Common
Rule does not coincide with the definition of "individually
identifiable health information" in the Privacy Rule. The Department
believes that the limited data set comes closer to the type of information
not requiring IRB approval under the Common Rule than does the de-identified
data set of the Privacy Rule. However, there is no uniform definition
of "personally identifiable information" under the Common
Rule; rather, as a matter of practice, it is currently set by each
individual IRB.
Comment: A few commenters suggested expanding the allowable
purposes for the limited data set. One commenter proposed including
payment as an allowable purpose, in order to facilitate comparison
of premiums charged to insured versus uninsured patients. A few
commenters wanted to allow disclosures to journalists if the individual's
name and social security number have been removed and if, in the
context of the record or file, the identity of the patient has not
been revealed. A few commenters suggested that there was no need
to restrict the purpose at all as long as there is a data use agreement.
A couple of commenters wanted to extend the purpose to include creation
or maintenance of research databases and repositories.
Response: If the comparison of premiums charged to different
classes of patients is being performed as a health care operation
of another entity, then a limited data set could be used for this
purpose. It seems unlikely that this activity would occur in relation
to a payment activity, so a change to include payment as a permissible
purpose is not warranted. A "payment" activity must relate
to payment for an individual and, thus, will need direct identifiers,
and uses and disclosures of protected health information for such
purposes are permitted under Sec. 164.506.
With respect to disclosures to journalists, while recognizing the
important role performed by newspapers and other media in reporting
on public health issues and the health care system, we disagree
that the purposes of the limited data set should be expanded to
include journalists. A key element of the limited data set is that
the recipient enter into a data use agreement that would limit access
to the limited data set, prohibit any attempt to identify or contact
any individual, and limit further use or disclosure of the limited
data set. These limitations are inherently at odds with journalists'
asserted need for access to patient information.
The suggestion to allow disclosure of a limited data set for any
purpose if there is a data use agreement would undermine the purpose
of the Privacy Rule to protect individually identifiable health
information from unauthorized disclosures and would conflict with
the requirement in the data use agreement to restrict further use
to research, public health, or health care operations purposes. The
Department clarifies that research encompasses the establishment
of research databases and repositories. Therefore, no change to
the proposal is necessary.
Comment: One commenter said that HHS should not create a
list of excluded direct identifiers; rather it should enunciate
principles and leave it to researchers to apply the principles.
Response: The statistical method of de-identification is
based on scientific principles and methods and leaves the application
to the researcher and the covered entity. Unfortunately, many have
viewed this approach as too complex or imprecise for broad use.
To allow broad discretion in selection of variables in the creation
of a limited data set would trigger the same concerns as the statistical
method, because some measure of reasonableness would have to be
established. Commenters have consistently asked for precision so
that they would not have to worry as to whether they were in compliance
with the requirements of the Privacy Rule. The commenter's proposal
runs counter to this desire for precision.
Comment: One commenter wanted prescription numbers allowed
in a limited data set because they do not include any "facially
identifiable information."
Response: Prescription numbers are medical record numbers
in that they are used to track an individual's encounter with a
health care provider and are uniquely associated with that individual.
The fact that an individual receives a new prescription number for
each prescription, even if it is randomly generated, is analogous
to an individual receiving a separate medical record number for
different hospital visits. Thus, a prescription number is an excluded
direct identifier under the medical record number exclusion for
the limited data set (and also must be excluded in the creation
of de-identified data).
Comment: One commenter wanted clarification that a sponsor
of a multi-employer group health plan could utilize the limited
data set approach for the purpose of resolving claim appeals. That
commenter also suggested that if the only information that a plan
sponsor received was the limited data set, the group health plan
should be able to give that information to the plan sponsor without
amending plan documents. In lieu of the limited data set, this commenter
wanted clarification that redacted information, as delineated in
their comment, is a reasonable way to meet the minimum necessary
standard if the plan sponsor has certified that the plan documents
have been amended pursuant to the requirements of the Privacy Rule.
Response: Uses and disclosures of a limited data set are
authorized only for public health, research, and health
care operations purposes. A claims appeal is more likely
to be a payment function, rather than a health care
operation. It is also likely to require use of protected
health information that includes direct identifiers.
The Department disagrees with the commenter's suggestions
that the Rule should allow group health plans to disclose
a limited data set to a plan sponsor without amending
the plan documents to describe such disclosures. Limited
data sets are not de-identified information, and thus
warrant this degree of protection. Therefore, only summary
health information and the enrollment status of the
individual can be disclosed by the group health plan
to the plan sponsor without amending the plan documents.
The Privacy Rule does not specify what particular data
elements constitute the minimum necessary for any particular
purpose.