2. Getting Started: Data & Documentation

Key Questions and Concepts


This module will provide an overview of claims data, discuss why claims are important for hotspotting, and walk you through the process for obtaining the data. In particular, we will cover planning questions to help you get started with the process, the legal and research framework for hotspotting, techniques for ensuring your data is secure and sufficiently protected, specific claims variables that are important for hotspotting, and lastly, we will begin to introduce the software employed throughout the hotspotting process.

What are Claims Data?

“Claims data” refers to the administrative data that hospitals collect to document their requests for reimbursement for the services they perform on behalf of their patients. Claims data are generated for every encounter with a patient, from an Emergency Department visit to a hospital admission, and include a wealth of information about the patient and what took place during the hospital interaction. Because unnecessary hospitalizations are one of the key downstream effects of a fragmented and expensive healthcare delivery system, claims data provide a great starting point for understanding the healthcare landscape of your community. Claims data are not the only imaginable starting point for a hotspotting intervention, but they do have several beneficial attributes that should be at the foundation of any sustainable, data-driven program.

Because claims are first and foremost an operational data set, generated by the hospital as part of the reimbursement process, they tend to follow a relatively consistent format across institutions and can be relied upon as a stable source of continuous, standardized data. Claims data also link professional descriptions of patients and their medical conditions to the large amounts of money associated with hospital-based care, making them by nature a high-stakes data set for institutions, patients, and practitioners. As a result, claims data are well suited to hold the attention of media, funders, and other institutional partners. Finally, claims data provide at least some narrow but significant slices of insight into where patients are, where they live, and how their condition changes over time. Viewed together, this ensemble of what, where, and when yields an all too rare third-party look at patient behavior in a way that more common self-reported or survey data cannot.

Claims: Just One Piece of a Complicated Puzzle

Of course claims are not perfect. They are just one facet of a very messy world. It is useful to keep in mind that the claims data picture has critical holes. It represents just one kind of institution with which patients interact. Beyond the obvious role of on-the-ground experience, claims can and should be supplemented by other data sources, as fragmentation of care demands an integrative solution on many levels.

Even then, data from fundamentally reactive and specialized institutions don’t necessarily lend themselves to supporting integrative solutions. They only capture information about the individuals who enter their orbit, not those who are unable or unwilling, or not in need at a given time, no matter how much those unobserved people may resemble a particular hospital’s patients in other unknown respects. Each professional domain adds to the picture, but not seamlessly or without distortion. It brings its historical preoccupations and its own lens. Medical data can medicalize problems just as justice data can criminalize them. While the limits of these perspectives are hard enough to see, much less to overcome without considerable effort, it is reasonable to expect that some of the biases will cancel out or at least become conscious when disparate data sources are integrated. At the same time, tangible synergies clearly accrue to data integration. Just as linking data between multiple hospitals in Camden reveals the true scope of dis-integration of patient care and the magnitude of overutilization in a way that data from a single hospital never could, integration with other datasets opens a much broader panorama on the determinants of health and opportunities for better outcomes at lower cost. With this in mind, even at the beginning, it is best to plan for the possibility of data integration across silos as a goal of the project and a reminder of its limits. In practice, this can lead to any number of parallel strategies.
Cultivate relationships with data generating entities in key institutions affecting your patient populations (from para-medicine and allied services to the critical spheres of housing, employment, education, policing, and government benefits) in the interest of future collaboration and pilot projects. Use open data standards (or at least academically documented ones) when re-encoding or categorizing your data. Consider variable names and definitions that are compatible and don’t cause unnecessary confusion with those used in other professions (see Module 4 on measurement for examples). In these and other ways you can ease future integration, reduce future costs and make your work part of an ongoing broader research conversation. 
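To make the earlier point about linking hospitals concrete, here is a minimal Python sketch comparing what any single hospital can see with what a combined data set shows. The visit records, hospital labels, and the shared `patient_key` are all hypothetical, and the sketch assumes that some earlier linkage step has already matched the same person across institutions.

```python
from collections import Counter

# Hypothetical visit records as (hospital, patient_key) pairs. The shared
# patient_key is an assumption: it stands in for whatever linkage step has
# already matched the same person across hospitals.
VISITS = [
    ("Hospital A", "p1"), ("Hospital A", "p1"),
    ("Hospital B", "p1"), ("Hospital B", "p1"),
    ("Hospital C", "p1"),
    ("Hospital A", "p2"),
    ("Hospital B", "p3"), ("Hospital B", "p3"),
]


def combined_visits(visits):
    """Count each patient's visits across all hospitals."""
    return Counter(patient for _hospital, patient in visits)


def largest_single_hospital_count(visits):
    """For each patient, the most visits that any one hospital can see."""
    per_pair = Counter(visits)  # counts each (hospital, patient) pair
    best = {}
    for (_hospital, patient), count in per_pair.items():
        best[patient] = max(best.get(patient, 0), count)
    return best


combined = combined_visits(VISITS)
single = largest_single_hospital_count(VISITS)
for patient in sorted(combined):
    print(patient, "combined:", combined[patient],
          "best single-hospital view:", single[patient])
```

In this made-up data, the patient who moves between three hospitals looks far less heavily utilizing from inside any one of them than in the combined view, which is the pattern the linked Camden data is described as revealing.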

Hotspotting Planning

Because data sharing relationships take time to build, and new data requests need to clear institutional (and sometimes technical) hurdles, planning is an essential component of the hotspotting process. In this module and module 3, we will provide more detailed guides to subcomponents of the process and then step through the actual data cleaning and manipulation process.

As you embark on the hotspotting journey, you should begin by assessing your community’s readiness. This self-reflective process will help to ensure that you have all the necessary components to tackle the challenges of hotspotting head-on.

1.) The Goal

The lodestone of any plan is its goal. Separate from any exploratory work you can undertake with the data, the project will benefit from clear goals, even if those goals prove through experience to be the wrong ones.

The goal = Why you want to Hotspot =

1.) a clear problem statement

2.) a viable concept of intervention

Wrong goals are hard to replace when they are unclear. Likewise, the intervention to be tested should be clearly communicated. An intervention that is all things to all patients or stakeholders cannot be tested and improved, and is difficult to support. What do your community and population suffer from most? What aspect of that problem are you best equipped to address?

2.) What, and Who, Do You Have?

What expertise will help make your approach to your goal successful? Any intervention will be difficult, and it makes sense to tailor your approach to your strengths or to supplement your weaknesses. Hotspotting is all about maximizing leverage through appropriate focus. Taking advantage of the human and institutional comparative advantages already in your community (without being entirely limited by them) is another component of that leverage. Use and build on the particular expertise of your staff, or plan for where you need to bolster those skills to address the focused goal defined above.

At the institutions within your community boundary, clinical champions not only lend their experience and training, they form an authoritative bridge between types of stakeholders. Administrative champions help ensure ongoing institutional participation in data sharing, support integrated intervention, help navigate bureaucratic complications, and, eventually, drive institutional change from the inside.

3.) Scope

Wherever you are, and whatever service areas your healthcare institutions cover, the resources available will vary. Geographic scope may need to conform to the limitations of the data you are able to obtain, but that should be a conscious limitation. Where choices can be made, they will directly affect the data requested and the scope of the hypothesis tested or the results you can aim for.

In Camden, we’ve benefitted from long-term residents with deep local knowledge who also work in the healthcare field, local non-profits doing street outreach, and Philadelphia-based experts just across the river working in policy areas like homelessness that were initially outside of our scope, but which have since become central to it.

If your population of interest crosses the chosen community boundary more often than not, then the patterns your data can show will be systematically limited or too skewed to be truly compelling. If altering markets is not itself a central task of the envisioned intervention, then operating in a region that crosses boundaries will detract from your efforts in a practical sense as well. Taking into consideration what boundaries would make your goal more feasible to accomplish can be a valid criterion for deciding your geographic scope, so long as it is not the only one.

Camden City is very different from Camden County. Data geographic extent matters.

In Camden, while the city has much in common with some sections of neighboring Philadelphia, it is quite distinct from its surrounding county, which becomes very clear at the aggregate level. Unfortunately, most of the readily available statistics, particularly public health data, are available at the county level at best. While Camden has high poverty and a mostly African American and Hispanic population, the county does not. It has a mix of wealthy, mostly White communities and more ethnically mixed communities across the wealth and income spectrum. When developing benchmarks and baseline metrics and establishing goals, properly characterizing the target population and sub-communities will be crucial. The second aspect of scope, inextricable from your defined geography, is the institutional landscape in which you work.

Camden is home to three hospitals which belong to distinct regional health systems. CCHP has been fortunate that these three institutions recognize a special responsibility towards the city of Camden which encourages their cooperation. Nonetheless, the hospitals still do compete and fear a shrinking economic pie. They increasingly see their data as a proprietary advantage and are wary of revealing their negotiated reimbursement rates or data which could allow their competitors to reverse engineer those rates.

As a result of these sensitivities, we have carefully cultivated our role as a neutral third party, a good faith honest broker and confidential steward of data. Data sharing requires building strong relationships with a full array of partners and is a long-term process leaning heavily on trust.

Building Support

Once you have understood your goal, team, and scope, it is time to put agreements in place and get the necessary parties on board. You should have a good idea about who already cares, but it is important to think about who is missing from the conversation and what formal frameworks will support your collaboration. In spite of the fact that hotspotting is a technique designed to integrate care and break down unproductive silos, full inclusion of all possible stakeholders is impossible. Not everyone wants, needs, or has the capacity to sustain “full” inclusion. Building an intervention, or even a data sharing system, that relies on the same level of participation from all stakeholders almost guarantees inefficiency and underestimates the mediating and leveling role that an effective data intermediary can play. For other participants, the process of data-driven collaboration can offer a variety of levels of involvement. Whatever their level of involvement, failing to include important players in some capacity at the beginning, particularly those whose deeper participation you may need later on or with whom relationships may take a long time to build, can prove to be a barrier down the road. Maintain your focus, but think early on about who is missing from the table.

Making Your Case

CCHP’s successes thus far have benefited from a well-made and well-presented case for data sharing, for the need to get out ahead of health system change in a more effective, patient-centered, and data-driven way. With hindsight, we can analyze our efforts to develop and build a movement around what has come to be called hotspotting. We see a mix of good decisions and instructive errors, planning, effort, and happenstance. What hindsight can’t tell us is exactly how much of our success in making our case has been due to luck.

Until very recently the Camden Coalition has grown slowly and organically. Thanks to our successes and hard work, it is easy to view our progress as the result of intentional strategy and perceptive choices. While we have clearly enjoyed both, we are data driven enough to recognize the perils of confusing correlation with causation and of missing a survivorship bias in our theory of change.


As the healthcare landscape rapidly changes in the United States, and payment reform strategies that incentivize value over volume take hold, there will be powerful new market drivers for change. For hospitals and other healthcare providers to stay ahead of the curve, they will have to begin employing strategies such as hotspotting to help identify opportunities for better care at a lower cost. Understanding and being able to articulate how assembling claims data into a community data set helps drive the hotspotting process will prove vital in convincing stakeholders across your community to begin sharing data.

Our current health care system, which resembles a disease treatment system more than a prevention or health promotion system, tends to recognize individuals only as they become acutely ill and in need of an expensive hospitalization. We scan, cut, and zap, and then send them back into the community to face the same social and environmental challenges that caused them to be admitted.

The good news is that as the payment process shifts, and institutions begin to be held accountable for the long-term well-being of their patients, the field of population health has been working to ensure that transitions in care are better coordinated and that the hospital discharge, or “push”, results in a softer landing for the patient. In the Camden Coalition’s case, by going out into the community and employing multi-disciplinary teams of nurses, social workers, community health workers, and health coaches, we work together with our patients to empower them to better manage their care and navigate the healthcare and social service landscape.

There is also an emerging body of research to support the merits of care management and coordination.
Rigorous, peer-reviewed research has demonstrated that providing nurse-based care management and coordination focused on the sickest, most complex patients, can result in better quality of care, lower mortality rates, and ultimately lower costs.

One of the best case studies for the merits of hotspotting exists in Doylestown, Pennsylvania, where Health Quality Partners, a community-based nurse-led intervention, provides ongoing case management for elderly patients with complex medical histories. Through a randomized study of over 1700 adults, Health Quality Partners was able to show a 25% reduction in the risk of death, a 33% reduction in hospitalizations, and a cost savings of 22% for the highest risk cohort of patients. These initial studies are just the start of the movement. By embracing hotspotting, you and your community will be helping to advance this emerging and exciting field.

Legal Framework

Hotspotting requires us to work with ethically sensitive and legally protected information. Throughout this module, we’ll explore the legal and research frameworks that govern how we work with patient data to hotspot. There are two main channels for beginning to hotspot, one geared towards research and the other geared towards business operations. Which path makes sense for you and your community will depend on your position within the community and how you intend to make use of the data.

If you are approaching hotspotting from a more academic lens, where you are interested in testing and measuring research hypotheses, you will likely want to pursue the research path. This requires engaging the Institutional Review Board (IRB) at your, or a partner’s, institution and submitting a study for approval. IRBs exist to review and monitor research involving human subjects to ensure that appropriate steps are taken to protect the rights and welfare of human subjects. In Module 2, we will help you navigate the IRB process, providing template studies that will help you begin the process on your end.


The IRB application process is fairly standard across institutions, and will involve detailing all of the aspects of your project, ranging from hypotheses and study objectives to a plan for safeguarding the data. In addition to permitting the exchange of hospital claims information, an IRB agreement can also allow you to use the claims data as an outcome to evaluate the impact of your hotspotting intervention. If you are an external entity looking to analyze claims data generated by another institution, you’ll need to execute an agreement to permit data sharing with the hospital(s). Again, you will need to assess whether the data is for research purposes or for quality assurance and process improvement (QA/PI), which will dictate what path you must take. If the purpose is research related, you will need to follow the aforementioned IRB steps, but will also be required to identify a Principal Investigator (generally a doctor or nurse) at the institution contributing data who is willing to lend his/her name to the project. You will also need to execute a Data Use Agreement (DUA) that will establish what data will be shared and the terms for how it can be used, secured, and shared externally.

If you are approaching this work with a more clinical focus, you will want to pursue the business path for hotspotting that centers around Quality Assurance and Process Improvement. This entails entering into Business Associates Agreements and Collaborative Service Agreements with data providers to make explicit how and what data will be shared and the specific activities that you will undertake with the data.

In lieu of involving the Institutional Review Board, this process generally involves working with hospital legal departments and IT departments to execute the agreements. These agreements allow individual patient data to be shared without requiring each patient to sign an individual consent.
This module will also provide guidance on how to best navigate that process and will provide template agreements that will help get you started with the process.

Ultimately, however, you will want to ensure that you have both frameworks covered, so that as you operationalize hotspotting and begin working directly with patients, you have a vehicle for research that enables you to test and evaluate various hypotheses, publish your findings, and ultimately contribute to the field of generalizable knowledge.



The Health Insurance Portability and Accountability Act (HIPAA) protects the privacy of individually identifiable health information, establishing national standards for electronic protected health information (PHI) and penalties for privacy breaches. HIPAA establishes the groundwork for what is considered identifiable health information versus what is considered to be de-identified. There are two techniques that can be used to de-identify data. The first involves removing all variables that HIPAA deems identifying (there are 18 such identifiers in total). The other method involves having a person with knowledge of and experience with generally accepted statistical and scientific principles and methods apply and document a method that ensures that the risk of the information being used to identify individuals is very small. Most hotspotting work will require you to work with identified health information, as it will be important to incorporate variables such as dates of service, dates of birth, and home addresses, all of which are considered to be PHI. As such, your program should be careful to comply with all HIPAA guidelines. The hospitals you work with can be a good resource for information about HIPAA.
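As a rough illustration of the first technique (removing identifying variables), the following Python sketch drops a few direct identifiers from a record and coarsens dates and ZIP codes. The column names are hypothetical, the identifier list is deliberately incomplete, and the sketch is not a compliance tool: the actual removal method covers more identifiers and carries additional conditions (for example, on sparsely populated ZIP areas and on very advanced ages) that are not handled here.

```python
from datetime import date

# Hypothetical column names, chosen for illustration only. This is NOT the
# full list of identifiers that the removal method covers.
DIRECT_IDENTIFIERS = {
    "patient_name", "street_address", "phone", "ssn", "medical_record_number",
}
DATE_FIELDS = {"date_of_birth", "admit_date", "discharge_date"}


def reduce_identifiers(record):
    """Return a copy of record with direct identifiers dropped and
    dates and ZIP codes coarsened."""
    cleaned = {}
    for field, value in record.items():
        if field in DIRECT_IDENTIFIERS:
            continue  # drop the field entirely
        if field in DATE_FIELDS and isinstance(value, date):
            # Keep only the year of the date.
            cleaned[field + "_year"] = value.year
        elif field == "zip_code":
            # Keep only the first three digits. The population condition
            # that applies to three-digit ZIP areas is not handled here.
            cleaned["zip3"] = str(value)[:3]
        else:
            cleaned[field] = value
    return cleaned


sample = {
    "patient_name": "Jane Doe",
    "date_of_birth": date(1950, 3, 14),
    "zip_code": "08102",
    "diagnosis_code": "E11.9",
}
print(reduce_identifiers(sample))
```

Note how much analytic detail disappears: exact dates of service and home addresses are exactly the variables hotspotting relies on, which is why most of this work is done with identified data under the agreements described above.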

Protecting Your Data

Hotspotting entails working with ethically sensitive, formerly private, and legally protected information. The information isn’t ours. It’s a treasure our patients and partner institutions loan to us. Data security is the price paid in return for the loan.
Throughout this module, we’ll explore the threats to data security that you will have to address and provide guidance on how to enact adequate safeguards so that the integrity of the data is maintained and your community can continue to hotspot without any future issues. The below video provides a good overview of what encryption is, why we need it, methods to begin encrypting your data, and advice on how to build an effective password strategy.


Hotspotting’s data-driven nature requires us to store patient information electronically in forms that are, by design, easy to share. We make copies of data several times a day, often leaving copies of information behind without knowing it. Email servers store copies of the files that pass through them. Normal file deletion hides files rather than ensuring that they are gone. Data management software makes automatic backup copies and temporary copies. Devices are set to synchronize their contents when they connect. An operating system’s normal memory optimization techniques copy data held in memory to disk, in a so-called pagefile or swapfile, and on and on. In an era of massive online data collection and habitual data over-sharing, coupled with ever-rising computing power and increasingly ubiquitous connectivity, even seemingly innocuous data fragments may be assembled to reveal data about individuals that they had no intent to share.
The threats we fear most are often not the most likely ones. Identity theft from the outside and malicious mischief or unlawful data use from within do occur, but excessive focus on these risks can lead us to overlook other, more common threats: unintentional human error, lost hardware, sending the wrong file. A good data security policy will protect your data against both intentional and unintentional threats.
For our purposes, data security can be thought of as having 4 major components:

  1. Secure transfer (how you get and give files)
  2. Secure storage (how and where you keep your data)
  3. Secure deletion (how you dispose of files you no longer need)
  4. Strong passwords (how you verify authorization to access your data, and how you create and remember that authorization)

Technologies and, critically, procedures to manage all 4 elements are the minimum necessary components for a secure data ecosystem. A poor implementation of any component (including one that is simply too hard to use or to use well) can render even the strongest encryption algorithm useless or, worse, actually encourage careless data management by providing a false sense of technological security.

Secure Transfer

Secure transfer, particularly electronic transfer like email, relies on the coordination of multiple computer systems and technical protocols. An easy to use, reliable solution will most likely depend on secure email or encrypted connections to remote servers, which are probably best managed through the IT infrastructure of your institution. Third party commercial secure file transfer protocol (SFTP) solutions, such as ShareFile, do exist, and will enable you to send files securely when you do not have access to an encrypted email system or when files are too large to be sent via email. In the absence of secure electronic transfer, if properly encrypted and protected by well managed passwords, data can be securely exchanged on physical media like DVDs, hard drives, or flash drives.

Secure Storage

File-level encryption allows sensitive data to remain protected even on a machine whose disk is decrypted and unlocked. The Camden Coalition currently uses and recommends the free and open source encryption product TrueCrypt for full disk encryption, file-level encryption, or both. It is one of the most feature rich encryption options available, whether free or paid.

Open source software is particularly important in the fields of security and cryptography, where it is generally agreed that the ability for anyone to critique the code is the best insurance of well implemented encryption techniques.
Particularly in light of recent revelations that several domestic and foreign security agencies have negotiated or attempted to negotiate with commercial software providers to implement so-called back doors that would allow them to access presumed secure files, we recommend opting for open source security options when possible. The thinking goes that where such backdoor weaknesses exist, they will eventually be discovered by a wider audience. Though TrueCrypt has been used and trusted by security experts for many years, it is currently undergoing an independent forensic audit to verify that it is indeed as secure as experts tend to think. The long time developers of the software have unfortunately just decided to stop maintaining it. Following the results of the audit, it is very likely that some other group will take over maintenance. Until then, the last fully functioning version of the software is 7.1a. The just released 7.2 only decrypts and cannot be used to secure your data. We have prepared detailed downloadable instructions for using TrueCrypt and links where you can download version 7.1a.

Whole disk encryption protects against physical loss of, or unauthorized access to, drives that are off and password protected. It renders your entire hard drive password protected and unreadable, but only when the computer or drive containing the data is off and not logged in. It protects lost or stolen hardware. Because patient data may be unintentionally stored anywhere on your machine, this is the blanket level of encryption that you must have.

If you have any reason to doubt the security of your computer network, or are in a location where you might connect to a public wifi network, it is a simple and worthwhile step to only decrypt and work on your sensitive data while “air gapped.” You simply need to unplug from your wired connection and/or turn off your wireless adapter while your data is decrypted.
Secure Deletion

If you no longer need a file of sensitive data, or if you receive, despite your best efforts, data which is not properly encrypted, you need a way to dispose of those files. This is where secure deletion comes in. Normal deletion of a file merely tells the computer to hide the file from its original location and act as if it exists in a trash directory. Even when supposedly deleted from the trash, the file is still hidden in its original location. The computer merely indicates that that location is ok to use for other data, and the data can still be recovered with readily available tools. Secure deletion fixes this weakness. Secure deletion bypasses the trashcan, forcing the data to be overwritten so that it cannot be recovered. For secure deletion on Windows we recommend Eraser or the secure shredder tool included with the Symantec products. On Unix-like systems, including Macs, secure deletion is often available as a menu option. On Macs it is called Secure Empty Trash.

Once you’re using secure deletion, one more often overlooked tip: when moving a file across a network, to a shared drive, or between drives or partitions on your own computer, the operating system is really copying from the first location, pasting, and deleting in the old location without using the trashcan. Instead, you should do this explicitly: copy, paste, and then use secure delete on the original.

An Effective Password Policy
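As one possible illustration of the “create and remember” part of the strong passwords component listed earlier, the sketch below builds a multi-word passphrase with Python’s standard `secrets` module. The passphrase approach is an assumption chosen for illustration, not necessarily the policy this module’s authors recommend, and the eight-word list is a placeholder that is far too small for real use.

```python
import math
import secrets

# Tiny placeholder word list, for illustration only. A usable passphrase
# needs a list of several thousand words.
SAMPLE_WORDS = [
    "river", "candle", "orbit", "maple", "tunnel", "harbor", "velvet", "copper",
]


def make_passphrase(words, n_words=5, separator="-"):
    """Join n_words randomly chosen words, using a cryptographically
    secure random number generator."""
    return separator.join(secrets.choice(words) for _ in range(n_words))


def passphrase_bits(list_size, n_words):
    """Approximate entropy in bits when each word is drawn uniformly
    at random from a list of list_size words."""
    return n_words * math.log2(list_size)


print(make_passphrase(SAMPLE_WORDS))
print(round(passphrase_bits(7776, 5), 1))
```

The second function shows why list size matters: five words drawn from a list of 7,776 words give a little over 64 bits of entropy, while five words from the eight-word placeholder list give only 15.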

Making The Request

Basic Variables to Request

The culminating step in the process of getting started is actually requesting the data. This will involve activating the agreements that you have in place and making a clear, well-articulated request for the specific data that you will use. In this module, we’ll help walk you through what data exists and how to structure that request so that when you finally receive some data to work with, it fully meets your needs.

Once you have obtained the necessary approvals to receive claims data, you should make a formal request for the data. In that request, you should specify what filters you would like to have applied (e.g. visit date ranges, geographic filters) as well as what variables you would like to have included in the extract. We recommend requesting at least 2 to 3 years of retrospective data to begin your analysis, as you will likely want to compare trends across multiple years. If the hospital’s data system permits, you can ask for data for additional years (the CCHP claims database currently goes back 13 years), though more data has the potential to create additional work for you and your teams down the line. Because claims data follow a relatively standard format, there are common variables that you can expect a hospital’s IT department to be able to query. These variables are not intended to be a comprehensive listing of all of the variables that you should request. They are meant to serve as a starting point. With your data request, you should also ask for a data dictionary, which will include a list of all fields, data types, field sizes, codes, and definitions of those codes. Some variables, such as race and language, are often coded and will require a crosswalk to understand and standardize across institutions. The data dictionary will prove crucial in allowing you to fully understand the nuances of your data set. Once an extract has been prepared, you will need to transfer the data to your local system.
There are multiple ways to handle that transmission. Some hospitals have IT systems and protocols in place that will dictate how they send you the file. This could involve a secure email attachment or a secure file transfer protocol (SFTP) process. An example of a low tech option is creating an encrypted file on a flash drive or DVD.
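Once an extract arrives, it is worth checking it against the request. The Python sketch below shows one way this might look: confirming that the requested variables are present, flagging rows outside the requested date range, and applying a per-hospital crosswalk to a coded variable. Every variable name, hospital label, code, and category here is a hypothetical placeholder; real values would come from your own request and from each hospital’s data dictionary.

```python
from datetime import date

# Hypothetical names throughout: the variables, hospital labels, and codes
# below are placeholders, not fields from any real hospital system.
REQUESTED_VARIABLES = {"patient_id", "admit_date", "zip_code", "race_code"}

RACE_CROSSWALK = {
    "hospital_a": {"1": "White", "2": "Black or African American", "9": "Unknown"},
    "hospital_b": {"W": "White", "B": "Black or African American", "U": "Unknown"},
}


def missing_variables(rows, requested):
    """Requested variables that appear in none of the extract's rows."""
    present = set()
    for row in rows:
        present.update(row)
    return requested - present


def rows_outside_range(rows, start, end):
    """Rows whose admit_date falls outside the requested date filter."""
    return [row for row in rows if not (start <= row["admit_date"] <= end)]


def standardized_race(row, hospital):
    """Translate a hospital-specific race code into a shared label."""
    # Codes missing from the crosswalk are flagged rather than guessed.
    return RACE_CROSSWALK[hospital].get(row["race_code"], "Unmapped code")


extract = [
    {"patient_id": "p1", "admit_date": date(2012, 5, 1),
     "zip_code": "08102", "race_code": "2"},
    {"patient_id": "p2", "admit_date": date(2009, 1, 15),
     "zip_code": "08103", "race_code": "7"},
]
print(missing_variables(extract, REQUESTED_VARIABLES))
print(rows_outside_range(extract, date(2011, 1, 1), date(2013, 12, 31)))
print([standardized_race(row, "hospital_a") for row in extract])
```

Flagging unmapped codes instead of silently assigning them to a category keeps gaps in the data dictionary visible, so they can be raised with the hospital’s IT department.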