Datasets

About Datasets

Datasets may be submitted to Libra Data (U.Va.’s local instance of Dataverse, http://dataverse.lib.virginia.edu) in order to preserve them and make them accessible worldwide.  Because Libra Data is intended as a repository for completed work in electronic format, analogous to the physical library as a repository of completed work in print format, datasets should be complete in the form in which the researcher would want them to be permanently preserved.  Libra Data is not meant to provide storage for works in progress.

Managing research data is a complex process, especially given the recent mandates of federal funding agencies.  Extensive information on managing data can be found on the website of the Data Management Consulting Group of the University Libraries.  Please consult the Data Management Consulting Group pages for information on university policies, recommended procedures, and the DMPTool (http://dmptool.org) application, which may be used to generate data management plans that fit the requirements of NSF and other agencies.  You may also contact the Data Management Consulting Group directly for assistance.

Sensitive and personally identifiable data is highly regulated under federal, state, and University policy. Prior to supplying any data for public archiving and distribution, you must remove any confidential or sensitive information, student education records protected under FERPA, and all information that personally identifies any individual or that contains any information classified as highly sensitive under state or federal law, or UVA policy.

For more on University policy, see:

You should review the Libra Dataset Deposit License now to gain a better understanding of the types of data that can appropriately be deposited for public access through Libra.  You will also be asked to confirm the following points before data deposit:

  • That you have read the deposit license and affirm that you have the legal right and authorization to make this data publicly available online for world-wide unrestricted access through Libra Data.
  • In preparing the data for public archiving and distribution, you have removed any confidential or sensitive information, student education records protected under FERPA, and all information that personally identifies any individual or that contains any information classified as highly sensitive under state or federal law, or UVA policy.
  • If the submission is based upon work that has been sponsored or supported by an agency or organization other than UVA such as the National Institutes of Health, the National Science Foundation, or a private sponsor or funder, you represent that you have fulfilled any right of review, confidentiality, or other obligations required by that contract or agreement.
  • You represent that you have made a reasonable effort to ensure that the data contained in your submission is accurate.
  • You represent that you have appropriately acknowledged other researchers whose work contributed to the data.

Who may submit datasets to Libra Data?

Any employee (faculty/staff) or student of the University may submit datasets to Libra Data.  Please consider that UVA claims ownership rights to intellectual property, including data, generated with significant University resources. Therefore a researcher may not be able to claim sole ownership of his/her data and may need to consult with the University before placing the dataset in Libra.  Please see the memo on Data Rights and Responsibilities Guidance for more information, or consult with the Office of the Vice President for Research for guidance as to whether you may make your data available through Libra.

Please consider whether other researchers or colleagues have rights to manage the release of this data, for example others involved with the grant, research activity, or laboratory, whether at UVA or at another institution. If the answer to this question is yes, you must obtain their permission to deposit the data in Libra.

What types of content may be submitted as datasets?

Many types of data are appropriate for Libra Data deposit.  Examples of appropriate data for deposit would include:

  • data already made publicly available through another repository
  • data required by a funding agency to be made publicly available (which does not include sensitive or confidential information)
  • scholarly data which does not include sensitive or confidential information

In all cases depositors should be careful to ensure that the content they submit contains no confidential or sensitive information.  As noted above, confidential or sensitive information includes all information that personally identifies any individual or that contains any information classified as highly sensitive under state or federal law, or UVA policy.  For these purposes, highly sensitive data currently includes personal information that can lead to identity theft if exposed and health information that reveals an individual’s health condition and/or history of health services use:

1.  Personal information that, if exposed, can lead to identity theft. “Personal information” means the first name or first initial and last name in combination with and linked to any one or more of the following data elements about the individual:

  • Social security number
  • Driver’s license number or state identification card number issued in lieu of a driver’s license number
  • Passport number
  • Financial account number, or credit card or debit card number

2.  Personally Identifiable Information (PII) is information that, if exposed, can be used on its own or with other information to identify, contact, or locate a single person, or to identify an individual in context:

  • Name, such as full name, maiden name, mother’s maiden name, or alias;
  • Personal identification number, such as social security number (SSN), passport number, driver’s license number, taxpayer identification number, patient identification number, and financial account or credit card number;
  • Address information, such as street address or email address;
  • Asset information, such as Internet Protocol (IP) or Media Access Control (MAC) address or other host-specific persistent static identifier that consistently links to a particular person or small, well-defined group of people;
  • Telephone numbers, including mobile, business, and personal numbers;
  • Personal characteristics, including photographic image (especially of face or other distinguishing characteristic), x-rays, fingerprints, or other biometric image or template data (e.g., retina scan, voice signature, facial geometry);
  • Information identifying personally owned property, such as vehicle registration number or title number and related information;
  • Information about an individual that is linked or linkable to one of the above (e.g., date of birth, place of birth, race, religion, weight, activities, geographical indicators, employment information, medical information, education information, financial information);
  • Other FERPA-protected data not otherwise covered specifically in this list.

3.  Health information that, if exposed, can reveal an individual’s health condition and/or history of health services use.  “Health information,” also known as “protected health information (PHI),” includes health records combined in any way with one or more of the following data elements about the individual:

  • Names
  • All geographic subdivisions smaller than a State, including street address, city, county, precinct, zip code, and their equivalent geocodes, except for the initial three digits of a zip code if, according to the current publicly available data from the Bureau of the Census, the geographic unit formed by combining all zip codes with the same three initial digits contains more than 20,000 people, and the initial three digits of a zip code for all such geographic units containing 20,000 or fewer people is changed to 000
  • All elements of dates (except year) for dates directly related to an individual, including birth date, admission date, discharge date, date of death; and all ages over 89 and all elements of dates (including year) indicative of such age, except that such ages and elements may be aggregated into a single category of age 90 or older
  • Telephone numbers
  • Fax numbers
  • Electronic mail addresses
  • Social security numbers
  • Medical record numbers
  • Health plan beneficiary numbers
  • Account numbers
  • Certificate/license numbers
  • Vehicle identifiers and serial numbers, including license plate numbers
  • Device identifiers and serial numbers
  • Web Universal Resource Locators (URLs)
  • Internet Protocol (IP) address numbers
  • Biometric identifiers, including finger and voice prints
  • Full face photographic images and any comparable images
  • Any other unique identifying number, characteristic, or code
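Two of the rules in the list above, the three-digit zip code rule and the aggregation of ages over 89, are mechanical enough to sketch in code. The following is a minimal illustration only, not a complete de-identification procedure and not a tool provided by Libra Data. The function names are hypothetical, and the set of low-population prefixes is a placeholder: the real set would have to be derived from current Bureau of the Census data.

```python
def generalize_zip(zip_code, small_prefixes):
    """Reduce a zip code to its first three digits.

    `small_prefixes` is the set of three-digit prefixes whose combined
    area holds 20,000 or fewer people; those prefixes become "000".
    """
    prefix = zip_code.strip()[:3]
    if len(prefix) < 3 or not prefix.isdigit():
        raise ValueError(f"not a usable zip code: {zip_code!r}")
    return "000" if prefix in small_prefixes else prefix


def generalize_age(age):
    """Aggregate every age over 89 into a single category."""
    return "90 or older" if age > 89 else str(age)


# Placeholder value for illustration only; it is NOT taken from Census data.
HYPOTHETICAL_SMALL_PREFIXES = {"999"}

if __name__ == "__main__":
    print(generalize_zip("12345-6789", HYPOTHETICAL_SMALL_PREFIXES))
    print(generalize_age(93))
```

Dates, free-text fields, and the remaining identifiers in the list still need separate review; a script like this only reduces the chance of overlooking two of the more formulaic rules.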

For more information see:

Are there restrictions on file type and size?

Any file format will be accepted, and the limit on individual self-deposited files is 2GB. If you have larger files to upload, contact us. While all file formats are accepted, please keep in mind that some file formats are more likely to remain readable in the future, for example plain text files, or files in non-proprietary formats which are commonly used and for which conversion utilities are more likely to be available. See the Data Management Consulting Group website on File Formats and Data Types for more information.
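Before starting a deposit, it can help to check a folder for files over the size limit. The sketch below is one possible way to do that. It assumes that "2GB" means 2 * 1024**3 bytes, which the text above does not specify, and the function names are hypothetical.

```python
import os

# Assumption: "2GB" is read as 2 * 1024**3 bytes. Whether the limit is
# binary or decimal is not stated, so adjust this value if needed.
SIZE_LIMIT_BYTES = 2 * 1024**3


def oversized(sizes, limit=SIZE_LIMIT_BYTES):
    """Return the names whose size in bytes exceeds the limit, sorted."""
    return sorted(name for name, size in sizes.items() if size > limit)


def scan_folder(folder):
    """Map each file path under `folder` to its size in bytes."""
    sizes = {}
    for root, _dirs, files in os.walk(folder):
        for name in files:
            path = os.path.join(root, name)
            sizes[path] = os.path.getsize(path)
    return sizes


if __name__ == "__main__":
    # Check the current directory and report any file over the limit.
    for path in oversized(scan_folder(".")):
        print(f"Over the limit, contact the repository first: {path}")
```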

How will other users understand my data?

In order for other users to understand your data, it is advisable that you submit a ReadMe file along with your data. The ReadMe file should be a plain text or PDF file and include the following types of information, in as much detail as possible:

  • Abstract which includes a brief description of the study which generated the dataset, including methodology.
  • Information on the contents of specific files if more than one is included in the dataset.
  • Variable names which are unambiguous and consistent, especially if you are submitting multiple data files from the same study.
  • Row and column identifiers which are clearly identified and defined.
  • Explanation of codes and classification schemes used, especially an interpretation of any values that are not obvious.
  • Algorithms used to transform data.
  • File format and software (including version) used.
  • Terms of use which require end users to cite and acknowledge the data creator (you).
  • Citation reference if you have a particular one you wish others to use to acknowledge your work.
  • Citation and acknowledgement information if your data was generated by co-researchers, co-researchers at other labs, or co-researchers at other institutions.
  • Citation information if you re-purposed a data file from someone else. You should also include a link to their original data file.

A template for creating the ReadMe file is available.  Additionally, we encourage including a link to publications that relate to the data being deposited, which will provide users with context and analysis for the data beyond the technical explanation of how the data was collected and how it is organized.
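As an illustration of how the items listed above might be laid out in a plain text ReadMe, here is a short sketch that generates a skeleton file. It is not the template mentioned above: the section headings are paraphrased from the list, and the function name and placeholder text are hypothetical.

```python
# Section headings paraphrased from the list of recommended ReadMe contents.
SECTIONS = [
    "Abstract and methodology",
    "File contents",
    "Variable names",
    "Row and column identifiers",
    "Codes and classification schemes",
    "Algorithms used to transform data",
    "File format and software (with version)",
    "Terms of use",
    "Preferred citation",
    "Co-researcher acknowledgements",
    "Sources of re-purposed data (with links)",
]


def build_readme(title, answers):
    """Return ReadMe text with one underlined heading per section.

    `answers` maps a section heading to its text; sections without an
    entry get a visible placeholder so nothing is silently skipped.
    """
    lines = [title, "=" * len(title), ""]
    for section in SECTIONS:
        lines.append(section)
        lines.append("-" * len(section))
        lines.append(answers.get(section, "[to be completed]"))
        lines.append("")
    return "\n".join(lines)


if __name__ == "__main__":
    text = build_readme(
        "Example dataset",
        {"Terms of use": "Please cite and acknowledge the data creator."},
    )
    with open("ReadMe.txt", "w", encoding="utf-8") as handle:
        handle.write(text)
```

Writing the result as UTF-8 plain text keeps the ReadMe in one of the formats recommended above.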

Why do I need to agree to the deposit license, and what does it say?

While individual facts and observations are not protected by copyright and are free for all to copy and reuse, datasets include not only observed facts but also expressive content that could qualify for copyright protection. Accordingly, UVA needs the author(s)’ permission to store, display, and distribute these aspects of the datasets, and members of the public need permission to download and reuse them. And, because some aspects of a dataset can raise concerns about privacy, confidentiality, and other important legal interests, we need you to represent that your data is free from these potential hazards.

Specifically, the deposit license requires you to affirm that:

  • You have the legal right and authorization to make this data publicly available online worldwide through Libra.
  • You have removed any confidential or sensitive information, including all information that personally identifies any individual or that contains any information that should not be made public under state or federal law, or UVA policy.
  • You have fulfilled any right of review, confidentiality, or other obligations imposed under any sponsored research agreement.
  • You have made a reasonable effort to ensure that the data contained in your submission is accurate.
  • You have appropriately acknowledged other researchers whose work contributed to the data.

The Deposit License confirms that you wish to make the material available through Libra Data to the public, to allow certain educational, non-commercial, public uses of the material under a CC0 license (by default) or your own custom license, and to allow for preservation by the University. It is a non-exclusive agreement, meaning you retain the rights to use your data however you like. However, if you use a CC0 license, you are surrendering the relatively thin rights you may have had to limit others’ use of copyrightable aspects of the dataset.

See the full text of the Deposit License for more information.

Who can access Libra Data to search for and download items?  What can users of Libra Data do with the datasets I have submitted?

Libra Data is an open access repository, meaning that anyone can search, view and download content. Users are required to agree to the Terms of Use stated on the Libra Data site before accessing content. Those terms make clear that use of each dataset is governed by a license chosen by the depositor. The default license, CC0 (public domain dedication), permits unlimited reuse by anyone who accesses the repository. This is consistent with the purpose of open data repositories, namely, to encourage the widest possible reuse of data. You may choose to use a custom license to limit reuse. This will make your data less useful to researchers, but may be required by some funders or institutions.

May I delete or change a dataset that I have added to Libra Data?

Because the repository is meant for scholarly work that is as close as possible to its final or published form, items cannot be deleted once they are deposited in Libra.  Scholars should only deposit in Libra Data the version to which they intend to provide permanent open access.

If the Library determines or is made aware that a dataset has been deposited which contains non-public sensitive data or information, it retains the option of removing public access to the data at its discretion.  If you, the depositor, become aware of a rights or privacy problem with your data deposit, we ask that you let us know immediately and the Library will take prompt appropriate action.

Are items in Libra Data guaranteed to remain available in perpetuity?

The Library and ITS are committed to the durability and sustainability of scholarship deposited in Libra.  Libra Data uses standard data management practices, including security and backup procedures, to provide a reasonable assurance that files will remain retrievable over time.  In addition, UVA Library is a founder of and participant in the Academic Preservation Trust consortium, which aims to ensure preservation of digital library content including Libra Data files.  However, since permanent access cannot be guaranteed with any technology, we urge scholars to keep personal copies of their files.

The University reserves the right to remove content that is deposited out of compliance with the standard terms set forth in the Data Deposit License, or content that is deemed inappropriate for public viewing.

How can I find more information on using and depositing in Libra Data?