| 2018 | |
|---|---|
The hardest thing to understand about other people's data is often selection criteria that are hard to express. | 1.61% |
In my field, incomplete capture and sample bias are inherent. Researchers must use the proper design and estimator to minimize these biases. They must report capture/detection probabilities and how these were estimated along with the other metadata. | 1.61% |
ecology data is highly variable and small differences in collection have huge impacts on data meaning. it's a difficult thing to standardize! | 1.61% |
License assignments were easy, clear, and respected | 1.61% |
all data collection and processing by others was done with open source code that was distributed with the data. | 1.61% |
Associated cleaning/organization scripts were available | 1.61% |
Clear information on how the data should be cited is available! | 1.61% |
Conditions/licenses for reuse were explicit and there was documentation that data were collected in accordance with ethical standards. | 1.61% |
If data was integrated/federated across different domains where appropriate | 1.61% |
Most of the data I have to manage are really very dirty and a mess. No metadata standard can do anything against that, and I wish I had a data dictionary describing at least the units of measurement and types of the variables that are created in datasets. Most of the metadata standards do not even consider that issue. | 1.61% |
if I can visualize the data and its documentation | 1.61% |
I knew where it came from and for what purpose it was collected. | 1.61% |
there are sufficient identifier methods | 1.61% |
If there are questions about the data (for example, that did not seem important at the time), I can ask the originating lab directly about their methods. | 1.61% |
I did not reuse data, so I would rather have answered "N/A" when available | 1.61% |
I use data as they are provided on the microscopic slides | 1.61% |
For some fields (i.e. experimental chemistry) the "data sets" are rather small, and none of the questions are really applicable. To elaborate: it takes *a lot* of effort to obtain what amounts to a single data point, and that single result is extremely meaningful. | 1.61% |
I am not sure | 1.61% |
The paper the data were published in also published their data processing tools (i.e. R or python scripts). | 1.61% |
a full record of how others use the data. | 1.61% |
Publication of methods and results is far more important than unrated arbitrary metadata, however publication is always a problem | 1.61% |
data need to be published first | 1.61% |
once data is published, it should be available to anyone with appropriate citation. It is the journal's responsibility to ensure all aspects needed for quality control are provided in the manuscript. | 1.61% |
I'm not convinced that this survey and I have the same idea of what constitutes 'data'. We may be talking about different types of information. Several options here aren't relevant to my work. | 1.61% |
there was assurance that data are reproducible | 1.61% |
Data with link to primary publication | 1.61% |
Funding should partially reflect the amount of data a researcher 'creates' since the creation of data often takes more resources than 'manipulating' the data | 1.61% |
if it's available on the web, in a format usable by my code, I'll use it | 1.61% |
Standards were followed as set by the disciplinary community (in our case, UNAVCO NSF Facility, or IGS) | 1.61% |
I know the other researchers personally and am familiar with their methods. | 1.61% |
Quantitative uncertainty analysis was provided, as uncertainty intervals within given probability level (eg 95%) | 1.61% |
I would mostly reuse data already published and peer reviewed, except perhaps if I'm familiar with the material | 1.61% |
In my opinion, only data that have been peer reviewed in high quality journal should be re-used. Anything else could just be trash and slow down your research or lead you to wrong conclusions. | 1.61% |
Stop hiding data in figures where data points cannot, sometimes even in principle, be converted to a number. This applies to everyone. | 1.61% |
This is a vague questionnaire. I am only discussing data that has been published in peer-reviewed literature. That's not clear in the questions. | 1.61% |
... the data source was a trusted, accomplished, and published research group | 1.61% |
What is metadata?? | 1.61% |
these questions differ a lot for each dataset. Can't answer in general. | 1.61% |
Too much information on data quality, like chain of custody, is too burdensome to review. | 1.61% |
Cruise reports - description of the oceanographic cruise that collected the data, station lists, purpose, etc. | 1.61% |
Contact is established between the user and original collector of the data | 1.61% |
Sufficient record of how they were obtained is essential, composites and data processing are often not documented | 1.61% |
raw data | 1.61% |
The data are readily available in industry-standard formats | 1.61% |
The data were accompanied by a statement regarding source/funding/conflict-of-interest, and a statement regarding acceptable terms of reuse (e.g. commercial vs. noncommercial) | 1.61% |
I re-use other people's data constantly and rate the quality of information provided | 1.61% |
There are too many data-sets with too many formats and a gigantic number of tools which is sometimes extremely difficult to integrate and use/re-use. | 1.61% |
...if computational codes were available for derived data. | 1.61% |
when the full source of the generation process was provided with the data or the processing was explicitly stated (as is done by the cdo utilities). | 1.61% |
To the best of my knowledge there is no established metadata standard and workflow in my discipline; this is a general problem of this survey that it implicitly assumes that such things are already established or can be established in a meaningful/useful form. I have severe doubts regarding the latter. | 1.61% |
The data archive was actively curated and someone was keeping an eye on continually vetting the data quality. | 1.61% |
Code demonstrating how data was analyzed in original citations was provided with the data | 1.61% |
Every data set should provide details about the uncertainties in the data set. | 1.61% |
The collection method was standardized and there is a way to verify collection methods or results. | 1.61% |
Data come with detailed manipulation records | 1.61% |
Easier to access and download. | 1.61% |
No experience using data from others | 1.61% |
Researchers were more forthcoming about mistakes | 1.61% |
I think this is highly case specific. In atmospheric biogeosciences data collection is very much location based and developing standards on how data is exactly collected (instrument maintenance etc) is incredibly complex. | 1.61% |
Proper coordinate system standards were used, and the data were geolocated (e.g. geodetically controlled) at known levels of accuracy. | 1.61% |
I was aware of others' experience in reusing the particular dataset. | 1.61% |
Had a proper license (e.g. a legal document saying that the data can be used, who has the copyright, etc.). Assuming it is public domain is not okay. It has to be written. | 1.61% |
Enough metadata was provided to fully understand the data | 0% |
Sample size | 2,184 |
Sample size calculated by VoxDash | 62 |
Probability sampling | |
Margin of error | ±3% |
Comparable margin of error | |
* Calculated by VoxDash | |
Data provider | |
Respondent | |
Age | 18 |
Sampling frame | Non-probability sampling |
Data types | United States of America |
Coverages | Other |