Title: | Datasets for Introduction to Statistical Data Analysis for the Life Sciences |
---|---|
Description: | Provides datasets for the book "Introduction to Statistical Data Analysis for the Life Sciences, Second edition" by Ekstrøm and Sørensen (2014). |
Authors: | Claus Thorn Ekstrøm <[email protected]> and Helle Sørensen <[email protected]> |
Maintainer: | Claus Ekstrom <[email protected]> |
License: | GPL-2 |
Version: | 3.0.1 |
Built: | 2024-12-26 04:57:19 UTC |
Source: | https://github.com/cran/isdals |
In order to relate the body fat percentage to age, researchers selected nine healthy adults and determined their body fat percentage.
data(agefat)
A data frame with 9 observations on the following 2 variables.
age
age of the subject
fatpct
body fat percentage
Ib Skovgaard (2004). Basal Biostatistik 2, Samfundslitteratur.
data(agefat)
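The stated aim, relating body fat percentage to age, could be explored with a simple linear regression. The following is a minimal sketch, assuming the isdals package is installed; the straight-line model is an illustrative choice and is not prescribed by the dataset documentation.

```r
library(isdals)
data(agefat)

# Scatter plot of body fat percentage against age
plot(fatpct ~ age, data = agefat,
     xlab = "Age", ylab = "Body fat percentage")

# Simple linear regression of fatpct on age (illustrative model choice)
fit <- lm(fatpct ~ age, data = agefat)
abline(fit)      # add the fitted line to the scatter plot
summary(fit)     # estimated intercept and slope with standard errors
```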
Number of AIDS cases and deaths for a 19-year period.
data(aids)
A data frame with 19 observations on the following 3 variables.
year
a numeric vector
cases
a numeric vector
deaths
a numeric vector
data(aids)
Data on food preference for 59 alligators. It is of interest to examine if different sized alligators have different food preferences.
data(alligator)
A data frame with 59 observations on the following 2 variables.
length
length of the alligator (in meters)
food
a factor with levels Fish, Invertebrates, and Other representing the food preference
Agresti, A. (2007). An Introduction to Categorical Data Analysis. Wiley
data(alligator)
library(VGAM)
# Multinomial logistic regression of food preference on alligator length
model <- vglm(food ~ length, family=multinomial, data=alligator)
summary(model)
The amount of organic material in heifer dung was measured after eight weeks of decomposition. The data come from 36 heifers from six treatment groups. The treatments are different types of antibiotics. Only 34 observations are available.
data(antibio)
A data frame with 34 observations on the following 2 variables.
type
a factor with the antibiotic treatments. Levels: Alfacyp, Control, Enroflox, Fenbenda, Ivermect, Spiramyc
org
a numeric vector with the amount of organic material
C. Sommer and B. M. Bibby (2002). The influence of veterinary medicines on the decomposition of dung organic matter in soil. European Journal of Soil Biology, 38, 115-159.
data(antibio)
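Because the six treatment groups include an untreated control, one natural way to look at the data is a one-way analysis of variance with Control as the reference level. This is a sketch under that assumption, with the isdals package installed; the model choice is illustrative and not taken from the source.

```r
library(isdals)
data(antibio)

# Use the untreated group as reference level, so that the estimated
# coefficients are differences from Control
antibio$type <- relevel(antibio$type, ref = "Control")

# One-way ANOVA model for organic material by antibiotic type
fit <- lm(org ~ type, data = antibio)
anova(fit)     # overall F-test for any difference between treatments
summary(fit)   # each antibiotic compared with Control
```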
When an antibiotic is injected into the bloodstream, a certain part of it will bind to serum protein. This binding reduces the medical effect. As part of a larger study, the binding rate was measured for 12 cows which were given one of three types of antibiotics: chloramphenicol, erythromycin, and tetracycline.
data(binding)
A data frame with 12 observations on the following 2 variables.
antibiotic
antibiotic type. Factor with levels Chlor, Eryth, Tetra
binding
binding rate
G. Ziv and F. G. Sulman (1972). Binding of antibiotics to bovine and ovine serum. Antimicrobial Agents and Chemotherapy, 2, 206-213.
data(binding)
Data from a study that was undertaken to investigate how the sex of the baby and the age of the fetus influence birth weight during the last weeks of the pregnancy.
data(birthweight)
A data frame with 361 observations on the following 3 variables.
sex
a factor with levels male, female
age
a numeric vector
weight
a numeric vector
Annette Dobson (2001). An Introduction to Generalized Linear Models (2nd ed.). Chapman and Hall.
data(birthweight) ## maybe str(birthweight) ; plot(birthweight) ...
It is expensive and cumbersome to determine the body fat in humans as it involves immersion of the person in water. This dataset provides information on body fat, triceps skinfold thickness, thigh circumference, and mid-arm circumference for twenty healthy females aged 20 to 34. It is desirable if a model could provide reliable predictions of the amount of body fat, since the measurements needed for the predictor variables are easy to obtain.
data(bodyfat)
A data frame with 20 observations on the following 4 variables.
Fat
body fat
Triceps
triceps skinfold measurement
Thigh
thigh circumference
Midarm
mid-arm circumference
J. Neter and M.H. Kutner and C.J. Nachtsheim and W. Wasserman (1996). Applied Linear Statistical Models. McGraw-Hill
data(bodyfat)
Average butterfat content (percentages) for random samples of 20 cows (10 two year olds and 10 mature (greater than four years old)) from each of five breeds.
data(butterfat)
A data frame with 100 observations on the following 3 variables.
Butterfat
a numeric vector
Breed
a factor with levels Ayrshire, Canadian, Guernsey, Holstein-Fresian, Jersey
Age
a factor with levels 2year, Mature
Hand et al. (1993). A Handbook of Small Data Sets. Chapman and Hall
data(butterfat)
Cabbage yield for different treatment methods and different fields.
data(cabbage)
A data frame with 16 observations on the following 3 variables.
method
a factor with levels A, C, K, N
yield
a numeric vector
field
a numeric vector
data(cabbage)
An experiment involved 21 cancer tumors. For each tumor the weight was registered as well as the emitted radioactivity obtained with a special medical technique (scintigraphic images). Three data points from large tumors were removed.
data(cancer2)
A data frame with 18 observations on the following 3 variables.
id
tumor id (numeric)
tumorwgt
tumor weight
radioact
emitted radioactivity (numeric)
Shin et al. (2005). Noninvasive imaging for monitoring of viable cancer cells using a dual-imaging reporter gene. The Journal of Nuclear Medicine, 45, 2109-2115.
data(cancer2)
As part of a larger cattle study, the effect of a particular type of feed on the concentration of a certain hormone was investigated. Nine cows were given the feed for a period, and the hormone concentration was measured initially and at the end of the period.
data(cattle)
A data frame with 9 observations on the following 3 variables.
cow
cow id
initial
initial hormone concentration (before treatment)
final
final hormone concentration (after treatment)
data(cattle)
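Since each cow is measured both before and after the feeding period, the two columns are paired. One possible analysis is a paired t-test on the within-cow differences, sketched here assuming the isdals package is installed; the choice of test is an illustration rather than the method fixed by the source.

```r
library(isdals)
data(cattle)

# Within-cow change in hormone concentration
change <- cattle$final - cattle$initial
summary(change)

# Paired t-test of final versus initial concentration
res <- t.test(cattle$final, cattle$initial, paired = TRUE)
res
```

The paired test is equivalent to a one-sample t-test on `change` against zero.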
Twenty chickens were fed with four different feed types - five chickens for each type - and the weight gain was registered for each chicken after a period.
data(chicken)
A data frame with 20 observations on the following 2 variables.
feed
id of feed type. Numeric, but it should be used as a factor
gain
Weight gain (numeric)
Anonymous (1949). Query 70. Biometrics, 250–251.
data(chicken)
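The feed variable is stored as a number but identifies a group, so it needs to be converted before modeling. Left numeric, `lm()` would fit a single slope across feed ids instead of one mean per feed type. A minimal sketch, assuming the isdals package is installed:

```r
library(isdals)
data(chicken)

# Convert the numeric feed id to a factor so it is treated as a grouping variable
chicken$feed <- factor(chicken$feed)

# Mean weight gain for each feed type
tapply(chicken$gain, chicken$feed, mean)

# One-way ANOVA comparing the four feed types (illustrative analysis)
fit <- lm(gain ~ feed, data = chicken)
anova(fit)
```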
An experiment with winter wheat was carried out in order to investigate if the concentration of nitrogen in the soil can be predicted from the concentration of chlorophyll in the plants. The chlorophyll concentration in the leaves as well as the nitrogen concentration in the soil were measured for 18 plants.
data(chloro)
A data frame with 18 observations on the following 2 variables.
chloro
chlorophyll concentration in leaves
nit
nitrogen concentration in soil
Experiment was carried out at the Royal Veterinary and Agricultural University in Denmark.
data(chloro)
Two different cooling methods for pork meat were compared in an experiment with 18 pigs from two different groups: low or high pH content. After slaughter, each pig was split in two and one side was exposed to rapid cooling while the other was put through a cooling tunnel. After the experiment, the tenderness of the meat was measured.
data(cooling)
A data frame with 18 observations on the following 4 variables.
pig
a numeric vector with the id of the pig
ph
pH concentration level. A factor with levels high, low
tunnel
Tenderness observed from tunnel cooling
rapid
Tenderness observed from rapid cooling
A. J. Moller and E. Kirkegaard and T. Vestergaard (1987). Tenderness of Pork Muscles as Influenced by Chilling Rate and Altered Carcass Suspension. Meat Science, 27, p. 275–286.
data(cooling)
# Histograms of tenderness after tunnel cooling, by pH group (frequency scale)
hist(cooling$tunnel[cooling$ph=="low"], main="", xlab="Tenderness (low pH)",
     col="lightgray", ylim=c(0,5), xlim=c(3,9))
hist(cooling$tunnel[cooling$ph=="high"], main="", xlab="Tenderness (high pH)",
     col="lightgray", ylim=c(0,5), xlim=c(3,9))
# The same histograms on the density scale
hist(cooling$tunnel[cooling$ph=="low"], freq=FALSE, main="", xlab="Tenderness (low pH)",
     col="lightgray", ylim=c(0,.5), xlim=c(3,9))
hist(cooling$tunnel[cooling$ph=="high"], freq=FALSE, main="", xlab="Tenderness (high pH)",
     col="lightgray", ylim=c(0,.5), xlim=c(3,9))
# Tunnel versus rapid cooling for the two sides of each pig
plot(cooling$tunnel, cooling$rapid, xlim=c(3,9), ylim=c(3,9),
     xlab="Tenderness (tunnel)", ylab="Tenderness (rapid)")
boxplot(cooling$tunnel, cooling$rapid, names=c("Tunnel", "Rapid"),
        ylab="Tenderness score")
Two varieties of corn were randomly assigned to the 8 plots in a completely randomized design so that each variety was planted on 4 plots. Four amounts of fertilizer (5, 10, 15, and 20 units) were randomly assigned to the 4 plots in which variety A was planted. Likewise, the same four amounts of fertilizer were randomly assigned to the 4 plots in which variety B was planted. Yield in bushels per acre was recorded for each plot at the end of the experiment.
data(cornyield)
A data frame with 8 observations on the following 3 variables.
yield
a numeric vector
variety
a factor with levels A, B
fertilizer
a numeric vector
data(cornyield)
The length and weight of 361 crabs. The crabs were measured on three different days and they were raised in three different vat types.
data(crabs)
A data frame with 361 observations on the following 5 variables.
day
id for day of measurement (a numeric vector)
date
date of measurement (a numeric vector)
kar
id of the vat type (a numeric vector)
lgth
length of the crab in cm
wgt
weight of the crab in grams
Only crabs from day 1 (190692) are used in the isdals book.
Experiment carried out at the Royal Veterinary and Agricultural University of Copenhagen.
data(crabs)
Cuckoos place their eggs in other birds' nests for hatching and rearing. Researchers investigated 154 cuckoo eggs and measured their size. The adoptive species is also registered (three types). It is believed that cuckoos choose the “adoptive parents” such that the cuckoo eggs are similar in size to the eggs of the adoptive species.
data(cuckoo)
A data frame with 154 observations on the following 2 variables.
spec
adoptive species. Factor with levels redstart, whitethroat, wren
width
width of egg (unit: half millimeters)
O.H. Latter (1905). The egg of Cuculus Canorus: An attempt to ascertain from the dimensions of the cuckoo's egg if the species is tending to break up into sub-species, each exhibiting a preference for some one foster-parent. Biometrika, 4, 363-373.
data(cuckoo)
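The width is recorded in units of half millimeters, so a value of w corresponds to w/2 mm. The sketch below converts the widths and compares them across adoptive species; it assumes the isdals package is installed, and the column name `width_mm` is introduced here for illustration only.

```r
library(isdals)
data(cuckoo)

# Width is recorded in half millimeters; divide by 2 to get millimeters
# (width_mm is a new, illustrative column name)
cuckoo$width_mm <- cuckoo$width / 2

# Distribution of egg width for each adoptive species
boxplot(width_mm ~ spec, data = cuckoo,
        xlab = "Adoptive species", ylab = "Egg width (mm)")

# Mean egg width per adoptive species
tapply(cuckoo$width_mm, cuckoo$spec, mean)
```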
Spread of a disease in cucumbers depends on climate and amount of fertilizer. The amount of infection on standardized plants was recorded after a number of days, and two plants were examined for each combination of climate and dose.
data(cucumber)
A data frame with 12 observations on the following 3 variables.
disease
a numeric vector
climate
a factor with levels A (change to day temperature 3 hours before sunrise) and B (normal change to day temperature)
dose
a numeric vector with dose of applied fertilizer
de Neergaard, E. et al (1993). Studies of Didymella bryoniae: the influence of nutrition and cultural practices on the occurrence of stem lesions and internal and external fruit rot on different cultivars of cucumber. Netherlands Journal of Plant Pathology. 99:335-343
data(cucumber)
Running times from a 5 x 5 km relay race in Copenhagen in 2006, held over four days. The sex distribution on the team classifies the teams into six groups. Total running time for a team (not each participant) is registered.
data(dhl)
A data frame with 24 observations on the following 6 variables.
day
race day. A factor with levels Monday, Thursday, Tuesday, Wednesday
men
number of men on the team (numeric)
women
number of women on the team (numeric)
hours
hours of running (should be combined with minutes and seconds)
minutes
minutes of running (should be combined with hours and seconds)
seconds
seconds of running (should be combined with hours and minutes)
The total running time for the team (not for each participant) is registered. On average, there are 800 teams per combination of race day and sex group. The dataset contains median running times.
http://www.sparta.dk
data(dhl)
## Total time in seconds (with() avoids attaching the data frame to the search path)
totaltime <- with(dhl, 60*60*hours + 60*minutes + seconds)
In an experiment with six horses the digestibility coefficient was measured twice for each horse: once after the horse had been fed straw treated with NaOH and once after the horse had been fed ordinary straw.
data(digestcoefs)
A data frame with 6 observations on the following 3 variables.
horse
horse id
ordinary
digestibility coefficient corresponding to ordinary straw
naoh
digestibility coefficient corresponding to NaOH treated straw
Ib Skovgaard (2004). Basal Biostatistik 2. Samfundslitteratur.
data(digestcoefs)
Over a period of 14 years from 1990 to 2003, environmental agencies monitored the average amount of dioxins found in the liver of crabs at two different monitoring stations located some distance apart from a closed paper pulp mill. The outcome is the average total equivalent dose (TEQ), which is a summary measure of different forms of dioxins with different toxicities found in the crabs.
data(dioxin)
A data frame with 28 observations on the following 3 variables.
site
a factor with levels a and b corresponding to the two monitoring stations
year
the year
TEQ
a numeric vector for the total equivalent dose
C. J. Schwarz (2013). Sampling, Regression, Experimental Design and Analysis for Environmental Scientists, Biologists, and Resource Managers. Course Notes.
data(dioxin) ## maybe str(dioxin) ; plot(dioxin) ...
Growth of duckweed (Lemna) by counting the number of leaves every day over a two-week period.
data(duckweed)
A data frame with 14 observations on the following 2 variables.
days
a numeric vector
leaves
a numeric vector
E. Ashby and T. A. Oxley (1935). The interactions of factors in the growth of Lemna. Annals of Botany. 49:309-336
data(duckweed)
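If the leaf count grows roughly exponentially, its logarithm is linear in time, and the slope of a log-linear regression estimates the growth rate. The following is a sketch under that exponential-growth assumption, which is not stated in the source, with the isdals package installed.

```r
library(isdals)
data(duckweed)

# Leaf counts on the original and on the log scale
plot(leaves ~ days, data = duckweed)
plot(log(leaves) ~ days, data = duckweed)

# Log-linear regression: log(leaves) = a + b * days
# (assumes exponential growth, which is an assumption of this sketch)
fit <- lm(log(leaves) ~ days, data = duckweed)
coef(fit)

# Implied doubling time in days under the exponential model
doubling_time <- log(2) / coef(fit)["days"]
doubling_time
```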
Investigation of the influence of water temperature on the frequency of emitted electrical signals.
data(eels)
A data frame with 21 observations on the following 2 variables.
temp
the water temperature measured in degrees Celsius
freq
the frequency of the emitted signal measured in Hz
Data were supplied from the course "Biostatistik, Geostatistik samt Sandsynlighedsteori og Statistik" held at Aarhus University in 2002.
data(eels)
As part of a so-called ELISA experiment, the optical density was measured for various dilutions of two different dissolutions with ubiquitin antibody. One dissolution was standard, whereas the other was serum from mice. For each dilution, the mixture proportion describes how many times the original ubiquitin dissolution has been thinned.
data(elisa)
A data frame with 16 observations on the following 3 variables.
type
type of dissolution. Factor with levels mouse, std
mix
a numeric vector describing how many times the original ubiquitin dissolution was thinned
od
optical density
The data was generated by Marianne Freisleben in her work for the master's thesis at the University of Copenhagen.
data(elisa)
In February 2010, 12 production farms were for sale in a municipality on Fuen island in Denmark. The dataset contains the soil area in thousands of square meters and the price in thousands of DKK.
data(farmprice)
A data frame with 12 observations on the following 2 variables.
area
area of soil in thousands of square meters
price
price in thousands of DKK
data(farmprice)
Dataset to examine if respiratory function in children was influenced by exposure to smoking at home.
data(fev)
A data frame with 654 observations on the following 5 variables.
Age
age in years
FEV
forced expiratory volume in liters
Ht
height measured in inches
Gender
gender (0=female, 1=male)
Smoke
exposure to smoking (0=no, 1=yes)
I. Tager and S. Weiss and B. Rosner and F. Speizer (1979). Effect of Parental Cigarette Smoking on the Pulmonary Function of Children. American Journal of Epidemiology. 110:15-26
data(fev)
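Gender and Smoke are coded 0/1, so it can help to turn them into labeled factors before plotting or modeling. The sketch below does this using the coding given in the variable descriptions and then fits one possible regression model; it assumes the isdals package is installed, and the choice of covariates is illustrative.

```r
library(isdals)
data(fev)

# Recode the 0/1 indicators as labeled factors (coding taken from the
# variable descriptions: Gender 0=female, 1=male; Smoke 0=no, 1=yes)
fev$Gender <- factor(fev$Gender, levels = c(0, 1), labels = c("female", "male"))
fev$Smoke  <- factor(fev$Smoke,  levels = c(0, 1), labels = c("no", "yes"))

# Crude comparison of FEV by smoking exposure
boxplot(FEV ~ Smoke, data = fev, ylab = "FEV (liters)")

# Linear model adjusting for age, height and gender (illustrative choice)
fit <- lm(FEV ~ Age + Ht + Gender + Smoke, data = fev)
summary(fit)
```

Adjusting for age and height matters here because older, taller children have larger lungs, which can confound the crude comparison.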
Two groups were compared in an experiment with six microarrays. Two conditions (the test group and the reference group) were examined on each array and the amount of protein synthesized by the gene (also called the gene expression) was registered.
data(geneexp)
A data frame with 6 observations on the following 3 variables.
array
array id
test
gene expression level for test group
reference
gene expression level for reference group
Fictitious data.
data(geneexp)
The length of the gestation period (the period from conception to birth) was registered for 13 horses.
data(gestation)
A data frame with 13 observations on the following variable.
gest
length of gestation period
Fictitious (but realistic) data.
data(gestation)
The sorption was measured for a variety of hazardous organic solvents. The solvents were classified into three types (esters, aromatics, and chloroalkanes), and the purpose was to examine differences between the three types.
data(hazard)
A data frame with 32 observations on the following 2 variables.
type
type of solvent. Factor with levels aromatic, chlor, estere
sorption
sorption measurements
J.D. Ortego, T.M Aminabhavi, S.F. Harlapur, R.H. Balundgi (1995). A review of polymeric geosynthetics used in hazardous waste facilities. Journal of Hazardous Materials, 42, 115-156.
data(hazard)
An experiment was carried out in order to investigate the migration of nematodes in Danish herrings. The fish were allocated to eight different treatment groups corresponding to different combinations of storage time and storage conditions until filleting. After filleting, it was determined whether nematodes were present in the fillet or not.
data(herring)
A data frame with 884 observations on the following 4 variables.
group
a numeric vector that is the combination of storage and time
time
a numeric vector that contains the duration of storage in hours before the fish is filleted
condi
a numeric vector representing the storage condition
fillet
a numeric vector to indicate the presence of nematodes (1) or absence of nematodes (0)
The variable group is the combination of storage condition and storage time. Notice that a storage time 0 is equivalent to storage condition 0 and that no fish were stored 132 hours under condition 4. Hence, there are only 8 combinations; i.e., 8 levels of the group variable.
A. Roepstorff and H. Karl and B. Bloemsma and H. H. Huss (1993). Catch handling and the possible migration of Anisakis larvae in herring, Clupea harengus. Journal of Food Protection. 56:783-787.
data(herring) ## maybe str(herring) ; plot(herring) ...
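The relation between group, storage condition, and storage time described above can be checked with a cross-tabulation, and the proportion of infected fillets can be computed per group. A minimal sketch, assuming the isdals package is installed:

```r
library(isdals)
data(herring)

# Number of fish for each combination of storage condition and storage time;
# empty cells correspond to combinations that were not used
with(herring, table(condi, time))

# Number of distinct treatment groups
n_groups <- length(unique(herring$group))
n_groups

# Proportion of fish with nematodes in the fillet, by treatment group
# (fillet is coded 1 = present, 0 = absent, so the mean is a proportion)
tapply(herring$fillet, herring$group, mean)
```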
As part of a larger cattle study, the effect of two types of feed on the concentration of a certain hormone was investigated. Twenty cows were given the feed for a period, and the hormone concentration was measured initially and at the end of the period.
data(hormone)
A data frame with 20 observations on the following 3 variables.
feed
a numeric vector
initial
a numeric vector
final
a numeric vector
data(hormone)
The data comes from an enzyme experiment with inhibitors. The enzyme acts on a substrate that was tested in six concentrations between 10 micro M and 600 micro M. Three concentrations of the inhibitor were tested, namely 0 (controls), 50 micro M and 100 micro M. There were two replicates for each combination yielding a total of 36 observations of reaction rate.
data(inhibitor)
A data frame with 36 observations on the following 3 variables.
Iconc
Inhibitor concentration in micro Mole (numeric vector)
Sconc
Substrate concentration in micro Mole (numeric vector)
RR
Reaction rate (numeric vector)
The experiment was carried out by students at a biochemistry course at University of Copenhagen.
data(inhibitor)
A study of the membrane potential for neurons from guinea pigs was carried out. The data consists of 312 measurements of interspike intervals; that is, the length of the time period between spontaneous firings from a neuron.
data(interspike)
A data frame with 312 observations on the following variable.
interval
length of the interspike intervals
Petr Lansky, Pavel Sanda and Jufang He (2006). The parameters of the stochastic leaky integrate-and-fire neuronal model. Journal of Computational Neuroscience, 21, 211-223.
data(interspike)
Dimensions in millimetres are given of two samples of jellyfish from Hawkesbury River in New South Wales, Australia.
data(jellyfish)
A data frame with 46 observations on the following 3 variables.
Location
a factor with levels Dangar, Salamander
Width
the width of the jellyfish in mm
Length
the length of the jellyfish in mm
Hand D.J., Daly F., Lunn A.D., McConway K.J., Ostrowski E. (1993) A Handbook of Small Data Sets. London: Chapman & Hall. Data set 335.
data(jellyfish)
A score measuring the symmetry of the gait for eight trotting horses. Each horse was tested twice, namely while it was clinically healthy and after mechanical induction of lameness in a fore limb.
data(lameness)
A data frame with 8 observations on the following 3 variables.
horse
a numeric vector with an id of the horse
lame
the symmetry score when the horse is lame
healthy
the symmetry score when the horse is healthy
A.T. Jensen, H. Sorensen, M.H. Thomsen and P.H. Andersen (2010). Quantification of symmetry for functional data with application to equine lameness classification. Submitted manuscript.
data(lameness)
Length of the gestation period (period from conception to birth) and the lifespan (duration of life) for seven horses.
data(lifespan)
A data frame with 7 observations on the following 2 variables.
lifespan
duration of life (years)
gestation
length of gestation period (days)
Probably fictitious data.
data(lifespan)
Ten wildtype mice and ten RIP2-deficient mice, i.e., mice without the RIP2 protein, were used in the experiment. Each mouse was infected with listeria, and after three days the bacteria growth was measured in the liver or spleen. Errors were detected for two liver measurements, so the total number of observations is 18.
data(listeria)
A data frame with 18 observations on the following 3 variables.
organ
a factor with levels liv and spl telling where the measurement was taken
type
a factor with levels rip2 and wild corresponding to the mouse type
growth
bacteria growth
Anand, P. K., Tait, S. W. G., Lamkanfi, M., Amer, A. O., Nunez, G., Pagès, G., Pouysségur, J., McGargill, M. A., Green, D. R., and Kanneganti, T.-D. (2011). TLR2 and RIP2 pathways mediate autophagy of listeria monocytogenes via extracellular signal-regulated kinase (ERK) activation. Journal of Biological Chemistry, 286:42981-42991.
data(listeria)
Compute the logit of a probability
logit(p)
p
a probability between 0 and 1
The logit of p, that is, the log odds log(p / (1 - p)).
Claus Ekstrom [email protected]
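For reference, the logit of a probability p is the log odds, log(p / (1 - p)). The sketch below defines stand-in functions to illustrate the transformation and its inverse. The names `logit_sketch` and `inv_logit_sketch` are hypothetical and chosen to avoid masking the package's own `logit`; this is not the package's source code.

```r
# Stand-in definition of the logit (log odds); illustrative, not the package source
logit_sketch <- function(p) log(p / (1 - p))

# Inverse transformation: maps a log odds value back to a probability
inv_logit_sketch <- function(x) 1 / (1 + exp(-x))

# A probability of 0.5 corresponds to even odds, i.e. a logit of 0
logit_sketch(0.5)

# Base R provides the same transformations as qlogis() and plogis()
all.equal(logit_sketch(0.8), qlogis(0.8))
all.equal(inv_logit_sketch(1.5), plogis(1.5))

# Round trip: applying the inverse recovers the original probabilities
p <- c(0.1, 0.25, 0.9)
all.equal(inv_logit_sketch(logit_sketch(p)), p)
```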
Ten plants were used in an experiment of fertility of lucerne. Two clusters of flowers were selected from each plant and pollinated. One cluster was bent down, whereas the other was exposed to wind and sun. At the end of the experiment, the average number of seeds per pod was counted for each cluster and the weight of 1000 seeds was registered for each cluster.
data(lucerne)
A data frame with 10 observations on the following 5 variables.
plant
plant id
seeds.exp
average number of seeds per pod from cluster exposed to sun and wind
wgt.exp
weight of 1000 seeds from cluster exposed to sun and wind
seeds.bent
average number of seeds per pod from cluster that was bent down
wgt.bent
weight of 1000 seeds from cluster that was bent down
H.L. Petersen (1954). Pollination and seed setting in lucerne. Kgl. Veterinaer og Landbohojskole, Aarsskrift 1954, 138-169.
data(lucerne)
Data to examine if cooling right after catching prevents nematodes (roundworms) from moving from the belly of mackerel to the fillet. A total of 150 mackerels were investigated and their length, number of nematodes in the belly, and time before counting the nematodes in the fillet were registered. The response variable is binary: presence or absence of nematodes in the fillet.
data(mackerel)
A data frame with 150 observations on the following 7 variables.
length
a numeric vector
visc
a numeric vector
left
a numeric vector
right
a numeric vector
filet
a numeric vector
portion
a numeric vector
time
a numeric vector
A. Roepstorff and H. Karl and B. Bloemsma and H. H. Huss (1993). Catch handling and the possible migration of Anisakis larvae in herring, Clupea harengus. Journal of Food Protection. 56:783-787.
data(mackerel) ## maybe str(mackerel) ; plot(mackerel) ...
A medical researcher took blood samples from 31 children who were infected with malaria and determined for each child the number of malaria parasites in 1 ml of blood.
data(malaria)
A data frame with 31 observations on the following variable.
parasites
the number of malaria parasites
M.L. Samuels and J.A. Witmer (2003). Statistics for the Life Sciences (3rd ed.). Pearson Education, Inc., New Jersey.
C. B. Williams (1964) Patterns in the Balance of Nature. Academic Press, London.
data(malaria)
Two common methods are GC-MS (gas chromatography-mass spectrometry) and HPLC (high performance liquid chromatography). The biggest difference between the two methods is that one uses gas while the other uses liquid. We wish to determine if the two methods measure the same amount of muconic acid in human urine.
data(massspec)
A data frame with 16 observations on the following 3 variables.
sample
a numeric vector
hplc
a numeric vector
gcms
a numeric vector
data(massspec)
In meat production, packs of minced meat are specified to contain 500 grams of minced meat. A sample of ten packs was drawn at random and the weights (in grams) of the contents were recorded.
data(mincedmeat)
A data frame with 10 observations on the following variable.
wgt
weight of minced meat in grams
Fictitious data.
data(mincedmeat)
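A natural question for this dataset is whether the mean pack weight is consistent with the specified 500 grams. One way to examine it is a one-sample t-test, sketched here assuming the isdals package is installed; the test choice is an illustration and presumes approximately normal weights.

```r
library(isdals)
data(mincedmeat)

# Sample mean and standard deviation of the ten pack weights
mean(mincedmeat$wgt)
sd(mincedmeat$wgt)

# One-sample t-test of the null hypothesis that the mean weight is 500 grams
res <- t.test(mincedmeat$wgt, mu = 500)
res
```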
In an experiment on the utilization of vitamin A, 20 rats were given vitamin A over a period of three days. Ten rats were fed vitamin A in corn oil and ten rats were fed vitamin A in castor oil (American oil). On the fourth day, the liver of each rat was examined and the vitamin A concentration in the liver was determined.
data(oilvit)
A data frame with 20 observations on the following 2 variables.
type
type of oil. A factor with levels am, corn
avit
vitamin A concentration in liver
C. I. Bliss (1967). Statistics in Biology. McGraw-Hill, New York.
data(oilvit)
A new enzyme, OOR, makes it possible for a certain bacteria species to develop on oxalate. In an experiment the enzyme activity (micromole per minute per mg) was measured and registered for 29 different pH-values.
data(OORdata)
A data frame with 29 observations on the following 2 variables.
ph
pH value (a numeric vector)
act
enzyme activity measured in micromole per minute per mg (a numeric vector)
Pierce, E., Becker, D. F., and Ragsdale, S. W. (2010). Identification and characterization of oxalate oxidoreductase, a novel thiamine pyrophosphate-dependent 2-oxoacid oxidoreductase that enables anaerobic growth on oxalate. Journal of Biological Chemistry, 285:40515-40524.
data(OORdata)
Tensile strength in pound-force per square inch of Kraft paper (used in brown paper bags) for various amounts of hardwood contents in the paper pulp.
data(paperstr)
A data frame with 19 observations on the following 2 variables.
hardwood
hardwood content
strength
tensile strength in pound-force per square inch
G. Joglekar and J. H. Schuenemeyer and V. LaRiccia (1989). Lack-of-Fit Testing When Replicates Are Not Available. The American Statistician. 43:135-143
data(paperstr)
In a plant physiological experiment the amount of water-soluble phosphorus (among others) was measured in the plants, as a percentage of dry matter. The phosphorus concentration was measured in nine weeks during the growth season, and the averages over the plants in the experiment were reported.
data(phosphor)
A data frame with 9 observations on the following 2 variables.
week
week number
phos
phosphor concentration (average over the plants)
Ib Skovgaard (2004). Basal Biostatistik 2, Samfundslitteratur.
data(phosphor)
A small dataset for evaluating the effects of increasing application rates of picloram for control of tall larkspur.
data(picloram)
A data frame with 313 observations on the following 3 variables.
replicate
a factor with levels 1, 2, 3 corresponding to the three replicates (locations) used
dose
the dose of picloram used in kg ae/ha
status
a numeric vector. 0 means the plant survived, 1 that it died
David L. Turner, Michael H. Ralphs and John O. Evans (1992): Logistic Analysis for Monitoring and Assessing Herbicide Efficacy. Weed Technology
data(picloram)
An experiment on the effect of different stimuli was carried out with 60 pillbugs. The bugs were split into three groups: 20 bugs were exposed to strong light, 20 bugs were exposed to moisture, and 20 bugs were used as controls. For each bug it was registered how many seconds it used to move six inches.
data(pillbug)
A data frame with 60 observations on the following 2 variables.
time
number of seconds it took the pillbug to move six inches
group
treatment. A factor with levels Control, Light, Moisture
Samuels and Witmer (2003). Statistics for the Life Sciences (3rd ed.). Pearson Education, Inc., New Jersey.
data(pillbug)
The data consist of height and diameter (in breast height) measurements from 18 pine trees.
data(pine)
A data frame with 18 observations on the following 2 variables.
diam
diameter of the pine tree
height
height of the pine tree
J.N.R. Jeffers (1959). Experimental Design and Analysis in Forest Research. Almqvist and Wiksell, Stockholm.
data(pine)
The data concerns three insecticides (rotenone, deguelin, and a mixture of those). A total of 818 insects were exposed to different doses of one of the three insecticides. After exposure, it was recorded if the insect died or not.
data(poison)
A data frame with 818 observations on the following 3 variables.
status
status of insect: dead=1, alive=0 (numeric vector)
poison
type of insecticide. A factor with levels D
(deguelin) M
(mixture) R
(rotenone)
logdose
natural logarithm of dose of insecticide
D.J. Finney (1952). Probit analysis. Cambridge University Press, England.
data(poison)
Investigation of meat quality of pork through color stability of pork chops. The color was measured from a pork chop from each of ten pigs at days 1, 4, and 6 after storage.
data(pork)
A data frame with 30 observations on the following 3 variables.
brightness
a numeric vector
day
a numeric vector
pig
a numeric vector
data(pork)
In an experiment with the enzyme puromycin, the rate of the reaction, V, was measured twice for each of six concentrations C of the substrate.
data(puromycin)
A data frame with 12 observations on the following 2 variables.
conc
concentration of the substrate (numeric vector)
rate
rate of reaction (numeric vector)
Unknown
data(puromycin)
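Reaction rate as a function of substrate concentration is commonly described by the Michaelis-Menten model, V = Vm * C / (K + C). The help page does not prescribe a model, so the following is only a sketch under that assumption, using the self-starting model SSmicmen from the stats package. When isdals is not installed, the block uses the treated rows of the built-in Puromycin data from the datasets package as a stand-in; that these match the isdals data is an assumption, not something stated here.

```r
# Use the packaged data when isdals is installed; otherwise fall back on
# the treated rows of datasets::Puromycin as a stand-in (assumption).
if (requireNamespace("isdals", quietly = TRUE)) {
  data(puromycin, package = "isdals")
} else {
  puromycin <- subset(datasets::Puromycin, state == "treated",
                      select = c(conc, rate))
}

# Nonlinear least squares fit of rate = Vm * conc / (K + conc);
# SSmicmen computes its own starting values.
fit <- nls(rate ~ SSmicmen(conc, Vm, K), data = puromycin)
coef(fit)

# Observed rates with the fitted curve
plot(rate ~ conc, data = puromycin)
grid_conc <- seq(0, max(puromycin$conc), length.out = 100)
lines(grid_conc, predict(fit, newdata = data.frame(conc = grid_conc)))
```

Here Vm is the estimated maximal rate and K the concentration at which half of that rate is reached.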
An experiment was undertaken to investigate the amount of drug present in the liver of a rat. Nineteen rats were randomly selected, weighed, placed under a light anesthetic, and given an oral dose of the drug. It was believed that large livers would absorb more of a given dose than a small liver, so the actual dose given was approximately determined as 40 mg of the drug per kilogram of body weight. After a fixed length of time, each rat was sacrificed, the liver weighed, and the percent dose in the liver was determined.
data(ratliver)
A data frame with 19 observations on the following 4 variables.
BodyWt
body weight of each rat in grams
LiverWt
weight of liver in grams
Dose
relative dose of the drug given to each rat as a fraction of the largest dose
DoseInLiver
proportion of the dose in the liver
S. Weisberg (1985). Applied Linear Regression (2nd ed.). John Wiley and Sons
data(ratliver)
The data contain the weight gain for rats fed four different diets: combinations of protein source (beef or cereal) and protein amount (low or high).
data(ratweight)
A data frame with 40 observations on the following 3 variables.
Gain
a numeric vector
Protein
a factor with levels Beef
Cereal
Amount
a factor with levels High
Low
Hand et al. (1993). A Handbook of Small Data Sets. Chapman and Hall
data(ratweight)
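Since the four diets are the combinations of two factors, a two-way analysis of variance with interaction is a natural model for Gain. The block below is a hedged sketch of that idea, not the book's analysis; the simulated fallback (hypothetical values) is used only when isdals is not installed.

```r
# Use the packaged data when isdals is installed; otherwise simulate a
# stand-in with the same columns (hypothetical values, illustration only).
if (requireNamespace("isdals", quietly = TRUE)) {
  data(ratweight, package = "isdals")
} else {
  set.seed(1)
  ratweight <- data.frame(
    Gain = rnorm(40, mean = 90, sd = 15),
    Protein = factor(rep(c("Beef", "Cereal"), each = 20)),
    Amount = factor(rep(rep(c("High", "Low"), each = 10), times = 2))
  )
}

# Two-way ANOVA with interaction between protein source and amount
fit <- lm(Gain ~ Protein * Amount, data = ratweight)
anova(fit)

# Mean weight gain in each of the four diet groups
with(ratweight, tapply(Gain, list(Protein, Amount), mean))
```

If the interaction term is not needed, the additive model Gain ~ Protein + Amount gives a simpler description of the two effects.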
Plots a standardized residual plot from an lm object and provides additional graphics to help evaluate the variance homogeneity and mean.
residualplot(object, bandwidth = 0.3, ...)
object |
an lm object |
bandwidth |
The width of the window used to calculate the local smoothed version of the mean and the variance. The value should be between 0 and 1 and determines the fraction of the window width used |
... |
Arguments passed to plot. |
The brown area is a smoothed estimate of 1.96*SD of the standardized residuals in a window around the predicted value. The brown area should largely be rectangular if the standardized residuals have more or less the same variance.
The dashed line shows the smoothed mean of the standardized residuals and should generally follow the horizontal line through (0,0).
Produces a standardized residual plot
Claus Ekstrøm <[email protected]>
# Linear regression example
x <- rnorm(100)
y <- rnorm(100, mean=.5*x)
model <- lm(y ~ x)
residualplot(model)
Weight gain of cattle fed rice straw, recorded to see whether rice straw can replace wheat straw as feed for slaughter cattle in Tanzania.
data(ricestraw)
A data frame with 35 observations on the following 2 variables.
time
number of days that the calf has been fed rice straw
weight
weight gain (in kg) since the calf was first fed rice straw
Ph.D. project at the Faculty of LIFE Sciences, University of Copenhagen
data(ricestraw)
plot(ricestraw$time, ricestraw$weight)
lm(weight ~ time, data=ricestraw)
In order to study emission of greenhouse gases in forests, 14 paired values of water content in the soil and emission of N2O were collected.
data(riis)
A data frame with 14 observations on the following 2 variables.
water
content of water in soil, measured as a volume percentage (numeric vector)
N2O
emission of N2O, measured as micrograms per square metre per hour (numeric vector)
Jesper Riis Christiansen, Department of Geosciences and Natural Resource Management, University of Copenhagen.
data(riis)
Twenty-four perennial ryegrass plants were treated with different concentrations of ferulic acid, and the length of the root was measured after a period of time.
data(ryegrass)
A data frame with 24 observations on the following 2 variables.
conc
concentration of ferulic acid in mM (numeric vector)
rootl
length of root in cm (numeric vector)
Inderjit, Streibig, J. C., and Olofsdotter, M. (2002). Joint action of phenolic acid mixtures and its significance in allelopathy research. Physiologia Plantarum, 114:422-428.
data(ryegrass)
An experiment with two different salmon stocks, from River Conon in Scotland and from River Atran in Sweden, was carried out. Thirteen fish from each stock were infected, and after four weeks the number of a certain type of parasite was counted for each of the 26 fish.
data(salmon)
A data frame with 26 observations on the following 2 variables.
stock
origin of the fish. A factor with levels atran
conon
parasites
a numeric vector with the parasite counts
Heinecke, R. D, Martinussen, T. and Buchmann, K. (2007). Microhabitat selection of Gyrodactylus salaris Malmberg on different salmonids. Journal of Fish Diseases, 30, 733-743.
data(salmon)
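The two stocks are independent samples, so a two-sample comparison of the parasite counts is the obvious starting point. The following sketch uses Welch's t-test (the default of t.test, which does not assume equal variances); this choice is an assumption for illustration, and the simulated fallback data (hypothetical values) are used only when isdals is not installed.

```r
# Use the packaged data when isdals is installed; otherwise simulate a
# stand-in with the same columns (hypothetical values, illustration only).
if (requireNamespace("isdals", quietly = TRUE)) {
  data(salmon, package = "isdals")
} else {
  set.seed(1)
  salmon <- data.frame(
    stock = factor(rep(c("atran", "conon"), each = 13)),
    parasites = c(rpois(13, lambda = 30), rpois(13, lambda = 20))
  )
}

# Compare the two stocks graphically, then with Welch's two-sample t-test
boxplot(parasites ~ stock, data = salmon)
res <- t.test(parasites ~ stock, data = salmon)
res
```

Setting var.equal = TRUE in t.test gives the classical pooled-variance test instead, which is only appropriate if the two stocks have similar spread.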
The average sarcomere length in the meat and the corresponding tenderness as scored by a panel of sensory judges was examined. A high score corresponds to tender meat.
data(sarcomere)
A data frame with 24 observations on the following 3 variables.
pig
factor with levels 1–24. Pig id
sarc.length
numeric Sarcomere length
tenderness
numeric Meat tenderness score
A. J. Moller and E. Kirkegaard and T. Vestergaard (1987). Tenderness of Pork Muscles as Influenced by Chilling Rate and Altered Carcass Suspension. Meat Science, 27, p. 275–286.
data(sarcomere)
cor(sarcomere$sarc.length, sarcomere$tenderness)
The number of seals in a population was counted each year during a period of 11 years, from 1952 to 1962.
data(seal)
A data frame with 11 observations on the following 2 variables.
year
year of seal count
size
number of seals in population
J. Verzani (2005). Using R for Introductory Statistics. Chapman & Hall/CRC, London
data(seal)
The electric conductance was measured for 32 pieces of soap in 4 groups (8 pieces in each group). The content of fatty acid differs between the groups. Quality of soap is mainly determined by its content of fatty acid, which can be determined with a chemical analysis. It is much easier to measure the electric conductance, and it is therefore of interest if there is a simple relation between the two.
data(soap)
A data frame with 32 observations on the following 3 variables.
group
the groups of soap (notice: numeric vector, not factor)
fattyacid
content of fatty acid in percent (numeric vector)
conduct
electric conductance in millisiemens (numeric vector)
Unknown
data(soap)
An experiment was carried out with 26 soybean plants. The plants were pairwise genetically identical, so there were 13 pairs in total. For each pair, one of the plants was 'stressed' by being shaken daily, whereas the other plant was not shaken. After a period the plants were harvested and the total leaf area was measured for each plant.
data(soybean)
A data frame with 13 observations on the following 3 variables.
pair
id of the pair of plants
stress
total leaf area of stressed plant
nostress
total leaf area of control plant
data(soybean)
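Because each row holds the two plants of a genetically identical pair, the design calls for a paired comparison rather than a two-sample one. The sketch below runs a paired t-test on the within-pair differences; it is an illustration, and the simulated fallback data (hypothetical values) are used only when isdals is not installed.

```r
# Use the packaged data when isdals is installed; otherwise simulate a
# stand-in with the same columns (hypothetical values, illustration only).
if (requireNamespace("isdals", quietly = TRUE)) {
  data(soybean, package = "isdals")
} else {
  set.seed(1)
  nostress <- rnorm(13, mean = 320, sd = 30)
  soybean <- data.frame(
    pair = 1:13,
    stress = nostress - rnorm(13, mean = 20, sd = 15),
    nostress = nostress
  )
}

# Paired t-test: is the mean within-pair difference in leaf area zero?
res <- t.test(soybean$stress, soybean$nostress, paired = TRUE)
res

# Equivalent one-sample test on the differences
t.test(soybean$stress - soybean$nostress)
```

The two calls give the same test statistic; the second form makes explicit that only the 13 differences enter the analysis.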
The average digestibility percentage was measured for nine different levels of stearic acid proportion.
data(stearicacid)
A data frame with 9 observations on the following 2 variables.
stearic.acid
Percentage of stearic acid
digest
Average digestibility percentage
Jorgensen, G. and Hansen, N.G. (1973). Fedtsyresammensaetningens indflydelse paa fedstoffers fordojelighed. Landokonomisk Forsogslaboratorium.
data(stearicacid)
lm(digest ~ stearic.acid, data=stearicacid)
Fifteen subjects participated in an experiment related to overweight and were given a standardized meal. One aim was to find relationships between the time from a meal until the stomach is empty again and the concentration of a certain hormone.
data(stomach)
A data frame with 15 observations on the following 2 variables.
conc
hormone concentration
empty
time from meal until the stomach is empty
Ib Skovgaard (2004). Basal Biostatistik 2. Samfundslitteratur.
data(stomach)
A dog experiment was carried out in order to examine the effect of two treatments on the development of tartar. Apart from the two treatment groups there was also a control group. Twenty-six dogs were used and allocated to one of the three groups. After four weeks each dog was examined, and the development of tartar was summarized by an index.
data(tartar)
A data frame with 26 observations on the following 2 variables.
treat
treatment. A factor with levels Control
HMP
P2O7
index
a numeric vector with the tartar index
data(tartar)
68 lettuce plants were treated with the herbicide tetraneurin-A in different concentrations. After 5 days each plant was harvested and the root length in cm was registered.
data("tetra")
A data frame with 68 observations on the following 2 variables.
konz
concentration of herbicide (numeric vector)
root
root length in cm (numeric vector)
Belz, R., Cedergreen, N., and Sørensen, H. (2008). Hormesis in mixtures - Can it be predicted. Science of the Total Environment, 404:77-87.
data(tetra)
A brass thumbtack was thrown 100 times and it was registered whether the pin was pointing up or down towards the table upon landing.
data(thumbtack)
The format is: int [1:100] 1 1 0 0 1 1 0 1 0 0 ...
1 corresponds to "tip pointing down" and 0 corresponds to "tip pointing up"
Mats Rudemo (1979). Statistik og sandsynlighedslaere med biologiske anvendelser. Del 1: Grundbegreber.
data(thumbtack)
mean(thumbtack)
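The mean of the 0/1 vector is the observed proportion of throws with the tip pointing down. To attach uncertainty to that proportion, an exact binomial test and confidence interval can be computed with binom.test. This is a minimal sketch; the simulated fallback vector (hypothetical values) is used only when isdals is not installed.

```r
# Use the packaged data when isdals is installed; otherwise simulate a
# stand-in 0/1 vector of length 100 (hypothetical values, illustration only).
if (requireNamespace("isdals", quietly = TRUE)) {
  data(thumbtack, package = "isdals")
} else {
  set.seed(1)
  thumbtack <- rbinom(100, size = 1, prob = 0.6)
}

# Exact binomial test of H0: P(tip pointing down) = 0.5,
# together with a 95% confidence interval for the probability
res <- binom.test(sum(thumbtack), length(thumbtack), p = 0.5)
res$estimate
res$conf.int
```

The estimate returned by binom.test equals mean(thumbtack); the confidence interval shows how precisely 100 throws determine the probability.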
Data to examine the effect of turtle carapace length on the clutch size of turtles.
data(turtles)
A data frame with 18 observations on the following 2 variables.
length
a numeric vector
clutch
a numeric vector
K. G. Ashton and R. L. Burke and J. N. Layne (2007). Geographic variation in body and clutch size of gopher tortoises. Copeia. 49:355-363.
data(turtles)
## maybe str(turtles) ; plot(turtles) ...
The impact of food intake and exercise as possible explanatory variables for the urinary tract disease in cats.
data(urinary)
A data frame with 74 observations on the following 3 variables.
disease
a factor with levels no
yes
food
a factor with levels excessive
normal
exercise
a factor with levels little
much
Willeberg P (1976). Interaction effects of epidemiologic factors in the feline urological syndrome. Nordisk Veterinaer Medicin, 28, 193-200
data(urinary)
head(urinary)
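With a binary response (disease) and two binary explanatory factors, a logistic regression is one way to quantify how food intake and exercise relate to the disease. The block below is a hedged sketch of such an analysis, not the one from the book or from Willeberg (1976); the simulated fallback data (hypothetical values) are used only when isdals is not installed.

```r
# Use the packaged data when isdals is installed; otherwise simulate a
# stand-in with the same columns (hypothetical values, illustration only).
if (requireNamespace("isdals", quietly = TRUE)) {
  data(urinary, package = "isdals")
} else {
  set.seed(1)
  urinary <- data.frame(
    food = factor(sample(c("excessive", "normal"), 74, replace = TRUE)),
    exercise = factor(sample(c("little", "much"), 74, replace = TRUE))
  )
  p <- plogis(-1 + 1.2 * (urinary$food == "excessive") +
                1.0 * (urinary$exercise == "little"))
  urinary$disease <- factor(ifelse(rbinom(74, 1, p) == 1, "yes", "no"),
                            levels = c("no", "yes"))
}

# Cross-tabulation of disease status by food intake and exercise
table(urinary$food, urinary$exercise, urinary$disease)

# Logistic regression; with a factor response, glm models the
# probability of the second level ("yes")
fit <- glm(disease ~ food + exercise, family = binomial, data = urinary)
exp(coef(fit))  # odds ratios relative to the reference levels
```

An interaction term (food * exercise) could be added to examine whether the effect of food intake depends on the level of exercise.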
The daily food intake was studied for 2224 subjects, and the content of many different vitamins and substances was measured.
data(vitamina)
A data frame with 2224 observations on the following 20 variables.
person
subject id (a numeric vector)
wt
weight (kg)
ht
height (cm)
sex
sex: 1 for male, 2 for female
age
age
bmr
basal metabolic rate
E_bmr
energy divided by bmr
energi
energy content (kJ)
Avit
vitamin A (RE)
retinol
retinol (microgram)
betacar
beta-carotene (microgram)
Dvit
vitamin D (microgram)
Evit
vitamin E (alphaTE)
B1vit
vitamin B1 (milligram)
B2vit
vitamin B2 (milligram)
niacin
niacin (NE)
B6vit
vitamin B6 (milligram)
folacin
folacin (microgram)
B12vit
vitamin B12 (microgram)
Cvit
vitamin C (milligram)
Only variables Avit and bmr are used in the "Introduction to Statistical Data Analysis for the Life Sciences" book.
J. Haraldsdottir, J.H. Jensen, A. Moller (1985). Danskernes kostvaner 1985, Hovedresultater. Levnedsmiddelstyrelsen, publikation nr. 138.
data(vitamina)