Author

Danet and Becks, based on originals by Delmas and Griffiths

Published

November 19, 2024

Working with vectors, arrays and matrices is important. But quite often, we want to collect high-dimensional data (multiple variables) from our simulations and store it in a spreadsheet-type format.

As you’ve seen in Tutorial 1, the StatsPlots package provides a plotting macro (@df) that lets us work with data frame objects from the DataFrames package. A second benefit of the data frame object is that we can export it as a csv file and import it into R, where we may prefer to do our plotting and statistics.

To this end, here we will also introduce the CSV package, which is very handy for exporting DataFrame objects to csv files, and importing them as well, if you’d like.

The Data Frame

To initialise a dataframe you use the DataFrame function from the DataFrames package:

using DataFrames

dat = DataFrame(col1=[], col2=[], col3=[]) # we use [] to specify an empty column of any type and size.
0×3 DataFrame
 Row │ col1  col2  col3 
     │ Any   Any   Any  
─────┴──────────────────

Alternatively, you can specify the data type for each column.

dat1 = DataFrame(col1=Float64[], col2=Int64[], col3=Float64[])
0×3 DataFrame
 Row │ col1     col2   col3    
     │ Float64  Int64  Float64 
─────┴─────────────────────────
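
A typed column also constrains what can be stored in it. The following is a minimal sketch, assuming the usual DataFrames.jl behaviour in which push! converts each value to the column's element type and refuses values that cannot be converted. The numbers are placeholders chosen only for illustration.

```julia
using DataFrames

dat1 = DataFrame(col1=Float64[], col2=Int64[], col3=Float64[])

# The integer 3 is converted to the Float64 value 3.0 to match col3
push!(dat1, (1.5, 2, 3))

# 2.5 cannot be stored exactly as an Int64, so this push! throws an error
# and the data frame keeps its single valid row
try
    push!(dat1, (1.5, 2.5, 3.0))
catch err
    println("push! failed with: ", typeof(err))
end

println(eltype(dat1.col3))  # element type of the third column
println(nrow(dat1))         # number of rows after the failed push!
```

Declaring types up front is a design choice: it catches mistakes early, whereas the untyped [] columns accept anything.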

Of course, you are not limited to labels like col1. Informative variable names are important, and the naming conventions we use in R also apply here in Julia: use a_b or AaBa, but not a b (no spaces allowed) or a.b (because in Julia the dot . functions as an operator).

# provide informative column titles using:
dat2 = DataFrame(species=[], size=[], rate=[])
0×3 DataFrame
 Row │ species  size  rate 
     │ Any      Any   Any  
─────┴─────────────────────

Allocating or adding data to a data frame

To add data to a dataframe, we use the push! (read as push bang) command.

species = "D.magna"
size = 2.2
rate = 4.2
4.2

# push!() arguments: data frame, data
push!(dat2, [species, size, rate])
1×3 DataFrame
 Row │ species  size  rate 
     │ Any      Any   Any  
─────┼─────────────────────
   1 │ D.magna  2.2   4.2

The push!() function appends data to the existing data frame, one row at a time. Because loops are fast in Julia (compared to R), adding many rows this way is easy, and we’ll learn how to do this in the next tutorial. The ! (bang) at the end of a function name is a Julia convention signalling that the function modifies its argument in place, which is why you can append (or remove, with pop!()) items without having to reassign the object.

species2 = "D.pulex"
size2 = 1.8
rate2 = 3.1

# push!() arguments: data frame, data
push!(dat2, [species2, size2, rate2])
2×3 DataFrame
 Row │ species  size  rate 
     │ Any      Any   Any  
─────┼─────────────────────
   1 │ D.magna  2.2   4.2
   2 │ D.pulex  1.8   3.1
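
The one-row-at-a-time pattern above extends naturally to a loop. This is only a sketch of the idea, not the approach of the next tutorial. It reuses the two Daphnia rows from above, and the names daphnia, species_names, body_sizes and growth_rates are made up for this illustration. The column is called body_size rather than size because size is also the name of a built-in Julia function.

```julia
using DataFrames

# The same two rows as above, stored as parallel vectors
species_names = ["D.magna", "D.pulex"]
body_sizes = [2.2, 1.8]
growth_rates = [4.2, 3.1]

# Typed, empty columns that the loop will fill
daphnia = DataFrame(species=String[], body_size=Float64[], rate=Float64[])

for i in eachindex(species_names)
    # push! mutates daphnia in place, so no reassignment is needed
    push!(daphnia, (species_names[i], body_sizes[i], growth_rates[i]))
end

println(daphnia)
```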

Helper Functions for Data Frames

You can print data frames using println:

println(dat2)
2×3 DataFrame
 Row │ species  size  rate 
     │ Any      Any   Any  
─────┼─────────────────────
   1 │ D.magna  2.2   4.2
   2 │ D.pulex  1.8   3.1

The first and last functions work like head and tail in R and elsewhere: the first argument is the data frame and the second argument is the number of rows to return.

first(dat2, 2)
2×3 DataFrame
 Row │ species  size  rate 
     │ Any      Any   Any  
─────┼─────────────────────
   1 │ D.magna  2.2   4.2
   2 │ D.pulex  1.8   3.1

last(dat2, 2)
2×3 DataFrame
 Row │ species  size  rate 
     │ Any      Any   Any  
─────┼─────────────────────
   1 │ D.magna  2.2   4.2
   2 │ D.pulex  1.8   3.1
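
Because dat2 has only two rows, first and last return the same thing above. The difference is easier to see on a longer data frame. The sketch below uses a hypothetical five-row data frame called demo, with made-up id and value columns, and also shows the nrow and ncol helpers from DataFrames.

```julia
using DataFrames

# A hypothetical five-row data frame, used only to make the difference visible
demo = DataFrame(id=1:5, value=[10, 20, 30, 40, 50])

top_rows = first(demo, 2)     # rows with id 1 and 2
bottom_rows = last(demo, 2)   # rows with id 4 and 5

println(top_rows)
println(bottom_rows)
println(nrow(demo), " rows, ", ncol(demo), " columns")
```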

And as we learned with matrices and arrays, the [row, column] method also works for data frames:

dat2[1, 2]
2.2

dat2[1, :]
DataFrameRow (3 columns)
 Row │ species  size  rate 
     │ Any      Any   Any  
─────┼─────────────────────
   1 │ D.magna  2.2   4.2

dat2[:, 3]
2-element Vector{Any}:
 4.2
 3.1
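
Columns can also be selected by name rather than by position, which tends to be more readable. A minimal sketch follows, assuming a small typed data frame called daphnia built here for illustration (the body_size column name is an assumption, chosen to avoid clashing with Julia's built-in size function).

```julia
using DataFrames

daphnia = DataFrame(species=["D.magna", "D.pulex"],
                    body_size=[2.2, 1.8],
                    rate=[4.2, 3.1])

rates_copy = daphnia[:, :rate]          # a copy of the rate column
rates_direct = daphnia.rate             # the stored column itself, no copy
first_species = daphnia[1, :species]    # a single cell: row 1, column species
two_columns = daphnia[:, [:species, :rate]]  # a new data frame with two columns

println(rates_copy)
println(first_species)
println(two_columns)
```

Modifying rates_direct would change the data frame, while modifying rates_copy would not, so the copying form is the safer default when in doubt.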

The CSV

As with R, there are functions to read and write .csv files to and from dataframes. This makes interoperability with tools in R and standard data storage file formats easy.

To write our Daphnia data to a csv file, we use a familiar syntax, but with a function from the CSV package.

using CSV

CSV.write("daphniadata.csv", dat2)

You can read files in using... yes, CSV.read. Note that the second argument tells CSV.read to put the data into a data frame.

daph_in = CSV.read("betterDaphniaData.csv", DataFrame)
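
To check that writing and reading fit together, a round trip can be sketched as follows. The data frame daphnia, the column name body_size and the file name daphnia_roundtrip.csv are assumptions made for this illustration. The file is written to a temporary directory so nothing in your project is touched.

```julia
using CSV, DataFrames

daphnia = DataFrame(species=["D.magna", "D.pulex"],
                    body_size=[2.2, 1.8],
                    rate=[4.2, 3.1])

# Hypothetical file name, placed in a fresh temporary directory
csv_path = joinpath(mktempdir(), "daphnia_roundtrip.csv")

CSV.write(csv_path, daphnia)               # data frame -> csv file
daph_back = CSV.read(csv_path, DataFrame)  # csv file -> data frame

println(daph_back)
println(names(daph_back))
```

When reading, CSV.read infers each column's type from the file contents, so the types in daph_back may differ in detail from those you declared before writing.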