This lesson is being piloted (Beta version)

Data Harvesting for Agriculture: Glossary

Key Points

Welcome to Data Harvesting for Agriculture!
Introduction to Programming with R and RStudio
  • Programming comes with its own terminology that we need to learn, such as variables, variable (data) types, and functions.

  • Use variable <- value to assign a value to a variable in order to record it in memory.

  • Use read.csv() to read CSV data into R.
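Here is a minimal base-R sketch of these points. The variable names and the three-row CSV text are made up for illustration. With a real file, you would pass its path to read.csv() instead of using the text argument.

```r
# Assign values to variables with <-
field_name <- "North 40"  # character (text)
n_plots <- 24L            # integer (the L suffix marks a whole number as integer)
seed_rate <- 32.5         # numeric (double)

# A function call: class() reports the type of a variable
class(seed_rate)          # "numeric"

# read.csv() normally takes a file path; text = reads inline CSV text instead
yield_data <- read.csv(text = "plot,yield\n1,182.4\n2,176.9\n3,190.1")
str(yield_data)
```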

Introduction to QGIS
  • QGIS is a free, open-source alternative to the commercial ArcGIS software.

  • QGIS can give you a detailed, map-based view of your data, which helps you understand your yield results.

Geospatial data and boundaries
  • The sf package is preferred for data analysis because sf objects are data frames with a geometry column, so the attribute data are easy to access.

  • Projecting your data into UTM (Universal Transverse Mercator) coordinates, which are measured in meters rather than degrees, is necessary for many of the geometric operations you perform (e.g. making trial grids and splitting plots into subplots).

  • Data formats you are likely to encounter include gpkg, shp (with its companion cpg, dbf, prj, sbn, and sbx files), geojson, and tif.
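UTM divides the globe into 60 zones, each 6 degrees of longitude wide. Before you can project a field, you need the EPSG code for the correct zone. The helper utm_epsg() below is hypothetical and is not part of sf. It assumes WGS 84 longitude/latitude in degrees and ignores the special zones around Norway and Svalbard.

```r
# Hypothetical helper: WGS 84 / UTM EPSG code for a longitude/latitude (degrees)
utm_epsg <- function(lon, lat) {
  # Zone number 1-60; pmin() keeps lon = 180 in zone 60
  zone <- pmin(floor((lon + 180) / 6) + 1, 60)
  # EPSG 326xx = northern-hemisphere zones, 327xx = southern-hemisphere zones
  ifelse(lat >= 0, 32600 + zone, 32700 + zone)
}

utm_epsg(-88.2, 40.1)  # central Illinois: 32616 (UTM zone 16N)

# With the sf package installed, a layer could then be projected, e.g.
# field_utm <- sf::st_transform(field, crs = utm_epsg(-88.2, 40.1))
```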

Trial Design
  • Most of the code in this part uses functions, so understanding what the different functions do is important.

  • In designing a trial, the most important thing is knowing how to design the experimental rates; the technical part can be done by someone else.
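The following sketch shows how functions can help with rate design. The hypothetical make_rates() builds evenly spaced rates around a base rate, and sample() then assigns those rates to plots at random. The base rate, step, number of levels, and plot count are illustrative assumptions, not agronomic recommendations.

```r
# Hypothetical helper: n_levels evenly spaced rates centred on base_rate
make_rates <- function(base_rate, step, n_levels) {
  offsets <- (seq_len(n_levels) - (n_levels + 1) / 2) * step
  base_rate + offsets
}

rates <- make_rates(base_rate = 180, step = 30, n_levels = 5)
rates  # 120 150 180 210 240

# Randomly assign the rates to 20 plots so each rate is used equally often
set.seed(42)
plot_rates <- sample(rep(rates, length.out = 20))
table(plot_rates)
```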

Trial Data
  • sf is preferable for data analysis because sf objects are data frames, which makes the attribute data easy to access.

  • Projecting your data into UTM coordinates is necessary for many of the geometric operations you perform (e.g. making trial grids and splitting plots into subplots).

  • Compare different data formats, such as gpkg, shp (with its companion cpg, dbf, prj, sbn, and sbx files), geojson, and tif.

Data Cleaning and Aggregation
  • Comparison operators such as >, <, and == can be used to identify values that are greater than, less than, or equal to a given value.

  • All the cleaning done in ArcGIS/QGIS can also be done in R, including removing observations whose harvester speed is more than two standard deviations from the mean, observations in certain headlands, and observations too close to the plot borders. We still need to check the updated shapefile in ArcGIS/QGIS.

  • The filter function in dplyr keeps the rows of a data frame that satisfy conditions on one or more columns and removes the rest.
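The sketch below applies the harvester-speed rule using comparison operators in base R. It interprets the rule as removing observations whose speed is more than two standard deviations from the mean speed, which is an assumption. The data are made up, and headland and border removal are not shown.

```r
# Made-up harvester observations (speed in mph, yield in bu/acre)
yield_data <- data.frame(
  speed = c(4.0, 4.2, 3.9, 4.1, 4.3, 4.0, 4.2, 3.8, 4.1, 9.0),
  yield = c(182, 179, 185, 181, 178, 183, 180, 186, 181, 140)
)

# Comparison operators return TRUE/FALSE for every row
speed_mean <- mean(yield_data$speed)
speed_sd <- sd(yield_data$speed)
keep <- abs(yield_data$speed - speed_mean) <= 2 * speed_sd

# Keep only the rows where keep is TRUE (the 9.0 mph pass is dropped)
clean_data <- yield_data[keep, ]
nrow(clean_data)

# Equivalent with dplyr, if it is installed:
# clean_data <- dplyr::filter(yield_data, abs(speed - mean(speed)) <= 2 * sd(speed))
```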

SSURGO & Weather Data
  • sf is preferable for data analysis because sf objects are data frames, which makes the attribute data easy to access.

  • Projecting your data into UTM coordinates is necessary for many of the geometric operations you perform (e.g. making trial grids and splitting plots into subplots).

  • Compare different data formats, such as gpkg, shp (with its companion cpg, dbf, prj, sbn, and sbx files), geojson, and tif.

Glossary

argument
A value given to a function or program when it runs. The term is often used interchangeably (and inconsistently) with parameter.
comma-separated values (CSV)
A common textual representation for tables in which the values in each row are separated by commas.
comment
A remark in a program that is intended to help human readers understand what is going on, but is ignored by the computer. Comments in Python, R, and the Unix shell start with a # character and run to the end of the line; comments in SQL start with --, and other languages have other conventions.
data
Quantities on which R will perform calculations.
data type
The kind of data you are working with. In R, whole numbers written with an L suffix (e.g. -5L, 1L) are integers; numbers such as -5, 1, or 2.5 are numeric (stored as doubles, which other languages call floats); and words or phrases (e.g. “hi”, “Hi all!”) are character strings.
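You can check a value's type in R with class() or typeof(). This short sketch shows the names R uses for these types.

```r
class(5L)         # "integer"   (the L suffix makes a whole number an integer)
class(-5)         # "numeric"   (plain numbers are stored as doubles)
typeof(2.5)       # "double"    (what other languages call a float)
class("Hi all!")  # "character" (what other languages call a string)
is.integer(5)     # FALSE
```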
dimensions (of an array)
An array’s extent, represented as a vector. For example, an array with 5 rows and 3 columns has dimensions 5 by 3, which dim() returns as the vector c(5, 3).
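dim() returns the dimensions of a matrix or data frame as a vector, with rows first and columns second. The values below are illustrative.

```r
# A 5-row, 3-column matrix filled with the numbers 1 to 15
m <- matrix(1:15, nrow = 5, ncol = 3)
dim(m)   # 5 3
nrow(m)  # 5
ncol(m)  # 3

# Data frames have dimensions too: rows (observations) by columns (variables)
df <- data.frame(plot = 1:4, yield = c(180, 175, 190, 185))
dim(df)  # 4 2
```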
documentation
Human-language text written to explain what software does, how it works, or how to use it.
encapsulation
The practice of hiding something’s implementation details so that the rest of a program can worry about what it does rather than how it does it.
function
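A named, reusable piece of code that takes inputs (arguments), performs a task, and can return a result, so that we can use the same code again and again.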
function body
The statements that are executed inside a function.
function call
A use of a function in another piece of code, written as the function’s name followed by its arguments in parentheses, e.g. mean(x).
function composition
The immediate application of one function to the result of another, such as f(g(x)).
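This small sketch ties the function terms together. to_bushels() is a hypothetical function. Its default of 56 lb per bushel is the standard test weight for shelled corn and is used here only for illustration.

```r
# Define a function; weight_lb and lb_per_bu are its parameters
to_bushels <- function(weight_lb, lb_per_bu = 56) {
  # Function body: the statements run each time the function is called
  weight_lb / lb_per_bu
}

# Function call: 5600 is the argument passed to weight_lb
to_bushels(5600)            # 100

# Function composition: round() is applied to the result of to_bushels()
round(to_bushels(5000), 1)  # 89.3
```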
index
A subscript that specifies the location of a single value in a collection, such as a single pixel in an image. In R, indexing starts at 1, so x[1] is the first element of x.
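Indexing in R uses square brackets. Here is a minimal sketch with made-up values.

```r
rates <- c(120, 150, 180, 210, 240)
rates[1]        # 120: R indexes start at 1, not 0
rates[c(2, 4)]  # 150 210: a vector of indices selects several values

# Matrices take a row index and a column index; values fill column by column
m <- matrix(1:6, nrow = 2)
m[2, 3]         # 6: row 2, column 3
```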
parameter
A variable named in the function’s declaration that is used to hold a value passed into the call. The term is often used interchangeably (and inconsistently) with argument.
variable
A name that refers to an R object stored in memory, such as a number, a piece of text, a vector, or a data frame. Variables are created by assignment, e.g. x <- 5.
working directory
The file path on your computer that sets the default location of any files you read into R, or save out of R.
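This minimal sketch shows how the working directory affects relative file paths. It uses tempdir() as a stand-in for your project folder, and yield.csv is a hypothetical file name.

```r
getwd()                     # print the current working directory

# Switch to a temporary folder (stands in for your project folder)
old_wd <- setwd(tempdir())  # setwd() returns the previous directory
write.csv(data.frame(plot = 1:3, yield = c(180, 175, 190)),
          "yield.csv", row.names = FALSE)

# A relative path like "yield.csv" is resolved against the working directory
yield_data <- read.csv("yield.csv")
file.exists(file.path(getwd(), "yield.csv"))  # TRUE

setwd(old_wd)               # restore the original working directory
```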