Title: | Distance Metrics for Image Color Similarity |
---|---|
Description: | Loads and displays images, selectively masks specified background colors, bins pixels by color using either data-dependent or automatically generated color bins, quantitatively measures color similarity among images using one of several distance metrics for comparing pixel color clusters, and clusters images by object color similarity. Uses CIELAB, RGB, or HSV color spaces. Originally written for use with organism coloration (reef fish color diversity, butterfly mimicry, etc), but easily applicable for any image set. |
Authors: | Hannah Weller [aut, cre] |
Maintainer: | Hannah Weller <[email protected]> |
License: | GPL-3 |
Version: | 1.1.2 |
Built: | 2025-02-18 04:57:44 UTC |
Source: | https://github.com/hiweller/colordistance |
Computes the chi-squared distance between each element of a pair of vectors which must be of the same length. Good for comparing color histograms if you don't want to weight by color similarity. Probably hugely redundant; alas.
chisqDistance(a, b)
a |
Numeric vector. |
b |
Numeric vector; must be the same length as a. |
Chi-squared distance between vectors a and b. If one or both elements are NA/NaN, the contribution is counted as 0.
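As a rough illustration of the return value described above, here is a minimal base-R sketch. The helper name `chisq_sketch` is hypothetical, and the per-element formula `(a - b)^2 / (a + b)` is an assumed (commonly used) form of the chi-squared histogram distance, not something the text above states.

```r
# Hypothetical helper; assumes the common form sum((a - b)^2 / (a + b)).
chisq_sketch <- function(a, b) {
  stopifnot(length(a) == length(b))
  terms <- (a - b)^2 / (a + b)
  # Empty bins in both vectors give 0/0 = NaN; count those as 0
  terms[is.na(terms)] <- 0
  sum(terms)
}

a <- c(0.5, 0.5, 0)
b <- c(0.25, 0.75, 0)
chisq_sketch(a, b)
```

The third element is empty in both vectors, so it contributes nothing instead of turning the whole sum into NaN.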
colordistance:::chisqDistance(rnorm(10), rnorm(10))
Calculates the Euclidean distance between each pair of points in two dataframes as returned by extractClusters or getImageHist and returns the sum of the distances.
colorDistance(T1, T2)
T1 |
Dataframe (especially a dataframe as returned by extractClusters or getImageHist). |
T2 |
Another dataframe like T1. |
Sum of Euclidean distances between each pair of points (rows) in the provided dataframes.
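The calculation described above can be sketched in base R as follows. This is only an illustration under assumptions: the helper name `color_dist_sketch` is hypothetical, rows are paired by position, and the first three columns are taken to be the color coordinates.

```r
# Hypothetical helper: sum of row-wise Euclidean distances between two
# cluster dataframes. Assumes columns 1-3 hold the color coordinates.
color_dist_sketch <- function(T1, T2) {
  m1 <- as.matrix(T1[, 1:3])
  m2 <- as.matrix(T2[, 1:3])
  stopifnot(nrow(m1) == nrow(m2))
  sum(sqrt(rowSums((m1 - m2)^2)))
}

T1 <- data.frame(R = c(0, 1), G = c(0, 0), B = c(0, 0), Pct = c(0.5, 0.5))
T2 <- data.frame(R = c(0, 1), G = c(1, 0), B = c(0, 0), Pct = c(0.3, 0.7))
# Row 1 differs by 1 in the G channel; row 2 is identical
color_dist_sketch(T1, T2)
```

Note that the Pct column is ignored here, so cluster size plays no role in this distance.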
## Not run:
cluster.list <- colordistance::getHistList(system.file("extdata", "Heliconius/Heliconius_B", package="colordistance"), lower=rep(0.8, 3), upper=rep(1, 3))
colordistance:::colorDistance(cluster.list[[1]], cluster.list[[2]])
## End(Not run)
Calculates color histograms for images in immediate subdirectories of a folder, and averages histograms for images in the same subdirectory.
combineClusters(folder, method = "mean", ...)
folder |
Path to the folder containing subdirectories of images. Must be a character vector. |
method |
Method for combining color histograms. Default is "mean". |
... |
Additional arguments passed to |
combined_clusters <- colordistance::combineClusters(system.file("extdata", "Heliconius", package="colordistance"), method="median", bins=2, lower=rep(0.8, 3), upper=rep(1, 3))
Combine a list of cluster features as returned by getHistList
according to the specified method.
combineList(hist_list, method = "mean")
hist_list |
A list of cluster dataframes as returned by getHistList. |
method |
Method for combining color histograms. Default is "mean". |
While the function can also accept clusters generated using kmeans (getKMeansList followed by extractClusters), this is not recommended, as kmeans does not provide explicit analogous pairs of clusters, and clusters are combined by row number (all row 1 clusters are treated as analogous, etc). Color histograms are appropriate because the bins are defined the same way for each image.
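To make the row-by-row combination concrete, here is a minimal sketch in base R. The helper `combine_sketch` is hypothetical and is not the package's implementation; it assumes every dataframe in the list has the same dimensions and column order.

```r
# Hypothetical helper: combine same-shaped cluster dataframes cell by cell.
combine_sketch <- function(hist_list, method = "mean") {
  fun <- match.fun(method)
  first <- hist_list[[1]]
  # Stack the dataframes into a rows x columns x images array
  arr <- array(unlist(lapply(hist_list, as.matrix)),
               dim = c(dim(first), length(hist_list)),
               dimnames = list(NULL, names(first), NULL))
  # Summarise across images for every (row, column) cell
  as.data.frame(apply(arr, c(1, 2), fun))
}

h1 <- data.frame(R = c(0.2, 0.8), G = c(0.2, 0.8), B = c(0.2, 0.8), Pct = c(0.4, 0.6))
h2 <- data.frame(R = c(0.4, 0.6), G = c(0.2, 0.8), B = c(0.2, 0.8), Pct = c(0.6, 0.4))
combined <- combine_sketch(list(h1, h2), method = "mean")
combined
```

Because row 1 of every dataframe is averaged with row 1 of the others, the result is only meaningful when row i describes the same bin in every image, which is why histogram bins suit this better than kmeans clusters.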
hist_list <- getHistList(system.file("extdata", "Heliconius/Heliconius_A", package="colordistance"), lower=rep(0.8, 3), upper=rep(1, 3))
median_clusters <- combineList(hist_list, method="median")
Wrapper for convertColor that builds in random sampling and error messages, and removes the default illuminant (D65) to enforce manual specification of a reference white.
convertColorSpace( color.coordinate.matrix, from = "sRGB", to = "Lab", sample.size = 1e+05, from.ref.white, to.ref.white )
color.coordinate.matrix |
A color coordinate matrix with rows as colors
and channels as columns. If a color histogram (e.g. as returned by
|
from , to
|
Input and output color spaces, passed to
|
sample.size |
Number of pixels to be randomly sampled from filtered pixel array for conversion. If not numeric or larger than number of colors provided (i.e. cluster matrix), all colors are converted. See details. |
from.ref.white , to.ref.white
|
Reference whites passed to
|
Color spaces are all passed to convertColor, and can be any of: "XYZ", "sRGB", "Apple RGB", "CIE RGB", "Lab", or "Luv".
Lab and Luv color spaces are approximately perceptually uniform, meaning they usually do the best job of reflecting intuitive color distances without the non-linearity problems of more familiar RGB spaces. However, because they describe object colors, they require a reference 'white light' color (dimly and brightly lit photographs of the same object will have very different RGB palettes, but similar Lab palettes if appropriate white references are used). The idea here is that the apparent colors in an image depend not just on the "absolute" color of an object, but also on the available light in the scene. There are seven CIE standardized illuminants available in colordistance (A, B, C, E, and D50, D55, and D65), but the most common are:
"A": Standard incandescent lightbulb
"D65": Average daylight
"D50": Direct sunlight
Color conversions will be highly dependent on the reference white used, which is why no default is provided. Users should look into standard illuminants to choose an appropriate reference for a dataset.
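To see how much the reference white matters, the sketch below calls grDevices::convertColor directly (the function the text says convertColorSpace wraps) on the same sRGB green with two different illuminants. It is an illustration only and skips the sampling and error handling that the wrapper is described as adding.

```r
# One pure green pixel as a 1 x 3 matrix (rows = colors, columns = channels)
rgb_green <- matrix(c(0, 1, 0), nrow = 1)

# Same color, two different reference whites for the Lab conversion
lab_d65 <- grDevices::convertColor(rgb_green, from = "sRGB", to = "Lab",
                                   to.ref.white = "D65")
lab_a <- grDevices::convertColor(rgb_green, from = "sRGB", to = "Lab",
                                 to.ref.white = "A")

# Compare the resulting Lab coordinates side by side
rbind(D65 = as.numeric(lab_d65), A = as.numeric(lab_a))
```

The two rows differ in the a and b channels, so distances computed between images converted with different reference whites would not be comparable.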
The conversion from RGB to a standardized color space (XYZ, Lab, or Luv) is approximate, non-linear, and relatively time-consuming. Converting a large number of pixels can be computationally expensive, so convertColorSpace will randomly sample a specified number of rows to reduce the time. The default sample size, 100,000 rows, takes about 5 seconds to convert from sRGB to Lab space on an early 2015 Macbook with 8 GB of RAM. Time scales roughly linearly with the number of rows converted.
A 3- or 4-column matrix depending on whether color.coordinate.matrix included a 'Pct' column (as from getImageHist), with one column per channel.
# Convert a single RGB triplet and then back convert it
rgb_color <- c(0, 1, 0)
lab_color <- colordistance::convertColorSpace(rgb_color, from="sRGB", to="Lab", to.ref.white="D65")
rgb_again <- colordistance::convertColorSpace(lab_color, from="Lab", to="sRGB", from.ref.white="D65")
# Convert pixels from loadImage() function
img <- colordistance::loadImage(system.file("extdata", "Heliconius/Heliconius_B/Heliconius_07.jpeg", package="colordistance"))
lab_pixels <- colordistance::convertColorSpace(img$filtered.rgb.2d, from="sRGB", to="XYZ", sample.size=5000)
# Convert clusters
img <- colordistance::loadImage(system.file("extdata", "Heliconius/Heliconius_B/Heliconius_07.jpeg", package="colordistance"))
img_hist <- colordistance::getImageHist(img, bins=2, plotting=FALSE)
lab_clusters <- colordistance::convertColorSpace(img_hist, to.ref.white="D55")
Calculates the Earth mover's distance (briefly, the amount of work required to move the data from one distribution to resemble the other distribution, or the amount of "dirt" you have to shovel weighted by how far you have to shovel it). Accounts for both color disparity and size disparity. Recommended unless bin.avg is off for histogram generation. Note: this function is not exported by the package, since it is fairly specific to the colordistance framework. For a more generic implementation of EMD, see the emdist::emd function in the emdist package.
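The "dirt shoveling" intuition can be illustrated with a deliberately simplified one-dimensional case. The sketch below is not how EMDistance works on 3-dimensional color histograms; it assumes equally spaced bins along a single axis and equal total mass, where the earth mover's distance reduces to a sum of absolute cumulative differences. The helper name `emd_1d_sketch` is hypothetical.

```r
# Hypothetical 1D helper: work needed to turn histogram p into histogram q
# when bins are one unit apart and both histograms have the same total mass.
emd_1d_sketch <- function(p, q) {
  stopifnot(length(p) == length(q), isTRUE(all.equal(sum(p), sum(q))))
  # cumsum(p - q) is the surplus that must be carried past each bin boundary
  sum(abs(cumsum(p - q)))
}

p <- c(1, 0, 0)
near <- c(0, 1, 0)
far <- c(0, 0, 1)
emd_1d_sketch(p, near)  # all mass moves one bin over
emd_1d_sketch(p, far)   # the same mass moves two bins, so twice the work
```

A size-only comparison of bin contents would score `near` and `far` as equally different from `p`; weighting by how far the mass has to travel is what lets this kind of distance account for color disparity as well as size disparity.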
EMDistance(T1, T2)
T1 |
Dataframe (especially a dataframe as returned by
|
T2 |
Another dataframe like T1. |
Earth mover's distance between the two dataframes (metric of overall bin similarity for a pair of 3-dimensional histograms).
## Not run:
cluster.list <- colordistance::getHistList(system.file("extdata", "Heliconius/Heliconius_B", package="colordistance"), lower=rep(0.8, 3), upper=rep(1, 3))
colordistance:::EMDistance(cluster.list[[1]], cluster.list[[2]])
## End(Not run)
Converts a symmetrical distance matrix to a tree and saves it in newick format. Uses hclust to form clusters.
exportTree(getColorDistanceMatrixObject, file, return.tree = FALSE)
getColorDistanceMatrixObject |
A distance matrix, especially as returned by getColorDistanceMatrix. |
file |
Character vector of desired filename for saving tree. Should end in ".newick". |
return.tree |
Logical. Should the tree object be returned to the working environment in addition to being saved as a file? |
Newick tree saved in specified location and as.phylo tree object if return.tree=TRUE.
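The general route from a distance matrix to a newick file can be sketched as below. This is an assumption-laden illustration, not the body of exportTree: the toy matrix and file name are made up, hclust is run with its default settings, and the ape package (for as.phylo and write.tree) is assumed to be installed, with the export step skipped if it is not.

```r
# Toy symmetric distance matrix for three hypothetical images
img_names <- c("img1", "img2", "img3")
d <- matrix(c(0, 1, 4,
              1, 0, 3,
              4, 3, 0),
            nrow = 3, dimnames = list(img_names, img_names))

# Hierarchical clustering on the distances (default linkage)
tree <- stats::hclust(stats::as.dist(d))

# Convert to a phylo object and write newick text, if ape is available
if (requireNamespace("ape", quietly = TRUE)) {
  newick_file <- tempfile(fileext = ".newick")
  ape::write.tree(ape::as.phylo(tree), file = newick_file)
}
```

Here img1 and img2 are merged first because they have the smallest pairwise distance, and img3 joins afterwards.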
## Not run:
clusterList <- colordistance::getHistList(dir(system.file("extdata", "Heliconius/", package="colordistance"), full.names=TRUE), lower=rep(0.8, 3), upper=rep(1, 3))
CDM <- colordistance::getColorDistanceMatrix(clusterList, method="emd", plotting=FALSE)
# Tree is both saved in current working directory and stored in
# heliconius_tree variable
heliconius_tree <- colordistance::exportTree(CDM, "./HeliconiusColorTree.newick", return.tree=TRUE)
## End(Not run)
Extract a list of dataframes with the same format as those returned by getHistList, where each dataframe has 3 color attributes (R, G, B or H, S, V) and a size attribute (Pct) for every cluster.
extractClusters(getKMeansListObject, ordering = TRUE, normalize = FALSE)
getKMeansListObject |
A list of |
ordering |
Logical. Should clusters be reordered by color similarity? If
|
normalize |
Logical. Should each cluster be normalized to show R:G:B or
H:S:V ratios rather than absolute values? Can be helpful for inconsistent
lighting, but reduces variation. See |
A list of dataframes (same length as input list), each with 4 columns: R, G, B (or H, S, V) and Pct (cluster size), with one row per cluster.
Names are inherited from the list passed to the function.
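For a single fit, the shape of the returned dataframe can be illustrated with plain stats::kmeans. This is a sketch on simulated pixels, not the package's code; it ignores the ordering and normalize options and assumes Pct is the fraction of pixels assigned to each cluster.

```r
set.seed(1)
# Simulated RGB pixels: 100 dark ones and 50 bright ones
pixels <- rbind(matrix(runif(300, 0, 0.2), ncol = 3),
                matrix(runif(150, 0.8, 1), ncol = 3))
colnames(pixels) <- c("R", "G", "B")

fit <- stats::kmeans(pixels, centers = 2, nstart = 5)

# One row per cluster: center coordinates plus relative cluster size
clusters <- data.frame(fit$centers, Pct = fit$size / sum(fit$size))
clusters
```

Applying the same step to every element of a list with lapply would preserve the list names, matching the naming behavior described above.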
clusterList <- colordistance::getKMeansList(system.file("extdata", "Heliconius/Heliconius_A", package="colordistance"), bins=3)
colordistance::extractClusters(clusterList)
Calculates a distance matrix for a list of color cluster sets as returned by extractClusters or getHistList based on the specified distance metric.
getColorDistanceMatrix( cluster.list, method = "emd", ordering = "default", size.weight = 0.5, color.weight = 0.5, plotting = TRUE, ... )
cluster.list |
A list of identically sized dataframes with 4 columns each
(R, G, B, Pct or H, S, V, Pct) as output by |
method |
One of four possible comparison methods for calculating the color distances: "emd", "chisq", "weighted.pairs", or "color.dist". |
ordering |
Logical if not left as "default". Should the color clusters
in the list be reordered to minimize the distances between the pairs? If
left as default, ordering depends on distance method: "emd" and "chisq" do
not order clusters ("emd" orders on a case-by-case in the
|
size.weight |
Same as in |
color.weight |
Same as in |
plotting |
Logical. Should a heatmap of the distance matrix be displayed once the function finishes running? |
... |
Additional arguments passed on to
|
Each cell represents the distance between a pair of color cluster sets as measured using either chi-squared distance (cluster size only), earth mover's distance (size and color), weighted pairs (size and color with user-specified weights for each), or color distance (Euclidean distance between clusters as 3-dimensional - RGB or HSV - color coordinates).
Earth mover's distance is recommended unless bin.avg was set to FALSE during cluster list generation. In that case all paired bins have the same colors across datasets, so chi-squared is recommended instead. Weighted pairs or color distance may be appropriate depending on the question, but generally give poorer results.
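The structure of the result can be sketched as a loop over all pairs of cluster sets, with the metric passed in as a function. Both helpers below are hypothetical: `dist_matrix_sketch` is not the package's implementation, and `chisq_pct` assumes the chi-squared form sum((a - b)^2 / (a + b)) applied to the Pct column only.

```r
# Hypothetical helper: symmetric matrix of pairwise distances
dist_matrix_sketch <- function(cluster.list, dist_fun) {
  n <- length(cluster.list)
  m <- matrix(0, n, n,
              dimnames = list(names(cluster.list), names(cluster.list)))
  for (i in seq_len(n)) {
    for (j in seq_len(n)) {
      if (i < j) {
        m[i, j] <- m[j, i] <- dist_fun(cluster.list[[i]], cluster.list[[j]])
      }
    }
  }
  m
}

# Assumed size-only metric on the Pct column
chisq_pct <- function(T1, T2) {
  v <- (T1$Pct - T2$Pct)^2 / (T1$Pct + T2$Pct)
  sum(v[!is.na(v)])
}

bins <- data.frame(R = c(0.25, 0.75), G = c(0.25, 0.75), B = c(0.25, 0.75))
cluster.list <- list(img1 = cbind(bins, Pct = c(0.5, 0.5)),
                     img2 = cbind(bins, Pct = c(0.5, 0.5)),
                     img3 = cbind(bins, Pct = c(1, 0)))
dm <- dist_matrix_sketch(cluster.list, chisq_pct)
dm
```

img1 and img2 have identical bin sizes and score 0, while img3 scores higher against both, consistent with higher scores meaning more different.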
A distance matrix of image distance scores (the scales vary depending on the distance metric chosen, but for all four methods, higher scores = more different).
## Not run:
cluster.list <- colordistance::getHistList(c(system.file("extdata", "Heliconius/Heliconius_A", package="colordistance"), system.file("extdata", "Heliconius/Heliconius_B", package="colordistance")), lower=rep(0.8, 3), upper=rep(1, 3))
# Default values - recommended!
colordistance::getColorDistanceMatrix(cluster.list, main="EMD")
# Without plotting
colordistance::getColorDistanceMatrix(cluster.list, plotting=FALSE)
# Use chi-squared instead
colordistance::getColorDistanceMatrix(cluster.list, method="chisq", main="Chi-squared")
# Override ordering (throws a warning if you're trying to do this with
# chisq!)
colordistance::getColorDistanceMatrix(cluster.list, method="chisq", ordering=TRUE, main="Chi-squared w/ ordering")
# Specify high size weight/low color weight for weighted pairs
colordistance::getColorDistanceMatrix(cluster.list, method="weighted.pairs", color.weight=0.1, size.weight=0.9, main="Weighted pairs")
# Color distance only
colordistance::getColorDistanceMatrix(cluster.list, method="color.dist", ordering=TRUE, main="Color distance only")
## End(Not run)
Gets a vector of colors, one per bin, for plotting histograms from getImageHist.
getHistColors(bins, hsv = FALSE)
bins |
Number of bins for each channel OR a vector of length 3 with bins for each channel. Bins = 3 will result in 3^3 = 27 bins; bins = c(2, 2, 3) will result in 2 * 2 * 3 = 12 bins (2 red, 2 green, 3 blue), etc. |
hsv |
Logical. Should HSV be used instead of RGB? |
A vector of hex codes for bin colors.
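The bin arithmetic above (3^3 = 27, 2 * 2 * 3 = 12) can be checked with a short sketch that builds one hex color per bin. The helper `hist_colors_sketch` is hypothetical; it handles RGB only, uses the midpoint of each bin as its color, and the ordering of the bins is an assumption.

```r
# Hypothetical helper: one hex color per RGB bin, taken at the bin midpoint
hist_colors_sketch <- function(bins) {
  if (length(bins) == 1) bins <- rep(bins, 3)
  # Midpoints of n equal-width bins on [0, 1]
  centers <- lapply(bins, function(n) (seq_len(n) - 0.5) / n)
  grid <- expand.grid(R = centers[[1]], G = centers[[2]], B = centers[[3]])
  grDevices::rgb(grid$R, grid$G, grid$B)
}

length(hist_colors_sketch(3))           # 3^3 = 27 bins
length(hist_colors_sketch(c(2, 2, 3)))  # 2 * 2 * 3 = 12 bins
```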
colordistance:::getHistColors(bins = 3)
colordistance:::getHistColors(bins = c(8, 3, 3), hsv = TRUE)
Applies getImageHist to every image in a provided set of image paths and/or directories containing images.
getHistList( images, bins = 3, bin.avg = TRUE, lower = c(0, 0.55, 0), upper = c(0.24, 1, 0.24), alpha.channel = TRUE, norm.pix = FALSE, plotting = FALSE, pausing = TRUE, hsv = FALSE, title = "path", img.type = FALSE, bounds = c(0, 1) )
images |
Character vector of directories, image paths, or both. |
bins |
Number of bins for each channel OR a vector of length 3 with bins for each channel. Bins=3 will result in 3^3 = 27 bins; bins=c(2, 2, 3) will result in 2*2*3=12 bins (2 red, 2 green, 3 blue), etc. |
bin.avg |
Logical. Should the returned color clusters be the average of
the pixels in that bin (bin.avg= |
lower |
RGB or HSV triplet specifying the lower bounds for background pixels. Default upper and lower bounds are set to values that work well for a bright green background (RGB [0, 1, 0]). |
upper |
RGB or HSV triplet specifying the upper bounds for background pixels. Default upper and lower bounds are set to values that work well for a bright green background (RGB [0, 1, 0]). Determining these bounds may take some trial and error, but the following bounds may work for certain common background colors:
If no background filtering is
needed, set bounds to some non-numeric value ( |
alpha.channel |
Logical. If available, should alpha channel transparency be
used to mask background? See |
norm.pix |
Logical. Should RGB or HSV cluster values be normalized using
|
plotting |
Logical. Should the histogram generated for each image be displayed? |
pausing |
Logical. If |
hsv |
Logical. Should HSV be used instead of RGB? |
title |
String for what to title the plots if plotting is on; defaults to the image name. |
img.type |
Logical. Should the file extension for the images be retained
when naming the output list elements? If |
bounds |
Upper and lower limits for the channels; R reads in images with intensities on a 0-1 scale, but 0-255 is common. |
A list of getImageHist dataframes, 1 per image, named by image name.
For every image, the pixels are binned according to the specified bin breaks. By providing the bounds for the bins rather than letting an algorithm select centers (as in getKMeansList), clusters of nearly redundant colors are avoided.
So you don't end up with, say, 3 nearly-identical yellow clusters which are treated as unrelated just because there's a lot of yellow in your image; you just get a very large yellow cluster and empty non-yellow bins.
## Not run:
# Takes >10 seconds if you run all examples
clusterList <- colordistance::getHistList(system.file("extdata", "Heliconius/Heliconius_B", package="colordistance"), upper = rep(1, 3), lower = rep(0.8, 3))
clusterList <- colordistance::getHistList(c(system.file("extdata", "Heliconius/Heliconius_B", package="colordistance"), system.file("extdata", "Heliconius/Heliconius_A", package="colordistance")), pausing = FALSE, upper = rep(1, 3), lower = rep(0.8, 3))
clusterList <- colordistance::getHistList(system.file("extdata", "Heliconius/Heliconius_B", package = "colordistance"), plotting = TRUE, upper = rep(1, 3), lower = rep(0.8, 3))
## End(Not run)
Computes a histogram in either RGB or HSV colorspace by sorting pixels into a specified number of bins.
getImageHist( image, bins = 3, bin.avg = TRUE, defaultClusters = NULL, lower = c(0, 0.55, 0), upper = c(0.24, 1, 0.24), as.vec = FALSE, alpha.channel = TRUE, norm.pix = FALSE, plotting = TRUE, hsv = FALSE, title = "path", bounds = c(0, 1), ... )
image |
Path to a valid image (PNG or JPG) or a |
bins |
Number of bins for each channel OR a vector of length 3 with bins for each channel. Bins=3 will result in 3^3 = 27 bins; bins=c(2, 2, 3) will result in 2*2*3=12 bins (2 red, 2 green, 3 blue), etc. |
bin.avg |
Logical. Should the returned color clusters be the average of
the pixels in that bin (bin.avg= |
defaultClusters |
Optional dataframe of default color clusters to be
returned when a bin is empty. If |
lower |
RGB or HSV triplet specifying the lower bounds for background pixels. Default upper and lower bounds are set to values that work well for a bright green background (RGB [0, 1, 0]). |
upper |
RGB or HSV triplet specifying the upper bounds for background pixels. Default upper and lower bounds are set to values that work well for a bright green background (RGB [0, 1, 0]). Determining these bounds may take some trial and error, but the following bounds may work for certain common background colors:
If no background filtering is
needed, set bounds to some non-numeric value ( |
as.vec |
Logical. Should the bin sizes just be returned as a vector?
Much faster if only using |
alpha.channel |
Logical. If available, should alpha channel transparency be
used to mask background? See |
norm.pix |
Logical. Should RGB or HSV cluster values be normalized using
|
plotting |
Logical. Should a histogram of the bin colors and sizes be plotted? |
hsv |
Logical. Should HSV be used instead of RGB? |
title |
String for what to title the plots if plotting is on; defaults to the image name. |
bounds |
Upper and lower limits for the channels; R reads in images with intensities on a 0-1 scale, but 0-255 is common. |
... |
Optional arguments passed to the |
If you choose 2 bins for each color channel, then each of R, G, and B will be divided into 2 bins each, for a total of 2^3 = 8 bins.
Once all pixels have been binned, the function will return the size of each bin, either in number of pixels or fraction of total pixels, and the color of each bin, either as the geometric center of the bin or as the average color of all pixels assigned to it.
For example, if you input an image of a red square and used 8 bins, all red pixels (RGB triplet of [1, 0, 0]) would be assigned to the bin with R bounds (0.5, 1], G bounds [0, 0.5) and B bounds [0, 0.5). The geometric center of the bin would be [0.75, 0.25, 0.25], but the average color of the pixels assigned to that bin would be [1, 0, 0]. The latter option is more informative, but takes longer (about 1.5-2x longer depending on the images).
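The red-square example can be reproduced with a few lines of base R. This is a sketch of the binning idea only, not the package's code: the pixel matrix is simulated, and cut() with include.lowest = TRUE closes the bins slightly differently at the boundaries than the bounds quoted above.

```r
# 100 pure red pixels, one row per pixel
pixels <- matrix(rep(c(1, 0, 0), each = 100), ncol = 3,
                 dimnames = list(NULL, c("R", "G", "B")))

bins <- 2
breaks <- seq(0, 1, length.out = bins + 1)  # 0, 0.5, 1

# Bin index (1 or 2) of every pixel in every channel
bin_id <- apply(pixels, 2, function(ch) {
  cut(ch, breaks, include.lowest = TRUE, labels = FALSE)
})

# Label each pixel with its 3D bin, then compute bin sizes as fractions
key <- paste(bin_id[, 1], bin_id[, 2], bin_id[, 3], sep = "-")
bin_sizes <- table(key) / nrow(pixels)

# Two ways to describe the color of the occupied bin
bin_center <- unname((bin_id[1, ] - 0.5) / bins)  # geometric center
pixel_mean <- unname(colMeans(pixels[key == key[1], , drop = FALSE]))

bin_center
pixel_mean
```

All pixels land in a single bin, whose geometric center is a muted red while the mean of its pixels is the pure red that was actually present.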
A vector or dataframe (depending on whether as.vec=T) of bin sizes and color values.
# generate HSV histogram for a single image
colordistance::getImageHist(system.file("extdata", "Heliconius/Heliconius_B/Heliconius_07.jpeg", package="colordistance"), upper=rep(1, 3), lower=rep(0.8, 3), bins=c(8, 3, 3), hsv=TRUE, plotting=TRUE)
# generate RGB histogram
colordistance::getImageHist(system.file("extdata", "Heliconius/Heliconius_B/Heliconius_07.jpeg", package="colordistance"), upper=rep(1, 3), lower=rep(0.8, 3), bins=2)
Find all valid image paths (PNG and JPG) in a directory (does not search subdirectories). Will recover any image ending in .PNG, .JPG, or .JPEG, case-insensitive.
getImagePaths(path)
path |
Path to directory in which to search for images. Absolute or relative filepaths are fine. |
A vector of absolute filepaths to JPG and PNG images in the given directory.
In the event that no compatible images are found in the directory, it returns a message to that effect instead of an empty vector.
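A case-insensitive, non-recursive search of this kind can be sketched with list.files. The helper `image_paths_sketch` is hypothetical and simplified: unlike the behavior described above, it returns an empty vector rather than a message when nothing matches.

```r
# Hypothetical helper: absolute paths of PNG/JPG/JPEG files in one directory
image_paths_sketch <- function(path) {
  files <- list.files(path, pattern = "\\.(png|jpe?g)$",
                      ignore.case = TRUE, full.names = TRUE)
  normalizePath(files)
}

# Try it on a temporary directory with mixed file types
tmp <- tempfile("imgs")
dir.create(tmp)
invisible(file.create(file.path(tmp, c("a.PNG", "b.jpeg", "c.JPG", "notes.txt"))))
basename(image_paths_sketch(tmp))
```

Only the three image files are returned; notes.txt is skipped, and files in any subdirectories would not be found because list.files is not recursive by default.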
im.dir <- colordistance::getImagePaths(system.file("extdata", "Heliconius/Heliconius_A", package="colordistance"))
## Not run:
im.dir <- colordistance::getImagePaths("some/nonexistent/directory")
## End(Not run)
im.dir <- colordistance::getImagePaths(getwd())
Uses KMeans clustering to determine color clusters that minimize the sum of distances between pixels and their assigned clusters. Useful for parsing common color motifs in an object.
getKMeanColors( path, n = 10, sample.size = 20000, plotting = TRUE, lower = c(0, 0.55, 0), upper = c(0.24, 1, 0.24), iter.max = 50, nstart = 5, return.clust = TRUE, color.space = "rgb", from = "sRGB", ref.white )
path |
Path to an image (JPG or PNG). |
n |
Number of KMeans clusters to fit. Unlike |
sample.size |
Number of pixels to be randomly sampled from filtered pixel
array for performing fit. If set to |
plotting |
Logical. Should the results of the KMeans fit (original image + histogram of colors and bin sizes) be plotted? |
lower |
RGB triplet specifying the lower bounds for background pixels. Default upper and lower bounds are set to values that work well for a bright green background (RGB [0, 1, 0]). |
upper |
RGB triplet specifying the upper bounds for background pixels. Default upper and lower bounds are set to values that work well for a bright green background (RGB [0, 1, 0]). Determining these bounds may take some trial and error, but the following bounds may work for certain common background colors:
If no background filtering is
needed, set bounds to some non-numeric value ( |
iter.max |
Inherited from |
nstart |
Inherited from |
return.clust |
Logical. Should clusters be returned? If |
color.space |
The color space ( |
from |
Display color space of image if clustering in CIE Lab space, probably either "sRGB" or "Apple RGB", depending on your computer. |
ref.white |
The reference white passed to
|
A kmeans
fit object.
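The steps implied by the arguments above (mask background pixels that fall inside the lower/upper bounds, randomly sample the remaining pixels, then fit kmeans) can be sketched on simulated data. This is an illustration under assumptions, not the function's implementation: the pixel matrix is made up, and a pixel is treated as background only when all three channels fall inside the bounds.

```r
set.seed(42)
# Simulated image: 200 bright green background pixels, 100 reddish object pixels
background <- cbind(R = runif(200, 0, 0.2), G = runif(200, 0.6, 1), B = runif(200, 0, 0.2))
object <- cbind(R = runif(100, 0.7, 1), G = runif(100, 0, 0.3), B = runif(100, 0, 0.3))
pixels <- rbind(background, object)

# Default bounds from the usage line, suited to a bright green background
lower <- c(0, 0.55, 0)
upper <- c(0.24, 1, 0.24)

# Background = every channel inside its [lower, upper] range
is_bg <- apply(pixels, 1, function(px) all(px >= lower & px <= upper))
filtered <- pixels[!is_bg, , drop = FALSE]

# Randomly sample the non-background pixels, then fit the clusters
sample.size <- 50
idx <- sample(nrow(filtered), min(sample.size, nrow(filtered)))
fit <- stats::kmeans(filtered[idx, ], centers = 3, iter.max = 50, nstart = 5)
fit$centers
```

Only the reddish object pixels survive the mask, so the fitted centers describe the object rather than the green backdrop.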
colordistance::getKMeanColors(system.file("extdata", "Heliconius/Heliconius_B/Heliconius_07.jpeg", package="colordistance"), n=3, return.clust=FALSE, lower=rep(0.8, 3), upper=rep(1, 3))
Performs getKMeanColors on every image in a set of images and returns a list of kmeans fit objects, where each fit object contains the RGB coordinates of the clusters and the percentage of pixels in the image assigned to each cluster.
getKMeansList( images, bins = 10, sample.size = 20000, plotting = FALSE, lower = c(0, 0.55, 0), upper = c(0.24, 1, 0.24), iter.max = 50, nstart = 5, img.type = FALSE, color.space = "rgb", from = "sRGB", ref.white )
images |
A character vector of directories, image paths, or a combination of both. Takes either absolute or relative filepaths. |
bins |
Number of KMeans clusters to fit. Unlike |
sample.size |
Number of pixels to be randomly sampled from filtered pixel
array for performing fit. If set to |
plotting |
Logical. Should the results of the KMeans fit (original image + histogram of colors and bin sizes) be plotted for each image? |
lower |
RGB triplet specifying the lower bounds for background pixels. Default upper and lower bounds are set to values that work well for a bright green background (RGB [0, 1, 0]). |
upper |
RGB triplet specifying the upper bounds for background pixels. Default upper and lower bounds are set to values that work well for a bright green background (RGB [0, 1, 0]). Determining these bounds may take some trial and error, but the following bounds may work for certain common background colors:
If no background filtering is
needed, set bounds to some non-numeric value ( |
iter.max |
Inherited from |
nstart |
Inherited from |
img.type |
Logical. Should the image extension (.PNG or .JPG) be retained in the list names? |
color.space |
The color space ( |
from |
Original color space of images if clustering in CIE Lab space, probably either "sRGB" or "Apple RGB", depending on your computer. |
ref.white |
The reference white passed to
|
A list of kmeans fit objects, where the list element names are the original image names.
## Not run:
# Takes a few seconds to run
kmeans_list <- colordistance::getKMeansList(dir(system.file("extdata", "Heliconius/", package="colordistance"), full.names=TRUE), bins=3, lower=rep(0.8, 3), upper=rep(1, 3), plotting=TRUE)
## End(Not run)
Computes a histogram in CIE Lab color space by sorting pixels into specified bins.
getLabHist( image, bins = 3, sample.size = 10000, ref.white, from = "sRGB", bin.avg = TRUE, alpha.channel = TRUE, as.vec = FALSE, plotting = TRUE, lower = c(0, 0.55, 0), upper = c(0.24, 1, 0.24), title = "path", a.bounds = c(-128, 127), b.bounds = c(-128, 127), ... )
image |
Path to a valid image (PNG or JPG) or a |
bins |
Number of bins for each channel OR a vector of length 3 with bins for each channel. Bins = 3 will result in 3^3 = 27 bins; bins = c(2, 2, 3) will result in 2 * 2 * 3 = 12 bins (2 L, 2 a, 3 b), etc. |
sample.size |
Numeric. How many pixels should be randomly sampled from the non-background part of the image and converted into CIE Lab coordinates? If non-numeric, all pixels will be converted, but this can be very slow (see details). |
ref.white |
Reference white passed to |
from |
Original color space of image, probably either "sRGB" or "Apple RGB", depending on your computer. |
bin.avg |
Logical. Should the returned color clusters be the average of
the pixels in that bin (bin.avg= |
alpha.channel |
Logical. If available, should alpha channel transparency be
used to mask background? See |
as.vec |
Logical. Should the bin sizes just be returned as a vector?
Much faster if only using |
plotting |
Logical. Should a histogram of the bin colors and sizes be plotted? |
lower, upper |
RGB or HSV triplets specifying the lower and upper bounds for background pixels. Default upper and lower bounds are set to values that work well for a bright green background (RGB [0, 1, 0]). Determining these bounds may take some trial and error, but the following bounds may work for certain common background colors:
If no
background filtering is needed, set bounds to some non-numeric value
( |
title |
String for the title of the plot if plotting is on; defaults to the image name. |
a.bounds, b.bounds |
Numeric ranges for the a (green-red) and b (blue-yellow) channels of Lab color space. Technically, a and b have infinite range, but in practice nearly all values fall between -128 and 127 (the default). Many images will have an even narrower range than this, depending on the lighting conditions and conversion; setting narrower ranges will result in finer-scale binning, without generating empty bins at the edges of the channels. |
... |
Additional arguments passed to |
getLabHist uses convertColorSpace to convert pixels into CIE Lab coordinates, which requires a reference white. There are seven CIE standardized illuminants available in colordistance (A, B, C, E, D50, D55, and D65), but the most common are:
"A": Standard incandescent lightbulb
"D65": Average daylight
"D50": Direct sunlight
Color conversions will be highly dependent on the reference white used, which is why no default is provided. Users should look into standard illuminants to choose an appropriate reference for a dataset.
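To illustrate why the choice of reference white matters, the sketch below converts the same sRGB pixels to CIE Lab under two illuminants. It calls grDevices::convertColor from base R directly rather than the package's convertColorSpace, so treat it as an approximation of what colordistance does internally; the pixel values are made up for illustration.

```r
# Hypothetical pixels: one row per pixel, columns R, G, B on a 0-1 scale
pixels <- rbind(c(0.9, 0.5, 0.1),   # an orange
                c(1, 1, 1))         # pure white

# Convert sRGB to CIE Lab under two different reference whites
lab_d65 <- grDevices::convertColor(pixels, from = "sRGB", to = "Lab",
                                   to.ref.white = "D65")
lab_a   <- grDevices::convertColor(pixels, from = "sRGB", to = "Lab",
                                   to.ref.white = "A")

# Compare the coordinates (columns: L, a, b) for each illuminant
print(round(lab_d65, 2))
print(round(lab_a, 2))
```

Printing both matrices side by side is a quick way to judge how sensitive a given dataset is to the illuminant before committing to one.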
The conversion from RGB to a standardized color space (XYZ, Lab, or Luv) is approximate, non-linear, and relatively time-consuming. Converting a large number of pixels can be computationally expensive, so convertColorSpace will randomly sample a specified number of rows to reduce the time. The default sample size, 10,000 rows, takes about 1 second to convert from sRGB to Lab space on an early 2015 MacBook with 8 GB of RAM. Time scales roughly linearly with the number of rows converted.
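The sampling step described above can be sketched in base R as follows. This is only an illustration of the idea, assuming a made-up pixel matrix and using grDevices::convertColor in place of the package's convertColorSpace; the timing you see will depend on your machine.

```r
set.seed(1)  # reproducible sampling

# Hypothetical image: 50,000 non-background pixels, columns R, G, B (0-1)
pixels <- matrix(runif(50000 * 3), ncol = 3)

# Randomly sample rows before the expensive conversion step
sample.size <- 10000
rows <- sample(nrow(pixels), min(sample.size, nrow(pixels)))
sampled <- pixels[rows, , drop = FALSE]

# Time the sRGB -> Lab conversion on the sample only
elapsed <- system.time(
  lab <- grDevices::convertColor(sampled, from = "sRGB", to = "Lab",
                                 to.ref.white = "D65")
)["elapsed"]
print(elapsed)
```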
Unlike RGB or HSV color spaces, the three channels of CIE Lab color space do not all range between 0 and 1; instead, L (luminance) is always between 0 and 100, and the a (green-red) and b (blue-yellow) channels generally vary between -128 and 127, but usually occupy a narrower range depending on the reference white. To achieve the best results, ranges for a and b should be restricted to avoid generating empty bins.
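A minimal sketch of why narrower a and b ranges help, assuming evenly spaced bins over each channel's bounds. The helper bin_lab is hypothetical (it is not a colordistance function) and the pixels are simulated.

```r
# Sort Lab pixels into bins; `lab` has columns L, a, b
bin_lab <- function(lab, bins = c(2, 3, 3),
                    a.bounds = c(-128, 127), b.bounds = c(-128, 127)) {
  bounds <- list(c(0, 100), a.bounds, b.bounds)
  # Bin index per pixel and channel; pixels outside the bounds become NA
  idx <- lapply(1:3, function(i) {
    breaks <- seq(bounds[[i]][1], bounds[[i]][2], length.out = bins[i] + 1)
    factor(cut(lab[, i], breaks, include.lowest = TRUE, labels = FALSE),
           levels = seq_len(bins[i]))
  })
  counts <- as.data.frame(table(L = idx[[1]], a = idx[[2]], b = idx[[3]]))
  counts$Pct <- counts$Freq / sum(counts$Freq)
  counts
}

set.seed(1)
# Hypothetical pixels whose a and b values only span -40 to 40
lab <- cbind(runif(500, 0, 100), runif(500, -40, 40), runif(500, -40, 40))

wide   <- bin_lab(lab)  # default bounds
narrow <- bin_lab(lab, a.bounds = c(-40, 40), b.bounds = c(-40, 40))

# Number of empty bins under each choice of bounds
print(c(wide = sum(wide$Freq == 0), narrow = sum(narrow$Freq == 0)))
```

With the default bounds, most of the 2 * 3 * 3 = 18 bins stay empty because every simulated pixel lands in the middle a and b bins; restricting the bounds spreads the same pixels across many more bins.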
A vector or dataframe (depending on whether as.vec = TRUE) of bin sizes and color coordinates.
path <- system.file("extdata", "Heliconius/Heliconius_B/Heliconius_07.jpeg", package="colordistance")
colordistance::getLabHist(path, ref.white = "D65", bins = c(2, 3, 3), lower = rep(0.8, 3), upper = rep(1, 3), sample.size = 1000, ylim = c(0, 1))
Applies getLabHist to every image in a provided set of image paths and/or directories containing images.
getLabHistList( images, bins = 3, sample.size = 10000, ref.white, from = "sRGB", bin.avg = TRUE, as.vec = FALSE, plotting = FALSE, pausing = TRUE, lower = c(0, 0.55, 0), upper = c(0.24, 1, 0.24), alpha.channel = TRUE, title = "path", a.bounds = c(-128, 127), b.bounds = c(-128, 127), ... )
images |
Character vector of directories, image paths, or both. |
bins |
Number of bins for each channel OR a vector of length 3 with bins for each channel. Bins = 3 will result in 3^3 = 27 bins; bins = c(2, 2, 3) will result in 2 * 2 * 3 = 12 bins (2 L, 2 a, 3 b), etc. |
sample.size |
Numeric. How many pixels should be randomly sampled from the non-background part of the image and converted into CIE Lab coordinates? If non-numeric, all pixels will be converted, but this can be very slow (see details). |
ref.white |
Reference white passed to |
from |
Original color space of image, probably either "sRGB" or "Apple RGB", depending on your computer. |
bin.avg |
Logical. Should the returned color clusters be the average of
the pixels in that bin (bin.avg= |
as.vec |
Logical. Should the bin sizes just be returned as a vector?
Much faster if only using |
plotting |
Logical. Should a histogram of the bin colors and sizes be plotted? |
pausing |
Logical. If |
lower, upper |
RGB or HSV triplets specifying the lower and upper bounds for background pixels. Default upper and lower bounds are set to values that work well for a bright green background (RGB [0, 1, 0]). Determining these bounds may take some trial and error, but the following bounds may work for certain common background colors:
If no
background filtering is needed, set bounds to some non-numeric value
( |
alpha.channel |
Logical. If available, should alpha channel transparency be
used to mask background? See |
title |
String for the title of the plot if plotting is on; defaults to the image name. |
a.bounds, b.bounds |
Numeric ranges for the a (green-red) and b (blue-yellow) channels of Lab color space. Technically, a and b have infinite range, but in practice nearly all values fall between -128 and 127 (the default). Many images will have an even narrower range than this, depending on the lighting conditions and conversion; setting narrower ranges will result in finer-scale binning, without generating empty bins at the edges of the channels. |
... |
Additional arguments passed to |
getLabHist uses convertColorSpace to convert pixels into CIE Lab coordinates, which requires a reference white. There are seven CIE standardized illuminants available in colordistance (A, B, C, E, D50, D55, and D65), but the most common are:
"A": Standard incandescent lightbulb
"D65": Average daylight
"D50": Direct sunlight
Color conversions will be highly dependent on the reference white used, which is why no default is provided. Users should look into standard illuminants to choose an appropriate reference for a dataset.
Unlike RGB or HSV color spaces, the three channels of CIE Lab color space do not all range between 0 and 1; instead, L (luminance) is always between 0 and 100, and the a (green-red) and b (blue-yellow) channels generally vary between -128 and 127, but usually occupy a narrower range depending on the reference white. The exception is reference white A (standard incandescent lighting), which tends to have lower values when converting with convertColor.
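One way to pick a.bounds and b.bounds is to look at the channel ranges that actually occur after conversion. The sketch below does this for simulated sRGB pixels with grDevices::convertColor; in practice the pixels would come from your own images, and the rounding rule used here is just one reasonable assumption.

```r
set.seed(1)
# Hypothetical sRGB pixels (0-1 scale), one row per pixel
pixels <- matrix(runif(2000 * 3), ncol = 3)

lab <- grDevices::convertColor(pixels, from = "sRGB", to = "Lab",
                               to.ref.white = "D65")

# Observed minimum and maximum of each channel (columns: L, a, b)
channel_ranges <- apply(lab, 2, range)
print(round(channel_ranges, 1))

# Candidate bounds: observed range widened to whole numbers
a.bounds <- c(floor(channel_ranges[1, 2]), ceiling(channel_ranges[2, 2]))
b.bounds <- c(floor(channel_ranges[1, 3]), ceiling(channel_ranges[2, 3]))
print(a.bounds)
print(b.bounds)
```

The resulting vectors can then be tried as the a.bounds and b.bounds arguments, provided the same bounds are used for every image being compared.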
A list of getLabHist dataframes, 1 per image, named by image name.
images <- system.file("extdata", "Heliconius/Heliconius_B", package="colordistance")
colordistance::getLabHistList(images, bins = 2, sample.size = 1000, ref.white = "D65", plotting = TRUE, pausing = FALSE, lower = rep(0.8, 3), upper = rep(1, 3), a.bounds = c(-100, 100), b.bounds = c(-127, 100), ylim = c(0, 1))
Plots a heatmap of a symmetrical distance matrix in order to visualize similarity/dissimilarity in scores. Values are clustered by similarity using hclust.
heatmapColorDistance( clusterList_or_matrixObject, main = NULL, col = "default", margins = c(6, 8), ... )
clusterList_or_matrixObject |
Either a list of identically sized
dataframes with 4 columns each (3 color channels + Pct) as output by
|
main |
Title for heatmap plot. |
col |
Color scale for heatmap from low to high. Default is
|
margins |
Margins for column and row labels. |
... |
Additional arguments passed on to |
Heatmap representation of distance matrix.
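The clustering step behind the heatmap can be sketched with base R alone. The distance values and image names below are invented, and stats::heatmap stands in for whatever plotting function the package uses internally, so this shows the idea rather than the package's implementation.

```r
# Hypothetical symmetric distance matrix for four images
img_names <- c("img_1", "img_2", "img_3", "img_4")
dist_mat <- matrix(c(0.0, 0.2, 0.9, 0.8,
                     0.2, 0.0, 0.7, 0.9,
                     0.9, 0.7, 0.0, 0.1,
                     0.8, 0.9, 0.1, 0.0),
                   nrow = 4, dimnames = list(img_names, img_names))

# Hierarchical clustering on the distances groups similar images together
tree <- stats::hclust(stats::as.dist(dist_mat))
groups <- stats::cutree(tree, k = 2)
print(groups)

# Draw the clustered heatmap only in an interactive session
if (interactive()) {
  stats::heatmap(dist_mat, Rowv = stats::as.dendrogram(tree), symm = TRUE)
}
```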
## Not run: 
# Takes a few seconds to run
cluster.list <- colordistance::getHistList(dir(system.file("extdata", "Heliconius/", package="colordistance"), full.names=TRUE), lower=rep(0.8, 3), upper=rep(1, 3))
CDM <- colordistance::getColorDistanceMatrix(cluster.list, plotting=FALSE)
colordistance::heatmapColorDistance(CDM, main="Heliconius color similarity")
colordistance::heatmapColorDistance(cluster.list, col=colorRampPalette(c("red", "cyan", "blue"))(n=299))
## End(Not run)
Takes images, computes color clusters for each image, and calculates distance matrix/dendrogram from those clusters.
imageClusterPipeline( images, cluster.method = "hist", distance.method = "emd", lower = c(0, 140/255, 0), upper = c(60/255, 1, 60/255), hist.bins = 3, kmeans.bins = 27, bin.avg = TRUE, norm.pix = FALSE, plot.bins = FALSE, pausing = TRUE, color.space = "rgb", ref.white, from = "sRGB", bounds = c(0, 1), sample.size = 20000, iter.max = 50, nstart = 5, img.type = FALSE, ordering = "default", size.weight = 0.5, color.weight = 0.5, plot.heatmap = TRUE, return.distance.matrix = TRUE, save.tree = FALSE, save.distance.matrix = FALSE, a.bounds = c(-127, 128), b.bounds = c(-127, 128) )
images |
Character vector of directories, image paths, or both. |
cluster.method |
Which method for getting color clusters from each image
should be used? Must be either |
distance.method |
One of four possible comparison methods for calculating
the color distances: |
lower |
RGB or HSV triplet specifying the lower bounds for background pixels. Default upper and lower bounds are set to values that work well for a bright green background (RGB [0, 1, 0]). |
upper |
RGB or HSV triplet specifying the upper bounds for background pixels. Default upper and lower bounds are set to values that work well for a bright green background (RGB [0, 1, 0]). Determining these bounds may take some trial and error, but the following bounds may work for certain common background colors:
If no background filtering is
needed, set bounds to some non-numeric value ( |
hist.bins |
Only applicable if |
kmeans.bins |
Only applicable if |
bin.avg |
Logical. Should the color clusters used for the distance matrix
be the average of the pixels in that bin (bin.avg= |
norm.pix |
Logical. Should RGB or HSV cluster values be normalized using
|
plot.bins |
Logical. Should the bins for each image be plotted as they are calculated? |
pausing |
Logical. If |
color.space |
The color space ( |
ref.white |
The reference white passed to
|
from |
Display color space of image if clustering in CIE Lab space, probably either "sRGB" or "Apple RGB", depending on your computer. |
bounds |
Upper and lower limits for the channels; R reads in images with intensities on a 0-1 scale, but 0-255 is common. |
sample.size |
Only applicable if |
iter.max |
Only applicable if |
nstart |
Only applicable if |
img.type |
Logical. Should file extensions be retained with labels? |
ordering |
Logical if not left as "default". Should the color clusters
in the list be reordered to minimize the distances between the pairs? If
left as default, ordering depends on distance method: "emd" and "chisq" do
not order clusters ("emd" orders on a case-by-case in the
|
size.weight |
Weight of size similarity in determining overall score and
ordering (if |
color.weight |
Weight of color similarity in determining overall score
and ordering (if |
plot.heatmap |
Logical. Should a heatmap of the distance matrix be plotted? |
return.distance.matrix |
Logical. Should the distance matrix be returned to the R environment or just plotted? |
save.tree |
Either logical or a filepath for saving the tree; default if
set to |
save.distance.matrix |
Either logical or filepath for saving distance
matrix; default if set to |
a.bounds, b.bounds |
Passed to |
Color distance matrix, heatmap, and saved distance matrix and tree files if saving is TRUE.
This is the fastest way to get a distance matrix for color similarity starting from a folder of images. Essentially, it just calls a series of other package functions in order: input images -> getImagePaths -> getHistList or getKMeansList followed by extractClusters -> getColorDistanceMatrix -> plotting -> return/save distance matrix. Sort of railroads you, but good for testing different combinations of clustering methods and distance metrics.
## Not run: 
colordistance::imageClusterPipeline(dir(system.file("extdata", "Heliconius/", package="colordistance"), full.names=TRUE), color.space="hsv", lower=rep(0.8, 3), upper=rep(1, 3), cluster.method="hist", distance.method="emd", hist.bins=3, plot.bins=TRUE, save.tree="example_tree.newick", save.distance.matrix="example_DM.csv")
## End(Not run)
Imports a single image and returns a list with the original image as a 3D array, a 2D matrix with background pixels removed, and the absolute path to the original image.
loadImage( path, lower = c(0, 0.55, 0), upper = c(0.24, 1, 0.24), hsv = TRUE, CIELab = FALSE, sample.size = 1e+05, ref.white = NULL, alpha.channel = TRUE, alpha.message = FALSE )
path |
Path to image (a string). |
lower |
RGB or HSV triplet specifying the lower bounds for background pixels. Default upper and lower bounds are set to values that work well for a bright green background (RGB [0, 1, 0]). |
upper |
RGB or HSV triplet specifying the upper bounds for background pixels. Default upper and lower bounds are set to values that work well for a bright green background (RGB [0, 1, 0]). Determining these bounds may take some trial and error, but the following bounds may work for certain common background colors:
If no background filtering is
needed, set bounds to some non-numeric value ( |
hsv |
Logical. Should HSV pixel array also be calculated? Setting to
|
CIELab |
Logical. Should CIEL*a*b color space pixels be calculated from RGB? Requires specification of a reference white (see details). |
sample.size |
Number of pixels to be randomly sampled from filtered pixel array for conversion. If not numeric, all pixels are converted. |
ref.white |
String; white reference for converting from RGB to CIEL*a*b
color space. Accepts any of the standard white references for
|
alpha.channel |
Logical. If available, should alpha channel transparency be
used to mask background? See |
alpha.message |
Logical. Output a message if using alpha channel transparency to mask background? Helpful for troubleshooting with PNGs. |
The upper and lower limits for background pixel elimination set the inclusive bounds for which pixels should be ignored for the 2D arrays; while all background pixels are ideally a single color, images photographed against "uniform" backgrounds often contain some variation, and even segmentation done with photo editing software will produce some variance as a result of image compression.
The upper and lower bounds represent cutoffs: any pixel for which the first channel falls between the first upper and lower bounds, the second channel falls between the second upper and lower bounds, and the third channel falls between the third upper and lower bounds, will be ignored. For example, if you have a green pixel with RGB channel values [0.1, 0.9, 0.2], and your upper and lower bounds were (0.2, 1, 0.2) and (0, 0.6, 0) respectively, the pixel would be ignored because 0 <= 0.1 <= 0.2, 0.6 <= 0.9 <= 1, and 0 <= 0.2 <= 0.2. But a pixel with the RGB channel values [0.3, 0.9, 0.2] would not be considered background because 0.3 > 0.2.
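The inclusive-bounds rule can be written out in a few lines of base R. The helper is_background is a hypothetical name used only for this sketch; it is not the package's internal masking code.

```r
# TRUE for pixels whose three channels all fall within the inclusive bounds
is_background <- function(pixels, lower, upper) {
  above <- sweep(pixels, 2, lower, ">=")
  below <- sweep(pixels, 2, upper, "<=")
  rowSums(above & below) == ncol(pixels)
}

pixels <- rbind(c(0.1, 0.9, 0.2),   # green pixel from the example above
                c(0.3, 0.9, 0.2))   # red channel exceeds the upper bound
lower <- c(0, 0.6, 0)
upper <- c(0.2, 1, 0.2)

mask <- is_background(pixels, lower, upper)
filtered <- pixels[!mask, , drop = FALSE]  # non-background pixels only
print(mask)
```

Only the first pixel is flagged as background, so filtered keeps the second pixel alone.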
CIEL*a*b color space requires a reference 'white light' color (dimly and brightly lit photographs of the same object will have very different RGB palettes, but similar Lab palettes if appropriate white references are used). The idea here is that the apparent colors in an image depend not just on the "absolute" color of an object (whatever that means), but also on the available light in the scene. There are seven CIE standardized illuminants available in colordistance (A, B, C, E, D50, D55, and D65), but the most common are:
"A": Standard incandescent lightbulb
"D65": Average daylight
"D50": Direct sunlight
Color conversions will be highly dependent on the reference white used, which is why no default is provided. Users should look into standard illuminants to choose an appropriate reference for a dataset.
A list with original image ($original.rgb, 3D array), 2D matrix with background pixels removed ($filtered.rgb.2d and $filtered.hsv.2d), and path to the original image ($path).
The 3D array is useful for displaying the original image, while the 2D arrays (RGB and HSV) are treated as rows of data for clustering in the rest of the package.
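The relationship between the 3D array and the 2D pixel matrix can be sketched as below. The image is simulated and no background filtering is applied, so this only illustrates the reshaping, not what loadImage returns for a real file.

```r
# Hypothetical 4 x 5 pixel RGB image as a 3D array (height x width x channel)
set.seed(1)
img <- array(runif(4 * 5 * 3), dim = c(4, 5, 3))

# Flatten to a 2D matrix: one row per pixel, one column per channel
pixels_2d <- matrix(img, ncol = 3, dimnames = list(NULL, c("r", "g", "b")))

# The first column holds the red channel, read column by column
stopifnot(identical(unname(pixels_2d[, "r"]), as.vector(img[, , 1])))
print(dim(pixels_2d))
```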
loadedImg <- colordistance::loadImage(system.file("extdata", "Heliconius/Heliconius_A/Heliconius_01.jpeg", package="colordistance"), upper=rep(1, 3), lower=rep(0.8, 3))
loadedImgNoHSV <- colordistance::loadImage(system.file("extdata", "Heliconius/Heliconius_A/Heliconius_01.jpeg", package="colordistance"), upper=rep(1, 3), lower=rep(0.8, 3), hsv=FALSE)
Converts clusters from raw channel intensity to each channel's fraction of the total intensity for that cluster.
normalizeRGB(extractClustersObject)
extractClustersObject |
A list of color clusters such as those returned
by |
A list of the same size and structure as the input list, but with the clusters normalized as described.
This is a useful option if your images have a lot of variation in lighting, but obviously comes at the cost of reducing variation (if darker and lighter colors are meaningful sources of variation in the dataset).
For example, a bright yellow (R=1, G=1, B=0) and a darker yellow (R=0.8, G=0.8, B=0) both have 50% red, 50% green, and 0% blue, so their normalized values would be equivalent.
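The yellow example works out as follows in base R. The helper normalize_rows is hypothetical, and the guard against all-zero (black) clusters is an assumption added for this sketch rather than documented package behavior.

```r
# Divide each channel by the summed intensity of its cluster (row)
normalize_rows <- function(clusters) {
  totals <- rowSums(clusters)
  totals[totals == 0] <- 1  # leave pure black as zeros instead of NaN
  clusters / totals
}

yellows <- rbind(bright = c(R = 1, G = 1, B = 0),
                 dark   = c(R = 0.8, G = 0.8, B = 0))
normalized <- normalize_rows(yellows)
print(normalized)
```

Both rows come out as 0.5, 0.5, 0, which is why normalization removes brightness differences between otherwise similar colors.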
A similar but less harsh alternative would be to use HSV rather than RGB for pixel binning and color similarity clustering by setting hsv=T in clustering functions and specifying a low number of 'value' bins (e.g. bins=c(8, 8, 2)).
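To see why HSV is a gentler alternative, the sketch below converts the same two yellows with grDevices::rgb2hsv from base R. It is an illustration only and does not call any colordistance function.

```r
# Columns are colors, rows are R, G, B on a 0-1 scale
yellows <- cbind(bright = c(1, 1, 0), dark = c(0.8, 0.8, 0))

# Rows of the result are hue (h), saturation (s), and value (v)
hsv_vals <- grDevices::rgb2hsv(yellows, maxColorValue = 1)
print(hsv_vals)
```

The two colors share hue and saturation and differ only in value, so a small number of value bins keeps them together without discarding brightness entirely.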
cluster.list <- colordistance::getKMeansList(c(system.file("extdata", "Heliconius/Heliconius_A", package="colordistance")), lower=rep(0.8, 3), upper=rep(1, 3))
cluster.list <- colordistance::extractClusters(cluster.list)
colordistance:::normalizeRGB(cluster.list)
Reorders clusters to minimize color distance using the Hungarian algorithm as implemented by solve_LSAP.
orderClusters(extractClustersObject)
extractClustersObject |
A list of color clusters such as those returned
by |
Briefly: Euclidean distances between every possible pair of clusters across two dataframes are calculated, and pairs of clusters are chosen in order to minimize the total sum of color distances between the cluster pairs (i.e. A1-B1, A2-B2, etc).
For example, if dataframe A has a black cluster, a white cluster, and a blue cluster, in that order, and dataframe B has a white cluster, a blue cluster, and a grey cluster, in that order, the final pairs might be A1-B3 (black and grey), A2-B1 (white and white), and A3-B2 (blue and blue).
Rows are reordered so that paired rows are in the same row index (in the example, dataframe B would be reshuffled to go grey, white, blue instead of white, blue, grey).
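The pairing can be checked by hand for three clusters. The sketch below is not the Hungarian algorithm: it simply tries every ordering, which gives the same minimum-cost pairing for small inputs but scales factorially. The helper names and RGB values are assumptions made for illustration.

```r
# All orderings of the elements of v (fine for a handful of clusters)
permutations <- function(v) {
  if (length(v) <= 1) return(list(v))
  out <- list()
  for (i in seq_along(v)) {
    for (p in permutations(v[-i])) out[[length(out) + 1]] <- c(v[i], p)
  }
  out
}

# Row order for B that minimizes summed Euclidean distance to the rows of A
match_clusters <- function(A, B) {
  n <- nrow(A)
  d <- as.matrix(stats::dist(rbind(A, B)))[1:n, (n + 1):(2 * n), drop = FALSE]
  perms <- permutations(seq_len(n))
  costs <- vapply(perms, function(p) sum(d[cbind(seq_len(n), p)]), numeric(1))
  perms[[which.min(costs)]]
}

A <- rbind(black = c(0, 0, 0), white = c(1, 1, 1), blue = c(0, 0, 1))
B <- rbind(white = c(1, 1, 1), blue = c(0, 0, 1), grey = c(0.5, 0.5, 0.5))

best <- match_clusters(A, B)
B_reordered <- B[best, ]
print(rownames(B_reordered))
```

For these values B is reordered to grey, white, blue, so black pairs with grey and the white and blue clusters pair with their exact matches.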
A list with identical data to the input list, but with rows in each dataframe reordered to minimize color distances per cluster pair.
cluster.list <- colordistance::getKMeansList(c(system.file("extdata", "Heliconius/Heliconius_A", package="colordistance")), lower=rep(0.8, 3), upper=rep(1, 3))
cluster.list <- colordistance::extractClusters(cluster.list)
colordistance:::orderClusters(cluster.list)
Tiny little function wrapper, mostly used for looping or when several plots are output by a single function. Waits for user keystroke to move on to next image or exit.
pause()
for (i in c(1:5)) {
  print(i)
  if (i < 5) {
    colordistance:::pause()
  }
}
Interactive, 3D plot_ly plots of cluster sizes and colors for each image in a list of cluster dataframes in order to visualize cluster output.
plotClusters( cluster.list, color.space = "rgb", p = "all", pausing = TRUE, ref.white, to = "sRGB" )
cluster.list |
A list of identically sized dataframes with 4 columns each
(R, G, B, Pct or H, S, V, Pct) as output by |
color.space |
The color space ( |
p |
Numeric vector of indices for which elements to plot; otherwise each set of clusters is plotted in succession. |
pausing |
Logical. Should the function pause and wait for user keystroke before plotting the next plot? |
ref.white |
The reference white passed to
|
to |
Display color space of image if clustering in CIE Lab space, probably either "sRGB" or "Apple RGB", depending on your computer. |
A 3D plot_ly plot of cluster sizes in the specified colorspace for each cluster dataframe provided.
## Not run: 
# Takes >10 seconds
cluster.list <- colordistance::getHistList(dir(system.file("extdata", "Heliconius/", package="colordistance"), full.names=TRUE), plotting=FALSE, lower=rep(0.8, 3), upper=rep(1, 3))
colordistance::plotClusters(cluster.list, p=c(1:3, 7:8), pausing=FALSE)
clusterListHSV <- colordistance::getHistList(dir(system.file("extdata", "Heliconius/", package="colordistance"), full.names=TRUE), hsv=TRUE, plotting=FALSE, lower=rep(0.8, 3), upper=rep(1, 3))
colordistance::plotClusters(clusterListHSV, p=c(1:3, 7:8), color.space="hsv", pausing=FALSE)
## End(Not run)
Plots cluster sets from several different dataframes on a single plot for easy comparison.
plotClustersMulti( cluster.list, color.space = "rgb", p = "all", title = "", ref.white, to = "sRGB" )
cluster.list |
A list of identically sized dataframes with 4 columns
each as output by |
color.space |
The color space ( |
p |
Numeric vector of indices for which elements to plot; otherwise all of the cluster sets provided will be plotted together. |
title |
Optional title for the plot. |
ref.white |
The reference white passed to
|
to |
Display color space of image if clustering in CIE Lab space, probably either "sRGB" or "Apple RGB", depending on your computer. |
A single plot_ly plot of every cluster in a list of cluster sets. Each cluster is colored by cluster color, sized in proportion to cluster size, and labeled according to the image from which it originated.
Each cluster plotted is colored according to its actual color, and labeled according to the image from which it originated.
## Not run: 
# Takes >10 seconds
cluster.list <- colordistance::getHistList(dir(system.file("extdata", "Heliconius/", package="colordistance"), full.names=TRUE), plotting=FALSE, lower=rep(0.8, 3), upper=rep(1, 3))
colordistance::plotClustersMulti(cluster.list, p=c(1:4), title="Orange and black Heliconius")
colordistance::plotClustersMulti(cluster.list, p=c(5:8), title="Black, yellow, and red Heliconius")
clusterListHSV <- colordistance::getHistList(dir(system.file("extdata", "Heliconius/", package="colordistance"), full.names=TRUE), hsv=TRUE, plotting=FALSE, lower=rep(0.8, 3), upper=rep(1, 3))
colordistance::plotClustersMulti(clusterListHSV, p=c(1:3, 7:8), color.space="hsv")
## End(Not run)
Plots a color histogram from a dataframe as returned by getImageHist, getHistList, or extractClusters. Bars are colored according to the color of the bin.
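The idea of coloring each bar by its bin color can be sketched with base graphics. The dataframe below is invented, its column names are assumptions, and graphics::barplot is used only as a stand-in for the package's own plotting code.

```r
# Hypothetical histogram: three bins with R, G, B coordinates and proportions
color_df <- data.frame(R = c(0.9, 0.1, 0.2),
                       G = c(0.5, 0.1, 0.4),
                       B = c(0.1, 0.1, 0.8),
                       Pct = c(0.2, 0.5, 0.3))

# One hex color per bin, built from that bin's RGB coordinates
bar_colors <- grDevices::rgb(color_df$R, color_df$G, color_df$B)
print(bar_colors)

# Draw the bars only in an interactive session
if (interactive()) {
  graphics::barplot(color_df$Pct, col = bar_colors,
                    ylab = "Proportion of pixels")
}
```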
plotHist( histogram, pausing = TRUE, color.space = "rgb", ref.white, from = "sRGB", main = "default", ... )
histogram |
A single dataframe or a list of dataframes as returned by
|
pausing |
Logical. Pause and wait for keystroke before plotting the next histogram? |
color.space |
The color space ( |
ref.white |
The reference white passed to
|
from |
Display color space of image if clustering in CIE Lab space, probably either "sRGB" or "Apple RGB", depending on your computer. |
main |
Title for plot. If |
... |
Optional arguments passed to the |
color_df <- as.data.frame(matrix(rep(seq(0, 1, length.out=3), 3), nrow=3, ncol=3))
color_df$Pct <- c(0.2, 0.5, 0.3)
colordistance::plotHist(color_df, main="Example plot")
Plots an image as an image.
plotImage(img)
img |
Either a path to an image or a |
Redundant, but a nice sanity check. Used in a few other functions in the colordistance package. Takes either a path to an image (JPG or PNG) or an image object as read in by loadImage.
A plot of the provided image in the current plot window.
colordistance::plotImage(system.file("extdata", "Heliconius/Heliconius_A/Heliconius_01.jpeg", package="colordistance"))
colordistance::plotImage(colordistance::loadImage(system.file("extdata", "Heliconius/Heliconius_A/Heliconius_01.jpeg", package="colordistance"), lower=rep(0.8, 3), upper=rep(1, 3)))
Plots non-background pixels according to their color coordinates, and colors them according to their RGB or HSV values. Dimensions are either RGB or HSV depending on flags.
plotPixels( img, n = 10000, lower = c(0, 0.55, 0), upper = c(0.25, 1, 0.25), color.space = "rgb", ref.white = NULL, pch = 20, main = "default", from = "sRGB", xlim = "default", ylim = "default", zlim = "default", ... )
img |
Either a path to an image or a |
n |
Number of randomly selected pixels to plot; recommend <20000 for speed. If n exceeds the number of non-background pixels in the image, all pixels are plotted. If n is not numeric, all pixels are plotted. |
lower |
RGB or HSV triplet specifying the lower bounds for background pixels. Default upper and lower bounds are set to values that work well for a bright green background (RGB [0, 1, 0]). |
upper |
RGB or HSV triplet specifying the upper bounds for background pixels. Default upper and lower bounds are set to values that work well for a bright green background (RGB [0, 1, 0]). Determining these bounds may take some trial and error, but the following bounds may work for certain common background colors:
If no background filtering is
needed, set bounds to some non-numeric value ( |
color.space |
The color space ( |
ref.white |
The reference white passed to
|
pch |
Passed to |
main |
Plot title. If left as "default", image name is used. |
from |
Original color space of image if plotting in CIE Lab space, probably either "sRGB" or "Apple RGB", depending on your computer. |
xlim, ylim, zlim |
Ranges for the X, Y, and Z axes. If "default", the widest ranges for each axis according to the specified color space (0-1 for RGB and HSV, 0-100 for L of Lab, -128-127 for a and b of Lab) are used. |
... |
Optional parameters passed to
|
3D plot of pixels in either RGB or HSV color space, colored according to their color in the image. Uses the scatterplot3d function.
If n is not numeric, then all pixels are plotted, but this is not recommended. Unless the image has a low pixel count, it takes much longer, and plotting this many points in the plot window can obscure important details.
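The rules for n described above can be sketched in base R as follows. This is a minimal illustration, not the package's own code: `sample_pixels` is a hypothetical helper, and the pixel matrix is synthetic.

```r
# Hypothetical helper (not part of colordistance): choose which rows of a
# pixel matrix to plot, following the rules described for `n`.
sample_pixels <- function(pix, n = 10000) {
  # Non-numeric n, or n at least as large as the pixel count: keep everything
  if (!is.numeric(n) || n >= nrow(pix)) {
    return(pix)
  }
  # Otherwise draw n rows at random, without replacement
  pix[sample(nrow(pix), n), , drop = FALSE]
}

set.seed(1)
# Synthetic "image": 500 pixels with R, G, B values in [0, 1]
pix <- matrix(runif(1500), ncol = 3, dimnames = list(NULL, c("R", "G", "B")))

nrow(sample_pixels(pix, n = 100))    # 100 randomly chosen pixels
nrow(sample_pixels(pix, n = 20000))  # n exceeds the pixel count: all 500
nrow(sample_pixels(pix, n = FALSE))  # non-numeric n: all 500
```

Subsampling keeps the 3D scatterplot responsive and readable; the full pixel set is only practical for small images.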
There are seven CIE standardized illuminants available in colordistance (A, B, C, E, D50, D55, and D65), but the most common are:
"A": Standard incandescent lightbulb
"D65": Average daylight
"D50": Direct sunlight
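As a rough illustration of how a named illuminant enters an RGB to CIE Lab conversion, base R's grDevices::convertColor accepts these same names through its to.ref.white argument. That plotPixels hands ref.white to this particular function is an assumption made for the sketch, not something stated on this page; the pixel values are made up.

```r
# A mid-orange and pure white, as sRGB values scaled 0-1 (one color per row)
rgb_px <- matrix(c(0.9, 0.5, 0.1,
                   1.0, 1.0, 1.0), ncol = 3, byrow = TRUE)

# Convert to CIE Lab under two of the reference whites named above
lab_d65 <- grDevices::convertColor(rgb_px, from = "sRGB", to = "Lab",
                                   to.ref.white = "D65")
lab_d50 <- grDevices::convertColor(rgb_px, from = "sRGB", to = "Lab",
                                   to.ref.white = "D50")

lab_d65  # the white row lands at roughly L = 100, a = 0, b = 0
lab_d50  # same pixels, expressed relative to the D50 white
```

Comparing the two matrices gives a feel for how much the choice of reference white moves Lab coordinates for a given image.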
colordistance::plotPixels(system.file("extdata", "Heliconius/Heliconius_B/Heliconius_07.jpeg", package="colordistance"), n=20000, upper=rep(1, 3), lower=rep(0.8, 3), color.space = "rgb", angle = -45)
Takes an image array (from readPNG or readJPEG) and removes the background pixels based on transparency (if a PNG with transparency) or color boundaries.
removeBackground( img, lower = NULL, upper = NULL, quietly = FALSE, alpha.channel = TRUE )
img |
|
lower, upper |
RGB or HSV triplets specifying the bounds for background
pixels. See |
quietly |
Logical. Display a message if using transparency? |
alpha.channel |
Logical. If available, should alpha channel transparency be used to mask background? See details. |
If alpha.channel = TRUE, transparency takes precedence over color masking. If you provide a PNG with any pixels with alpha < 1, removeBackground ignores any lower and upper color boundaries and assumes transparent pixels are background. If all pixels are opaque (alpha = 1), color masking will apply.
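The precedence rule above can be sketched in base R. This is a minimal illustration under stated assumptions, not the package's implementation: `background_mask` is a hypothetical helper, and treating the bounds as inclusive is an assumption.

```r
# Hypothetical helper (not the package's code): return a logical matrix that is
# TRUE for background pixels, applying the precedence rule described above.
background_mask <- function(img, lower = NULL, upper = NULL, alpha.channel = TRUE) {
  has_alpha <- dim(img)[3] == 4
  # Transparency wins when requested and at least one pixel has alpha < 1
  if (alpha.channel && has_alpha && any(img[, , 4] < 1)) {
    return(img[, , 4] < 1)
  }
  # No usable transparency and no color bounds: nothing is masked
  if (is.null(lower) || is.null(upper)) {
    return(matrix(FALSE, nrow = dim(img)[1], ncol = dim(img)[2]))
  }
  # Color masking: background if every channel lies within [lower, upper]
  # (inclusive bounds are an assumption of this sketch)
  in_range <- function(k) img[, , k] >= lower[k] & img[, , k] <= upper[k]
  in_range(1) & in_range(2) & in_range(3)
}

# 2 x 2 RGBA image: white everywhere, one dark pixel, one transparent pixel
img <- array(1, dim = c(2, 2, 4))
img[1, 1, 1:3] <- 0.1   # dark, opaque object pixel
img[2, 2, 4] <- 0       # fully transparent pixel

# Alpha takes precedence: only the transparent pixel is background,
# even though bounds for a white background are supplied
mask_alpha <- background_mask(img, lower = rep(0.8, 3), upper = rep(1, 3))

# With alpha.channel = FALSE, the color bounds apply instead:
# every white pixel is background, the dark pixel is kept
mask_color <- background_mask(img, lower = rep(0.8, 3), upper = rep(1, 3),
                              alpha.channel = FALSE)
```

Setting alpha.channel = FALSE is the way to force color masking on a PNG that happens to carry partial transparency.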
A list with a 3-dimensional RGB array and a 2-dimensional array of non-background pixels with R, G, B columns.
# remove background by transparency
img_path <- system.file("extdata/chrysochroa_NPL.png", package = "colordistance")
img_array <- png::readPNG(img_path)
img_filtered <- removeBackground(img_array)

# remove background by color
img_path <- dir(system.file("extdata/Heliconius", package = "colordistance"), recursive = TRUE, full.names = TRUE)[1]
img_array <- jpeg::readJPEG(img_path)
img_filtered <- removeBackground(img_array, lower = rep(0.8, 3), upper = rep(1, 3))
Uses scatterplot3d to plot clusters in color space.
scatter3dclusters( clusters, color.space, ref.white = "D65", xlim = "default", ylim = "default", zlim = "default", main = "Color clusters", scaling = 10, opacity = 0.9, plus = 0.01, ... )
clusters |
A single dataframe or a list of dataframes as returned by
|
color.space |
The color space ( |
ref.white |
Standard reference white for converting lab coordinates to RGB coordinates for coloring clusters. One of either "A", "B", "C", "E", "D50", "D55", or "D65". |
xlim, ylim, zlim |
X, Y, and Z-axis limits. If not specified, the defaults are 0-1 for all channels in RGB and HSV space, or 0-100 for L and -100-100 for a and b channels of CIE Lab space. |
main |
Title for the plot. |
scaling |
Scaling factor for size of clusters. |
opacity |
Transparency value for plotting; must be between 0 and 1. |
plus |
Amount to add to percent column for plotting; can help to make very small (or 0) clusters visible. |
... |
Additional parameters passed to
|
plotClusters, plotClustersMulti
clusters <- data.frame(R = runif(20, min = 0, max = 1), G = runif(20, min = 0, max = 1), B = runif(20, min = 0, max = 1), Pct = runif(20, min = 0, max = 1))

# plot in RGB space
scatter3dclusters(clusters, scaling = 15, plus = 0.05)

# overrule determined color space and plot in HSV space
scatter3dclusters(clusters, scaling = 15, plus = 0.05, color.space = "hsv")
Distance metric with optional user input for specifying how much the bin size similarity and color similarity should be weighted when pairing clusters from different color cluster sets.
weightedPairsDistance( T1, T2, ordering = FALSE, size.weight = 0.5, color.weight = 0.5 )
T1 |
Dataframe (especially a dataframe as returned by
|
T2 |
Another dataframe like T1. |
ordering |
Logical. Should clusters be paired in order to minimize overall distance scores or evaluated in the order given? |
size.weight |
Weight of size similarity in determining overall score and ordering (if ordering=T). |
color.weight |
Weight of color similarity in determining overall score and ordering (if ordering=T). Color and size weights do not necessarily have to sum to 1. |
Similarity score based on size and color similarity of each pair of points in provided dataframes.
Use with caution, since weights can easily swing distance scores more dramatically than might be expected. For example, if size.weight = 1 and color.weight = 0, two clusters of identical color but different sizes would not be compared.
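To get a feel for how strongly the weights can move a score, here is a minimal base R sketch. It is not the package's implementation: `weighted_pairs_sketch` is a hypothetical function, and the scoring rule (a weighted sum of scaled color distance and absolute size difference over row-wise pairs) is an assumption chosen for illustration.

```r
# Hypothetical sketch, not the package's implementation: score row-wise pairs
# of clusters as a weighted sum of color distance and size difference.
weighted_pairs_sketch <- function(T1, T2, size.weight = 0.5, color.weight = 0.5) {
  # Euclidean distance between paired cluster colors (columns 1-3),
  # scaled by sqrt(3), the largest distance possible in a unit cube
  color_d <- sqrt(rowSums((as.matrix(T1[, 1:3]) - as.matrix(T2[, 1:3]))^2)) / sqrt(3)
  # Absolute difference in cluster size (column 4, proportion of pixels)
  size_d <- abs(T1[, 4] - T2[, 4])
  sum(color.weight * color_d + size.weight * size_d)
}

# Two cluster sets with identical colors but very different cluster sizes
T1 <- data.frame(R = c(1, 0), G = c(0, 0), B = c(0, 1), Pct = c(0.9, 0.1))
T2 <- data.frame(R = c(1, 0), G = c(0, 0), B = c(0, 1), Pct = c(0.5, 0.5))

weighted_pairs_sketch(T1, T2, size.weight = 0, color.weight = 1)  # colors match: 0
weighted_pairs_sketch(T1, T2, size.weight = 1, color.weight = 0)  # size only: 0.8
```

Under this assumed rule, the same pair of images scores as identical or as quite different depending only on the weights, which is why extreme weight settings deserve a sanity check against known-similar images.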
cluster.list <- colordistance::getKMeansList(system.file("extdata", "Heliconius/Heliconius_B", package="colordistance"), lower=rep(0.8, 3), upper=rep(1, 3))
cluster.list <- colordistance::extractClusters(cluster.list, ordering=TRUE)
colordistance:::weightedPairsDistance(cluster.list[[1]], cluster.list[[2]], size.weight=0.8, color.weight=0.2)