Recognition API

Coolr's computer vision API performs the following steps:

  1. Stitch: Coolr's WiFi Vista uses 2 cameras to capture more area. These images are stitched together into a single image.
  2. Obfuscate human body parts
  3. Identify the area of interest: any area outside the area of interest (the asset) is discarded
  4. Segment the view into shelves/baskets
  5. Identify products and empty spaces
  6. Identify stacking
  7. Identify any labels
  8. Identify SKUs for products
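These steps all run server-side; a client only submits a capture and receives the JSON response shown below. A minimal sketch of building such a request in Python, assuming a hypothetical endpoint URL, bearer-token auth, and raw-bytes payload (none of which are documented here):

```python
import urllib.request

# Hypothetical endpoint and credentials -- the real URL, auth scheme,
# and payload format are not specified in this document.
API_URL = "https://api.example.com/recognition/v1/analyze"
API_KEY = "YOUR_API_KEY"

def build_recognition_request(image_bytes: bytes) -> urllib.request.Request:
    """Build (but do not send) a POST request carrying a stitched image."""
    return urllib.request.Request(
        API_URL,
        data=image_bytes,
        headers={
            "Content-Type": "application/octet-stream",
            "Authorization": f"Bearer {API_KEY}",
        },
        method="POST",
    )

req = build_recognition_request(b"\x89PNG...")  # placeholder image bytes
print(req.method, req.full_url)
```

Sending the request (e.g. with `urllib.request.urlopen`) would then return the JSON document described in the next section.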

The API returns the results in JSON format:

{
  "shelf": [
    {
      "products": [
        {
          "product": "Ben & Jerry's Pint Chocolate Fudge Brownie 16oz 1x8 PK",
          "sku": ["24665"],
          "confidence": 100,
          "data": [[498, 702], [498, 806], [595, 806], [595, 702]],
          "position": 1,
          "capacity": 0,
          "stacked": [
            {
              "product": "-1000",
              "sku": ["-1000"],
              "confidence": 0.95148211717605591,
              "data": [[491, 592], [491, 707], [590, 707], [590, 592]],
              "position": 1,
              "capacity": 0,
              "stacked": null,
              "stackSize": 0
            }
          ],
          "stackSize": 0
        },
        {
          "product": "Ben & Jerry's Pint Choc. Chip Cookie Dough 16oz 1x8 PK",
          "sku": ["24664"],
          "confidence": 100,
          "data": [[599, 685], [599, 776], [691, 776], [691, 685]],
          "position": 2,
          "capacity": 0,
          "stacked": [
            {
              "product": "-1000",
              "sku": ["-1000"],
              "confidence": 0.89078009128570557,
              "data": [[589, 581], [589, 696], [688, 696], [688, 581]],
              "position": 3,
              "capacity": 0,
              "stacked": null,
              "stackSize": 0
            }
          ],
          "stackSize": 0
        }
      ]
    }
  ]
}

Note: For brevity, data is truncated.
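Once the response arrives, walking it is straightforward. A short Python sketch that extracts each product's position, name, and first SKU from a trimmed copy of the response above (the `stacked` entries are omitted here for brevity):

```python
import json

# Trimmed copy of the response shown above.
response_text = """
{
  "shelf": [
    {
      "products": [
        {
          "product": "Ben & Jerry's Pint Chocolate Fudge Brownie 16oz 1x8 PK",
          "sku": ["24665"],
          "confidence": 100,
          "data": [[498, 702], [498, 806], [595, 806], [595, 702]],
          "position": 1,
          "capacity": 0,
          "stacked": null,
          "stackSize": 0
        },
        {
          "product": "Ben & Jerry's Pint Choc. Chip Cookie Dough 16oz 1x8 PK",
          "sku": ["24664"],
          "confidence": 100,
          "data": [[599, 685], [599, 776], [691, 776], [691, 685]],
          "position": 2,
          "capacity": 0,
          "stacked": null,
          "stackSize": 0
        }
      ]
    }
  ]
}
"""

result = json.loads(response_text)

# Flatten every shelf into (position, product name, first SKU) rows.
rows = []
for shelf in result["shelf"]:
    for product in shelf["products"]:
        rows.append((product["position"], product["product"], product["sku"][0]))

for position, name, sku in sorted(rows):
    print(f"{position}: {name} (SKU {sku})")
```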

As seen in the result above, the API returns an array of shelves, each of which contains an array of products. Each product has:

  1. product - the recognized product name
  2. sku - an array of SKUs (more than one if multiple products are stacked)
  3. confidence - confidence level as a percentage
  4. data - the product's bounding-box coordinates
  5. position - estimated position on the shelf (left to right)
  6. capacity - open capacity at this position, if any
  7. stacked - an array of products if additional products are stacked
  8. stackSize - the number of stacked products
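Because stacked is itself a list of product objects, each with its own stacked field, a recursive walk collects every item in a stack. The sketch below does that and also derives a bounding-box width and height from the four corner pairs in data; the sample input reuses the first product from the response above, and the "-1000" value is the sentinel seen in that response:

```python
def iter_products(products):
    """Yield each product and everything stacked on it, depth-first."""
    for product in products:
        yield product
        if product.get("stacked"):
            yield from iter_products(product["stacked"])

def bounding_box(data):
    """Return (width, height) of the axis-aligned box around the corners."""
    xs = [x for x, _ in data]
    ys = [y for _, y in data]
    return max(xs) - min(xs), max(ys) - min(ys)

# First product from the sample response, including its stacked item.
shelf_products = [
    {
        "product": "Ben & Jerry's Pint Chocolate Fudge Brownie 16oz 1x8 PK",
        "sku": ["24665"],
        "data": [[498, 702], [498, 806], [595, 806], [595, 702]],
        "stacked": [
            {
                "product": "-1000",  # sentinel value seen in the response above
                "sku": ["-1000"],
                "data": [[491, 592], [491, 707], [590, 707], [590, 592]],
                "stacked": None,
            }
        ],
    }
]

all_items = list(iter_products(shelf_products))
print(len(all_items))                      # → 2 (base product plus one stacked item)
print(bounding_box(all_items[0]["data"]))  # → (97, 104)
```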