How To Use Google Vision AI With GPT-3: A Quick Demo (2022)

Olivia Rhye
11 Jan 2022
5 min read

AI might feel like magic, but anyone can put it to work with the right tutorial. In this demo, we are going to build an app with Vision AI and GPT-3 that turns your receipts into structured data you can use for accounting, among other things. If you're in a hurry to try it, check out the end result on the Demo page.

Vision AI is a Google Cloud service that provides models to classify images, detect objects, read text, and much more, while OpenAI's GPT-3 is a language model, served through an API, that can understand and process natural language. When we combine the two in creative ways with Rowy's low-code spreadsheets, we can make magic. Let's see them in action!

1. Creating a new Rowy table

First, click this Install link and let Rowy walk you through it. It only takes 5 minutes to create the account and connect with Firebase.

Then, we are going to create a new Firestore collection called expenses by adding a new table.

If all is good, you should see an empty table ready to use:

0.jpg

2. Obtaining an image input

Let's add our first column to collect image inputs. Click Add new column, name it receiptImage (the key the code in the next step reads), and pick Image as the field type:

1.jpg

Every time we need to scan a receipt, we can just add it in a new row and let the app do all the processing. But first, we need to code it.
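Before writing that code, it can help to sketch the shape of a row as the rest of this demo sees it. The type below is only an illustration, not something Rowy requires: the field names are the column keys read by the code in the following steps, while the `ExpenseRow` and `UploadedImage` types and the `firstImageUrl` helper are hypothetical names introduced here.

```typescript
// Hypothetical type for one uploaded file; only the field this demo reads.
interface UploadedImage {
  downloadURL: string; // URL the image can be fetched from
}

// Hypothetical type for one row of the expenses table.
// Every field is optional because a row is filled in step by step.
interface ExpenseRow {
  receiptImage?: UploadedImage[]; // Image column (step 2)
  ocr?: string; // raw text detected in the image (step 3)
  parsedData?: { totalCost?: number; currencySymbol?: string; date?: string }; // step 4
  totalCost?: number; // step 5
  date?: Date; // step 5
}

// Returns the URL of the first uploaded image, or undefined for an empty row.
function firstImageUrl(row: ExpenseRow): string | undefined {
  return row.receiptImage?.[0]?.downloadURL;
}
```

Reading the URL through optional chaining means a freshly created row, which has no image yet, yields undefined instead of throwing.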

3. Detecting text in an image with Google Vision AI

Next, we create a derivative column to scan the input image, named ocr here because the code in step 4 reads row.ocr. A derivative column derives its value from other columns, as its name suggests. We use this column type to run code that populates spreadsheet cells.

In this case, we take the image input and use Google's Vision AI to detect text contained in the image:

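const derivative:Derivative = async ({row,ref,db,storage,auth})=>{
  const vision = require('@google-cloud/vision');
  const client = new vision.ImageAnnotatorClient();

  // Skip rows that have no uploaded image yet
  const url = row.receiptImage?.[0]?.downloadURL
  if (!url) return;

  // Download the image and wrap the bytes in a Buffer
  // (Buffer.from replaces the deprecated `new Buffer` constructor)
  const res = await fetch(url).then(res => res.arrayBuffer())
  const dataBuffer = Buffer.from(res)

  // Run OCR; the first annotation holds the full detected text
  const [ result ] = await client.textDetection(dataBuffer);
  return result.textAnnotations?.[0]?.description ?? ""
}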

When you create a derivative column, you can go to the column settings and simply copy/paste the aforementioned code:

2.jpg

Now, every time we add a receipt image, the column automatically fills with the detected text:

3.jpg
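Why take only the first annotation? In a text detection response, the first entry of textAnnotations carries the whole detected text, and the entries after it are the individual words. Here is a minimal sketch of that lookup as a standalone function, assuming only the description field matters; the `TextAnnotation` type and the `fullText` helper are hypothetical names, not part of the Vision client.

```typescript
// Minimal shape of one annotation; only the field used here.
interface TextAnnotation {
  description?: string | null;
}

// Hypothetical helper: returns the full detected text,
// or an empty string when the image contained no readable text.
function fullText(annotations: TextAnnotation[] | null | undefined): string {
  // Entry 0 is the complete text block; later entries are single words.
  return annotations?.[0]?.description?.trim() ?? "";
}

// Usage with a hand-written response fragment
const sample: TextAnnotation[] = [
  { description: "TOTAL 12.50\n" },
  { description: "TOTAL" },
  { description: "12.50" },
];
console.log(fullText(sample));
```

Returning an empty string for a blank image keeps the column value a string in every case, which makes the check in the next step straightforward.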

4. Parsing text with OpenAI GPT-3

Now we need to transform the text to make it easy for a program to parse. That's where OpenAI's GPT-3 comes in.

We create a new derivative column parsedData and use it to make an API call to OpenAI to transform the blob of text into a JSON structure containing a total, a currency, and an invoice date:

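const derivative:Derivative = async ({row,ref,db,storage,auth})=>{
  // Nothing to parse until the OCR column is filled
  if (!row.ocr) return;
  const openai_key = await rowy.secrets.get("openai")

  const res = await fetch("https://api.openai.com/v1/completions", {
    method: "POST",
    headers: {
      "Content-Type": "application/json",
      "Authorization": "Bearer " + openai_key
    },
    body: JSON.stringify({
      // The prompt ends with "{" so the model continues a JSON object
      "prompt": `${row.ocr}\n========\njson output of "totalCost" as number , "currencySymbol" and "date" as ISO Date String\n{`,
      "temperature": 0.09,
      "model": "text-davinci-003",
      "max_tokens": 256,
      "top_p": 1,
      "frequency_penalty": 0,
      "presence_penalty": 0,
      // Generation stops at the first "}", which is left out of the returned text
      "stop": ["}"]
    })
  }).then(res => res.json())

  // Bail out if the API returned an error instead of a completion
  const text = res.choices?.[0]?.text
  if (text === undefined) return;

  // Put the braces back around the completion to get valid JSON
  return JSON.parse(`{${text}}`)
}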

This little piece of code is powerful enough to transform the blob of text into a valid JSON structure:

4.jpg
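The trick in that code is worth spelling out: the prompt ends with an opening brace, the request stops generating at the first closing brace, and the returned text therefore holds only the inside of a JSON object. Below is a minimal sketch of those two halves as pure functions you can test without calling the API. `buildPrompt` and `parseCompletion` are hypothetical helper names, and returning null on malformed output is an assumption of this sketch rather than something the original code does.

```typescript
// Hypothetical helper: builds the same prompt as the derivative above.
function buildPrompt(ocr: string): string {
  return `${ocr}\n========\njson output of "totalCost" as number , "currencySymbol" and "date" as ISO Date String\n{`;
}

// Hypothetical helper: wraps the completion in braces and parses it.
// Returns null instead of throwing when the model's output is not valid JSON.
function parseCompletion(text: string): Record<string, unknown> | null {
  try {
    return JSON.parse(`{${text}}`) as Record<string, unknown>;
  } catch {
    return null;
  }
}

// Usage with a hand-written completion (no API call involved)
const completion = '"totalCost": 12.5, "currencySymbol": "$", "date": "2022-01-11"';
console.log(parseCompletion(completion));
```

Because the model's output is not guaranteed to be well-formed, catching the parse error lets a single odd receipt leave its cell empty rather than fail the derivative.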

5. Extracting the total cost and the invoice date

At this stage, extracting the parsed data into distinct columns is a piece of cake.

To obtain the total cost of the bill as a derivative column totalCost:

const derivative:Derivative = async ({row,ref,db,storage,auth})=>{
  // parsedData is empty until step 4 has run
  return row.parsedData?.totalCost
}

And for the date of the receipt as date:

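const derivative:Derivative = async ({row,ref,db,storage,auth})=>{
  // Skip rows without a parsed date so we never store an invalid Date
  if (!row.parsedData?.date) return;
  return new Date(row.parsedData.date)
}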

Which gives us the final result:

5.jpg
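Since the parsed values come from a language model, it can be worth checking them before they reach your accounting. The sketch below shows one possible way to do that, assuming you want to reject a row when either value is unusable; the `Expense` type and the `toExpense` helper are hypothetical and not part of Rowy or the template.

```typescript
// Hypothetical type for a validated expense.
interface Expense {
  totalCost: number;
  date: Date;
}

// Hypothetical helper: converts the parsed JSON into typed values,
// or returns null when the total or the date cannot be used.
function toExpense(
  parsed: { totalCost?: unknown; date?: unknown } | null | undefined
): Expense | null {
  if (!parsed) return null;

  // Accept a number, or a non-empty numeric string such as "12.50"
  const raw = parsed.totalCost;
  const totalCost =
    typeof raw === "number" ? raw
    : typeof raw === "string" && raw.trim() !== "" ? Number(raw)
    : NaN;

  // An unparseable date string produces a Date whose time is NaN
  const date = new Date(String(parsed.date));

  if (!isFinite(totalCost) || isNaN(date.getTime())) return null;
  return { totalCost, date };
}

// Usage with hand-written parsed data
console.log(toExpense({ totalCost: "12.50", date: "2022-01-11" }));
```

Whether to drop a bad row, flag it, or fall back to manual entry is a design choice; returning null simply makes the failure explicit to the caller.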

Do the same in one click with our ready-to-use Expenses template.

Check Out More Demos

And that's a wrap! Vision AI and GPT-3 are powerful, but what about other AI tools and services? We've got you covered with 24 other demos and examples on how to use Rowy to build powerful apps, like Face Restoration with Replicate API, image generation with Stable Diffusion, or even emojify with GPT-3.
