Building a Document Conversion Microservice Using LibreOffice, Golang, and gRPC
Turning user input into downloadable documents is easy until you need PDF, DOCX, and XLSX files, and oh, it has to work offline. You probably don't want to pay for a SaaS solution either. Our solution? Golang, gRPC, and a headless LibreOffice instance.
We started off by having users create these documents dynamically from input data, but we also needed to convert Word and Excel documents to PDF. We could have written a bunch of code to generate PDF versions of the documents as well, or used a SaaS product to do the conversion for us. However, our documents were really simple, and we didn't want to pay to convert them on the fly. Additionally, our client required the application to work without an internet connection.
That is where we thought: hey, we can do this in Microsoft Office, or LibreOffice... oh hey, LibreOffice is free to use, and we can run a headless instance of it from the command line to do the conversion for us. Our project was born from this idea, and below is the first iteration of it in Golang.
func runLibreoffice(
	ogFileType string,
	newFileType string,
	randBytes string,
) error {
	// PDF input needs an explicit import filter
	var libreofficeOptions string
	if ogFileType == "pdf" {
		libreofficeOptions = "--infilter='writer_pdf_import'"
	}
	// storagePath is a package-level variable defined elsewhere in the API.
	// A separate user profile per conversion lets several LibreOffice runs happen at once.
	command := fmt.Sprintf(
		"libreoffice %s -env:UserInstallation=file://%s%s_lo --headless --convert-to %s %s",
		libreofficeOptions,
		storagePath,
		randBytes,
		newFileType,
		randBytes,
	)
	libreofficeCmd := exec.Command("bash", "-c", command)
	libreofficeCmd.Dir = "/dev/shm"
	return libreofficeCmd.Run()
}
Day 0 Code
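One thing worth calling out in this first version is that it builds a shell string and runs it through bash -c, so a file type containing shell syntax would be executed by bash. Below is a minimal sketch that builds the same LibreOffice arguments as a plain slice instead. exec.Command passes these arguments to the process without a shell. The helper names libreofficeArgs and newLibreofficeCmd are hypothetical, not from our project. Note that without a shell, the --infilter value must not carry the quotes bash used to strip.

```go
package main

import "os/exec"

// libreofficeArgs builds the argument list for a headless LibreOffice
// conversion. Each value is its own argument, so nothing is parsed by a shell.
// (Hypothetical helper for illustration.)
func libreofficeArgs(ogFileType, newFileType, profileDir, inputFile string) []string {
	var args []string
	// PDF input needs an explicit import filter, as in the Day 0 code.
	if ogFileType == "pdf" {
		args = append(args, "--infilter=writer_pdf_import")
	}
	args = append(args,
		// A per-conversion user profile avoids clashes between concurrent runs.
		"-env:UserInstallation=file://"+profileDir,
		"--headless",
		"--convert-to", newFileType,
		inputFile,
	)
	return args
}

// newLibreofficeCmd wraps the arguments in an *exec.Cmd that runs in workDir.
func newLibreofficeCmd(workDir, ogFileType, newFileType, profileDir, inputFile string) *exec.Cmd {
	cmd := exec.Command("libreoffice", libreofficeArgs(ogFileType, newFileType, profileDir, inputFile)...)
	cmd.Dir = workDir
	return cmd
}
```

Calling newLibreofficeCmd("/dev/shm", "docx", "pdf", "/dev/shm/abc_lo", "abc").Run() would behave like the bash version for normal input. A value like "pdf; rm -rf ~" would simply reach LibreOffice as one odd argument.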
This worked fine initially, but as we expanded the number of documents we were creating, we had a few realizations:
- Wow, LibreOffice takes up a lot of space in our API Docker image.
- We can reuse this in other projects.
- This code doesn't really fit the established patterns of our API.
So what about a microservice? Before we continue: be very careful about breaking code out into microservices. They're all the rage these days, and it's easy to fall into the trap of creating one you don't need. Below are a few guidelines we follow before breaking a piece of code out into its own microservice.
- Are we sure we will use this code in other projects?
- Does it need resources that the rest of the code base doesn't?
- Can we effectively monitor and update the microservice with our team size and budget?
- Does the code we want to move into a microservice only do one thing?
Well, we checked all those boxes and started breaking the code out into a microservice.
First, we built a simple HTTP server wrapper in Golang.
This worked fine, especially if you want to host the microservice as an API that resources outside the network can access. In our case, though, we wanted the microservice to live inside our Kubernetes cluster, so the obvious answer was... gRPC? gRPC has a reputation for being difficult or complex, but with Golang it is quite easy to get started writing your own gRPC-enabled microservice.
protoc, the Protocol Buffers compiler, generates Go code from the protobuf definitions that structure the data in our gRPC microservice. We will need that generated code not only for our server, but also for our client.
Then we need to define how our API will receive and send data. First we declare the service and the RPC that handles the API call, along with its request and response messages. Then we define what those messages look like; for our converter they will be FileRequest and FileResponse. In the request we need to know the original and new file types, as well as the file itself. A generic request and response definition looks like this:
syntax = "proto3";
package proto;
option go_package="/test";
service Test {
  rpc HandleTestRequest(TestRequest) returns (TestResponse);
}
message TestRequest {
  string TestData = 1;
}
message TestResponse {
  bytes TestData = 1;
}
Example Request and Response Definition
Once that is created, we can generate all of the Go files we need with one command, which we will cover later.
Running that command produces the service.pb.go and service_grpc.pb.go files that make the server work. We won't need to edit them, but if you are interested in how they work, check out the official protobuf and gRPC Go documentation.
Cool, we have what we need for the server. An important concept to understand with gRPC, though, is that the client has no idea how to interact with the API by default. To fix this, we copy the generated files into our client program. We will cover a full example of this later in the post. You can also use server reflection to tell clients how to interact with the API. It isn't really needed for this use case, but it's simple to implement, so take a look at this short tutorial: https://github.com/grpc/grpc-go/blob/master/Documentation/server-reflection-tutorial.md
Let's put this all together now!
Install gRPC
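Let's assume you already have Golang installed on your local machine. If you haven't set that up yet, check out Golang's download page to get started.
Getting gRPC up and running is often thought to be daunting, but it can be done in just a few steps:
- First, let's install the Go protobuf and gRPC code generator plugins needed to generate the scaffolding files. These are plugins for protoc, so install the protoc compiler itself too if you don't have it (on Debian-based systems: sudo apt-get install protobuf-compiler). Also make sure $(go env GOPATH)/bin is on your PATH so protoc can find the plugins.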
go install google.golang.org/protobuf/cmd/protoc-gen-go@latest
go install google.golang.org/grpc/cmd/protoc-gen-go-grpc@latest
Two Golang Packages to be installed
- Let's confirm everything is working. You can use the sample files to make sure everything is installed correctly.
Generate protobuf files
Earlier we talked about defining our gRPC service via protobuf files. Let's create that file at proto/service.proto and generate the Go files needed to run our service.
syntax = "proto3";
package proto;
option go_package="/gobre";
service Gobre {
  rpc HandleFileRequest(FileRequest) returns (FileResponse);
}
message FileRequest {
  string OriginalFileType = 1;
  string NewFileType = 2;
  bytes FileData = 3;
}
message FileResponse {
  bytes FileData = 1;
}
service.proto
We can see that we have one RPC that processes the file request, and it needs to know a few things: What is the original file type? What is the new file type? And, of course, what is the file data itself?
There is also the response message, which simply returns the data of the converted file.
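Both file type fields come straight from the client and end up on the LibreOffice command line. Because of that, it can be worth checking them against an allowlist before doing any work. Below is a minimal sketch. The names allowedConversions and validateFileTypes are hypothetical, and the listed conversions are only an example to adapt to your own documents.

```go
package main

import (
	"fmt"
	"strings"
)

// allowedConversions is a hypothetical allowlist mapping an original file
// type to the types it may be converted into.
var allowedConversions = map[string][]string{
	"doc":  {"pdf", "docx"},
	"docx": {"pdf"},
	"xls":  {"pdf", "xlsx"},
	"xlsx": {"pdf"},
}

// validateFileTypes normalizes the requested types and rejects any pair that
// is not on the allowlist, before anything is written to disk or passed to LibreOffice.
func validateFileTypes(ogFileType, newFileType string) (string, string, error) {
	from := strings.ToLower(strings.TrimSpace(ogFileType))
	to := strings.ToLower(strings.TrimSpace(newFileType))
	for _, allowed := range allowedConversions[from] {
		if allowed == to {
			return from, to, nil
		}
	}
	return "", "", fmt.Errorf("conversion from %q to %q is not supported", ogFileType, newFileType)
}
```

In a gRPC handler, a failed check maps naturally to a codes.InvalidArgument status.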
Now let's generate the Go code for these messages and the service, including the client code other programs need in order to call it:
protoc proto/service.proto --go_out=proto/ --go_opt=paths=source_relative --proto_path=proto/ --go-grpc_out=proto/ --go-grpc_opt=paths=source_relative
If you take a look inside the proto folder, you should now see the generated code in the service.pb.go and service_grpc.pb.go files.
Scaffold App
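Let's talk about app structure. Since this is just an example MVP application, we don't need to get too fancy with it. The imports below assume the Go module is named server (go mod init server, then go mod tidy once the code is in place), so let's structure everything like this: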
- main.go <- Server setup and initialization
- main_test.go <- Tests
- libreoffice.go <- Code to handle conversion of the file
- proto
- service.proto <- Request and response definitions
Now we need to write the logic that starts up the server and routes requests to our conversion function.
package main
import (
	"context"
	"net"
	proto "server/proto"
	"google.golang.org/grpc"
)
// GRPCServer implements the Gobre service generated from service.proto
type GRPCServer struct {
	proto.UnimplementedGobreServer
}
// HandleFileRequest routes any requests to the libreoffice handler
func (s GRPCServer) HandleFileRequest(
	ctx context.Context,
	param *proto.FileRequest,
) (*proto.FileResponse, error) {
	fileData, err := HandleConvertFile(
		param.OriginalFileType,
		param.NewFileType,
		param.FileData,
	)
	if err != nil {
		return nil, err
	}
	return &proto.FileResponse{FileData: fileData}, nil
}
// StartServer starts a basic gRPC server on port 8081 and stops it when ctx is cancelled
func StartServer(ctx context.Context) {
	listener, listenError := net.Listen("tcp", ":8081")
	if listenError != nil {
		panic(listenError)
	}
	server := grpc.NewServer()
	proto.RegisterGobreServer(server, GRPCServer{})
	// Stop the server from a separate goroutine once ctx is cancelled
	go func() {
		<-ctx.Done()
		server.Stop()
	}()
	// Serve blocks until the server is stopped
	serverError := server.Serve(listener)
	if serverError != nil {
		panic(serverError)
	}
}
func main() {
	ctx := context.Background()
	StartServer(ctx)
}
main.go
For brevity's sake, let's put all of this in our main.go file. The majority of it is boilerplate, but let's talk about some of the important parts.
- The HandleFileRequest function routes our service call to the actual conversion function. It takes the parameters from the request and passes them along, then wraps the converted data in the response message we defined earlier. If the conversion fails, the error is returned instead.
- The StartServer function spins up the listener on the specified port, which in our case is 8081, and stops the server when its context is cancelled.
Write libreoffice handler
Now we'll write the meat of this application. First, we'll need to install LibreOffice locally so we can use its headless mode. Go ahead and do that now; on Debian-based systems you can just run:
sudo apt-get install libreoffice
If you're on other systems, check out the LibreOffice download page.
Now let's write some code:
package main
import (
	"os"
	"os/exec"
	"path/filepath"
	"google.golang.org/grpc/codes"
	"google.golang.org/grpc/status"
)
var (
	storagePath = "/tmp/"
)
// HandleConvertFile writes the uploaded file to disk, converts it with
// LibreOffice, and returns the converted file's bytes
func HandleConvertFile(
	ogFileType string,
	newFileType string,
	fileData []byte,
) ([]byte, error) {
	if len(ogFileType) == 0 || len(newFileType) == 0 {
		return nil, status.Error(codes.InvalidArgument, "invalid file types")
	}
	// Write the original file to disk for easier converting via libreoffice.
	// os.CreateTemp picks a unique name, so concurrent requests don't collide.
	filePtr, createError := os.CreateTemp(storagePath, "gobre-")
	if createError != nil {
		return nil, status.Error(codes.Internal, createError.Error())
	}
	_, fileWriteError := filePtr.Write(fileData)
	closeError := filePtr.Close()
	if fileWriteError != nil {
		return nil, status.Error(codes.Internal, fileWriteError.Error())
	}
	if closeError != nil {
		return nil, status.Error(codes.Internal, closeError.Error())
	}
	fileName := filepath.Base(filePtr.Name())
	// Run file conversion using the libreoffice function
	libreofficeCmdError := runLibreoffice(ogFileType, newFileType, fileName)
	if libreofficeCmdError != nil {
		return nil, status.Error(codes.Unknown, libreofficeCmdError.Error())
	}
	// LibreOffice writes <input name>.<new type> into its working directory.
	// Load the converted data to memory for the response.
	data, readFileError := os.ReadFile(filepath.Join(storagePath, fileName+"."+newFileType))
	if readFileError != nil {
		return nil, status.Error(codes.Internal, readFileError.Error())
	}
	return data, nil
}
func runLibreoffice(
	ogFileType string, // unused here; the Day 0 version used it to pick a PDF import filter
	newFileType string,
	fileName string,
) error {
	// Pass each argument directly instead of via "bash -c", so client-supplied
	// file types can't inject shell commands. The per-file UserInstallation
	// profile lets several LibreOffice processes run at the same time.
	libreofficeCmd := exec.Command(
		"libreoffice",
		"-env:UserInstallation=file://"+storagePath+fileName+"_lo",
		"--headless",
		"--convert-to", newFileType,
		fileName,
	)
	libreofficeCmd.Dir = storagePath
	return libreofficeCmd.Run()
}
libreoffice.go
We included comments in this code, so follow those along if you are having problems putting it all together. Let's at least walk through the basic steps.
- We need to store the file on disk so the headless instance of LibreOffice can read it.
- We need to use Golang's exec.Command function to run the libreoffice binary.
- We need to return the converted data once the libreoffice binary is done converting the file.
Now that we have everything, how do we make sure this works? Tests of course!
Write Tests
Every program needs tests, so let's write a basic one to check that the service and the file conversion work as intended.
package main
import (
	"context"
	"os"
	"testing"
	"time"
	proto "server/proto"
	"google.golang.org/grpc"
	"google.golang.org/grpc/credentials/insecure"
)
func TestStartServer(t *testing.T) {
	ctx, cancel := context.WithCancel(context.Background())
	// Stop the server even if the test fails early
	defer cancel()
	go StartServer(ctx)
	conn, connErr := grpc.NewClient(
		"localhost:8081",
		grpc.WithTransportCredentials(insecure.NewCredentials()),
	)
	if connErr != nil {
		t.Fatal(connErr)
	}
	defer conn.Close()
	client := proto.NewGobreClient(conn)
	body, err := os.ReadFile("test_data/TestWordDoc.docx")
	if err != nil {
		t.Fatal(err)
	}
	request := &proto.FileRequest{
		FileData:         body,
		NewFileType:      "pdf",
		OriginalFileType: "docx",
	}
	// WaitForReady lets the call wait until the server goroutine is listening
	callCtx, callCancel := context.WithTimeout(context.Background(), 30*time.Second)
	defer callCancel()
	response, responseErr := client.HandleFileRequest(callCtx, request, grpc.WaitForReady(true))
	if responseErr != nil {
		t.Fatal(responseErr)
	}
	if len(response.GetFileData()) == 0 {
		t.Errorf("No file data in the response.")
	}
}
Let's set up a test file and a directory to store coverage files.
- Create a directory called test_data.
- Find a Word document you want to convert; a simple example from the internet works fine.
- Put that file in the test_data directory and name it TestWordDoc.docx.
- Now let's run some test commands:
go test -v -cover ./... -coverprofile test_data/cover.out
go tool cover -html test_data/cover.out -o test_data/cover.html
You should see output from the first command like:
=== RUN TestStartServer
--- PASS: TestStartServer (1.57s)
PASS
coverage: 75.8% of statements
ok server 1.571s coverage: 75.8% of statements
server/proto coverage: 0.0% of statements
If everything is working correctly, we should see a PDF in the /tmp directory on our machine (or whatever directory you choose). Congratulations! You now have your own mini file converter!
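The test above only checks that the response isn't empty. A slightly stronger, still cheap check is that the bytes start with the %PDF- header that PDF files begin with. Below is a minimal sketch; looksLikePDF is a hypothetical helper, and this is a sanity check rather than full PDF validation.

```go
package main

import "bytes"

// looksLikePDF reports whether data starts with the "%PDF-" header.
// It does not validate the rest of the file.
func looksLikePDF(data []byte) bool {
	return bytes.HasPrefix(data, []byte("%PDF-"))
}
```

In the test, this would replace the empty check with something like: if !looksLikePDF(response.GetFileData()) { t.Errorf("response is not a PDF") }.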
Wrap Up
Pretty neat, eh? Before you go off on file conversion adventures, here are a few considerations.
- How will you deploy this? You can simply package it in a Docker container, or take it a step further and run that container in a cluster.
- Security is a concern here: you should not use this exact example in production. Build in protections against malicious files.
- With headless LibreOffice, we often ran into issues with memory and CPU usage, so make sure to benchmark your use case and allocate resources accordingly.
- This specifically became an issue for us when the microservice was under heavy load. The microservice would randomly crash, and we saw no errors in the logs.
- Eventually we realized that LibreOffice uses a ton of memory and CPU, and this microservice was in a cluster with a low resource allocation. So we set our Helm values to 500m for CPU and 1Gi for memory to allow multiple document conversions at once. If this needed to scale to thousands of users, you might want to run several instances of the document converter microservice and autoscale resources when under load.
If you need help with a project like this, feel free to reach out to us at krum.io. We'd be happy to work with you on whatever your organization is building.