Data Integration: Connecting Business Systems,
with Fethi Rabhi and Alan Hsiao
In this conversation, Jon Scheele, Fethi Rabhi, and Alan Hsiao discuss the complexities of data integration, particularly in the context of e-invoicing. They explore the challenges faced by businesses in automating invoicing processes, the importance of standards like PEPPOL, and the limitations of current systems. The discussion also touches on the future of invoicing, automation, and the potential impact of AI and blockchain technology on business processes.
Takeaways
- Data integration is a significant challenge for businesses.
- E-invoicing requires automatic production and consumption of invoices.
- Standards like PEPPOL facilitate communication but have limitations.
- The complexity of invoicing is compounded by various business processes.
- Microservices and APIs can help in evolving architectures.
- AI's impact on invoicing may take time to materialize.
- The future may see more hybrid systems for invoicing.
- Automation in invoicing can extend to procurement and pricing.
- Trust in communication is essential for effective invoicing.
- The digitization of impractical processes is on the horizon.
Sound bites
"The world effectively runs on this invoice."
"We need a way of trust in the communication."
"We do see a decline in ERP type systems."
Chapters
00:00 The Challenge of Data Integration
01:48 Understanding E-Invoicing Complexity
08:47 Standards in E-Invoicing: Benefits and Limitations
14:01 Balancing Standardization and Innovation
22:07 Future Trends in E-Invoicing and Automation
Jon Scheele (00:00)
Data is everywhere in lots of different systems and that's part of the beauty of data. It's also part of the challenge in bringing that data to the place where we can make better decisions and automate our processes. I'm very pleased to welcome Professor Fethi Rabhi from University of New South Wales and Alan Hsiao, the founder and CEO of Cognitivo to discuss this very aspect of how we make data work for us. So, Fethi, perhaps you'd like to introduce yourself.
Fethi Rabhi (00:33)
Yeah, thank you, Jon. My name is Fethi Rabhi. I'm a professor in software engineering at UNSW. I've been there for 25 years working in different aspects of building software, particularly for business applications. Not only we teach a software engineering degree that empowers the students to use the latest techniques in building software, but we also engage with industry in collaborative projects. And I can confirm that one of the big issues we've been confronted with is data integration. So I'm very happy that today we'll be talking about that.
Jon Scheele (01:12)
Thanks, Fethi and Alan.
Alan Hsiao (01:13)
Yeah, so I'm Alan Hsiao. I'm the founder of Cognitivo. Cognitivo is an AI software company. We call ourselves the UI of AI because we believe the hardest thing about AI is telling this faceless algorithm what you want it to do. So I think that that's where we're touching on is how do you interact with this thing that we don't quite all fully understand to get it to do what you want it to do. My background comes from professional services in software development, also in banking in various technical and non-technical roles.
Jon Scheele (01:48)
Great. Thanks for the intro, Alan. So I'll get straight to business. Let's discuss an example business use case that every company is affected by or has to do, and that is invoicing. And lots of companies have accounting or ERP systems that will happily produce an invoice, but getting that to their customer or receiving an invoice from a supplier is not always so automated. Now Fethi, you and your team have done a recent research study on e-invoicing. Would you like to give us a rundown on what the current state of electronic invoicing is from your research?
Fethi Rabhi (02:31)
Yeah, thank you. And the first thing is to say what e-invoicing is not. People think that using your ERP system, creating an invoice and sending it electronically makes it an electronic invoice, but it doesn't, because ultimately it's manually produced and it's manually consumed. E-invoicing is a situation where the invoice is automatically produced and automatically consumed, meaning that there are no eyes to read it. It has to be understood by both parties. Particularly in the area of invoicing, you don't want misunderstandings to happen. To facilitate that, there are many electronic invoicing standards that have been developed. They are fairly laborious. You might think a typical invoice contains two or three fields, but it's hundreds and hundreds of different fields to account for the different situations. And then different rules and regulations in different countries will also have different validation rules. So it is a highly complicated, complex area where you need technical standards to define the content of the invoice, you need other standards to define the communication of the invoice, and you need technologies to map from your local information system to these standards. So lots of different facets that require thinking, require tools, require consulting and advice. And I would say that most companies that suffer are more the small and medium enterprises because these are not the type of things they have the capacity to deal with. And for us, yeah, it's a very interesting domain to work in for all of these reasons.
Jon Scheele (04:15)
So can you give us an idea of some of the complexity? Because often as consumers, we may get an invoice and it has a reference number and we go and pay the invoice and we provide the reference number. But when you're dealing with companies, particularly with some complex supply chains, the reconciliation of the invoice has to go not just to the invoice number, but to each and every line item and the question of "What is this?" Give us an idea of what has to happen between companies in those cases.
Fethi Rabhi (04:47)
Yeah, so in the case of supply chain, we're talking about hundreds and hundreds of invoices per month. And in some cases, it could be thousands per month. So you have the problem of a huge volume. And one of the reasons we have this huge volume is because modern companies don't have stock. They order their items just in time. So as the supply chain is working, they will just order enough for one or two days.
And this way they don't have to keep stock. Now the complexity comes from not just the creation and consumption of the invoice, but it is connected to, for example, the production system. So you're producing things. You have to know when to order and how much you order and what to order. So the data that is inside the invoice is not keyed by a human.
It's obtained from other systems that you need to be able to tap for the information to put it there, and vice versa. At the other end, when the invoice gets paid, you have reporting obligations, you have different entities, maybe, that need to do approval. You have exceptions if the amount is not right or there is no connection to a particular order.
There is also integration between invoicing and the deliveries and the logistics. So as things get delivered, they get invoiced, but not before. So you have to know exactly when things get delivered. So I would say in summary, there's complex business processes that are surrounding the invoicing and the invoice has to contain the right information at the right time and be sent at the right frequency and that contributes to the complexity of the problem.
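The line-item reconciliation described here can be sketched in a few lines. This is only an illustration under assumptions: the `Line` record, the `reconcile` function, the exception wording, and the sample order and delivery data are all hypothetical, and a real system would pull the order and delivery data from its purchasing and logistics systems rather than from in-memory dictionaries.

```python
from dataclasses import dataclass
from decimal import Decimal


@dataclass(frozen=True)
class Line:
    order_ref: str       # purchase order number the line belongs to
    item_code: str       # product code, which must match exactly
    quantity: int
    unit_price: Decimal


def reconcile(invoice_lines, ordered, delivered):
    """Return (line, reason) exceptions; an empty list means nothing was flagged.

    ordered:   {(order_ref, item_code): Line} from the purchasing system
    delivered: {(order_ref, item_code): quantity received} from logistics
    """
    exceptions = []
    for line in invoice_lines:
        key = (line.order_ref, line.item_code)
        po_line = ordered.get(key)
        if po_line is None:
            exceptions.append((line, "no matching order line"))
            continue
        if line.unit_price != po_line.unit_price:
            exceptions.append((line, "price differs from order"))
        if line.quantity > delivered.get(key, 0):
            exceptions.append((line, "invoiced before delivery"))
    return exceptions


# Hypothetical sample data: 500 bolts ordered, only 300 delivered so far.
ordered = {("PO-1", "BOLT-M8"): Line("PO-1", "BOLT-M8", 500, Decimal("0.12"))}
delivered = {("PO-1", "BOLT-M8"): 300}
invoice = [
    Line("PO-1", "BOLT-M8", 500, Decimal("0.12")),  # more than was delivered
    Line("PO-9", "NUT-M8", 100, Decimal("0.05")),   # no such order
]

problems = reconcile(invoice, ordered, delivered)
for line, reason in problems:
    print(line.item_code, "->", reason)
```

Each flagged line would go to a person or an approval workflow, which is where the exceptions and approvals mentioned above come in.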
Jon Scheele (06:26)
Uh-huh. So one of the popular standards or frameworks is PEPPOL (Pan-European Public Procurement On-Line). But can you give us a feel for what that helps, but then also what you need to do beyond simply connecting to a PEPPOL agent? Particularly if there are a number of jurisdictions around the world that have adopted PEPPOL, but they may not have done it all in the same way. What is the benefit and what is the challenge that you still have to do when you decide to go with a particular framework or standard?
Fethi Rabhi (07:03)
Yeah, PEPPOL has defined an international standard for the transmission of the invoicing. And I think that's important because that means, you know, as you transmit the invoice, you have a protocol of communication. And that's important because let's think about the internet. You send an email, you have no idea that the email has been received or read. But with an invoice, you want a system where you know that you have sent something, you know that the other party has received it and you know that the information hasn't been corrupted in any way.
So PEPPOL provides the framework to enable that, but it is an international standard that has to be supported by local authorities. So the ATO (Australian Taxation Office) is the authority in Australia that manages PEPPOL. In particular, you have to manage the addresses and the users and make sure that you can communicate between the different parties. Singapore has one, so all the different countries will have a PEPPOL authority that manages the communication.
The main challenge is that communication is not everything. You need other things and that goes back to the theme of your talk is data integration. Yeah, I have this protocol and it tells me I have to send XML and I have to do this and I have to do that. But my information is in a completely different form and shape. And how do I go from my local information system to that particular standard? And I think there is no one solution fits all because every entity that needs to send or receive an invoice will have very very unique specific context requirements that you know will not be easy to determine.
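The mapping problem raised here, going from a local information system to the structure a standard expects, can be shown with a small sketch. Everything in it is an assumption made for illustration: the local record layout, the `HEADER_MAP` and `LINE_MAP` tables, and the target element names are invented and much simpler than any real e-invoicing schema, which would also involve namespaces, many more fields, and country-specific validation rules.

```python
import xml.etree.ElementTree as ET

# Hypothetical record as it might come out of a local accounting system.
local_invoice = {
    "inv_no": "2024-0017",
    "date": "2024-03-05",
    "cust": "ACME Pty Ltd",
    "rows": [
        {"sku": "BOLT-M8", "qty": 300, "price": "0.12"},
        {"sku": "NUT-M8", "qty": 300, "price": "0.05"},
    ],
}

# Local field name -> element name in the (made-up) target format.
HEADER_MAP = {"inv_no": "ID", "date": "IssueDate", "cust": "BuyerName"}
LINE_MAP = {"sku": "ItemCode", "qty": "Quantity", "price": "UnitPrice"}


def to_xml(record):
    """Translate a local record into the target structure.

    Raises ValueError if a header field the target format needs is missing.
    """
    root = ET.Element("Invoice")
    for local_name, target_name in HEADER_MAP.items():
        if local_name not in record:
            raise ValueError(f"missing required field: {local_name}")
        ET.SubElement(root, target_name).text = str(record[local_name])
    for row in record["rows"]:
        line = ET.SubElement(root, "InvoiceLine")
        for local_name, target_name in LINE_MAP.items():
            ET.SubElement(line, target_name).text = str(row[local_name])
    return root


document = to_xml(local_invoice)
print(ET.tostring(document, encoding="unicode"))
```

The mapping tables are the part that differs for every sender and receiver, which is one way to read the point that no one solution fits all.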
Jon Scheele (08:48)
Okay, well thanks for sharing that. So Alan, I'd like to perhaps if you could provide a picture of where you've seen standards work well but also what are some of the limitations of standards and where you see this evolving.
Alan Hsiao (09:04)
Yeah, this is an interesting space. I mean, I've done a little bit of work with Fethi and he's tried to explain to me the e-invoicing standards and we looked at the payloads of these PDF documents with very complex XML embedded in them. And there are hundreds of fields and you got to ask yourself why are there so many fields, right? And it's because you have to cater for all the different processes that could happen and restore the stateful information that supports these processes without those processes being described within the document itself. That's kind of the technical challenge. I draw upon an interesting analogy to kind of make it a little bit easier to understand. We've been doing a lot of work with language models.
And then again, with Fethi in our research group, pairing that up with semantic models, right? So we have this world of natural language which backs onto a semantically defined ontology. And that's kind of the thing about these two worlds is one is very well structured, it's hard to create, and one is very fluid, right? And it's striking the right balance between these two worlds.
If you think about a process diagram, a process diagram as humans would understand it, is boxes with words inside it. If you turn that into BPMN, it all of a sudden explodes into this very, very verbose, lots and lots of fields, lots and lots of things. But the information we want to convey as humans is actually not that complicated.
So we need to strike the balance between things that are very verbose, but actually have less human meaning in that, the information to text ratio, versus something that might be very easily described. And there are pros and cons with this, right? When we talk, we can say, hey, we're two companies. Our trade terms are this and that. All right. And that's between you and I, and another company might have another set. So the standard must cater for all of the different combinations of this within the standard. Wouldn't it be easier if we just captured what those terms are in plain text?
So that's, you get what I'm trying to say. There are pros and cons with these two different approaches. If you think about the way a law is written, Model Driven Design in terms of can we model every single feature of our society and the right business rules and have the laws become that, then everything will be very well structured. But laws are not written in that way. Laws are written as plain human language. And there is interpretation in that.
So you've got to ask yourself, well, why is that the case? And the e-invoicing thing is like, OK, so we must have a lot of structured fields where it is common, and that makes a lot of sense. But what I'm trying to say is with natural language capabilities, semantic understanding backing into that, it opens up the potential to say, let's put natural language clauses, which is actually the way laws are defined anyway within that contract, thus serving as a way to increase the applicability of those standards and speed up commerce. So people and our conditions change at the speed in which we negotiate, but standards are very slow moving. So standards both have the effect of accelerating e-commerce, business to business commerce, but it can also have the effect of slowing it down in other circumstances.
Jon Scheele (12:32)
Yeah, I guess the standards typically lag innovation, but not all innovations are worth creating standards for. There is an experimentation associated with innovating, so you can expect that standards will typically be created after people have tried a few things and decided this is the best way of solving this particular problem. And then standards allow that innovation to scale a lot more because people can pick that up and run with it. But if you create the standard too early, you may find that it's not actually fit for purpose and that puts a brake on it, I think, as is your point. But there are some things that are very deterministic, or need to be very deterministic. It may be a product code, it has to be exactly the product code. It may be the quantity of something and the price of something. We don't want those to change. We don't want generative AI to create something new. We want exactly that.
But I guess your point is that things like the interpretation of what is expected, maybe there are areas where you really want to be able to explain something, particularly across cultures, the use of language may be a little different, even if everybody's speaking English, it isn't necessarily interpreted exactly the same way. So what sort of data architectures
Alan Hsiao (13:50)
Mm-hmm.
Jon Scheele (14:01)
should we be developing in order to facilitate that combination or balance between standardization and innovation.
Fethi Rabhi (14:12)
Well, in the case of the work that we are doing, we rely a lot on using microservices, using APIs, using BPM, business process management technologies, using XML. So all these technologies used together give you that flexibility to evolve with changing processes, changing the rules, evolving the architecture, for example, with new needs. So these are, I would say, proven techniques that help us to deal with the problem. However, taking on what Alan was saying, could we have more advanced architectures, for example, based on agents? Because agents have some autonomy to make decisions.
So do we need to codify everything? Could we, within our architectures, give some initiative to the software to do things when things go wrong or when we're dealing with exceptions? These are not yet stable and proven technologies, but that's, I think, where the future is taking us. But, you know, we need more experimentation to...
Alan Hsiao (14:59)
Mm-hmm.
Mm-hmm.
Fethi Rabhi (15:20)
go into that kind of world. So I don't know Alan if you want to add more.
Alan Hsiao (15:24)
Yeah, I think Jon had a slant about data architectures and things like that. Fundamentally, I mean, if you think about the 33 trillion in global trade, the thing comes down to a flimsy piece of paper, right, which is the invoice. So the world effectively runs on this invoice. And it is probably the least efficient thing ever. Right? It's a piece of paper. Right? And as Fethi said, even if it's a PDF, it's still a picture, so it's completely manual. You have products that work, for example, invoice factoring, where, I have all of these receivables in 30, 60 days, but I'm a small company, I need that working capital now, I can take my invoices to a bank and I can get some money upfront. The banks are not very crash hot on these invoice factoring type products, because they now need to take every single invoice and take the due date, the supplier, and all of those things. And now the bank is doing the invoice processing. So that's a very costly product.
But 5% off the face value of a 30-day invoice factor is like, you do the compound interest, probably 27% interest rate, annual percentage rate. So it's a very profitable product. So you need to think about, well, if the world is run on global trade, and it's the least efficient way, how do we improve that? And this is where we're saying, yes, we should get agents, agents that can transact and settle payment amounts. They can read the invoices, and e-invoicing does fix a lot of that.
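The annualised cost of factoring can be checked with a short calculation. This is a rough sketch under stated assumptions: the discount is the only charge, the financier advances the face value minus the discount and collects the full face value at the end of the term, and a 365-day year is used. The function names are hypothetical.

```python
def period_return(discount):
    """Return earned on the cash advanced over one invoice term.

    Advancing (1 - discount) and collecting 1 yields discount / (1 - discount).
    """
    return discount / (1 - discount)


def simple_annual_rate(discount, days, year_days=365):
    """Annualise by multiplying the per-term return by the number of terms per year."""
    return period_return(discount) * year_days / days


def compound_annual_rate(discount, days, year_days=365):
    """Annualise by reinvesting the proceeds in a new invoice every term."""
    return (1 + period_return(discount)) ** (year_days / days) - 1


for discount in (0.02, 0.05):
    print(
        f"{discount:.0%} off a 30-day invoice: "
        f"simple {simple_annual_rate(discount, 30):.1%}, "
        f"compounded {compound_annual_rate(discount, 30):.1%}"
    )
```

Under these assumptions a 5% discount on a 30-day invoice annualises to well above the off-the-cuff figure in the conversation, while a compounded rate in the high twenties corresponds more closely to a discount of about 2%. Real products add fees and partial advances, so actual rates will differ, but the point that the product is very profitable holds either way.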
But we do need a way of trust. The communication is one thing, but the other one is, where will the invoice live so that there is one copy that is immutable that we can rely upon, and we can transact upon that? It's part of the transaction flow. I send you an invoice to your company. You say, I accept that. Then a third party can look at that and say, OK, well, that company's accepted that. They'll pay in whichever 60, 90 days. And that information, that single copy of information, can then be used to transact in the real world. And where we say, well, the stock has arrived. The money has been paid in advance and when you've received it, this is my portion that is paid out back to you.
So we can basically automate this entire process, you know, with agents, and this is probably one of the very good use cases for blockchain and having a distributed ledger, because that's effectively, you know, what we need: an inter-company ledger. Right, so we don't have the double spending problem in this case. So this is where I'll say one of the rare cases where I think distributed ledger technology is going to be very suitable. And then you can lay upon that agents that understand the law in the country. So if you don't pay, then I can try to recover the money back and then all these ancillary automations that can happen off the back of it.
Jon Scheele (18:22)
So I guess what you're saying about the standard is, it has to be trusted and agreed by lots of people and some of the standards that have occurred or standardization that has occurred has been very industry specific with the car manufacturers introducing EDI in their supply chains. They had the market
Alan Hsiao (18:41)
Mm.
Jon Scheele (18:44)
power I guess to tell their suppliers "If you want to invoice us it has to be this way". You mentioned a large supermarket chain in Australia and that they also have a significant amount of buying power so they can, dictate is the wrong word, but they can specify the best way of interacting with them. But when you have
Alan Hsiao (19:04)
Mm-hmm
Jon Scheele (19:06)
lots of different players and not necessarily one dominant, one or two dominant players in that industry or use case, then I guess that trust and that standardization is something that has to be agreed across a number of parties.
Alan Hsiao (19:10)
Hmm
Yeah, that's right. And the absence of that is the monolithic approach. So I've seen a lot of large companies that implement a big procurement system. Coupa is a very well known one. And basically what they're saying is, well, you're a, you're a supplier to me. You send me your invoice, but you lodge it in my system.
And what they do is, so I use my accounting system to generate an invoice and then I log into their system to create another invoice and attach the picture of my invoice to their invoice. That is extremely, extremely inefficient. It's not really solving the problem at all, right? Yeah, and in fact, now that if you look into their invoice, am I gonna be bothered to type all the line items in there? So that is, it's a compromise, but at least they get...
Jon Scheele (19:53)
Yeah.
Fethi Rabhi (19:58)
Thanks.
Jon Scheele (19:58)
Yeah.
Yeah.
Alan Hsiao (20:08)
That solves the telephone system in that you don't need the telephone system. You've gone there to tell the thing, the message directly. PEPPOL and EDI, PEPPOL is a telephone system. So at least my accounting system can talk to their accounting system. As a supplier, I don't necessarily want to log into everyone else's procurement system. It's just not something that I want.
Fethi Rabhi (20:31)
Let me give you an example of the French reform, which we are working with at the moment. So in France they did a hybrid system where different parties can use PEPPOL to exchange the invoicing, but it's their duty to take a summary of that invoice and send it to the French tax office portal. So you have these hybrid models where you could have trust between entities that collaborate with each other and, at the same time, some obligations to send some of the information to the authorities, for example. So that's why, you know, it used to be centralized or distributed, and now we will see more of these hybrid systems. That will make data architectures even more complex, and definitely we will need to come up with very scalable, intelligent, adaptable architectures to deal with that complexity. Again, we're going to work for the next few years. And invoicing is touching on other areas. So procurement, ordering, reporting, even pricing. You do your pricing based on the volume that you're selling.
Alan Hsiao (21:29)
Mm-hmm.
Fethi Rabhi (21:46)
So this will start from that kind of area and extend to other processes because once you automate one party, the temptation to extend it to the other parts will be very great.
Jon Scheele (21:58)
So I guess there's a lot of need. Where do you see things happening in the next three to five years? Where do you see things converging to?
Fethi Rabhi (22:07)
It's a difficult question because Europe and the US are taking different approaches and not surprisingly Europe is going, there's something called the ViDA initiative. It's the VAT in the Digital Age and it's for having consistent reporting across all the European jurisdictions. So now we're not dealing with the invoicing in one country, we're looking at the whole EU and so on.
The US is not going that much into that area. It's still trailing in this space. So I guess the regulatory pressure is not going to be as high as we would have expected it maybe a couple of years ago. However, the financial incentive, efficiency incentives will be there still and encourage companies to, you know, automate their processes. You said two, three years, I'm still skeptical about AI. I don't think it would be sufficient time for AI to have an impact simply because the state of adoption is still very, very low. Companies are just understanding the power of the APIs for them. Wow, using an API is a big thing.
XML also is fairly recent. So all of these things that maybe big companies and banks have already adopted in this space, they are considered to be big innovations and they're just starting to penetrate. So I would expect more automation, more of the traditional technologies that have shown their worth, BPM, APIs. These will have probably more impact in the next two, three years. And we lay the ground perhaps for the AI revolution to come at a later stage.
Jon Scheele (23:46)
Thanks for sharing that. Alan, your closing thoughts on where you see this heading?
Alan Hsiao (23:50)
The counter-contrarian view is if I look at these standards and ERPs, it's a very German way of doing things, and they're very good at these kinds of industries, is that you've got one way of doing things. And I think we do see from the early 2000s a decline in ERP type systems. Things are more decentralized and the architecture is basically agent to agent communication, MCP (Model Context Protocol) as we see that. And I think how do we apply that to do more commerce? And I don't think it's about eroding the parts of our industry that are well modeled and standardized. They will continue to grow and penetrate in their way. But what we're saying is with AI and what that can do, the sponginess between very well defined things and human articulated things is that there will be more things being digitized. So it's not like it's competing, it's an "or", but there would just be more digital commerce in general. So I think that's from a productivity perspective, we all stand to gain. But I'm basically, I call time on the ERP era.
Jon Scheele (25:01)
I guess what you're saying is that we'll be able to digitize the things that have been impractical to digitize so far. Yeah. Okay. All right. Well, thanks for sharing that perspective, Alan, and also you too, Fethi. I've learned a lot about integration, particularly with invoicing. Thanks very much.
Alan Hsiao (25:09)
Correct. Correct. Yeah.
Fethi Rabhi (25:22)
All right, thank you. Thanks for inviting us.
Alan Hsiao (25:22)
Thanks. Bye.
