Inquisitive M365 (https://thomasdaly.net) - Yet another SharePoint / Office 365 blog

Why We Rebuilt a Power App as a Web App (And How AI Helped Us Ship It Fast)
Published March 6, 2026 - https://thomasdaly.net/2026/03/05/power-app-to-web-app-migration/

[Image: Power App shattering into React components flowing into a modern web application]

A mom-and-pop camera parts store came to us with a problem. They'd been running their inventory consignment tracking on a Power App built on top of SharePoint lists, and it was falling apart. Not in a dramatic way — more like death by a thousand cuts. Slow screens, security workarounds, lists bumping up against row limits, and a UI that fought them every time they needed to add a feature.

The app tracked consignment memos — when parts went out on consignment to other dealers, what was sent, what came back, what was still outstanding. It's the kind of workflow that sounds simple until you realize it touches inventory, customers, sales staff, PDF generation, email notifications, and reporting. The Power App had gotten them through the first couple of years, but they'd hit a ceiling.

We rebuilt it as a custom web application. And the part that made it feasible for a small team on a reasonable budget was AI-assisted development — not as a gimmick, but as a genuine force multiplier for the grunt work.


What Went Wrong with the Power App

Power Apps are great for getting something running quickly. You connect to a SharePoint list or Dataverse table, drag some controls onto a screen, write a few formulas, and you've got a working app in a day. For simple workflows with a handful of users, that's genuinely powerful.

But this app had grown well past that sweet spot. Here's what broke down:

Speed

The app was painfully slow. Every screen load involved multiple lookups against SharePoint lists, and Power Apps doesn't give you much control over how or when those queries execute. Users would tap a button and wait several seconds for the next screen to render. For a tool people use dozens of times a day, that friction adds up fast.

Delegation was a constant headache. Power Apps delegates certain operations to the data source, but many common operations — like certain filter combinations or sorting on calculated fields — can't be delegated. That means the app silently pulls only the first 500 or 2,000 rows and works with that subset, which led to missing records in search results. Users would search for a memo they knew existed and get nothing back.
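In the rebuilt app, search became an ordinary parameterized SQL query with explicit paging, so there is no silent row cap. Here is a minimal sketch of that idea; the table and column names (ConsignmentMemos, MemoNumber, and so on) are illustrative, not the app's actual schema:

```typescript
// Sketch: server-side paged search replacing Power Apps delegation.
// The database scans the full table; the client asks for one page at a time.
interface PagedQuery {
  sql: string;
  params: { search: string; offset: number; pageSize: number };
}

function buildMemoSearchQuery(search: string, page: number, pageSize = 50): PagedQuery {
  // Clamp to page 1 so a bad page number never produces a negative offset
  const offset = (Math.max(1, page) - 1) * pageSize;
  return {
    // Parameterized T-SQL: values are bound, never concatenated
    sql: `SELECT Id, MemoNumber, CustomerName, Status
          FROM ConsignmentMemos
          WHERE MemoNumber LIKE '%' + @search + '%'
          ORDER BY CreatedAt DESC
          OFFSET @offset ROWS FETCH NEXT @pageSize ROWS ONLY`,
    params: { search, offset, pageSize },
  };
}
```

Because paging is explicit, "search for a memo that exists and get nothing back" simply cannot happen: either the row matches the predicate or it does not.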

The Large List Problem

SharePoint lists have a 5,000-item list view threshold. Once you cross it, queries that aren't indexed start failing or returning incomplete results. The consignment memo list had blown past this limit months ago. We'd added indexes on the most-queried columns, but Power Apps' delegation model meant some queries still couldn't take advantage of them.

The workarounds were ugly — splitting data across multiple lists, pre-filtering with flows, caching subsets locally. Each workaround added complexity and new failure modes.

Too Many Controls

Power Apps renders every control on a screen, even ones that aren't visible. As the screens grew more complex — conditional sections, expandable details, dynamic form fields — performance degraded. We'd hit screens with 200+ controls, and the app would visibly struggle to render them.

The recommended fix is to split screens into smaller pieces, but that creates a disjointed user experience and makes state management even harder. You end up passing data between screens through global variables and collections, which is fragile and hard to debug.

Security Limitations

Power Apps' security model is tied to the underlying data source. With SharePoint, that means item-level permissions or breaking inheritance — neither of which scales well. The business needed role-based access: admins see everything, sales staff see only their stores, viewers get read-only access. Implementing this cleanly in Power Apps required a tangle of conditional visibility rules and duplicate screens, and it was never quite right.
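For contrast, that same role model is trivial to express once authorization lives in application code instead of conditional visibility rules. A sketch, with role names and fields assumed from the description above rather than taken from the real app:

```typescript
// Roles as described: admins see everything, sales staff see only their
// stores, viewers are read-only. Names here are illustrative assumptions.
type Role = "admin" | "sales" | "viewer";

interface User {
  role: Role;
  storeIds: string[]; // stores a sales user is assigned to
}

// May this user edit a memo that belongs to the given store?
function canEditMemo(user: User, memoStoreId: string): boolean {
  if (user.role === "admin") return true;       // admins: full access
  if (user.role === "viewer") return false;     // viewers: read-only
  return user.storeIds.includes(memoStoreId);   // sales: own stores only
}
```

A check like this runs once, server-side, at the API layer, instead of being duplicated across every screen's visibility formulas.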

Hard to Hand Off

This is the one that doesn't get talked about enough. Power Apps are notoriously difficult to hand off to another developer. There's no proper source control. The "code" is a mix of Excel-like formulas scattered across control properties. There's no way to do a meaningful code review or diff. Documentation is whatever comments you've added to your formulas — and let's be honest, most Power App formulas don't have comments.

When the original developer moves on, the next person has to reverse-engineer the app by clicking through every screen and reading every control's property panel. For a complex app, that can take weeks just to understand what it does, let alone modify it safely.

Hard to Extend

Every new feature request felt like a negotiation with the platform. Need to generate a PDF? You're piping data to a Power Automate flow that calls a third-party connector or a custom API. Need to send styled HTML emails on a schedule? Another flow, another connector, another set of failure points. Need a proper grid with sorting, filtering, and inline editing? You're stacking galleries and buttons and toggle controls in ways they weren't designed for.

The app had become a Rube Goldberg machine. It worked, but adding anything new meant understanding all the existing moving parts and hoping nothing broke.


The Decision to Rebuild

We didn't take this lightly. Rewriting a working application is risky — you're spending money to get back to where you already are before you can move forward. But the writing was on the wall:

  1. The Power App couldn't scale further. Every new feature was getting harder and slower to build.
  2. Performance was hurting the business. Staff were avoiding the app when they could and falling back to spreadsheets.
  3. Maintenance risk was growing. The original developer's availability was limited, and nobody else could work on it.
  4. The requirements were well understood. After two years of using the Power App, the business knew exactly what they needed. No discovery phase, no guesswork.

That last point is underrated. The Power App had essentially served as a working prototype — an expensive one, but it meant we weren't building from a blank page. We had screens, workflows, and edge cases already mapped out.


How We Built the Replacement

We chose Next.js with TypeScript, Ant Design for the UI, Azure AD for authentication, and Azure SQL Database for storage. The app runs on Azure App Service with Azure Functions handling background jobs like scheduled email notifications.

Requirements and Mockups First

Before writing any code, we documented every feature the Power App had — and every feature they wished it had. We walked through the existing app screen by screen, cataloging functionality, business rules, and pain points. This gave us a clear requirements document and a set of UI mockups to build against.

This step matters more than people think. When you're using AI to help generate code (more on that in a minute), the quality of what you get out is directly proportional to the quality of what you put in. Vague requirements produce vague code. Detailed requirements with mockups produce components that are 80-90% right on the first pass.

AI-Assisted Development

Here's where it gets interesting. This was a solo developer project with a tight timeline. Building a full-featured web application — authentication, role-based access, CRUD operations, PDF generation, email automation, responsive UI — would normally take months of dedicated work.

Our AI tool of choice was Claude on the Max plan ($100/month). That's it — one subscription, one developer, and a clear set of requirements. We used it extensively throughout the build. Not to replace the developer, but to accelerate the tedious parts:

Scaffolding and boilerplate. Setting up API routes, database queries, TypeScript types, form validation — the kind of code that's necessary but not creative. With clear requirements, we could describe what an API endpoint needed to do and get a working implementation that just needed review and testing.

UI components. Given a mockup and a component library (Ant Design), generating the initial component code was fast. "Build a form page with these fields, these validation rules, and this layout" produces something usable quickly. The developer's job shifted from writing every line to reviewing, adjusting, and integrating.

Database migrations. Translating a data model into SQL migration scripts, complete with indexes and constraints. Describing the schema in plain English and getting back working DDL.

Repetitive patterns. Once we'd established how one CRUD feature worked (API route, service layer, component, types), generating the next five features followed the same pattern. The AI was good at applying established conventions consistently.
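The repeated shape itself can be sketched as a generic service. The real app used SQL-backed services per entity; an in-memory version, shown here purely for illustration, captures the convention that each new feature followed:

```typescript
// Sketch: the CRUD convention each entity's service followed.
// In-memory storage stands in for the real SQL-backed implementation.
class CrudService<T extends { id: number }> {
  private rows = new Map<number, T>();
  private nextId = 1;

  create(data: Omit<T, "id">): T {
    // Cast via unknown: TS cannot prove Omit<T,"id"> & {id} is assignable to T
    const row = { ...data, id: this.nextId++ } as unknown as T;
    this.rows.set(row.id, row);
    return row;
  }

  get(id: number): T | undefined {
    return this.rows.get(id);
  }

  update(id: number, patch: Partial<Omit<T, "id">>): T | undefined {
    const row = this.rows.get(id);
    if (!row) return undefined;
    const next = { ...row, ...patch };
    this.rows.set(id, next);
    return next;
  }

  remove(id: number): boolean {
    return this.rows.delete(id);
  }
}
```

Once one entity (say, customers) worked this way, generating the next (stores, sales staff, memos) was mostly a matter of swapping types and table names — which is precisely what the AI was good at.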

Security hardening. We ran the codebase through security review — JWT validation, XSS protection, SQL injection prevention, input sanitization, security headers. The AI identified vulnerabilities and generated fixes, which we then verified.
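One representative fix from that pass, sketched here: escaping user-supplied text before it is interpolated into an HTML email template. The app's actual fix may differ in detail; this is the standard technique:

```typescript
// Escape the five HTML-significant characters before interpolating
// user input into markup. Ampersand must be replaced first, or the
// entities introduced by the later replacements would be double-escaped.
function escapeHtml(input: string): string {
  return input
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;")
    .replace(/"/g, "&quot;")
    .replace(/'/g, "&#39;");
}
```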

What AI didn't do well: architectural decisions, complex business logic edge cases, performance optimization, and anything requiring deep understanding of how the pieces fit together. Those still required an experienced developer thinking carefully.

The net effect was that the core application was rebuilt in a few weeks. But here's the part that doesn't make the highlight reel: we spent significantly more time on validation than on building. The goal wasn't just feature parity — it was parity or better on every workflow. That meant going through every screen, every edge case, every business rule from the Power App and verifying the new app handled it at least as well, if not better.

QA Process

Building fast with AI makes a rigorous QA process non-negotiable. The build phase was the easy part — the validation phase was where the real time went. When code is generated rather than hand-typed, you can't rely on the "I wrote it so I understand it" safety net. We implemented:

  • Code review on every generated component. Nothing went in without the developer reading and understanding it line by line.
  • Manual testing against the requirements doc. Every feature was tested against its original requirement, not just "does it compile."
  • Security review. We did a dedicated security pass, treating the app as if it were written by an untrusted junior developer. We found and fixed JWT validation gaps, missing input sanitization, console logging of sensitive data, and transaction handling issues.
  • User acceptance testing. The actual users (the store staff) tested the app against their real workflows before go-live. Their feedback drove the final round of adjustments.
  • Side-by-side comparison. We ran both the Power App and the web app in parallel for two weeks, ensuring the new app handled every scenario the old one did.

What We Gained

The difference was night and day:

Speed. Pages load in under a second. Search is instant, even across tens of thousands of records. No delegation limits, no 500-row caps, no phantom missing results.

Polish. This was the surprise win. With a proper component library and consistent UI patterns, the web app ended up far more polished than the Power App ever was. We could add modal dialogs for editing contacts, sales staff, and store details — all inline, without navigating away from the current screen. Form layouts were consistent across every page. Loading states, validation feedback, error messages — all uniform. In Power Apps, that level of UI consistency is a constant battle against the platform. In a web app with a design system like Ant Design, it's the default.

Security. Proper JWT-based authentication with Azure AD, role-based access control at the API level, parameterized queries, XSS protection, and security headers. The kind of security that's table stakes for a web app but nearly impossible to retrofit onto a Power App.

Maintainability. It's a standard Next.js/TypeScript codebase in a git repository. Any developer who knows React can pick it up. There are pull requests, code reviews, diffs, and a clear project structure. The README explains how to run it. The CLAUDE.md documents the architecture for AI-assisted future development.

Extensibility. Need a new feature? It's a new API route and a new component, following established patterns. Need to change the PDF layout? It's a pdfmake template in code, not a Power Automate flow calling a third-party connector. Need scheduled emails? It's an Azure Function with an HTML template. Need a new modal for managing a related entity? Copy an existing one, adjust the fields, done.
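To make the PDF point concrete: a pdfmake template is just a plain object built in code, which makes it diffable, reviewable, and testable. A hedged sketch follows — the field names and layout are invented for illustration, and the real template is richer:

```typescript
// Sketch: building a pdfmake document definition for a consignment memo.
// Line-item fields are hypothetical; the object shape is pdfmake's standard
// document-definition format (content array + styles).
interface MemoLine { partNumber: string; description: string; quantity: number }

function buildMemoPdf(memoNumber: string, lines: MemoLine[]) {
  return {
    content: [
      { text: `Consignment Memo ${memoNumber}`, style: "header" },
      {
        table: {
          headerRows: 1,
          body: [
            ["Part #", "Description", "Qty"],
            ...lines.map((l) => [l.partNumber, l.description, String(l.quantity)]),
          ],
        },
      },
    ],
    styles: { header: { fontSize: 16, bold: true } },
  };
}
```

On the server, an object like this would be handed to pdfmake's printer to produce the PDF stream — no flow, no connector, no third-party hop.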

CI/CD. We set up deployment pipelines for both dev and production environments. Push to a branch, it deploys to dev. Merge to main, it goes to production. The Power App had… "publish."


When Power Apps Still Makes Sense

This isn't a "Power Apps is bad" article. Power Apps is the right tool when:

  • Your data fits comfortably in SharePoint or Dataverse limits
  • You have a small number of users (under ~20)
  • The app is simple — a few screens, straightforward data entry, basic reporting
  • You need something running this week, not this quarter
  • The builder will also be the maintainer for the foreseeable future

The camera parts store's app had simply outgrown all of those conditions. It had thousands of records, complex business rules, multiple user roles, PDF generation, automated emails, and needed to be maintainable by someone other than the original builder.


The AI Development Takeaway

The real story here isn't "Power Apps bad, custom code good." It's that AI-assisted development has changed the build-vs-buy calculation. Two years ago, rebuilding this app as a custom web application would have been hard to justify for a small business. The development cost would have been too high relative to the pain of living with the Power App's limitations.

But with Claude Max at $100/month and a single experienced developer, we built a production-quality application in a few weeks. The real investment was in validation — weeks more of meticulous testing to ensure parity or better on every workflow. The developer's role shifted from writing every line of code to architecting the solution, defining the requirements clearly, reviewing generated code, and handling the genuinely complex parts that require human judgment.

That's the pattern I expect to see more of: businesses that started with low-code tools outgrowing them, and AI-assisted development making the jump to custom code practical on a small-business budget. The key ingredients are clear requirements (ideally from having already run the low-code version), an experienced developer who can architect and review, and a disciplined QA process that doesn't trust generated code blindly. The build is the fun part. The validation is where you earn the trust.

The camera parts store is now running on an app that's faster, more secure, easier to extend, and maintainable by any competent developer. Total AI tooling cost: a few months of a $100 subscription. That's not magic — it's the practical reality of knowing what to build and having better tools to build it with.

Three Years of Community Days: Built by the Community, For the Community
Published October 13, 2025 - https://thomasdaly.net/2025/10/12/community-days-three-years/

[Image: Packed session room at a Microsoft community event — attendees filling the seats, speaker presenting at the front with slides on screen]

CommunityDays.org turned three on October 13th. I still can't quite believe that.

What started as a scramble to fill a gap when SPSEvents.org shut down has turned into something I'm genuinely proud of. It's not perfect, and there's always more to do, but three years in feels like a good time to look back at how we got here.


How It Started

When SPSEvents.org announced it was going away, it left a real hole. That site had been the central hub for SharePoint Saturdays and Microsoft community events for years. Organizers relied on it. Speakers browsed it to find events to submit to. Attendees used it to figure out what was happening near them.

I figured someone needed to step up, and I had the skills to try. So I did.

[Image: The CommunityDays.org events page in its early days — showing the first wave of community events listed on the platform]

It was a lot of late nights and weekends. SoHo Dragon sponsored the effort, Microsoft backed it, and Jeff Teper was kind enough to announce the launch at Ignite — which was a surreal moment for me. But the idea itself was always simple: give organizers a place to list their events and give everyone else a place to find them.


By the Numbers

Hundreds of organizers. Thousands of sessions. Speakers from all over the world covering Microsoft 365 (M365), Power Platform, Azure, and more.

I'm proud of those numbers. But honestly, the numbers aren't the part that keeps me going.


The Stories That Stick With Me

After the M365 DC event, Charles Lakes II came up to me and said the site had helped him find more events, get in front of more audiences, and that it played a part in him earning his MVP. I want to be clear — Charles earned that through his own hard work. But hearing that Community Days helped him find those opportunities? That meant a lot to me.

At that same event, I sat in on a Power Platform session from Diego Da Silva who was just getting started as a speaker. You could see it click for him — that moment when someone realizes they have something worth sharing and people want to hear it. That's what community events do. They give people a stage before they even know they're ready for one.

[Image: Attendees mingling in the hallway between sessions at an M365 community event]

Those are the moments I think about when the work gets heavy.


The People Who Make It Work

Community Days doesn't run on one person. There's a group of people who show up consistently — reviewing events, promoting the platform, flagging bugs, keeping things funded and running. I could list every name but the list would get long and I'd inevitably leave someone out. You know who you are, and I appreciate what you do. You can see the full list of contributors and sponsors on the Community Days about page.

On the partnerships side, Adis Jugo and the team at run.events and Domagoj Pavlesic and the crew at Sessionize have been real partners in this — not transactional, but built on a shared belief that community events deserve good tools and good visibility. Working with them has made Community Days better.

[Image: Group photo of the Community Days team and speakers at M365 NYC]

Why I Keep Doing This

People ask me sometimes. It's volunteer work. It takes real time away from other things. There's no big team behind the curtain.

The honest answer is that I enjoy it. I've been part of the Microsoft community for a long time, and I've seen what happens when people share what they know — when they mentor someone new, when they give up a Saturday to teach a room full of strangers something useful. I want to support that however I can.

Community Days is free for organizers. The event listings, the call-for-sponsorship tools, the Speakerboard — all free. I feel strongly that the moment you start putting barriers in front of community, it stops being community.


Looking Ahead

I'll be honest — the last three years haven't always been easy. There are stretches where the to-do list feels infinite and the time feels short. But then someone shares a story about how the site helped them, or just says thanks, and it puts things back in perspective.

[Image: The CommunityDays.org events page showing a grid of upcoming Microsoft community events from around the world]

394 events listed. On to year four. Thank you to everyone who's been part of this — the organizers, the speakers, the attendees, and the people working alongside me to keep it going. I appreciate all of you.


Get Involved

If any of this resonates with you, there are a few ways to get involved:

  • List your event on CommunityDays.org — it's free for organizers
  • Browse the Speakerboard at CommunityDays.org/speakerboard — speakers who present at community events are automatically listed, making them discoverable by other event organizers
  • Organize or help with community events — the Microsoft Global Community Initiative (MGCI) runs regular office hours and training for new and existing event organizers
  • Sponsor a community event — browse events looking for sponsors through the call-for-sponsorship listings
  • Want to help out? — reach out through the contact page and let us know how you'd like to contribute
  • Just show up — attend an event near you. That's how most of us got started


Renewed as Microsoft MVP for 2025-2026
Published August 1, 2025 - https://thomasdaly.net/2025/07/31/renewed-microsoft-mvp-2025-2026/

Contributing to the tech community takes time — something that gets harder to find as life gets busier. That's part of why this recognition means so much.

I'm happy to share that I've been renewed as a Microsoft MVP for 2025-2026.

This is my ninth year in the program, and it still means as much now as it did when I first received the award back in 2017. It's not the kind of thing you get used to — every renewal is a reminder that the work you're putting into the community is being recognized.

What the Microsoft MVP Program Provides

The MVP award is an invaluable resource to me personally. It gives me a more direct line of communication with the Microsoft product teams and employees — the ability to provide feedback, collaborate on things together, and get early insight into where the platform is heading. That access makes me better at what I do, and it makes the guidance I share with the community more informed.

The award also validates the work I'm doing in the Microsoft space and the impact on its community. The majority of my contributions live on the CommunityDays.org platform — organizing events, supporting other organizers, and helping grow the community events ecosystem.

But that's just one piece. I'm still highly technical and sharing knowledge through blog posts and talks. I'm mentoring the next generation of developers coming into this space and solving real customer problems every day. I'm running events locally and visiting and supporting other events around the world.

And it helps with community events in a real, tangible way. When we organize and run events for our local community, people are more confident that the event will be of value when it's organized and delivered by Microsoft MVPs. That credibility opens doors — for speakers, for sponsors, and most importantly for attendees who take time out of their day to show up and learn.

What This Past Year Looked Like

On the MGCI (Microsoft Global Community Initiative) side, I joined the board and participate in weekly meetings with Microsoft to help steer the direction of Community Days. A few times a month there are MGCI board meetings, and I'm running the Tech and Tools subcommittee — focused on the platform and tooling that powers CommunityDays.org. I also pop in on the MGCI Community Event Organizer Training & Office Hours to help new organizers get started.

A new role this year: a recurring bi-weekly appearance on the M365 General Dev Special Interest Group (SIG). It's been a great way to stay connected with the broader developer community and share what we're building.

Behind the scenes, I've put thousands of hours into developing on CommunityDays.org — the platform that supports community events worldwide. That work doesn't always get visibility, but it's foundational to everything else.

Events — Running, Attending, Supporting

On the events front, this was a big year. I helped bring M365 NJ back, ran M365 NYC for our biggest and best event since the pandemic, organized AI Conference (AICO) New York City and M365 Philly, and helped out with M365 DC. I also attended events like TechCon Atlanta and M365 Memphis! — showing up to support other communities matters just as much as running your own.

Mentorship and Connections

Beyond events, I've been focused on mentoring — helping people navigate their careers in this space, making introductions, and connecting folks with job opportunities when I can. Some of the most rewarding work isn't visible at all. It's a conversation that helps someone land their next role or find the confidence to submit their first session.

The Ideas That Didn't Ship

I'll also be honest about the handful of initiatives that started with momentum but didn't have enough bandwidth to fully develop. Time and resources are finite. Some ideas need to wait for the right season — and knowing when to put something on hold is just as important as knowing when to start.

Continuing to Contribute While Life Changes

I'll be honest — the landscape of contributing has changed for me personally. Early in my MVP journey, time was abundant. I could travel to events, write late into the night, and pour hours into side projects without thinking twice. Now I'm a father of two young girls, and that math is different. I'm doing what I can to give back while also being an involved parent.

That tension has actually made me more intentional about where I spend my time. I focus on contributions that have the most impact — the events that reach the most people, the blog posts that solve real problems, the mentoring conversations that help someone take their next step. Less time means I have to make it count.

The goal hasn't changed since before I ever had the MVP title: if something I share helps one person solve a problem or learn something new, that's the mission accomplished. I just have to be smarter about how I get there now.

Thank you to Microsoft and to everyone in the community who makes this worth doing.

If you're thinking about contributing to the community — speaking, organizing events, or sharing what you've learned — start small and get involved. The ecosystem grows stronger every time someone decides to give back.

Managing Document Library Versioning and Expiration Across a SharePoint Tenant
Published June 15, 2025 - https://thomasdaly.net/2025/06/14/managing-document-library-versioning-tenant/

[Image: Steampunk furnace with a robotic arm sorting holographic document versions on a conveyor belt, a glowing 60 DAYS display overhead]

If you've looked at your SharePoint storage report lately and wondered where all the space went, check version history. Depending on your tenant's configuration — up to 500 major versions, or Microsoft's newer "automatic" model with no fixed cap — there's no expiration by default. Every save, every autosave from co-authoring, every edit creates a version that sticks around forever.

We found libraries where version history was consuming 3-4x the space of the current files. Co-authoring makes it worse — a one-hour session with multiple editors can generate dozens of versions. Multiply that across hundreds of sites and it adds up fast.

Microsoft added tenant-level version controls in the SharePoint Admin Center (Settings > Version history limits), which works for broad policy. But we needed per-library control baked into our site provisioning pipeline — applied consistently whether a site was created today or two years ago, without relying on someone remembering to toggle an admin setting.

We needed a consistent policy: 100 major versions, minor versions enabled, and automatic expiration after 60 days. Applied to every site in the tenant — both existing sites and every new site as it's provisioned. Here's how we did it.


The Versioning Policy

We reviewed storage reports and asked the business how far back they'd ever actually needed to roll back. The answer was almost always days, not months. We landed on:

Setting              Value     Rationale
Major versions       100       Covers months of active editing, even on busy documents
Minor versions       Enabled   Supports draft/publish workflows
Version expiration   60 days   Auto-trims anything older than two months

The 60-day expiration is the key lever. It bounds version history regardless of edit frequency. Both major and minor versions are subject to the policy — anything older than 60 days is eligible for cleanup.


Applying Versioning at Provisioning Time

Our sites are provisioned via an Azure Function queue trigger that creates the site, applies a Patterns and Practices (PnP) template, copies template folders, and configures versioning. Here's the relevant section:

# Enable versioning on the Documents library
Write-Host "Enabling versioning on 'Documents' library."
Set-PnPList -Identity "Documents" `
    -EnableVersioning $true `
    -MajorVersions 100 `
    -EnableMinorVersions $true `
    -Connection $newSiteConnection

Write-Host "Setting expiration for versions in 'Documents' library to 60 days."
Set-PnPList -Identity "Documents" `
    -ExpireVersionsAfterDays 60 `
    -Connection $newSiteConnection

Note the two separate Set-PnPList calls. Combining -ExpireVersionsAfterDays with the other versioning parameters in a single call can produce inconsistent results depending on your PnP PowerShell version. We learned that the hard way — split them and it's reliable.


Remediating Existing Sites

New sites are handled, but existing sites still had default settings and months of accumulated version history. We built a bulk remediation script that reads a site inventory CSV and applies the same policy:

Import-Module PnP.PowerShell

# Path to the CSV file containing site inventory
$csvFilePath = "sites.csv"

# Read the CSV file
$sites = Import-Csv -Path $csvFilePath

# Iterate through each active site
foreach ($site in $sites) {
    if ($site.Status -eq 'C' -and -not [string]::IsNullOrWhiteSpace($site.SiteUrl)) {
        try {
            $siteUrl = $site.SiteUrl
            Connect-PnPOnline -Url $siteUrl -ClientId "your-app-registration-client-id"

            # Apply versioning policy
            Set-PnPList -Identity "Documents" `
                -EnableVersioning $true `
                -MajorVersions 100 `
                -EnableMinorVersions $true

            Set-PnPList -Identity "Documents" `
                -ExpireVersionsAfterDays 60

            Write-Host "Successfully updated site: $siteUrl" -ForegroundColor Green
        }
        catch {
            Write-Host "Failed to update site: $siteUrl. Error: $_" -ForegroundColor Red
        }
        finally {
            # SilentlyContinue: Disconnect-PnPOnline throws if Connect-PnPOnline never succeeded
            Disconnect-PnPOnline -ErrorAction SilentlyContinue
        }
    }
}

The CSV tracks every site with a status column — C for active, R for retired. The script skips retired sites and empty URLs, so it's safe to run against the full inventory.


What Happens When Expiration Kicks In

-ExpireVersionsAfterDays 60 doesn't delete old versions immediately — SharePoint handles it as a background job. A few things to know:

  • Background timer job. Versions older than 60 days become eligible for deletion, but actual cleanup can take days on large libraries.
  • Current version is never expired. Only historical versions are affected.
  • Recycle bin delay. Deleted versions go to the recycle bin (93 days total across first and second stage). True storage recovery happens after the recycle bin clears.
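The eligibility rule itself is simple date arithmetic. As an illustration only (a JavaScript sketch of the rule, not SharePoint's actual timer-job logic):

```javascript
// Illustration of the eligibility rule: a historical version becomes a
// candidate for cleanup once it is older than the policy window (60 days
// in our case). Actual deletion timing is up to SharePoint's timer job.
function isVersionExpired(createdIso, nowIso, days = 60) {
  const ageMs = new Date(nowIso).getTime() - new Date(createdIso).getTime();
  return ageMs > days * 24 * 60 * 60 * 1000;
}
```
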

If you need to reclaim storage immediately:

Clear-PnPRecycleBinItem -All -Force -Connection $siteConnection

Not reversible. We ran this selectively on sites with the highest storage pressure, not tenant-wide.


Monitoring the Results

After rolling this out, we tracked storage in SharePoint Admin Center. Within a few weeks, total tenant storage dropped as background expiration worked through the backlog. The bigger win was that storage growth flattened — no more unbounded accumulation.


Lessons Learned

Split the Set-PnPList calls. Versioning and expiration in separate calls avoids parameter combination edge cases.

Run remediation quarterly. Settings can drift — someone changes a library manually, or a new library gets created outside the pipeline. Re-running the bulk script catches it.

Watch the recycle bin. Expired versions go to the recycle bin, not into thin air. If you're trying to reclaim storage urgently, clear the recycle bins too.

Communicate before you enforce. We notified users that versions older than 60 days would be automatically cleaned up. Nobody complained — most didn't know versioning existed. But it's good governance to communicate before changing retention behavior.


Wrapping Up

SharePoint versioning is one of those features that works against you when left on autopilot. The defaults are safe but wasteful. By enforcing a consistent policy — 100 major versions with 60-day expiration — across every site in the tenant, we reduced storage consumption significantly and eliminated a growing cost problem.

The approach is two-pronged: apply the policy automatically when new sites are provisioned, and run a bulk remediation script to bring existing sites in line. Both use PnP PowerShell's Set-PnPList, and both can run unattended.

If your tenant storage report has been creeping upward and you're not sure why, check your version history. Chances are that's where the space is going. And if you're managing versioning across a tenant differently — or have hit edge cases with ExpireVersionsAfterDays — I'd like to hear about it in the comments.

]]>
https://thomasdaly.net/2025/06/14/managing-document-library-versioning-tenant/feed/ 0 3434
Building a Two-Stage AI Pipeline for Invoice Processing with AWS Textract and Amazon Bedrock https://thomasdaly.net/2025/03/04/textract-plus-bedrock-two-stage-ai-pipeline/ https://thomasdaly.net/2025/03/04/textract-plus-bedrock-two-stage-ai-pipeline/#respond Wed, 05 Mar 2025 00:00:00 +0000 https://thomasdaly.net/?p=3384 Processing PDF invoices by hand doesn't scale. I was working on a reconciliation project where the finance team had thousands of PDF invoices that needed to be matched against records in their accounts payable system. The data lived in the PDFs — invoice numbers, dates, totals, vendor names — but getting it out meant someone had to open each one, find the fields, and type them into a spreadsheet. It was slow, error-prone, and took way too much time every month.

I wanted to automate the extraction end to end. Drop a PDF in, get structured data out. No manual intervention, no per-format configuration. The system needed to handle invoices, statements, contracts — whatever showed up in the inbox.

The twist was that these invoices came from hundreds of different vendors, and every single one formats their invoices differently. Different layouts, different labels, different date formats, different ways of presenting totals. Writing extraction rules per vendor wasn't going to work — there were too many, and new ones showed up all the time.

What I ended up building was a two-stage AI pipeline using AWS Textract for OCR and Amazon Bedrock for the AI normalization layer. Textract reads the page. Bedrock figures out what it all means and boils every format down to one consistent JSON structure — regardless of how the vendor laid out the invoice.

Intelligent Document Processing Pipeline Architecture — S3 inbox to SQS to Lambda (Textract + Bedrock) to S3 processed and RDS database

The Real Problem: Every Vendor's Invoice Looks Different

This was the fundamental challenge. We weren't dealing with one invoice format — we were dealing with hundreds. Every vendor sends their invoices laid out differently. Some have the invoice number at the top right. Others bury it in a table. Some label it "Invoice #", others call it "Reference No." or "Document ID." Dates show up as "01/15/2025," "January 15, 2025," "2025-01-15," or "15 Jan 25." Totals might be in a summary box, a footer line, or the last row of a table.

If you try to solve this with traditional OCR alone, you end up writing extraction rules for each vendor's format. That might work for your top 10 vendors, but when you have hundreds — and new ones showing up regularly — it's a losing game. You'd spend more time maintaining the rules than you'd save on manual entry.

That's where the AI comes in. The AI is the normalizer. It doesn't care that Vendor A puts the invoice number in the header and Vendor B puts it in line 3 of a table. You give it the raw extracted text and tell it: "Find me the invoice number, the date, the total, and the vendor." It figures out which field is which regardless of layout, labeling, or format — and returns it in a single consistent JSON structure every time.

Why Two AI Services Instead of One?

So if the AI handles the normalization, why not just send the PDF directly to the model and skip OCR entirely?

I actually tried this first. Bedrock supports direct PDF input — you base64-encode the file and send it straight to the model. But there are practical limits. Bedrock caps direct PDF input at 5MB, and a lot of real-world invoices — especially scanned multi-page documents — blow right past that.

More importantly, even when the files were small enough, the results were inconsistent. The model would miss table data, misread amounts, or confuse fields that were visually close together on the page. Out of roughly 15,000 documents, the direct-to-Bedrock approach failed on close to 100% of them when used as the primary path. It was trying to solve two problems at once, and neither one got the attention it needed.

The two problems are:

  1. The spatial problem — Where is the text on the page? What's a form label vs. a value? What belongs to which table column? This is a vision problem.
  2. The semantic problem — Out of all these fields, which one is the invoice number? Is this date the invoice date or the due date? Is this amount the subtotal or the total? This is a comprehension problem.

Textract is purpose-built for the spatial problem. It doesn't just do OCR — it identifies key-value pairs (like "Invoice Date: 12/15/2025"), detects table structures with rows and columns, and maps out form fields. It understands the layout of the page in a way that a general-purpose language model can't reliably match.

Bedrock handles the semantic problem — the normalization. I take all that structured data from Textract and hand it to the AI model. Now the model doesn't have to figure out where things are on the page. It just has to look at a list of key-value pairs and lines of text and decide: "This is the invoice number. This is the date. This is the total." And it does that consistently, regardless of whether the invoice came from a massive distributor or a one-person shop with a Word document template.

The combination means Textract does what it's best at (reading the page), and the AI does what it's best at (understanding the content and normalizing it into a single format). Neither one alone would have worked nearly as well.

The AWS Architecture

The whole pipeline is serverless. Drop a PDF into an S3 bucket, and everything else happens automatically — no servers to manage, no workers to monitor.

Serverless IDP Pipeline — S3 inbox to SQS to Lambda with Textract and Bedrock, showing the S3 prefix lifecycle from inbox to processing to processed

Here's the flow:

  1. PDF lands in S3 — uploaded to the inbox/ prefix of an S3 bucket
  2. S3 Event Notification fires — triggers automatically when a new object appears in inbox/
  3. SQS Queue receives the event — buffers the work and handles retries if the Lambda fails
  4. Lambda picks up the message — runs Textract, sends results to Bedrock, writes output to S3 and the database
  5. PDF moves through prefixes: inbox/ → processing/ → processed/ (or error/ if something fails)

The S3 prefix movement is a nice pattern. You always know the state of a document by where it sits in the bucket. If something fails mid-processing, you can see exactly which files are stuck in processing/ and investigate.
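S3 has no rename operation, so a prefix "move" is a CopyObject to the new key followed by a DeleteObject on the old one. The key rewrite can be sketched like this (the helper name is mine, not the actual Lambda's code):

```javascript
// Compute the destination key when a document moves between lifecycle
// prefixes, preserving any subfolders under the prefix.
function moveKey(key, fromPrefix, toPrefix) {
  if (!key.startsWith(fromPrefix)) {
    throw new Error(`Key "${key}" is not under prefix "${fromPrefix}"`);
  }
  return toPrefix + key.slice(fromPrefix.length);
}

// In the Lambda this would be paired with CopyObject + DeleteObject
// calls from @aws-sdk/client-s3 to perform the actual move.
console.log(moveKey("inbox/vendor-123/invoice.pdf", "inbox/", "processing/"));
// → processing/vendor-123/invoice.pdf
```
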

The Lambda: Textract + Bedrock in One Function

The Lambda function does all the heavy lifting. When an SQS message arrives, it parses the S3 event, grabs the PDF, and runs it through both AI stages.

Step 1: Run Textract with FORMS + TABLES

import { TextractClient, StartDocumentAnalysisCommand } from "@aws-sdk/client-textract";

const textract = new TextractClient({});

// StartDocumentAnalysis runs asynchronously: the response contains a JobId,
// not the blocks themselves. Poll GetDocumentAnalysis with that JobId until
// JobStatus is SUCCEEDED to collect the full result pages.
const result = await textract.send(
  new StartDocumentAnalysisCommand({
    DocumentLocation: {
      S3Object: { Bucket: bucket, Name: key }
    },
    FeatureTypes: ["FORMS", "TABLES"]
  })
);

The FORMS and TABLES feature types are key. Basic OCR just gives you text. With these enabled, Textract identifies key-value pairs (like "Invoice Date: 12/15/2025") and detects table structures with rows and columns. This structured data is what makes the Bedrock step so much more accurate.

Raw Textract blocks on the left transformed into a clean structured payload with keyValues, tables, and lines on the right
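To make that transformation concrete, here is a simplified sketch of flattening Textract's KEY_VALUE_SET blocks into plain pairs. It follows the documented block model (KEY blocks link to VALUE blocks, which link to WORD children), but it is my own simplification, not the pipeline's actual code:

```javascript
// Flatten Textract FORMS output into { "key text": "value text" } pairs.
function extractKeyValues(blocks) {
  const byId = new Map(blocks.map(b => [b.Id, b]));

  // Join the text of a block's WORD children.
  const textOf = b =>
    !b ? "" :
    (b.Relationships || [])
      .filter(r => r.Type === "CHILD")
      .flatMap(r => r.Ids)
      .map(id => byId.get(id))
      .filter(c => c && c.BlockType === "WORD")
      .map(c => c.Text)
      .join(" ");

  const pairs = {};
  for (const b of blocks) {
    if (b.BlockType !== "KEY_VALUE_SET") continue;
    if (!(b.EntityTypes || []).includes("KEY")) continue;
    const valueIds = (b.Relationships || [])
      .filter(r => r.Type === "VALUE")
      .flatMap(r => r.Ids);
    pairs[textOf(b)] = valueIds.map(id => textOf(byId.get(id))).join(" ").trim();
  }
  return pairs;
}
```

The resulting flat object is what gets handed to Bedrock, so the model never has to reason about block IDs or page geometry.
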

Step 2: Send to Bedrock for AI Extraction

Here's where the normalization happens. I send the structured Textract data to Amazon Bedrock with a prompt that tells the model exactly what I need back.

The prompt is specific about the output format:

You extract key information from invoices or invoice-like documents.
Focus on identifying the invoice number - it is the most important field.

Return ONLY valid JSON matching exactly:
{
  "document_type": "invoice|statement|contract|other",
  "specific_number": {
    "label": "Invoice Number|Claim Number|PO Number|...",
    "value": "string or null",
    "confidence": 0.0
  },
  "key_fields": {
    "date": null, "due_date": null,
    "total": null, "tax": null,
    "vendor": null, "customer": null
  },
  "summary": "",
  "notes": []
}

Rules:
- Confidence is 0.0 to 1.0. If not found, value=null and confidence=0.
- ALL dates MUST use format MM/DD/YYYY.
- ALL dollar amounts MUST use format $X,XXX.XX.

Getting the prompt right is critical. This is the part that took the most iteration. The AI will do exactly what you tell it to — and if you're not specific enough, you'll get inconsistent results across thousands of documents.

A few things I learned through trial and error:

  • Define the exact JSON schema. I tell the model exactly what shape to return. No extra keys, no variations. Without this, some documents come back with extra fields, others are missing fields, and your parsing code has to handle every variation. Lock down the schema and the output is predictable.
  • Enforce formats in the prompt. Dates as MM/DD/YYYY, amounts as $X,XXX.XX. This is huge. Without explicit format rules, the AI will return dates in whatever format it finds on the document — "January 15, 2025" from one vendor, "2025-01-15" from another, "01/15/25" from a third. The whole point of this pipeline is normalization, so the prompt has to enforce it. If you don't do this here, you end up writing format conversion code downstream for every variation.
  • Confidence scores. The model rates its own confidence on the invoice number. A 0.95 means it's pretty sure. A 0.4 means the document might not even have an invoice number. This is valuable downstream for deciding what needs human review vs. what can flow through automatically.
  • Give the model a place to flag problems. The notes array lets the AI say "multiple invoice numbers found" or "document appears to be a statement not an invoice." Without this, the model silently picks one and you never know it was uncertain. Much better to have it tell you.
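Even with a locked-down prompt, the response should be validated before anything downstream trusts it. A sketch of the idea (function and constant names are my assumptions, not the pipeline's actual code):

```javascript
// Validate a model response against the expected schema before use.
const REQUIRED_KEYS = ["document_type", "specific_number", "key_fields", "summary", "notes"];

function parseExtraction(raw) {
  // Models occasionally wrap the JSON in extra text; keep the outermost object.
  const start = raw.indexOf("{");
  const end = raw.lastIndexOf("}");
  if (start === -1 || end === -1) throw new Error("No JSON object in model output");
  const data = JSON.parse(raw.slice(start, end + 1)); // throws on malformed JSON

  for (const key of REQUIRED_KEYS) {
    if (!(key in data)) throw new Error(`Missing key: ${key}`);
  }
  // Clamp confidence into [0, 1] so downstream thresholds stay meaningful.
  if (data.specific_number) {
    const c = Number(data.specific_number.confidence) || 0;
    data.specific_number.confidence = Math.min(1, Math.max(0, c));
  }
  return data;
}
```

Malformed output throws, which routes the document to the error path rather than silently writing bad data.
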

Step 3: Write the Results

The Lambda writes a .ai.json file to the processed/ prefix in S3 and upserts a database record using a SQL MERGE on source_file — so reprocessing the same PDF updates instead of duplicating. After writing, the PDF moves from processing/ to processed/, or to error/ if something fails.

S3 bucket prefix structure showing inbox, processing, processed, and error prefixes with PDF and JSON files in each

The Textract Fallback — When OCR Fails

This one surprised me. Some PDFs just don't work well with Textract. Scanned documents with poor quality, unusual layouts, or PDFs that are actually just embedded images. Textract would time out or return very few blocks.

Rather than failing the entire document, I built a fallback: send the PDF directly to Bedrock. Bedrock can accept PDFs directly (up to ~5MB). The extraction quality isn't as good as the two-stage approach — the model has to handle both the spatial and semantic problems — but it's significantly better than returning nothing.

The .ai.json output records whether Textract was used or if it fell back to direct PDF ("status": "FALLBACK_PDF"), so you can audit which documents might need a closer look.

This is a non-ideal workaround. I'm pointing this out for transparency. The direct PDF path loses the structured key-value pairs and table data that Textract provides. But in practice, getting 80% of the fields from a difficult document is better than getting 0%.
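One practical guard worth sketching (based on the roughly 5MB cap mentioned above): check the document size before attempting the direct-PDF fallback, and route oversized files straight to the error path instead of making a call that will fail.

```javascript
// Bedrock's direct PDF input caps out around 5MB, so the fallback is
// only viable for smaller documents.
const MAX_DIRECT_PDF_BYTES = 5 * 1024 * 1024;

function canFallbackToDirectPdf(sizeBytes) {
  return sizeBytes <= MAX_DIRECT_PDF_BYTES;
}
```
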

When Everything Fails — The Error Pipeline Matters

Here's something that doesn't get talked about enough in AI pipeline posts: you need a plan for errors, not just results.

Out of ~15,000 documents, the vast majority processed fine through the two-stage pipeline. But a meaningful number didn't — Textract timed out, Bedrock returned malformed JSON, the PDF was corrupt, the file was too large, or the document just wasn't an invoice at all (contracts, cover letters, blank pages).

When both Textract and the direct PDF fallback fail, the document lands in the error/ prefix in S3 with a detailed .error.json file next to it. That file captures what went wrong — which stage failed, the error message, timestamps. This gives you a clear picture of what needs attention.

But here's the reality: some documents still end up requiring manual entry. No pipeline handles 100% of real-world data. The goal isn't zero errors — it's making the error path visible and manageable. You want to know exactly which documents failed, why they failed, and have a clean way to either fix and reprocess them or route them to a person.

A few things that helped:

  • Error sidecar files per stage. If Textract fails, a .textract-error.json gets written. If Bedrock fails, a .bedrock-error.json gets written. You can tell at a glance which stage broke.
  • The error/ prefix preserves the original folder structure. If a file was in inbox/vendor-123/invoice.pdf, it lands in error/vendor-123/invoice.pdf. Easy to find, easy to reprocess — just move it back to inbox/.
  • Error counting scripts. I built simple scripts to scan the error files and produce counts by error type. This tells you if you have a systemic issue (Textract throttling, Bedrock model changes) vs. one-off bad files.

The pipeline that processes 14,800 out of 15,000 documents automatically is valuable. But the system that tells you exactly which 200 need human attention — and why — is what makes it production-ready.
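The counting scripts themselves can be very simple. A sketch (the stage suffixes come from the sidecar naming above; the function name is mine):

```javascript
// Tally error sidecar files by the stage that produced them, so systemic
// failures (e.g. Textract throttling) stand out from one-off bad files.
function tallyErrorFiles(keys) {
  const counts = { textract: 0, bedrock: 0, other: 0 };
  for (const key of keys) {
    if (key.endsWith(".textract-error.json")) counts.textract++;
    else if (key.endsWith(".bedrock-error.json")) counts.bedrock++;
    else if (key.endsWith(".error.json")) counts.other++;
  }
  return counts;
}
```

Feed it the object keys under the error/ prefix and a lopsided count immediately tells you where to look.
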

What I'd Do Differently

A few things I've been thinking about for the next iteration:

  • Store dates and amounts as proper types. Right now dates are nvarchar and amounts are varchar with dollar signs. This means every comparison requires TRY_CAST and REPLACE gymnastics. It works, but it's fragile. I'd normalize these at write time in the Lambda instead.
  • Add a review queue for low-confidence extractions. The confidence scores are there but nothing surfaces the low-confidence results for human review yet. A second SQS queue that catches anything under 0.7 confidence and routes it to a review UI would be a natural next step.
  • Dead letter queue for persistent failures. Right now failed documents land in the error/ prefix in S3, which works but requires manual inspection. A DLQ with CloudWatch alarms would make this more operationally solid.

Conclusion

The core problem was never "how do I OCR a PDF." It was "how do I take invoices from hundreds of vendors — all with different layouts, labels, and formats — and normalize them into one consistent structure." That's the problem the two-stage pipeline solves.

Textract handles the reading. The AI handles the normalization. You don't write rules per vendor. You don't maintain a mapping table of "Vendor A calls it Reference No., Vendor B calls it Invoice #." The model figures that out, and every document comes out the other side in the same JSON format ready for matching.

If you're evaluating AWS services for document processing, I'd recommend starting with Textract + Bedrock together rather than either one alone. Textract alone gives you raw data in whatever format the vendor decided to use. An LLM alone struggles with the spatial layout of real documents. The combination is where it clicks.

I'm still iterating on the matching and reconciliation side of this — automatically pairing the extracted data with accounts payable import records using vendor number, date, and amount as a composite key. If there's interest, I'll cover that in a follow-up post.

Have you dealt with multi-format invoice extraction? I'd be curious whether you went the rules-based route or let AI handle the normalization. Drop a comment below — I'd love to hear what worked and what didn't.

Invoice reconciliation dashboard showing matched, manual review, and unmatched statuses across vendor invoices
]]>
https://thomasdaly.net/2025/03/04/textract-plus-bedrock-two-stage-ai-pipeline/feed/ 0 3384
Sort Your SharePoint Site Directory Alphabetically https://thomasdaly.net/2025/02/02/sorting-your-sharepoint-site-directory-by-title-with-pnp-search/ https://thomasdaly.net/2025/02/02/sorting-your-sharepoint-site-directory-by-title-with-pnp-search/#comments Sun, 02 Feb 2025 21:53:07 +0000 https://thomasdaly.net/?p=3339 In my previous article, Build a Site Directory with PnP Search Web Parts, I walked through how to create a dynamic site directory using PnP Modern Search. While that setup provides a powerful and flexible way to display SharePoint sites, you may have noticed a limitation—SharePoint doesn’t allow sorting by the Site Title property out of the box.

This article covers the extra step needed to sort your site directory alphabetically by title. The trick? Leveraging RefinableString fields to make the Site Title sortable. It’s a simple process with just a few tweaks, and by the end of this guide, you’ll have a properly sorted site directory in no time. Let’s dive in.

Updating the Search Schema

Navigate to the SharePoint Admin Center

Expand ‘Advanced’ then click ‘More Features’ and finally click ‘Open’ under the ‘Search’ group

Next click ‘Manage Search Schema’

Next enter ‘RefinableString’ in the search box and click the green button

Hover over any of the available properties and find the drop down, then click ‘Edit Map Property’

It’s not critical which RefinableString number you choose

Scroll all the way to the bottom and click ‘Add a Mapping’

Enter ‘display’ and click ‘Find’, then select Basic:displaytitle and click ‘OK’

Verify the mapped managed property and click ‘OK’

This completes the Search Schema changes. They can take quite some time to take effect, so be patient; it could be a day before you see them working.

Next navigate to the SharePoint page with the PnP Search Web Part configured as a Site Directory

NOTE: The following steps are only necessary if you have customized the Custom template.

  1. Edit the page and open the web part properties of the Search Results web part
  2. On the second page, click on the curly braces { }
  3. Take a copy of the entire custom template so you can restore it later

Change the template to Debug. This is so that you can visually see the property to ensure that the search property has taken effect.

Go back to the first page of the web part properties, enter the name of the RefinableString## you modified and hit Enter.

The result should appear on the left-hand side. Most likely it will read null to begin with, but it will eventually populate. It can appear at any time, but give it at least 8 hours or a full day. Recheck the steps in the search configuration; at that point there is not much more to do but wait.

What to do if the site title won’t show up?

If it’s just not showing for a few sites

  • Update the Site Title of the site. Change it temporarily and then change it back
  • Trigger a reindex of just that site

If it’s not showing up for many sites

You can request a site-level reindex here as well, but do so sparingly. Recrawling puts strain on the service as a whole, so it’s not meant to be run over and over, and it can take weeks to finish on very large sites.

After the crawled property appears, move on to the next step.

Apply the Sort

Navigate back to the Search web part page and edit the web part properties

Click ‘Edit sort settings’

Type in your RefinableString## into the Field name box, click Default Sort and then click ‘Add and save’

The sorting is now set to alphabetical order from A-Z by default.

Next go to the second page of the web part properties

Set the template back to what it previously was (in our case, Custom), then click on the { } to edit the custom template

Copy / Paste in the template, Save and Republish.

The final result should be in order by site title.

Wrap Up

Sorting your SharePoint site directory by title might not be possible out of the box, but with a little creativity—leveraging RefinableString fields—it becomes a straightforward solution. By following these steps, you can ensure your directory is organized in a way that makes finding sites easier for users.

This small but impactful tweak enhances usability and keeps your directory structured exactly how you need it. If you’re already using PnP Search Web Parts, this is a great optimization to implement. Have questions or run into issues? Drop a comment—I’d love to hear how this worked for you!

]]>
https://thomasdaly.net/2025/02/02/sorting-your-sharepoint-site-directory-by-title-with-pnp-search/feed/ 2 3339
Build a Site Directory with PnP Search Web Parts https://thomasdaly.net/2025/01/20/build-a-site-directory-with-pnp-search-web-parts/ https://thomasdaly.net/2025/01/20/build-a-site-directory-with-pnp-search-web-parts/#comments Tue, 21 Jan 2025 04:07:34 +0000 https://thomasdaly.net/?p=3324 This article will demonstrate how PnP Search Web Parts can be used to build a comprehensive Site Directory that not only enhances navigation but also improves the overall user experience by providing a centralized resource for accessing various sites within your organization.

Pre-Requisites

PnP Modern Search – Search Web Parts – v4

From your SharePoint site

Instructions

Create a new Page

Edit the Page

Add the PnP Search Results Web Part

Next click ‘Configure’

Next click ‘SharePoint Search’

Next click ‘Customize’ under ‘Layouts slots’

Add SiteUrl — mapped to SPSiteUrl, click ‘Save’

Edit the Query Template, using this as my base. It will show all sites while excluding the content type hub, the app catalog, and personal (My Site / OneDrive) sites.

contentclass:STS_Site -"{TenantUrl}/sites/contenttypehub" -"{TenantUrl}/sites/appcatalog" -SiteTemplate:SPSPERS

** Thanks Kasper Larsen for the comment and suggestion to use the tokens for more reusability! **

Make sure to hit Apply to save the query.

Scroll down to the Paging Options

For our example we want to show as many sites as possible.

Set the number of items per page to 500

Click ‘Next’ to go to the second page of the property pane

Click ‘Custom’ Template and then {}

Replace this CSS

/* Insert your CSS overrides here */
.example-themePrimary a {
    color: {{@root.theme.palette.themePrimary}};
}

.site-logo {
    width: 35px;
    margin-right: 10px;
}

.icon {
    width: 20px;
    height: 16px;
}

ul.template--custom {
    list-style: none;
    padding-left: 5px;
    columns: 4;
}

ul.template--custom li {
    display: flex;
    padding: 8px;
}

.site-link {
    line-height: 20px;
}

.site-link > a {
    display: flex;
    align-items: center;
}

.site-link, .site-link a, .site-link a:visited, .site-link a:hover, .site-link a:link {
    color: #000 !important;
    text-decoration: none;
}

Next replace the item template

<template id="content">
    {{#> resultTypes item=item}}
        {{!-- The block below will be used as default item template if no result types matched --}}
        <li class="site-link">
            <a href="{{slot item @root.slots.SiteUrl}}">
                <img class="site-logo" src="{{slot item @root.slots.SiteUrl}}/_api/siteiconmanager/getsitelogo?type='1'"/>
                <span class="site-name">{{slot item @root.slots.Title}}</span>
            </a>
        </li>
    {{/resultTypes}}
</template>

Click ‘Save’

Click Republish

Your page should now look something like this

Conclusion

This is the first step toward making a Site Directory with all the sites you have access to. You could easily extend this to use a search box to filter these items down or change the KQL query to show only certain types of sites. You might also notice that the sites are not in order. In the next article I’ll walk you through the process of implementing the sort on the Site Name.

]]>
https://thomasdaly.net/2025/01/20/build-a-site-directory-with-pnp-search-web-parts/feed/ 5 3324
SharePoint Site – Automatically Apply Site Templates on Site Creation https://thomasdaly.net/2024/02/20/sharepoint-site-automatically-apply-site-templates-on-site-creation/ https://thomasdaly.net/2024/02/20/sharepoint-site-automatically-apply-site-templates-on-site-creation/#comments Wed, 21 Feb 2024 04:17:16 +0000 https://thomasdaly.net/?p=3313 Applying a consistent site template across newly created SharePoint sites is essential for maintaining uniformity, particularly for departmental sites. While designing and creating templates is straightforward, the application process—requiring manual site visits or PowerShell script executions—is less efficient. This gap underscores the need for automation, eliminating the need for manual interventions or technical scripting. In this article, we introduce a streamlined method to automate site template applications, ensuring consistency and efficiency across your SharePoint environment.

Leverage Power Automate for Seamless SharePoint Site Template Applications

Unlock the power of automation with Power Automate by setting up a new flow using a SharePoint administrator account. Here’s how to streamline your process:

  1. Initiate your Flow: Select the ‘When an item is created’ SharePoint trigger to start your automation whenever a new site is created.
  2. Configure Site Address: Input the custom URL of your Admin site in the format https://{domain}-admin.sharepoint.com to specify where your flow should monitor for new site creations.
  3. Specify the List Name: Enter DO_NOT_DELETE_SPLIST_TENANTADMIN_AGGREGATED_SITECOLLECTIONS as the list name. This special list captures all newly created sites, ensuring no site, including those associated with Microsoft Teams, is overlooked.

By following these steps, you establish a foundation for applying templates automatically to every new SharePoint site, including those created for Microsoft Teams, enhancing consistency and efficiency across your digital workspace.

Streamlining Site Template Application with Power Automate and Azure Functions

Once a new SharePoint site is created, the flow initiates, albeit with a slight delay. This delay is a small trade-off for the automation benefits you gain. At this stage, your flow can send a message to a queue, including crucial site details such as the ‘SiteUrl’.

Important Note: The activation of this flow is not immediate. There may be a brief waiting period before the flow triggers. Despite this, it remains the most efficient automation method I’ve encountered for this purpose.

Next Steps: Leveraging Azure Functions

The automation journey continues with the creation of an Azure Function, specifically designed to respond to queue messages. This function, triggered by the queue, grants you the power to programmatically adjust your SharePoint site, including the application of the desired site template.

For guidance on creating an Azure Function to apply a site template, please consult the Microsoft documentation. This resource provides comprehensive steps and best practices:

Microsoft Documentation on Creating Azure Functions for Site Templates

This link directs you to detailed instructions for leveraging PnP provisioning with Azure Functions, facilitating the application of site templates in SharePoint.

]]>
https://thomasdaly.net/2024/02/20/sharepoint-site-automatically-apply-site-templates-on-site-creation/feed/ 1 3313
Updating Web Part Properties via Azure Function & PowerShell PnP + Searchable Content https://thomasdaly.net/2024/01/06/updating-web-part-properties-via-azure-function-powershell-pnp-searchable-content/ https://thomasdaly.net/2024/01/06/updating-web-part-properties-via-azure-function-powershell-pnp-searchable-content/#respond Sun, 07 Jan 2024 03:31:52 +0000 https://thomasdaly.net/?p=2901 I’ve been working on an SPFx web part that allows users to edit content on the page without being a Contributor on the site. It’s basically a remake of the Out of the Box Text webpart but putting it all together had some interesting challenges that I want to share.

The focus of this blog is not on the exact solution for the text editor but on how the web part calls an Azure Function to update its properties AND make the displayed information searchable.

What are the major problems?

The users of this web part are visitors. Visitors cannot edit pages or list items. This web part allows them to edit the content displayed on the page even though they have no permission to edit the page itself or update its list items.

This is why we need to incorporate the Azure Function. The Azure Function will perform the update using elevated privileges. This allows visitors to edit content on the page without having rights to do so normally.

Architecture

Below is the architecture used in this scenario. The Azure Function uses Azure AD app-only permissions (Sites.ReadWrite.All) and authenticates with a certificate to access the SharePoint site and make the necessary changes.

SPFx Web Part

On the client side, a standard SPFx web part calls an Azure Function with a POST request whose body contains the web part's text content. To persist that content, the web part stores it in a web part property named Content, a string value that the web part renders.
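The shape of that POST body has to line up with what the Azure Function reads from $Request.Body. A sketch of the client side (plain JavaScript for illustration; in SPFx this would live in the web part's TypeScript and be sent with HttpClient or fetch, and the function name is mine):

```javascript
// Build the request body with the exact property names the Azure Function
// expects: SiteURL, Page, WebPartIdentity, PropertyKey, PropertyValue.
function buildUpdateBody(siteUrl, page, webPartIdentity, propertyKey, propertyValue) {
  return JSON.stringify({
    SiteURL: siteUrl,
    Page: page,
    WebPartIdentity: webPartIdentity,
    PropertyKey: propertyKey,
    PropertyValue: propertyValue,
  });
}
```
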

Azure Function – PnP PowerShell

using namespace System.Net

# Input bindings are passed in via param block.
param($Request, $TriggerMetadata)

# Write to the Azure Functions log stream.
Write-Host "PowerShell HTTP trigger function processed a request."

$cert = "$($TriggerMetadata.FunctionDirectory)\cert.pfx"
$cert_password = (ConvertTo-SecureString -String $env:CERT_PASSWORD -AsPlainText -Force)

# Log the PnP.PowerShell module version for diagnostics
$v = (Get-Module "PnP.PowerShell").Version
Write-Host $v

# Parameters for the web part update are read from the POST body

$SiteURL = $Request.Body.SiteURL
if (-not $SiteURL) {
    $errorMessage = "SiteURL parameter can not be empty. "
}
$Page = $Request.Body.Page
if (-not $Page) {
    $errorMessage += "Page parameter can not be empty. "
}
$WebPartIdentity = $Request.Body.WebPartIdentity
if (-not $WebPartIdentity) {
    $errorMessage += "WebPartIdentity parameter can not be empty. "
}
$PropertyKey = $Request.Body.PropertyKey
if (-not $PropertyKey) {
    $errorMessage += "PropertyKey parameter can not be empty. "
}
$PropertyValue = $Request.Body.PropertyValue
if (-not $PropertyValue) {
    $errorMessage += "PropertyValue parameter can not be empty. "
}
 
Write-Host "Request: SiteURL=$SiteURL, Page=$Page, WebPartIdentity=$WebPartIdentity"

if (-not $errorMessage) {
    try {
        Connect-PnPOnline -Url $SiteURL -ClientId $env:APP_CLIENT_ID -CertificatePath $cert -CertificatePassword $cert_password -Tenant $env:APP_TENANT_ID
        Write-Host "Successfully connected"
        $web = Get-PnPWeb
        $webTitle = $web.Title
        Write-Host "Web: $webTitle"

        # Resolve the page by its file name (strip any leading folder path)
        $pageName = $Page.Substring($Page.LastIndexOf("/") + 1)
        $page = Get-PnPPage -Identity $pageName

        $controls = $page.Controls | Where-Object { $WebPartIdentity -eq $_.Title -or $WebPartIdentity -eq $_.WebPartId -or $WebPartIdentity -eq $_.InstanceId }    

        $controls | ForEach-Object {                        
            Write-Host "Updating web part, Title: " $($_.Title) ", InstanceId: " $($_.InstanceId)
            try {
                $webpartJsonObj = ConvertFrom-Json $_.PropertiesJson

                if ($PropertyKey -and $PropertyValue) {
                    # Check if both PropertyKey and PropertyValue are arrays of the same length
                    if ($PropertyKey.Count -eq $PropertyValue.Count) {
                        for ($i = 0; $i -lt $PropertyKey.Count; $i++) {
                            $webpartJsonObj | Add-Member -MemberType NoteProperty -Name $PropertyKey[$i] -Value $PropertyValue[$i] -Force
                        }
                    }
                    else {
                        # Handle the case where the arrays have different lengths
                        $errorMessage = "PropertyKey and PropertyValue arrays must have the same number of elements. $($PropertyKey.Count) $($PropertyValue.Count)" 
                    }
                }
                else {
                    $errorMessage = "PropertyKey and PropertyValue parameters must be provided as arrays."
                }

                if (-not $errorMessage) {
                    # -Depth avoids truncating nested properties (default depth is 2)
                    $_.PropertiesJson = $webpartJsonObj | ConvertTo-Json -Depth 20
                    Write-Host "Web part properties updated!" -ForegroundColor Green
                    $body = "Web part properties updated!"
                }
                else {
                    Write-Host $errorMessage -ForegroundColor Red
                    $body = $errorMessage
                }
            }
            catch {       
                $errorMessage += "Failed updating web part, Title: $($_.Title), InstanceId: $($_.InstanceId), Error: $($_.Exception)"                    
                Write-Host "Failed updating web part, Title: $($_.Title), InstanceId: $($_.InstanceId), Error: $($_.Exception)"
            }
        }

        # Save the changes and publish the page
        Write-Host "Saving Page"
        $page.Save()
        $page.Publish()
        Write-Host "Saved Page"

        try {
            $item = Get-PnPListItem -Id $page.PageListItem.Id -List $page.PagesLibrary.Id -Fields "CanvasContent1"
            $canvasContent = $item["CanvasContent1"]
            
            # Load the HTML document using AngleSharp
            $parser = [AngleSharp.Html.Parser.HtmlParser]::new() 
            $document = $parser.ParseDocument($canvasContent)

            # Define the target element selector
            $targetElementSelector = "div[data-sp-webpartdata*='$($WebPartIdentity)'] div[data-sp-htmlproperties='']"

            # Find the target element
            $targetElement = $document.QuerySelector($targetElementSelector)

            $index = $PropertyKey.IndexOf("searchableTextString")
            if ($index -ne -1) {
                # Create a new div element
                $newDiv = $document.CreateElement("div")
                $newDiv.SetAttribute("data-sp-prop-name", "searchableTextString")
                $newDiv.SetAttribute("data-sp-searchableplaintext", "true")
                $newDiv.TextContent = $PropertyValue[$index]

                # Remove all existing child nodes from the target element
                while ($targetElement.FirstChild) {
                    $null = $targetElement.RemoveChild($targetElement.FirstChild)
                }

                # Append the new div element to the target element
                $targetElement.AppendChild($newDiv)
            }  
            
            # Get the updated HTML content as a string
            $updatedHtmlContent = $document.DocumentElement.OuterHtml

            # Encode special characters as HTML entities, the way SharePoint stores them in CanvasContent1
            $encodedString = $updatedHtmlContent -replace '{', '&#123;' -replace '}', '&#125;' -replace ':', '&#58;' -replace [char]0x00A0, " "

            # Update the CanvasContent1 property with the encoded string
            $item["CanvasContent1"] = $encodedString

            $item.SystemUpdate()
            Invoke-PnPQuery
        }
        catch {
            $status = [HttpStatusCode]::BadRequest   
            $errorMessage += $_.Exception.Message
            Write-Host $_.Exception.Message
        }

        Write-Host "$Page saved and published." -ForegroundColor Green 
        $body += " Page saved and published." 
        $status = [HttpStatusCode]::OK
    }
    catch {
        $status = [HttpStatusCode]::BadRequest   
        $errorMessage += $_.Exception.Message
        Write-Host $_.Exception.Message
    }
}

if (-not $errorMessage) {
    $message = @{
        message = $body
    }
} else {
    $message = @{
        message = $errorMessage
    }
    $status = [HttpStatusCode]::BadRequest
}

# Associate values to output bindings by calling 'Push-OutputBinding'.
Push-OutputBinding -Name Response -Value ([HttpResponseContext]@{
    StatusCode = $status 
    Body       = $message | ConvertTo-Json -Compress
})

Issue – Updating Web Part Properties from the backend (PnP PowerShell) won’t make the content searchable

There is a bit of magic that happens when you edit a page through the UI. When you edit the page and publish, SharePoint updates a page property called CanvasContent1. This property contains the web part properties for every web part on the page as well as the web part content, including the key searchable text strings. All of this magic happens when you use the product as designed, but when you make an update via PnP PowerShell, CanvasContent1 will not be automagically updated.

The sample code above demonstrates a method to update the web part and render a new CanvasContent1. By rebuilding CanvasContent1, your content becomes searchable.
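To make the shape of that searchable markup concrete, here is a minimal TypeScript sketch of the div the PowerShell injects into the web part's data-sp-htmlproperties container. The attribute names come from the script above; the helper name is my own, and this is an illustration of the markup, not the PnP internals.

```typescript
// Builds the searchable-text element SharePoint's crawler picks up:
// a div marked data-sp-searchableplaintext="true" with the plain text inside.
// Helper name and escaping strategy are assumptions for illustration.
function buildSearchableDiv(propName: string, text: string): string {
  // Escape HTML-significant characters before embedding the text as content.
  const escaped = text
    .replace(/&/g, "&amp;")
    .replace(/</g, "&lt;")
    .replace(/>/g, "&gt;");
  return `<div data-sp-prop-name="${propName}" data-sp-searchableplaintext="true">${escaped}</div>`;
}
```

For example, buildSearchableDiv("searchableTextString", "camera parts") produces the same element the PowerShell creates via AngleSharp's CreateElement and SetAttribute calls.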

Community Days / Sessionize Key Deep Dive https://thomasdaly.net/2023/11/25/community-days-sessionize-key-deep-dive/ https://thomasdaly.net/2023/11/25/community-days-sessionize-key-deep-dive/#respond Sat, 25 Nov 2023 23:32:19 +0000 https://thomasdaly.net/?p=3040 Intro

In the previous article, we discussed creating a basic JSON-formatted API endpoint in Sessionize to integrate your event content, including Speakers, Sessions, and Schedule, into your Community Days event listing.

In this article, we’ll delve into each section in more detail and examine how it affects your event listing on Community Days.

Read First: Creating a Sessionize Key for Community Days

Section Highlights:

Sessionize API Endpoint

Let’s get started – in the following sections we cover the items near the bottom of the page, more specifically the Schedule Grid, Sessions, and Speakers options.

Schedule Grid & Sessions Section

RECOMMENDATION: keep these sections in sync with each other

Parts of the API utilize the Schedule Grid, while others rely on Sessions. To ensure the most consistent experience, maintain the same options for both sections.

NOTE: Your submission fields will be specific to your event. Jump to the Submission Fields / Filter Fields section for more info.

For Hybrid / Virtual events

  • Check Live icon with Link – this will add the link of the session in Community Days for easy access
  • Video Recording Link [optional if available]

NOTE: Live links will be displayed only before the event; they won’t be shown once the event is over.

Result

In Community Days, you can find this on the Sessions Tab.

In Community Days, you can access this on the Speakers Tab.

In Community Days, you can find this information on the Schedule Tab.

Adding Live Links

You can add Live Links to either your Sessions directly or to Rooms. Adding Live Links to Rooms will automatically apply to any Session scheduled in that Room.

Rooms Live Stream Links

Result

The result will be the same as in the previous sections. All Sessions scheduled in that Room will receive the Live Stream Link.

Session Live Stream Links

From the Sessionize left menu, Click Session, then Click the edit button on the Session you wish to add a Live Stream Link to.

Enter the ‘Live stream link’ and optionally the ‘Video recording link’, then click ‘Save changes’.

Result

The result will be the same as in the previous sections. That specific Session will now display a Live Stream Link.

NOTE: If you have added both a Room Live Stream link and a Session Live Stream link, the Session Live Stream link takes priority.
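The priority rule above can be expressed as a one-liner. This is illustrative only; the types and names are assumptions, not the Community Days implementation.

```typescript
// A session-level live link always wins over the room-level link;
// the room link is only the fallback when the session has none.
function resolveLiveLink(sessionLink?: string, roomLink?: string): string | undefined {
  return sessionLink ?? roomLink;
}
```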

Submission Fields / Filter Fields

This section shows the fields that you can filter by on the Community Days Sessions Tab. Each event will have different options.

Example: On this call for Speakers page we ask the speakers to provide the Session Format & Track they are applying for.

On the API page we want to include Track so that attendees can filter by this option.

Result

On Community Days, on the event page, Sessions Tab – we now have the option to filter Sessions by Track.

Service Sessions – Registration, Breaks, Lunch, etc.

Service Sessions are blocks of time allocated for Registration, long breaks, Lunch, or non-speaker sessions such as Opening, Closing, or networking sessions.

RECOMMENDATION: It’s highly recommended to include Service Sessions on your Schedule. This lets attendees know about expected breaks so that they can better plan their day, and you can set expectations regarding the event timeline.

Adding Service Sessions

In the left Sessionize Menu – Click Schedule, then Schedule Builder

Click ‘+ Add Service Session’

Enter Detailed Information on this Service Session

Add Service Session to Schedule

Click and drag the Service Session into the Schedule

Click Save Changes

Result

On Community Days, on the event page, Sessions Tab – we now have the new Service Sessions included.

On Community Days, on the event page, Schedule Tab – we now have the new Service Sessions included.

Speakers

This section shows how the Speaker options can add detail to your Speaker listing in Community Days.

Links

Community Days will read in and display the following links:

  • Twitter / X
  • LinkedIn
  • Blog
  • Company Website
  • Sessionize

Result

On Community Days, on the event page, Sessions Tab or Speakers Tab, clicking a Speaker will show their provided links.

Show Top Speakers first

Community Days will automatically showcase Top Speakers if they are marked Top Speakers / Featured.

It is recommended to check this option to Show Top Speakers first in the mobile App.

As stated in the previous section, Community Days will always show Top Speakers as Featured Speakers.

How to Enable a Top / Feature Speaker

From the Sessionize left menu, Click Speakers, then Click the Edit button on the Speaker you wish to turn into a Top / Featured Speaker

Check ‘Is top speaker of this event?’, then Click ‘Save changes’

Result

The selected Speaker(s) will be displayed at the top of the Speaker Tab in your event listing.

Conclusion

This concludes the guide on the various options in Sessionize and how they tie into Community Days. If you need more help, please leave me a comment or head over to CommunityDays.org and fill out our feedback form.
