
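About
Azure AI Document Intelligence .NET SDK. Extract text, tables, and structured data from documents using prebuilt and custom models.
name: azure-ai-document-intelligence-dotnet
description: Azure AI Document Intelligence SDK (.NET). Extract text, tables, and structured data from documents using prebuilt and custom models.
risk: unknown
source: community
date_added: '2026-02-27'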
Azure.AI.DocumentIntelligence (.NET)
Extract text, tables, and structured data from documents using prebuilt and custom models.
Installation
dotnet add package Azure.AI.DocumentIntelligence
dotnet add package Azure.Identity
Current version: v1.0.0 (GA)
Environment Variables
DOCUMENT_INTELLIGENCE_ENDPOINT=https://<resource-name>.cognitiveservices.azure.com/
DOCUMENT_INTELLIGENCE_API_KEY=<your-api-key>
BLOB_CONTAINER_SAS_URL=https://<storage>.blob.core.windows.net/<container>?<sas-token>
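A missing variable otherwise surfaces later as a null argument inside the client constructor. The sketch below reads a required setting and fails fast with a readable message. `RequireEnv` is a hypothetical helper introduced for illustration, not part of the SDK.

```csharp
using System;

// Read the endpoint up front so configuration mistakes are reported early.
try
{
    Uri endpoint = new Uri(RequireEnv("DOCUMENT_INTELLIGENCE_ENDPOINT"));
    Console.WriteLine($"Using endpoint host: {endpoint.Host}");
}
catch (InvalidOperationException ex)
{
    // Raised by RequireEnv when the variable is unset or blank.
    Console.WriteLine(ex.Message);
}

// Hypothetical helper: return the variable's value or throw a descriptive error.
static string RequireEnv(string name)
{
    var value = Environment.GetEnvironmentVariable(name);
    if (string.IsNullOrWhiteSpace(value))
        throw new InvalidOperationException($"Environment variable '{name}' is not set.");
    return value;
}
```

The same helper can be reused for `DOCUMENT_INTELLIGENCE_API_KEY` and `BLOB_CONTAINER_SAS_URL` when those are needed.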
Authentication
Microsoft Entra ID (recommended)
using Azure.Identity;
using Azure.AI.DocumentIntelligence;
string endpoint = Environment.GetEnvironmentVariable("DOCUMENT_INTELLIGENCE_ENDPOINT");
var credential = new DefaultAzureCredential();
var client = new DocumentIntelligenceClient(new Uri(endpoint), credential);
Note: Entra ID requires a custom subdomain endpoint (for example, https://<resource-name>.cognitiveservices.azure.com/), not a regional endpoint.
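A quick way to catch the wrong endpoint type before authenticating is a host-name check. This is a heuristic sketch, not an SDK feature: `LooksLikeCustomSubdomain` is a hypothetical helper, and it assumes the public Azure cloud, where custom subdomains end in `.cognitiveservices.azure.com` and regional endpoints look like `https://<region>.api.cognitive.microsoft.com/`. Sovereign clouds use other suffixes and would need their own entries.

```csharp
using System;

// Hypothetical helper: true when the host has the custom-subdomain suffix
// used in the public Azure cloud (an assumption; adjust for other clouds).
static bool LooksLikeCustomSubdomain(Uri endpoint)
{
    string host = endpoint.Host.ToLowerInvariant();
    return host.EndsWith(".cognitiveservices.azure.com", StringComparison.Ordinal);
}

Console.WriteLine(LooksLikeCustomSubdomain(
    new Uri("https://my-resource.cognitiveservices.azure.com/"))); // True
Console.WriteLine(LooksLikeCustomSubdomain(
    new Uri("https://westus.api.cognitive.microsoft.com/")));      // False
```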
API Key
using Azure; // AzureKeyCredential lives in the Azure namespace
using Azure.AI.DocumentIntelligence;
string endpoint = Environment.GetEnvironmentVariable("DOCUMENT_INTELLIGENCE_ENDPOINT");
string apiKey = Environment.GetEnvironmentVariable("DOCUMENT_INTELLIGENCE_API_KEY");
var client = new DocumentIntelligenceClient(new Uri(endpoint), new AzureKeyCredential(apiKey));
Client Types
| Client | Purpose |
|--------|---------|
| DocumentIntelligenceClient | Analyze and classify documents |
| DocumentIntelligenceAdministrationClient | Build and manage custom models and classifiers |
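The administration client is constructed the same way as the analysis client. The sketch below, assuming Entra ID credentials and the `DOCUMENT_INTELLIGENCE_ENDPOINT` variable from the earlier sections, lists the models visible to the resource. It needs a live resource to run, so treat it as an outline and not a verified transcript.

```csharp
using System;
using Azure.AI.DocumentIntelligence;
using Azure.Identity;

string endpoint = Environment.GetEnvironmentVariable("DOCUMENT_INTELLIGENCE_ENDPOINT")
    ?? throw new InvalidOperationException("DOCUMENT_INTELLIGENCE_ENDPOINT is not set.");

// Same endpoint and credential types as DocumentIntelligenceClient.
var adminClient = new DocumentIntelligenceAdministrationClient(
    new Uri(endpoint), new DefaultAzureCredential());

// GetModelsAsync returns an async pageable; await foreach walks every page.
await foreach (DocumentModelDetails model in adminClient.GetModelsAsync())
{
    Console.WriteLine($"{model.ModelId}: {model.Description}");
}
```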
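Prebuilt Models
| Model ID | Description |
|----------|-------------|
| prebuilt-read | Extracts text, languages, and handwriting |
| prebuilt-layout | Extracts text, tables, selection marks, and structure |
| prebuilt-invoice | Extracts invoice fields (vendor, items, total) |
| prebuilt-receipt | Extracts receipt fields (merchant, items, total) |
| prebuilt-idDocument | Extracts identity document fields (name, date of birth, address) |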
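| prebuilt-businessCard | Extracts business card fields (reported as deprecated in the v4.0 API; confirm availability for your API version) |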
| prebuilt-tax.us.w2 | Extracts W-2 tax form fields |
| prebuilt-healthInsuranceCard.us | Extracts health insurance card fields |
Core Workflows
1. Analyze an Invoice
using Azure; // Operation<T> and WaitUntil
using Azure.AI.DocumentIntelligence;

Uri invoiceUri = new Uri("https://example.com/invoice.pdf");

// Start the analysis and wait for the long-running operation to complete.
Operation<AnalyzeResult> operation = await client.AnalyzeDocumentAsync(
    WaitUntil.Completed,
    "prebuilt-invoice",
    invoiceUri);
AnalyzeResult result = operation.Value;

foreach (AnalyzedDocument document in result.Documents)
{
    // Check the field type before reading the typed value.
    if (document.Fields.TryGetValue("VendorName", out DocumentField vendorNameField)
        && vendorNameField.FieldType == DocumentFieldType.String)
    {
        string vendorName = vendorNameField.ValueString;
        Console.WriteLine($"Vendor Name: '{vendorName}', confidence: {vendorNameField.Confidence}");
    }

    if (document.Fields.TryGetValue("InvoiceTotal", out DocumentField invoiceTotalField)
        && invoiceTotalField.FieldType == DocumentFieldType.Currency)
    {
        CurrencyValue invoiceTotal = invoiceTotalField.ValueCurrency;
        Console.WriteLine($"Invoice Total: '{invoiceTotal.CurrencySymbol}{invoiceTotal.Amount}'");
    }

    if (document.Fields.TryGetValue("Items", out DocumentField itemsField)
        && itemsField.FieldType == DocumentFieldType.List)
    {
        foreach (DocumentField item in itemsField.ValueList)
        {
            // Each line item is a dictionary of sub-fields; skip anything else.
            if (item.FieldType != DocumentFieldType.Dictionary)
                continue;

            var itemFields = item.ValueDictionary;
            if (itemFields.TryGetValue("Description", out DocumentField descField)
                && descField.FieldType == DocumentFieldType.String)
            {
                Console.WriteLine($"  Item: {descField.ValueString}");
            }
        }
    }
}
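The invoice code prints each field's confidence but does not act on it. One common follow-up is to route low-confidence fields to manual review. The sketch below uses stand-in `(name, confidence)` pairs in place of live `DocumentField` values, so it runs without a service call. `FieldsNeedingReview`, the sample values, and the 0.80 threshold are all illustrative assumptions, and the nullable `float` mirrors how the SDK exposes `Confidence` as far as I know.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper: names of fields whose confidence is missing or below the threshold.
static List<string> FieldsNeedingReview(
    IEnumerable<(string Name, float? Confidence)> fields, float threshold)
{
    return fields
        // A null confidence is treated as "unknown", so it is flagged as well.
        .Where(f => f.Confidence is null || f.Confidence < threshold)
        .Select(f => f.Name)
        .ToList();
}

// Stand-in values; with a live result these would come from document.Fields.
var sample = new (string Name, float? Confidence)[]
{
    ("VendorName", 0.97f),
    ("InvoiceTotal", 0.62f),
    ("InvoiceDate", null),
};

foreach (string name in FieldsNeedingReview(sample, 0.80f))
    Console.WriteLine($"Review manually: {name}");
// Prints:
// Review manually: InvoiceTotal
// Review manually: InvoiceDate
```

A suitable threshold depends on the document quality and the cost of an error, so it is best tuned against a labeled sample.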
2. Extract Layout (Text, Tables, Structure)
Uri fileUri = new Uri("https://example.com/document.pdf");

Operation<AnalyzeResult> operation = await client.AnalyzeDocumentAsync(
    WaitUntil.Completed,
    "prebuilt-layout",
    fileUri);
AnalyzeResult result = operation.Value;

foreach (DocumentPage page in result.Pages)
{
    Console.WriteLine($"Page {page.PageNumber}: {page.Lines.Count} lines, {page.Words.Count} words");
    foreach (DocumentLine line in page.Lines)
    {
        Console.WriteLine($"  Line: '{line.Content}'");
    }
}

foreach (DocumentTable table in result.Tables)
{
    Console.WriteLine($"Table: {table.RowCount} rows x {table.ColumnCount} columns");
    foreach (DocumentTableCell cell in table.Cells)
    {
        Console.WriteLine($"  Cell ({cell.RowIndex}, {cell.ColumnIndex}): {cell.Content}");
    }
}
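The layout result exposes table cells as a flat list with row and column indices. To work with a table as rows, the cells can be placed into a two-dimensional grid. The sketch below uses plain tuples as stand-ins for `DocumentTableCell` so it runs offline. `ToGrid` is a hypothetical helper, and it ignores row and column spans, so a merged cell fills only its top-left position.

```csharp
using System;
using System.Collections.Generic;

// Hypothetical helper: place each cell's content at [row, column]; gaps stay empty.
static string[,] ToGrid(int rowCount, int columnCount,
    IEnumerable<(int RowIndex, int ColumnIndex, string Content)> tableCells)
{
    var filled = new string[rowCount, columnCount];
    for (int r = 0; r < rowCount; r++)
        for (int c = 0; c < columnCount; c++)
            filled[r, c] = string.Empty;

    foreach (var tableCell in tableCells)
        filled[tableCell.RowIndex, tableCell.ColumnIndex] = tableCell.Content;

    return filled;
}

// Stand-in cells; with a live result these would come from table.Cells.
var sampleCells = new[]
{
    (RowIndex: 0, ColumnIndex: 0, Content: "Item"),
    (RowIndex: 0, ColumnIndex: 1, Content: "Price"),
    (RowIndex: 1, ColumnIndex: 0, Content: "Widget"),
    (RowIndex: 1, ColumnIndex: 1, Content: "9.99"),
};

string[,] grid = ToGrid(2, 2, sampleCells);
for (int r = 0; r < grid.GetLength(0); r++)
{
    var row = new List<string>();
    for (int c = 0; c < grid.GetLength(1); c++)
        row.Add(grid[r, c]);
    Console.WriteLine(string.Join(" | ", row));
}
// Prints:
// Item | Price
// Widget | 9.99
```

With a real `DocumentTable`, `table.RowCount` and `table.ColumnCount` supply the grid dimensions.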
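3. Analyze a Receipt
// Placeholder URL; replace with the location of your receipt image or PDF.
Uri receiptUri = new Uri("https://example.com/receipt.jpg");

Operation<AnalyzeResult> operation = await client.AnalyzeDocumentAsync(
    WaitUntil.Completed,
    "prebuilt-receipt",
    receiptUri);
AnalyzeResult result = operation.Value;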
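The receipt call returns the same `AnalyzeResult` shape as the invoice workflow, so fields are read the same way. The sketch below is a self-contained outline that assumes Entra ID credentials, a placeholder receipt URL, and the `MerchantName`, `TransactionDate`, and `Total` field names from the prebuilt receipt schema. It prints each field's raw `Content` rather than a typed value, which sidesteps differences in field types between API versions. It has not been run against a live resource.

```csharp
using System;
using Azure;
using Azure.AI.DocumentIntelligence;
using Azure.Identity;

string endpoint = Environment.GetEnvironmentVariable("DOCUMENT_INTELLIGENCE_ENDPOINT")
    ?? throw new InvalidOperationException("DOCUMENT_INTELLIGENCE_ENDPOINT is not set.");
var client = new DocumentIntelligenceClient(new Uri(endpoint), new DefaultAzureCredential());

// Placeholder URL (an assumption); point this at a real receipt.
Uri receiptUri = new Uri("https://example.com/receipt.jpg");

Operation<AnalyzeResult> operation = await client.AnalyzeDocumentAsync(
    WaitUntil.Completed, "prebuilt-receipt", receiptUri);

// Field names assumed from the prebuilt receipt schema.
string[] fieldNames = { "MerchantName", "TransactionDate", "Total" };

foreach (AnalyzedDocument receipt in operation.Value.Documents)
{
    foreach (string fieldName in fieldNames)
    {
        // Content is the text as it appears on the document, whatever the field type.
        if (receipt.Fields.TryGetValue(fieldName, out DocumentField field))
            Console.WriteLine($"{fieldName}: '{field.Content}' (confidence: {field.Confidence})");
    }
}
```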