Skill quality: 0.70

azure-hdinsight

Expert knowledge for Azure HDInsight development including troubleshooting, best practices, decision making, architecture & design patterns, limits & quotas, security, configuration, integrations & coding patterns, and deployment. Use when working with HDInsight Spark/Hive/Kafka/

Price: free
Protocol: skill
Verified: no

What it does

Azure HDInsight Skill

This skill provides expert guidance for Azure HDInsight. It covers troubleshooting, best practices, decision making, architecture & design patterns, limits & quotas, security, configuration, integrations & coding patterns, and deployment, and it combines local quick-reference content with remote documentation fetching.

How to Use This Skill

IMPORTANT for Agent: Use the Category Index below to locate relevant sections. For categories with line ranges (e.g., L35-L120), use read_file with the specified lines. For categories with file links (e.g., [security.md](security.md)), use read_file on the linked reference file.

IMPORTANT for Agent: If metadata.generated_at is more than 3 months old, suggest that the user pull the latest version from the repository. If the mcp_microsoftdocs tools are not available, suggest that the user install them: Installation Guide
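The freshness check on metadata.generated_at could be sketched as follows. This is a minimal illustration, not part of the skill: it assumes the timestamp is an ISO 8601 string and approximates "3 months" as 90 days, and the names `STALE_AFTER` and `is_stale` are hypothetical.

```python
from datetime import datetime, timedelta, timezone
from typing import Optional

# Assumption: "more than 3 months old" is approximated as more than 90 days.
STALE_AFTER = timedelta(days=90)


def is_stale(generated_at: str, now: Optional[datetime] = None) -> bool:
    """Return True when the timestamp is older than the staleness threshold.

    Assumption: generated_at is an ISO 8601 string such as
    "2024-01-01T00:00:00+00:00" or "2024-01-01".
    """
    generated = datetime.fromisoformat(generated_at)
    if generated.tzinfo is None:
        # Treat timestamps without an offset as UTC.
        generated = generated.replace(tzinfo=timezone.utc)
    current = now or datetime.now(timezone.utc)
    return current - generated > STALE_AFTER
```

When `is_stale` returns True, the agent would suggest pulling the latest version of the skill.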

This skill requires network access to fetch documentation content:

  • Preferred: Use mcp_microsoftdocs:microsoft_docs_fetch with query string from=learn-agent-skill. Returns Markdown.
  • Fallback: Use fetch_webpage with query string from=learn-agent-skill&accept=text/markdown. Returns Markdown.
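As a rough sketch, the two query strings above can be appended to a documentation URL like this. The helper name `with_skill_params` is hypothetical, and how each tool actually receives the URL is an assumption; the code only builds the string.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def with_skill_params(url: str, markdown: bool = False) -> str:
    """Append from=learn-agent-skill (and optionally accept=text/markdown)."""
    parts = urlsplit(url)
    # Keep any query parameters the URL already carries.
    query = dict(parse_qsl(parts.query))
    query["from"] = "learn-agent-skill"
    if markdown:
        # Fallback path: ask for Markdown explicitly.
        query["accept"] = "text/markdown"
    # safe="/" keeps the slash in text/markdown unescaped.
    return urlunsplit(parts._replace(query=urlencode(query, safe="/")))


preferred = with_skill_params(
    "https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-troubleshoot-guide"
)
fallback = with_skill_params(
    "https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-troubleshoot-guide",
    markdown=True,
)
```

Here `preferred` carries the query string for the preferred tool and `fallback` the one for fetch_webpage.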

Category Index

| Category | Lines | Description |
| --- | --- | --- |
| Troubleshooting | L37-L132 | Diagnosing and fixing HDInsight cluster issues: creation/auth, networking, storage, Ambari/HDFS/Hive/HBase/Kafka/Spark/YARN problems, performance, disk/CPU, and known error codes/workarounds. |
| Best Practices | L133-L174 | Best practices for designing, securing, monitoring, scaling, and tuning HDInsight clusters and workloads (Hadoop, Spark, Hive, HBase, Kafka), including storage, migration, and performance optimization. |
| Decision Making | L175-L199 | Planning and migration guidance for HDInsight: sizing and performance, choosing storage/VMs/tools, upgrading versions/components, and moving Hadoop, HBase, Kafka, and configs to newer clusters. |
| Architecture & Design Patterns | L200-L214 | HDInsight cluster architecture, security/VNet design, HA/DR and business continuity patterns, migration from on-prem Hadoop, shared storage, streaming (Spark/YARN), and Oozie-based pipelines. |
| Limits & Quotas | L215-L222 | Guidance on HDInsight capacity limits: log size/retention, supported cluster node sizes, external metastore constraints, and requesting/managing CPU core quota increases. |
| Security | L223-L266 | Securing HDInsight clusters: identity and access (Entra, LDAP, Ranger, RBAC), network isolation (NSG, Private Link), TLS/encryption, Kafka/Hive/Spark security, and security best practices. |
| Configuration | L267-L323 | Configuring and tuning HDInsight clusters: networking/VPN, Ambari/Hive/Spark/HBase settings, autoscale, monitoring/logging, SSH/Jupyter/VS Code access, and script-based customizations. |
| Integrations & Coding Patterns | L324-L391 | Patterns and code samples for integrating HDInsight (Hive, Spark, Kafka, HBase, MapReduce, Sqoop) with tools, SDKs, REST/CLI, and external services like SQL, Cosmos DB, Power BI, IoT, and Synapse. |
| Deployment | L392-L405 | Creating, configuring, migrating, and automating HDInsight clusters (Hadoop, HBase, Kafka) using portal, CLI, PowerShell, ARM/REST, Data Factory, Marketplace, AMA, and runbooks. |
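
The line ranges in the index refer to line numbers in the skill file itself. A minimal sketch of turning a range such as L37-L132 into a slice of that file might look like this; `parse_line_range` and `read_lines` are hypothetical helpers standing in for the agent's read_file tool, whose real signature is not given here.

```python
import re
from typing import List, Tuple


def parse_line_range(spec: str) -> Tuple[int, int]:
    """Parse a range like 'L37-L132' into 1-indexed, inclusive bounds."""
    match = re.fullmatch(r"L(\d+)-L(\d+)", spec.strip())
    if match is None:
        raise ValueError(f"not a line range: {spec!r}")
    start, end = int(match.group(1)), int(match.group(2))
    if start < 1 or start > end:
        raise ValueError(f"invalid bounds in {spec!r}")
    return start, end


def read_lines(text: str, spec: str) -> List[str]:
    """Return the lines of text covered by the range (hypothetical helper)."""
    start, end = parse_line_range(spec)
    # Convert the 1-indexed inclusive range to a Python slice.
    return text.splitlines()[start - 1:end]
```

For example, `read_lines(skill_text, "L215-L222")` would return the Limits & Quotas rows, assuming `skill_text` holds the full skill file.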

Troubleshooting

| Topic | URL |
| --- | --- |
| Address reliability issues on older HDInsight images | https://learn.microsoft.com/en-us/azure/hdinsight/cluster-reliability-issues |
| Fix component version validation errors in HDInsight ARM templates | https://learn.microsoft.com/en-us/azure/hdinsight/component-version-validation-error-arm-templates |
| Troubleshoot Azure HDInsight cluster creation errors | https://learn.microsoft.com/en-us/azure/hdinsight/create-cluster-error-dictionary |
| Troubleshoot authentication issues for secure HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/domain-joined/domain-joined-authentication-issues |
| Run diagnostic script when HDInsight cluster creation fails with DomainNotFound | https://learn.microsoft.com/en-us/azure/hdinsight/domain-joined/sample-script |
| Fix DomainNotFound errors during HDInsight cluster creation | https://learn.microsoft.com/en-us/azure/hdinsight/domain-joined/troubleshoot-domainnotfound |
| Fix Apache Ambari directory alerts in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-ambari-troubleshoot-directory-alerts |
| Troubleshoot Ambari UI down hosts and services in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-ambari-troubleshoot-down-hosts-services |
| Fix Apache Ambari UI 502 errors on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-ambari-troubleshoot-fivezerotwo-error |
| Resolve Apache Ambari heartbeat issues in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-ambari-troubleshoot-heartbeat-issues |
| Troubleshoot Apache Ambari Metrics Collector on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-ambari-troubleshoot-metricservice-issues |
| Resolve Apache Ambari stale alerts in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-ambari-troubleshoot-stale-alerts |
| Fix local HDFS stuck in safe mode on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/hdinsight-hdfs-troubleshoot-safe-mode |
| Fix HDInsight cluster creation failures | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/hdinsight-troubleshoot-cluster-creation-fails |
| Convert service principal certificates to base-64 for HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/hdinsight-troubleshoot-converting-service-principal-certificate |
| Resolve Data Lake storage file access issues in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/hdinsight-troubleshoot-data-lake-files |
| Fix InvalidNetworkSecurityGroupSecurityRules for HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/hdinsight-troubleshoot-invalidnetworksecuritygroupsecurityrules-cluster-creation-fails |
| Resolve HDInsight node disk space exhaustion | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/hdinsight-troubleshoot-out-disk-space |
| Fix Watchdog BUG soft lockup CPU errors in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/hdinsight-troubleshoot-soft-lockup-cpu |
| Resolve node addition failures in HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/hdinsight-troubleshoot-unable-add-nodes |
| Troubleshoot login failures to HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/hdinsight-troubleshoot-unable-log-in-cluster |
| Manage and troubleshoot disk space issues in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/troubleshoot-disk-space |
| Resolve InvalidNetworkConfigurationErrorCode in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/troubleshoot-invalidnetworkconfigurationerrorcode-cluster-creation-fails |
| Restore Key Vault access for encrypted HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/troubleshoot-lost-key-vault-access |
| Fix port conflicts when starting HDInsight services | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/troubleshoot-port-conflict |
| Fix 'account does not support http' storage errors in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/troubleshoot-wasbs-storage-exception |
| Fix invalid BCFile errors when reading YARN logs | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/troubleshoot-yarn-log-invalid-bcfile |
| Resolve BindException address-in-use on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/hbase-troubleshoot-bindexception-address-use |
| Fix HBase hbck inconsistency errors on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/hbase-troubleshoot-hbase-hbck-inconsistencies |
| Troubleshoot pegged CPU on HBase region servers | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/hbase-troubleshoot-pegged-cpu-region-server |
| Resolve Apache Phoenix connectivity issues on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/hbase-troubleshoot-phoenix-connectivity |
| Fix missing data in Phoenix views after HDP upgrade | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/hbase-troubleshoot-phoenix-no-data |
| Fix HBase REST service not responding on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/hbase-troubleshoot-rest-not-spending |
| Fix HBase Master startup failures on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/hbase-troubleshoot-start-fails |
| Resolve storage exceptions after connection reset | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/hbase-troubleshoot-storage-exception-reset |
| Resolve timeouts with hbase hbck on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/hbase-troubleshoot-timeouts-hbase-hbck |
| Troubleshoot HBase region server issues on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/hbase-troubleshoot-unassigned-regions |
| Fix HBase TTL data retention issues on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/troubleshoot-data-retention-issues-expired-data |
| Troubleshoot HBase REST API issues on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/troubleshoot-rest-api |
| Access and interpret YARN application logs on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-access-yarn-app-logs-linux |
| Enable and collect Hadoop heap dumps on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-collect-debug-heap-dump-linux |
| Resolve Hive out-of-memory errors on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-hive-out-of-memory-error-oom |
| Lookup and resolve Hadoop stack trace errors on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-stack-trace-error-messages |
| Understand and resolve WebHCat errors on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-templeton-webhcat-debug-errors |
| Known issues and troubleshooting for Azure HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-known-issues |
| Fix Ambari access failures after certificate rotation | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-known-issues-ambari-access-certificate-issue |
| Resolve Ambari user switch issues on HDInsight 5.1 | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-known-issues-ambari-users-cache |
| Recover HDInsight headnodes from /tmp disk usage leak | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-known-issues-cluster-head-node-unresponsive |
| Mitigate conda version regression on HDInsight 5.1 | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-known-issues-conda-version-regression |
| Resolve Ranger startup failures on ESP HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-known-issues-ranger-cluster-create-failure |
| Diagnose slow or failing jobs on HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-troubleshoot-failed-cluster |
| HDInsight troubleshooting guide index | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-troubleshoot-guide |
| Troubleshoot HDFS issues in Azure HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-troubleshoot-hdfs |
| Common Hive issues and fixes on Azure HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-troubleshoot-hive |
| Troubleshoot YARN issues in Azure HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-troubleshoot-yarn |
| Restore error messages in Ambari Hive View on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/interactive-query-troubleshoot-error-message-hive-view |
| Resolve Hive log disk space issues on HDInsight head nodes | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/interactive-query-troubleshoot-hive-logs-diskspace-full-headnodes |
| Fix Hive View inaccessibility due to Zookeeper issues | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/interactive-query-troubleshoot-inaccessible-hive-view |
| Troubleshoot Hive join OutOfMemory GC overhead errors | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/interactive-query-troubleshoot-outofmemory-overhead-exceeded |
| Resolve permission denied errors creating Hive tables | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/interactive-query-troubleshoot-permission-error-create-table |
| Diagnose poor Hive LLAP query performance in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/interactive-query-troubleshoot-query-performance |
| Fix slow reducers and data skew in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/interactive-query-troubleshoot-slow-reducer |
| Troubleshoot Apache Tez application hangs in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/interactive-query-troubleshoot-tez-hangs |
| Fix slow or failing Ambari Tez View in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/interactive-query-troubleshoot-tez-view-slow |
| Fix Hive View query result timeout in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/interactive-query-troubleshoot-view-time-out |
| Correct Hive JDBC URL in Zeppelin interpreter on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/interactive-query-troubleshoot-zookeeperhiveclientexception-hiveserver-configs |
| Resolve Ambari Hive View gateway timeout exceptions | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/troubleshoot-gateway-timeout |
| Troubleshoot Hive LLAP workload management issues | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/troubleshoot-workload-management-issues |
| Resolve Kafka broker startup failures from full disks | https://learn.microsoft.com/en-us/azure/hdinsight/kafka/kafka-troubleshoot-full-disk |
| Fix HDInsight Kafka error: insufficient fault domains | https://learn.microsoft.com/en-us/azure/hdinsight/kafka/kafka-troubleshoot-insufficient-domains |
| Debug Spark apps using HDInsight History Server extensions | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-azure-spark-history-server |
| Debug Spark job failures with IntelliJ Azure Toolkit | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-intellij-tool-failure-debug |
| Remotely debug Apache Spark apps on HDInsight via IntelliJ | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-intellij-tool-plugin-debug-jobs-remotely |
| Debug HDInsight Spark jobs with YARN and Spark UIs | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-job-debugging |
| Known issues and workarounds for HDInsight Spark clusters | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-known-issues |
| Troubleshoot Spark Streaming apps stopping after 24 days | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-troubleshoot-application-stops |
| Fix Jupyter 404 'Blocking Cross Origin API' on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-troubleshoot-blocking-cross-origin |
| Resolve RequestBodyTooLarge errors in HDInsight Spark streaming | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-troubleshoot-event-log-requestbodytoolarge |
| Fix IllegalArgumentException in HDInsight Spark activities | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-troubleshoot-illegalargumentexception |
| Resolve InvalidClassException version mismatch in HDInsight Spark | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-troubleshoot-job-fails-invalidclassexception |
| Fix NoClassDefFoundError for Spark-Kafka on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-troubleshoot-job-fails-noclassdeffounderror |
| Improve slow Spark jobs with many Azure Storage files | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-troubleshoot-job-slowness-container |
| Resolve OutOfMemoryError in HDInsight Spark clusters | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-troubleshoot-outofmemory |
| Resolve RpcTimeoutException and 502 errors in Spark Thrift on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-troubleshoot-rpctimeoutexception |
| Troubleshoot large result downloads via JDBC/ODBC and Thrift on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-troubleshoot-sparkexception-kryo-serialization-failed |
| Common Spark issues and fixes on Azure HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-troubleshoot-spark |
| Debug WASB file operations for HDInsight storage issues | https://learn.microsoft.com/en-us/azure/hdinsight/spark/troubleshoot-debug-wasb |
| Fix Jupyter Notebook creation issues on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/spark/troubleshoot-jupyter-notebook-convert |
| Troubleshoot Apache Oozie workflows on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/troubleshoot-oozie |
| Resolve Azure HDInsight resource creation capacity errors | https://learn.microsoft.com/en-us/azure/hdinsight/troubleshoot-resource-creation-fails |
| Troubleshoot script action failures in Azure HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/troubleshoot-script-action |
| Work around Sqoop import/export failures on ESP HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/troubleshoot-sqoop |

Best Practices

| Topic | URL |
| --- | --- |
| Use Azure Monitor logs for HDInsight availability | https://learn.microsoft.com/en-us/azure/hdinsight/cluster-availability-monitor-logs |
| Apply cluster management best practices in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/cluster-management-best-practices |
| Apply general best practices for HDInsight Enterprise Security | https://learn.microsoft.com/en-us/azure/hdinsight/domain-joined/general-guidelines |
| Plan and execute data migration to Azure HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-on-premises-migration-best-practices-data-migration |
| Apply infrastructure best practices for Azure HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-on-premises-migration-best-practices-infrastructure |
| Implement storage best practices for HDInsight migrations | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-on-premises-migration-best-practices-storage |
| Optimize HDInsight HBase with Accelerated Writes | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/apache-hbase-accelerated-writes |
| Apply HBase performance advisor recommendations on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/apache-hbase-advisor |
| Tune Apache Phoenix performance on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/apache-hbase-phoenix-performance |
| Tune Apache HBase performance on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/troubleshoot-hbase-performance-issues |
| Scale HiveServer2 on HDInsight using edge nodes | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-apps-install-hiveserver2 |
| Monitor HDInsight availability with Apache Ambari | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-cluster-availability |
| Create HDInsight clusters with secure transfer-enabled storage accounts | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-create-linux-clusters-with-secure-transfer-storage |
| Apply Linux-specific tips for Hadoop on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-linux-information |
| Optimize Apache Hive query performance on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-optimize-hive-query |
| Monitor and optimize HDInsight cluster performance | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-key-scenarios-to-monitor |
| Schedule and apply OS patches for HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-os-patching |
| Apply pre-creation best practices for HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-overview-before-you-start |
| Manually scale HDInsight clusters for workload patterns | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-scaling-best-practices |
| Apply gateway best practices for Hive on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/gateway-best-practices |
| Operate LLAP schedule-based autoscale on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/llap-schedule-based-autoscale-best-practices |
| Configure Kafka partition replicas for high availability | https://learn.microsoft.com/en-us/azure/hdinsight/kafka/apache-kafka-high-availability |
| Mirror Kafka topics between HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/kafka/apache-kafka-mirroring |
| Tune Kafka on HDInsight for optimal performance | https://learn.microsoft.com/en-us/azure/hdinsight/kafka/apache-kafka-performance-tuning |
| Configure managed disks to scale Kafka on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/kafka/apache-kafka-scalability |
| Migrate HDInsight Log Analytics data to new tables | https://learn.microsoft.com/en-us/azure/hdinsight/log-analytics-migration |
| Use Azure Storage effectively as HDInsight default filesystem | https://learn.microsoft.com/en-us/azure/hdinsight/overview-azure-storage |
| Leverage Data Lake Storage Gen2 with HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/overview-data-lake-storage-gen2 |
| Optimize Apache Spark job performance on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-perf |
| Manage Python packages for Jupyter on HDInsight Spark | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-python-package-installation |
| Configure Spark Streaming on HDInsight for exactly-once processing | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-streaming-exactly-once |
| Optimize Apache Spark cluster configuration on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/spark/optimize-cluster-configuration |
| Optimize data processing operations for Spark on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/spark/optimize-data-processing |
| Optimize data storage for Apache Spark on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/spark/optimize-data-storage |
| Tune Apache Spark memory usage on HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/spark/optimize-memory-usage |
| Safely manage JAR dependencies on HDInsight Spark | https://learn.microsoft.com/en-us/azure/hdinsight/spark/safely-manage-jar-dependency |
| Apply Apache Spark performance guidelines on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/spark/spark-best-practices |
| Use SparkCruise to optimize Spark queries on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/spark/spark-cruise |

Decision Making

| Topic | URL |
| --- | --- |
| Plan ETL at scale with Azure HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-etl-at-scale |
| Assess benefits of migrating on-premises Hadoop to Azure HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-on-premises-migration-motivation |
| Choose HDInsight tools for custom MapReduce jobs | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-run-custom-programs |
| Choose backup and replication options for HBase | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/apache-hbase-backup-replication |
| Migrate Apache HBase clusters to HDInsight 5.1 | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/apache-hbase-migrate-hdinsight-5-1 |
| Migrate HBase to HDInsight 5.1 with a new storage account | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/apache-hbase-migrate-hdinsight-5-1-new-storage-account |
| Migrate Apache HBase clusters to a newer HDInsight version | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/apache-hbase-migrate-new-version |
| Migrate HBase to new HDInsight version and storage account | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/apache-hbase-migrate-new-version-new-storage-account |
| Plan HDInsight cluster capacity and performance | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-capacity-planning |
| Plan migrations for retiring Azure HDInsight components | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-component-retirements-and-action-required |
| Compare storage services for Azure HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-compare-storage-options |
| Upgrade Azure HDInsight to Apache Ranger 2.3.0 | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-ranger-5-1-migration |
| Assess and migrate from retired HDInsight versions | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-retired-versions |
| Select appropriate VM sizes for HDInsight nodes | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-selecting-vm-size |
| Plan migration to newer Azure HDInsight cluster versions | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-upgrade-cluster |
| Size HDInsight Interactive Query (LLAP) clusters | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/hive-llap-sizing-guide |
| Use Kafka MirrorMaker 2.0 for migration and replication | https://learn.microsoft.com/en-us/azure/hdinsight/kafka/kafka-mirrormaker-2-0-guide |
| Migrate Apache Kafka workloads from HDInsight 4.0 to 5.1 | https://learn.microsoft.com/en-us/azure/hdinsight/kafka/migrate-5-1-versions |
| Migrate Apache Kafka workloads from HDInsight 3.6 to 4.0 | https://learn.microsoft.com/en-us/azure/hdinsight/kafka/migrate-versions |
| Migrate HDInsight clusters from Basic to Standard Load Balancer | https://learn.microsoft.com/en-us/azure/hdinsight/load-balancer-migration-guidelines |
| Migrate Ambari configurations from HDInsight 4.x to 5.x | https://learn.microsoft.com/en-us/azure/hdinsight/migrate-ambari-recent-version-hdinsight |

Architecture & Design Patterns

| Topic | URL |
| --- | --- |
| Use Apache Ambari for HDInsight cluster management | https://learn.microsoft.com/en-us/azure/hdinsight/apache-ambari-usage |
| Understand HDInsight architecture with Enterprise Security Package | https://learn.microsoft.com/en-us/azure/hdinsight/domain-joined/apache-domain-joined-architecture |
| Design architecture for migrating on-premises Hadoop to HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-on-premises-migration-best-practices-architecture |
| Choose HDInsight business continuity architectures | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-business-continuity-architecture |
| Study HDInsight high availability and DR case design | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-high-availability-case-study |
| Understand HDInsight high availability architecture components | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-high-availability-components |
| Share one Data Lake Storage account across multiple HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-multiple-clusters-data-lake-store |
| Operationalize HDInsight data pipelines with Oozie | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-operationalize-data-pipeline |
| Design scalable streaming architectures with HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-streaming-at-scale-overview |
| Azure HDInsight virtual network architecture and resources | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-virtual-network-architecture |
| Design highly available Spark Streaming jobs on YARN in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-streaming-high-availability |

Limits & Quotas

| Topic | URL |
| --- | --- |
| Plan HDInsight log sizes and retention policies | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-log-management |
| Use supported node configurations for HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-supported-node-configuration |
| Use external metastores and understand HDInsight default metastore limits | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-use-external-metadata-stores |
| Request and manage HDInsight CPU core quota increases | https://learn.microsoft.com/en-us/azure/hdinsight/quota-increase-request |

Security

| Topic | URL |
| --- | --- |
| Configure managed identity access to Blob storage for HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/configure-azure-blob-storage |
| Configure double disk encryption for HDInsight data at rest | https://learn.microsoft.com/en-us/azure/hdinsight/disk-encryption |
| Configure HDInsight clusters with Entra Domain Services integration | https://learn.microsoft.com/en-us/azure/hdinsight/domain-joined/apache-domain-joined-configure-using-azure-adds |
| Create and configure HDInsight Enterprise Security Package clusters | https://learn.microsoft.com/en-us/azure/hdinsight/domain-joined/apache-domain-joined-create-configure-enterprise-security-cluster |
| Manage users, roles, and security for HDInsight ESP clusters | https://learn.microsoft.com/en-us/azure/hdinsight/domain-joined/apache-domain-joined-manage |
| Configure Apache Ranger policies for HBase with ESP | https://learn.microsoft.com/en-us/azure/hdinsight/domain-joined/apache-domain-joined-run-hbase |
| Configure Apache Ranger Hive policies in HDInsight ESP | https://learn.microsoft.com/en-us/azure/hdinsight/domain-joined/apache-domain-joined-run-hive |
| Set Apache Ranger policies for Kafka with ESP | https://learn.microsoft.com/en-us/azure/hdinsight/domain-joined/apache-domain-joined-run-kafka |
| Implement encryption in transit for Azure HDInsight nodes | https://learn.microsoft.com/en-us/azure/hdinsight/domain-joined/encryption-in-transit |
| Plan enterprise security options for Azure HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/domain-joined/hdinsight-security-overview |
| Secure Oozie workflows with HDInsight Enterprise Security | https://learn.microsoft.com/en-us/azure/hdinsight/domain-joined/hdinsight-use-oozie-domain-joined-clusters |
| Set up Azure HDInsight ID Broker for OAuth and MFA | https://learn.microsoft.com/en-us/azure/hdinsight/domain-joined/identity-broker |
| Configure LDAP sync for Ranger and Ambari in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/domain-joined/ldap-sync |
| Manage SSH access for Entra domain accounts on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/domain-joined/ssh-domain-accounts |
| Configure Private Link for HDInsight Kafka REST Proxy | https://learn.microsoft.com/en-us/azure/hdinsight/enable-private-link-on-kafka-rest-proxy-hdi-cluster |
| Implement Enterprise Security Package for HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/enterprise-security-package |
| Apply security and DevOps best practices for Azure HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-on-premises-migration-best-practices-security-devops |
| Manage Ambari Views permissions on ESP-enabled HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-authorize-users-to-ambari |
| Implement non-interactive .NET auth for HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-create-non-interactive-authentication-dotnet-applications |
| Use managed identities with Azure HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-managed-identities |
| Allow HDInsight management IPs in NSGs and routes | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-management-ip-addresses |
| Migrate to granular role-based access for HDInsight cluster configurations | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-migrate-granular-access-cluster-configurations |
| Enable Azure Private Link for HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-private-link |
| Restrict public connectivity for Azure HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-restrict-public-connectivity |
| Safely rotate HDInsight storage account access keys | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-rotate-storage-keys |
| Use HDInsight NSG service tags for management traffic | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-service-tags |
| Restrict HDInsight Blob data access using SAS tokens | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-storage-sharedaccesssignature-permissions |
| Synchronize Microsoft Entra users to HDInsight ESP clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-sync-aad-users-to-cluster |
| Create and manage Entra ID-authenticated HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-with-entra-authentication/create-clusters-with-entra |
| Configure ARM templates for Entra ID-enabled HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-with-entra-authentication/manage-entra-id-enabled-azure-hdinsight-clusters-with-arm-templates |
| Manage Entra ID-enabled HDInsight clusters via REST API | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-with-entra-authentication/manage-entra-id-enabled-cluster-with-rest-api |
| Configure security options for Hive in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/hdinsight-security-options-for-hive |
| Set up TLS and client auth for ESP Kafka clusters | https://learn.microsoft.com/en-us/azure/hdinsight/kafka/apache-esp-kafka-ssl-encryption-authentication |
| Configure TLS encryption and client auth for HDInsight Kafka | https://learn.microsoft.com/en-us/azure/hdinsight/kafka/apache-kafka-ssl-encryption-authentication |
| Secure Spark–Kafka streaming integration on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/kafka/secure-spark-kafka-streaming-integration-scenario |
| Fetch OAuth tokens from HDInsight to access Azure services | https://learn.microsoft.com/en-us/azure/hdinsight/msi-support-to-access-azure-services |
| Apply built-in Azure Policy definitions for HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/policy-reference |
| Configure Ranger policies for Spark SQL in HDInsight ESP | https://learn.microsoft.com/en-us/azure/hdinsight/spark/ranger-policies-for-spark |
| Configure TLS versions for Azure HDInsight gateways | https://learn.microsoft.com/en-us/azure/hdinsight/transport-layer-security |
Configure HDInsight managed identity for SQL authenticationhttps://learn.microsoft.com/en-us/azure/hdinsight/use-managed-identity-for-sql-database-authentication-in-azure-hdinsight

Configuration

| Topic | URL |
|---|---|
| Configure Ambari Web UI auto-logout timeout in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/ambari-web-ui-auto-logout |
| Connect HDInsight clusters to on-premises networks with VPN and DNS | https://learn.microsoft.com/en-us/azure/hdinsight/connect-on-premises-network |
| Configure HBase cluster replication in Azure VNets | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/apache-hbase-replication |
| Use HBCK2 to repair HBase on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/how-to-use-hbck2-tool |
| Check HDInsight 4.0 open-source component versions | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-40-component-versioning |
| Check HDInsight 5.x open-source component versions | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-5x-component-versioning |
| Manage HDInsight clusters using Azure CLI commands | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-administer-use-command-line |
| Automate HDInsight cluster management with PowerShell | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-administer-use-powershell |
| Configure and use empty edge nodes in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-apps-use-edge-node |
| Configure HDInsight Autoscale policies and limits | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-autoscale-clusters |
| Tune HDInsight cluster settings using Ambari | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-changing-configs-via-ambari |
| Review bundled open-source components and versions in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-component-versioning |
| Configure Azure HDInsight VS Code extension settings | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-config-for-vscode |
| Create and configure VNets, NSGs, and DNS for HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-create-virtual-network |
| Configure custom Ambari database for HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-custom-ambari-db |
| Preload Apache Hive libraries during HDInsight cluster creation | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-add-hive-libraries |
| Add extra Azure Storage accounts to existing HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-add-storage |
| Programmatically customize HDInsight cluster configuration with bootstrap | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-customize-cluster-bootstrap |
| Customize HDInsight clusters using script actions | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-customize-cluster-linux |
| Connect to Azure HDInsight clusters using SSH | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-linux-use-ssh-unix |
| Enable Azure Monitor logs for HDInsight cluster operations | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-oms-log-analytics-tutorial |
| Reference ports for Hadoop services on HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-port-settings-for-services |
| Configure and customize HDInsight clusters across tools | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-provision-linux-clusters |
| Develop script actions to configure Azure HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-script-actions-linux |
| Configure SSH tunneling to access HDInsight web UIs | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-linux-ambari-ssh-tunnel |
| Secure HDInsight outbound traffic using Azure Firewall | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-restrict-outbound-traffic |
| Custom-tune HDInsight Autoscale advanced settings | https://learn.microsoft.com/en-us/azure/hdinsight/how-to-custom-configure-hdinsight-autoscale |
| Configure Apache Hive replication on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/apache-hive-replication |
| Migrate Hive default metastore to external SQL Database on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/hive-default-metastore-export-import |
| Configure Hive LLAP workload management pools in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/hive-workload-management |
| Use Hive LLAP workload management commands in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/workload-management-commands |
| Enable automatic topic creation in HDInsight Kafka | https://learn.microsoft.com/en-us/azure/hdinsight/kafka/apache-kafka-auto-create-topics |
| Configure VPN and VNet access to HDInsight Kafka | https://learn.microsoft.com/en-us/azure/hdinsight/kafka/apache-kafka-connect-vpn-gateway |
| Configure Azure Monitor logs for HDInsight Kafka | https://learn.microsoft.com/en-us/azure/hdinsight/kafka/apache-kafka-log-analytics-operations-management |
| Configure cross-VNet connectivity to HDInsight Kafka | https://learn.microsoft.com/en-us/azure/hdinsight/kafka/connect-kafka-cluster-with-vm-in-different-vnet |
| Configure cross-VNet client connectivity to HDInsight Kafka | https://learn.microsoft.com/en-us/azure/hdinsight/kafka/connect-kafka-with-vnet |
| Configure monitoring and alerts for Azure HDInsight with Azure Monitor | https://learn.microsoft.com/en-us/azure/hdinsight/monitor-hdinsight |
| Reference of monitoring data for Azure HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/monitor-hdinsight-reference |
| Configure non-Azure Firewall network virtual appliances for HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/network-virtual-appliance |
| Optimize HBase performance with Ambari configuration | https://learn.microsoft.com/en-us/azure/hdinsight/optimize-hbase-ambari |
| Optimize Hive performance via Ambari settings in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/optimize-hive-ambari |
| Tune Pig properties with Ambari on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/optimize-pig-ambari |
| Configure selective logging for AMA on HDInsight via script actions | https://learn.microsoft.com/en-us/azure/hdinsight/selective-logging-analysis |
| Configure selective logging for HDInsight clusters with script actions | https://learn.microsoft.com/en-us/azure/hdinsight/selective-logging-analysis-azure-logs |
| Configure service endpoint policies for HDInsight virtual networks | https://learn.microsoft.com/en-us/azure/hdinsight/service-endpoint-policies |
| Set up PySpark interactive environment with VS Code HDInsight Tools | https://learn.microsoft.com/en-us/azure/hdinsight/set-up-pyspark-interactive-environment |
| Configure HDInsight IO Cache to speed up Spark | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-improve-performance-iocache |
| Use HDInsight Spark Jupyter kernels effectively | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-jupyter-notebook-kernels |
| Configure Jupyter on HDInsight to use Maven packages | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-jupyter-notebook-use-external-packages |
| Configure and scope Spark dependencies on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-manage-dependencies |
| Tune Spark resource configuration on HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-resource-manager |
| Configure Apache Spark settings on Azure HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-settings |
| Transfer files to Azure HDInsight using SCP | https://learn.microsoft.com/en-us/azure/hdinsight/use-scp |

Integrations & Coding Patterns

| Topic | URL |
|---|---|
| Configure Ambari email alerts with SendGrid in HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/apache-ambari-email |
| Stream from Kafka to Azure Cosmos DB with Spark | https://learn.microsoft.com/en-us/azure/hdinsight/apache-kafka-spark-structured-streaming-cosmosdb |
| Execute common HDInsight tasks with Azure CLI samples | https://learn.microsoft.com/en-us/azure/hdinsight/azure-cli-samples |
| Connect Excel to HDInsight Hadoop via Power Query | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-connect-excel-power-query |
| Query HDInsight Hive from Java using JDBC | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-connect-hive-jdbc-driver |
| Visualize HDInsight Hive data in Power BI via ODBC | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-connect-hive-power-bi |
| Integrate C# UDFs with Hive and Pig on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-hive-pig-udf-dotnet-csharp |
| Call WebHCat REST API for Hive with Curl | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-use-hive-curl |
| Submit Hive jobs using HDInsight .NET SDK | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-use-hive-dotnet-sdk |
| Run HDInsight Hive queries with Azure PowerShell | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-use-hive-powershell |
| Use Visual Studio Data Lake tools for Hive on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-use-hive-visual-studio |
| Submit MapReduce jobs to HDInsight using Curl and WebHCat | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-use-mapreduce-curl |
| Submit MapReduce jobs to HDInsight with .NET SDK | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-use-mapreduce-dotnet-sdk |
| Run HDInsight MapReduce jobs using Azure PowerShell | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-use-mapreduce-powershell |
| Run MapReduce jobs on HDInsight via SSH | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-use-mapreduce-ssh |
| Submit Sqoop jobs to HDInsight via Curl and WebHCat | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-use-sqoop-curl |
| Run Sqoop jobs on HDInsight using .NET SDK | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-use-sqoop-dotnet-sdk |
| Use Sqoop on HDInsight Linux headnodes for SQL integration | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-use-sqoop-mac-linux |
| Submit Sqoop jobs to HDInsight with PowerShell | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-use-sqoop-powershell |
| Use Visual Studio Data Lake Tools with HDInsight Hadoop | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/apache-hadoop-visual-studio-tools-get-started |
| Configure Beeline connections to HDInsight HiveServer2 | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/connect-install-beeline |
| Run Sqoop jobs between HDInsight and Azure SQL Database | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/hdinsight-use-sqoop |
| Use Python UDFs with Hive and Pig on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/python-udf-hdinsight |
| Submit Hadoop jobs to HDInsight via .NET, curl, and PowerShell | https://learn.microsoft.com/en-us/azure/hdinsight/hadoop/submit-apache-hadoop-jobs-programmatically |
| Build and deploy a Java HBase client with Maven | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/apache-hbase-build-java-maven-linux |
| Run HBase SQL queries with Phoenix and Zeppelin | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/apache-hbase-phoenix-zeppelin |
| Use the HBase .NET SDK with HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/apache-hbase-rest-sdk |
| Use Phoenix Query Server REST SDK on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/apache-hbase-using-phoenix-query-server-rest-sdk |
| Use HDInsight .NET SDK for cluster management tasks | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-administer-use-dotnet-sdk |
| Use Spark DStreams with Kafka on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-apache-spark-with-kafka |
| Install custom Hadoop applications on Azure HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-apps-install-custom-applications |
| Use Spark & Hive Tools for VS Code with HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-for-vscode |
| Use the Azure HDInsight SDK for Go with Hadoop clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-go-sdk-overview |
| Install and access Hue on Azure HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-hue-linux |
| Manage HDInsight Hadoop clusters using Ambari REST API | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-manage-ambari-rest-api |
| Run .NET MapReduce jobs on Linux-based HDInsight using Mono | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-migrate-dotnet-to-linux |
| Define and run Oozie workflows on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-use-oozie-linux-mac |
| Use Spark HBase Connector between HDInsight Spark and HBase | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-using-spark-query-hbase |
| Manage Entra-enabled HDInsight clusters using .NET SDK | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-with-entra-authentication/manage-hadoop-cluster-dot-net-sdk |
| Run Hive queries on Entra-enabled HDInsight using PowerShell | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-with-entra-authentication/run-apache-hive-queries-using-powershell-on-entra-enabled-hdinsight-cluster |
| Run Hive queries on HDInsight using the REST API | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-with-entra-authentication/run-apache-hive-queries-using-rest-api |
| Run Hive queries on Entra-enabled HDInsight with .NET SDK | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-with-entra-authentication/run-hive-queries-using-dot-net-sdk |
| Submit MapReduce jobs to Entra-enabled HDInsight using .NET SDK | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-with-entra-authentication/run-map-reduce-jobs-dot-net-sdk |
| Run MapReduce jobs on Entra-enabled HDInsight with PowerShell | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-with-entra-authentication/run-map-reduce-jobs-entra-id-enabled-using-powershell |
| Run MapReduce jobs on Entra-enabled HDInsight via REST API | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-with-entra-authentication/run-map-reduce-rest-jobs |
| Submit Spark jobs to Entra-enabled HDInsight via Livy REST API | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-with-entra-authentication/run-spark-jobs-using-rest-api |
| Use Power BI DirectQuery with HDInsight Hive | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/apache-hadoop-connect-hive-power-bi-directquery |
| Integrate Spark and Hive using Hive Warehouse Connector | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/apache-hive-warehouse-connector |
| Run Spark operations via Hive Warehouse Connector | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/apache-hive-warehouse-connector-operations |
| Use Hive Warehouse Connector from Zeppelin via Livy | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/apache-hive-warehouse-connector-zeppelin |
| Use Hive Warehouse Connector APIs on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/hive-warehouse-connector-apis |
| Use Hive Warehouse Connector 2.x APIs on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/interactive-query/hive-warehouse-connector-v2-apis |
| Integrate HDInsight Kafka with Azure IoT Hub | https://learn.microsoft.com/en-us/azure/hdinsight/kafka/apache-kafka-connector-iot-hub |
| Use Kafka REST Proxy with HDInsight clusters | https://learn.microsoft.com/en-us/azure/hdinsight/kafka/rest-proxy |
| Use Kafka REST Proxy on HDInsight via Azure CLI | https://learn.microsoft.com/en-us/azure/hdinsight/kafka/tutorial-cli-rest-proxy |
| Connect Synapse Spark pools to HDInsight external Hive Metastore | https://learn.microsoft.com/en-us/azure/hdinsight/share-hive-metastore-with-synapse |
| Analyze Application Insights telemetry with Spark on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-analyze-application-insight-logs |
| Connect HDInsight Spark to Azure SQL Database | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-connect-to-sql-database |
| Create and submit Scala Spark apps from Eclipse to HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-eclipse-tool-plugin |
| Develop and submit Spark apps with IntelliJ Azure Toolkit | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-intellij-tool-plugin |
| Submit remote Spark jobs to HDInsight using Livy REST API | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-livy-rest-interface |
| Integrate Microsoft Cognitive Toolkit with Spark on HDInsight | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-microsoft-cognitive-toolkit |
| Run Azure Machine Learning AutoML on HDInsight Spark | https://learn.microsoft.com/en-us/azure/hdinsight/spark/apache-spark-run-machine-learning-automl |
| Run Apache Pig workloads on HDInsight Hadoop | https://learn.microsoft.com/en-us/azure/hdinsight/use-pig |

Deployment

| Topic | URL |
|---|---|
| Migrate HDInsight monitoring to Azure Monitor Agent (AMA) | https://learn.microsoft.com/en-us/azure/hdinsight/azure-monitor-agent |
| Deploy HBase clusters in Azure Virtual Networks | https://learn.microsoft.com/en-us/azure/hdinsight/hbase/apache-hbase-provision-vnet |
| Publish Azure HDInsight applications to Azure Marketplace | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-apps-publish-applications |
| Operationalize on-demand HDInsight Hadoop clusters with Data Factory | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-create-linux-clusters-adf |
| Deploy HDInsight clusters using ARM templates | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-create-linux-clusters-arm-templates |
| Provision HDInsight 4.0 clusters using Azure CLI | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-create-linux-clusters-azure-cli |
| Create Linux HDInsight clusters using PowerShell scripts | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-create-linux-clusters-azure-powershell |
| Create HDInsight clusters via Azure REST and ARM templates | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-create-linux-clusters-curl-rest |
| Create Linux-based HDInsight clusters via Azure portal | https://learn.microsoft.com/en-us/azure/hdinsight/hdinsight-hadoop-create-linux-clusters-portal |
| Migrate HDInsight Kafka clusters using MirrorMaker 2 | https://learn.microsoft.com/en-us/azure/hdinsight/kafka/apache-kafka-mirror-maker-2 |
| Provision and delete HDInsight clusters via Automation runbooks | https://learn.microsoft.com/en-us/azure/hdinsight/manage-clusters-runbooks |

Capabilities

skill, source-microsoftdocs, skill-azure-hdinsight, topic-agent, topic-agent-skills, topic-agentic-skills, topic-agentskill, topic-ai-agents, topic-ai-coding, topic-azure, topic-azure-functions, topic-azure-kubernetes-service, topic-azure-openai, topic-azure-sql-database, topic-azure-storage

Install

Install: npx skills add MicrosoftDocs/Agent-Skills
Transport: skills-sh
Protocol: skill

Quality

0.70 / 1.00

Deterministic score 0.70 from registry signals: indexed on github topic:agent-skills · 497 github stars · SKILL.md body (55,077 chars)

Provenance

Indexed from: github
Enriched: 2026-04-22 06:53:32Z · deterministic:skill-github:v1 · v1
First seen: 2026-04-18
Last seen: 2026-04-22

Agent access