From Software Search to Code Generation: The Agentic Coding Revolution

2025-04-08

User Story: RSS Feed Clean-up Journey

Over the past 15 years, I’ve accumulated a substantial collection of RSS feeds, numbering over 200 subscriptions. While RSS usage has dramatically declined in recent years, some of these feeds remain part of my daily reading routine. However, the collection has become cluttered:

  • Many feeds have become completely inaccessible
  • Some bloggers have stopped updating their sites
  • Certain feeds are still active but rarely updated

The Challenge:

  • Manual verification of each feed would be tedious and time-consuming
  • My RSS reader (News Explorer) lacks feed cleanup functionality
  • Alternative solutions like Inoreader require paid subscriptions
  • The task remained pending due to lack of efficient solutions

The Agentic Coding Solution:
What was previously a daunting task transformed into a manageable project:

  • Total time from requirement writing to completion: ~2 hours
  • Automated validation of all feeds
  • Generated comprehensive statistics and visualizations
  • Successfully categorized feeds into active, inactive, and dead
  • Pleasant and efficient development experience

This experience perfectly illustrates how agentic coding can turn a long-postponed task into an achievable solution through clear requirement description and AI-assisted development.

The Traditional Approach

Traditionally, when faced with a specific requirement like validating and analyzing OPML feed subscriptions, the typical workflow would be:

  1. Search for existing software that might solve the problem
  2. Evaluate multiple tools and their features
  3. Choose the closest match, often compromising on exact requirements
  4. Learn how to use the chosen software
  5. Deal with limitations and missing features

This process is time-consuming and often results in settling for a solution that doesn’t perfectly match our needs.

The Agentic Coding Paradigm

With agentic coding, the approach transforms dramatically:

  1. Clearly describe your requirements in natural language
  2. Let AI understand and break down the problem
  3. Generate custom code that exactly matches your needs
  4. Iterate and refine the solution through conversation

Real-World Example: OPML Feed Validator

This project demonstrates the power of agentic coding. Instead of searching for an existing OPML feed validator:

  • We described our need for a tool that could:

    • Validate RSS feeds in an OPML file
    • Check feed accessibility
    • Analyze update frequencies
    • Generate meaningful statistics
    • Visualize the results
  • The AI agent:

    • Designed the system architecture
    • Implemented the required functionality
    • Created visualization components
    • Generated comprehensive documentation
    • All while following best practices and proper error handling

Benefits of Agentic Coding

  1. Perfect Fit: Solutions are tailored exactly to your requirements
  2. Rapid Development: No need to spend time searching and evaluating existing tools
  3. Full Control: Complete access to the source code for modifications
  4. Learning Opportunity: Understanding how the solution works through generated code
  5. Cost-Effective: No need to purchase or subscribe to multiple tools
  6. Maintenance Freedom: Ability to modify and extend the solution as needs evolve

Future Implications

This shift from “finding” to “generating” solutions represents a fundamental change in how we approach software development. As AI continues to evolve:

  • Development will become more requirement-driven than tool-driven
  • Custom solutions will become as accessible as off-the-shelf software
  • The focus will shift from “what exists” to “what’s possible”

Agentic coding empowers developers and users alike to create exactly what they need, breaking free from the limitations of existing software solutions.

Lessons Learned and Experience

1. The Importance of Clear Requirements

Product thinking and clear requirements are crucial for successful AI-assisted development:

  • Clear Vision Leads to Better Code: When requirements are well-defined and specific about how the tool should behave, the AI generates higher quality code
  • Product Mindset: Requirement providers need to have a clear understanding of:
    • Desired user interactions
    • Expected outputs and their formats
    • Error handling scenarios
    • Performance expectations
  • Iterative Refinement: Unclear requirements often lead to multiple iterations and code quality issues

2. Technology Stack Selection Matters

The choice of programming languages and libraries significantly impacts AI-assisted development success:

  • Language Popularity Impact:

    • More widely used languages (like Python) often result in better AI-generated code
    • Popular languages have more training data and real-world examples
    • In this project, while we chose TypeScript with Deno for learning purposes, Python might have been an easier choice
  • Library Selection Strategy:

    • Popular, widely-used libraries lead to better AI comprehension and implementation
    • Example from this project:
      • Initial attempt: Using less common deno_chart library resulted in multiple errors
      • Successful pivot: Switching to standard SVG generation led to immediate success
    • Lesson: Prefer mainstream libraries over niche ones when working with AI

Best Practices for AI-Assisted Development

  1. Requirements Phase:

    • Invest time in detailed requirement documentation
    • Include specific examples of desired behavior
    • Define clear success criteria
  2. Technology Selection:

    • Consider language popularity and ecosystem maturity
    • Choose widely-adopted libraries when possible
    • Balance learning goals with development efficiency
  3. Development Process:

    • Start with core functionality using proven technologies
    • Experiment with newer technologies only after basic features are stable
    • Be prepared to pivot when encountering AI limitations with specific technologies

This project serves as a practical example of these lessons, demonstrating both the potential and limitations of AI-assisted development while highlighting the importance of making informed technology choices.

The project can be found here.

Read More

Getting Started with Deno: A Modern Twist on JavaScript Runtimes

2025-03-28

If you’ve been in the JavaScript world for a while, you’ve probably heard of Deno—the runtime that’s been making waves as a “better Node.js.” Built by Ryan Dahl (the original creator of Node.js), Deno takes a fresh approach to running JavaScript and TypeScript, aiming to fix some of Node’s pain points while embracing modern standards. In this post, I’ll walk you through what Deno is, how it works, and how it stacks up against Node.js—especially based on my recent dive into it while tinkering with a Supabase integration.

What is Deno?

Deno is a secure, modern runtime for JavaScript and TypeScript, launched in 2020. It’s designed to be simple, safe, and developer-friendly, with built-in support for TypeScript, ES Modules, and a standard library—no extra tools required. Think of it as Node.js reimagined with lessons learned from the past decade.

Here’s a quick taste of Deno in action:

// main.ts
import { serve } from "https://deno.land/std@0.224.0/http/server.ts";

serve((req) => new Response("Hello, Deno!"), { port: 8000 });

Run it with:

deno run --allow-net main.ts

Boom—a web server in three lines, no npm install or node_modules in sight.

Key Features of Deno

1. TypeScript Out of the Box

Deno runs TypeScript natively—no tsconfig.json or tsc needed. Write your .ts file, run it with deno run, and Deno compiles it in memory. Compare that to Node.js, where you’d need typescript installed and a build step (or ts-node for a quicker dev loop).

2. URL-Based Imports

Forget node_modules. Deno fetches dependencies from URLs:

import { load } from "https://deno.land/std@0.224.0/dotenv/mod.ts";

It caches them globally (more on that later) and skips the package manager entirely.

3. Security by Default

Deno won’t let your script touch the network, filesystem, or environment unless you explicitly allow it:

deno run --allow-env --allow-read main.ts

This is a stark contrast to Node.js, where scripts have free rein unless you sandbox them yourself.

4. Centralized Dependency Cache

Deno stores all dependencies in a single global cache (e.g., ~/.cache/deno/deps on Unix). Run deno info to see where:

deno info
# Outputs: Modules cache: "/home/username/.cache/deno/deps"

No per-project node_modules bloating your disk.

5. Standard Library

Deno ships with a curated std library (e.g., https://deno.land/std@0.224.0), covering HTTP servers, file I/O, and even a dotenv module for .env files—stuff you’d normally grab from npm in Node.js.

Deno vs. Node.js: A Head-to-Head Comparison

I recently played with Deno to connect to Supabase, and it highlighted some big differences from Node.js. Here’s how they stack up:

Dependency Management

  • Node.js: Uses npm and package.json to install dependencies into a local node_modules folder per project. Cloning a repo? Run npm install every time.
    npm install @supabase/supabase-js
  • Deno: Imports modules via URLs, cached globally at ~/.cache/deno/deps. Clone a Deno repo, and you’re ready to run—no install step.
    import { createClient } from "https://esm.sh/@supabase/supabase-js@2.49.3";
  • Winner?: Deno for simplicity, Node.js for isolation (different projects can use different versions of the same module without URL juggling).

TypeScript Support

  • Node.js: Requires setup—install typescript, configure tsconfig.json, and compile to JavaScript (or use ts-node). It’s mature but clunky.
  • Deno: TypeScript runs natively. No config, no build step. Write .ts and go.
  • Winner: Deno, hands down, unless you’re stuck on a legacy Node.js workflow.

Configuration Files

  • Node.js: Relies on package.json for dependencies and scripts, often paired with tsconfig.json for TypeScript.
  • Deno: Optional deno.json for imports and settings, but not required. My Supabase script didn’t need one—just a .env file and std/dotenv.
  • Winner: Deno for minimalism.

Security

  • Node.js: Open by default. Your script can read files or hit the network without warning.
  • Deno: Locked down. Want to read .env? Add --allow-read. Network access? --allow-net. It forced me to think about permissions when connecting to Supabase.
  • Winner: Deno for safety.

Ecosystem

  • Node.js: Massive npm ecosystem—hundreds of thousands of packages. Whatever you need, it’s there.
  • Deno: Smaller but growing ecosystem via deno.land/x and CDNs like esm.sh. It worked fine for Supabase, but niche libraries might be missing.
  • Winner: Node.js for sheer volume.

Learning Curve

  • Node.js: Familiar to most JavaScript devs, but the setup (npm, TypeScript, etc.) can overwhelm beginners.
  • Deno: Fresh approach, but URL imports and permissions might feel alien if you’re Node.js-native.
  • Winner: Tie—depends on your background.

A Real-World Example: Supabase with Deno

Here’s how I set up a Supabase client in Deno:

import { createClient } from "https://esm.sh/@supabase/supabase-js@2.49.3";
import { load } from "https://deno.land/std@0.224.0/dotenv/mod.ts";

const env = await load({ envPath: "./.env" });
const supabaseUrl = env["SUPABASE_URL"] || Deno.env.get("SUPABASE_URL");
const supabaseKey = env["SUPABASE_ANON_KEY"] || Deno.env.get("SUPABASE_ANON_KEY");

if (!supabaseUrl || !supabaseKey) {
  throw new Error("Missing SUPABASE_URL or SUPABASE_ANON_KEY");
}

const supabase = createClient(supabaseUrl, supabaseKey);
console.log("Supabase client created! Attributes:", Object.keys(supabase));

Run it:

deno run --allow-env --allow-read main.ts
  • .env file: SUPABASE_URL and SUPABASE_ANON_KEY (grabbed from Supabase’s dashboard—not my database password!).
  • VS Code linting needed the Deno extension and a deno cache main.ts to quiet TypeScript errors.

In Node.js, I’d have installed @supabase/supabase-js via npm, set up a dotenv package, and skipped the permissions flags. Deno’s way felt leaner but required tweaking for editor support.

Should You Use Deno?

  • Use Deno if:
    • You love TypeScript and hate build steps.
    • You want a secure, minimal setup for small projects or experiments.
    • You’re intrigued by a modern take on JavaScript runtimes.
  • Stick with Node.js if:
    • You need the npm ecosystem’s depth.
    • You’re working on a legacy project or with a team entrenched in Node.
    • You prefer per-project dependency isolation.

Wrapping Up

Deno’s not here to kill Node.js—it’s a different flavor of the same JavaScript pie. After messing with it for Supabase, I’m hooked on its simplicity and TypeScript support, but I’d still reach for Node.js on bigger, ecosystem-heavy projects. Try it yourself—spin up a Deno script, check your cache with deno info, and see if it clicks for you.

What’s your take? Node.js veteran or Deno newbie? Let me know in the comments!



Read More

How to Generate a VSIX File from VS Code Extension Source Code

2024-12-12

Since I’ve been using Windsurf as my primary code editor, I ran into a situation where the vs-picgo extension wasn’t available in the Windsurf marketplace, which meant I had to package the extension manually from its source code. This guide documents the process of generating a VSIX file for a VS Code extension, which can then be installed manually in compatible editors like Windsurf.

In this guide, I’ll walk you through the process of generating a VSIX file from a VS Code extension’s source code. We’ll use the popular vs-picgo extension as an example.

Prerequisites

Before we begin, make sure you have the following installed:

  • Node.js (version 12 or higher)
  • npm (comes with Node.js)

Step 1: Install Required Tools

First, we need to install two essential tools:

  • yarn: A package manager that will handle our dependencies
  • vsce: The VS Code Extension Manager tool that creates VSIX packages
# Install Yarn globally
sudo npm install -g yarn

# Install vsce globally
sudo npm install -g @vscode/vsce

Step 2: Prepare the Project

  1. Clone or download the extension source code:

    git clone https://github.com/PicGo/vs-picgo.git
    cd vs-picgo
  2. Install project dependencies:

    yarn install

This command will:

  • Read the package.json file
  • Install all required dependencies
  • Create or update the yarn.lock file

Note: The yarn.lock file is important! Don’t delete it as it ensures consistent installations across different environments.

Step 3: Build the Extension

Build the extension using the production build command:

yarn build:prod

This command typically:

  • Cleans the previous build output
  • Compiles TypeScript/JavaScript files
  • Bundles all necessary assets
  • Creates the dist directory with the compiled code

In vs-picgo’s case, the build process:

  1. Uses esbuild for fast bundling
  2. Creates both extension and webview bundles
  3. Generates source maps (disabled in production)
  4. Optimizes the code for production use

Step 4: Package the Extension

Finally, create the VSIX file:

vsce package

This command:

  1. Runs any pre-publish scripts defined in package.json
  2. Validates the extension manifest
  3. Packages all required files into a VSIX file
  4. Names the file based on the extension’s name and version (e.g., vs-picgo-2.1.6.vsix)

The resulting VSIX file will contain:

  • Compiled JavaScript files
  • Assets (images, CSS, etc.)
  • Extension manifest
  • Documentation files
  • License information

What’s Inside the VSIX?

The VSIX file is essentially a ZIP archive with a specific structure. For vs-picgo, it includes:

vs-picgo-2.1.6.vsix
├─ [Content_Types].xml
├─ extension.vsixmanifest
└─ extension/
   ├─ LICENSE.txt
   ├─ changelog.md
   ├─ logo.png
   ├─ package.json
   ├─ package.nls.json
   ├─ readme.md
   └─ dist/
      ├─ extension.js
      └─ webview/
         ├─ index.css
         └─ index.js

Installing the Extension

You can install the generated VSIX file in VS Code or any compatible editor by:

  1. Opening VS Code/Windsurf/Cursor …
  2. Going to the Extensions view
  3. Clicking the “…” menu (More Actions)
  4. Selecting “Install from VSIX…”
  5. Choosing your generated VSIX file

Troubleshooting

If you encounter any issues:

  1. Missing dist directory error:

    • This is normal on first build
    • The build process will create it automatically
  2. Dependency errors:

    • Run yarn install again
    • Check if you’re using the correct Node.js version
  3. VSIX packaging fails:

    • Verify your package.json is valid
    • Ensure all required files are present
    • Check the extension manifest for errors

Conclusion

Building a VS Code extension VSIX file is straightforward once you have the right tools installed. The process mainly involves installing dependencies, building the source code, and packaging everything into a VSIX file.

Remember to keep your yarn.lock file and always build in production mode before packaging to ensure the best performance and smallest file size for your users.

Happy extension building! 🚀

Read More

What is DBOS and What Should We Expect

2024-11-14

Introduction

The computing world is witnessing a paradigm shift in how we think about operating systems. A team of researchers has proposed DBOS (Database-Oriented Operating System), a radical reimagining of operating system architecture that puts data management at its core. But what exactly is DBOS, and why should we care?

What is DBOS?

DBOS is a novel operating system architecture that treats data management as its primary concern rather than traditional OS functions like process management and I/O. The key insight behind DBOS is that modern applications are increasingly data-centric, yet our operating systems still follow designs from the 1970s that prioritize computation over data management.

Instead of treating databases as just another application, DBOS makes database technology the fundamental building block of the operating system itself. This means core OS functions like process scheduling, resource management, and system monitoring are implemented using database principles and technologies.

Who is behind DBOS?

DBOS is a collaborative research project involving almost twenty researchers across multiple institutions including MIT, Stanford, UW-Madison, Google, VMware, and other organizations. The project is notably led by database pioneer Michael Stonebraker, who is an ACM Turing Award winner (2014) and Professor Emeritus at UC Berkeley, currently affiliated with MIT.

Key institutions and researchers involved include:

  • MIT: Michael Stonebraker, Michael Cafarella, Çağatay Demiralp, and others
  • Stanford: Matei Zaharia, Christos Kozyrakis, and others
  • UW-Madison: Xiangyao Yu
  • Industry partners: Researchers from Google, VMware, and other organizations

The people behind DBOS can be found at DBOS Project.

Ultimate Goal

The ultimate goal of DBOS is to create an operating system that is data-centric and data-driven: the OS runs on top of a DBMS, rather than today’s arrangement in which the DBMS runs on top of the OS.
All system data should reside in the DBMS.

  • Replace the “everything is a file” mantra with “everything is a table”
  • All system state and metadata stored in relational tables
  • All changes to OS state should be through database transactions
  • The DBMS provides all of its native capabilities to the OS and applications; for example, files become blobs and tables in the DBMS.
  • SQL-based interface for both application and system data access
  • To achieve very high performance, the DBMS must leverage sophisticated caching and parallelization strategies and compile repetitive queries into machine code.
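To make the idea concrete, here is a rough sketch of what “everything is a table” could look like in SQL. The table and column names are hypothetical illustrations, not part of any published DBOS schema:

-- Hypothetical: process state kept in a relational table
CREATE TABLE proc (
    pid     bigint PRIMARY KEY,
    state   text   NOT NULL,          -- e.g. 'runnable', 'blocked'
    cpu_ms  bigint NOT NULL DEFAULT 0,
    node_id int    NOT NULL
);

-- An OS state change becomes a transaction rather than an in-kernel mutation
BEGIN;
UPDATE proc SET state = 'blocked' WHERE pid = 4242;
COMMIT;

-- System introspection becomes plain SQL
SELECT node_id, count(*) AS runnable_processes
FROM proc
WHERE state = 'runnable'
GROUP BY node_id;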

Benefits

  1. Strong Security and Privacy
  • Native GDPR compliance through data-centric design
  • Attribute-based access control (ABAC)
  • Complete audit trails and data lineage
  • Privacy by design through unified data management
  • Fine-grained access control at the data level
  • Enhanced monitoring and threat detection
  • Simplified compliance with regulatory requirements
  • Built-in data encryption and protection mechanisms
  2. Enhanced Performance and Efficiency
  • Optimized resource allocation through database-driven scheduling
  • Reduced data movement and copying
  • Better cache utilization through database techniques
  • Intelligent workload management
  • Advanced query optimization for system operations
  • Improved resource utilization through data-aware decisions
  • Reduced system overhead through unified architecture
  • Better support for modern hardware architectures
  3. Improved Observability and Management
  • Comprehensive system-wide monitoring
  • Real-time analytics on system performance
  • Easy troubleshooting through SQL queries
  • Better capacity planning capabilities
  • Unified logging and debugging interface
  • Historical analysis of system behavior
  • Predictive maintenance capabilities
  • Simplified system administration
  4. Advanced Application Support
  • Native support for distributed applications
  • Better handling of microservices architecture
  • Simplified state management
  • Enhanced support for modern cloud applications
  • Built-in support for data-intensive applications
  • Improved consistency guarantees
  • Better transaction management
  • Simplified development of distributed systems

Technical Implementation

DBOS proposes implementing core OS functions using database principles:

  • Process Management: Processes and their states managed as database tables
  • Resource Scheduling: SQL queries and ML for intelligent scheduling decisions
  • System Monitoring: Metrics collection and analysis through database queries
  • Security: Access control and auditing via database mechanisms
  • Storage: File system metadata stored in relational tables
  • Networking: Network state and routing managed through database abstractions
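For instance, a scheduling decision could in principle be expressed as a query over such tables. Continuing the hypothetical proc table sketched earlier (again, an illustration of the idea, not DBOS’s actual implementation):

-- Hypothetical: pick the least-loaded node as the placement target for new work
SELECT node_id
FROM proc
GROUP BY node_id
ORDER BY sum(cpu_ms) ASC
LIMIT 1;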

What Should We Expect?

Near-term Impact

  • Proof of Concept: The researchers are working on demonstrating DBOS’s capabilities through specific use cases like log processing and accelerator management.
  • Performance Improvements: Early implementations might show significant improvements in data-intensive workloads.
  • Development Tools: New tools and frameworks that leverage DBOS’s database-centric approach.

Long-term Possibilities

  • Cloud Native Integration: DBOS could become particularly relevant for cloud computing environments where data management is crucial.
  • AI/ML Operations: Better support for AI and machine learning workloads through intelligent resource management.
  • Privacy-First Computing: A new standard for building privacy-compliant systems from the ground up.

Challenges Ahead

Several technical and practical challenges need to be addressed:

  1. Performance
  • Minimizing database overhead for system operations
  • Optimizing query performance for real-time OS operations
  • Efficient handling of high-frequency system events
  2. Compatibility
  • Supporting existing applications and system calls
  • Maintaining POSIX compliance where needed
  • Migration path for legacy systems
  3. Distributed Systems
  • Maintaining consistency across distributed nodes
  • Handling network partitions and failures
  • Scaling database operations across clusters
  4. Adoption
  • Convincing stakeholders to adopt radical architectural changes
  • Training developers in the new paradigm
  • Building an ecosystem of compatible tools and applications

Conclusion

DBOS represents a bold reimagining of operating system design for the data-centric world. While it’s still in its early stages, the potential benefits for security, privacy, and developer productivity make it an exciting project to watch. As data continues to grow in importance, DBOS’s approach might prove to be prescient.
The success of DBOS will largely depend on how well it can demonstrate its advantages in real-world scenarios and whether it can overcome the inherent challenges of introducing such a fundamental change to system architecture. For developers, system administrators, and anyone interested in the future of computing, DBOS is definitely worth keeping an eye on.
Whether DBOS becomes the next evolution in operating systems or remains an interesting academic exercise, its ideas about putting data management at the center of system design will likely influence future OS development.

References

  1. M. Stonebraker et al., “DBOS: A DBMS-oriented Operating System,” Proceedings of the VLDB Endowment, Vol. 15, No. 1, 2021.
    https://www.vldb.org/pvldb/vol15/p21-stonebraker.pdf

  2. DBOS Project Official Website, MIT CSAIL and Stanford University.
    https://dbos-project.github.io/

  3. X. Yu et al., “A Case for Building Operating Systems with Database Technology,” Proceedings of the 12th Conference on Innovative Data Systems Research (CIDR ‘22), 2022.
    https://www.cidrdb.org/cidr2022/papers/p82-yu.pdf

  4. DBOS Research Group Publications.
    https://dbos-project.github.io/papers/

These references provide detailed technical information about DBOS’s architecture, implementation, and potential impact on the future of operating systems. The papers discuss various aspects from system design to performance evaluation, security considerations, and practical applications.

Read More

Best Practices for Using Partitioned Tables in PostgreSQL

2024-10-05

What is a partitioned table?

A partitioned table is a table that is divided into smaller pieces; each piece is called a partition.

Why use partitioned tables?

  1. Performance: Partitioning can improve query performance. Because each partition is stored separately and the planner can skip (prune) partitions that cannot match the query, less data needs to be scanned (see the pruning example below).
  2. Maintenance: Partitioning makes maintenance easier. Each partition can be loaded, indexed, vacuumed, or dropped independently, so bulk operations such as removing old data become fast.
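As a quick illustration of the performance point, once the range-partitioned measurement table from the next section exists, the planner prunes partitions that cannot match the query; only the relevant month’s partition should appear in the plan:

EXPLAIN SELECT * FROM measurement WHERE logdate = DATE '2006-02-15';
-- Expect the plan to scan only measurement_y2006m02; the other partitions are pruned.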

How to create a partitioned table?

  1. Create the partitioned table:

CREATE TABLE measurement (
    city_id   int not null,
    logdate   date not null,
    peaktemp  int,
    unitsales int
) PARTITION BY RANGE (logdate);

  2. Create a partition for each range of values:

CREATE TABLE measurement_y2006m02 PARTITION OF measurement
    FOR VALUES FROM ('2006-02-01') TO ('2006-03-01');
...
CREATE TABLE measurement_y2007m11 PARTITION OF measurement
    FOR VALUES FROM ('2007-11-01') TO ('2007-12-01');

Partitioning types

There are three basic types of partitioning in PostgreSQL:

  1. List Partition: The table is partitioned by explicitly listing which key value(s) appear in each partition.
  2. Range Partition: The table is partitioned into “ranges” defined by a key column or set of columns, with no overlap between the ranges of values assigned to different partitions.
  3. Hash Partition: The table is partitioned by specifying a modulus and a remainder for each partition.
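For example, a hash-partitioned table can be declared as follows (a minimal sketch; the orders table is made up for illustration):

CREATE TABLE orders (
    order_id    bigint not null,
    customer_id bigint not null
) PARTITION BY HASH (customer_id);

CREATE TABLE orders_p0 PARTITION OF orders FOR VALUES WITH (MODULUS 4, REMAINDER 0);
CREATE TABLE orders_p1 PARTITION OF orders FOR VALUES WITH (MODULUS 4, REMAINDER 1);
CREATE TABLE orders_p2 PARTITION OF orders FOR VALUES WITH (MODULUS 4, REMAINDER 2);
CREATE TABLE orders_p3 PARTITION OF orders FOR VALUES WITH (MODULUS 4, REMAINDER 3);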

By combining these three basic types, we can create more complex partitioning schemes. This is called composite partitioning (sub-partitioning).

Commonly used composite partitioning schemes include:

  1. List-Range Partition: The table is partitioned by list, and each list partition is sub-partitioned by range.
    The steps are: create table A partitioned by list, create table A1 as a partition of A that is itself partitioned by range, then create table A1_1 as a partition of A1 (see the sketch below).
  2. Range-List Partition: The table is partitioned by range, and each range partition is sub-partitioned by list.
    The steps are: create table A partitioned by range, then create its partitions, each of which is itself partitioned by list.
  3. Range-Hash Partition: The table is partitioned by range, and each range partition is sub-partitioned by hash.
  4. Hash-Range Partition: The table is partitioned by hash, and each hash partition is sub-partitioned by range.
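Here is a minimal sketch of List-Range composite partitioning; the sales table and partition names are illustrative only:

CREATE TABLE sales (
    region   text    not null,
    saledate date    not null,
    amount   numeric
) PARTITION BY LIST (region);

-- Each list partition is itself partitioned by range
CREATE TABLE sales_eu PARTITION OF sales
    FOR VALUES IN ('DE', 'FR', 'IT')
    PARTITION BY RANGE (saledate);

CREATE TABLE sales_eu_2024 PARTITION OF sales_eu
    FOR VALUES FROM ('2024-01-01') TO ('2025-01-01');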

Data dictionary for partitioned tables

  1. pg_partitioned_table: This catalog stores the partitioning strategy (partstrat: list, range, or hash) and the partition key columns of every partitioned table.
  2. pg_inherits: This catalog records the parent/child relationship between a partitioned table and its partitions.
  3. pg_class: The columns relispartition and relpartbound indicate whether a relation is a partition and store its partition bound.
  4. pg_partition_tree(): This function (PostgreSQL 12 and later) lists every partition in a partition tree, together with its parent and level.
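For example, you can inspect the measurement table’s partitioning strategy and its partitions with queries like these (a sketch; adjust table names as needed, and note that pg_partition_tree requires PostgreSQL 12 or later):

-- Partitioning strategy (r = range, l = list, h = hash) and number of key columns
SELECT partrelid::regclass, partstrat, partnatts
FROM pg_partitioned_table
WHERE partrelid = 'measurement'::regclass;

-- All partitions in the partition tree
SELECT relid, parentrelid, isleaf, level
FROM pg_partition_tree('measurement');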

Read More

Some Misconceptions About Database Performance - Application Concurrent Requests, Database Connections, and Database Connection Pools

2024-08-21

Let’s start with a chart from a database stress test result.

[Chart: TPS vs. number of concurrent connections, comparing TPS_PGDirect (pgbench directly against PostgreSQL 16) and TPS_PGBouncer (pgbench through PGBouncer)]

In the chart above, the X-axis represents the number of concurrent connections, and the Y-axis represents the TPS (Transactions Per Second) provided by the database. The stress testing tool used is pgbench, and the database is a single instance of PostgreSQL 16. TPS_PGDirect represents the TPS when pgbench directly sends the load to PostgreSQL, while TPS_PGBouncer represents the TPS when pgbench sends the load to the PGBouncer database connection pool.

We can clearly see that when the number of connections exceeds 256, the TPS begins to gradually decline when the load is sent directly to the database. When the number of connections eventually rises to over 8000, the database is almost at a standstill. However, when using PGBouncer, the TPS remains stable and excellent, with the database consistently providing high TPS regardless of the number of connections.

Many people might conclude from this chart that if the concurrent connections increase beyond a certain threshold, PostgreSQL becomes inadequate, and PGBouncer must be used to improve performance.

Some users may even attempt tests on various hardware combinations to derive a reference threshold for how much pressure PostgreSQL can cope with before PGBouncer should be enabled.

So, is this conclusion correct? And does testing out a threshold number have real guiding significance?

Understanding Concurrent Performance

To determine how many concurrent business requests a database can handle, we should focus on the Y-axis of the test chart above, not the X-axis. The Y-axis (TPS) represents the number of transactions the database can process per second, while the X-axis (connection count) is merely the number of database connections we configured ourselves (which can be roughly understood as the max_connections parameter value). We can observe that at 256 connections, the direct pgbench stress test already pushes the database to over 50,000 TPS, which is almost the highest number in the entire test without PGBouncer. Even in the subsequent tests with PGBouncer, this number was not significantly exceeded. Therefore, PGBouncer is not a weapon for improving database performance.

What is performance? Performance means: for a determined business transaction pressure (how much work a business transaction needs to handle, such as how much CPU, memory, and IO it consumes), the more transactions the database can execute, the higher the performance. In the chart above, this is represented by the Y-axis TPS (Transactions Per Second).

For example, suppose a user request (Application Request) requires executing 5 transactions from start to finish, and assume these transactions consume similar amounts of time (even if there are differences, we can always get the average transaction runtime in a system). In this case, 50,000 TPS means the database can handle 10,000 user requests per second, which can be simply understood as handling 10,000 concurrent users. If TPS rises to 100,000, the database can handle 20,000 concurrent users; if TPS drops to 10,000, it can only handle 2,000 concurrent users. This is a simple calculation logic, and in this description, we did not mention anything about database connections (User Connections).

So why does it seem like PGBouncer improves performance when the number of connections continues to increase? Because in this situation, PGBouncer acts as a limiter on database connections, reducing the extra overhead caused by the database/operating system handling too many connections, allowing the database and operating system to focus on processing the business requests sent from each connection, and enabling user requests exceeding the connection limit to share database connections. What PGBouncer does is: if your restaurant can only provide 10 tables, then I only allow 10 tables of guests to enter, and other guests have to wait outside.

However, why should we let 100 tables of guests rush into the restaurant all at once? Without PGBouncer, can’t we control allowing only 10 tables of guests to enter? In modern applications, application-side connection pools are more commonly used, such as connection pools in various middleware software (Weblogic, WebSphere, Tomcat, etc.) or dedicated JDBC connection pools like HikariCP.

Exceeding the number of connections that can be handled is unnecessary. When the database is already at its peak processing capacity, having more connections will not improve overall processing capability; instead, it will slow down the entire database.

To summarize the meaning of the chart above more straightforwardly: when the total number of database connections is at 256, the database reaches its peak processing capacity, and direct database connections can provide 50,000 TPS without any connection pool. If 50,000 TPS cannot meet the application’s concurrent performance requirements, increasing the number of connections will not help. But if connections are increased (wrongly, without any other control), a layer of connection pool like PGBouncer can be added as a connection pool buffer to keep performance stable at around 50,000 TPS.

So, do we have any reason to arbitrarily increase the number of connections? No. What we need to do is: allow the database to open only the number of connections within its processing capacity and enable these connections to be shared by more user requests. When the restaurant size is fixed, the higher the turnover rate, the higher the restaurant’s daily revenue.

How to Ask Questions About Performance

Instead of asking: how can we ensure the database provides sufficient TPS with 4000 connections, we should ask: how can we ensure the database provides 50,000 TPS.

Because the number of connections is not a performance metric, it is merely a parameter configuration option.

In the example above, a connection count of 4000 or 8000 is not a fixed value, nor is it an optimized one. It makes little sense to pile further database optimizations on top of an unoptimized, freely changeable parameter setting; that puts the cart before the horse. If we can already reach maximum processing capacity with 256 fully loaded database connections, why optimize the database for more than 256 connections? We should instead ensure the database always runs at full capacity with at most 256 connections.
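In PostgreSQL terms, capping the database at its sweet spot is a configuration decision rather than an optimization problem. A sketch, assuming 256 is the fully loaded connection count found for this particular hardware (changing max_connections requires a restart):

SHOW max_connections;
ALTER SYSTEM SET max_connections = 256;
-- Restart the instance for the change to take effect, then let the
-- application-side pool (HikariCP, middleware pools, etc.) share these connections.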

Now, our question about performance can be more specific. Suppose we have multiple sets of database servers with different configurations; what should the fully loaded connection count be for each set to achieve optimal processing capacity?

This requires testing. However, we will ultimately find that this fully loaded connection count is usually related only to the database server’s hardware resources, such as the number of CPU cores, CPU speed, memory size, storage speed, and the bandwidth and latency between storage and host.

Does that mean concurrent performance is determined by hardware alone? Of course not. Concurrent performance (P) = Hardware processing capability (H) / Business transaction pressure (T).

As mentioned earlier, the performance metric TPS is for a “determined business transaction pressure,” meaning a determined T value.

When the T value is determined, we can only improve the H value to further enhance the P value. Conversely, there is naturally another optimization approach: reducing the T value. There are various ways to reduce the T value, such as adding necessary indexes, rewriting SQL, using partitioned tables, etc. The optimization plan should be formulated based on where the T value is consumed most, which is another topic.

Conclusion

To answer the two initial questions:

  1. If concurrent connections increase beyond a certain threshold, PostgreSQL becomes inadequate, and PGBouncer must be used to improve performance. Is this conclusion correct?

If we must create more connections to the database than the hardware processing capacity for certain hardware resources, using PGBouncer can stabilize performance. However, we have no reason to create more connections than the hardware processing capacity. We have various means to control the maximum number of connections the database can open, and various middleware solutions can provide connection pools. From this perspective, PGBouncer does not improve database performance; it merely reduces concurrency conflicts, doing what any connection pool software can do.

  2. Is it meaningful to run tests on various hardware combinations to derive a reference threshold for how much pressure PostgreSQL can cope with before PGBouncer should be enabled?

It has some significance, but simple stress tests like pgbench are not enough to guide real applications. Concurrent performance (P) = Hardware processing capability (H) / Business transaction pressure (T). Testing in a determined hardware environment means the H value is fixed, so we should focus on T during testing. The T value provided by pgbench is a very simple transaction pressure (only four tables, simple CRUD operations), and the P value tested under this pressure can only be used as a reference. To guide application deployment, typical transaction pressure of the application itself must be abstracted for testing to have guiding significance.

Read More

Flashback Features in MogDB

2024-06-15

What is Database Flashback

The flashback capability of a database allows users to query the contents of a table at a specific point in the past or quickly restore a table or even the entire database to a time before an erroneous operation. This feature can be extremely useful, and sometimes even a lifesaver.

Database flashback capabilities are generally divided into three levels:

  • Row-level Flashback: Also known as flashback query, this typically involves using a SELECT statement to retrieve data from a table as it was at a specific point in time, such as right before a DELETE command was issued.
  • Table-level Flashback: Also known as flashback table, this usually involves using specialized DDL statements to recreate a table from the recycle bin. This is often used to recover from a TRUNCATE TABLE or DROP TABLE operation.
  • Database-level Flashback: Also known as flashback database, this involves using specialized DDL statements to restore the entire database to a previous point in time. Unlike PITR (Point-In-Time Recovery) which involves restoring from backups, flashback database is faster as it does not require restoring the entire database from backup sets.

In MogDB version 5.0, only flashback query and flashback table have been implemented so far. Flashback database has not yet been implemented (Oracle has supported flashback database since version 10g).

Scenario

Imagine a regrettable scenario (Note: do not run the DELETE SQL command in a production environment).

You have a table that records account names and account balances.

MogDB=# select * from accounts;
name | amount
------+--------
A | 100
B | 100
C | 100
D | 99
(4 rows)

You intended to execute an SQL command to delete account records with a balance of 99 units. Normally, this should delete one record. To demonstrate, let’s use SELECT instead of DELETE.

MogDB=# select * from accounts where amount=99;
name | amount
------+--------
D | 99
(1 row)

However, the minus sign “-“ and the equals sign “=” are adjacent on the keyboard, and you have fat fingers, so you accidentally pressed the minus sign. As a result, the command you sent to the database looks like this:

delete from accounts where amount-99;

For demonstration purposes, we’ll still use SELECT instead of DELETE.

MogDB=# select * from accounts where amount-99;
name | amount
------+--------
A | 100
B | 100
C | 100
(3 rows)

A terrifying thing happens: all data except the record with a balance exactly equal to 99 is returned. This means that if this were the DELETE command mentioned above, you would have deleted all accounts with a balance not equal to 99.

The good news is that starting from MogDB version 3.0, validation for such dangerous syntax involving the minus sign has been added. Now, executing the same SQL will result in an error.

MogDB=# delete from accounts where amount-99;
ERROR: argument of WHERE must be type boolean, not type integer
LINE 1: delete from accounts where amount-99;
^

However, in the community editions of openGauss, MySQL, or MariaDB, such dangerous syntax can still be executed normally.

gsql ((openGauss 6.0.0-RC1 build ed7f8e37) compiled at 2024-03-31 12:41:30 commit 0 last mr  )
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.

openGauss=# select version();
version
------------------------------------------------------------------------------------------------------------------------------------------------------------
(openGauss 6.0.0-RC1 build ed7f8e37) compiled at 2024-03-31 12:41:30 commit 0 last mr on aarch64-unknown-linux-gnu, compiled by g++ (GCC) 10.3.1, 64-bit
(1 row)

openGauss=# create table accounts (name varchar2, amount int);
CREATE TABLE
openGauss=# insert into accounts values ('A',100),('B',100),('C',100),('D',99);
INSERT 0 4

openGauss=# delete from accounts where amount-99;
DELETE 3
openGauss=# select * from accounts;
name | amount
------+--------
D | 99
(1 row)

Whatever the cause, suppose an erroneous data deletion really does happen. In MogDB, you still have the flashback feature available for recovery.

Flashback Feature in MogDB

The flashback feature and its related implementations have undergone some changes since MogDB version 3.0.

  1. Applicable Only to Ustore Storage Engine:

    The flashback feature only works for tables using the ustore storage engine. The default astore no longer supports flashback queries. Therefore, you need to set enable_ustore=on. This parameter is off by default, and changing it requires a database restart to take effect.

    MogDB=# alter system set enable_ustore=on;
    NOTICE: please restart the database for the POSTMASTER level parameter to take effect.
    ALTER SYSTEM SET
  2. Setting undo_retention_time:

    This parameter specifies the retention time for old version data in the rollback segment, equivalent to the allowable time span for flashback queries. The default value is 0, meaning any flashback query will encounter a “restore point not found” error. Changing this parameter does not require a database restart.

    MogDB=# alter system set undo_retention_time=86400; -- 86400 seconds = 24 hours
    ALTER SYSTEM SET
  3. Enabling Database Recycle Bin for Truncate or Drop Operations:

    To flashback a table from a truncate or drop operation, you need to enable the database recycle bin by setting enable_recyclebin=on. This parameter is off by default, and changing it does not require a database restart.

    MogDB=# alter system set enable_recyclebin=on;
    ALTER SYSTEM SET

Creating and Populating the Ustore Table

Create a ustore-based accounts table and insert some test data.

MogDB=# create table accounts (name varchar2, amount int) with (storage_type=ustore);
CREATE TABLE

MogDB=# insert into accounts values ('A',100),('B',100),('C',100),('D',99);
INSERT 0 4

Simulating an Erroneous Deletion

Now, due to some erroneous operation, you delete all account records with amounts not equal to 99.

MogDB=# delete from accounts where amount<>99;
DELETE 3

MogDB=# select * from accounts;
name | amount
------+--------
D | 99
(1 row)

Flashback Query

When you realize the mistake, it might be 1 minute or 1 hour later. As long as it is within 24 hours (due to the undo_retention_time setting), you can recover the data.

Check the current timestamp and estimate the timestamp at the time of the erroneous operation. For simplicity, let’s assume you noted the system’s timestamp before issuing the delete command.

MogDB=# select sysdate;
sysdate
---------------------
2024-06-13 18:40:18
(1 row)

Issue a flashback query to retrieve the data as it was at the specified timestamp.

MogDB=# select * from accounts timecapsule TIMESTAMP to_timestamp('2024-06-13 18:40:18','YYYY-MM-DD HH24:MI:SS');
name | amount
------+--------
A | 100
B | 100
C | 100
D | 99
(4 rows)

Recovering Data

You can recover the data by creating a temporary table with the flashback query results and then inserting the data back into the original table.

MogDB=# create table tmp_accounts as select * from accounts timecapsule TIMESTAMP to_timestamp('2024-06-13 18:40:18','YYYY-MM-DD HH24:MI:SS') where amount<>99;
INSERT 0 3

MogDB=# select * from tmp_accounts;
name | amount
------+--------
A | 100
B | 100
C | 100
(3 rows)

MogDB=# insert into accounts select * from tmp_accounts;
INSERT 0 3

MogDB=# select * from accounts;
name | amount
------+--------
D | 99
A | 100
B | 100
C | 100
(4 rows)

Alternatively, if no new data has been added to the table after the erroneous operation, you can use the timecapsule table command to flashback the entire table to the specified timestamp.

MogDB=# timecapsule table accounts to TIMESTAMP to_timestamp('2024-06-13 18:40:18','YYYY-MM-DD HH24:MI:SS');
TimeCapsule Table

MogDB=# select * from accounts;
name | amount
------+--------
D | 99
A | 100
B | 100
C | 100
(4 rows)

Recovering from Truncate or Drop Table

If you accidentally issue a TRUNCATE or DROP command, the commands shown above no longer help, because the data has been truncated rather than deleted row by row.

MogDB=# truncate table accounts;
TRUNCATE TABLE

MogDB=# select * from accounts timecapsule TIMESTAMP to_timestamp('2024-06-14 02:10:28','YYYY-MM-DD HH24:MI:SS');
ERROR: Snapshot too old, ScanRelation, the info: snapxmax is 45865, snapxmin is 45865, csn is 30047, relfrozenxid64 is 45877, globalRecycleXid is 17356.

MogDB=# timecapsule table accounts to TIMESTAMP to_timestamp('2024-06-14 02:10:28','YYYY-MM-DD HH24:MI:SS');
TimeCapsule Table

MogDB=# select * from accounts;
name | amount
------+--------
(0 rows)

For TRUNCATE or DROP TABLE operations, use the to before keyword to recover the table from the recycle bin.

MogDB=# timecapsule table accounts to before truncate;
TimeCapsule Table

MogDB=# select * from accounts;
name | amount
------+--------
A | 100
B | 100
C | 100
D | 99
(4 rows)

Similarly, if the table was dropped, you can recover it from the recycle bin using the same to before keyword.

MogDB=# drop table accounts;
DROP TABLE
MogDB=# select * from accounts;
ERROR: relation "accounts" does not exist on dn_6001
LINE 1: select * from accounts;
^
MogDB=# timecapsule table accounts to before drop;
TimeCapsule Table
MogDB=# select * from accounts;
name | amount
------+--------
A | 100
B | 100
C | 100
D | 99
(4 rows)

Conclusion

MogDB’s flashback feature is an essential tool for recovering from accidental data deletions, truncations, or drops. By enabling the ustore storage engine, setting an appropriate undo_retention_time, and activating the recycle bin, you can leverage flashback queries and the timecapsule command to restore your data efficiently. These features ensure that you can maintain data integrity and quickly recover from human errors, providing robust data protection and operational resilience.

Read More

Exploring Oracle Compatibility in MogDB (Series I) - Data Dictionary

2024-06-06

Install PTK

PTK is the optimal tool for installing the MogDB database (an enterprise-grade commercial distribution based on openGauss) or the openGauss database, offering a seamless and smooth installation experience.

To install PTK, simply run the following command:

curl --proto '=https' --tlsv1.2 -sSf https://cdn-mogdb.enmotech.com/ptk/install.sh | sh

This command will automatically install PTK in the user’s home directory under $HOME/.ptk. This directory will serve as the working directory for PTK, storing cache files, metadata information, cluster configuration information, backup information, and other related files. Additionally, the installation command will add the path $HOME/.ptk/bin to the PATH environment variable in the corresponding shell profile file, enabling the user to use the ptk command directly after logging into the server. In this tutorial, we installed PTK using the root user, although this is not mandatory.

For more detailed PTK installation instructions, please refer to: PTK Installation Guide.

The environment used for this series of articles is CentOS 7.6 for x86-64. MogDB must currently run on a Linux operating system. If you wish to install MogDB on macOS or Windows, you can do so using container deployment. For more information, refer to: Container-based MogDB Installation

To check your CentOS version, you can use the following command:

# cat /etc/centos-release
CentOS Linux release 7.6.1810 (Core)

MogDB can also run on ARM architecture CPUs. You can list all supported CPU brands using the ptk candidate cpu command:

# ptk candidate cpu
CPU Model
------------------
Cortex-ARM64
Kunpeng-ARM64
Phythium-ARM64
Hygon-x64
Intel-x64
AMD-x64
zhaoxin-x64

If PTK notifies you of a new version while running any command, you can upgrade it directly using the ptk self upgrade command.

# ptk ls
cluster_name | id | addr | user | data_dir | db_version | create_time | comment
---------------+----+------+------+----------+------------+-------------+----------
Warning: New version '1.5.0' is available, you are using ptk version '1.1.3'.
You can upgrade ptk via command: 'ptk self upgrade'
You can also set environment by 'export PTK_NO_CHECK_VERSION=true' to disable this warning
# ptk self upgrade
INFO[2024-06-06T11:59:01.105] current version: 1.1.3 release, target version: latest
INFO[2024-06-06T11:59:01.105] download package from http://cdn-mogdb.enmotech.com/ptk/latest/ptk_linux_x86_64.tar.gz
INFO[2024-06-06T11:59:01.105] downloading ptk_linux_x86_64.tar.gz ...
> ptk_linux_x86_64.tar.gz: 17.78 MiB / 20.08 MiB [----------------------------------------------------------------------->_________] 88.52% 27.93 MiB p/s ETA 0s
> ptk_linux_x86_64.tar.gz: 20.08 MiB / 20.08 MiB [---------------------------------------------------------------------------------] 100.00% 29.07 MiB p/s 900ms
INFO[2024-06-06T11:59:02.956] upgrade ptk successfully
[root@mogdb-kernel-0004 ~]# ptk version
PTK Version: v1.5.0 release
Go Version: go1.19.10
Build Date: 2024-06-04T17:16:07
Git Hash: edd5dbb0
OS/Arch: linux/amd64

Install MogDB

With PTK, you can easily create multiple MogDB instances on a single server. Each MogDB instance can be assigned to a different operating system user, allowing you to manage multiple database instances with a single PTK installation.

To quickly create a test database without any configuration, you can use the ptk demo command:

# ptk demo
Cluster Name: "demo_kVP"
+--------------+--------------+------------------------------+-------+---------+---------------------------------+----------+
| az(priority) | ip | user(group) | port | role | data dir | upstream |
+--------------+--------------+------------------------------+-------+---------+---------------------------------+----------+
| AZ1(1) | 172.16.0.164 | demo_user_kVP(demo_user_kVP) | 26000 | primary | /home/demo_user_kVP/demodb/data | - |
+--------------+--------------+------------------------------+-------+---------+---------------------------------+----------+
✔ Is cluster config correct (default=n) [y/n]: y
INFO[2024-06-06T12:12:05.948] start check hostname ...
INFO[2024-06-06T12:12:05.951] check hostname success
INFO[2024-06-06T12:12:05.951] skip check os
INFO[2024-06-06T12:12:05.951] start check distro ...
INFO[2024-06-06T12:12:05.972] check distro success
[....]
Demo Summary:

Deploy Status:

cluste_name | host | user | port | status | message
--------------+--------------+---------------+-------+---------------+----------
demo_kVP | 172.16.0.164 | demo_user_kVP | 26000 | start_success | success

Database Detail:

item | value
----------------+----------------------------------
user_password | Demo&kVP
db_password | Demo&kVP
base_dir | /home/demo_user_kVP/demodb
app_dir | /home/demo_user_kVP/demodb/app
data_dir | /home/demo_user_kVP/demodb/data
tool_dir | /home/demo_user_kVP/demodb/tool
tmp_dir | /home/demo_user_kVP/demodb/tmp

Mode Compatibility:

mode | database | plugins | other
-------+------------------------+-----------------------+-------------------------------------
PG | postgres_compatibility | none |
A | oracle_compatibility | whale[success] |
| | compat_tools[success] |
| | mogila[success] | db user: mogdb, password: Demo&kVP

Plugin Install Details:
mode | plugin | status | reference | error
-------+--------------+---------+-------------------------------------------+--------
A | whale | success | https://docs.mogdb.io/zh/mogdb/v3.0/whale |
| compat_tools | success | https://gitee.com/enmotech/compat-tools |
| mogila | success | https://gitee.com/enmotech/mogila |

The ptk demo command accomplishes the following tasks:

  1. Creates an operating system user named demo_user_kVP.
  2. Creates a single-instance MogDB database of the latest version (currently 5.0.7).
  3. Sets the database to listen on port 26000. Before running the ptk demo command, ensure that port 26000 is not occupied by other applications.
  4. Creates an initial database user with the same name as the OS user (demo_user_kVP), plus a monitor user named mogdb. Both users’ initial passwords are shown in the “Database Detail” section.
  5. Creates two test databases: oracle_compatibility in Oracle-compatible mode and postgres_compatibility in PostgreSQL-compatible mode.
  6. For the automatically created oracle_compatibility database, the following enhancements are also performed:
    • Installs the mogila dataset. For more information about this dataset, refer to: Using Sample Dataset Mogila

    • Installs the whale plugin, which provides extensive Oracle compatibility features. For more information on the Whale plugin, refer to: whale

    • Installs the compat-tools utility, which supplements the whale plugin with additional Oracle compatibility features that are not yet implemented, such as Oracle-compatible data dictionary views. For more information on compat-tools, refer to: enmotech/compat-tools

      Note: compat-tools only work within a single database. This means that if you install compat-tools in the oracle_compatibility database (as is done automatically by the ptk demo command), you will only be able to query the corresponding Oracle-compatible data dictionary views when logged into that database. If you want to use these views in another database, you must install compat-tools in that database as well. For example, if you want to query Oracle-compatible data dictionary views in the postgres database, you need to download compat-tools separately from enmotech/compat-tools then run the following command:

      gsql -h 127.0.0.1 -p 26000 -d postgres -f runMe.sql

Exploring Oracle-Compatible Data Dictionary in MogDB

Switch to the operating system user associated with the demo database. You can log in to the database using the gsql command.

Note: If you want to connect to the database remotely, you should use the mogdb user. The ptk demo command creates two users in the database. One starts with demo_user_, which is the initial user. Due to MogDB’s enhanced security features, the initial user is not allowed to connect remotely. The other user is mogdb, which can be used for remote connections. The initial password for the users is displayed at the end of the ptk demo command output. Additionally, you should modify the pg_hba.conf file to allow remote connections. For more information, refer to: Connecting to a Database Remotely.

# ptk ls
cluster_name | id | addr | user | data_dir | db_version | create_time | comment
---------------+------+--------------------+---------------+---------------------------------+------------------------------+---------------------+----------
demo_kVP | 6001 | 172.16.0.164:26000 | demo_user_kVP | /home/demo_user_kVP/demodb/data | MogDB 5.0.7 (build c4707384) | 2024-06-06T12:12:24 |
# su - demo_user_kVP
$ gsql -d oracle_compatibility -r
gsql ((MogDB 5.0.7 build c4707384) compiled at 2024-05-24 10:51:53 commit 0 last mr 1804 )
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.

oracle_compatibility=# \dt
List of relations
Schema | Name | Type | Owner | Storage
--------+---------------+-------+-------+----------------------------------
public | actor | table | mogdb | {orientation=row,compression=no}
public | address | table | mogdb | {orientation=row,compression=no}
public | category | table | mogdb | {orientation=row,compression=no}
public | city | table | mogdb | {orientation=row,compression=no}
public | country | table | mogdb | {orientation=row,compression=no}
public | customer | table | mogdb | {orientation=row,compression=no}
public | film | table | mogdb | {orientation=row,compression=no}
public | film_actor | table | mogdb | {orientation=row,compression=no}
public | film_category | table | mogdb | {orientation=row,compression=no}
public | inventory | table | mogdb | {orientation=row,compression=no}
public | language | table | mogdb | {orientation=row,compression=no}
public | payment | table | mogdb | {orientation=row,compression=no}
public | rental | table | mogdb | {orientation=row,compression=no}
public | staff | table | mogdb | {orientation=row,compression=no}
public | store | table | mogdb | {orientation=row,compression=no}
(15 rows)

All these tables come from the Mogila test dataset.
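
If you want to explore the sample data yourself, a quick query such as the sketch below works. It is not part of the original session; the Mogila dataset follows the familiar Pagila/Sakila layout, so column names like category.name and film_category.category_id are assumed here:

-- top 5 film categories by number of films (illustrative query)
select c.name as category, count(*) as film_count
from film_category fc
join category c on c.category_id = fc.category_id
group by c.name
order by film_count desc
limit 5;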

The compat-tools utility provides a large number of Oracle-compatible data dictionary views (refer to: Oracle-Compatible Views). Here are some simple examples.

-- sysdate function and dual table are supported
oracle_compatibility=# select sysdate from dual;
sysdate
---------------------
2024-06-06 12:47:57
(1 row)

-- V$ views are supported
oracle_compatibility=# select sid,username,status,program from v$session;
sid | username | status | program
----------------+---------------+--------+------------------------
47724487509760 | demo_user_kVP | active | gsql
47724085905152 | demo_user_kVP | idle | WLMArbiter
47724057593600 | demo_user_kVP | idle | WorkloadMonitor
47724032427776 | demo_user_kVP | active | workload
47723769759488 | demo_user_kVP | active | Asp
47723818518272 | demo_user_kVP | idle | statement flush thread
47723696359168 | demo_user_kVP | idle | CfsShrinker
47723730437888 | demo_user_kVP | idle | WDRSnapshot
47723597268736 | demo_user_kVP | idle | ApplyLauncher
47723658610432 | demo_user_kVP | idle | TxnSnapCapturer
47723563190016 | demo_user_kVP | active | JobScheduler
(11 rows)

-- DBA_ views are supported
oracle_compatibility=# select * from dba_users;
username | user_id | password | account_status | lock_date | expiry_date | profile
---------------+---------+----------+----------------+-----------+-------------+--------------
demo_user_kVP | 10 | ******** | NORMAL | | | DEFAULT_POOL
MOGDB | 18720 | ******** | NORMAL | | | DEFAULT_POOL
(2 rows)

-- USER_ views are supported
oracle_compatibility=# select table_name,num_rows,last_analyzed from user_tables;
table_name | num_rows | last_analyzed
---------------+----------+-------------------------------
COUNTRY | 0 |
CITY | 600 | 2024-06-06 12:13:24.504009+08
ACTOR | 0 |
FILM_ACTOR | 5462 | 2024-06-06 12:13:24.518718+08
CATEGORY | 0 |
FILM_CATEGORY | 1000 | 2024-06-06 12:13:24.531421+08
LANGUAGE | 0 |
FILM | 1000 | 2024-06-06 12:13:24.56158+08
PAYMENT | 0 |
CUSTOMER | 0 |
INVENTORY | 0 |
RENTAL | 0 |
STAFF | 0 |
ADDRESS | 0 |
STORE | 0 |
(15 rows)

oracle_compatibility=# select count(*) from CITY;
count
-------
600
(1 row)

-- some short name views are supported, like IND, OBJ, COLS
oracle_compatibility=# select index_name,table_owner,uniqueness,distinct_keys,partitioned from ind where table_name='CITY';
index_name | table_owner | uniqueness | distinct_keys | partitioned
-------------------+-------------+------------+---------------+-------------
CITY_PKEY | PUBLIC | UNIQUE | 600 | NO
IDX_FK_COUNTRY_ID | PUBLIC | NONUNIQUE | 109 | NO
(2 rows)
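
As a further illustration (not shown in the original session), the short-name views can be queried the same way. The exact columns available depend on the compat-tools version, so treat the following as a sketch:

-- cols is the Oracle-style short name for user_tab_columns
select column_name, data_type, nullable
from cols
where table_name = 'CITY';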

Note: Due to significant differences in the underlying structure between MogDB and Oracle, these Oracle-compatible views may not include all the fields present in the corresponding Oracle database views.

To be continued:

- Exploring Oracle Compatibility in MogDB (Series II) - System Functions

- Exploring Oracle Compatibility in MogDB (Series III) - DBMS Packages


Working Mechanism of synchronous_standby_names in MogDB

2024-06-05

In MogDB, the parameter synchronous_standby_names is used to configure synchronous replication. When this parameter is set to '*' (the default setting), any available standby server can be used as a synchronous standby. In other words, any currently connected standby can serve as the synchronous standby without its name being explicitly specified.

Working Mechanism

When synchronous_standby_names is set to '*', MogDB’s synchronous replication mechanism selects the synchronous standby through the following steps:

  1. Initial Connection: When the primary server starts or the parameter is changed, the primary server accepts all connected standby servers.

  2. Synchronous Standby Confirmation:

    • The primary server sends a synchronization request to all connected standby servers.
    • Each standby server, upon receiving the request, confirms it and reports its status back to the primary server.
  3. Selecting the Synchronous Standby:

    • The primary server selects the earliest responding standby server(s) as the synchronous standby.
    • This selection process is dynamic, meaning that if new standbys connect or current synchronous standbys disconnect, the primary server will reselect the synchronous standby.

Priority and Behavior

  • No Priority Order: Since '*' denotes any standby, all standbys have the same priority. The primary server simply selects the earliest responding standby as the synchronous standby.
  • Dynamic Adjustment: If a synchronous standby disconnects, the primary server automatically selects the next responding standby as the new synchronous standby, ensuring the continuity and reliability of synchronous replication.
  • Concurrent Management: If multiple standbys connect simultaneously, the primary server can handle these concurrent connections and select the synchronous standby based on the confirmation of synchronization requests.

Configuration Example and Usage

Suppose we have one primary server and three standby servers (standby1, standby2, and standby3). In the configuration file postgresql.conf, set synchronous_standby_names to '*':

synchronous_standby_names = '*'
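
After changing the parameter, you can confirm the value currently in effect from any session. This is a minimal check using the PostgreSQL-style SHOW command, which MogDB inherits:

-- with the configuration above, this should return *
show synchronous_standby_names;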

Scenario Analysis

  1. Selection at Startup:

    • When the primary server starts, all connected standby servers send heartbeat signals and wait for the primary server’s synchronization request.
    • The primary server selects the earliest responding standby as the synchronous standby.
  2. Runtime Changes:

    • If the current synchronous standby standby1 disconnects, the primary server automatically selects the next responding standby standby2 as the new synchronous standby.
    • If a new standby standby3 connects to the primary server and there is no current synchronous standby, the primary server selects standby3 as the synchronous standby.

Dynamic Adjustment

If we start the primary server and three standby servers, setting synchronous_standby_names = '*', the following state transitions are possible:

  • Initial State: All standbys (standby1, standby2, and standby3) connect, and the primary server selects the earliest responding standby standby1 as the synchronous standby.
  • standby1 Disconnects: The primary server automatically selects the next responding standby standby2 as the new synchronous standby.
  • New Standby Connection: A new standby standby4 connects; the primary server will not change the current synchronous standby unless standby2 disconnects.
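
To see which standby is currently acting as the synchronous standby, query the primary. This is a minimal sketch, assuming the standard pg_stat_replication view is available in MogDB (as it is in openGauss); the sync_state column distinguishes the synchronous standby from the asynchronous ones:

-- run on the primary; one row per connected standby
select application_name, client_addr, state, sync_priority, sync_state
from pg_stat_replication;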

Summary

When synchronous_standby_names is set to '*', MogDB dynamically selects any currently available standby as the synchronous standby. This provides a flexible and highly available synchronous replication mechanism without requiring administrators to manually specify the standby names. The selection process is based on standby response times and automatically adjusts during runtime to ensure the continuity and reliability of synchronous replication.


MTK: The Ultimate Tool for Seamlessly Migrating Oracle Databases to MogDB

2024-06-03

Get the latest version of MTK:

wget https://cdn-mogdb.enmotech.com/mtk/v2.6.3/mtk_2.6.3_linux_amd64.tar.gz
tar -xvf mtk_2.6.3_linux_amd64.tar.gz

Generate an MTK trial license online

The trial license is valid for 1 month, and each email address can generate only one license. However, email addresses with the “enmotech.com” domain can generate licenses repeatedly. If clients wish to extend their trial of MTK beyond the initial month, they should contact Enmotech’s sales or pre-sales team to request an additional 1-month license.

[kamus@altlinux10 mtk_2.6.3_linux_amd64]$ ./mtk license gen
License File Not Found (default license.json)
The License code is invalid, start applying
✔ Email: kamus@enmotech.com
Start applying for email kamus@enmotech.com authorization.
Start parsing the interface to return data.
Successful application for authorization. Please check the mail and save it as license.json.

Upon receiving the email, upload the attached license.json file to the MTK directory. Then run mtk -v to verify that the license is valid.

[kamus@altlinux10 mtk_2.6.3_linux_amd64]$ ./mtk -v
Using license file: /home/kamus/mogdb-tools/mtk_2.6.3_linux_amd64/license.json
Name: kamus@enmotech.com
Expiry: 2022-10-24 12:08:58.751194162 +0800 +0800
License key verified!
License checks OK!

MMMMMMMM MMMMMMMMTTTTTTTTTTTTTTTTTTTTTTTKKKKKKKKK KKKKKKK
M:::::::M M:::::::MT:::::::::::::::::::::TK:::::::K K:::::K
M::::::::M M::::::::MT:::::::::::::::::::::TK:::::::K K:::::K
M:::::::::M M:::::::::MT:::::TT:::::::TT:::::TK:::::::K K::::::K
M::::::::::M M::::::::::MTTTTTT T:::::T TTTTTTKK::::::K K:::::KKK
M:::::::::::M M:::::::::::M T:::::T K:::::K K:::::K
M:::::::M::::M M::::M:::::::M T:::::T K::::::K:::::K
M::::::M M::::M M::::M M::::::M T:::::T K:::::::::::K
M::::::M M::::M::::M M::::::M T:::::T K:::::::::::K
M::::::M M:::::::M M::::::M T:::::T K::::::K:::::K
M::::::M M:::::M M::::::M T:::::T K:::::K K:::::K
M::::::M MMMMM M::::::M T:::::T KK::::::K K:::::KKK
M::::::M M::::::M TT:::::::TT K:::::::K K::::::K
M::::::M M::::::M T:::::::::T K:::::::K K:::::K
M::::::M M::::::M T:::::::::T K:::::::K K:::::K
MMMMMMMM MMMMMMMM TTTTTTTTTTT KKKKKKKKK KKKKKKK

Release version: v2.6.3
Git Commit hash: da0ed8ee
Git Commit Date: 2022-09-22T01:17:49Z
Git Tag : v2.6.3
Build timestamp: 20220922011907

Install the Oracle Instant Client

MTK requires the Oracle Instant Client to migrate Oracle objects to MogDB. In this tutorial, we will download the Oracle Instant Client for Linux x86-64 Basic Package. After downloading, unzip the file and set the LD_LIBRARY_PATH environment variable as follows:

export LD_LIBRARY_PATH=/home/kamus/instantclient_21_7:$LD_LIBRARY_PATH

In this tutorial, we will migrate a sample schema “customer_orders” to MogDB. The scripts for Oracle can be downloaded from the db-sample-schemas repository on GitHub.


Initialize migration project

./mtk init-project -s oracle -t mogdb -n ora2mogdb

Modify MTK configuration file

Modify the example MTK configuration file stored in the config subdirectory of the project directory (here, ora2mogdb/config). Refer to the MTK documentation for detailed information on each parameter. The essential configuration sections for MTK are source, target, and object.

  • source section: This defines the connection to the source database. MTK needs to query the Oracle data dictionary to retrieve DDL, so a DBA user is typically recommended; the default system user is sufficient.

  • target section: This defines the connection to the target database.

  • object section: To migrate all objects within a schema, simply specify the schema name in the schemas section.

The mtk.json configuration file should look like this:

{
  "source": {
    "type": "oracle",
    "connect": {
      "version": "",
      "host": "119.3.182.31",
      "user": "system",
      "port": 15221,
      "password": "oracle",
      "dbName": "ORACLE21C",
      "charset": "",
      "clientCharset": ""
    }
  },
  "target": {
    "type": "mogdb",
    "connect": {
      "version": "",
      "host": "127.0.0.1",
      "user": "co",
      "port": 26000,
      "password": "Enmo@123",
      "dbName": "postgres",
      "charset": "",
      "clientCharset": ""
    }
  },
  "object": {
    "tables": [],
    "schemas": ["co"],
    "excludeTable": {},
    "tableSplit": {}
  },
  "dataOnly": false,
  "schemaOnly": false
}

For this tutorial, we plan to migrate all objects in the “CO” schema from the Oracle database to the same user in MogDB. For testing purposes, we will not create a new database in MogDB. Instead, we will create a new user “co” in the default database, postgres.

Run the following command to create the user:

[omm@altlinux10 ~]$ gsql -d postgres -p 26000 -r
gsql ((MogDB 3.0.2 build 9bc79be5) compiled at 2022-09-18 00:37:49 commit 0 last mr )
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.

MogDB=# create user co identified by "Enmo@123";
CREATE ROLE

Start migration

Now we can start the migration.

./mtk -c ora2mogdb/config/mtk.json

Check migration report

The migration report is generated in the project’s report directory, in both plain-text and HTML formats. For simplicity, I have included only the plain-text report in this tutorial.

-----------------------
ObjectName Type Summary
-----------------------

+------------------+-------------------+-------------------+--------+-----------+-------------+-------------+-------------+--------------------|-------------+
| Type | StartTime | EndTime | Status | Total Num | Success Num | Warring Num | Failed Num |Failed(Invalid) Num | Time |
+------------------+-------------------+-------------------+--------+-----------+-------------+-------------+-------------+--------------------|-------------+
|Schema |2022-09-24 15:12:36|2022-09-24 15:12:36|finish |1 |1 |0 |0 |0 |282 ms |
|Sequence |2022-09-24 15:12:36|2022-09-24 15:12:36|finish |0 |0 |0 |0 |0 |210 ms |
|ObjectType |2022-09-24 15:12:36|2022-09-24 15:12:36|finish |0 |0 |0 |0 |0 |356 ms |
|Queue |2022-09-24 15:12:36|2022-09-24 15:12:37|finish |0 |0 |0 |0 |0 |177 ms |
|Table |2022-09-24 15:12:37|2022-09-24 15:12:47|finish |7 |7 |0 |0 |0 |9 s 952 ms |
|TableData |2022-09-24 15:12:47|2022-09-24 15:12:53|finish |7 |7 |0 |0 |0 |6 s 743 ms |
|Index |2022-09-24 15:12:53|2022-09-24 15:12:53|finish |7 |7 |0 |0 |0 |1 ms |
|Constraint |2022-09-24 15:12:53|2022-09-24 15:12:53|finish |24 |23 |0 |1 |0 |51 ms |
|DBLink |2022-09-24 15:12:53|2022-09-24 15:12:53|finish |0 |0 |0 |0 |0 |67 ms |
|View |2022-09-24 15:12:53|2022-09-24 15:12:54|finish |4 |2 |0 |2 |0 |723 ms |
|MaterializedView |2022-09-24 15:12:54|2022-09-24 15:12:54|finish |0 |0 |0 |0 |0 |138 ms |
|Function |2022-09-24 15:12:54|2022-09-24 15:12:54|finish |0 |0 |0 |0 |0 |113 ms |
|Procedure |2022-09-24 15:12:54|2022-09-24 15:12:55|finish |0 |0 |0 |0 |0 |109 ms |
|Package |2022-09-24 15:12:55|2022-09-24 15:12:55|finish |0 |0 |0 |0 |0 |77 ms |
|Trigger |2022-09-24 15:12:55|2022-09-24 15:12:55|finish |0 |0 |0 |0 |0 |404 ms |
|Synonym |2022-09-24 15:12:55|2022-09-24 15:12:55|finish |0 |0 |0 |0 |0 |74 ms |
|TableDataCom |2022-09-24 15:12:55|2022-09-24 15:12:56|finish |7 |7 |0 |0 |0 |810 ms |
|AlterSequence |2022-09-24 15:12:56|2022-09-24 15:12:56|finish |0 |0 |0 |0 |0 |71 ms |
|CollStatistics |2022-09-24 15:12:56|2022-09-24 15:12:56|finish |7 |7 |0 |0 |0 |29 ms |
+------------------+-------------------+-------------------+--------+-----------+-------------+-------------+-------------+--------------------|-------------+

We can see that all tables and table data have been successfully migrated to MogDB without any errors. However, there is one failed constraint and two failed views.

The failed constraint is a JSON check constraint, which is not supported by MogDB.
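
For context, the kind of constraint that fails here looks roughly like the hypothetical Oracle DDL below. The table and column names are made up for illustration; the actual object in the customer_orders schema may differ:

-- Oracle-side definition (illustrative only); MogDB rejects it
-- because the IS JSON condition is not supported
alter table orders
  add constraint orders_doc_is_json
  check (order_doc is json);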


The failed views are due to the grouping_id and json_table functions, which are not yet implemented in MogDB.

SQL errors reported for the two views:

pq: function grouping_id(character varying, character varying) does not exist
pq: syntax error at or near “columns”
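
If these views matter to you, their source usually needs to be adapted by hand. For grouping_id, one possible direction, assuming MogDB supports the standard multi-argument grouping() function (worth verifying on your version; it returns the same bitmask as Oracle’s grouping_id()), is:

-- hypothetical column names; rewrite grouping_id(region, product)
-- as grouping(region, product) and keep the rollup unchanged
select region, product,
       grouping(region, product) as grp_id,
       sum(amount) as total_amount
from sales
group by rollup (region, product);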

Check migration result

Run the sample queries to ensure that all the data has been migrated to MogDB without any errors.

/* 5 products with the highest revenue
With their corresponding order rank */
select p.product_name,
count(*) number_of_orders,
sum ( oi.quantity * oi.unit_price ) total_value,
rank () over (
order by count(*) desc
) order_count_rank
from products p
join order_items oi
on p.product_id = oi.product_id
group by p.product_name
order by sum ( oi.quantity * oi.unit_price ) desc
fetch first 5 rows only;
MogDB=> select p.product_name,
MogDB-> count(*) number_of_orders,
MogDB-> sum ( oi.quantity * oi.unit_price ) total_value,
MogDB-> rank () over (
MogDB(> order by sum ( oi.quantity * oi.unit_price ) desc
MogDB(> ) revenue_rank
MogDB-> from products p
MogDB-> join order_items oi
MogDB-> on p.product_id = oi.product_id
MogDB-> group by p.product_name
MogDB-> order by count(*) desc
MogDB-> fetch first 5 rows only;
product_name | number_of_orders | total_value | revenue_rank
-----------------------+------------------+-------------+--------------
Girl's Trousers (Red) | 148 | 15794.76 | 1
Boy's Hoodie (Grey) | 100 | 3754.08 | 35
Men's Pyjamas (Blue) | 100 | 3274.61 | 36
Men's Coat (Red) | 98 | 4230.30 | 31
Boy's Socks (White) | 98 | 3081.12 | 38
(5 rows)

Conclusion

Migrating tables, table data, and indexes from Oracle to MogDB typically proceeds without issues. However, for views, procedures, functions, and packages, some modifications to the source code may be necessary. With the ongoing development of Oracle compatibility in MogDB, we believe that the need for such modifications will decrease over time, making the migration process from Oracle to MogDB even smoother.
