How to Generate a VSIX File from VS Code Extension Source Code

2024-12-12

As I’ve been using Windsurf as my primary code editor, I encountered a situation where the vs-picgo extension wasn’t available in the Windsurf marketplace. This necessitated the need to manually package the extension from its source code. This guide documents the process of generating a VSIX file for VS Code extensions, which can then be installed manually in compatible editors like Windsurf.

In this guide, I’ll walk you through the process of generating a VSIX file from a VS Code extension’s source code. We’ll use the popular vs-picgo extension as an example.

Prerequisites

Before we begin, make sure you have the following installed:

  • Node.js (version 12 or higher)
  • npm (comes with Node.js)

Step 1: Install Required Tools

First, we need to install two essential tools:

  • yarn: A package manager that will handle our dependencies
  • vsce: The VS Code Extension Manager tool that creates VSIX packages
1
2
3
4
5
# Install Yarn globally
sudo npm install -g yarn

# Install vsce globally
sudo npm install -g @vscode/vsce

Step 2: Prepare the Project

  1. Clone or download the extension source code:

    1
    2
    git clone https://github.com/PicGo/vs-picgo.git
    cd vs-picgo
  2. Install project dependencies:

    1
    yarn install

This command will:

  • Read the package.json file
  • Install all required dependencies
  • Create or update the yarn.lock file

Note: The yarn.lock file is important! Don’t delete it as it ensures consistent installations across different environments.

Step 3: Build the Extension

Build the extension using the production build command:

1
yarn build:prod

This command typically:

  • Cleans the previous build output
  • Compiles TypeScript/JavaScript files
  • Bundles all necessary assets
  • Creates the dist directory with the compiled code

In vs-picgo’s case, the build process:

  1. Uses esbuild for fast bundling
  2. Creates both extension and webview bundles
  3. Generates source maps (disabled in production)
  4. Optimizes the code for production use

Step 4: Package the Extension

Finally, create the VSIX file:

1
vsce package

This command:

  1. Runs any pre-publish scripts defined in package.json
  2. Validates the extension manifest
  3. Packages all required files into a VSIX file
  4. Names the file based on the extension’s name and version (e.g., vs-picgo-2.1.6.vsix)

The resulting VSIX file will contain:

  • Compiled JavaScript files
  • Assets (images, CSS, etc.)
  • Extension manifest
  • Documentation files
  • License information

What’s Inside the VSIX?

The VSIX file is essentially a ZIP archive with a specific structure. For vs-picgo, it includes:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
vs-picgo-2.1.6.vsix
├─ [Content_Types].xml
├─ extension.vsixmanifest
└─ extension/
├─ LICENSE.txt
├─ changelog.md
├─ logo.png
├─ package.json
├─ package.nls.json
├─ readme.md
└─ dist/
├─ extension.js
└─ webview/
├─ index.css
└─ index.js

Installing the Extension

You can install the generated VSIX file in VS Code or any compatible editor by:

  1. Opening VS Code/Windsurf/Cursor …
  2. Going to the Extensions view
  3. Clicking the “…” menu (More Actions)
  4. Selecting “Install from VSIX…”
  5. Choosing your generated VSIX file

Troubleshooting

If you encounter any issues:

  1. Missing dist directory error:

    • This is normal on first build
    • The build process will create it automatically
  2. Dependency errors:

    • Run yarn install again
    • Check if you’re using the correct Node.js version
  3. VSIX packaging fails:

    • Verify your package.json is valid
    • Ensure all required files are present
    • Check the extension manifest for errors

Conclusion

Building a VS Code extension VSIX file is straightforward once you have the right tools installed. The process mainly involves installing dependencies, building the source code, and packaging everything into a VSIX file.

Remember to keep your yarn.lock file and always build in production mode before packaging to ensure the best performance and smallest file size for your users.

Happy extension building! 🚀

Read More

What is DBOS and What Should We Expect

2024-11-14

Introduction

The computing world is witnessing a paradigm shift in how we think about operating systems. A team of researchers has proposed DBOS (Database-Oriented Operating System), a radical reimagining of operating system architecture that puts data management at its core. But what exactly is DBOS, and why should we care?

What is DBOS?

DBOS is a novel operating system architecture that treats data management as its primary concern rather than traditional OS functions like process management and I/O. The key insight behind DBOS is that modern applications are increasingly data-centric, yet our operating systems still follow designs from the 1970s that prioritize computation over data management.

Instead of treating databases as just another application, DBOS makes database technology the fundamental building block of the operating system itself. This means core OS functions like process scheduling, resource management, and system monitoring are implemented using database principles and technologies.

Who is behind DBOS?

DBOS is a collaborative research project involving almost twenty researchers across multiple institutions including MIT, Stanford, UW-Madison, Google, VMware, and other organizations. The project is notably led by database pioneer Michael Stonebraker, who is an ACM Turing Award winner (2014) and Professor Emeritus at UC Berkeley, currently affiliated with MIT.

Key institutions and researchers involved include:

  • MIT: Michael Stonebraker, Michael Cafarella, Çağatay Demiralp, and others
  • Stanford: Matei Zaharia, Christos Kozyrakis, and others
  • UW-Madison: Xiangyao Yu
  • Industry partners: Researchers from Google, VMware, and other organizations

The people behind DBOS can be found at DBOS Project.

Ultimate Goal

The ultimate goal of DBOS is to create an operating system that is data-centric and data-driven, the OS is on top of a DBMS, not like today’s DBMS on top of the OS.
ALL system data should reside in the DBMS.

  • Replace the “everything is a file” mantra with “everything is a table”
  • All system state and metadata stored in relational tables
  • All changes to OS state should be through database transactions
  • DBMS provides All functions that a DBMS can do, for example, files are blobs and tables in the DBMS.
  • SQL-based interface for both application and system data access
  • To achieve very high performance, the DBMS must leverage sophisticated caching and parallelization strategies and compile repetitive queries into machine code.

Benefits

  1. Strong Security and Privacy
  • Native GDPR compliance through data-centric design
  • Attribute-based access control (ABAC)
  • Complete audit trails and data lineage
  • Privacy by design through unified data management
  • Fine-grained access control at the data level
  • Enhanced monitoring and threat detection
  • Simplified compliance with regulatory requirements
  • Built-in data encryption and protection mechanisms
  1. Enhanced Performance and Efficiency
  • Optimized resource allocation through database-driven scheduling
  • Reduced data movement and copying
  • Better cache utilization through database techniques
  • Intelligent workload management
  • Advanced query optimization for system operations
  • Improved resource utilization through data-aware decisions
  • Reduced system overhead through unified architecture
  • Better support for modern hardware architectures
  1. Improved Observability and Management
  • Comprehensive system-wide monitoring
  • Real-time analytics on system performance
  • Easy troubleshooting through SQL queries
  • Better capacity planning capabilities
  • Unified logging and debugging interface
  • Historical analysis of system behavior
  • Predictive maintenance capabilities
  • Simplified system administration
  1. Advanced Application Support
  • Native support for distributed applications
  • Better handling of microservices architecture
  • Simplified state management
  • Enhanced support for modern cloud applications
  • Built-in support for data-intensive applications
  • Improved consistency guarantees
  • Better transaction management
  • Simplified development of distributed systems

Technical Implementation

DBOS proposes implementing core OS functions using database principles:

  • Process Management: Processes and their states managed as database tables
  • Resource Scheduling: SQL queries and ML for intelligent scheduling decisions
  • System Monitoring: Metrics collection and analysis through database queries
  • Security: Access control and auditing via database mechanisms
  • Storage: File system metadata stored in relational tables
  • Networking: Network state and routing managed through database abstractions

What Should We Expect?

Near-term Impact

  • Proof of Concept: The researchers are working on demonstrating DBOS’s capabilities through specific use cases like log processing and accelerator management.
  • Performance Improvements: Early implementations might show significant improvements in data-intensive workloads.
  • Development Tools: New tools and frameworks that leverage DBOS’s database-centric approach.

Long-term Possibilities

Cloud Native Integration: DBOS could become particularly relevant for cloud computing environments where data management is crucial.

  • AI/ML Operations: Better support for AI and machine learning workloads through intelligent resource management.
  • Privacy-First Computing: A new standard for building privacy-compliant systems from the ground up.

Challenges Ahead

Several technical and practical challenges need to be addressed:

  1. Performance
  • Minimizing database overhead for system operations
  • Optimizing query performance for real-time OS operations
  • Efficient handling of high-frequency system events
  1. Compatibility
  • Supporting existing applications and system calls
  • Maintaining POSIX compliance where needed
  • Migration path for legacy systems
  1. Distributed Systems
  • Maintaining consistency across distributed nodes
  • Handling network partitions and failures
  • Scaling database operations across clusters
  1. Adoption
  • Convincing stakeholders to adopt radical architectural changes
  • Training developers in the new paradigm
  • Building an ecosystem of compatible tools and applications

Conclusion

DBOS represents a bold reimagining of operating system design for the data-centric world. While it’s still in its early stages, the potential benefits for security, privacy, and developer productivity make it an exciting project to watch. As data continues to grow in importance, DBOS’s approach might prove to be prescient.
The success of DBOS will largely depend on how well it can demonstrate its advantages in real-world scenarios and whether it can overcome the inherent challenges of introducing such a fundamental change to system architecture. For developers, system administrators, and anyone interested in the future of computing, DBOS is definitely worth keeping an eye on.
Whether DBOS becomes the next evolution in operating systems or remains an interesting academic exercise, its ideas about putting data management at the center of system design will likely influence future OS development.

References

  1. M. Stonebraker et al., “DBOS: A DBMS-oriented Operating System,” Proceedings of the VLDB Endowment, Vol. 15, No. 1, 2021.
    https://www.vldb.org/pvldb/vol15/p21-stonebraker.pdf

  2. DBOS Project Official Website, MIT CSAIL and Stanford University.
    https://dbos-project.github.io/

  3. X. Yu et al., “A Case for Building Operating Systems with Database Technology,” Proceedings of the 12th Conference on Innovative Data Systems Research (CIDR ‘22), 2022.
    https://www.cidrdb.org/cidr2022/papers/p82-yu.pdf

  4. DBOS Research Group Publications.
    https://dbos-project.github.io/papers/

These references provide detailed technical information about DBOS’s architecture, implementation, and potential impact on the future of operating systems. The papers discuss various aspects from system design to performance evaluation, security considerations, and practical applications.

Read More

Some Misconceptions About Database Performance - Application Concurrent Requests, Database Connections, and Database Connection Pools

2024-08-21

Let’s start with a chart from a database stress test result.

1724216413686

In the chart above, the X-axis represents the number of concurrent connections, and the Y-axis represents the TPS (Transactions Per Second) provided by the database. The stress testing tool used is pgbench, and the database is a single instance of PostgreSQL 16. TPS_PGDirect represents the TPS when pgbench directly sends the load to PostgreSQL, while TPS_PGBouncer represents the TPS when pgbench sends the load to the PGBouncer database connection pool.

We can clearly see that when the number of connections exceeds 256, the TPS begins to gradually decline when the load is sent directly to the database. When the number of connections eventually rises to over 8000, the database is almost at a standstill. However, when using PGBouncer, the TPS remains stable and excellent, with the database consistently providing high TPS regardless of the number of connections.

Many people might conclude from this chart that if the concurrent connections increase beyond a certain threshold, PostgreSQL becomes inadequate, and PGBouncer must be used to improve performance.

Some users may attempt to conduct various hardware environment combination tests to derive a reference threshold number to guide at what pressure PostgreSQL can cope, and once it exceeds a certain pressure, PGBouncer should be enabled.

So, is this conclusion correct? And does testing out a threshold number have real guiding significance?

Understanding Concurrent Performance

To determine how many concurrent business requests a database can handle, we should focus on the Y-axis from the test chart above, rather than the X-axis. The Y-axis TPS represents the number of transactions the database can process per second, while the X-axis connection count is merely the database connection number we configured ourselves (which can be simply understood as the max_connections parameter value). We can observe that at 256 connections, the direct database stress test with pgbench allows the database to provide over 50,000 TPS, which is actually almost the highest number in the entire stress test (WITHOUT PGBouncer). Subsequent tests using PGBouncer, this number was not significantly exceeded. Therefore, PGBouncer is not a weapon to improve database performance.

What is performance? Performance means: for a determined business transaction pressure (how much work a business transaction needs to handle, such as how much CPU, memory, and IO it consumes), the more transactions the database can execute, the higher the performance. In the chart above, this is represented by the Y-axis TPS (Transactions Per Second).

For example, suppose a user request (Application Request) requires executing 5 transactions from start to finish, and assume these transactions consume similar amounts of time (even if there are differences, we can always get the average transaction runtime in a system). In this case, 50,000 TPS means the database can handle 10,000 user requests per second, which can be simply understood as handling 10,000 concurrent users. If TPS rises to 100,000, the database can handle 20,000 concurrent users; if TPS drops to 10,000, it can only handle 2,000 concurrent users. This is a simple calculation logic, and in this description, we did not mention anything about database connections (User Connections).

So why does it seem like PGBouncer improves performance when the number of connections continues to increase? Because in this situation, PGBouncer acts as a limiter on database connections, reducing the extra overhead caused by the database/operating system handling too many connections, allowing the database and operating system to focus on processing the business requests sent from each connection, and enabling user requests exceeding the connection limit to share database connections. What PGBouncer does is: if your restaurant can only provide 10 tables, then I only allow 10 tables of guests to enter, and other guests have to wait outside.

However, why should we let 100 tables of guests rush into the restaurant all at once? Without PGBouncer, can’t we control allowing only 10 tables of guests to enter? In modern applications, application-side connection pools are more commonly used, such as connection pools in various middleware software (Weblogic, WebSphere, Tomcat, etc.) or dedicated JDBC connection pools like HikariCP.

Exceeding the number of connections that can be handled is unnecessary. When the database is already at its peak processing capacity, having more connections will not improve overall processing capability; instead, it will slow down the entire database.

To summarize the meaning of the chart above more straightforwardly: when the total number of database connections is at 256, the database reaches its peak processing capacity, and direct database connections can provide 50,000 TPS without any connection pool. If 50,000 TPS cannot meet the application’s concurrent performance requirements, increasing the number of connections will not help. But if connections are increased (wrongly, without any other control), a layer of connection pool like PGBouncer can be added as a connection pool buffer to keep performance stable at around 50,000 TPS.

So, do we have any reason to arbitrarily increase the number of connections? No. What we need to do is: allow the database to open only the number of connections within its processing capacity and enable these connections to be shared by more user requests. When the restaurant size is fixed, the higher the turnover rate, the higher the restaurant’s daily revenue.

How to Ask Questions About Performance

Instead of asking: how can we ensure the database provides sufficient TPS with 4000 connections, we should ask: how can we ensure the database provides 50,000 TPS.

Because the number of connections is not a performance metric, it is merely a parameter configuration option.

In the example above, a connection count of 4000 or 8000 is not a fixed value, nor is it an optimized value. We cannot try to do more additional database optimizations on an unoptimized, modifiable parameter setting, as this may lead to a reversal of priorities. If we can already provide maximum processing capacity with 256 fully loaded database connections, why optimize a database with more than 256 connections? We should ensure the database always runs at full capacity with a maximum of 256 connections.

Now, our question about performance can be more specific. Suppose we have multiple sets of database servers with different configurations; what should the fully loaded connection count be for each set to achieve optimal processing capacity?

This requires testing. However, we will ultimately find that this fully loaded connection count is usually related only to the database server’s hardware resources, such as the number of CPU cores, CPU speed, memory size, storage speed, and the bandwidth and latency between storage and host.

Of course not. Concurrent performance (P) = Hardware processing capability (H) / Business transaction pressure (T).

As mentioned earlier, the performance metric TPS is for a “determined business transaction pressure,” meaning a determined T value.

When the T value is determined, we can only improve the H value to further enhance the P value. Conversely, there is naturally another optimization approach: reducing the T value. There are various ways to reduce the T value, such as adding necessary indexes, rewriting SQL, using partitioned tables, etc. The optimization plan should be formulated based on where the T value is consumed most, which is another topic.

Conclusion

To answer the two initial questions:

  1. If concurrent connections increase beyond a certain threshold, PostgreSQL becomes inadequate, and PGBouncer must be used to improve performance. Is this conclusion correct?

If we must create more connections to the database than the hardware processing capacity for certain hardware resources, using PGBouncer can stabilize performance. However, we have no reason to create more connections than the hardware processing capacity. We have various means to control the maximum number of connections the database can open, and various middleware solutions can provide connection pools. From this perspective, PGBouncer does not improve database performance; it merely reduces concurrency conflicts, doing what any connection pool software can do.

  1. Is it meaningful to attempt various hardware environment combination tests to derive a reference threshold number to guide at what pressure PostgreSQL can cope, and once it exceeds a certain pressure, PGBouncer should be enabled?

It has some significance, but simple stress tests like pgbench are not enough to guide real applications. Concurrent performance (P) = Hardware processing capability (H) / Business transaction pressure (T). Testing in a determined hardware environment means the H value is fixed, so we should focus on T during testing. The T value provided by pgbench is a very simple transaction pressure (only four tables, simple CRUD operations), and the P value tested under this pressure can only be used as a reference. To guide application deployment, typical transaction pressure of the application itself must be abstracted for testing to have guiding significance.

Read More

Flashback Features in MogDB

2024-06-15

What is Database Flashback

The flashback capability of a database allows users to query the contents of a table at a specific point in the past or quickly restore a table or even the entire database to a time before an erroneous operation. This feature can be extremely useful, and sometimes even a lifesaver.

Database flashback capabilities are generally divided into three levels:

  • Row-level Flashback: Also known as flashback query, this typically involves using a SELECT statement to retrieve data from a table as it was at a specific point in time, such as right before a DELETE command was issued.
  • Table-level Flashback: Also known as flashback table, this usually involves using specialized DDL statements to recreate a table from the recycle bin. This is often used to recover from a TRUNCATE TABLE or DROP TABLE operation.
  • Database-level Flashback: Also known as flashback database, this involves using specialized DDL statements to restore the entire database to a previous point in time. Unlike PITR (Point-In-Time Recovery) which involves restoring from backups, flashback database is faster as it does not require restoring the entire database from backup sets.

In MogDB version 5.0, only flashback query and flashback table have been implemented so far. Flashback database has not yet been implemented (Oracle has supported flashback database since version 10g).

Scenario

Imagine a regrettable scenario (Note: do not run the DELETE SQL command in a production environment).

You have a table that records account names and account balances.

1
2
3
4
5
6
7
8
MogDB=# select * from accounts;
name | amount
------+--------
A | 100
B | 100
C | 100
D | 99
(4 rows)

You intended to execute an SQL command to delete account records with a balance of 99 units. Normally, this should delete one record. To demonstrate, let’s use SELECT instead of DELETE.

1
2
3
4
5
MogDB=# select * from accounts where amount=99;
name | amount
------+--------
D | 99
(1 row)

However, the minus sign “-“ and the equals sign “=” are adjacent on the keyboard, and you have fat fingers, so you accidentally pressed the minus sign. As a result, the command you sent to the database looks like this:

1
delete from accounts where amount-99;

For demonstration purposes, we’ll still use SELECT instead of DELETE.

1
2
3
4
5
6
7
MogDB=# select * from accounts where amount-99;
name | amount
------+--------
A | 100
B | 100
C | 100
(3 rows)

A terrifying thing happens: all data except the record with a balance exactly equal to 99 is returned. This means that if this were the DELETE command mentioned above, you would have deleted all accounts with a balance not equal to 99.

The good news is that starting from MogDB version 3.0, validation for such dangerous syntax involving the minus sign has been added. Now, executing the same SQL will result in an error.

1
2
3
4
MogDB=# delete from accounts where amount-99;
ERROR: argument of WHERE must be type boolean, not type integer
LINE 1: delete from accounts where amount-99;
^

However, in the community editions of openGauss, MySQL, or MariaDB, such dangerous syntax can still be executed normally.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
gsql ((openGauss 6.0.0-RC1 build ed7f8e37) compiled at 2024-03-31 12:41:30 commit 0 last mr  )
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.

openGauss=# select version();
version
------------------------------------------------------------------------------------------------------------------------------------------------------------
(openGauss 6.0.0-RC1 build ed7f8e37) compiled at 2024-03-31 12:41:30 commit 0 last mr on aarch64-unknown-linux-gnu, compiled by g++ (GCC) 10.3.1, 64-bit
(1 row)

openGauss=# create table accounts (name varchar2, amount int);
CREATE TABLE
openGauss=# insert into accounts values ('A',100),('B',100),('C',100),('D',99);
INSERT 0 4

openGauss=# delete from accounts where amount-99;
DELETE 3
openGauss=# select * from accounts;
name | amount
------+--------
D | 99
(1 row)

No matter what kind of erroneous operation occurs, suppose a data deletion error really happens. In MogDB, you still have the flashback feature available for recovery.

Flashback Feature in MogDB

The flashback feature and its related implementations have undergone some changes since MogDB version 3.0.

  1. Applicable Only to Ustore Storage Engine:

    The flashback feature only works for tables using the ustore storage engine. The default astore no longer supports flashback queries. Therefore, you need to set enable_ustore=on. This parameter is off by default, and changing it requires a database restart to take effect.

    1
    2
    3
    MogDB=# alter system set enable_ustore=on;
    NOTICE: please restart the database for the POSTMASTER level parameter to take effect.
    ALTER SYSTEM SET
  2. Setting undo_retention_time:

    This parameter specifies the retention time for old version data in the rollback segment, equivalent to the allowable time span for flashback queries. The default value is 0, meaning any flashback query will encounter a “restore point not found” error. Changing this parameter does not require a database restart.

    1
    2
    MogDB=# alter system set undo_retention_time=86400; -- 86400 seconds = 24 hours
    ALTER SYSTEM SET
  3. Enabling Database Recycle Bin for Truncate or Drop Operations:

    To flashback a table from a truncate or drop operation, you need to enable the database recycle bin by setting enable_recyclebin=on. This parameter is off by default, and changing it does not require a database restart.

    1
    2
    MogDB=# alter system set enable_recyclebin=on;
    ALTER SYSTEM SET

Creating and Populating the Ustore Table

Create a ustore-based accounts table and insert some test data.

1
2
3
4
5
MogDB=# create table accounts (name varchar2, amount int) with (storage_type=ustore);
CREATE TABLE

MogDB=# insert into accounts values ('A',100),('B',100),('C',100),('D',99);
INSERT 0 4

Simulating an Erroneous Deletion

Now, due to some erroneous operation, you delete all account records with amounts not equal to 99.

1
2
3
4
5
6
7
8
MogDB=# delete from accounts where amount<>99;
DELETE 3

MogDB=# select * from accounts;
name | amount
------+--------
D | 99
(1 row)

Flashback Query

When you realize the mistake, it might be 1 minute or 1 hour later. As long as it is within 24 hours (due to the undo_retention_time setting), you can recover the data.

Check the current timestamp and estimate the timestamp at the time of the erroneous operation. For simplicity, let’s assume you noted the system’s timestamp before issuing the delete command.

1
2
3
4
5
MogDB=# select sysdate;
sysdate
---------------------
2024-06-13 18:40:18
(1 row)

Issue a flashback query to retrieve the data as it was at the specified timestamp.

1
2
3
4
5
6
7
8
MogDB=# select * from accounts timecapsule TIMESTAMP to_timestamp('2024-06-13 18:40:18','YYYY-MM-DD HH24:MI:SS');
name | amount
------+--------
A | 100
B | 100
C | 100
D | 99
(4 rows)

Recovering Data

You can recover the data by creating a temporary table with the flashback query results and then inserting the data back into the original table.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
MogDB=# create table tmp_accounts as select * from accounts timecapsule TIMESTAMP to_timestamp('2024-06-13 18:40:18','YYYY-MM-DD HH24:MI:SS') where amount<>99;
INSERT 0 3

MogDB=# select * from tmp_accounts;
name | amount
------+--------
A | 100
B | 100
C | 100
(3 rows)

MogDB=# insert into accounts select * from tmp_accounts;
INSERT 0 3

MogDB=# select * from accounts;
name | amount
------+--------
D | 99
A | 100
B | 100
C | 100
(4 rows)

Alternatively, if no new data has been added to the table after the erroneous operation, you can use the timecapsule table command to flashback the entire table to the specified timestamp.

1
2
3
4
5
6
7
8
9
10
11
MogDB=# timecapsule table accounts to TIMESTAMP to_timestamp('2024-06-13 18:40:18','YYYY-MM-DD HH24:MI:SS');
TimeCapsule Table

MogDB=# select * from accounts;
name | amount
------+--------
D | 99
A | 100
B | 100
C | 100
(4 rows)

Recovering from Truncate or Drop Table

If you accidentally issue a TRUNCATE or DROP command, In this situation, the before commands don’t help because the data has been truncated.

1
2
3
4
5
6
7
8
9
10
11
12
13
MogDB=# truncate table accounts;
TRUNCATE TABLE

MogDB=# select * from accounts timecapsule TIMESTAMP to_timestamp('2024-06-14 02:10:28','YYYY-MM-DD HH24:MI:SS');
ERROR: Snapshot too old, ScanRelation, the info: snapxmax is 45865, snapxmin is 45865, csn is 30047, relfrozenxid64 is 45877, globalRecycleXid is 17356.

MogDB=# timecapsule table accounts to TIMESTAMP to_timestamp('2024-06-14 02:10:28','YYYY-MM-DD HH24:MI:SS');
TimeCapsule Table

MogDB=# select * from accounts;
name | amount
------+--------
(0 rows)

For TRUNCATE or DROP TABLE operations, use the to before keyword to recover the table from the recycle bin.

1
2
3
4
5
6
7
8
9
10
11
MogDB=# timecapsule table accounts to before truncate;
TimeCapsule Table

MogDB=# select * from accounts;
name | amount
------+--------
A | 100
B | 100
C | 100
D | 99
(4 rows)

Similarly, if the table was dropped, you can recover it from the recycle bin using the same to before keyword.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
MogDB=# drop table accounts;
DROP TABLE
MogDB=# select * from accounts;
ERROR: relation "accounts" does not exist on dn_6001
LINE 1: select * from accounts;
^
MogDB=# timecapsule table accounts to before drop;
TimeCapsule Table
MogDB=# select * from accounts;
name | amount
------+--------
A | 100
B | 100
C | 100
D | 99
(4 rows)

Conclusion

MogDB’s flashback feature is an essential tool for recovering from accidental data deletions, truncations, or drops. By enabling the ustore storage engine, setting an appropriate undo_retention_time, and activating the recycle bin, you can leverage flashback queries and the timecapsule command to restore your data efficiently. These features ensure that you can maintain data integrity and quickly recover from human errors, providing robust data protection and operational resilience.

Read More

Exploring Oracle Compatibility in MogDB (Series I) - Data Dictionary

2024-06-06

Install PTK

PTK is the optimal tool for installing the MogDB database (an enterprise-grade commercial distribution based on openGauss) or the openGauss database, offering a seamless and smooth installation experience.

To install PTK, simply run the following command:

1
curl --proto '=https' --tlsv1.2 -sSf https://cdn-mogdb.enmotech.com/ptk/install.sh | sh

This command will automatically install PTK in the user’s home directory under $HOME/.ptk. This directory will serve as the working directory for PTK, storing cache files, metadata information, cluster configuration information, backup information, and other related files. Additionally, the installation command will add the path $HOME/.ptk/bin to the PATH environment variable in the corresponding shell profile file, enabling the user to use the ptk command directly after logging into the server. In this tutorial, we installed PTK using the root user, although this is not mandatory.

For more detailed PTK installation instructions, please refer to: PTK Installation Guide.

The environment used for this series of articles is CentOS 7.6 for x86-64. MogDB must currently run on a Linux operating system. If you wish to install MogDB on macOS or Windows, you can do so using container deployment. For more information, refer to: Container-based MogDB Installation

To check your CentOS version, you can use the following command:

1
2
# cat /etc/centos-release
CentOS Linux release 7.6.1810 (Core)

MogDB can also run on ARM architecture CPUs. You can list all supported CPU brands using the ptk candidate cpu command:

1
2
3
4
5
6
7
8
9
10
# ptk candidate cpu
CPU Model
------------------
Cortex-ARM64
Kunpeng-ARM64
Phythium-ARM64
Hygon-x64
Intel-x64
AMD-x64
zhaoxin-x64

If PTK notifies you of a new version while running any command, you can upgrade it directly using the ptk self upgrade command.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
# ptk ls
cluster_name | id | addr | user | data_dir | db_version | create_time | comment
---------------+----+------+------+----------+------------+-------------+----------
Warning: New version '1.5.0' is available, you are using ptk version '1.1.3'.
You can upgrade ptk via command: 'ptk self upgrade'
You can also set environment by 'export PTK_NO_CHECK_VERSION=true' to disable this warning
# ptk self upgrade
INFO[2024-06-06T11:59:01.105] current version: 1.1.3 release, target version: latest
INFO[2024-06-06T11:59:01.105] download package from http://cdn-mogdb.enmotech.com/ptk/latest/ptk_linux_x86_64.tar.gz
INFO[2024-06-06T11:59:01.105] downloading ptk_linux_x86_64.tar.gz ...
> ptk_linux_x86_64.tar.gz: 17.78 MiB / 20.08 MiB [----------------------------------------------------------------------->_________] 88.52% 27.93 MiB p/s ETA 0s
> ptk_linux_x86_64.tar.gz: 20.08 MiB / 20.08 MiB [---------------------------------------------------------------------------------] 100.00% 29.07 MiB p/s 900ms
INFO[2024-06-06T11:59:02.956] upgrade ptk successfully
[root@mogdb-kernel-0004 ~]# ptk version
PTK Version: v1.5.0 release
Go Version: go1.19.10
Build Date: 2024-06-04T17:16:07
Git Hash: edd5dbb0
OS/Arch: linux/amd64

Install MogDB

With PTK, you can easily create multiple MogDB instances on a single server. Each MogDB instance can be assigned to a different operating system user, allowing you to manage multiple database instances with a single PTK installation.

To quickly create a test database without any configuration, you can use the ptk demo command:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
# ptk demo
Cluster Name: "demo_kVP"
+--------------+--------------+------------------------------+-------+---------+---------------------------------+----------+
| az(priority) | ip | user(group) | port | role | data dir | upstream |
+--------------+--------------+------------------------------+-------+---------+---------------------------------+----------+
| AZ1(1) | 172.16.0.164 | demo_user_kVP(demo_user_kVP) | 26000 | primary | /home/demo_user_kVP/demodb/data | - |
+--------------+--------------+------------------------------+-------+---------+---------------------------------+----------+
✔ Is cluster config correct (default=n) [y/n]: y
INFO[2024-06-06T12:12:05.948] start check hostname ...
INFO[2024-06-06T12:12:05.951] check hostname success
INFO[2024-06-06T12:12:05.951] skip check os
INFO[2024-06-06T12:12:05.951] start check distro ...
INFO[2024-06-06T12:12:05.972] check distro success
[....]
Demo Summary:

Deploy Status:

cluste_name | host | user | port | status | message
--------------+--------------+---------------+-------+---------------+----------
demo_kVP | 172.16.0.164 | demo_user_kVP | 26000 | start_success | success

Database Detail:

item | value
----------------+----------------------------------
user_password | Demo&kVP
db_password | Demo&kVP
base_dir | /home/demo_user_kVP/demodb
app_dir | /home/demo_user_kVP/demodb/app
data_dir | /home/demo_user_kVP/demodb/data
tool_dir | /home/demo_user_kVP/demodb/tool
tmp_dir | /home/demo_user_kVP/demodb/tmp

Mode Compatibility:

mode | database | plugins | other
-------+------------------------+-----------------------+-------------------------------------
PG | postgres_compatibility | none |
A | oracle_compatibility | whale[success] |
| | compat_tools[success] |
| | mogila[success] | db user: mogdb, password: Demo&kVP

Plugin Install Details:
mode | plugin | status | reference | error
-------+--------------+---------+-------------------------------------------+--------
A | whale | success | https://docs.mogdb.io/zh/mogdb/v3.0/whale |
| compat_tools | success | https://gitee.com/enmotech/compat-tools |
| mogila | success | https://gitee.com/enmotech/mogila |

The ptk demo command accomplishes the following tasks:

  1. Creates an operating system user named demo_user_kVP.
  2. Creates a single-instance MogDB database of the latest version (currently 5.0.7).
  3. Sets the database to listen on port 26000. Before running the ptk demo command, ensure that port 26000 is not occupied by other applications.
  4. Create an initial database user with the same name as the OS user: demo_user_kVP. Also, create a monitor user named mogdb. Both users will have their initial passwords specified in the “Database Detail” section.
  5. Creates two test databases: oracle_compatibility in Oracle-compatible mode and postgres_compatibility in PostgreSQL-compatible mode.
  6. For the automatically created oracle_compatibility database, the following enhancements are also performed:
    • Installs the mogila dataset. For more information about this dataset, refer to: Using Sample Dataset Mogila

    • Installs the whale plugin, which provides extensive Oracle compatibility features. For more information on the Whale plugin, refer to: whale

    • Installs the compat-tools utility, which supplements the whale plugin with additional Oracle compatibility features that are not yet implemented, such as Oracle-compatible data dictionary views. For more information on compat-tools, refer to: enmotech/compat-tools

      Note: compat-tools only work within a single database. This means that if you install compat-tools in the oracle_compatibility database (as is done automatically by the ptk demo command), you will only be able to query the corresponding Oracle-compatible data dictionary views when logged into that database. If you want to use these views in another database, you must install compat-tools in that database as well. For example, if you want to query Oracle-compatible data dictionary views in the postgres database, you need to download compat-tools separately from enmotech/compat-tools then run the following command:

      1
      gsql -h 127.0.0.1 -p 26000 -d postgres -f runMe.sql

Exploring Oracle-Compatible Data Dictionary in MogDB

Switch to the operating system user associated with the demo database. You can log in to the database using the gsql command.

Note: If you want to connect to the database remotely, you should use the mogdb user. The ptk demo command creates two users in the database. One starts with demo_user_, which is the initial user. Due to MogDB’s enhanced security features, the initial user is not allowed to connect remotely. The other user is mogdb, which can be used for remote connections. The initial password for the users is displayed at the end of the ptk demo command output. Additionally, you should modify the pg_hba.conf file to allow remote connections. For more information, refer to: Connecting to a Database Remotely.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
# ptk ls
cluster_name | id | addr | user | data_dir | db_version | create_time | comment
---------------+------+--------------------+---------------+---------------------------------+------------------------------+---------------------+----------
demo_kVP | 6001 | 172.16.0.164:26000 | demo_user_kVP | /home/demo_user_kVP/demodb/data | MogDB 5.0.7 (build c4707384) | 2024-06-06T12:12:24 |
# su - demo_user_kVP
$ gsql -d oracle_compatibility -r
gsql ((MogDB 5.0.7 build c4707384) compiled at 2024-05-24 10:51:53 commit 0 last mr 1804 )
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.

oracle_compatibility=# \dt
List of relations
Schema | Name | Type | Owner | Storage
--------+---------------+-------+-------+----------------------------------
public | actor | table | mogdb | {orientation=row,compression=no}
public | address | table | mogdb | {orientation=row,compression=no}
public | category | table | mogdb | {orientation=row,compression=no}
public | city | table | mogdb | {orientation=row,compression=no}
public | country | table | mogdb | {orientation=row,compression=no}
public | customer | table | mogdb | {orientation=row,compression=no}
public | film | table | mogdb | {orientation=row,compression=no}
public | film_actor | table | mogdb | {orientation=row,compression=no}
public | film_category | table | mogdb | {orientation=row,compression=no}
public | inventory | table | mogdb | {orientation=row,compression=no}
public | language | table | mogdb | {orientation=row,compression=no}
public | payment | table | mogdb | {orientation=row,compression=no}
public | rental | table | mogdb | {orientation=row,compression=no}
public | staff | table | mogdb | {orientation=row,compression=no}
public | store | table | mogdb | {orientation=row,compression=no}
(15 rows)

All these tables come from the Mogila test dataset.

Compat-tools brings a large number of Oracle-compatible data dictionary views (refer to: Oracle-Compatible Views). Here are some simple examples.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
-- sysdate function and dual table is supported
oracle_compatibility=# select sysdate from dual;
sysdate
---------------------
2024-06-06 12:47:57
(1 row)

-- V$ views are supported
oracle_compatibility=# select sid,username,status,program from v$session;
sid | username | status | program
----------------+---------------+--------+------------------------
47724487509760 | demo_user_kVP | active | gsql
47724085905152 | demo_user_kVP | idle | WLMArbiter
47724057593600 | demo_user_kVP | idle | WorkloadMonitor
47724032427776 | demo_user_kVP | active | workload
47723769759488 | demo_user_kVP | active | Asp
47723818518272 | demo_user_kVP | idle | statement flush thread
47723696359168 | demo_user_kVP | idle | CfsShrinker
47723730437888 | demo_user_kVP | idle | WDRSnapshot
47723597268736 | demo_user_kVP | idle | ApplyLauncher
47723658610432 | demo_user_kVP | idle | TxnSnapCapturer
47723563190016 | demo_user_kVP | active | JobScheduler
(11 rows)

-- DBA_ views are supported
oracle_compatibility=# select * from dba_users;
username | user_id | password | account_status | lock_date | expiry_date | profile
---------------+---------+----------+----------------+-----------+-------------+--------------
demo_user_kVP | 10 | ******** | NORMAL | | | DEFAULT_POOL
MOGDB | 18720 | ******** | NORMAL | | | DEFAULT_POOL
(2 rows)

-- USER_ views are supported
oracle_compatibility=# select table_name,num_rows,last_analyzed from user_tables;
table_name | num_rows | last_analyzed
---------------+----------+-------------------------------
COUNTRY | 0 |
CITY | 600 | 2024-06-06 12:13:24.504009+08
ACTOR | 0 |
FILM_ACTOR | 5462 | 2024-06-06 12:13:24.518718+08
CATEGORY | 0 |
FILM_CATEGORY | 1000 | 2024-06-06 12:13:24.531421+08
LANGUAGE | 0 |
FILM | 1000 | 2024-06-06 12:13:24.56158+08
PAYMENT | 0 |
CUSTOMER | 0 |
INVENTORY | 0 |
RENTAL | 0 |
STAFF | 0 |
ADDRESS | 0 |
STORE | 0 |
(15 rows)

oracle_compatibility=# select count(*) from CITY;
count
-------
600
(1 row)

-- some short name views are supported, like IND, OBJ, COLS
oracle_compatibility=# select index_name,table_owner,uniqueness,distinct_keys,partitioned from ind where table_name='CITY';
index_name | table_owner | uniqueness | distinct_keys | partitioned
-------------------+-------------+------------+---------------+-------------
CITY_PKEY | PUBLIC | UNIQUE | 600 | NO
IDX_FK_COUNTRY_ID | PUBLIC | NONUNIQUE | 109 | NO
(2 rows)

Note: Due to significant differences in the underlying structure between MogDB and Oracle, these Oracle-compatible views may not include all the fields present in the corresponding Oracle database views.

To be continued:

- Exploring Oracle Compatibility in MogDB (Series II) - System Functions

- Exploring Oracle Compatibility in MogDB (Series III) - DBMS Packages

Read More

Working Mechanism of synchronous_standby_names in MogDB

2024-06-05

In MogDB, the parameter synchronous_standby_names is used to configure synchronous replication settings. When this parameter is set to '*'(default setting), it indicates that any available standby server can be used as a synchronous standby. This configuration allows any currently connected standby to be utilized as a synchronous standby without explicitly specifying the standby’s name.

Working Mechanism

When synchronous_standby_names is set to '*', MogDB’s synchronous replication mechanism selects the synchronous standby through the following steps:

  1. Initial Connection: When the primary server starts or the parameter is changed, the primary server accepts all connected standby servers.

  2. Synchronous Standby Confirmation:

    • The primary server sends a synchronization request to all connected standby servers.
    • Each standby server, upon receiving the request, confirms it and reports its status back to the primary server.
  3. Selecting the Synchronous Standby:

    • The primary server selects the earliest responding standby server(s) as the synchronous standby.
    • This selection process is dynamic, meaning that if new standbys connect or current synchronous standbys disconnect, the primary server will reselect the synchronous standby.

Priority and Behavior

  • No Priority Order: Since '*' denotes any standby, all standbys have the same priority. The primary server simply selects the earliest responding standby as the synchronous standby.
  • Dynamic Adjustment: If a synchronous standby disconnects, the primary server automatically selects the next responding standby as the new synchronous standby, ensuring the continuity and reliability of synchronous replication.
  • Concurrent Management: If multiple standbys connect simultaneously, the primary server can handle these concurrent connections and select the synchronous standby based on the confirmation of synchronization requests.

Configuration Example and Usage

Suppose we have one primary server and three standby servers (standby1, standby2, and standby3). In the configuration file postgresql.conf, set synchronous_standby_names to '*':

1
synchronous_standby_names = '*'

Scenario Analysis

  1. Selection at Startup:

    • When the primary server starts, all connected standby servers send heartbeat signals and wait for the primary server’s synchronization request.
    • The primary server selects the earliest responding standby as the synchronous standby.
  2. Runtime Changes:

    • If the current synchronous standby standby1 disconnects, the primary server automatically selects the next responding standby standby2 as the new synchronous standby.
    • If a new standby standby3 connects to the primary server and there is no current synchronous standby, the primary server selects standby3 as the synchronous standby.

Dynamic Adjustment

If we start the primary server and three standby servers, setting synchronous_standby_names = '*', the following state transitions are possible:

  • Initial State: All standbys (standby1, standby2, and standby3) connect, and the primary server selects the earliest responding standby standby1 as the synchronous standby.
  • standby1 Disconnects: The primary server automatically selects the next responding standby standby2 as the new synchronous standby.
  • New Standby Connection: A new standby standby4 connects; the primary server will not change the current synchronous standby unless standby2 disconnects.

Summary

When synchronous_standby_names is set to '*', MogDB dynamically selects any currently available standby as the synchronous standby. This provides a flexible and highly available synchronous replication mechanism without requiring administrators to manually specify the standby names. The selection process is based on standby response times and automatically adjusts during runtime to ensure the continuity and reliability of synchronous replication.

Read More

MTK: The Ultimate Tool for Seamlessly Migrating Oracle Databases to MogDB

2024-06-03

Get the latest version MTK.

1
2
wget https://cdn-mogdb.enmotech.com/mtk/v2.6.3/mtk_2.6.3_linux_amd64.tar.gz
tar -xvf mtk_2.6.3_linux_amd64.tar.gz

Generate MTK trial license online

The trial license is valid for 1 month, and each email address can generate only one license. However, email addresses with the “enmotech.com” domain can generate licenses repeatedly. If clients wish to extend their trial of MTK beyond the initial month, they should contact Enmotech’s sales or pre-sales team to request an additional 1-month license.

1
2
3
4
5
6
7
[kamus@altlinux10 mtk_2.6.3_linux_amd64]$ ./mtk license gen
License File Not Found (default license.json)
The License code is invalid, start applying
✔ Email: kamus@enmotech.com█
Start applying for email kamus@enmotech.com authorization.
Start parsing the interface to return data.
Successful application for authorization. Please check the mail and save it as license.json.

Upon receiving the email, upload the attached license.json file to the MTK directory. Then, use the command mtk -v to verify the license validation.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
[kamus@altlinux10 mtk_2.6.3_linux_amd64]$ ./mtk -v
Using license file: /home/kamus/mogdb-tools/mtk_2.6.3_linux_amd64/license.json
Name: kamus@enmotech.com
Expiry: 2022-10-24 12:08:58.751194162 +0800 +0800
License key verified!
License checks OK!

MMMMMMMM MMMMMMMMTTTTTTTTTTTTTTTTTTTTTTTKKKKKKKKK KKKKKKK
M:::::::M M:::::::MT:::::::::::::::::::::TK:::::::K K:::::K
M::::::::M M::::::::MT:::::::::::::::::::::TK:::::::K K:::::K
M:::::::::M M:::::::::MT:::::TT:::::::TT:::::TK:::::::K K::::::K
M::::::::::M M::::::::::MTTTTTT T:::::T TTTTTTKK::::::K K:::::KKK
M:::::::::::M M:::::::::::M T:::::T K:::::K K:::::K
M:::::::M::::M M::::M:::::::M T:::::T K::::::K:::::K
M::::::M M::::M M::::M M::::::M T:::::T K:::::::::::K
M::::::M M::::M::::M M::::::M T:::::T K:::::::::::K
M::::::M M:::::::M M::::::M T:::::T K::::::K:::::K
M::::::M M:::::M M::::::M T:::::T K:::::K K:::::K
M::::::M MMMMM M::::::M T:::::T KK::::::K K:::::KKK
M::::::M M::::::M TT:::::::TT K:::::::K K::::::K
M::::::M M::::::M T:::::::::T K:::::::K K:::::K
M::::::M M::::::M T:::::::::T K:::::::K K:::::K
MMMMMMMM MMMMMMMM TTTTTTTTTTT KKKKKKKKK KKKKKKK

Release version: v2.6.3
Git Commit hash: da0ed8ee
Git Commit Date: 2022-09-22T01:17:49Z
Git Tag : v2.6.3
Build timestamp: 20220922011907

Install Oracle instant client

MTK requires the Oracle Instant Client to migrate Oracle objects to MogDB. In this tutorial, we will download the Oracle Instant Client for Linux x86-64 Basic Package. After downloading, unzip the file and set the LD_LIBRARY_PATH parameter as follows:

1
export LD_LIBRARY_PATH=/home/kamus/instantclient_21_7:$LD_LIBRARY_PATH

In this tutorial, we will migrate a sample schema “customer_orders” to MogDB. The scripts for Oracle can be downloaded from the db-sample-schemas repository on GitHub.

image-20220924140451621

Initialize migration project

1
./mtk init-project -s oracle -t mogdb -n ora2mogdb

Modify MTK configuration file

Modify the example MTK configuration file stored in the project_name_dir/config directory. Refer to the MTK documentation for detailed information on each parameter. The essential configuration sections for MTK are source, target, and object.

  • source section: This defines the connection to the source database. MTK needs to query the Oracle database dictionary to retrieve DDL. Therefore, it is typically recommended to use a DBA user, with the default system user being sufficient.

  • target section: This defines the connection to the target database.

  • object section: To migrate all objects within a schema, simply specify the schema name in the schemas section.

The mtk.json configuration file should look like this:

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
{
"source": {
"type": "oracle",
"connect": {
"version": "",
"host": "119.3.182.31",
"user": "system",
"port": 15221,
"password": "oracle",
"dbName": "ORACLE21C",
"charset": "",
"clientCharset": ""
}
},
"target": {
"type": "mogdb",
"connect": {
"version": "",
"host": "127.0.0.1",
"user": "co",
"port": 26000,
"password": "Enmo@123",
"dbName": "postgres",
"charset": "",
"clientCharset": ""
}
},
"object": {
"tables": [],
"schemas": ["co"],
"excludeTable": {},
"tableSplit": {}
},
"dataOnly": false,
"schemaOnly": false
}

For this tutorial, we plan to migrate all objects in the “CO” schema from the Oracle database to the same user in MogDB. For testing purposes, we will not create a new database in MogDB. Instead, we will create a new user “co” in the default database, postgres.

Run the following command to create the user:

1
2
3
4
5
6
7
[omm@altlinux10 ~]$ gsql -d postgres -p 26000 -r
gsql ((MogDB 3.0.2 build 9bc79be5) compiled at 2022-09-18 00:37:49 commit 0 last mr )
Non-SSL connection (SSL connection is recommended when requiring high-security)
Type "help" for help.

MogDB=# create user co identified by "Enmo@123";
CREATE ROLE

Start migration

Now, we can start migration.

1
./mtk -c ora2mogdb/config/mtk.json

Check migration report

The migration result report will be generated in the project report directory, available in both plain text and HTML formats. For simplicity, I have included the text format result in this tutorial.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
-----------------------
ObjectName Type Summary
-----------------------

+------------------+-------------------+-------------------+--------+-----------+-------------+-------------+-------------+--------------------|-------------+
| Type | StartTime | EndTime | Status | Total Num | Success Num | Warring Num | Failed Num |Failed(Invalid) Num | Time |
+------------------+-------------------+-------------------+--------+-----------+-------------+-------------+-------------+--------------------|-------------+
|Schema |2022-09-24 15:12:36|2022-09-24 15:12:36|finish |1 |1 |0 |0 |0 |282 ms |
|Sequence |2022-09-24 15:12:36|2022-09-24 15:12:36|finish |0 |0 |0 |0 |0 |210 ms |
|ObjectType |2022-09-24 15:12:36|2022-09-24 15:12:36|finish |0 |0 |0 |0 |0 |356 ms |
|Queue |2022-09-24 15:12:36|2022-09-24 15:12:37|finish |0 |0 |0 |0 |0 |177 ms |
|Table |2022-09-24 15:12:37|2022-09-24 15:12:47|finish |7 |7 |0 |0 |0 |9 s 952 ms |
|TableData |2022-09-24 15:12:47|2022-09-24 15:12:53|finish |7 |7 |0 |0 |0 |6 s 743 ms |
|Index |2022-09-24 15:12:53|2022-09-24 15:12:53|finish |7 |7 |0 |0 |0 |1 ms |
|Constraint |2022-09-24 15:12:53|2022-09-24 15:12:53|finish |24 |23 |0 |1 |0 |51 ms |
|DBLink |2022-09-24 15:12:53|2022-09-24 15:12:53|finish |0 |0 |0 |0 |0 |67 ms |
|View |2022-09-24 15:12:53|2022-09-24 15:12:54|finish |4 |2 |0 |2 |0 |723 ms |
|MaterializedView |2022-09-24 15:12:54|2022-09-24 15:12:54|finish |0 |0 |0 |0 |0 |138 ms |
|Function |2022-09-24 15:12:54|2022-09-24 15:12:54|finish |0 |0 |0 |0 |0 |113 ms |
|Procedure |2022-09-24 15:12:54|2022-09-24 15:12:55|finish |0 |0 |0 |0 |0 |109 ms |
|Package |2022-09-24 15:12:55|2022-09-24 15:12:55|finish |0 |0 |0 |0 |0 |77 ms |
|Trigger |2022-09-24 15:12:55|2022-09-24 15:12:55|finish |0 |0 |0 |0 |0 |404 ms |
|Synonym |2022-09-24 15:12:55|2022-09-24 15:12:55|finish |0 |0 |0 |0 |0 |74 ms |
|TableDataCom |2022-09-24 15:12:55|2022-09-24 15:12:56|finish |7 |7 |0 |0 |0 |810 ms |
|AlterSequence |2022-09-24 15:12:56|2022-09-24 15:12:56|finish |0 |0 |0 |0 |0 |71 ms |
|CollStatistics |2022-09-24 15:12:56|2022-09-24 15:12:56|finish |7 |7 |0 |0 |0 |29 ms |
+------------------+-------------------+-------------------+--------+-----------+-------------+-------------+-------------+--------------------|-------------+

We can see that all tables and table data have been successfully migrated to MogDB without any errors. However, there is one failed constraint and two failed views.

The failed constraint is a JSON check constraint, which is not supported by MogDB.

image-20220924153609138

The failed views are due to the grouping_id function and the json_table function, which are not yet implemented in MogDB ().

SQL ERROR
image-20220924154621375 pq: function grouping_id(character varying, character varying) does not exist
image-20220924154529308 pq: syntax error at or near “columns”

Check migration result

Run the sample queries to ensure that all the data has been migrated to MogDB without any errors.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
/* 5 products with the highest revenue
With their corresponding order rank */
select p.product_name,
count(*) number_of_orders,
sum ( oi.quantity * oi.unit_price ) total_value,
rank () over (
order by count(*) desc
) order_count_rank
from products p
join order_items oi
on p.product_id = oi.product_id
group by p.product_name
order by sum ( oi.quantity * oi.unit_price ) desc
fetch first 5 rows only;
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
MogDB=> select p.product_name,
MogDB-> count(*) number_of_orders,
MogDB-> sum ( oi.quantity * oi.unit_price ) total_value,
MogDB-> rank () over (
MogDB(> order by sum ( oi.quantity * oi.unit_price ) desc
MogDB(> ) revenue_rank
MogDB-> from products p
MogDB-> join order_items oi
MogDB-> on p.product_id = oi.product_id
MogDB-> group by p.product_name
MogDB-> order by count(*) desc
MogDB-> fetch first 5 rows only;
product_name | number_of_orders | total_value | revenue_rank
-----------------------+------------------+-------------+--------------
Girl's Trousers (Red) | 148 | 15794.76 | 1
Boy's Hoodie (Grey) | 100 | 3754.08 | 35
Men's Pyjamas (Blue) | 100 | 3274.61 | 36
Men's Coat (Red) | 98 | 4230.30 | 31
Boy's Socks (White) | 98 | 3081.12 | 38
(5 rows)

Conclusion

Migrating tables, table data, and indexes from Oracle to MogDB typically proceeds without issues. However, for views, procedures, functions, and packages, some modifications to the source code may be necessary. With the ongoing development of Oracle compatibility in MogDB, we believe that the need for such modifications will decrease over time, making the migration process from Oracle to MogDB even smoother.

Read More

Ensuring Data Integrity: How Major Databases Handle Partial Writes and Atomic Operations

2024-05-30

In the world of databases, ensuring atomicity in write operations is crucial. This post explores the importance of atomic writes and how various databases handle potential issues like partial writes.

Understanding Partial Writes

In the I/O write chain, the size of atomic operations can vary. An operation at a higher level may be split into multiple operations at a lower level, leading to scenarios where some operations succeed while others fail. This phenomenon is known as a partial write.

  • Legacy Disk Drives: Default sector size is 512 bytes.
  • Modern Disk Drives: Advanced format sector size is 4k.
  • Filesystem Page Size: For example, ext4 has a default page size of 4k.
  • Database Page Sizes:
    • Oracle: 8K
    • MySQL: 16K
    • PostgreSQL: 8K
    • openGauss: 8K

The Reality of Partial Writes

While theoretically, databases must handle partial writes, in practice, such events are quite rare. The critical aspect is not just preventing partial writes but ensuring robust mechanisms for data integrity checks (checksums) and recovery when issues do arise.

How Oracle Handles Partial Writes

Oracle’s approach to partial writes is pragmatic:

  1. Automatic Repair: If a fractured block is detected, Oracle tries to repair it using the corresponding block from Active Data Guard (ADG) or the flashback log.
  2. Error Reporting: If automatic repair fails, an ORA-01578 error is logged, prompting the DBA to perform a media recovery.
  3. Block-Level Recovery: Oracle supports block-level recovery, making the process less cumbersome and time-consuming.

Oracle’s backup sets are designed to handle partial writes:

  • Using OS Commands: When backing up data files using OS commands, Oracle requires the tablespace to be in backup mode. Any changes to data blocks are logged in the redo log, ensuring a clean block can be restored during recovery.
  • Using RMAN: RMAN handles partial writes by ensuring that each backup block is clean, eliminating the need to worry about fractured blocks.
  • Block Checking: Oracle provides mechanisms like DB_BLOCK_CHECKING for comprehensive checksums.

How MySQL Handles Partial Writes

MySQL employs a double write buffer to safeguard against partial writes.

  1. Double Write Buffer: Despite its name, the double write buffer is not stored in memory; instead, it consists of persistent files on disk. The mechanism works as follows: before writing a data block to its final location on disk, MySQL first writes it to the double write buffer.
  2. Data Integrity: This approach ensures that there is always a clean copy of the block available—either in the double write buffer or in the data files—allowing for safe redo operations.

However, this method is not without its challenges. One of the primary concerns is its efficiency, as it effectively doubles the I/O workload for redo logging. This has sparked ongoing discussions about the trade-offs and effectiveness of using a double write buffer to address partial writes.

How PostgreSQL Handles Partial Writes

PostgreSQL employs full page writes to manage partial writes effectively:

  1. Full Page Writes: When a data block is updated for the first time after a checkpoint, the entire block is written to the WAL (Write-Ahead Logging) log. This is akin to the database always being in “Backup status,” similar to issuing the alter tablespace x begin backup command in an Oracle database.
  2. Data Integrity: This mechanism ensures that even if a partial write occurs, the database remains consistent because the original page can be used for redo operations.

There has been community discussion about adopting a double write buffer mechanism, but it remains an area of interest rather than active development. For more details, you can check out these discussions:

How MogDB Handles Partial Writes

MogDB (the enterprise edition of openGauss) combines strategies from MySQL and PostgreSQL:

  1. Double Write Buffer: Similar to MySQL, MogDB uses a double write buffer by setting enable_double_write=on.
  2. Full Page Writes: MogDB retains PostgreSQL’s full page write mechanism by setting full_page_writes=on.

Depending on the configuration of incremental checkpoints (enable_incremental_checkpoint):

  • Incremental Checkpoints Off:
    • With full_page_writes=on, full page writes are used.
    • With full_page_writes=off, partial writes are not a concern.
  • Incremental Checkpoints On:
    • With enable_double_write=on, double writes are used.
    • With enable_double_write=off, the behavior depends on full_page_writes settings.

Conclusion

Ensuring atomic writes in databases is essential for data integrity and consistency. While partial writes are rare, robust mechanisms for detection and recovery are vital. Oracle, MySQL, PostgreSQL, and MogDB/openGauss each have unique approaches to handling these scenarios, ensuring that data remains reliable and recoverable.

Read More

How to build blog site by using Hexo in Windows WSL and deploy to GitHub Pages

2024-05-27

Why use WSL

Using Windows Subsystem for Linux (WSL) instead of relying solely on Windows can offer several advantages, particularly for developers and IT professionals. WSL allows users to run a GNU/Linux environment directly on Windows, without the overhead of a traditional virtual machine or dual-boot setup. This enables seamless access to a wide range of Linux tools and utilities, which are often preferred for development tasks, scripting, and system administration. Additionally, WSL provides a more consistent and familiar environment for those accustomed to Unix-based systems, facilitating smoother workflows and integration with cloud-based services. By leveraging WSL, professionals can enjoy the best of both worlds: the robust software ecosystem of Windows and the powerful command-line capabilities of Linux.

Why use Hexo

Hexo offers several advantages over WordPress, particularly for developers and tech-savvy users. As a static site generator, Hexo provides faster load times and improved security since it does not rely on a database or server-side processing, which are common vulnerabilities in WordPress. Additionally, Hexo allows for greater customization through its use of Markdown and extensive plugin ecosystem, enabling users to tailor their sites to specific needs without the overhead of a complex content management system. Furthermore, Hexo’s deployment process is streamlined, often integrating seamlessly with version control systems like Git, making it an excellent choice for those who prioritize efficiency and performance in their web development projects.

Why use Github Pages

Using GitHub Pages instead of a traditional hosting server offers several distinct advantages. Firstly, GitHub Pages provides a seamless integration with GitHub repositories, enabling automatic deployment of websites directly from your codebase. This integration ensures that updates and changes to your site are effortlessly managed through version control, promoting a streamlined workflow. Additionally, GitHub Pages is cost-effective, offering free hosting with custom domain support, which can significantly reduce overhead costs for personal projects or small businesses. The platform also boasts robust security features, leveraging GitHub’s infrastructure to protect your site from common vulnerabilities. Furthermore, the simplicity and ease of use make GitHub Pages an attractive option for developers who want to focus on writing code rather than managing server configurations and maintenance. Overall, GitHub Pages combines efficiency, cost savings, and security, making it an excellent choice for hosting static websites and project documentation.

GO!

Assume already has a WSL environment running

If not, refer to “How to install Linux on Windows with WSL“.

To say we are running WSL distro - Ubuntu 22.04

1
2
3
4
5
6
7
8
9
10
11
12
13
$ cat /etc/os-release
PRETTY_NAME="Ubuntu 22.04.4 LTS"
NAME="Ubuntu"
VERSION_ID="22.04"
VERSION="22.04.4 LTS (Jammy Jellyfish)"
VERSION_CODENAME=jammy
ID=ubuntu
ID_LIKE=debian
HOME_URL="https://www.ubuntu.com/"
SUPPORT_URL="https://help.ubuntu.com/"
BUG_REPORT_URL="https://bugs.launchpad.net/ubuntu/"
PRIVACY_POLICY_URL="https://www.ubuntu.com/legal/terms-and-policies/privacy-policy"
UBUNTU_CODENAME=jammy

Install Node.js

Since Hexo is written by Node.js, nodejs must be installed.

1
2
sudo apt update
sudo apt install nodejs npm

Install GIT

Since Hexo uses Git to publish pages on GitHub, Git must be installed.

1
sudo apt install git

Install Hexo

1
sudo npm install -g hexo-cli

Init Hexo project

Our goal is to deploy the static HTML file generated by Hexo to GitHub Pages. To achieve this, we need to create a repository named “[username].github.io” on GitHub. Therefore, we will initialize this directory directly using Hexo. Note: Be sure to replace [username] with your actual GitHub username.

1
hexo init kamusis.github.io

Create Github pages

Creating GitHub Pages is very simple, reference: https://pages.github.com/

Connecting to Github with SSH

Reference: https://docs.github.com/en/authentication/connecting-to-github-with-ssh

Basically, generate a key using the ssh-keygen command and upload the public key to the GitHub.

Once the SSH connection is established, use the following command to verify it:

1
2
$ ssh -T git@github.com
Hi kamusis! You've successfully authenticated, but GitHub does not provide shell access.

Set URL for your new blog

1
2
3
4
5
6
$ vi _config.yml

~~~~~~~~~~~~~~~~~~ _config.yml ~~~~~~~~~~~~~~~~~~
# URL
## Set your site url here. For example, if you use GitHub Page, set url as 'https://username.github.io/project'
url: http://kamusis.github.io

Set Git information to let hexo can push contents into Github Pages

1
2
3
4
5
6
7
8
9
10
$ npm install hexo-deployer-git --save
$ vi _config.yml

~~~~~~~~~~~~~~~~~~ _config.yml ~~~~~~~~~~~~~~~~~~
# Deployment
## Docs: https://hexo.io/docs/one-command-deployment
deploy:
type: git
repo: git@github.com:kamusis/kamusis.github.io.git
branch: master

Generate the site and push to Github pages

1
2
$ hexo clean
$ hexo deploy

Assuming everything has been set up correctly, you can now view a hello-world article on a website by navigating to https://[username].github.io.

Write your articles and publish

1
hexo new "My first post"

A markdown file will automatically be created and placed in the source/_posts directory. It can then be edited and saved using your preferred text or markdown editor.

Of course, you can also clean up the initial hello-world article.

1
rm source/_posts/hello-world.md

Publish the articles.

1
2
$ hexo clean
$ hexo deploy

Reference:

https://hexo.io/docs/github-pages#One-command-deployment

https://gist.github.com/btfak/18938572f5df000ebe06fbd1872e4e39

Read More

How to Find the Corresponding Session in MogDB/openGauss from OS Thread ID

2024-05-27

Diagnostic Needs

When the MogDB database consumes a significant amount of system resources, such as nearly 100% CPU usage, how can we determine which session(s) in the database are using these resources?

In Oracle databases, diagnosing such issues typically involves associating v$session, v$process, and the OS process ID found using the top or ps commands. However, MogDB uses a thread model, and only one process ID is visible at the OS level. So, how do we pinpoint the problem?

Since MogDB uses a thread model, unlike PostgreSQL’s process model, checking network port access with the lsof command at the OS level will show multiple user sessions connected, but only the process ID will be displayed in the PID column, not the thread.

1
2
3
4
5
6
7
8
9
10
$ lsof -i 4 -a -p `pgrep -u omm3 mogdb`
COMMAND PID USER FD TYPE DEVICE SIZE/OFF NODE NAME
mogdb 12027 omm3 8u IPv4 20313752 0t0 TCP *:biimenu (LISTEN)
mogdb 12027 omm3 9u IPv4 20313753 0t0 TCP *:18001 (LISTEN)
mogdb 12027 omm3 325u IPv4 28320946 0t0 TCP mogdb-kernel-0004:biimenu->softbank060132034173.bbtec.net:45206 (ESTABLISHED)
mogdb 12027 omm3 330u IPv4 28316174 0t0 TCP mogdb-kernel-0004:biimenu->softbank060132034173.bbtec.net:45208 (ESTABLISHED)
mogdb 12027 omm3 336u IPv4 28302815 0t0 TCP mogdb-kernel-0004:biimenu->softbank060132034173.bbtec.net:45210 (ESTABLISHED)
mogdb 12027 omm3 340u IPv4 28323140 0t0 TCP mogdb-kernel-0004:biimenu->softbank060132034173.bbtec.net:45212 (ESTABLISHED)
mogdb 12027 omm3 360u IPv4 28323141 0t0 TCP mogdb-kernel-0004:biimenu->softbank060132034173.bbtec.net:45214 (ESTABLISHED)
mogdb 12027 omm3 375u IPv4 28305050 0t0 TCP mogdb-kernel-0004:biimenu->softbank060132034173.bbtec.net:45216 (ESTABLISHED)

How to Get Thread ID

You can use htop. After opening htop, press F5 to display the process tree. The first PID is the process ID, and each line under the tree structure shows the corresponding thread ID for that process.

img

You can also use the ps command. The -L parameter displays threads, and the -o parameter specifies the columns of interest.

1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
# ps -Lp `pgrep -u omm3 mogdb` -o %cpu,tid,pid,ppid,cmd,comm
%CPU TID PID PPID CMD COMMAND
0.0 17847 17847 1 /opt/mogdb3/app/bin/mogdb - mogdb
0.0 17848 17847 1 /opt/mogdb3/app/bin/mogdb - jemalloc_bg_thd
0.0 17854 17847 1 /opt/mogdb3/app/bin/mogdb - mogdb
0.0 17855 17847 1 /opt/mogdb3/app/bin/mogdb - syslogger
0.0 17856 17847 1 /opt/mogdb3/app/bin/mogdb - reaper
0.0 17857 17847 1 /opt/mogdb3/app/bin/mogdb - jemalloc_bg_thd
0.0 17858 17847 1 /opt/mogdb3/app/bin/mogdb - jemalloc_bg_thd
0.0 17860 17847 1 /opt/mogdb3/app/bin/mogdb - jemalloc_bg_thd
0.0 17884 17847 1 /opt/mogdb3/app/bin/mogdb - checkpointer
0.0 17885 17847 1 /opt/mogdb3/app/bin/mogdb - Spbgwriter
0.1 17886 17847 1 /opt/mogdb3/app/bin/mogdb - pagewriter
0.0 17887 17847 1 /opt/mogdb3/app/bin/mogdb - pagewriter
0.0 17888 17847 1 /opt/mogdb3/app/bin/mogdb - pagewriter
0.0 17889 17847 1 /opt/mogdb3/app/bin/mogdb - pagewriter
0.0 17890 17847 1 /opt/mogdb3/app/bin/mogdb - pagewriter
0.8 17891 17847 1 /opt/mogdb3/app/bin/mogdb - WALwriter
0.0 17892 17847 1 /opt/mogdb3/app/bin/mogdb - WALwriteraux
0.0 17893 17847 1 /opt/mogdb3/app/bin/mogdb - AVClauncher
0.0 17894 17847 1 /opt/mogdb3/app/bin/mogdb - Jobscheduler
0.0 17895 17847 1 /opt/mogdb3/app/bin/mogdb - asyncundolaunch
0.0 17896 17847 1 /opt/mogdb3/app/bin/mogdb - globalstats
0.0 17897 17847 1 /opt/mogdb3/app/bin/mogdb - applylauncher
0.0 17898 17847 1 /opt/mogdb3/app/bin/mogdb - statscollector
0.0 17899 17847 1 /opt/mogdb3/app/bin/mogdb - snapshotworker
0.1 17900 17847 1 /opt/mogdb3/app/bin/mogdb - TrackStmtWorker
0.0 17901 17847 1 /opt/mogdb3/app/bin/mogdb - 2pccleaner
0.0 17902 17847 1 /opt/mogdb3/app/bin/mogdb - faultmonitor
0.0 17904 17847 1 /opt/mogdb3/app/bin/mogdb - undorecycler
0.0 18372 17847 1 /opt/mogdb3/app/bin/mogdb - worker
0.0 18373 17847 1 /opt/mogdb3/app/bin/mogdb - worker
0.0 18374 17847 1 /opt/mogdb3/app/bin/mogdb - worker
0.0 18375 17847 1 /opt/mogdb3/app/bin/mogdb - worker
0.0 18376 17847 1 /opt/mogdb3/app/bin/mogdb - worker
0.0 18377 17847 1 /opt/mogdb3/app/bin/mogdb - worker
0.0 18378 17847 1 /opt/mogdb3/app/bin/mogdb - worker
0.0 18379 17847 1 /opt/mogdb3/app/bin/mogdb - worker
0.0 18380 17847 1 /opt/mogdb3/app/bin/mogdb - worker
0.0 18381 17847 1 /opt/mogdb3/app/bin/mogdb - worker
0.0 18382 17847 1 /opt/mogdb3/app/bin/mogdb - worker
0.0 18454 17847 1 /opt/mogdb3/app/bin/mogdb - worker
0.0 19475 17847 1 /opt/mogdb3/app/bin/mogdb - worker
0.0 19480 17847 1 /opt/mogdb3/app/bin/mogdb - worker
0.0 29529 17847 1 /opt/mogdb3/app/bin/mogdb - worker
0.0 30999 17847 1 /opt/mogdb3/app/bin/mogdb - worker

The rows in the comm column that display as “worker” are backend processes of user sessions in the database. Typically, user sessions that consume high CPU can be filtered to show only user sessions using the grep command.

How to Map OS Thread ID to Database Session

Suppose on this server, the thread with ID 18372 is consuming a lot of CPU. In MogDB, you can query the pg_os_threads view to find the session ID corresponding to this thread.

1
2
3
4
5
6
MogDB=# select * from pg_os_threads where lwpid=18372;

node_name | pid | lwpid | thread_name | creation_time
----------+------------------+-------+-------------+------------------------------
dn_6001 | 140545137571584 | 18372 | dn_6001 | 2022-05-30 19:54:42.459129+08
(1 row)

The pg_os_threads view records the relationship between lightweight thread IDs and session IDs. The lwpid column is the OS thread ID, and the pid column is the database session ID. For detailed information, refer to the MogDB documentation on PG_OS_THREADS.

If you have monadmin privileges, you can also query the os_threads view in the dbe_perf schema, which provides the same information.

After finding the database session ID, you can perform various actions, such as querying the dbe_perf.session_stat_activity view to get the application name, client IP address, and the SQL query being executed by that session.

1
2
3
4
5
6
MogDB=# select application_name, client_addr, query from dbe_perf.session_stat_activity where pid=140545137571584;

application_name | client_addr | query
-----------------+-------------+---------------------------------------------
dn_6001 | 172.16.0.176| SELECT cfg_value FROM bmsql_config WHERE cfg_name = $1
(1 row)

You can also query the dbe_perf.thread_wait_status view to get the current wait event of the session. Note that this view contains the lwtid field, which directly corresponds to the thread ID.

1
2
3
4
5
6
MogDB=# select lwtid, wait_status, wait_event from dbe_perf.thread_wait_status where sessionid=140545137571584;

lwtid | wait_status | wait_event
------+-------------+-----------
18372 | wait cmd | wait cmd
(1 row)

Read More